From kxu at openjdk.org Tue Oct 1 02:12:18 2024
From: kxu at openjdk.org (Kangcheng Xu)
Date: Tue, 1 Oct 2024 02:12:18 GMT
Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v7]
In-Reply-To:
References:
Message-ID:

> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`.
>
> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible).
>
> Some notable examples of this transformation include:
> - `a + a + a` => `a*3` => `(a<<1) + a`
> - `a + a + a + a` => `a*4` => `a<<2`
> - `a*3 + a` => `a*4` => `a<<2`
> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2`
>
> See included IR unit tests for more.

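
The rewrites listed in the description rely on Java `int` arithmetic wrapping around, so each form yields the same value even when the sum overflows. A minimal sketch that checks this at the source level follows; it is not part of the patch, and the class and method names (`SerialAddSketch`, `addSeries`, `rewritesAgree`) are hypothetical, chosen only for illustration. It says nothing about how C2 matches the IR, only that the results agree.

```java
class SerialAddSketch {
    // Hypothetical helper (not from the patch): adds `a` to itself n times,
    // i.e. the "series of additions" the optimization targets.
    static int addSeries(int a, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += a;
        }
        return sum;
    }

    // Returns true if every rewrite from the description produces the same
    // value for this `a`, including when the arithmetic overflows.
    static boolean rewritesAgree(int a) {
        boolean three = (a + a + a) == a * 3 && a * 3 == (a << 1) + a;
        boolean four = (a + a + a + a) == a * 4 && a * 4 == (a << 2);
        boolean mulPlus = (a * 3 + a) == (a << 2);
        boolean shiftPlus = ((a << 1) + a + a) == (a << 2);
        return three && four && mulPlus && shiftPlus;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -7, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            System.out.println(a + " -> " + rewritesAgree(a));
        }
    }
}
```

The boundary values `Integer.MAX_VALUE` and `Integer.MIN_VALUE` are included because they overflow in every variant, which is the case where one might expect the forms to diverge.
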
Kangcheng Xu has updated the pull request incrementally with three additional commits since the last revision: - extract pattern matching to separate functions - WIP: extract pattern matching to separate functions - WIP: refactor as suggested by review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20754/files - new: https://git.openjdk.org/jdk/pull/20754/files/0de4feea..6e65e13f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=05-06 Stats: 171 lines in 2 files changed: 44 ins; 54 del; 73 mod Patch: https://git.openjdk.org/jdk/pull/20754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20754/head:pull/20754 PR: https://git.openjdk.org/jdk/pull/20754 From kxu at openjdk.org Tue Oct 1 02:12:21 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 1 Oct 2024 02:12:21 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v6] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 13:33:52 GMT, Roland Westrelin wrote: >> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 33 commits: >> >> - resolve conflicts >> - resolve conflicts >> - Arithmetic canonicalization v3 (#3) >> >> * 8340144: C1: remove unused Compilation::_max_spills >> >> Reviewed-by: thartmann, shade >> >> * 8340176: Replace usage of -noclassgc with -Xnoclassgc in test/jdk/java/lang/management/MemoryMXBean/LowMemoryTest2.java >> >> Reviewed-by: kevinw, lmesnik >> >> * 8339793: Fix incorrect APX feature enabling with -XX:-UseAPX >> >> Reviewed-by: kvn, thartmann, sviswanathan >> >> * 8340184: Bug in CompressedKlassPointers::is_in_encodable_range >> >> Reviewed-by: coleenp, rkennke, jsjolen >> >> * 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() >> >> Reviewed-by: roland, chagedorn, jkarthikeyan >> >> * 8340119: Remove oopDesc::size_might_change() >> >> Reviewed-by: stefank, iwalulya >> >> * 8340009: Improve the output from assert_different_registers >> >> Reviewed-by: aboldtch, dholmes, shade, mli >> >> * 8340273: Remove CounterHalfLifeTime >> >> Reviewed-by: chagedorn, dholmes >> >> * 8338566: Lazy creation of exception instances is not thread safe >> >> Reviewed-by: shade, kvn, dlong >> >> * 8339648: ZGC: Division by zero in rule_major_allocation_rate >> >> Reviewed-by: aboldtch, lucy, tschatzl >> >> * 8329816: Add SLEEF version 3.6.1 >> >> Reviewed-by: erikj, mli, luhenry >> >> * 8315273: (fs) Path.toRealPath(LinkOption.NOFOLLOW_LINKS) fails when "../../" follows a link (win) >> >> Reviewed-by: djelinski >> >> * 8339574: Behavior of File.is{Directory,File,Hidden} is not documented with respect to symlinks >> >> Reviewed-by: djelinski, alanb >> >> * 8340200: Misspelled constant `AttributesProcessingOption.DROP_UNSTABLE_ATRIBUTES` >> >> Reviewed-by: liach >> >> * 8339934: Simplify Math.scalb(double) method >> >> Reviewed-by: darcy >> >> * 8339790: Support Intel APX setzucc instruction >> >> Reviewed-by: sviswanathan, jkarthikeyan, kvn >> >> * 8340323: Test jdk/classfile/OptionsTest.java fails after 
JDK-8340200 >> >> Reviewed-by: alanb >> >> * 8338686: App classpath mismatch if a jar from the Class-Path attribute is on the classpath >> >> Reviewed-by: dholmes, iklam >> >> * 8337563: NMT: rename MEMFLAGS to MemTag >> >> ... > > src/hotspot/share/opto/addnode.cpp line 422: > >> 420: // Convert (a + a) + a to 3 * a >> 421: // Look for LHS pattern: AddNode(a, a) >> 422: if (in1_op == Op_Add(bt) && in1->in(1) == in1->in(2)) { > > It seems each of the if blocks in this method could be its own method that returns true and `multiplier` (passed by reference, I suppose) if pattern matching succeeds. Refactored to do so. Thanks for the input! > src/hotspot/share/opto/addnode.cpp line 487: > >> 485: // AddNode(LShiftNode(a, CON1), LShiftNode(a, CON2)/a) >> 486: // AddNode(LShiftNode(a, CON1)/a, LShiftNode(a, CON2)) >> 487: for (int i = 0; i < 2; i++) { > > I wouldn't use a loop here. I would put the loop body into its own method and call it twice, once with `lhs`, `lhs_base` as arguments, once with `rhs`, `rhs_base`. I refactored even further to combine checking for optimized `mul`s and extracting multipliers to use the same logic. This code is now obsolete. > src/hotspot/share/opto/addnode.cpp line 540: > >> 538: >> 539: PhaseIterGVN* igvn = phase->is_IterGVN(); >> 540: if (igvn != nullptr) { > > Why do you need that? > I think it's fine to return a new node from Ideal. You are right. This is leftover code from last version. Removed. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1782026217
PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1782026033
PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1782025208

From jbhateja at openjdk.org Tue Oct 1 05:09:25 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 1 Oct 2024 05:09:25 GMT
Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v19]
In-Reply-To:
References:
Message-ID:

> Hi All,
>
> As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new vector operators.
>
>
> . SUADD : Saturating unsigned addition.
> . SADD : Saturating signed addition.
> . SUSUB : Saturating unsigned subtraction.
> . SSUB : Saturating signed subtraction.
> . UMAX : Unsigned max.
> . UMIN : Unsigned min.
>
>
> The new vector operators are applicable only to integral types, since their values wrap around in overflowing and underflowing scenarios after setting appropriate status flags. For floating-point types, as per the IEEE 754 specs, there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an infinite value.
>
> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios.
>
> Summary of changes:
> - Java side implementation of new vector operators.
> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them.
> - C2 compiler IR and inline expander changes.
> - Optimized x86 backend implementation for new vector operators and their predicated counterparts.
> - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Merge stashing and re-commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/28b29bc6..952920ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=17-18 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From roland at openjdk.org Tue Oct 1 07:23:42 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Oct 2024 07:23:42 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 02:12:18 GMT, Kangcheng Xu wrote: >> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. >> >> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). 
This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible).
>>
>> Some notable examples of this transformation include:
>> - `a + a + a` => `a*3` => `(a<<1) + a`
>> - `a + a + a + a` => `a*4` => `a<<2`
>> - `a*3 + a` => `a*4` => `a<<2`
>> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2`
>>
>> See included IR unit tests for more.
>
> Kangcheng Xu has updated the pull request incrementally with three additional commits since the last revision:
>
> - extract pattern matching to separate functions
> - WIP: extract pattern matching to separate functions
> - WIP: refactor as suggested by review

Thanks for making the changes. It's easier to follow the various steps the way it is now.

src/hotspot/share/opto/addnode.cpp line 409:

> 407: // Convert a + a + ... + a into a*n
> 408: Node* AddNode::convert_serial_additions(PhaseGVN* phase, bool can_reshape, BasicType bt) {
> 409: if (find_power_of_two_addition_pattern(this, bt, nullptr) != nullptr) {

Can you add a comment that explains the need for this (essentially what you replied in the PR comment)?

src/hotspot/share/opto/addnode.cpp line 498:

> 496:
> 497: // swap LShiftNode to lhs for easier matching
> 498: if (!lhs->is_LShift()) {

Can you use `Op_LShift(bt)` here?

src/hotspot/share/opto/addnode.cpp line 503:

> 501:
> 502: // AddNode(LShiftNode(a, CON), *)?
> 503: if (!lhs->is_LShift() || !lhs->in(2)->is_Con()) {

Same here.

src/hotspot/share/opto/addnode.cpp line 527:

> 525:
> 526: // AddNode(LShiftNode(a, CON), LShiftNode(a, CON2))?
> 527: if (rhs->is_LShift() && lhs->in(1) == rhs->in(1) && rhs->in(2)->is_Con()) {

Same here.

src/hotspot/share/opto/addnode.cpp line 549: > 547: Node* AddNode::find_power_of_two_subtraction_pattern(Node* n, BasicType bt, jlong* multiplier) { > 548: // Look for pattern: SubNode(LShiftNode(a, CON), a) > 549: if (n->Opcode() == Op_Sub(bt) && n->in(1)->is_LShift() && n->in(1)->in(2)->is_Con()) { same here. ------------- PR Review: https://git.openjdk.org/jdk/pull/20754#pullrequestreview-2339315520 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1782238602 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1782239220 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1782239478 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1782239740 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1782239936 From dnsimon at openjdk.org Tue Oct 1 08:03:47 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 1 Oct 2024 08:03:47 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 06:05:15 GMT, Doug Simon wrote: >> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in `-Xcomp` or `-Xbatch` mode. >> >> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). >> >> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. 
>
> Doug Simon has updated the pull request incrementally with one additional commit since the last revision:
>
> rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava

Closing this so @tzezula can open a new one for the same issue.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21171#issuecomment-2385061953

From dnsimon at openjdk.org Tue Oct 1 08:03:48 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 1 Oct 2024 08:03:48 GMT
Subject: Withdrawn: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread
In-Reply-To:
References:
Message-ID:

On Tue, 24 Sep 2024 22:48:00 GMT, Doug Simon wrote:

> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in `-Xcomp` or `-Xbatch` mode.
>
> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java).
>
> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic.

This pull request has been closed without being integrated.

------------- PR: https://git.openjdk.org/jdk/pull/21171 From duke at openjdk.org Tue Oct 1 08:05:42 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Tue, 1 Oct 2024 08:05:42 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v2] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 14:47:47 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request incrementally with two additional commits since the last revision: > > - num_8b_elems_in_vec --> nof_vec_elems > - Removed checks for (MaxVectorSize >= 16) per @RealFYang suggestion. . ------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2385065561 From duke at openjdk.org Tue Oct 1 08:40:14 2024 From: duke at openjdk.org (Raphael Mosaner) Date: Tue, 1 Oct 2024 08:40:14 GMT Subject: RFR: 8337493: [JVMCI] Number of libgraal threads might be too low Message-ID: The `-XX:JVMCINativeLibraryThreadFraction` flag defines the ratio between JVMCI threads and C1 threads. With a default value of 0.33 the number of JVMCI threads is significantly smaller than the number of C2 threads would be. This can lead to unexpected warmup behavior with `-XX:+UseJVMCICompiler`. This PR changes the default value of `-XX:JVMCINativeLibraryThreadFraction` to yield the same number of JVMCI threads as C2 threads. ------------- Commit messages: - Use the same number of JVMCI threads as C2 threads per default. 
Changes: https://git.openjdk.org/jdk/pull/21279/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21279&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337493 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21279.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21279/head:pull/21279 PR: https://git.openjdk.org/jdk/pull/21279 From dnsimon at openjdk.org Tue Oct 1 08:47:39 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 1 Oct 2024 08:47:39 GMT Subject: RFR: 8337493: [JVMCI] Number of libgraal threads might be too low In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 08:33:49 GMT, Raphael Mosaner wrote: > The `-XX:JVMCINativeLibraryThreadFraction` flag defines the ratio between JVMCI threads and C1 threads. > With a default value of 0.33 the number of JVMCI threads is significantly smaller than the number of C2 threads would be. > > This can lead to unexpected warmup behavior with `-XX:+UseJVMCICompiler`. This PR changes the default value of `-XX:JVMCINativeLibraryThreadFraction` to yield the same number of JVMCI threads as C2 threads. LGTM. ------------- Marked as reviewed by dnsimon (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21279#pullrequestreview-2339514983 From roland at openjdk.org Tue Oct 1 09:42:37 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Oct 2024 09:42:37 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert In-Reply-To: <85C55cSsG9xUcs6GV3nCiq-idMJhEOywFN3atCFOO78=.53e95ff8-e9f4-4c86-a97d-5bff2cb009d3@github.com> References: <85C55cSsG9xUcs6GV3nCiq-idMJhEOywFN3atCFOO78=.53e95ff8-e9f4-4c86-a97d-5bff2cb009d3@github.com> Message-ID: <4cGhoDWjhLY3K9PY5CydEPh0mdwDn6EPVMAWQWU4U3M=.60c7cf50-32ca-4bf4-8d53-13c1ae5dabac@github.com> On Mon, 30 Sep 2024 07:02:10 GMT, Tobias Hartmann wrote: >> When converting a `LongCountedLoop` into a loop nest, c2 needs jvm >> state to add predicates to the inner loop. For that, it peels an >> iteration of the loop and uses the state of the safepoint at the end >> of the loop. That's only legal if there's no side effect between the >> safepoint and the backedge that goes back into the loop. The assert >> failure here happens in code that checks that. >> >> That code compares the memory states at the safepoint and at the >> backedge. If they are the same then there's no side effect. To check >> consistency, the `MergeMem` at the safepoint is cloned. As the logic >> iterates over the backedge state, it clears every component of the >> state it encounters from the `MergeMem`. Once done, the cloned >> `MergeMem` should be "empty". In the case of this failure, no side >> effect is found but the cloned `MergeMem` is not empty. That happens >> because of EA: it adds edges to the `MergeMem` at the safepoint that >> it doesn't add to the backedge `Phis`. >> >> So it's the verification code that fails and I propose dealing with >> this by ignoring memory state added by EA in the verification code. 
> > src/hotspot/share/opto/loopnode.cpp line 708: > >> 706: for (MergeMemStream mms(mem->as_MergeMem()); mms.next_non_empty(); ) { >> 707: // Loop invariant memory state won't be reset by no_side_effect_since_safepoint(). Do it here. >> 708: // Escape Analysis can add state to mm that it doesn't add to the backedge memory Phis, breaking verification > > Where exactly does that happen in EA? When an allocation is non escaping and made scalar replaceable, new slices are allocated for the fields of the allocation and the memory graph is updated so allocation/stores/loads to the new slices are connected together. In the process, `MergeMem` nodes need to be updated as well. In this case, I'm not sure this particular `MergeMem` node needs to be updated by EA but it's harmless in any case. The verification code doesn't expect "more" state to be recorded at the safepoint because of the `MergeMem` than at the backedge. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21009#discussion_r1782460709 From jbhateja at openjdk.org Tue Oct 1 09:51:27 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 1 Oct 2024 09:51:27 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v14] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. 
The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within the valid two-vector index range [0, 2*VLEN), else an IndexOutOfBoundsException is thrown.
>
> Summary of changes:
> - Java side implementation of new selectFrom API.
> - C2 compiler IR and inline expander changes.
> - In the absence of a direct two-vector permutation instruction in the target ISA, a lowering transformation dismantles the new IR into constituent IR supported by target platforms.
> - Optimized x86 backend implementation for AVX512 and legacy targets.
> - Functional tests covering the new API.
>
> JMH micro included with this patch shows around 10-15x gain over the existing rearrange API:
> Test System: Intel(R) Xeon(R) Platinum 8480+ [Sapphire Rapids Server]
>
>
> Benchmark                                     (size)   Mode  Cnt      Score   Error   Units
> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762          ops/ms
> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550          ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605          ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004          ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758          ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192          ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459          ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556          ops/ms
> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830          ops/ms
> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174          ops/ms
> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804          ops/ms
> SelectFromBenchmark.selectFromIntVector         2048  thrpt    2   5398.2...

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Review comments resolutions.

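
The two-vector selection semantics in the description can be modeled with plain arrays. The sketch below is a scalar illustration only, not the Vector API implementation: `SelectFromSketch` and its `selectFrom` method are hypothetical names, and it assumes that indexes in [0, VLEN) pick from the first table and indexes in [VLEN, 2*VLEN) pick from the second, with out-of-range indexes rejected as the description at this revision states.

```java
import java.util.Arrays;

class SelectFromSketch {
    // Scalar model (an assumption, not the JDK code): for each lane, use the
    // index value to pick an element from the concatenation of v1 and v2.
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = indexes.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all three arrays must have the same length");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = indexes[i];
            // Valid two-vector index range is [0, 2*VLEN).
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("index " + idx + " outside [0, " + (2 * vlen) + ")");
            }
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] indexes = {0, 5, 3, 6};
        System.out.println(Arrays.toString(selectFrom(indexes, v1, v2)));
    }
}
```

With the sample inputs, lanes 0 and 2 read from `v1` and lanes 1 and 3 read from `v2`, since indexes 5 and 6 fall in the upper half of the combined range.
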
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/42ca80c5..7327736f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=12-13 Stats: 126 lines in 4 files changed: 60 ins; 65 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Tue Oct 1 09:55:39 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 1 Oct 2024 09:55:39 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Mon, 30 Sep 2024 22:39:09 GMT, Sandhya Viswanathan wrote: >> I think you have to do the masking before conversion - `vec.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle()` is not the same as `vec.toShuffle()` for all inputs. >> >> >> jshell> IntVector indexes = IntVector.fromArray(IntVector.SPECIES_256, new int[] {0, 1, 8, 9, 16, 17, 24, 25}, 0); >> indexes ==> [0, 1, 8, 9, 16, 17, 24, 25] >> >> jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1) >> $19 ==> [0, 1, 8, 9, 0, 1, 8, 9] >> >> jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1).toShuffle() >> $20 ==> Shuffle[0, 1, -8, -7, 0, 1, -8, -7] >> >> jshell> indexes.toShuffle() >> $21 ==> Shuffle[0, 1, -8, -7, -8, -7, -8, -7] > > Thanks for the example. Yes, you have a point there. 
So we would need to do: > src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2); > This could instead be: src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2); Or even simplified to: src1.rearrange(this.toShuffle(), src2); Yes, this may save additional allocation penalty of result array allocation which may slightly improve fall back performance, but logical operation cannot be directly applied over floating point vectors. so, we will need an explicit conversion to integral vector, which is why I opted for current fallback implementation which is in line with rest of the code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1782480053 From duke at openjdk.org Tue Oct 1 11:02:56 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Tue, 1 Oct 2024 11:02:56 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread Message-ID: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. ------------- Commit messages: - Using tristate CompilerThread::_can_call_java. 
- rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava - added CompilerThreadCanCallJavaScope Changes: https://git.openjdk.org/jdk/pull/21285/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340733 Stats: 160 lines in 8 files changed: 134 ins; 2 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/21285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21285/head:pull/21285 PR: https://git.openjdk.org/jdk/pull/21285 From duke at openjdk.org Tue Oct 1 11:14:33 2024 From: duke at openjdk.org (duke) Date: Tue, 1 Oct 2024 11:14:33 GMT Subject: RFR: 8337493: [JVMCI] Number of libgraal threads might be too low In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 08:33:49 GMT, Raphael Mosaner wrote: > The `-XX:JVMCINativeLibraryThreadFraction` flag defines the ratio between JVMCI threads and C1 threads. > With a default value of 0.33 the number of JVMCI threads is significantly smaller than the number of C2 threads would be. > > This can lead to unexpected warmup behavior with `-XX:+UseJVMCICompiler`. This PR changes the default value of `-XX:JVMCINativeLibraryThreadFraction` to yield the same number of JVMCI threads as C2 threads. @rmosaner Your change (at version 9e0a318831b5df4137104438626f22bb508cbc42) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21279#issuecomment-2385496949 From rcastanedalo at openjdk.org Tue Oct 1 11:24:52 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 1 Oct 2024 11:24:52 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 16:56:30 GMT, Vladimir Kozlov wrote: > Good. Thanks, Vladimir! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2385515540 From duke at openjdk.org Tue Oct 1 11:48:39 2024 From: duke at openjdk.org (Raphael Mosaner) Date: Tue, 1 Oct 2024 11:48:39 GMT Subject: Integrated: 8337493: [JVMCI] Number of libgraal threads might be too low In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 08:33:49 GMT, Raphael Mosaner wrote: > The `-XX:JVMCINativeLibraryThreadFraction` flag defines the ratio between JVMCI threads and C1 threads. > With a default value of 0.33 the number of JVMCI threads is significantly smaller than the number of C2 threads would be. > > This can lead to unexpected warmup behavior with `-XX:+UseJVMCICompiler`. This PR changes the default value of `-XX:JVMCINativeLibraryThreadFraction` to yield the same number of JVMCI threads as C2 threads. This pull request has now been integrated. Changeset: 7cc7c080 Author: Raphael Mosaner Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/7cc7c080b5dbab61914512bf63227944697c0cbe Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod 8337493: [JVMCI] Number of libgraal threads might be too low Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/21279 From roland at openjdk.org Tue Oct 1 13:22:23 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Oct 2024 13:22:23 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert [v2] In-Reply-To: References: Message-ID: > When converting a `LongCountedLoop` into a loop nest, c2 needs jvm > state to add predicates to the inner loop. For that, it peels an > iteration of the loop and uses the state of the safepoint at the end > of the loop. That's only legal if there's no side effect between the > safepoint and the backedge that goes back into the loop. The assert > failure here happens in code that checks that. > > That code compares the memory states at the safepoint and at the > backedge. 
If they are the same then there's no side effect. To check > consistency, the `MergeMem` at the safepoint is cloned. As the logic > iterates over the backedge state, it clears every component of the > state it encounters from the `MergeMem`. Once done, the cloned > `MergeMem` should be "empty". In the case of this failure, no side > effect is found but the cloned `MergeMem` is not empty. That happens > because of EA: it adds edges to the `MergeMem` at the safepoint that > it doesn't add to the backedge `Phis`. > > So it's the verification code that fails and I propose dealing with > this by ignoring memory state added by EA in the verification code. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8336702 - test indentation - fix & test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21009/files - new: https://git.openjdk.org/jdk/pull/21009/files/463d6a21..a4263e28 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21009&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21009&range=00-01 Stats: 193974 lines in 1550 files changed: 175338 ins; 10446 del; 8190 mod Patch: https://git.openjdk.org/jdk/pull/21009.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21009/head:pull/21009 PR: https://git.openjdk.org/jdk/pull/21009 From roland at openjdk.org Tue Oct 1 13:22:24 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Oct 2024 13:22:24 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert [v2] In-Reply-To: <0sw5s6nN8FKInMD7qNCuBBa4w2uK-FBV505eke63dA4=.1fc70e4e-01e1-4763-ade6-98f841f84b9f@github.com> References: <0sw5s6nN8FKInMD7qNCuBBa4w2uK-FBV505eke63dA4=.1fc70e4e-01e1-4763-ade6-98f841f84b9f@github.com> Message-ID: On Wed, 18 
Sep 2024 12:05:57 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8336702 >> - test indentation >> - fix & test > > test/hotspot/jtreg/compiler/longcountedloops/TestSafePointWithEAState.java line 59: > >> 57: float n; >> 58: h(float n) { this.n = n; } >> 59: } > > Java indentation is supposed to be 4 spaces ;) > Adding some explicit brackets would also be nice, but that is more subjective. Right. Fixed in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21009#discussion_r1782784463 From yzheng at openjdk.org Tue Oct 1 13:24:10 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 1 Oct 2024 13:24:10 GMT Subject: RFR: 8341333: [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI Message-ID: This is required for adapting https://github.com/openjdk/jdk/pull/19454 in JVMCI compiler ------------- Commit messages: - [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI Changes: https://git.openjdk.org/jdk/pull/21287/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21287&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341333 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21287.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21287/head:pull/21287 PR: https://git.openjdk.org/jdk/pull/21287 From roland at openjdk.org Tue Oct 1 13:36:22 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Oct 2024 13:36:22 GMT Subject: RFR: 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons() [v3] In-Reply-To: References: Message-ID: > The list of interfaces for a `TypeInterfaces` is contained in a `GrowableArray` that's allocated in the type arena. 
When `hashcons()` deletes a `TypeInterfaces` object because an identical one exists, it can't reclaim memory for the object because it can only free the last thing that was allocated and that's the backing store for the `GrowableArray`, not the `TypeInterfaces` object. > > This patch changes the array of interfaces stored in `TypeInterfaces` into a pointer to a `GrowableArray`. `TypeInterfaces::make` calls `hashcons` with a temporary copy of the array of interfaces allocated in the current thread's resource area. This way if `hashcons` deletes the `TypeInterfaces`, it is the last thing allocated in the type arena and memory can be reclaimed. Memory for the `GrowableArray` is freed as well on return from `TypeInterfaces::make`. If the newly allocated `TypeInterfaces` survives `hashcons` then a permanent array of interfaces is allocated in the type arena and linked from the `TypeInterfaces` object. > > I ran into this issue while working on a fix for a separate issue that causes more Type objects to be created. With that prototype fix, TestScalarReplacementMaxLiveNodes uses 1.4 GB of memory at peak. With the patch I propose here on top, memory usage goes down to 150-200 MB which is in line with the peak memory usage for TestScalarReplacementMaxLiveNodes when run with current master. > > When I opened the PR initially, the fix I proposed was to let `GrowableArray` try to reclaim memory from an arena when destroyed. With that patch, some gtests failed when a `GrowableArray` is created by a copy constructor and the backing store for the array is shared. As a consequence, I reconsidered the fix and thought it was safer to go with a fix that only affects `TypeInterfaces`. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains nine additional commits since the last revision: - comment - Merge branch 'master' into JDK-8340824 - more - more - single memory area - Revert "type interfaces footprint" This reverts commit 43e2e91c6aaf029e62760e641e207d9d17a3a943. - type interfaces footprint - Revert "fix" This reverts commit 3598dc08625269aa0a6ecff2a6903c4217b801ee. - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21163/files - new: https://git.openjdk.org/jdk/pull/21163/files/43e2e91c..de23a5a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21163&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21163&range=01-02 Stats: 32999 lines in 625 files changed: 26119 ins; 3741 del; 3139 mod Patch: https://git.openjdk.org/jdk/pull/21163.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21163/head:pull/21163 PR: https://git.openjdk.org/jdk/pull/21163 From roland at openjdk.org Tue Oct 1 13:36:23 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Oct 2024 13:36:23 GMT Subject: RFR: 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons() [v2] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 18:51:49 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/type.cpp line 3270: >> >>> 3268: } >>> 3269: >>> 3270: const TypeInterfaces* TypeInterfaces::make(const GrowableArray* interfaces) { >> >> I think you can make `_interface` a `ciInstanceKlass**` and do this: >> >> void* ptr = Type::operator new(sizeof(TypeInterfaces) + sizeof(ciInstanceKlass*) * interfaces->length()) >> >> Then `delete ptr` should drop the whole thing. > > A `GrowableArrayFromArray` would be mostly compatible with the interface of `GrowableArray`, too. Ah! nice. I wasn't aware of `GrowableArrayFromArray`. Updated change follows your suggestion. 
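The reclamation problem described in this thread can be illustrated with a toy bump allocator. This is only a sketch in Java, not HotSpot's `Arena` code: `BumpArenaModel`, `allocate`, `freeIfLast` and the sizes 32 and 64 are hypothetical names and numbers chosen for illustration. It assumes, as the PR description states, that the arena can only give back the most recent allocation.

```java
// Toy model of an arena that can only reclaim its most recent allocation.
// All names and sizes here are illustrative assumptions, not HotSpot APIs.
final class BumpArenaModel {
    private int top = 0; // offset of the first free byte

    // Bump-allocate `size` bytes and return the start offset of the block.
    int allocate(int size) {
        int start = top;
        top += size;
        return start;
    }

    // Reclaim the block only if it is the last thing that was allocated.
    boolean freeIfLast(int start, int size) {
        if (start + size == top) {
            top = start;
            return true;
        }
        return false;
    }

    int used() {
        return top;
    }

    public static void main(String[] args) {
        // Object first, then its backing store: the object is no longer last.
        BumpArenaModel separate = new BumpArenaModel();
        int object = separate.allocate(32);   // stands in for the TypeInterfaces object
        separate.allocate(64);                // stands in for the array's backing store
        boolean reclaimedSeparate = separate.freeIfLast(object, 32);
        System.out.println("separate allocations reclaimed: " + reclaimedSeparate
                + ", bytes still used: " + separate.used());

        // Object and array carved out of one block: the whole block is last.
        BumpArenaModel combined = new BumpArenaModel();
        int block = combined.allocate(32 + 64);
        boolean reclaimedCombined = combined.freeIfLast(block, 32 + 64);
        System.out.println("single allocation reclaimed: " + reclaimedCombined
                + ", bytes still used: " + combined.used());
    }
}
```

In this model the first case leaves all 96 bytes in use, which mirrors why a duplicate `TypeInterfaces` could not be reclaimed, while the single-block layout suggested in the review can be dropped in one step.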
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21163#discussion_r1782833378 From dnsimon at openjdk.org Tue Oct 1 13:56:35 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 1 Oct 2024 13:56:35 GMT Subject: RFR: 8341333: [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 13:17:53 GMT, Yudi Zheng wrote: > This is required for adapting https://github.com/openjdk/jdk/pull/19454 in JVMCI compiler LGTM ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21287#pullrequestreview-2340426626 From yzheng at openjdk.org Tue Oct 1 14:02:46 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 1 Oct 2024 14:02:46 GMT Subject: RFR: 8341333: [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 13:17:53 GMT, Yudi Zheng wrote: > This is required for adapting https://github.com/openjdk/jdk/pull/19454 in JVMCI compiler Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21287#issuecomment-2386049645 From yzheng at openjdk.org Tue Oct 1 14:02:47 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 1 Oct 2024 14:02:47 GMT Subject: Integrated: 8341333: [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI In-Reply-To: References: Message-ID: <5TjZNvwPLhZIj9JMOSlhDJNbZ19sA4k9hsu40hw4Glk=.05bf8bd5-5b85-4d68-a65a-73a0aa8a1f42@github.com> On Tue, 1 Oct 2024 13:17:53 GMT, Yudi Zheng wrote: > This is required for adapting https://github.com/openjdk/jdk/pull/19454 in JVMCI compiler This pull request has now been integrated. 
Changeset: 2120a841 Author: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/2120a8414ef9c34d5875d33ac9a16594908fe403 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8341333: [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/21287 From mbaesken at openjdk.org Tue Oct 1 14:43:46 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 1 Oct 2024 14:43:46 GMT Subject: RFR: 8340109: Ubsan: ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask' Message-ID: When running ubsan-enabled optimized binaries on Linux x86_64, test compiler/startup/StartupOutput.java triggers this ubsan issue :

jdk/src/hotspot/share/ci/ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask'
#0 0x7fe7443fc88d in ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1660
#1 0x7fe746c22047 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) src/hotspot/share/utilities/vmError.cpp:1872
#2 0x7fe7447dd429 in report_fatal(VMErrorType, char const*, int, char const*, ...) src/hotspot/share/utilities/debug.cpp:214
#3 0x7fe7445c614d in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413
#4 0x7fe744259ceb in Runtime1::generate_blob(BufferBlob*, int, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:230
#5 0x7fe74425a273 in Runtime1::generate_blob_for(BufferBlob*, Runtime1::StubID) src/hotspot/share/c1/c1_Runtime1.cpp:259
#6 0x7fe74425a273 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:268
#7 0x7fe743fc04a1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53
#8 0x7fe743fc04a1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74
#9 0x7fe7446aaad7 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771
#10 0x7fe7446b83cf in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913
#11 0x7fe74516edca in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:758
#12 0x7fe7469d3c9a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
#13 0x7fe746048cd1 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858
#14 0x7fe74b1e66e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 052f7e2a0045f08cb7e7a291f8066a4b7be2521d)
#15 0x7fe74aaf158e in clone (/lib64/libc.so.6+0x11858e) (BuildId: cfb059a57e69ac95d5dadab831626b3bd48a4309)

------------- Commit messages: - JDK-8340109 Changes: https://git.openjdk.org/jdk/pull/21288/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21288&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340109 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21288.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21288/head:pull/21288 PR: https://git.openjdk.org/jdk/pull/21288 From mdoerr at openjdk.org Tue Oct 1 14:54:36 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 1 Oct 2024
14:54:36 GMT Subject: RFR: 8340792: -XX:+PrintInterpreter: instructions should only be printed if printing all InterpreterCodelets In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 13:37:18 GMT, Richard Reingruber wrote: > With `-XX:+PrintInterpreter` the instructions of the generated interpreter are printed at startup. > > But also after startup when printing an interpreted frame the instructions of the corresponding `InterpreterCodelet` are also printed. This can become a problem with `-Xlog:continuations=trace` because large sections of the interpreter are printed repeatedly and redundantly. > > With this change the instructions are only printed when printing all codelets, i.e. normally once at start-up. > > I've tested with the reproducer from the JBS-Issue. LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21158#pullrequestreview-2340609133 From coleenp at openjdk.org Tue Oct 1 15:01:37 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 1 Oct 2024 15:01:37 GMT Subject: RFR: 8340792: -XX:+PrintInterpreter: instructions should only be printed if printing all InterpreterCodelets In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 13:37:18 GMT, Richard Reingruber wrote: > With `-XX:+PrintInterpreter` the instructions of the generated interpreter are printed at startup. > > But also after startup when printing an interpreted frame the instructions of the corresponding `InterpreterCodelet` are also printed. This can become a problem with `-Xlog:continuations=trace` because large sections of the interpreter are printed repeatedly and redundantly. > > With this change the instructions are only printed when printing all codelets, i.e. normally once at start-up. > > I've tested with the reproducer from the JBS-Issue. Looks fine. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21158#pullrequestreview-2340628678 From kvn at openjdk.org Tue Oct 1 15:47:36 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 1 Oct 2024 15:47:36 GMT Subject: RFR: 8340109: Ubsan: ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask' In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 14:37:50 GMT, Matthias Baesken wrote:

> When running ubsan-enabled optimized binaries on Linux x86_64, test
> compiler/startup/StartupOutput.java
> triggers this ubsan issue :
>
>
> jdk/src/hotspot/share/ci/ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask'
> #0 0x7fe7443fc88d in ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1660
> #1 0x7fe746c22047 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) src/hotspot/share/utilities/vmError.cpp:1872
> #2 0x7fe7447dd429 in report_fatal(VMErrorType, char const*, int, char const*, ...) src/hotspot/share/utilities/debug.cpp:214
> #3 0x7fe7445c614d in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413
> #4 0x7fe744259ceb in Runtime1::generate_blob(BufferBlob*, int, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:230
> #5 0x7fe74425a273 in Runtime1::generate_blob_for(BufferBlob*, Runtime1::StubID) src/hotspot/share/c1/c1_Runtime1.cpp:259
> #6 0x7fe74425a273 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:268
> #7 0x7fe743fc04a1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53
> #8 0x7fe743fc04a1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74
> #9 0x7fe7446aaad7 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771
> #10 0x7fe7446b83cf in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913
> #11 0x7fe74516edca in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:758
> #12 0x7fe7469d3c9a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
> #13 0x7fe746048cd1 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858
> #14 0x7fe74b1e66e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 052f7e2a0045f08cb7e7a291f8066a4b7be2521d)
> #15 0x7fe74aaf158e in clone (/lib64/libc.so.6+0x11858e) (BuildId: cfb059a57e69ac95d5dadab831626b3bd48a4309)

Good. I would say it is trivial. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21288#pullrequestreview-2340770636 From kvn at openjdk.org Tue Oct 1 16:04:37 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 1 Oct 2024 16:04:37 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: On Tue, 1 Oct 2024 10:57:58 GMT, Tomáš Zezula wrote: > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. The `/compiler` part of the changes is fine. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21285#pullrequestreview-2340808550 From kvn at openjdk.org Tue Oct 1 16:36:35 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 1 Oct 2024 16:36:35 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 22:52:18 GMT, Dean Long wrote: >> Instead of bailout in alternative approach we can change `cha_monomorphic_target` to `nullptr` in code which is looking for it in previous lines.
`target` will be used for the call and we will lose a little performance when JVMTI is used instead of skipping compilation. Am I missing something? > > @vnkozlov , I like the alternative approach better. I went with the current approach because I was thinking it would be simpler than the bailout, but I changed my mind after writing both out. > > Yes, we can change cha_monomorphic_target to nullptr instead of bailing out. But my understanding is any use of old/redefined methods will cause the compilation to be thrown out when we try to create the nmethod, so we are avoiding wasted work by bailing out early. @dean-long, maybe I misunderstand your statement. Are you rewriting the fix or keeping the current one? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2386474610 From rehn at openjdk.org Tue Oct 1 18:00:37 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 1 Oct 2024 18:00:37 GMT Subject: RFR: 8341146: RISC-V: Unnecessary fences used for load-acquire in template interpreter In-Reply-To: <_FRFrY50v0MTOOsTdZDhqcXjPORwH-f2YrVOkjU6Y0Q=.de5f994c-febd-46bc-be07-cfd67ae7a85d@github.com> References: <_FRFrY50v0MTOOsTdZDhqcXjPORwH-f2YrVOkjU6Y0Q=.de5f994c-febd-46bc-be07-cfd67ae7a85d@github.com> Message-ID: On Sun, 29 Sep 2024 10:52:25 GMT, Feilong Jiang wrote: > Hi, please consider. > > RISC-V does not currently have plain load and store opcodes with aq or rl annotations, so load-acquire and > store-release operations are implemented using fences instead. Initially, we followed the RISC-V spec > and placed a FENCE RW,RW in front of each load-acquire operation when porting the template interpreter. > The purpose is to enforce a store-release-to-load-acquire ordering (where there must be a FENCE RW,RW > between the store-release and load-acquire). But it turns out these fences are unnecessary for our use > cases in the template interpreter.
In fact, we only need to do a single FENCE R,RW after a normal memory > load in order to implement a load-acquire operation. We should remove those unnecessary fences for both > performance reasons and for consistency with the rest of the port (i.e., C1 and C2 JIT). > > Testing: > - [x] JCstress > - [x] hs-tier1 - hs-tier4 > - [x] ~5% improvement on SPECJbb2005 score (-Xint -XX:+UseParallelGC) Thank you! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21248#pullrequestreview-2341039591 From sviswanathan at openjdk.org Tue Oct 1 18:05:44 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 1 Oct 2024 18:05:44 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 1 Oct 2024 09:53:02 GMT, Jatin Bhateja wrote: >> Thanks for the example. Yes, you have a point there. So we would need to do: >> src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2); > >> This could instead be: src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2); Or even simplified to: src1.rearrange(this.toShuffle(), src2); > > Yes, this may save additional allocation penalty of result array allocation which may slightly improve fall back performance, but logical operation cannot be directly applied over floating point vectors. so, we will need an explicit conversion to integral vector, which is why I opted for current fallback implementation which is in line with rest of the code. I see the problem with float/double vectors. Let us do the rearrange form only for Integral (byte, short, int, long) vectors then. For float/double vector we could keep the code that you have currently. 
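The fallback discussed in this exchange (mask each index lane with `2 * VLENGTH - 1`, then select across the two source vectors) can be modelled with plain arrays. This is a scalar sketch only, not the Vector API implementation: `SelectFromModel` and its `selectFrom` helper are hypothetical, `int[]` stands in for an integral vector, and the lane count is assumed to be a power of two so that the bitwise AND wraps indexes into `[0, 2*VLEN)`.

```java
import java.util.Arrays;

// Scalar model of a two-vector select with wrapped indexes.
// Names are illustrative assumptions; this does not use jdk.incubator.vector.
final class SelectFromModel {

    // For each lane, wrap the index into [0, 2*VLEN) and pick from v1 or v2.
    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = v1.length; // assumed to be a power of two, equal to v2.length
        int[] result = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            int wrapped = index[lane] & (2 * vlen - 1);
            result[lane] = wrapped < vlen ? v1[wrapped] : v2[wrapped - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] index = {0, 5, 3, 9}; // 9 is out of range and wraps to 1
        System.out.println(Arrays.toString(selectFrom(index, v1, v2)));
        // prints [10, 21, 13, 11]
    }
}
```

The masking step relies on a bitwise AND over the index lanes, which is why the thread restricts the `rearrange` form to byte, short, int and long vectors: float and double lanes would first need a conversion to an integral vector.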
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1783278063 From sviswanathan at openjdk.org Tue Oct 1 18:12:39 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 1 Oct 2024 18:12:39 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v14] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <5wF2qLX9Z_tquvURMW0HVnrmMla1awxtz6C0UYI0lh4=.94df7340-1a27-4d93-80fa-d4c561641a97@github.com> On Tue, 1 Oct 2024 09:51:27 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new two-vector permutation API. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of the "this" vector, assemble the values stored in the first (v1) and second (v2) vector arguments. Thus, the first and second vectors serve as a table whose elements are selected based on the index vector. The API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within the valid two-vector index range [0, 2*VLEN), else an IndexOutOfBoundsException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In the absence of a direct two-vector permutation instruction in the target ISA, a lowering transformation dismantles the new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy targets. >> - Functional tests covering the new API.
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions. src/hotspot/share/opto/vectorIntrinsics.cpp line 2797: > 2795: > 2796: Node* operation = lowerSelectFromOp ? > 2797: LowerSelectFromTwoVectorOperation(gvn(), opd1, opd2, opd3, vt) : Thanks for bringing the lowering right here. It opens up an optimization opportunity: currently for float/double we have two casts for index (e.g. from float -> int at line 2786 and from int -> byte at line 2661 as part of LowerSelectFromTwoVectorOperation. Could this be done by one cast? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1783296741 From vlivanov at openjdk.org Tue Oct 1 21:25:35 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Oct 2024 21:25:35 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 00:35:47 GMT, Dean Long wrote: > JVMTI can add and delete methods Can you elaborate on that point, please? JVMTI spec states that redefinition/retransformation "must not add, remove or rename fields or methods" [1] [2]. [1] https://docs.oracle.com/en/java/javase/23/docs/specs/jvmti.html#RedefineClasses [2] https://docs.oracle.com/en/java/javase/23/docs/specs/jvmti.html#RetransformClasses ------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2387101310 From vlivanov at openjdk.org Tue Oct 1 21:29:34 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Oct 2024 21:29:34 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 16:33:45 GMT, Vladimir Kozlov wrote: >> @vnkozlov , I like the alternative approach better. I went with the current approach because I was thinking it would be simpler than the bailout, but I changed my mind after writing both out. >> >> Yes, we can change cha_monomorphic_target to nullptr instead of bailing out. But my understanding is any use of old/redefined methods will cause the compilation to be thrown out when we try to create the nmethod, so we are avoiding wasted work by bailing out early. > > @dean-long, maybe I misunderstand your statement. Are you rewriting the fix or keeping the current one? I like @vnkozlov's suggestion to null out `cha_monomorphic_target`. Moreover, the validation can be performed inside `ciMethod::find_monomorphic_target()` which is used to compute `cha_monomorphic_target`.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2387105860 From kxu at openjdk.org Tue Oct 1 21:31:12 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 1 Oct 2024 21:31:12 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v8] In-Reply-To: References: Message-ID: > This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. > > As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). > > Some notable examples of this transformation include: > - `a + a + a` => `a*3` => `(a<<1) + a` > - `a + a + a + a` => `a*4` => `a<<2` > - `a*3 + a` => `a*4` => `a<<2` > - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` > > See included IR unit tests for more. 
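The rewrites listed in this description rest on identities that hold for Java's wrapping `int` arithmetic, which the following sketch checks at the source level. It does not show the C2 IGVN code; `AddSeriesDemo` and its helpers are hypothetical names. Note also that in Java source `a << C + a` parses as `a << (C + a)`, so the intended form `(a << C) + a` is written with explicit parentheses here.

```java
// Source-level check of the arithmetic identities behind the transformation.
// Names are illustrative assumptions; this is not the compiler implementation.
final class AddSeriesDemo {
    static int addThree(int a)      { return a + a + a; }
    static int mulThree(int a)      { return a * 3; }
    static int shiftAddThree(int a) { return (a << 1) + a; }

    static int addFour(int a)   { return a + a + a + a; }
    static int shiftFour(int a) { return a << 2; }

    // C * a + a  ==  (C + 1) * a
    static int mulThenAdd(int a, int c) { return c * a + a; }
    static int foldedMul(int a, int c)  { return (c + 1) * a; }

    // (a << C) + a  ==  (2^C + 1) * a
    static int shiftThenAdd(int a, int c) { return (a << c) + a; }
    static int foldedShift(int a, int c)  { return ((1 << c) + 1) * a; }

    // True if every identity above holds for this value of a.
    static boolean identitiesHold(int a) {
        boolean ok = addThree(a) == mulThree(a) && mulThree(a) == shiftAddThree(a);
        ok &= addFour(a) == shiftFour(a);
        for (int c = 0; c < 31; c++) {
            ok &= mulThenAdd(a, c) == foldedMul(a, c);
            ok &= shiftThenAdd(a, c) == foldedShift(a, c);
        }
        return ok;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, 123456789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            System.out.println(a + " -> " + identitiesHold(a));
        }
    }
}
```

Because `int` addition, multiplication and left shift all wrap modulo 2^32, the identities also hold for values that overflow, such as `Integer.MAX_VALUE`.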
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: update comments, use explicit opcode comparisons for LShift nodes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20754/files - new: https://git.openjdk.org/jdk/pull/20754/files/6e65e13f..af6f8084 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=06-07 Stats: 8 lines in 1 file changed: 3 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20754/head:pull/20754 PR: https://git.openjdk.org/jdk/pull/20754 From vlivanov at openjdk.org Tue Oct 1 21:39:36 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Oct 2024 21:39:36 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 00:35:47 GMT, Dean Long wrote: > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). > > An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. src/hotspot/share/ci/ciMethod.cpp line 800: > 798: Method* m1 = this->get_Method(); > 799: Method* m2 = m->get_Method(); > 800: guarantee(!m1->is_private() && !m1->is_deleted(), "see usage note"); Some changes inside `ciMethod::equals` look irrelevant to checking method equality (e.g., asserting that a method is not private). Alternatively, if you decide to keep the current shape of the fix, the code can be moved closer to the use site as a helper function. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1783559452 From sviswanathan at openjdk.org Tue Oct 1 22:51:43 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 1 Oct 2024 22:51:43 GMT Subject: Integrated: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan wrote: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya This pull request has now been integrated. 
Changeset: 83dcb02d Author: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/83dcb02d776448aa04f3f41df489bd4355443a4d Stats: 697 lines in 47 files changed: 549 ins; 34 del; 114 mod 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes Reviewed-by: jbhateja, psandoz ------------- PR: https://git.openjdk.org/jdk/pull/20634 From vlivanov at openjdk.org Tue Oct 1 23:38:41 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Oct 2024 23:38:41 GMT Subject: RFR: 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons() [v3] In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 13:36:22 GMT, Roland Westrelin wrote: >> The list of interfaces for a `TypeInterfaces` is contained in a `GrowableArray` that's allocated in the type arena. When `hashcons()` deletes a `TypeInterfaces` object because an identical one exists, it can't reclaim memory for the object because it can only free the last thing that was allocated and that's the backing store for the `GrowableArray`, not the `TypeInterfaces` object. >> >> This patch changes the array of interfaces stored in `TypeInterfaces` into a pointer to a `GrowableArray`. `TypeInterfaces::make` calls `hashcons` with a temporary copy of the array of interfaces allocated in the current thread's resource area. This way if `hashcons` deletes the `TypeInterfaces`, it is the last thing allocated in the type arena and memory can be reclaimed. Memory for the `GrowableArray` is freed as well on return from `TypeInterfaces::make`. If the newly allocated `TypeInterfaces` survives `hashcons` then a permanent array of interfaces is allocated in the type arena and linked from the `TypeInterfaces` object. >> >> I ran into this issue while working on a fix for a separate issue that causes more Type objects to be created. With that prototype fix, TestScalarReplacementMaxLiveNodes uses 1.4 GB of memory at peak.
With the patch I propose here on top, memory usage goes down to 150-200 MB which is in line with the peak memory usage for TestScalarReplacementMaxLiveNodes when run with current master. >> >> When I opened the PR initially, the fix I proposed was to let `GrowableArray` try to reclaim memory from an arena when destroyed. With that patch, some gtests failed when a `GrowableArray` is created by a copy constructor and the backing store for the array is shared. As a consequence, I reconsidered the fix and thought it was safer to go with a fix that only affects `TypeInterfaces`. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - comment > - Merge branch 'master' into JDK-8340824 > - more > - more > - single memory area > - Revert "type interfaces footprint" > > This reverts commit 43e2e91c6aaf029e62760e641e207d9d17a3a943. > - type interfaces footprint > - Revert "fix" > > This reverts commit 3598dc08625269aa0a6ecff2a6903c4217b801ee. > - fix Looks good. It feels a bit weird to see `GrowableArray` used to represent a read-only data structure, but I understand that you still benefit from some helper methods it provides. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21163#pullrequestreview-2341626070 From vlivanov at openjdk.org Tue Oct 1 23:55:49 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Oct 2024 23:55:49 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v8] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 14:45:01 GMT, Tobias Holenstein wrote: >> We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. 
>> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86
>> The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]`
>>
>> We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);`
>> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361
>> - with **base** = `147 CheckCastPP`
>> - `118 ConP === 0 [[[ 106 101 71 ] #null`
>> type
>>
>> Depending on the **offset** we take two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end.
>> 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert:
>> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111
>>
>> 2. Whereas for `UNSAFE.getShortUnaligned(array, 1)` we get kind = `Type::OopPtr`
>> https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078
>> and insert a null check
>> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090
>> In both cases we call `basic_plus_adr(..)` on a base being `top()` which returns **adr** = `1 Con === 0 [[ ]] #top`
>>
>> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_
>>
>> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_
>>
>> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_
>>
>> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_
>>
>> ### Fix (updated on 18th Sep 2024)
>> T...
>
> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision:
>
>   add second uncast (Vladimirs suggestion)

Looks good.

-------------

Marked as reviewed by vlivanov (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20033#pullrequestreview-2341636531

From tholenstein at openjdk.org Tue Oct 1 23:55:50 2024
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Tue, 1 Oct 2024 23:55:50 GMT
Subject: Integrated: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access
In-Reply-To:
References:
Message-ID:

On Thu, 4 Jul 2024 12:17:51 GMT, Tobias Holenstein wrote:

> We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`.
> [...]
>
> ### Fix (updated on 18th Sep 2024)
> The fix modifies the `LibraryCallKit::classify_unsafe_addr()`...

This pull request has now been integrated.
Changeset: 8d6d37fe
Author: Tobias Holenstein
URL: https://git.openjdk.org/jdk/commit/8d6d37fea133380d4143f5db38ad3790efa84f68
Stats: 117 lines in 3 files changed: 114 ins; 1 del; 2 mod

8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access

Reviewed-by: thartmann, kvn, vlivanov, epeter, roland

-------------

PR: https://git.openjdk.org/jdk/pull/20033

From qamai at openjdk.org Wed Oct 2 01:36:38 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Wed, 2 Oct 2024 01:36:38 GMT
Subject: RFR: 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons() [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 1 Oct 2024 13:36:22 GMT, Roland Westrelin wrote:

>> The list of interfaces for a `TypeInterfaces` is contained in a `GrowableArray` that's allocated in the type arena.
>> [...]
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase.
> [...]

Marked as reviewed by qamai (Committer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/21163#pullrequestreview-2341768112

From roland at openjdk.org Wed Oct 2 07:13:51 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Wed, 2 Oct 2024 07:13:51 GMT
Subject: RFR: 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons() [v3]
In-Reply-To:
References:
Message-ID: <9vr6Uk48dB75INt4SYSyQ-qoLfkEg4--WyWjHtI4nWc=.ff42aac4-12d0-49ed-8921-d0b34896ca6c@github.com>

On Tue, 1 Oct 2024 23:35:42 GMT, Vladimir Ivanov wrote:

>> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase.
>> [...]
>
> Looks good.
>
> It feels a bit weird to see `GrowableArray` used to represent a read-only data structure, but I understand that you still benefit from some helper methods it provides.

@iwanowww @merykitty thanks for the reviews

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21163#issuecomment-2387773938

From roland at openjdk.org Wed Oct 2 07:13:51 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Wed, 2 Oct 2024 07:13:51 GMT
Subject: Integrated: 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons()
In-Reply-To:
References:
Message-ID:

On Tue, 24 Sep 2024 15:53:06 GMT, Roland Westrelin wrote:

> The list of interfaces for a `TypeInterfaces` is contained in a `GrowableArray` that's allocated in the type arena.
> [...]

This pull request has now been integrated.

Changeset: 90c944fe
Author: Roland Westrelin
URL: https://git.openjdk.org/jdk/commit/90c944fefe4a7827c08a8e6a81f137c3157a749b
Stats: 89 lines in 2 files changed: 14 ins; 11 del; 64 mod

8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons()

Reviewed-by: vlivanov, qamai

-------------

PR: https://git.openjdk.org/jdk/pull/21163

From lucy at openjdk.org Wed Oct 2 07:54:39 2024
From: lucy at openjdk.org (Lutz Schmidt)
Date: Wed, 2 Oct 2024 07:54:39 GMT
Subject: RFR: 8340109: Ubsan: ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask'
In-Reply-To:
References:
Message-ID:

On Tue, 1 Oct 2024 14:37:50 GMT, Matthias Baesken wrote:

> When running ubsan-enabled optimized binaries on Linux x86_64, test
> compiler/startup/StartupOutput.java
> triggers this ubsan issue:
>
>
> jdk/src/hotspot/share/ci/ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask'
> #0 0x7fe7443fc88d in
ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1660 > #1 0x7fe746c22047 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) src/hotspot/share/utilities/vmError.cpp:1872 > #2 0x7fe7447dd429 in report_fatal(VMErrorType, char const*, int, char const*, ...) src/hotspot/share/utilities/debug.cpp:214 > #3 0x7fe7445c614d in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413 > #4 0x7fe744259ceb in Runtime1::generate_blob(BufferBlob*, int, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:230 > #5 0x7fe74425a273 in Runtime1::generate_blob_for(BufferBlob*, Runtime1::StubID) src/hotspot/share/c1/c1_Runtime1.cpp:259 > #6 0x7fe74425a273 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:268 > #7 0x7fe743fc04a1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53 > #8 0x7fe743fc04a1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74 > #9 0x7fe7446aaad7 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771 > #10 0x7fe7446b83cf in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913 > #11 0x7fe74516edca in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:758 > #12 0x7fe7469d3c9a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > #13 0x7fe746048cd1 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > #14 0x7fe74b1e66e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 052f7e2a0045f08cb7e7a291f8066a4b7be2521d) > #15 0x7fe74aaf158e in clone (/lib64/libc.so.6+0x11858e) (BuildId: cfb059a57e69ac95d5dadab831626b3bd48a4309) Looks good. ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21288#pullrequestreview-2342102150

From mbaesken at openjdk.org Wed Oct 2 08:00:44 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Wed, 2 Oct 2024 08:00:44 GMT
Subject: RFR: 8340109: Ubsan: ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask'
In-Reply-To:
References:
Message-ID:

On Tue, 1 Oct 2024 14:37:50 GMT, Matthias Baesken wrote:

> When running ubsan-enabled optimized binaries on Linux x86_64, test
> compiler/startup/StartupOutput.java
> triggers this ubsan issue:
>
> jdk/src/hotspot/share/ci/ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask'
> [...]

Thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21288#issuecomment-2387851841

From mbaesken at openjdk.org Wed Oct 2 08:00:44 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Wed, 2 Oct 2024 08:00:44 GMT
Subject: Integrated: 8340109: Ubsan: ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask'
In-Reply-To:
References:
Message-ID: <_qqrQZuWDvRfqPZR7hoclhiQ6HJIw4sgRxewbxefosY=.b544c413-4a4f-4d1d-a923-f9c88ce0e7a9@github.com>

On Tue, 1 Oct 2024 14:37:50 GMT, Matthias Baesken wrote:

> When running ubsan-enabled optimized binaries on Linux x86_64, test
> compiler/startup/StartupOutput.java
> triggers this ubsan issue:
>
> jdk/src/hotspot/share/ci/ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask'
> [...]

This pull request has now been integrated.
Changeset: efe3573b Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/efe3573b9b4ecec0630fdc1c61c765713a5b68e6 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod 8340109: Ubsan: ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask' Reviewed-by: kvn, lucy ------------- PR: https://git.openjdk.org/jdk/pull/21288 From roland at openjdk.org Wed Oct 2 08:04:37 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 2 Oct 2024 08:04:37 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v2] In-Reply-To: References: <-TWzHU1sjg49BOttKN_muQ6ICHAMbAnBoNUurRbqHDg=.a27bbf72-a451-4257-914f-d37e181ce257@github.com> Message-ID: On Tue, 17 Sep 2024 09:33:38 GMT, Christian Hagedorn wrote: >>> If this is your intention, then please ignore this message. >> >> Yes, this is my intention. >> >> --- >> >> My previous approach of identifying optimized `Mul->shift + add/sub` (e.g., `a*6` becomes `(a<<1) + (a<<2)` by `MulNode::Ideal()`) was inherently flawed. I was solely determining this with the number of terms. It is not reliable. In the `TestLargeTreeOfSubNodes` example, it replaces already optimized Mul nodes and a new Mul node and repeats the process, causing performance regression (and timeouts). >> >> The new approach matches the exact patterns of optimized `MulNode`s. Additionally, a recursion depth limit of 5 (a rather arbitrary number) is in effect during *iterative* GVN to mitigate the risk of exhausting resources. Untransformed nodes are added to the worklist and will be eventually transformed. >> >> Please note, in the case of `TestLargeTreeOfSubNodes` with flags mentioned above, the compilation is skipped without a large enough `-XX:MaxLabelRootDepth`. This is the same behaviour as the current master. >> >> Please re-review once GHA is confirmed passing. Thanks! 
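
To make the rewrite discussed in the quoted message concrete, here is a minimal Java sketch of the arithmetic identities it relies on: a series of additions of the same value equals a multiplication, `a*4` equals `a<<2`, and `a*6` equals `(a<<1) + (a<<2)`. This is plain Java, not C2 code; the class and method names are made up for illustration, and the sample inputs are arbitrary. Because Java `int` arithmetic wraps on overflow, the identities should hold for every input, including the extreme values.

```java
// Illustrative only: this is NOT how C2 implements the transformation.
// SerialAddSketch, serialAdd and mulBySixShifted are hypothetical names.
class SerialAddSketch {

    // Computes a + a + ... + a with n terms, the shape the optimization matches.
    static int serialAdd(int a, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += a;
        }
        return sum;
    }

    // a * 6 written in the shift-and-add form mentioned in the thread.
    static int mulBySixShifted(int a) {
        return (a << 1) + (a << 2);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            // a + a + a + a  ==  a * 4  ==  a << 2
            if (serialAdd(a, 4) != a * 4 || a * 4 != (a << 2)) {
                throw new AssertionError("4-term series mismatch for " + a);
            }
            // a * 3 + a  ==  a * 4, i.e. C * a + a == (C + 1) * a with C = 3
            if (a * 3 + a != a * 4) {
                throw new AssertionError("C*a + a mismatch for " + a);
            }
            // (a << 1) + (a << 2)  ==  a * 6
            if (mulBySixShifted(a) != a * 6) {
                throw new AssertionError("a*6 mismatch for " + a);
            }
        }
        System.out.println("all identities hold");
    }
}
```

The check only covers the value-level equivalence; it says nothing about the pattern matching, recursion depth limit, or worklist handling debated in the thread.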
> >> Please note, in the case of TestLargeTreeOfSubNodes with flags mentioned above, the compilation is skipped without a large enough -XX:MaxLabelRootDepth. This is the same behaviour as the current master. > > Have you found out why this is the case? I thought that the original fix which added `TestLargeTreeOfSubNodes` wanted to fix the problem of running out of nodes. > > I gave your patch another spin. We still see various failures and timeouts. For example: > > `compiler/intrinsics/sha/TestDigest.java` times out with various flag combinations (for example `-server -Xmixed`). Here is the stack at the timeout: > > > Thread 7 (Thread 0x7fc808490700 (LWP 22433)): > #0 0x00007fc80d648051 in Node::find_integer_type(BasicType) const () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so > #1 0x00007fc80c793214 in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so > #2 0x00007fc80c79306c in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so > ... 
> #90 0x00007fc80c79306c in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so > #91 0x00007fc80c793082 in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so > #92 0x00007fc80c79306c in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so > #93 0x00007fc80c793351 in AddNode::convert_serial_additions(PhaseGVN*, bool, BasicType) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so > #94 0x00007fc80c7937c5 in AddNode... @chhagedorn would you mind running the latest version patch through testing? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20754#issuecomment-2387860251 From fjiang at openjdk.org Wed Oct 2 09:17:39 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 2 Oct 2024 09:17:39 GMT Subject: RFR: 8341146: RISC-V: Unnecessary fences used for load-acquire in template interpreter In-Reply-To: References: <_FRFrY50v0MTOOsTdZDhqcXjPORwH-f2YrVOkjU6Y0Q=.de5f994c-febd-46bc-be07-cfd67ae7a85d@github.com> Message-ID: On Tue, 1 Oct 2024 17:57:53 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> RISC-V does not currently have plain load and store opcodes with aq or rl annotations, load-acquire and >> store-release operations are implemented using fences instead. 
Initially, we followed the RISC-V spec >> and placed FENCE RW,RW fence in front of load-acquire operation when porting the template interpreter. >> The purpose is to enforce a store-release-to-load-acquire ordering (where there must be a FENCE RW,RW >> between the store-release and load-acquire). But it turns out these fences are unnecessary for our use >> cases in the template interpreter. In fact, we only need to do a single FENCE R,RW after a normal memory >> load in order to implement a load-acquire operation. We should remove those unnecessary fences for both >> performance reasons and for consistency with the rest of the port (i.e., C1 and C2 JIT). >> >> Testing: >> - [x] JCstress >> - [x] hs-tier1 - hs-tier4 >> - [x] ~5% improvement on SPECJbb2005 score (-Xint -XX:+UseParallelGC) > > Thank you! Thanks! @robehn @RealFYang ------------- PR Comment: https://git.openjdk.org/jdk/pull/21248#issuecomment-2387996811 From fjiang at openjdk.org Wed Oct 2 09:17:40 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 2 Oct 2024 09:17:40 GMT Subject: Integrated: 8341146: RISC-V: Unnecessary fences used for load-acquire in template interpreter In-Reply-To: <_FRFrY50v0MTOOsTdZDhqcXjPORwH-f2YrVOkjU6Y0Q=.de5f994c-febd-46bc-be07-cfd67ae7a85d@github.com> References: <_FRFrY50v0MTOOsTdZDhqcXjPORwH-f2YrVOkjU6Y0Q=.de5f994c-febd-46bc-be07-cfd67ae7a85d@github.com> Message-ID: On Sun, 29 Sep 2024 10:52:25 GMT, Feilong Jiang wrote: > Hi, please consider. > > RISC-V does not currently have plain load and store opcodes with aq or rl annotations, load-acquire and > store-release operations are implemented using fences instead. Initially, we followed the RISC-V spec > and placed FENCE RW,RW fence in front of load-acquire operation when porting the template interpreter. > The purpose is to enforce a store-release-to-load-acquire ordering (where there must be a FENCE RW,RW > between the store-release and load-acquire). 
But it turns out these fences are unnecessary for our use
> cases in the template interpreter. In fact, we only need to do a single FENCE R,RW after a normal memory
> load in order to implement a load-acquire operation. We should remove those unnecessary fences for both
> performance reasons and for consistency with the rest of the port (i.e., C1 and C2 JIT).
>
> Testing:
> - [x] JCstress
> - [x] hs-tier1 - hs-tier4
> - [x] ~5% improvement on SPECJbb2005 score (-Xint -XX:+UseParallelGC)

This pull request has now been integrated.

Changeset: a4ca6267
Author: Feilong Jiang
URL: https://git.openjdk.org/jdk/commit/a4ca6267e17815153f8fa119db19b97b1da2bd84
Stats: 9 lines in 1 file changed: 0 ins; 9 del; 0 mod

8341146: RISC-V: Unnecessary fences used for load-acquire in template interpreter

Reviewed-by: fyang, rehn

-------------

PR: https://git.openjdk.org/jdk/pull/21248

From mli at openjdk.org Wed Oct 2 10:15:54 2024
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 2 Oct 2024 10:15:54 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27]
In-Reply-To:
References:
Message-ID:

On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>>
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>>
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... 
>
> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision:
>
>  - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion
>  - riscv port refactor
>  - Remove temporary support code
>  - Merge jdk-24+17
>  - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization
>  - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions
>  - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes
>  - Merge jdk-24+16
>  - Ensure that detected encode-and-store patterns are matched
>  - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
>  - ... and 43 more: https://git.openjdk.org/jdk/compare/9566d51f...14483b83

Hi, I have some comments on the riscv part of the code. I'm not sure whether the same comments also apply to the other code; please have a look if necessary.

src/hotspot/cpu/riscv/gc/g1/g1_riscv.ad line 55:

> 53:   }
> 54:   for (RegSetIterator reg = no_preserve.begin(); *reg != noreg; ++reg) {
> 55:     stub->dont_preserve(*reg);

Could `no_preserve` and `preserve` overlap?
If not, then it seems unnecessary to call `dont_preserve` for `no_preserve`.
If so, it seems unsafe to `dont_preserve` these regs? I'm not sure.

src/hotspot/cpu/riscv/gc/g1/g1_riscv.ad line 169:

> 167:   predicate(UseG1GC && n->as_LoadStore()->barrier_data() != 0);
> 168:   match(Set res (CompareAndExchangeP mem (Binary oldval newval)));
> 169:   effect(TEMP res, TEMP tmp1, TEMP tmp2);

should `res` be `TEMP_DEF`?
src/hotspot/cpu/riscv/gc/g1/g1_riscv.ad line 201: > 199: predicate(UseG1GC && needs_acquiring_load_reserved(n) && n->as_LoadStore()->barrier_data() != 0); > 200: match(Set res (CompareAndExchangeP mem (Binary oldval newval))); > 201: effect(TEMP res, TEMP tmp1, TEMP tmp2); should `res` be `TEMP_DEF`? src/hotspot/cpu/riscv/gc/g1/g1_riscv.ad line 233: > 231: predicate(UseG1GC && n->as_LoadStore()->barrier_data() != 0); > 232: match(Set res (CompareAndExchangeN mem (Binary oldval newval))); > 233: effect(TEMP res, TEMP tmp1, TEMP tmp2, TEMP tmp3); should `res` be `TEMP_DEF`? src/hotspot/cpu/riscv/gc/g1/g1_riscv.ad line 263: > 261: predicate(UseG1GC && needs_acquiring_load_reserved(n) && n->as_LoadStore()->barrier_data() != 0); > 262: match(Set res (CompareAndExchangeN mem (Binary oldval newval))); > 263: effect(TEMP res, TEMP tmp1, TEMP tmp2, TEMP tmp3); should `res` be `TEMP_DEF`? And same comment for following instructs? ------------- PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2342455263 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784240549 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784209154 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784210589 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784211728 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784212185 From thartmann at openjdk.org Wed Oct 2 10:44:38 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Oct 2024 10:44:38 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert [v2] In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 13:22:23 GMT, Roland Westrelin wrote: >> When converting a `LongCountedLoop` into a loop nest, c2 needs jvm >> state to add predicates to the inner loop. For that, it peels an >> iteration of the loop and uses the state of the safepoint at the end >> of the loop. 
That's only legal if there's no side effect between the >> safepoint and the backedge that goes back into the loop. The assert >> failure here happens in code that checks that. >> >> That code compares the memory states at the safepoint and at the >> backedge. If they are the same then there's no side effect. To check >> consistency, the `MergeMem` at the safepoint is cloned. As the logic >> iterates over the backedge state, it clears every component of the >> state it encounters from the `MergeMem`. Once done, the cloned >> `MergeMem` should be "empty". In the case of this failure, no side >> effect is found but the cloned `MergeMem` is not empty. That happens >> because of EA: it adds edges to the `MergeMem` at the safepoint that >> it doesn't add to the backedge `Phis`. >> >> So it's the verification code that fails and I propose dealing with >> this by ignoring memory state added by EA in the verification code. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8336702 > - test indentation > - fix & test Looks good to me. Testing passed. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21009#pullrequestreview-2342590569 From thartmann at openjdk.org Wed Oct 2 10:44:39 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Oct 2024 10:44:39 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert [v2] In-Reply-To: <4cGhoDWjhLY3K9PY5CydEPh0mdwDn6EPVMAWQWU4U3M=.60c7cf50-32ca-4bf4-8d53-13c1ae5dabac@github.com> References: <85C55cSsG9xUcs6GV3nCiq-idMJhEOywFN3atCFOO78=.53e95ff8-e9f4-4c86-a97d-5bff2cb009d3@github.com> <4cGhoDWjhLY3K9PY5CydEPh0mdwDn6EPVMAWQWU4U3M=.60c7cf50-32ca-4bf4-8d53-13c1ae5dabac@github.com> Message-ID: On Tue, 1 Oct 2024 09:40:14 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopnode.cpp line 708: >> >>> 706: for (MergeMemStream mms(mem->as_MergeMem()); mms.next_non_empty(); ) { >>> 707: // Loop invariant memory state won't be reset by no_side_effect_since_safepoint(). Do it here. >>> 708: // Escape Analysis can add state to mm that it doesn't add to the backedge memory Phis, breaking verification >> >> Where exactly does that happen in EA? > > When an allocation is non escaping and made scalar replaceable, new slices are allocated for the fields of the allocation and the memory graph is updated so allocation/stores/loads to the new slices are connected together. In the process, `MergeMem` nodes need to be updated as well. In this case, I'm not sure this particular `MergeMem` node needs to be updated by EA but it's harmless in any case. The verification code doesn't expect "more" state to be recorded at the safepoint because of the `MergeMem` than at the backedge. Okay, thanks for the details! 
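For readers following the 8336702 discussion, the verification scheme described in this thread (clone the safepoint memory state, clear each component seen on the backedge, expect the clone to end up empty) can be sketched as a toy model in standalone C++. This is only an illustration under stated assumptions: `MemState`, `CheckResult`, `check_safepoint_vs_backedge` and the `ea_slices` parameter are hypothetical names, not HotSpot code, and how the actual fix recognizes state added by Escape Analysis is not shown in these mails.

```cpp
#include <cassert>
#include <map>
#include <set>

// Toy model: a memory state maps an alias slice index to the id of the node
// that last defined memory on that slice. This is NOT HotSpot's MergeMemNode.
using MemState = std::map<int, int>;

struct CheckResult {
  bool no_side_effect;   // no slice changed between safepoint and backedge
  bool fully_processed;  // the cloned safepoint state ended up empty
};

// 'ea_slices' stands in for "slices that Escape Analysis added at the
// safepoint only". It is an assumption made for this sketch.
CheckResult check_safepoint_vs_backedge(const MemState& safepoint,
                                        const MemState& backedge,
                                        const std::set<int>& ea_slices) {
  MemState remaining = safepoint;  // plays the role of the cloned MergeMem
  bool no_side_effect = true;
  for (const auto& entry : backedge) {
    auto it = remaining.find(entry.first);
    if (it == remaining.end()) {
      continue;  // slice not tracked at the safepoint in this toy model
    }
    if (it->second != entry.second) {
      no_side_effect = false;  // memory was modified after the safepoint
    }
    remaining.erase(it);  // clear the component once it has been visited
  }
  // Ignore state that exists at the safepoint only because of EA.
  for (int slice : ea_slices) {
    remaining.erase(slice);
  }
  CheckResult result = {no_side_effect, remaining.empty()};
  return result;
}
```

With an extra slice present only at the safepoint, the toy check reports "no side effect" but "not fully processed" unless that slice is passed in `ea_slices`, which mirrors the failure mode described above.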
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21009#discussion_r1784285660 From thartmann at openjdk.org Wed Oct 2 11:02:36 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Oct 2024 11:02:36 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v12] In-Reply-To: <0Ce3kXIzVDXuzpKGBzekEEOuJftpvkCvLwDkVOtpPR0=.ec2c1c1b-2b78-497d-97d3-a613ef736d2f@github.com> References: <0Ce3kXIzVDXuzpKGBzekEEOuJftpvkCvLwDkVOtpPR0=.ec2c1c1b-2b78-497d-97d3-a613ef736d2f@github.com> Message-ID: <_6h1VZCWQ25jOovnzdnQkR1OljZGcmx7SEY7ezhGE-g=.8805d48d-01f1-4f89-b396-4f7660919d6a@github.com> On Tue, 24 Sep 2024 16:53:51 GMT, Daniel Skantz wrote: >> This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). >> >> We check two invariants. >> a) Bailouts should be successful starting from any given `failing()` check. >> b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). >> >> a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. >> >> The added flag should not have any effect in debug mode. >> >> Testing: >> >> T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > left over Great work Daniel! The changes look good to me. 
------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19646#pullrequestreview-2342618110 From roland at openjdk.org Wed Oct 2 11:26:47 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 2 Oct 2024 11:26:47 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop Message-ID: The patch includes 2 test cases for this: test1() causes the assert failure in the bug description, test2() causes an incorrect execution where a load floats above a store that it should be dependent on. In the test cases, `field` is accessed on object `a` of type `A`. When the field is accessed, the type that c2 has for `a` is `A` with interface `I`. The holder of the field is class `A` which implements no interface. The reason the type of `a` and the type of the holder are slightly different is because `a` is the result of a merge of objects of subclasses `B` and `C` which implements `I`. The root cause of the bug is that `Compile::flatten_alias_type()` doesn't change `A` + interface `I` into `A`, the actual holder of the field. So `field` in `A` + interface `I` and `field` in `A` get different slices which is wrong. At parse time, the logic that creates the `Store` node uses: C->alias_type(field)->adr_type() to compute the slice which is the slice for `field` in `A`. So the slice used at parse time is the right one but during igvn, when the slice is computed from the input address, a different slice (the one for `A` + interface `I`) is used. That causes load/store nodes when they are processed by igvn to use the wrong memory state. 
In `Compile::flatten_alias_type()`: if (!ik->equals(canonical_holder) || tj->offset() != offset) { if( is_known_inst ) { tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); } else { tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); } } only flattens the type if it's not the canonical holder but it should test that the type doesn't implement interfaces that the canonical holder doesn't. To keep the logic simple, the fix I propose creates a new type whenever there's a chance that a type implements extra interfaces (the type is not exact). I also added asserts in `GraphKit::make_load()` and `GraphKit::store_to_memory()` to make sure the slice that is passed and the address type agree. Those asserts fire with the new test cases. When running testing, I found that they also catch a few cases in `library_call.cpp` where an incorrect slice is passed. As further clean up, maybe we want to drop the slice argument to `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to their callers) given it's redundant with the address type and error prone. 
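The root cause described above can be illustrated with a small standalone C++ toy model. It is a sketch under assumptions, not the HotSpot implementation: `ToyInstType`, `flatten_buggy` and `flatten_fixed` are hypothetical names, a "slice" is simply identified with the flattened type, and the fixed variant always canonicalizes, whereas the proposed fix only creates a new type when the type is not exact.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for an instance pointer type: class, extra interfaces, offset.
struct ToyInstType {
  std::string klass;
  std::set<std::string> interfaces;
  int offset;

  bool operator==(const ToyInstType& other) const {
    return klass == other.klass && interfaces == other.interfaces &&
           offset == other.offset;
  }
};

// Mirrors the problematic condition: the type is only rebuilt when the class
// differs from the canonical holder, so extra interfaces can survive.
ToyInstType flatten_buggy(const ToyInstType& t,
                          const std::string& canonical_holder) {
  if (t.klass != canonical_holder) {
    return ToyInstType{canonical_holder, {}, t.offset};
  }
  return t;
}

// Simplified fix: always rebuild the type from the canonical holder, which
// drops interfaces that the holder itself does not implement.
ToyInstType flatten_fixed(const ToyInstType& t,
                          const std::string& canonical_holder) {
  return ToyInstType{canonical_holder, {}, t.offset};
}
```

In this model, `A` and `A` + interface `I` at the same offset flatten to two different slices with `flatten_buggy` and to a single slice with `flatten_fixed`.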
------------- Commit messages: - test cleanup - fix & test Changes: https://git.openjdk.org/jdk/pull/21303/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21303&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340214 Stats: 121 lines in 4 files changed: 109 ins; 1 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/21303.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21303/head:pull/21303 PR: https://git.openjdk.org/jdk/pull/21303 From thartmann at openjdk.org Wed Oct 2 11:29:40 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Oct 2024 11:29:40 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v19] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Mon, 30 Sep 2024 13:36:19 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > update comments in TestParallelIvInIntCountedLoop.java Changes requested by thartmann (Reviewer). test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 315: > 313: } > 314: > 315: return a; Shouldn't there also be tests for the `int a` `long i` variant? test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 319: > 317: > 318: private static void testCorrectness() { > 319: Random rng = new Random(); You should use `Utils.getRandomInstance()` instead which logs the seed for better reproducibility. 
Also add `@key randomness` to the test header.

test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 321:

> 319:         Random rng = new Random();
> 320:
> 321:         // Since we can't easily determined expected values if loop varibles overflow, we make sure i is less than (MAX_VALUE - stride).

Suggestion:

        // Since we can't easily determine expected values if loop variables overflow, we make sure i is less than (MAX_VALUE - stride).

test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 325:

> 323:
> 324:         for (int i : iterations) {
> 325:             Asserts.assertEQ(i, testIntCountedLoopWithIntIV(i));

Code in this loop is not even guaranteed to be C2 compiled, because IR verification will be executed in a separate VM. IR framework tests that also want to verify the output should be written like this:
https://github.com/openjdk/jdk/blob/9bd478593cc92a716151d1373f3426f1d92143bb/test/hotspot/jtreg/testlibrary_tests/ir_framework/examples/CustomRunTestExample.java#L84-L97

-------------

PR Review: https://git.openjdk.org/jdk/pull/18489#pullrequestreview-2342647468
PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1784324768
PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1784331782
PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1784325142
PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1784333260

From rcastanedalo at openjdk.org Wed Oct 2 11:42:52 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 2 Oct 2024 11:42:52 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27]
In-Reply-To:
References:
Message-ID: <7L7jYDlFa0WnVvgiyNHI9KZrcffYwNnBB899AuMS56Q=.40b031e7-07b8-4a15-b319-c53b38a17a49@github.com>

On Wed, 2 Oct 2024 10:10:12 GMT, Hamlin Li wrote:

>> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: >> >> - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion >> - riscv port refactor >> - Remove temporary support code >> - Merge jdk-24+17 >> - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization >> - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions >> - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes >> - Merge jdk-24+16 >> - Ensure that detected encode-and-store patterns are matched >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - ... and 43 more: https://git.openjdk.org/jdk/compare/486c5b0d...14483b83 > > src/hotspot/cpu/riscv/gc/g1/g1_riscv.ad line 55: > >> 53: } >> 54: for (RegSetIterator reg = no_preserve.begin(); *reg != noreg; ++reg) { >> 55: stub->dont_preserve(*reg); > > Could `no_preserve` and `preserve` overlap? > If false, then seems it's not necessary to do `dont_preserve` for `no_preserve` > If true, seems it's not safe to `dont_preserve` these regs? I'm not sure. In the G1 case, the use of `dont_preserve` is an optimization to avoid spilling and reloading, in the slow path of the pre-barrier, registers (`res`) that are not live at that point. It is not necessary for correctness, but saves a few bytes in the generated code. If `res` was not marked as `dont_preserve`, it would be included in the pre-barrier stub's preserve set (`BarrierStubC2::preserve_set()`) because it is live out of the entire AD instruction (as computed by `BarrierSetC2::compute_liveness_at_stubs()`). 
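The effect of `dont_preserve` described in this reply can be sketched with a toy model in standalone C++. This is an illustration only, assuming that the preserve set starts out as the registers live out of the instruction and that `dont_preserve` simply removes a register from it; `ToyBarrierStub`, `RegSet` and the register names used with it are hypothetical and do not reproduce `BarrierStubC2`.

```cpp
#include <cassert>
#include <set>
#include <string>

using RegSet = std::set<std::string>;

// Toy stand-in for a barrier stub that tracks which registers the slow path
// has to spill and reload. Not HotSpot's BarrierStubC2.
class ToyBarrierStub {
 public:
  // Assumption: the preserve set is seeded with the live-out registers.
  explicit ToyBarrierStub(const RegSet& live_out) : _preserve(live_out) {}

  void preserve(const std::string& reg) { _preserve.insert(reg); }

  // Pure optimization in this model: the register is not live at the point
  // where the slow path runs, so saving and restoring it would be wasted.
  void dont_preserve(const std::string& reg) { _preserve.erase(reg); }

  const RegSet& preserve_set() const { return _preserve; }

 private:
  RegSet _preserve;
};
```

In this model a result register such as `res` is part of the preserve set by default because it is live out of the whole instruction, and calling `dont_preserve("res")` only shrinks the set that the slow path saves and restores.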
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784346898

From rcastanedalo at openjdk.org Wed Oct 2 11:53:52 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 2 Oct 2024 11:53:52 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27]
In-Reply-To:
References:
Message-ID:

On Wed, 2 Oct 2024 09:58:29 GMT, Hamlin Li wrote:

>> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision:
>>
>>  - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion
>>  - riscv port refactor
>>  - Remove temporary support code
>>  - Merge jdk-24+17
>>  - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization
>>  - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions
>>  - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes
>>  - Merge jdk-24+16
>>  - Ensure that detected encode-and-store patterns are matched
>>  - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
>>  - ... and 43 more: https://git.openjdk.org/jdk/compare/0dc16d16...14483b83
>
> src/hotspot/cpu/riscv/gc/g1/g1_riscv.ad line 169:
>
>> 167:   predicate(UseG1GC && n->as_LoadStore()->barrier_data() != 0);
>> 168:   match(Set res (CompareAndExchangeP mem (Binary oldval newval)));
>> 169:   effect(TEMP res, TEMP tmp1, TEMP tmp2);
>
> should `res` be `TEMP_DEF`?

It could, but the effect would be the same (see [JDK-8058880](https://bugs.openjdk.org/browse/JDK-8058880)). I went with `TEMP` for the x64 and aarch64 platforms for consistency with the analogous ZGC ADL code, see e.g.
https://github.com/openjdk/jdk/blob/855c8a7def21025bc2fc47594f7285a55924c213/src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad#L182-L204. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784358586 From chagedorn at openjdk.org Wed Oct 2 12:00:44 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Oct 2024 12:00:44 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v12] In-Reply-To: <0Ce3kXIzVDXuzpKGBzekEEOuJftpvkCvLwDkVOtpPR0=.ec2c1c1b-2b78-497d-97d3-a613ef736d2f@github.com> References: <0Ce3kXIzVDXuzpKGBzekEEOuJftpvkCvLwDkVOtpPR0=.ec2c1c1b-2b78-497d-97d3-a613ef736d2f@github.com> Message-ID: <2F_C30zYunRTYqFh4cphJcHHyosVVyiKjESHiBGjRlE=.b7f5f6a5-9848-4972-8a7d-ccf38c42be7d@github.com> On Tue, 24 Sep 2024 16:53:51 GMT, Daniel Skantz wrote: >> This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). >> >> We check two invariants. >> a) Bailouts should be successful starting from any given `failing()` check. >> b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). >> >> a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. >> >> The added flag should not have any effect in debug mode. >> >> Testing: >> >> T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. 
> > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > left over Nice stress mode! Some small comments but otherwise, looks good to me, too. src/hotspot/share/opto/compile.hpp line 792: > 790: > 791: #ifdef ASSERT > 792: bool phase_verify_ideal_loop() { return _phase_verify_ideal_loop; } can be made `const`: Suggestion: bool phase_verify_ideal_loop() const { return _phase_verify_ideal_loop; } src/hotspot/share/opto/compile.hpp line 838: > 836: const CompilationFailureInfo* first_failure_details() const { return _first_failure_details; } > 837: > 838: bool failing(DEBUG_ONLY(bool no_stress_bailout = false)) { It's somehow difficult to read what `failing(false/true)` now exactly mean. When having `failing(true)`, don't we get the same behavior as if we call `failing_internal()`? If `failing_internal()` is false, then we would only return false because we are not entering if (StressBailout && !no_stress_bailout) { return fail_randomly(); } So, I'm wondering if we cannot just use `failing_internal()` instead of `failing(true)` and remove the parameter completely? src/hotspot/share/opto/compile.hpp line 843: > 841: } > 842: #ifdef ASSERT > 843: // Disable stress code for PhaseIdealLoop verification Can you expand the comment here and add the reason why? From a comment above, you mentioned that it is not easy to make it work. I guess it's fine to just mention that here. 
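The question about `failing(true)` versus `failing_internal()` can be checked on a small standalone C++ model of the control flow quoted in this review. It is a sketch under assumptions: `ToyCompile`, its constructor parameters and the use of `std::mt19937` are hypothetical, and only the shape of `failing()`, `failing_internal()` and `fail_randomly()` follows the snippet quoted above.

```cpp
#include <cassert>
#include <random>

// Toy stand-in for the bailout state of a compilation. Not HotSpot's Compile.
class ToyCompile {
 public:
  ToyCompile(bool stress_bailout, double probability, unsigned seed)
      : _failed(false),
        _stress_bailout(stress_bailout),
        _probability(probability),
        _rng(seed) {}

  // True only if a failure has actually been recorded.
  bool failing_internal() const { return _failed; }

  void record_failure() { _failed = true; }

  // Starts a bailout with the user-given probability.
  bool fail_randomly() {
    std::bernoulli_distribution coin(_probability);
    if (coin(_rng)) {
      record_failure();
      return true;
    }
    return false;
  }

  // Same shape as the quoted code: a recorded failure always wins, and the
  // stress path is skipped when no_stress_bailout is set.
  bool failing(bool no_stress_bailout = false) {
    if (failing_internal()) {
      return true;
    }
    if (_stress_bailout && !no_stress_bailout) {
      return fail_randomly();
    }
    return false;
  }

 private:
  bool _failed;
  bool _stress_bailout;
  double _probability;
  std::mt19937 _rng;
};
```

In this model `failing(true)` never reaches `fail_randomly()`, so it returns exactly what `failing_internal()` returns, which is the equivalence the review comment asks about.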
------------- PR Review: https://git.openjdk.org/jdk/pull/19646#pullrequestreview-2342682921 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1784348743 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1784363994 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1784351478 From chagedorn at openjdk.org Wed Oct 2 12:33:38 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Oct 2024 12:33:38 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop In-Reply-To: References: Message-ID: On Wed, 2 Oct 2024 11:21:43 GMT, Roland Westrelin wrote: > The patch includes 2 test cases for this: test1() causes the assert > failure in the bug description, test2() causes an incorrect execution > where a load floats above a store that it should be dependent on. > > In the test cases, `field` is accessed on object `a` of type `A`. When > the field is accessed, the type that c2 has for `a` is `A` with > interface `I`. The holder of the field is class `A` which implements > no interface. The reason the type of `a` and the type of the holder > are slightly different is because `a` is the result of a merge of > objects of subclasses `B` and `C` which implements `I`. > > The root cause of the bug is that `Compile::flatten_alias_type()` > doesn't change `A` + interface `I` into `A`, the actual holder of the > field. So `field` in `A` + interface `I` and `field` in `A` get > different slices which is wrong. At parse time, the logic that creates > the `Store` node uses: > > > C->alias_type(field)->adr_type() > > > to compute the slice which is the slice for `field` in `A`. So the > slice used at parse time is the right one but during igvn, when the > slice is computed from the input address, a different slice (the one > for `A` + interface `I`) is used. That causes load/store nodes when > they are processed by igvn to use the wrong memory state. 
> > In `Compile::flatten_alias_type()`: > > > if (!ik->equals(canonical_holder) || tj->offset() != offset) { > if( is_known_inst ) { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); > } else { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); > } > } > > > only flattens the type if it's not the canonical holder but it should > test that the type doesn't implement interfaces that the canonical > holder doesn't. To keep the logic simple, the fix I propose creates a > new type whenever there's a chance that a type implements extra > interfaces (the type is not exact). > > I also added asserts in `GraphKit::make_load()` and > `GraphKit::store_to_memory()` to make sure the slice that is passed > and the address type agree. Those asserts fire with the new test > cases. When running testing, I found that they also catch a few cases > in `library_call.cpp` where an incorrect slice is passed. > > As further clean up, maybe we want to drop the slice argument to > `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to > their callers) given it's redundant with the address type and error > prone. Looks reasonable. This assert has proven to be quite valuable to find problems in the memory graph that we would otherwise miss. It was also one of the few assert that triggered when having a corrupted graph due to missing Assertion Predicates. I'm wondering if we need more such memory graph checks in general. Anyway, that's just a thought for some future RFE. 
src/hotspot/share/opto/compile.cpp line 1468: > 1466: ciInstanceKlass *canonical_holder = ik->get_canonical_holder(offset); > 1467: assert(offset < canonical_holder->layout_helper_size_in_bytes(), ""); > 1468: assert(tj->offset() == offset, "not change to offset expected"); Suggestion: assert(tj->offset() == offset, "no change to offset expected"); test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java line 56: > 54: A a; > 55: if (flag) { > 56: a = b; Indentation is off: Suggestion: a = b; ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21303#pullrequestreview-2342747463 PR Review Comment: https://git.openjdk.org/jdk/pull/21303#discussion_r1784414779 PR Review Comment: https://git.openjdk.org/jdk/pull/21303#discussion_r1784400108 From thartmann at openjdk.org Wed Oct 2 12:33:39 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Oct 2024 12:33:39 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop In-Reply-To: References: Message-ID: On Wed, 2 Oct 2024 11:21:43 GMT, Roland Westrelin wrote: > The patch includes 2 test cases for this: test1() causes the assert > failure in the bug description, test2() causes an incorrect execution > where a load floats above a store that it should be dependent on. > > In the test cases, `field` is accessed on object `a` of type `A`. When > the field is accessed, the type that c2 has for `a` is `A` with > interface `I`. The holder of the field is class `A` which implements > no interface. The reason the type of `a` and the type of the holder > are slightly different is because `a` is the result of a merge of > objects of subclasses `B` and `C` which implements `I`. > > The root cause of the bug is that `Compile::flatten_alias_type()` > doesn't change `A` + interface `I` into `A`, the actual holder of the > field. 
So `field` in `A` + interface `I` and `field` in `A` get > different slices which is wrong. At parse time, the logic that creates > the `Store` node uses: > > > C->alias_type(field)->adr_type() > > > to compute the slice which is the slice for `field` in `A`. So the > slice used at parse time is the right one but during igvn, when the > slice is computed from the input address, a different slice (the one > for `A` + interface `I`) is used. That causes load/store nodes when > they are processed by igvn to use the wrong memory state. > > In `Compile::flatten_alias_type()`: > > > if (!ik->equals(canonical_holder) || tj->offset() != offset) { > if( is_known_inst ) { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); > } else { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); > } > } > > > only flattens the type if it's not the canonical holder but it should > test that the type doesn't implement interfaces that the canonical > holder doesn't. To keep the logic simple, the fix I propose creates a > new type whenever there's a chance that a type implements extra > interfaces (the type is not exact). > > I also added asserts in `GraphKit::make_load()` and > `GraphKit::store_to_memory()` to make sure the slice that is passed > and the address type agree. Those asserts fire with the new test > cases. When running testing, I found that they also catch a few cases > in `library_call.cpp` where an incorrect slice is passed. > > As further clean up, maybe we want to drop the slice argument to > `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to > their callers) given it's redundant with the address type and error > prone. Looks reasonable to me. I submitted testing and will report back once it passed. 
> As further clean up, maybe we want to drop the slice argument to GraphKit::make_load() and GraphKit::store_to_memory() (and to their callers) given it's redundant with the address type and error prone. Yes, let's do that. Please file a starter RFE. src/hotspot/share/opto/compile.cpp line 1468: > 1466: ciInstanceKlass *canonical_holder = ik->get_canonical_holder(offset); > 1467: assert(offset < canonical_holder->layout_helper_size_in_bytes(), ""); > 1468: assert(tj->offset() == offset, "not change to offset expected"); Suggestion: assert(tj->offset() == offset, "no change to offset expected"); src/hotspot/share/opto/compile.cpp line 1475: > 1473: assert(tj == TypeInstPtr::make(to->ptr(), canonical_holder, is_known_inst, nullptr, offset, instance_id), "exact type should be canonical type"); > 1474: } else { > 1475: assert(xk || !is_known_inst, "Known instance should be exact type"); Maybe add a comment here and explain the two cases when we create a new type. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21303#pullrequestreview-2342672361
PR Review Comment: https://git.openjdk.org/jdk/pull/21303#discussion_r1784341679
PR Review Comment: https://git.openjdk.org/jdk/pull/21303#discussion_r1784422309

From mli at openjdk.org Wed Oct 2 12:57:48 2024
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 2 Oct 2024 12:57:48 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27]
In-Reply-To: <7L7jYDlFa0WnVvgiyNHI9KZrcffYwNnBB899AuMS56Q=.40b031e7-07b8-4a15-b319-c53b38a17a49@github.com>
References: <7L7jYDlFa0WnVvgiyNHI9KZrcffYwNnBB899AuMS56Q=.40b031e7-07b8-4a15-b319-c53b38a17a49@github.com>
Message-ID:

On Wed, 2 Oct 2024 11:40:18 GMT, Roberto Castañeda Lozano wrote:

> If `res` was not marked as `dont_preserve`, it would be included in the pre-barrier stub's preserve set (`BarrierStubC2::preserve_set()`) because it is live out of the entire AD instruction (as computed by `BarrierSetC2::compute_liveness_at_stubs()`).

Thanks for the explanation! I did not realize this; if that's the case, then it's good.

>> src/hotspot/cpu/riscv/gc/g1/g1_riscv.ad line 169:
>>
>>> 167:   predicate(UseG1GC && n->as_LoadStore()->barrier_data() != 0);
>>> 168:   match(Set res (CompareAndExchangeP mem (Binary oldval newval)));
>>> 169:   effect(TEMP res, TEMP tmp1, TEMP tmp2);
>>
>> should `res` be `TEMP_DEF`?
>
> It could, but the effect would be the same (see [JDK-8058880](https://bugs.openjdk.org/browse/JDK-8058880)). I went with `TEMP` for the x64 and aarch64 platforms for consistency with the analogous ZGC ADL code, see e.g. https://github.com/openjdk/jdk/blob/855c8a7def21025bc2fc47594f7285a55924c213/src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad#L182-L204.

I saw that the riscv one in z_riscv.ad is: `effect(TEMP oldval_tmp, TEMP newval_tmp, TEMP tmp1, TEMP_DEF res);`; maybe it would be good to change the riscv one?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784479784 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784479526 From roland at openjdk.org Wed Oct 2 13:02:11 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 2 Oct 2024 13:02:11 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v2] In-Reply-To: References: Message-ID: <9prDCh5_yHkuEwmfeUfE_v8AZch3DkQjBkRXMIqy820=.85e297f0-6589-4731-a825-7665d26af08b@github.com> > The patch includes 2 test cases for this: test1() causes the assert > failure in the bug description, test2() causes an incorrect execution > where a load floats above a store that it should be dependent on. > > In the test cases, `field` is accessed on object `a` of type `A`. When > the field is accessed, the type that c2 has for `a` is `A` with > interface `I`. The holder of the field is class `A` which implements > no interface. The reason the type of `a` and the type of the holder > are slightly different is because `a` is the result of a merge of > objects of subclasses `B` and `C` which implements `I`. > > The root cause of the bug is that `Compile::flatten_alias_type()` > doesn't change `A` + interface `I` into `A`, the actual holder of the > field. So `field` in `A` + interface `I` and `field` in `A` get > different slices which is wrong. At parse time, the logic that creates > the `Store` node uses: > > > C->alias_type(field)->adr_type() > > > to compute the slice which is the slice for `field` in `A`. So the > slice used at parse time is the right one but during igvn, when the > slice is computed from the input address, a different slice (the one > for `A` + interface `I`) is used. That causes load/store nodes when > they are processed by igvn to use the wrong memory state. 
> > In `Compile::flatten_alias_type()`: > > > if (!ik->equals(canonical_holder) || tj->offset() != offset) { > if( is_known_inst ) { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); > } else { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); > } > } > > > only flattens the type if it's not the canonical holder but it should > test that the type doesn't implement interfaces that the canonical > holder doesn't. To keep the logic simple, the fix I propose creates a > new type whenever there's a chance that a type implements extra > interfaces (the type is not exact). > > I also added asserts in `GraphKit::make_load()` and > `GraphKit::store_to_memory()` to make sure the slice that is passed > and the address type agree. Those asserts fire with the new test > cases. When running testing, I found that they also catch a few cases > in `library_call.cpp` where an incorrect slice is passed. > > As further clean up, maybe we want to drop the slice argument to > `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to > their callers) given it's redundant with the address type and error > prone. 
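
To make the type shape described above concrete, here is a minimal Java sketch. It is not the actual `TestBadMemSliceWithInterfaces.java` from the PR; the class `AliasShapeSketch` and the method `storeThenLoad` are hypothetical names chosen for illustration, and only `A`, `B`, `C`, `I` and `field` follow the description. The point is only that `a` is a merge of a `B` and a `C` (both implementing `I`), while `field` is declared in `A`, which implements no interface.

```java
public class AliasShapeSketch {
    interface I {}

    // Holder of `field`; implements no interface.
    static class A { int field; }

    // Both subclasses implement I, so a merge of a B and a C
    // can be typed as "A + interface I" rather than plain A.
    static class B extends A implements I {}
    static class C extends A implements I {}

    // Hypothetical method: stores to and then loads from `field`
    // through the merged reference. The load must observe the store.
    static int storeThenLoad(boolean flag, B b, C c) {
        A a;
        if (flag) {
            a = b;
        } else {
            a = c;
        }
        a.field = 42;
        return a.field;
    }

    public static void main(String[] args) {
        System.out.println(storeThenLoad(true, new B(), new C()));
        System.out.println(storeThenLoad(false, new B(), new C()));
    }
}
```

In plain Java the load always sees the store. The bug described above concerns whether the compiler assigns the store and the load to the same memory slice when the static type of `a` carries the extra interface.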
Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/compile.cpp Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21303/files - new: https://git.openjdk.org/jdk/pull/21303/files/09f2e987..913a82b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21303&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21303&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21303.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21303/head:pull/21303 PR: https://git.openjdk.org/jdk/pull/21303 From roland at openjdk.org Wed Oct 2 13:11:18 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 2 Oct 2024 13:11:18 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v3] In-Reply-To: References: Message-ID: <_VDcL5UcOCF1JkAVrpqVkOmJkHfxEbp0NxF1kZHTkXU=.23df893b-c0cf-4a0a-8c3a-49375e91e0f0@github.com> > The patch includes 2 test cases for this: test1() causes the assert > failure in the bug description, test2() causes an incorrect execution > where a load floats above a store that it should be dependent on. > > In the test cases, `field` is accessed on object `a` of type `A`. When > the field is accessed, the type that c2 has for `a` is `A` with > interface `I`. The holder of the field is class `A` which implements > no interface. The reason the type of `a` and the type of the holder > are slightly different is because `a` is the result of a merge of > objects of subclasses `B` and `C` which implements `I`. > > The root cause of the bug is that `Compile::flatten_alias_type()` > doesn't change `A` + interface `I` into `A`, the actual holder of the > field. 
So `field` in `A` + interface `I` and `field` in `A` get > different slices which is wrong. At parse time, the logic that creates > the `Store` node uses: > > > C->alias_type(field)->adr_type() > > > to compute the slice which is the slice for `field` in `A`. So the > slice used at parse time is the right one but during igvn, when the > slice is computed from the input address, a different slice (the one > for `A` + interface `I`) is used. That causes load/store nodes when > they are processed by igvn to use the wrong memory state. > > In `Compile::flatten_alias_type()`: > > > if (!ik->equals(canonical_holder) || tj->offset() != offset) { > if( is_known_inst ) { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); > } else { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); > } > } > > > only flattens the type if it's not the canonical holder but it should > test that the type doesn't implement interfaces that the canonical > holder doesn't. To keep the logic simple, the fix I propose creates a > new type whenever there's a chance that a type implements extra > interfaces (the type is not exact). > > I also added asserts in `GraphKit::make_load()` and > `GraphKit::store_to_memory()` to make sure the slice that is passed > and the address type agree. Those asserts fire with the new test > cases. When running testing, I found that they also catch a few cases > in `library_call.cpp` where an incorrect slice is passed. > > As further clean up, maybe we want to drop the slice argument to > `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to > their callers) given it's redundant with the address type and error > prone. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - review - Merge branch 'master' into JDK-8340214 - Update src/hotspot/share/opto/compile.cpp Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java Co-authored-by: Christian Hagedorn - test cleanup - fix & test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21303/files - new: https://git.openjdk.org/jdk/pull/21303/files/913a82b4..46042b26 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21303&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21303&range=01-02 Stats: 3240 lines in 131 files changed: 2545 ins; 329 del; 366 mod Patch: https://git.openjdk.org/jdk/pull/21303.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21303/head:pull/21303 PR: https://git.openjdk.org/jdk/pull/21303 From roland at openjdk.org Wed Oct 2 13:23:36 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 2 Oct 2024 13:23:36 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v3] In-Reply-To: References: Message-ID: <994Q512PXsbEDCXktORppesCubBHFpD0wcI4EnQDQdc=.9b6f8d01-139c-4b1e-b218-c2906a2eccd5@github.com> On Wed, 2 Oct 2024 12:30:30 GMT, Tobias Hartmann wrote: > > As further clean up, maybe we want to drop the slice argument to GraphKit::make_load() and GraphKit::store_to_memory() (and to their callers) given it's redundant with the address type and error prone. > > Yes, let's do that. Please file a starter RFE. https://bugs.openjdk.org/browse/JDK-8341411 > src/hotspot/share/opto/compile.cpp line 1475: > >> 1473: assert(tj == TypeInstPtr::make(to->ptr(), canonical_holder, is_known_inst, nullptr, offset, instance_id), "exact type should be canonical type"); >> 1474: } else { >> 1475: assert(xk || !is_known_inst, "Known instance should be exact type"); > > Maybe add a comment here and explain the two cases when we create a new type. 
Done in new commit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21303#issuecomment-2388631754 PR Review Comment: https://git.openjdk.org/jdk/pull/21303#discussion_r1784521453 From thartmann at openjdk.org Wed Oct 2 13:34:38 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Oct 2024 13:34:38 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v3] In-Reply-To: <_VDcL5UcOCF1JkAVrpqVkOmJkHfxEbp0NxF1kZHTkXU=.23df893b-c0cf-4a0a-8c3a-49375e91e0f0@github.com> References: <_VDcL5UcOCF1JkAVrpqVkOmJkHfxEbp0NxF1kZHTkXU=.23df893b-c0cf-4a0a-8c3a-49375e91e0f0@github.com> Message-ID: On Wed, 2 Oct 2024 13:11:18 GMT, Roland Westrelin wrote: >> The patch includes 2 test cases for this: test1() causes the assert >> failure in the bug description, test2() causes an incorrect execution >> where a load floats above a store that it should be dependent on. >> >> In the test cases, `field` is accessed on object `a` of type `A`. When >> the field is accessed, the type that c2 has for `a` is `A` with >> interface `I`. The holder of the field is class `A` which implements >> no interface. The reason the type of `a` and the type of the holder >> are slightly different is because `a` is the result of a merge of >> objects of subclasses `B` and `C` which implements `I`. >> >> The root cause of the bug is that `Compile::flatten_alias_type()` >> doesn't change `A` + interface `I` into `A`, the actual holder of the >> field. So `field` in `A` + interface `I` and `field` in `A` get >> different slices which is wrong. At parse time, the logic that creates >> the `Store` node uses: >> >> >> C->alias_type(field)->adr_type() >> >> >> to compute the slice which is the slice for `field` in `A`. So the >> slice used at parse time is the right one but during igvn, when the >> slice is computed from the input address, a different slice (the one >> for `A` + interface `I`) is used. 
That causes load/store nodes when >> they are processed by igvn to use the wrong memory state. >> >> In `Compile::flatten_alias_type()`: >> >> >> if (!ik->equals(canonical_holder) || tj->offset() != offset) { >> if( is_known_inst ) { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); >> } else { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); >> } >> } >> >> >> only flattens the type if it's not the canonical holder but it should >> test that the type doesn't implement interfaces that the canonical >> holder doesn't. To keep the logic simple, the fix I propose creates a >> new type whenever there's a chance that a type implements extra >> interfaces (the type is not exact). >> >> I also added asserts in `GraphKit::make_load()` and >> `GraphKit::store_to_memory()` to make sure the slice that is passed >> and the address type agree. Those asserts fire with the new test >> cases. When running testing, I found that they also catch a few cases >> in `library_call.cpp` where an incorrect slice is passed. >> >> As further clean up, maybe we want to drop the slice argument to >> `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to >> their callers) given it's redundant with th... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8340214 > - Update src/hotspot/share/opto/compile.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java > > Co-authored-by: Christian Hagedorn > - test cleanup > - fix & test Marked as reviewed by thartmann (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21303#pullrequestreview-2342966421 From thartmann at openjdk.org Wed Oct 2 13:34:39 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Oct 2024 13:34:39 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v3] In-Reply-To: <994Q512PXsbEDCXktORppesCubBHFpD0wcI4EnQDQdc=.9b6f8d01-139c-4b1e-b218-c2906a2eccd5@github.com> References: <994Q512PXsbEDCXktORppesCubBHFpD0wcI4EnQDQdc=.9b6f8d01-139c-4b1e-b218-c2906a2eccd5@github.com> Message-ID: On Wed, 2 Oct 2024 13:21:34 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/compile.cpp line 1475: >> >>> 1473: assert(tj == TypeInstPtr::make(to->ptr(), canonical_holder, is_known_inst, nullptr, offset, instance_id), "exact type should be canonical type"); >>> 1474: } else { >>> 1475: assert(xk || !is_known_inst, "Known instance should be exact type"); >> >> Maybe add a comment here and explain the two cases when we create a new type. > > Done in new commit. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21303#discussion_r1784537173 From chagedorn at openjdk.org Wed Oct 2 13:52:38 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Oct 2024 13:52:38 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v3] In-Reply-To: <_VDcL5UcOCF1JkAVrpqVkOmJkHfxEbp0NxF1kZHTkXU=.23df893b-c0cf-4a0a-8c3a-49375e91e0f0@github.com> References: <_VDcL5UcOCF1JkAVrpqVkOmJkHfxEbp0NxF1kZHTkXU=.23df893b-c0cf-4a0a-8c3a-49375e91e0f0@github.com> Message-ID: On Wed, 2 Oct 2024 13:11:18 GMT, Roland Westrelin wrote: >> The patch includes 2 test cases for this: test1() causes the assert >> failure in the bug description, test2() causes an incorrect execution >> where a load floats above a store that it should be dependent on. >> >> In the test cases, `field` is accessed on object `a` of type `A`. 
When >> the field is accessed, the type that c2 has for `a` is `A` with >> interface `I`. The holder of the field is class `A` which implements >> no interface. The reason the type of `a` and the type of the holder >> are slightly different is because `a` is the result of a merge of >> objects of subclasses `B` and `C` which implements `I`. >> >> The root cause of the bug is that `Compile::flatten_alias_type()` >> doesn't change `A` + interface `I` into `A`, the actual holder of the >> field. So `field` in `A` + interface `I` and `field` in `A` get >> different slices which is wrong. At parse time, the logic that creates >> the `Store` node uses: >> >> >> C->alias_type(field)->adr_type() >> >> >> to compute the slice which is the slice for `field` in `A`. So the >> slice used at parse time is the right one but during igvn, when the >> slice is computed from the input address, a different slice (the one >> for `A` + interface `I`) is used. That causes load/store nodes when >> they are processed by igvn to use the wrong memory state. >> >> In `Compile::flatten_alias_type()`: >> >> >> if (!ik->equals(canonical_holder) || tj->offset() != offset) { >> if( is_known_inst ) { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); >> } else { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); >> } >> } >> >> >> only flattens the type if it's not the canonical holder but it should >> test that the type doesn't implement interfaces that the canonical >> holder doesn't. To keep the logic simple, the fix I propose creates a >> new type whenever there's a chance that a type implements extra >> interfaces (the type is not exact). >> >> I also added asserts in `GraphKit::make_load()` and >> `GraphKit::store_to_memory()` to make sure the slice that is passed >> and the address type agree. Those asserts fire with the new test >> cases. 
When running testing, I found that they also catch a few cases >> in `library_call.cpp` where an incorrect slice is passed. >> >> As further clean up, maybe we want to drop the slice argument to >> `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to >> their callers) given it's redundant with th... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8340214 > - Update src/hotspot/share/opto/compile.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java > > Co-authored-by: Christian Hagedorn > - test cleanup > - fix & test Still good, one more minor thing. test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java line 68: > 66: A a; > 67: if (flag) { > 68: a = b; Suggestion: a = b; ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21303#pullrequestreview-2343020437 PR Review Comment: https://git.openjdk.org/jdk/pull/21303#discussion_r1784567122 From roland at openjdk.org Wed Oct 2 13:58:14 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 2 Oct 2024 13:58:14 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v4] In-Reply-To: References: Message-ID: > The patch includes 2 test cases for this: test1() causes the assert > failure in the bug description, test2() causes an incorrect execution > where a load floats above a store that it should be dependent on. > > In the test cases, `field` is accessed on object `a` of type `A`. When > the field is accessed, the type that c2 has for `a` is `A` with > interface `I`. The holder of the field is class `A` which implements > no interface. 
The reason the type of `a` and the type of the holder > are slightly different is because `a` is the result of a merge of > objects of subclasses `B` and `C` which implements `I`. > > The root cause of the bug is that `Compile::flatten_alias_type()` > doesn't change `A` + interface `I` into `A`, the actual holder of the > field. So `field` in `A` + interface `I` and `field` in `A` get > different slices which is wrong. At parse time, the logic that creates > the `Store` node uses: > > > C->alias_type(field)->adr_type() > > > to compute the slice which is the slice for `field` in `A`. So the > slice used at parse time is the right one but during igvn, when the > slice is computed from the input address, a different slice (the one > for `A` + interface `I`) is used. That causes load/store nodes when > they are processed by igvn to use the wrong memory state. > > In `Compile::flatten_alias_type()`: > > > if (!ik->equals(canonical_holder) || tj->offset() != offset) { > if( is_known_inst ) { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); > } else { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); > } > } > > > only flattens the type if it's not the canonical holder but it should > test that the type doesn't implement interfaces that the canonical > holder doesn't. To keep the logic simple, the fix I propose creates a > new type whenever there's a chance that a type implements extra > interfaces (the type is not exact). > > I also added asserts in `GraphKit::make_load()` and > `GraphKit::store_to_memory()` to make sure the slice that is passed > and the address type agree. Those asserts fire with the new test > cases. When running testing, I found that they also catch a few cases > in `library_call.cpp` where an incorrect slice is passed. 
> > As further clean up, maybe we want to drop the slice argument to > `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to > their callers) given it's redundant with the address type and error > prone. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21303/files - new: https://git.openjdk.org/jdk/pull/21303/files/46042b26..6cdd2337 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21303&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21303&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21303.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21303/head:pull/21303 PR: https://git.openjdk.org/jdk/pull/21303 From vlivanov at openjdk.org Wed Oct 2 18:34:45 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 2 Oct 2024 18:34:45 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v6] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 12:20:13 GMT, Jorn Vernee wrote: >> As discussed in the JBS issue: >> >> FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. 
So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` >> >> However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. >> >> The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. >> >>
>> Performance numbers >> x64: >> >> before: >> >> Benchmark Mode Cnt Score Error Units >> Upcalls.panama_blank avgt 30 69.216 ± 1.791 ns/op >> >> >> after: >> >> Benchmark Mode Cnt Score Error Units >> Upcalls.panama_blank avgt 30 67.787 ± 0.684 ns/op >> >> >> aarch64: >> >> before: >> >> Benchmark Mode Cnt Score Error Units >> Upcalls.panama_blank avgt 30 61.574 ± 0.801 ns/op >> >> >> after: >> >> Benchmark Mode Cnt Score Error Units >> Upcalls.panama_blank avgt 30 61.218 ± 0.554 ns/op >> >>
>> >> As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. >> >> Testing: tier 1-4 > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: > > - use resolve_global_jobject on s390 > - Merge branch 'master' into LoadVMTraget > - remove PC save/restore on s390 > - use fatal() > - add RISC-V as target platform > - Adjust ppc & RISC-V code > - Add s390 changes > - Merge branch 'master' into LoadVMTraget > - Don't save/restore LR/CR + resolve_jobject on s390 > - eyeball other platforms > - ... and 14 more: https://git.openjdk.org/jdk/compare/2faf8b8d...b703b162 Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20479#pullrequestreview-2343806028 From jvernee at openjdk.org Wed Oct 2 18:58:44 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 2 Oct 2024 18:58:44 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v6] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 12:20:13 GMT, Jorn Vernee wrote: >> As discussed in the JBS issue: >> >> FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. 
So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` >> >> However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. >> >> The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. >> >>
>> Performance numbers >> x64: >> >> before: >> >> Benchmark Mode Cnt Score Error Units >> Upcalls.panama_blank avgt 30 69.216 ± 1.791 ns/op >> >> >> after: >> >> Benchmark Mode Cnt Score Error Units >> Upcalls.panama_blank avgt 30 67.787 ± 0.684 ns/op >> >> >> aarch64: >> >> before: >> >> Benchmark Mode Cnt Score Error Units >> Upcalls.panama_blank avgt 30 61.574 ± 0.801 ns/op >> >> >> after: >> >> Benchmark Mode Cnt Score Error Units >> Upcalls.panama_blank avgt 30 61.218 ± 0.554 ns/op >> >>
>> >> As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. >> >> Testing: tier 1-4 > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: > > - use resolve_global_jobject on s390 > - Merge branch 'master' into LoadVMTraget > - remove PC save/restore on s390 > - use fatal() > - add RISC-V as target platform > - Adjust ppc & RISC-V code > - Add s390 changes > - Merge branch 'master' into LoadVMTraget > - Don't save/restore LR/CR + resolve_jobject on s390 > - eyeball other platforms > - ... and 14 more: https://git.openjdk.org/jdk/compare/2faf8b8d...b703b162 Thanks for all the reviews! I will do one more round of testing before integrating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2389467122 From rcastanedalo at openjdk.org Wed Oct 2 19:43:50 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 2 Oct 2024 19:43:50 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: <7L7jYDlFa0WnVvgiyNHI9KZrcffYwNnBB899AuMS56Q=.40b031e7-07b8-4a15-b319-c53b38a17a49@github.com> Message-ID: On Wed, 2 Oct 2024 12:55:13 GMT, Hamlin Li wrote: >> It could, but the effect would be the same (see [JDK-8058880](https://bugs.openjdk.org/browse/JDK-8058880)). I went with `TEMP` for the x64 and aarch64 platforms for consistency with the analogous ZGC ADL code, see e.g. https://github.com/openjdk/jdk/blob/855c8a7def21025bc2fc47594f7285a55924c213/src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad#L182-L204. > > I saw the riscv one in z_riscv.ad is: `effect(TEMP oldval_tmp, TEMP newval_tmp, TEMP tmp1, TEMP_DEF res);`, maybe it's good to change riscv one?
I suggest to postpone these types of refactorings to follow-up enhancements, given that the pull request in its current form is stable, thoroughly tested, and approved by reviewers. I intend to integrate it within the following 24 hours, provided final test results look good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1785135652 From kxu at openjdk.org Wed Oct 2 19:57:55 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 2 Oct 2024 19:57:55 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v20] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. 
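
As an illustration of the loop shape this PR targets, here is a minimal Java sketch. The class and method names (`ParallelIvSketch`, `loop`, `closedForm`) are hypothetical and not taken from `TestParallelIvInIntCountedLoop.java`. The loop counter `i` is an `int`, which makes it an int counted loop, while `a` is a `long` that advances in lockstep with `i`, which makes it a long-typed parallel iv. The closed form shows the value such a variable takes after the loop; whether C2 rewrites this exact loop that way is an assumption here, not something the message states.

```java
public class ParallelIvSketch {
    // Int counted loop with a long-typed parallel induction variable `a`.
    static long loop(int stop) {
        long a = 0;
        for (int i = 0; i < stop; i++) {
            a += 2; // advances by a constant stride on every iteration, in lockstep with i
        }
        return a;
    }

    // Closed form of `a` after the loop: stride * trip count.
    // The loop body never runs when stop <= 0.
    static long closedForm(int stop) {
        return stop > 0 ? 2L * stop : 0L;
    }

    public static void main(String[] args) {
        for (int stop : new int[] {-5, 0, 1, 10, 1000}) {
            System.out.println(stop + ": " + loop(stop) + " == " + closedForm(stop));
        }
    }
}
```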
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: fix typos Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/6cad8c19..4e2735ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=18-19 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From kxu at openjdk.org Wed Oct 2 19:57:56 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 2 Oct 2024 19:57:56 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v19] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Wed, 2 Oct 2024 11:18:52 GMT, Tobias Hartmann wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> update comments in TestParallelIvInIntCountedLoop.java > > test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 315: > >> 313: } >> 314: >> 315: return a; > > Shouldn't there also be tests for the `int a` `long i` variant? `long i` would make it a long counted loop, for which HotSpot doesn't perform parallel iv replacement at this time.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1785147608 From duke at openjdk.org Wed Oct 2 22:49:43 2024 From: duke at openjdk.org (duke) Date: Wed, 2 Oct 2024 22:49:43 GMT Subject: Withdrawn: 8321008: RISC-V: C2 MulAddVS2VI In-Reply-To: References: Message-ID: <8Xm4kpGgp2U2NFhSdCCHJ_u2UrP-2lLtYxkScRL4x9w=.144122d3-89a6-484e-9bf1-74909cc00712@github.com> On Tue, 23 Apr 2024 15:02:10 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? > > The motivation is to implement `MulAddVS2VI`. > But to enable `MulAddVS2VI`, `MulAddS2I` is a prerequisite, although `MulAddS2I` does not bring extra benefit on riscv as we don't have a specific muladd instruction on riscv. > So, this patch implements both `MulAddVS2VI` and `MulAddS2I`. > > > Thanks > > ## Performance > ### Summary > #### MulAddS2I > When +UseSuperWord > * There is performance gain in MulAddS2I.testa/b/c. > * There is performance regression in MulAddS2I.testd-testi. > > When -UseSuperWord > * There is performance regression in all tests.
>
> #### VectorReduction
> There is no performance regression in VectorReduction
>
> ### when +UseSuperWord
> data
>
> Benchmark on bananapi, +UseSuperWord | (COUNT) | (COUNT_DOUBLE) | (COUNT_FLOAT) | (ITER) | (RANGE) | (seed) | Mode | Cnt | Score +intrinsic | Error | Units | Score -intrinsic | Improvement
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> MulAddS2I.testa | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 65863.434 | 12082.469 | ns/op | 92576.189 | 1.406
> MulAddS2I.testb | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 74741.045 | 14608.942 | ns/op | 104428.457 | 1.397
> MulAddS2I.testc | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 42013.168 | 6029.504 | ns/op | 69380.849 | 1.651
> MulAddS2I.testd | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 99644.082 | 3078.374 | ns/op | 84316.883 | 0.846
> MulAddS2I.teste | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 98910.181 | 3170.046 | ns/op | 86023.681 | 0.87
> MulAddS2I.testf | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 101752.531 | 10994.494 | ns/op | 85473.52 | 0.84
> MulAddS2I.testg | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 99513.05 | 2919.032 | ns/op | 86680.144 | 0.871
> MulAddS2I.testh | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 100753.291 | 3449.613 | ns/op | 84424.63 | 0.838
> MulAddS2I.testi | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 100626.168 | 2924.72 | ns/op | 85477.079 | 0.849
> MulAddS2I.testj | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 100990.584 | 3756.096 | ns/op | 87010.947 | 0.862
> MulAddS2I.testk | N/A | N/A | N/A | 8191...

This pull request has been closed without being integrated.
------------- PR: https://git.openjdk.org/jdk/pull/18919 From jbhateja at openjdk.org Thu Oct 3 05:09:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Oct 2024 05:09:22 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v15] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. 
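The two-vector `selectFrom` semantics described above can be sketched with a plain scalar model. This is only an illustration under stated assumptions, not the Vector API implementation: the class and method names are hypothetical, arrays stand in for vectors, and the mapping of indexes in `[0, VLEN)` to `v1` and `[VLEN, 2*VLEN)` to `v2` is an assumed reading of "first and second vector serves as a table". The out-of-range behavior follows the description as written in this revision of the pull request.

```java
import java.util.Arrays;

public class SelectFromSketch {
    // Hypothetical scalar model: "index" plays the role of the "this" vector,
    // v1 and v2 together form a lookup table of 2 * VLEN entries.
    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all operands must have the same lane count");
        }
        int[] result = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            int idx = index[lane];
            // Valid two-vector index range is [0, 2*VLEN), per the description above.
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("index " + idx + " outside [0, " + (2 * vlen) + ")");
            }
            // Assumption: the low half of the index range selects from v1, the high half from v2.
            result[lane] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] index = {0, 5, 2, 7};
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        System.out.println(Arrays.toString(selectFrom(index, v1, v2)));
    }
}
```

With the sample inputs in `main`, lanes 0 and 2 read from `v1` and lanes 1 and 3 read from `v2`.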
>
> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :-
> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>
>
> Benchmark                                     (size)   Mode  Cnt      Score   Error   Units
> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762          ops/ms
> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550          ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605          ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004          ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758          ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192          ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459          ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556          ops/ms
> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830          ops/ms
> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174          ops/ms
> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804          ops/ms
> SelectFromBenchmark.selectFromIntVector         2048  thrpt    2   5398.2...

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits:

 - Review comments resolution.
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338023
 - Review comments resolutions.
 - Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments.
 - Incorporating review and documentation suggestions.
 - Jcheck clearance
 - Review comments resolution.
 - Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level.
 - Documentation suggestions from Paul.
 - Review resolutions.
 - ...
and 8 more: https://git.openjdk.org/jdk/compare/bdfb41f9...6215ab91 ------------- Changes: https://git.openjdk.org/jdk/pull/20508/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=14 Stats: 2804 lines in 89 files changed: 2785 ins; 18 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Thu Oct 3 05:09:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Oct 2024 05:09:22 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v14] In-Reply-To: <5wF2qLX9Z_tquvURMW0HVnrmMla1awxtz6C0UYI0lh4=.94df7340-1a27-4d93-80fa-d4c561641a97@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <5wF2qLX9Z_tquvURMW0HVnrmMla1awxtz6C0UYI0lh4=.94df7340-1a27-4d93-80fa-d4c561641a97@github.com> Message-ID: On Tue, 1 Oct 2024 18:10:10 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions. > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2797: > >> 2795: >> 2796: Node* operation = lowerSelectFromOp ? >> 2797: LowerSelectFromTwoVectorOperation(gvn(), opd1, opd2, opd3, vt) : > > Thanks for bringing the lowering right here. It opens up an optimization opportunity: currently for float/double we have two casts for index (e.g. from float -> int at line 2786 and from int -> byte at line 2661 as part of LowerSelectFromTwoVectorOperation. Could this be done by one cast? This is not sub-optimal, Float to sub-word cast is two step process where we first convert float value to integer following by integer down casting to sub-word. So resulting JIT code will still be same if we directly emit F2X or the way its handled currently. All existing targets support F2X take this route. But its good to be safe. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1785634731 From jbhateja at openjdk.org Thu Oct 3 05:09:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Oct 2024 05:09:22 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 1 Oct 2024 18:03:06 GMT, Sandhya Viswanathan wrote: >>> This could instead be: src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2); Or even simplified to: src1.rearrange(this.toShuffle(), src2); >> >> Yes, this may save additional allocation penalty of result array allocation which may slightly improve fall back performance, but logical operation cannot be directly applied over floating point vectors. so, we will need an explicit conversion to integral vector, which is why I opted for current fallback implementation which is in line with rest of the code. > > I see the problem with float/double vectors. Let us do the rearrange form only for Integral (byte, short, int, long) vectors then. For float/double vector we could keep the code that you have currently. You will also need additional handling for NPOT vector sizes which is handled by existing fallback implementation. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1785634658 From mli at openjdk.org Thu Oct 3 06:50:48 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 3 Oct 2024 06:50:48 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: <7L7jYDlFa0WnVvgiyNHI9KZrcffYwNnBB899AuMS56Q=.40b031e7-07b8-4a15-b319-c53b38a17a49@github.com> Message-ID: <4S2raWNwXSaEN1p2bAXEUKlHdqSY9AqrR7cBZDhs2QI=.e6ecddb3-be2b-4bda-88ac-8cd9fcb1301b@github.com> On Wed, 2 Oct 2024 19:41:26 GMT, Roberto Casta?eda Lozano wrote: >> I saw the riscv one in z_riscv.ad is: `effect(TEMP oldval_tmp, TEMP newval_tmp, TEMP tmp1, TEMP_DEF res);`, maybe it's good to change riscv one? > > I suggest to postpone these types of refactorings to follow-up enhancements, given that the pull request in its current form is stable, thoroughly tested, and approved by reviewers. I intend to integrate it within the following 24 hours, provided final test results look good. Sounds good too. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1785711504 From chagedorn at openjdk.org Thu Oct 3 06:58:40 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Oct 2024 06:58:40 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v4] In-Reply-To: References: Message-ID: On Wed, 2 Oct 2024 13:58:14 GMT, Roland Westrelin wrote: >> The patch includes 2 test cases for this: test1() causes the assert >> failure in the bug description, test2() causes an incorrect execution >> where a load floats above a store that it should be dependent on. >> >> In the test cases, `field` is accessed on object `a` of type `A`. When >> the field is accessed, the type that c2 has for `a` is `A` with >> interface `I`. The holder of the field is class `A` which implements >> no interface. 
The reason the type of `a` and the type of the holder >> are slightly different is because `a` is the result of a merge of >> objects of subclasses `B` and `C` which implements `I`. >> >> The root cause of the bug is that `Compile::flatten_alias_type()` >> doesn't change `A` + interface `I` into `A`, the actual holder of the >> field. So `field` in `A` + interface `I` and `field` in `A` get >> different slices which is wrong. At parse time, the logic that creates >> the `Store` node uses: >> >> >> C->alias_type(field)->adr_type() >> >> >> to compute the slice which is the slice for `field` in `A`. So the >> slice used at parse time is the right one but during igvn, when the >> slice is computed from the input address, a different slice (the one >> for `A` + interface `I`) is used. That causes load/store nodes when >> they are processed by igvn to use the wrong memory state. >> >> In `Compile::flatten_alias_type()`: >> >> >> if (!ik->equals(canonical_holder) || tj->offset() != offset) { >> if( is_known_inst ) { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); >> } else { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); >> } >> } >> >> >> only flattens the type if it's not the canonical holder but it should >> test that the type doesn't implement interfaces that the canonical >> holder doesn't. To keep the logic simple, the fix I propose creates a >> new type whenever there's a chance that a type implements extra >> interfaces (the type is not exact). >> >> I also added asserts in `GraphKit::make_load()` and >> `GraphKit::store_to_memory()` to make sure the slice that is passed >> and the address type agree. Those asserts fire with the new test >> cases. When running testing, I found that they also catch a few cases >> in `library_call.cpp` where an incorrect slice is passed. 
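The type shape described above can be sketched in plain Java. This is a hedged illustration only: the class, interface, and method names below are hypothetical stand-ins for the ones in the description, and the snippet is not claimed to reproduce the assert or the miscompilation (the actual tests are in `TestBadMemSliceWithInterfaces.java`). It only shows how a value can have static type `A` while every object reaching the merge point also implements `I`.

```java
public class MemSliceShapeSketch {
    interface I {}

    // Holder of the field; implements no interface.
    static class A { int field; }

    // Both subclasses implement I, so a merge of B and C is "A plus interface I".
    static class B extends A implements I {}
    static class C extends A implements I {}

    static int test(boolean flag) {
        // Merge point: the static type is A, but per the description the
        // compiler's type for `a` here is A with interface I.
        A a = flag ? new B() : new C();
        a.field = 42;      // store to `field`, whose holder is A
        return a.field;    // load that must observe the store above
    }

    public static void main(String[] args) {
        System.out.println(test(true) + " " + test(false));
    }
}
```

Both calls in `main` should see the stored value, since the load and the store address the same field of the same object.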
>> >> As further clean up, maybe we want to drop the slice argument to >> `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to >> their callers) given it's redundant with th... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21303#pullrequestreview-2344788165 From aboldtch at openjdk.org Thu Oct 3 07:16:04 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 3 Oct 2024 07:16:04 GMT Subject: RFR: 8341451: Remove C2HandleAnonOMOwnerStub Message-ID: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> [JDK-8319796](https://bugs.openjdk.org/browse/JDK-8319796) has been implemented on all platforms which had previous C2 implementations of LM_LIGHTWEIGHT that made use of C2HandleAnonOMOwnerStub. The declaration is still left in the shared code, an a couple of platforms have the definitions still lingering. I propose we remove them. 
------------- Commit messages: - 8341451: Remove C2HandleAnonOMOwnerStub Changes: https://git.openjdk.org/jdk/pull/21319/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21319&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341451 Stats: 70 lines in 3 files changed: 0 ins; 70 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21319/head:pull/21319 PR: https://git.openjdk.org/jdk/pull/21319 From fyang at openjdk.org Thu Oct 3 08:09:35 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 3 Oct 2024 08:09:35 GMT Subject: RFR: 8341451: Remove C2HandleAnonOMOwnerStub In-Reply-To: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> References: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> Message-ID: On Thu, 3 Oct 2024 07:09:53 GMT, Axel Boldt-Christmas wrote: > [JDK-8319796](https://bugs.openjdk.org/browse/JDK-8319796) has been implemented on all platforms which had previous C2 implementations of LM_LIGHTWEIGHT that made use of C2HandleAnonOMOwnerStub. > > The declaration is still left in the shared code, an a couple of platforms have the definitions still lingering. I propose we remove them. Marked as reviewed by fyang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21319#pullrequestreview-2344924640 From chagedorn at openjdk.org Thu Oct 3 08:33:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Oct 2024 08:33:41 GMT Subject: RFR: 8341451: Remove C2HandleAnonOMOwnerStub In-Reply-To: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> References: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> Message-ID: On Thu, 3 Oct 2024 07:09:53 GMT, Axel Boldt-Christmas wrote: > [JDK-8319796](https://bugs.openjdk.org/browse/JDK-8319796) has been implemented on all platforms which had previous C2 implementations of LM_LIGHTWEIGHT that made use of C2HandleAnonOMOwnerStub. > > The declaration is still left in the shared code, an a couple of platforms have the definitions still lingering. I propose we remove them. Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21319#pullrequestreview-2344975438 From rcastanedalo at openjdk.org Thu Oct 3 08:35:54 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 3 Oct 2024 08:35:54 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. 
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... 
> > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port refactor > - Remove temporary support code > - Merge jdk-24+17 > - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization > - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions > - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes > - Merge jdk-24+16 > - Ensure that detected encode-and-store patterns are matched > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - ... and 43 more: https://git.openjdk.org/jdk/compare/0cf6df31...14483b83 Thanks to everyone who contributed to this JEP, integrating now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2390833194 From rcastanedalo at openjdk.org Thu Oct 3 08:39:57 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 3 Oct 2024 08:39:57 GMT Subject: Integrated: 8334060: Implementation of Late Barrier Expansion for G1 In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 09:49:25 GMT, Roberto Castañeda Lozano wrote: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24.
The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. 
Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... This pull request has now been integrated. Changeset: 0b467e90 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/0b467e902d591ae9feeec1669918d1588987cd1c Stats: 7372 lines in 58 files changed: 5924 ins; 985 del; 463 mod 8334060: Implementation of Late Barrier Expansion for G1 Co-authored-by: Roberto Castañeda Lozano Co-authored-by: Erik Österlund Co-authored-by: Siyao Liu Co-authored-by: Kim Barrett Co-authored-by: Amit Kumar Co-authored-by: Martin Doerr Co-authored-by: Feilong Jiang Co-authored-by: Sergey Nazarkin Reviewed-by: kvn, tschatzl, fyang, ayang, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/19746 From jvernee at openjdk.org Thu Oct 3 12:05:46 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 3 Oct 2024 12:05:46 GMT Subject: Integrated: 8337753: Target class of upcall stub may be unloaded In-Reply-To: References: Message-ID: <2THc5A3PP0cegVF4ySYMLsgc4FO2ieqBgOEI02XgxOk=.0f92be1b-ddbc-4486-ac22-2c303f442ba2@github.com> On Tue, 6 Aug 2024 17:26:55 GMT, Jorn Vernee wrote: > As discussed in the JBS issue: > > FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref.
So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` > > However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. > > The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. > >
> Performance numbers
> x64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>
>
> aarch64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>
>
> > As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. > > Testing: tier 1-4 This pull request has now been integrated. Changeset: 6af13580 Author: Jorn Vernee URL: https://git.openjdk.org/jdk/commit/6af13580c2086afefde489275bc2353c2320ff3f Stats: 333 lines in 23 files changed: 255 ins; 26 del; 52 mod 8337753: Target class of upcall stub may be unloaded Reviewed-by: amitkumar, vlivanov, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/20479 From kbarrett at openjdk.org Thu Oct 3 12:56:48 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 3 Oct 2024 12:56:48 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB Message-ID: Please review this change to TypeRawPtr::add_offset to prevent a compiler from inferring things based on prior pointer arithmetic not invoking UB. As noted in the bug report, clang is actually doing this. To accomplish this, changed to integral arithmetic. Also added over/underflow checks. Also made a couple of minor touchups. Replaced an implicit conversion to bool with an explicit compare to nullptr (per style guide). Removed a no longer needed dummy return after a (now) noreturn function. Testing: mach5 tier1-7 That testing was with calls to "fatal" for the over/underflow cases and the sum==0 case. There were no hits. I'm not sure how to construct a test that would hit those. 
------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/21324/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21324&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341178 Stats: 14 lines in 1 file changed: 9 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21324.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21324/head:pull/21324 PR: https://git.openjdk.org/jdk/pull/21324 From kxu at openjdk.org Thu Oct 3 16:31:15 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 3 Oct 2024 16:31:15 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v21] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. 
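As a rough illustration of what "long-typed parallel iv in an int counted loop" means, here is a minimal sketch. The method names and the stride of 2 are hypothetical and not taken from the pull request's test file. In the loop form, the `long` variable `a` advances in lockstep with the `int` counter `i`; the closed form is the kind of expression a compiler could substitute once it recognizes `a` as a parallel induction variable.

```java
public class ParallelIvSketch {
    // Int counted loop (the counter `i` is an int) with a long-typed
    // variable `a` that is incremented by a constant on every iteration.
    static long loopForm(int stop) {
        long a = 0;
        for (int i = 0; i < stop; i++) {
            a += 2;
        }
        return a;
    }

    // Equivalent closed form: `a` is a linear function of the trip count,
    // which is max(stop, 0) for this loop shape.
    static long closedForm(int stop) {
        return stop > 0 ? 2L * stop : 0L;
    }

    public static void main(String[] args) {
        for (int stop : new int[] {-5, 0, 1, 7, 1000}) {
            if (loopForm(stop) != closedForm(stop)) {
                throw new AssertionError("mismatch for stop=" + stop);
            }
        }
        System.out.println("loop form and closed form agree");
    }
}
```

Whether C2 actually performs the replacement for a given loop would have to be checked with IR verification, as the tests in the pull request do; this sketch only checks that the two forms compute the same value.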
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: correctly verify outputs with custom @Run methods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/4e2735ae..32bedd00 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=19-20 Stats: 201 lines in 1 file changed: 122 ins; 60 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From kxu at openjdk.org Thu Oct 3 16:47:44 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 3 Oct 2024 16:47:44 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v19] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Wed, 2 Oct 2024 11:27:08 GMT, Tobias Hartmann wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> update comments in TestParallelIvInIntCountedLoop.java > > test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 325: > >> 323: >> 324: for (int i : iterations) { >> 325: Asserts.assertEQ(i, testIntCountedLoopWithIntIV(i)); > > Code in this loop is not guaranteed to be even C2 compiled because IR verification will be executed in a separate VM. IR framework tests that also want to verify the output, should be written like this: > > https://github.com/openjdk/jdk/blob/9bd478593cc92a716151d1373f3426f1d92143bb/test/hotspot/jtreg/testlibrary_tests/ir_framework/examples/CustomRunTestExample.java#L84-L97 Updated to use custom run methods instead. Thanks for the info! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1786536420 From kvn at openjdk.org Thu Oct 3 17:12:44 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 3 Oct 2024 17:12:44 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB In-Reply-To: References: Message-ID: <3iWhHL3A0KJIVBfPcMP--NRbbN00pEtFqWrURscE84E=.68f0d6be-58bc-4c12-af83-f7f9f34601ed@github.com> On Thu, 3 Oct 2024 12:50:55 GMT, Kim Barrett wrote: > Please review this change to TypeRawPtr::add_offset to prevent a compiler from > inferring things based on prior pointer arithmetic not invoking UB. As noted in > the bug report, clang is actually doing this. > > To accomplish this, changed to integral arithmetic. Also added over/underflow > checks. > > Also made a couple of minor touchups. Replaced an implicit conversion to bool > with an explicit compare to nullptr (per style guide). Removed a no longer > needed dummy return after a (now) noreturn function. > > Testing: mach5 tier1-7 > That testing was with calls to "fatal" for the over/underflow cases and the > sum==0 case. There were no hits. I'm not sure how to construct a test that > would hit those. Looks reasonable. Just one nit comment. src/hotspot/share/opto/type.cpp line 3136: > 3134: > 3135: const TypeRawPtr *TypeRawPtr::make( address bits ) { > 3136: assert( bits != nullptr, "Use TypePtr for null" ); Please, remove spaces after open and before close `()`. 
-------------

PR Review: https://git.openjdk.org/jdk/pull/21324#pullrequestreview-2346113508
PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1786529286

From shade at openjdk.org  Thu Oct  3 17:15:03 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 3 Oct 2024 17:15:03 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27]
In-Reply-To:
References:
Message-ID:

On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>>
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>>
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>>
>> ## Summary of the Changes
>>
>> ### Platform-Independent Changes (`src/hotspot/share`)
>>
>> These consist mainly of:
>>
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>>
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>>
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>>
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>>
>> #### ADL Changes
>>
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>>
>> #### `G1BarrierSetAssembler` Changes
>>
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision:
>
>  - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion
>  - riscv port refactor
>  - Remove temporary support code
>  - Merge jdk-24+17
>  - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization
>  - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions
>  - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes
>  - Merge jdk-24+16
>  - Ensure that detected encode-and-store patterns are matched
>  - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
>  - ... and 43 more: https://git.openjdk.org/jdk/compare/9bb3ef4e...14483b83

src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 335:

> 333:     assert(!use_ReduceInitialCardMarks(),
> 334:            "post-barriers are only needed for tightly-coupled initialization stores when ReduceInitialCardMarks is disabled");
> 335:     access.set_barrier_data(access.barrier_data() ^ G1C2BarrierPre);

I have been looking at this code after integration, and I wonder if `^` is really correct here? Was the intent to remove `G1C2BarrierPre` from the barrier data? What happens if `get_store_barrier` does not actually set it? Do we flip the bit back?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1786573527

From sviswanathan at openjdk.org  Thu Oct  3 17:30:45 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Thu, 3 Oct 2024 17:30:45 GMT
Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v19]
In-Reply-To:
References:
Message-ID:

On Tue, 1 Oct 2024 05:09:25 GMT, Jatin Bhateja wrote:

>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators.
>>
>>
>>      . SUADD : Saturating unsigned addition.
>>      . SADD  : Saturating signed addition.
>>      . SUSUB : Saturating unsigned subtraction.
>>      . SSUB  : Saturating signed subtraction.
>>      . UMAX  : Unsigned max
>>      . UMIN  : Unsigned min.
>>
>>
>> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value.
>>
>> As the name suggests, these are saturating operations, i.e.
the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Merge stashing and re-commit src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 140: > 138: * @param b the second operand. > 139: * @return the saturating addition of the operands. > 140: * @see VectorOperators#SADD This should be SUADD. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 167: > 165: * @param b the second operand. > 166: * @return the saturating difference of the operands. > 167: * @see VectorOperators#SSUB This should be SUSUB. 
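
Editorial sketch, not part of the patch: the SADD/SUADD and SSUB/SUSUB distinction raised in the review above comes down to which bounds the result is clamped to. The scalar helpers below are a minimal illustration for `int`, assuming the clamping behaviour described in the quoted PR text. The class and method names are hypothetical and are not claimed to match `VectorMath`.

```java
// Hypothetical scalar model of the saturating operators discussed above.
// Names are illustrative only; they are not taken from the patch.
final class SaturatingSketch {

    // Signed saturating add: clamp to [Integer.MIN_VALUE, Integer.MAX_VALUE].
    static int addSaturating(int a, int b) {
        long r = (long) a + (long) b;          // widen so the sum cannot wrap
        if (r > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (r < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) r;
    }

    // Unsigned saturating add: operands are read as values in [0, 2^32 - 1];
    // the result is clamped to 0xFFFFFFFF, which is -1 as a Java int.
    static int addSaturatingUnsigned(int a, int b) {
        int r = a + b;
        // Unsigned overflow occurred iff the wrapped sum is below an operand.
        return Integer.compareUnsigned(r, a) < 0 ? -1 : r;
    }

    // Unsigned saturating subtract: clamp to 0 when b exceeds a (unsigned).
    static int subSaturatingUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    public static void main(String[] args) {
        System.out.println(addSaturating(Integer.MAX_VALUE, 1));
        System.out.println(Integer.toUnsignedString(addSaturatingUnsigned(-1, 1)));
        System.out.println(Integer.toUnsignedString(
                addSaturatingUnsigned(Integer.MAX_VALUE, 1)));
        System.out.println(subSaturatingUnsigned(1, 2));
    }
}
```

Note that `Integer.MAX_VALUE + 1` saturates in the signed helper but not in the unsigned one, which is why a `@see` tag pointing at the signed operator on an unsigned method would be misleading.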
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1786595393 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1786595850 From sviswanathan at openjdk.org Thu Oct 3 17:53:42 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 3 Oct 2024 17:53:42 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Thu, 3 Oct 2024 05:04:35 GMT, Jatin Bhateja wrote: >> I see the problem with float/double vectors. Let us do the rearrange form only for Integral (byte, short, int, long) vectors then. For float/double vector we could keep the code that you have currently. > > You will also need additional handling for NPOT vector sizes which is handled by existing fallback implementation. The intrinsic is limited to power of two. We can safely do src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2) for integral types. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1786637638 From sviswanathan at openjdk.org Thu Oct 3 18:18:44 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 3 Oct 2024 18:18:44 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v15] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Thu, 3 Oct 2024 05:09:22 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. 
Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 18 commits:
>
>  - Review comments resolution.
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338023
>  - Review comments resolutions.
>  - Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments.
>  - Incorporating review and documentation suggestions.
>  - Jcheck clearance
>  - Review comments resolution.
>  - Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level.
>  - Documentation suggestions from Paul.
>  - Review resolutions.
>  - ... and 8 more: https://git.openjdk.org/jdk/compare/bdfb41f9...6215ab91

Thanks for making the changes. It looks to me that the following checks at lines 2963-2071 in file vectorIntrinsics.cpp are now only needed when lowerSelectFromOp is false. Could you please verify and update accordingly?

    if (is_floating_point_type(elem_bt)) {
      if (!arch_supports_vector(Op_AndV, num_elem, index_elem_bt, VecMaskNotUsed) ||
          !arch_supports_vector(cast_vopc, num_elem, index_elem_bt, VecMaskNotUsed) ||
          !arch_supports_vector(Op_Replicate, num_elem, index_elem_bt, VecMaskNotUsed)) {
        log_if_needed(" ** index wrapping not supported: vlen=%d etype=%s",
                      num_elem, type2name(elem_bt));
        return false; // not supported
      }
    }

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2392036048

From sviswanathan at openjdk.org  Thu Oct  3 18:41:40 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Thu, 3 Oct 2024 18:41:40 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v13]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID: <95BWoQiYfM-c7esOvzluxwrXbh_sQD9MAUm9-5JhULc=.c3f1f31e-5b13-4698-9481-e02a763b1ce6@github.com>

On Thu, 3 Oct 2024 05:04:35 GMT, Jatin Bhateja wrote:

>> I see the problem with float/double vectors.
Let us do the rearrange form only for Integral (byte, short, int, long) vectors then. For float/double vector we could keep the code that you have currently. > > You will also need additional handling for NPOT vector sizes which is handled by existing fallback implementation. Agree, so we can't assume power of two in fallback. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1786691519 From jbhateja at openjdk.org Thu Oct 3 19:05:14 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Oct 2024 19:05:14 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v16] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <3OXPOIGRRD4KoZ21PsL1viyEDvZsh_8GtacPQHcuQq4=.e5f4f05d-d21f-4a6c-b41e-c78268b8e2fe@github.com> > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. 
> - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Sharpening intrinsic exit check. 
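
Editorial sketch, not part of the patch: the two-vector selection semantics described in this message can be modeled in scalar code roughly as follows. It assumes plain `int` arrays stand in for the vectors and follows the range rule as quoted in this revision of the description (an out-of-range index throws). The class and method names are hypothetical.

```java
// Hypothetical scalar model of the two-vector selectFrom semantics.
// v1 followed by v2 acts as a lookup table of 2 * VLEN entries.
final class SelectFromSketch {

    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = index[i];
            // Valid two-vector index range is [0, 2 * VLEN).
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("index " + idx + " at lane " + i);
            }
            // Indices below VLEN pick from v1, the rest pick from v2.
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] index = {0, 5, 2, 7};
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        // prints [10, 21, 12, 23]
        System.out.println(java.util.Arrays.toString(selectFrom(index, v1, v2)));
    }
}
```

The index-wrapping form mentioned earlier in the thread (AND with `2 * VLENGTH - 1`) would replace the range check only when the vector length is a power of two, which is why the NPOT case needs separate handling.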
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/6215ab91..1cca8e24 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=14-15 Stats: 5 lines in 1 file changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Thu Oct 3 19:13:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Oct 2024 19:13:22 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v20] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. 
> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Typographic error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/952920ae..f5b5e6f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=18-19 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From psandoz at openjdk.org Thu Oct 3 19:21:43 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 3 Oct 2024 19:21:43 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v20] In-Reply-To: References: Message-ID: On Thu, 3 Oct 2024 19:13:22 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. 
>> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Typographic error src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 46: > 44: * @return the smaller of {@code a} and {@code b}. > 45: * @see VectorOperators#UMIN > 46: * @since 24 Remove `@since 24` in the documentation of each method and place in the documentation on the class. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1786732581 From jbhateja at openjdk.org Thu Oct 3 19:55:03 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Oct 2024 19:55:03 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v21] In-Reply-To: References: Message-ID: <59ZQPsSgxrGE2E4vGKs0PvO7KJIJdAhKCkZb8OPv4qI=.7762bee0-fcb0-4ab5-ae29-1069d7d64ca4@github.com> > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. 
> > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Doc fixups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/f5b5e6f5..3beac2db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=19-20 Stats: 26 lines in 1 file changed: 2 ins; 24 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From psandoz at openjdk.org Thu Oct 3 19:55:04 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 3 Oct 2024 19:55:04 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v21] In-Reply-To: <59ZQPsSgxrGE2E4vGKs0PvO7KJIJdAhKCkZb8OPv4qI=.7762bee0-fcb0-4ab5-ae29-1069d7d64ca4@github.com> References: <59ZQPsSgxrGE2E4vGKs0PvO7KJIJdAhKCkZb8OPv4qI=.7762bee0-fcb0-4ab5-ae29-1069d7d64ca4@github.com> Message-ID: On Thu, 3 Oct 2024 19:51:31 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. 
For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Doc fixups src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 30: > 28: * The class {@code VectorMath} contains methods for performing > 29: * scalar numeric operations in support of vector numeric operations. > 30: * @author Paul Sandoz We no longer use the `@author` tag on newly added classes, can you please remove it? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1786769928 From jbhateja at openjdk.org Thu Oct 3 19:55:04 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Oct 2024 19:55:04 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v20] In-Reply-To: References: Message-ID: On Thu, 3 Oct 2024 19:18:38 GMT, Paul Sandoz wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Typographic error > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 46: > >> 44: * @return the smaller of {@code a} and {@code b}. >> 45: * @see VectorOperators#UMIN >> 46: * @since 24 > > Remove `@since 24` in the documentation of each method and place in the documentation on the class. DONE ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1786767732 From sviswanathan at openjdk.org Thu Oct 3 21:07:40 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 3 Oct 2024 21:07:40 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v16] In-Reply-To: <3OXPOIGRRD4KoZ21PsL1viyEDvZsh_8GtacPQHcuQq4=.e5f4f05d-d21f-4a6c-b41e-c78268b8e2fe@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <3OXPOIGRRD4KoZ21PsL1viyEDvZsh_8GtacPQHcuQq4=.e5f4f05d-d21f-4a6c-b41e-c78268b8e2fe@github.com> Message-ID: On Thu, 3 Oct 2024 19:05:14 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. 
Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... 
> > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Sharpening intrinsic exit check. Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2346694947 From jbhateja at openjdk.org Fri Oct 4 00:01:59 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 4 Oct 2024 00:01:59 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v22] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. 
> - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Update VectorMath.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/3beac2db..550eeb9c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=20-21 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From kbarrett at openjdk.org Fri Oct 4 04:56:35 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 4 Oct 2024 04:56:35 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB In-Reply-To: <3iWhHL3A0KJIVBfPcMP--NRbbN00pEtFqWrURscE84E=.68f0d6be-58bc-4c12-af83-f7f9f34601ed@github.com> References: <3iWhHL3A0KJIVBfPcMP--NRbbN00pEtFqWrURscE84E=.68f0d6be-58bc-4c12-af83-f7f9f34601ed@github.com> Message-ID: On Thu, 3 Oct 2024 16:38:34 GMT, Vladimir Kozlov wrote: >> Please review this change to TypeRawPtr::add_offset to prevent a compiler from >> inferring things based on prior pointer arithmetic not invoking UB. As noted in >> the bug report, clang is actually doing this. >> >> To accomplish this, changed to integral arithmetic. Also added over/underflow >> checks. >> >> Also made a couple of minor touchups. Replaced an implicit conversion to bool >> with an explicit compare to nullptr (per style guide). Removed a no longer >> needed dummy return after a (now) noreturn function. 
>> >> Testing: mach5 tier1-7 >> That testing was with calls to "fatal" for the over/underflow cases and the >> sum==0 case. There were no hits. I'm not sure how to construct a test that >> would hit those. > > src/hotspot/share/opto/type.cpp line 3136: > >> 3134: >> 3135: const TypeRawPtr *TypeRawPtr::make( address bits ) { >> 3136: assert( bits != nullptr, "Use TypePtr for null" ); > > Please, remove spaces after open and before close `()`. I'm not fond of those spaces, but they follow the style used throughout this file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1787152954 From duke at openjdk.org Fri Oct 4 06:30:11 2024 From: duke at openjdk.org (Daniel Skantz) Date: Fri, 4 Oct 2024 06:30:11 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v13] In-Reply-To: References: Message-ID: > This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). > > We check two invariants. > a) Bailouts should be successful starting from any given `failing()` check. > b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). > > a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. > > The added flag should not have any effect in debug mode. > > Testing: > > T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. 
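For readers following the thread, the random-bailout idea described above can be sketched roughly as below. This is a toy model under stated assumptions, not the HotSpot code: the class `BailoutSketch`, its members, and the seeded `std::mt19937` generator are hypothetical names chosen for illustration. Only the split between `failing()` and `failing_internal()` and the two invariants (a) and (b) are taken from the discussion.

```cpp
#include <cassert>
#include <random>

// Toy model of the stress-bailout idea. Class and member names are
// hypothetical and do not mirror HotSpot's Compile class.
class BailoutSketch {
 public:
  BailoutSketch(double stress_probability, unsigned seed)
      : _probability(stress_probability), _rng(seed) {}

  // Plain query: reports a pending bailout and never injects one.
  bool failing_internal() const { return _failed; }

  // Stress-aware query: with the given probability, starts a bailout at
  // this check, so that callers exercise their bailout path (invariant a).
  bool failing() {
    if (failing_internal()) {
      return true;
    }
    if (_dist(_rng) < _probability) {
      record_failure();
      return true;
    }
    return false;
  }

  // Invariant (b): recording a bailout while one is already pending means
  // the caller kept working for too long. The sketch only counts such cases.
  void record_failure() {
    if (_failed) {
      ++_late_failures;
    }
    _failed = true;
  }

  int late_failures() const { return _late_failures; }

 private:
  bool _failed = false;
  int _late_failures = 0;
  double _probability;
  std::mt19937 _rng;
  std::uniform_real_distribution<double> _dist{0.0, 1.0};
};
```

With a probability of 0 the stress-aware query behaves exactly like the plain one, which is the property one would want when the flag is not set. With a probability of 1 every first check starts a bailout.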
Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: use failing_internal instead; add a const; clarify skip ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19646/files - new: https://git.openjdk.org/jdk/pull/19646/files/d91bc068..cb748fb8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19646&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19646&range=11-12 Stats: 19 lines in 8 files changed: 1 ins; 0 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/19646.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19646/head:pull/19646 PR: https://git.openjdk.org/jdk/pull/19646 From aboldtch at openjdk.org Fri Oct 4 06:58:39 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 4 Oct 2024 06:58:39 GMT Subject: Integrated: 8341451: Remove C2HandleAnonOMOwnerStub In-Reply-To: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> References: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> Message-ID: <-0xVtaAqP_jhjHJ9G7Jgxm59BXbu6X4t0Z2b0JO94us=.b5b55e04-7046-42c6-ab5c-367aa70e0492@github.com> On Thu, 3 Oct 2024 07:09:53 GMT, Axel Boldt-Christmas wrote: > [JDK-8319796](https://bugs.openjdk.org/browse/JDK-8319796) has been implemented on all platforms which had previous C2 implementations of LM_LIGHTWEIGHT that made use of C2HandleAnonOMOwnerStub. > > The declaration is still left in the shared code, and a couple of platforms have the definitions still lingering. I propose we remove them. This pull request has now been integrated.
Changeset: 3f420fac Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/3f420fac842153372e17222e7153cbc71c5789a7 Stats: 70 lines in 3 files changed: 0 ins; 70 del; 0 mod 8341451: Remove C2HandleAnonOMOwnerStub Reviewed-by: fyang, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/21319 From aboldtch at openjdk.org Fri Oct 4 06:58:38 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 4 Oct 2024 06:58:38 GMT Subject: RFR: 8341451: Remove C2HandleAnonOMOwnerStub In-Reply-To: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> References: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> Message-ID: On Thu, 3 Oct 2024 07:09:53 GMT, Axel Boldt-Christmas wrote: > [JDK-8319796](https://bugs.openjdk.org/browse/JDK-8319796) has been implemented on all platforms which had previous C2 implementations of LM_LIGHTWEIGHT that made use of C2HandleAnonOMOwnerStub. > > The declaration is still left in the shared code, and a couple of platforms have the definitions still lingering. I propose we remove them. Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21319#issuecomment-2392957973 From rrich at openjdk.org Fri Oct 4 08:28:41 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 4 Oct 2024 08:28:41 GMT Subject: RFR: 8340792: -XX:+PrintInterpreter: instructions should only be printed if printing all InterpreterCodelets In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 13:37:18 GMT, Richard Reingruber wrote: > With `-XX:+PrintInterpreter` the instructions of the generated interpreter are printed at startup. > > But after startup, when printing an interpreted frame, the instructions of the corresponding `InterpreterCodelet` are also printed. This can become a problem with `-Xlog:continuations=trace` because large sections of the interpreter are printed repeatedly and redundantly.
> > With this change the instructions are only printed when printing all codelets, i.e. normally once at start-up. > > I've tested with the reproducer from the JBS-Issue. Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21158#issuecomment-2393129872 From rrich at openjdk.org Fri Oct 4 08:28:42 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 4 Oct 2024 08:28:42 GMT Subject: Integrated: 8340792: -XX:+PrintInterpreter: instructions should only be printed if printing all InterpreterCodelets In-Reply-To: References: Message-ID: <8fDEjlAycetKYDoWvOI9_2IeeX4xVH_DGDmZWDLmMCM=.089fa2b5-94c2-4bf1-8318-20cd7a86a6a9@github.com> On Tue, 24 Sep 2024 13:37:18 GMT, Richard Reingruber wrote: > With `-XX:+PrintInterpreter` the instructions of the generated interpreter are printed at startup. > > But after startup, when printing an interpreted frame, the instructions of the corresponding `InterpreterCodelet` are also printed. This can become a problem with `-Xlog:continuations=trace` because large sections of the interpreter are printed repeatedly and redundantly. > > With this change the instructions are only printed when printing all codelets, i.e. normally once at start-up. > > I've tested with the reproducer from the JBS-Issue. This pull request has now been integrated. Changeset: a63ac5a6 Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/a63ac5a699a5d40c76d14f94a502b8003753f4dd Stats: 10 lines in 3 files changed: 7 ins; 0 del; 3 mod 8340792: -XX:+PrintInterpreter: instructions should only be printed if printing all InterpreterCodelets Reviewed-by: mdoerr, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/21158 From mli at openjdk.org Fri Oct 4 08:45:43 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 4 Oct 2024 08:45:43 GMT Subject: RFR: 8340880: RISC-V: add t3-t6 alias into assemler_riscv.hpp Message-ID: Hi, Can you help to review this simple patch to add the t3-t6 aliases?
I also modified some particularly obvious places that use x28-x31 where t3-t6 is actually intended, but left the others as they are, because that is more consistent with the surrounding code. Thanks! ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/21349/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21349&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340880 Stats: 14 lines in 2 files changed: 4 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/21349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21349/head:pull/21349 PR: https://git.openjdk.org/jdk/pull/21349 From rcastanedalo at openjdk.org Fri Oct 4 09:20:59 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 4 Oct 2024 09:20:59 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Thu, 3 Oct 2024 17:12:04 GMT, Aleksey Shipilev wrote: >> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: >> >> - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion >> - riscv port refactor >> - Remove temporary support code >> - Merge jdk-24+17 >> - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization >> - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions >> - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes >> - Merge jdk-24+16 >> - Ensure that detected encode-and-store patterns are matched >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - ...
and 43 more: https://git.openjdk.org/jdk/compare/0165cb32...14483b83 > > src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 335: > >> 333: assert(!use_ReduceInitialCardMarks(), >> 334: "post-barriers are only needed for tightly-coupled initialization stores when ReduceInitialCardMarks is disabled"); >> 335: access.set_barrier_data(access.barrier_data() ^ G1C2BarrierPre); > > I have been looking at this code after integration, and I wonder if `^` is really correct here? Was the intent to remove `G1C2BarrierPre` from the barrier data? What happens if `get_store_barrier` does not actually set it? Do we flip the bit back? Yes, the intent (and actual effect) is to remove `G1C2BarrierPre` from the barrier data. Using an XOR (`^`) is correct because at that program point `G1C2BarrierPre` is guaranteed to be set. This is because an `access` corresponding to a tightly-coupled initialization store is always of type `C2OptAccess`, hence `!access.is_parse_access()` holds and `get_store_barrier(access)` trivially returns `G1C2BarrierPre | G1C2BarrierPost`. Having said this, it would clearly be less convoluted to simply clear `G1C2BarrierPre` instead of flipping it. I will file an RFE, thanks. As a side note, this complexity is necessary to handle `!ReduceInitialCardMarks`. I keep wondering if the benefit of being able to disable `ReduceInitialCardMarks` [1,2,3] is worth the significant complexity required in the GC-C2 interface to deal with it.
[1] https://docs.oracle.com/en/java/javase/23/gctuning/garbage-first-garbage-collector-tuning.html [2] https://bugs.openjdk.org/browse/JDK-8166899 [3] https://bugs.openjdk.org/browse/JDK-8167077 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1787425169 From kbarrett at openjdk.org Fri Oct 4 09:17:35 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 4 Oct 2024 09:17:35 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB In-Reply-To: References: <3iWhHL3A0KJIVBfPcMP--NRbbN00pEtFqWrURscE84E=.68f0d6be-58bc-4c12-af83-f7f9f34601ed@github.com> Message-ID: <_pWIfp7Z686EEpIHxA1w1RCNHCO-_QP1_ZZbk5BPijQ=.8d026cef-6f44-46a4-94eb-510e281f8f9e@github.com> On Fri, 4 Oct 2024 04:53:47 GMT, Kim Barrett wrote: >> src/hotspot/share/opto/type.cpp line 3136: >> >>> 3134: >>> 3135: const TypeRawPtr *TypeRawPtr::make( address bits ) { >>> 3136: assert( bits != nullptr, "Use TypePtr for null" ); >> >> Please, remove spaces after open and before close `()`. > > I'm not fond of those spaces, but they follow the style used throughout this file. Although it looks like only 1/3 of the asserts in this file have extra whitespace, including the one being touched here. So sure, I can remove the extraneous whitespace from this function, since touching it anyway. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1787421941 From kbarrett at openjdk.org Fri Oct 4 09:27:52 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 4 Oct 2024 09:27:52 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: References: Message-ID: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> > Please review this change to TypeRawPtr::add_offset to prevent a compiler from > inferring things based on prior pointer arithmetic not invoking UB. As noted in > the bug report, clang is actually doing this. 
> > To accomplish this, changed to integral arithmetic. Also added over/underflow > checks. > > Also made a couple of minor touchups. Replaced an implicit conversion to bool > with an explicit compare to nullptr (per style guide). Removed a no longer > needed dummy return after a (now) noreturn function. > > Testing: mach5 tier1-7 > That testing was with calls to "fatal" for the over/underflow cases and the > sum==0 case. There were no hits. I'm not sure how to construct a test that > would hit those. Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: remove surrounding whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21324/files - new: https://git.openjdk.org/jdk/pull/21324/files/48833715..cc1f2ac8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21324&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21324&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21324.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21324/head:pull/21324 PR: https://git.openjdk.org/jdk/pull/21324 From rcastanedalo at openjdk.org Fri Oct 4 09:37:53 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 4 Oct 2024 09:37:53 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 09:17:47 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 335: >> >>> 333: assert(!use_ReduceInitialCardMarks(), >>> 334: "post-barriers are only needed for tightly-coupled initialization stores when ReduceInitialCardMarks is disabled"); >>> 335: access.set_barrier_data(access.barrier_data() ^ G1C2BarrierPre); >> >> I have been looking at this code after integration, and I wonder if `^` is really correct here? Was the intent to remove `G1C2BarrierPre` from the barrier data?
What happens if `get_store_barrier` does not actually set it? Do we flip the bit back? > > Yes, the intent (and actual effect) is to remove `G1C2BarrierPre` from the barrier data. Using an XOR (`^`) is correct because at that program point `G1C2BarrierPre` is guaranteed to be set. This is because an `access` corresponding to a tightly-coupled initialization store is always of type `C2OptAccess`, hence `!access.is_parse_access()` holds and `get_store_barrier(access)` trivially returns `G1C2BarrierPre | G1C2BarrierPost`. Having said this, it would clearly be less convoluted to simply clear `G1C2BarrierPre` instead of flipping it. I will file an RFE, thanks. > > As a side note, this complexity is necessary to handle `!ReduceInitialCardMarks`. I keep wondering if the benefit of being able to disable `ReduceInitialCardMarks` [1,2,3] is worth the significant complexity required in the GC-C2 interface to deal with it. > > [1] https://docs.oracle.com/en/java/javase/23/gctuning/garbage-first-garbage-collector-tuning.html > [2] https://bugs.openjdk.org/browse/JDK-8166899 > [3] https://bugs.openjdk.org/browse/JDK-8167077 Reported here: [JDK-8341525](https://bugs.openjdk.org/browse/JDK-8341525).
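To illustrate the difference between flipping and clearing discussed above, here is a minimal sketch. The bit values and the names `kBarrierPre`, `kBarrierPost`, `remove_by_xor` and `remove_by_clear` are made up for illustration; the real `G1C2BarrierPre` and `G1C2BarrierPost` constants are defined in HotSpot and are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values, for illustration only.
constexpr std::uint8_t kBarrierPre  = 1u << 0;
constexpr std::uint8_t kBarrierPost = 1u << 1;

// XOR flips the bit: it removes the flag only if the flag is currently set,
// and adds it otherwise.
constexpr std::uint8_t remove_by_xor(std::uint8_t data, std::uint8_t flag) {
  return static_cast<std::uint8_t>(data ^ flag);
}

// AND with the complement clears the bit regardless of its current state.
constexpr std::uint8_t remove_by_clear(std::uint8_t data, std::uint8_t flag) {
  return static_cast<std::uint8_t>(data & ~flag);
}
```

When the flag is known to be set, as argued in the reply, both forms give the same result. They differ only if that precondition ever stops holding, in which case XOR would silently add the flag while the clearing form stays a no-op.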
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1787448241 From duke at openjdk.org Fri Oct 4 14:25:45 2024 From: duke at openjdk.org (Daniel Skantz) Date: Fri, 4 Oct 2024 14:25:45 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v12] In-Reply-To: <2F_C30zYunRTYqFh4cphJcHHyosVVyiKjESHiBGjRlE=.b7f5f6a5-9848-4972-8a7d-ccf38c42be7d@github.com> References: <0Ce3kXIzVDXuzpKGBzekEEOuJftpvkCvLwDkVOtpPR0=.ec2c1c1b-2b78-497d-97d3-a613ef736d2f@github.com> <2F_C30zYunRTYqFh4cphJcHHyosVVyiKjESHiBGjRlE=.b7f5f6a5-9848-4972-8a7d-ccf38c42be7d@github.com> Message-ID: On Wed, 2 Oct 2024 11:55:42 GMT, Christian Hagedorn wrote: >> Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: >> >> left over > > src/hotspot/share/opto/compile.hpp line 838: > >> 836: const CompilationFailureInfo* first_failure_details() const { return _first_failure_details; } >> 837: >> 838: bool failing(DEBUG_ONLY(bool no_stress_bailout = false)) { > > It's somehow difficult to read what `failing(false/true)` now exactly mean. When having `failing(true)`, don't we get the same behavior as if we call `failing_internal()`? If `failing_internal()` is false, then we would only return false because we are not entering > > if (StressBailout && !no_stress_bailout) { > return fail_randomly(); > } > > So, I'm wondering if we cannot just use `failing_internal()` instead of `failing(true)` and remove the parameter completely? Thanks for the suggestions! Updated the PR. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1787800936 From duke at openjdk.org Fri Oct 4 15:04:49 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Fri, 4 Oct 2024 15:04:49 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v2] In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: <34e0yknYMI9CrosG-mI80E2BB6BkyIYF_uXhwwZtHbQ=.3ae4c4d9-47f6-4e1d-8fe8-cc9dc30b6baf@github.com> > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. Tomáš Zezula has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Libgraal does not allow _can_call_java.
- rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava - added CompilerThreadCanCallJavaScope ------------- Changes: https://git.openjdk.org/jdk/pull/21285/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=01 Stats: 132 lines in 6 files changed: 116 ins; 4 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/21285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21285/head:pull/21285 PR: https://git.openjdk.org/jdk/pull/21285 From duke at openjdk.org Fri Oct 4 15:18:38 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Fri, 4 Oct 2024 15:18:38 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v2] In-Reply-To: <34e0yknYMI9CrosG-mI80E2BB6BkyIYF_uXhwwZtHbQ=.3ae4c4d9-47f6-4e1d-8fe8-cc9dc30b6baf@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> <34e0yknYMI9CrosG-mI80E2BB6BkyIYF_uXhwwZtHbQ=.3ae4c4d9-47f6-4e1d-8fe8-cc9dc30b6baf@github.com> Message-ID: <6fWXm3zv1NNYxvEd6zlefj1CH7U9gVxatL2i18wM8jA=.3dc9115e-32bd-4903-83e2-4e253fb61062@github.com> On Fri, 4 Oct 2024 15:04:49 GMT, Tomáš Zezula wrote: >> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. >> >> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java).
>> >> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. > > Tomáš Zezula has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Libgraal does not allow _can_call_java. > - rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava > - added CompilerThreadCanCallJavaScope I have simplified the `_can_call_java` transitions. The only feature in the libjvmci compiler that requires Java calls is the Truffle compiler, which utilizes JNI to invoke the Truffle runtime methods. Given that we now have `CompilerThreadCanCallJavaScope`, which Truffle can use to explicitly enable Java calls, we can safely disable Java calls by default for the libjvmci compiler. For the Java JVMCI compiler, we still need to permit Java calls to accommodate upcalls to the Graal compiler and for InterpreterRuntime while running the Java JVMCI compiler. The simplification eliminates the need for `TriBool` for `_can_call_java`; it can remain a `bool`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21285#issuecomment-2393939688 From duke at openjdk.org Fri Oct 4 15:25:13 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Fri, 4 Oct 2024 15:25:13 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3] In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: <_ps5zsSWm32YEnHQax-4bP8mi0LS1U4IpZQpg7crlTE=.947c1915-3958-4816-a585-571bceaa6a81@github.com> > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading.
Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: UseJVMCINativeLibrary check must be guarded by JVMCI_ONLY. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21285/files - new: https://git.openjdk.org/jdk/pull/21285/files/dfd72497..f687c82e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21285/head:pull/21285 PR: https://git.openjdk.org/jdk/pull/21285 From dnsimon at openjdk.org Fri Oct 4 15:34:36 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 4 Oct 2024 15:34:36 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3] In-Reply-To: <_ps5zsSWm32YEnHQax-4bP8mi0LS1U4IpZQpg7crlTE=.947c1915-3958-4816-a585-571bceaa6a81@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> <_ps5zsSWm32YEnHQax-4bP8mi0LS1U4IpZQpg7crlTE=.947c1915-3958-4816-a585-571bceaa6a81@github.com> Message-ID: On Fri, 4 Oct 2024 15:25:13 GMT, Tomáš
Zezula wrote: >> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. >> >> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). >> >> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. > > Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: > > UseJVMCINativeLibrary check must be guarded by JVMCI_ONLY. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 193: > 191: __block_can_call_java = CompilerThread::cast(thread)->can_call_java(); \ > 192: } else { \ > 193: __block_can_call_java = false; \ For non-CompilerThreads, `__block_can_call_java` should be true I think since they are not affected by `-Xcomp` or `-Xbatch`. A TruffleCompileThread is such a thread. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21285#discussion_r1787892422 From duke at openjdk.org Fri Oct 4 16:02:38 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Fri, 4 Oct 2024 16:02:38 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3] In-Reply-To: References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> <_ps5zsSWm32YEnHQax-4bP8mi0LS1U4IpZQpg7crlTE=.947c1915-3958-4816-a585-571bceaa6a81@github.com> Message-ID: On Fri, 4 Oct 2024 15:31:52 GMT, Doug Simon wrote: >> Tomáš
Zezula has updated the pull request incrementally with one additional commit since the last revision: >> >> UseJVMCINativeLibrary check must be guarded by JVMCI_ONLY. > > src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 193: > >> 191: __block_can_call_java = CompilerThread::cast(thread)->can_call_java(); \ >> 192: } else { \ >> 193: __block_can_call_java = false; \ > > For non-CompilerThreads, `__block_can_call_java` should be true I think since they are not affected by `-Xcomp` or `-Xbatch`. A TruffleCompileThread is such a thread. For non-compiler threads, the new value is never used because [CompilerThreadCanCallJava::update](https://github.com/openjdk/jdk/blob/f687c82ef9ede1d9d02ca0965c896bcf658c450a/src/hotspot/share/jvmci/jvmci.cpp#L58) does not modify the `CompilerThread::_can_call_java` value in this case. However, using `true` may improve readability. I will change it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21285#discussion_r1787925558 From kvn at openjdk.org Fri Oct 4 16:06:37 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 4 Oct 2024 16:06:37 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> Message-ID: <_UfXDCsUh7XlJS1APDR6uEdAFZyDktB56D3l5idS0OA=.f6ed2d6d-251d-4237-ad0d-3dd8298e538b@github.com> On Fri, 4 Oct 2024 09:27:52 GMT, Kim Barrett wrote: >> Please review this change to TypeRawPtr::add_offset to prevent a compiler from >> inferring things based on prior pointer arithmetic not invoking UB. As noted in >> the bug report, clang is actually doing this. >> >> To accomplish this, changed to integral arithmetic. Also added over/underflow >> checks. >> >> Also made a couple of minor touchups.
Replaced an implicit conversion to bool >> with an explicit compare to nullptr (per style guide). Removed a no longer >> needed dummy return after a (now) noreturn function. >> >> Testing: mach5 tier1-7 >> That testing was with calls to "fatal" for the over/underflow cases and the >> sum==0 case. There were no hits. I'm not sure how to construct a test that >> would hit those. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove surrounding whitespace Good. Side note: please enable GHA testing for your repo. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21324#pullrequestreview-2348418050 PR Comment: https://git.openjdk.org/jdk/pull/21324#issuecomment-2394024252 From duke at openjdk.org Fri Oct 4 16:07:14 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Fri, 4 Oct 2024 16:07:14 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v4] In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). 
> > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: Set __block_can_call_java to true for non compiler threads. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21285/files - new: https://git.openjdk.org/jdk/pull/21285/files/f687c82e..346f8982 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21285/head:pull/21285 PR: https://git.openjdk.org/jdk/pull/21285 From duke at openjdk.org Fri Oct 4 16:34:54 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Fri, 4 Oct 2024 16:34:54 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v5] In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: <4X5xzYDU45UnuXtoC0sEJ7dF5seNqQDJxhdNrdktsV8=.03a8e2b5-7e43-44b7-92b4-b4440dc26770@github.com> > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant.
Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: Simplified C2V_BLOCK. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21285/files - new: https://git.openjdk.org/jdk/pull/21285/files/346f8982..e07d4448 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=03-04 Stats: 8 lines in 1 file changed: 0 ins; 7 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21285/head:pull/21285 PR: https://git.openjdk.org/jdk/pull/21285 From qamai at openjdk.org Sun Oct 6 08:32:20 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 6 Oct 2024 08:32:20 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. > - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. 
Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains one commit: [vectorapi] Refactor VectorShuffle implementation ------------- Changes: https://git.openjdk.org/jdk/pull/21042/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=01 Stats: 5013 lines in 64 files changed: 2737 ins; 1068 del; 1208 mod Patch: https://git.openjdk.org/jdk/pull/21042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21042/head:pull/21042 PR: https://git.openjdk.org/jdk/pull/21042 From qamai at openjdk.org Sun Oct 6 10:11:48 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 6 Oct 2024 10:11:48 GMT Subject: RFR: 8341102: Add element type information to vector types [v3] In-Reply-To: References: Message-ID: > Hi, > > This patch adds the type information of each element in a `TypeVect`. This helps constant folding vectors as well as strength reduction of several complex operations such as `Rearrange`. Some notable points: > > - I only implement `ConV` rule on x86, looking at other architectures it seems that I would not only need to implement the `ConV` implementations, but several other rules that match `ReplicateNode` of a constant. > - I changed the implementation of an array constant in `constanttable`, I think working with `jbyte` is easier as it allows `memcpy` and at this point, we are close to the metal anyway. > - Constant folding for a `VectorUnboxNode`, this is special because an element of a normal stable array is only constant if it is non-zero, so implementing constant folding on a load node seems less trivial. > - Memory fences because `Vector::payload` is a final field and we should respect that. > - Several places expect a `const Type*` when in reality it expects a `BasicType`, I refactor that so that the intent is clearer and there is less room for possible errors, this is needed because `byte`, `short` and `int` share the same kind of `const Type*`. > > Please take a look and leave your reviews, thanks a lot. 
Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: add element types to vector types ------------- Changes: https://git.openjdk.org/jdk/pull/21229/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21229&range=02 Stats: 1431 lines in 39 files changed: 887 ins; 330 del; 214 mod Patch: https://git.openjdk.org/jdk/pull/21229.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21229/head:pull/21229 PR: https://git.openjdk.org/jdk/pull/21229 From qamai at openjdk.org Sun Oct 6 10:27:35 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 6 Oct 2024 10:27:35 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: On Sun, 6 Oct 2024 08:32:20 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. >> >> Regarding the related issues: >> >> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. >> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` >> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. >> >> Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. >> >> Please take a look and leave reviews. Thanks a lot. >> >> The description of the original PR: >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. 
This poses several drawbacks: >> >> Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byt... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > [vectorapi] Refactor VectorShuffle implementation I have adapted the patch in accordance with https://github.com/openjdk/jdk/pull/20634, I moved the index wrapping into C2 instead of making it a separate step as I think it seems clearer. Also, I think in the future we can eliminate this step so putting it in C2 would make the progress easier. Please take a look, thanks a lot. 
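For readers less familiar with `rearrange`, here is a minimal scalar sketch of what a rearrange with wrapped shuffle indices computes. It is only an illustration under stated assumptions: the class `ScalarRearrangeModel` and its methods are hypothetical, plain `int[]` arrays stand in for vectors, and wrapping via `Math.floorMod` is one plausible way to force an index into range. This message does not say exactly how the patch wraps indices inside C2, so treat the wrapping rule as an assumption.

```java
// Hypothetical scalar model; not code from the PR or from the Vector API.
final class ScalarRearrangeModel {
    // Assumption: an out-of-range index is wrapped into [0, length) with floorMod.
    static int wrapIndex(int index, int length) {
        return Math.floorMod(index, length);
    }

    // dst[i] = src[wrapped(indices[i])]; both arrays are assumed to have the same length.
    static int[] rearrange(int[] src, int[] indices) {
        int[] dst = new int[src.length];
        for (int i = 0; i < dst.length; i++) {
            dst[i] = src[wrapIndex(indices[i], src.length)];
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40};
        int[] reversed = rearrange(src, new int[] {3, 2, 1, 0});
        int[] wrapped = rearrange(src, new int[] {4, -1, 5, 0});
        System.out.println(java.util.Arrays.toString(reversed));
        System.out.println(java.util.Arrays.toString(wrapped));
    }
}
```

In this model the wrapping is folded into the lane selection itself rather than done as a separate pass over the indices, which is roughly the shape of the design choice described above.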
------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2395383093 From chagedorn at openjdk.org Mon Oct 7 05:27:44 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Oct 2024 05:27:44 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v13] In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 06:30:11 GMT, Daniel Skantz wrote: >> This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). >> >> We check two invariants. >> a) Bailouts should be successful starting from any given `failing()` check. >> b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). >> >> a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. >> >> The added flag should not have any effect in debug mode. >> >> Testing: >> >> T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > use failing_internal instead; add a const; clarify skip Thanks for the update, looks good! Some minor code style comments regarding existing code that you touched which I think you could also fix while at it. 
src/hotspot/share/opto/compile.cpp line 4375: > 4373: > 4374: Compile::TracePhase::~TracePhase() { > 4375: if (_compile->failing_internal()) return; // timing code, not stressing bailouts. While at it, I suggest to add braces: Suggestion: if (_compile->failing_internal()) { return; // timing code, not stressing bailouts. } Same below at some places. src/hotspot/share/opto/graphKit.cpp line 343: > 341: // regions do not appear except in this function, and in use_exception_state. > 342: void GraphKit::combine_exception_states(SafePointNode* ex_map, SafePointNode* phi_map) { > 343: if (failing_internal()) return; // dying anyway... Suggestion: if (failing_internal()) { return; // dying anyway... } src/hotspot/share/opto/graphKit.cpp line 2059: > 2057: bool must_throw, > 2058: bool keep_exact_action) { > 2059: if (failing_internal()) stop(); Suggestion: if (failing_internal()) { stop(); } src/hotspot/share/opto/loopnode.cpp line 4938: > 4936: > 4937: PhaseIdealLoop phase_verify(_igvn, this); > 4938: if (C->failing_internal()) return; Suggestion: if (C->failing_internal()) { return; } src/hotspot/share/opto/output.cpp line 3394: > 3392: > 3393: // Emitting into the scratch buffer should not fail > 3394: assert (!C->failing_internal() || C->failure_is_artificial(), "Must not have pending failure. Reason is: %s", C->failure_reason()); Suggestion: assert(!C->failing_internal() || C->failure_is_artificial(), "Must not have pending failure. Reason is: %s", C->failure_reason()); src/hotspot/share/opto/parse.hpp line 429: > 427: > 428: // Must this parse be aborted? > 429: bool failing() { return C->failing_internal(); } // might have cascading effects, not stressing bailouts for now. Can be made const: Suggestion: bool failing() const { return C->failing_internal(); } // might have cascading effects, not stressing bailouts for now. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19646#pullrequestreview-2350875835 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1789512799 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1789513200 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1789513401 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1789513596 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1789513864 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1789512066 From thartmann at openjdk.org Mon Oct 7 05:46:40 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 7 Oct 2024 05:46:40 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> Message-ID: On Fri, 4 Oct 2024 09:27:52 GMT, Kim Barrett wrote: >> Please review this change to TypeRawPtr::add_offset to prevent a compiler from >> inferring things based on prior pointer arithmetic not invoking UB. As noted in >> the bug report, clang is actually doing this. >> >> To accomplish this, changed to integral arithmetic. Also added over/underflow >> checks. >> >> Also made a couple of minor touchups. Replaced an implicit conversion to bool >> with an explicit compare to nullptr (per style guide). Removed a no longer >> needed dummy return after a (now) noreturn function. >> >> Testing: mach5 tier1-7 >> That testing was with calls to "fatal" for the over/underflow cases and the >> sum==0 case. There were no hits. I'm not sure how to construct a test that >> would hit those. 
> > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove surrounding whitespace What about using `intptr_t` for `TypeRawPtr::_bits` instead? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21324#issuecomment-2395956241 From thartmann at openjdk.org Mon Oct 7 06:03:35 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 7 Oct 2024 06:03:35 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v4] In-Reply-To: References: Message-ID: <4ckwGsbjH0VDbGmBBTuG4HUc6ARwbISQ7L8xsVCeqDs=.28d696f2-47ba-4323-855d-e71369242876@github.com> On Wed, 2 Oct 2024 13:58:14 GMT, Roland Westrelin wrote: >> The patch includes 2 test cases for this: test1() causes the assert >> failure in the bug description, test2() causes an incorrect execution >> where a load floats above a store that it should be dependent on. >> >> In the test cases, `field` is accessed on object `a` of type `A`. When >> the field is accessed, the type that c2 has for `a` is `A` with >> interface `I`. The holder of the field is class `A` which implements >> no interface. The reason the type of `a` and the type of the holder >> are slightly different is because `a` is the result of a merge of >> objects of subclasses `B` and `C` which implements `I`. >> >> The root cause of the bug is that `Compile::flatten_alias_type()` >> doesn't change `A` + interface `I` into `A`, the actual holder of the >> field. So `field` in `A` + interface `I` and `field` in `A` get >> different slices which is wrong. At parse time, the logic that creates >> the `Store` node uses: >> >> >> C->alias_type(field)->adr_type() >> >> >> to compute the slice which is the slice for `field` in `A`. So the >> slice used at parse time is the right one but during igvn, when the >> slice is computed from the input address, a different slice (the one >> for `A` + interface `I`) is used. 
That causes load/store nodes when >> they are processed by igvn to use the wrong memory state. >> >> In `Compile::flatten_alias_type()`: >> >> >> if (!ik->equals(canonical_holder) || tj->offset() != offset) { >> if( is_known_inst ) { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); >> } else { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); >> } >> } >> >> >> only flattens the type if it's not the canonical holder but it should >> test that the type doesn't implement interfaces that the canonical >> holder doesn't. To keep the logic simple, the fix I propose creates a >> new type whenever there's a chance that a type implements extra >> interfaces (the type is not exact). >> >> I also added asserts in `GraphKit::make_load()` and >> `GraphKit::store_to_memory()` to make sure the slice that is passed >> and the address type agree. Those asserts fire with the new test >> cases. When running testing, I found that they also catch a few cases >> in `library_call.cpp` where an incorrect slice is passed. >> >> As further clean up, maybe we want to drop the slice argument to >> `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to >> their callers) given it's redundant with th... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java > > Co-authored-by: Christian Hagedorn Marked as reviewed by thartmann (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21303#pullrequestreview-2350926679 From thartmann at openjdk.org Mon Oct 7 06:38:37 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 7 Oct 2024 06:38:37 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v8] In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 21:31:12 GMT, Kangcheng Xu wrote: >> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. >> >> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). >> >> Some notable examples of this transformation include: >> - `a + a + a` => `a*3` => `(a<<1) + a` >> - `a + a + a + a` => `a*4` => `a<<2` >> - `a*3 + a` => `a*4` => `a<<2` >> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` >> >> See included IR unit tests for more. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > update comments, use explicit opcode comparisons for LShift nodes Testing all passed. 
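As a reader's aid for the transformations quoted above: they are value-preserving because Java `int` addition, multiplication and left shift all wrap modulo 2^32. The sketch below checks a few of the quoted identities on sample values, including the extremes. The class and method names are illustrative assumptions and are not taken from the patch or from its IR tests; the constants 7 and 3 are arbitrary choices for `C`.

```java
// Hypothetical helper; it only checks arithmetic identities, not what C2 actually emits.
final class SerialAdditionIdentities {
    static int addTo3(int a) { return a + a + a; }        // source form
    static int shiftAdd3(int a) { return (a << 1) + a; }  // a*3 as shift plus add
    static int addTo4(int a) { return a + a + a + a; }    // source form
    static int shift4(int a) { return a << 2; }           // a*4 as a shift

    static boolean holdsFor(int a) {
        boolean serial = addTo3(a) == a * 3 && a * 3 == shiftAdd3(a)
                && addTo4(a) == a * 4 && a * 4 == shift4(a);
        // C * a + a == (C + 1) * a, here with C = 7
        boolean mulPlusOne = 7 * a + a == 8 * a;
        // (a << C) + a == (2^C + 1) * a, here with C = 3.
        // Note: in Java source the parentheses are required, since + binds tighter than <<.
        boolean shiftPlusOne = (a << 3) + a == 9 * a;
        return serial && mulPlusOne && shiftPlusOne;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            if (!holdsFor(a)) {
                throw new AssertionError("identity failed for " + a);
            }
        }
        System.out.println("all identities hold on the sample values");
    }
}
```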
------------- PR Comment: https://git.openjdk.org/jdk/pull/20754#issuecomment-2396023249 From roland at openjdk.org Mon Oct 7 07:55:41 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 7 Oct 2024 07:55:41 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v8] In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 21:31:12 GMT, Kangcheng Xu wrote: >> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. >> >> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). >> >> Some notable examples of this transformation include: >> - `a + a + a` => `a*3` => `(a<<1) + a` >> - `a + a + a + a` => `a*4` => `a<<2` >> - `a*3 + a` => `a*4` => `a<<2` >> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` >> >> See included IR unit tests for more. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > update comments, use explicit opcode comparisons for LShift nodes Marked as reviewed by roland (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20754#pullrequestreview-2351182436 From roland at openjdk.org Mon Oct 7 07:55:41 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 7 Oct 2024 07:55:41 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v8] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 06:36:06 GMT, Tobias Hartmann wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> update comments, use explicit opcode comparisons for LShift nodes > > Testing all passed. @TobiHartmann @chhagedorn thanks for running tests ------------- PR Comment: https://git.openjdk.org/jdk/pull/20754#issuecomment-2396180050 From thartmann at openjdk.org Mon Oct 7 07:58:39 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 7 Oct 2024 07:58:39 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v21] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: <1R6tWNQF3WSJ4joPaoWR3N_JKy0W-BB0Zntg00N-mlU=.e8b25703-93ea-42be-ba1d-46e06c17987c@github.com> On Thu, 3 Oct 2024 16:31:15 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. 
> > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > correctly verify outputs with custom @Run methods
`compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java` times out in our testing both with `-XX:StressLongCountedLoop=200000000` and with `-XX:+UnlockExperimentalVMOptions -XX:PerMethodSpecTrapLimit=0 -XX:PerMethodTrapLimit=0`:
"main" #1 [2771172] prio=5 os_prio=0 cpu=500187.70ms elapsed=503.08s allocated=6554K defined_classes=227 tid=0x0000ffff9002d550 nid=2771172 runnable [0x0000ffff972bf000]
   java.lang.Thread.State: RUNNABLE
Thread: 0x0000ffff9002d550 [0x2a48e4] State: _at_safepoint _at_poll_safepoint 1
   JavaThread state: _thread_blocked
	at compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.testIntCountedLoopWithIntIVLeq(TestParallelIvInIntCountedLoop.java:93)
	at compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.runTestIntCountedLoopWithIntIVLeq(TestParallelIvInIntCountedLoop.java:103)
	at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base@24-internal/DirectMethodHandle$Holder)
	at java.lang.invoke.LambdaForm$MH/0x0000ffff58460870.invoke(java.base@24-internal/LambdaForm$MH)
	at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@24-internal/Invokers$Holder)
	at jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(java.base@24-internal/DirectMethodHandleAccessor.java:154)
	at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(java.base@24-internal/DirectMethodHandleAccessor.java:104)
	at java.lang.reflect.Method.invoke(java.base@24-internal/Method.java:573)
	at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:159)
	at compiler.lib.ir_framework.test.AbstractTest.run(AbstractTest.java:98)
	at compiler.lib.ir_framework.test.CustomRunTest.run(CustomRunTest.java:89)
	at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:861)
	at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:252)
	at compiler.lib.ir_framework.test.TestVM.main(TestVM.java:165)
------------- PR Comment: https://git.openjdk.org/jdk/pull/18489#issuecomment-2396187667 From luhenry at openjdk.org Mon Oct 7 08:22:39 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 7 Oct 2024 08:22:39 GMT Subject: RFR: 8340880: RISC-V: add t3-t6 alias into assemler_riscv.hpp In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 08:39:56 GMT, Hamlin Li wrote: > Hi, > > Can you help to review this simple patch to add add t3-t6? > I also modified some particularly obvious places which use x28-x31 but indeed intending t3-t6, but keep others as it is, because it's more consistent with context code. > > Thanks! Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21349#pullrequestreview-2351243290 From fyang at openjdk.org Mon Oct 7 08:29:35 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 7 Oct 2024 08:29:35 GMT Subject: RFR: 8340880: RISC-V: add t3-t6 alias into assemler_riscv.hpp In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 08:39:56 GMT, Hamlin Li wrote: > Hi, > > Can you help to review this simple patch to add add t3-t6? > I also modified some particularly obvious places which use x28-x31 but indeed intending t3-t6, but keep others as it is, because it's more consistent with context code. > > Thanks! Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21349#pullrequestreview-2351259208 From duke at openjdk.org Mon Oct 7 08:32:22 2024 From: duke at openjdk.org (Daniel Skantz) Date: Mon, 7 Oct 2024 08:32:22 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v14] In-Reply-To: References: Message-ID: > This patch adds a diagnostic/stress flag for C2 bailouts. 
It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). > > We check two invariants. > a) Bailouts should be successful starting from any given `failing()` check. > b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). > > a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. > > The added flag should not have any effect in debug mode. > > Testing: > > T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. 
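The two invariants in the description above can be pictured with a small model. The sketch below is a hypothetical Java model, not HotSpot code: the class `StressBailoutModel`, its fields and the seeded `Random` are assumptions made for illustration, and only the names `failing`, `failing_internal` and `failure_is_artificial` echo identifiers mentioned in this thread. It shows a stressing `failing()` check that may start an artificial bailout with a given probability, a non-stressing internal check, and a guard that rejects recording a second bailout while one is pending.

```java
import java.util.Random;

// Hypothetical model of the stress-bailout idea; the real logic lives in C2's C++ sources.
final class StressBailoutModel {
    private final Random random;
    private final double probability; // stand-in for the user-given probability
    private String failureReason;     // null while no bailout is pending
    private boolean artificial;       // true if the pending bailout was injected by stress

    StressBailoutModel(long seed, double probability) {
        this.random = new Random(seed);
        this.probability = probability;
    }

    // Non-stressing check: only reports whether a bailout is already pending.
    boolean failingInternal() {
        return failureReason != null;
    }

    // Stressing check: may randomly start an artificial bailout (invariant a).
    boolean failing() {
        if (failingInternal()) {
            return true;
        }
        if (random.nextDouble() < probability) {
            failureReason = "artificial stress bailout";
            artificial = true;
            return true;
        }
        return false;
    }

    // Invariant b: recording a bailout while one is pending means we optimized for too long.
    void recordFailure(String reason) {
        if (failingInternal()) {
            throw new IllegalStateException("bailout already pending: " + failureReason);
        }
        failureReason = reason;
        artificial = false;
    }

    boolean failureIsArtificial() {
        return artificial;
    }

    public static void main(String[] args) {
        StressBailoutModel always = new StressBailoutModel(42L, 1.0);
        System.out.println(always.failing() + " " + always.failureIsArtificial());
    }
}
```

With probability 0 the stressing check behaves exactly like the internal one, which matches the stated goal that nothing changes when the flag is not set.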
Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19646/files - new: https://git.openjdk.org/jdk/pull/19646/files/cb748fb8..b6eb9a84 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19646&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19646&range=12-13 Stats: 14 lines in 5 files changed: 8 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19646.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19646/head:pull/19646 PR: https://git.openjdk.org/jdk/pull/19646 From chagedorn at openjdk.org Mon Oct 7 08:33:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Oct 2024 08:33:41 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v8] In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 21:31:12 GMT, Kangcheng Xu wrote: >> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. >> >> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). >> >> Some notable examples of this transformation include: >> - `a + a + a` => `a*3` => `(a<<1) + a` >> - `a + a + a + a` => `a*4` => `a<<2` >> - `a*3 + a` => `a*4` => `a<<2` >> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` >> >> See included IR unit tests for more. 
>
> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision:
>
>   update comments, use explicit opcode comparisons for LShift nodes

Good updates, it's now easy to follow the logic and understand the code. I have some more comments/suggestions.

src/hotspot/share/opto/addnode.cpp line 429:

> 427:     ? (Node*) phase->intcon((jint) multiplier) // intentional type narrowing to allow overflow at max_jint
> 428:     : (Node*) phase->longcon(multiplier);
> 429:   return MulNode::make(con, in(2), bt);

Could you use `in2` here?
Suggestion:

    return MulNode::make(con, in2, bt);

src/hotspot/share/opto/addnode.cpp line 437:

> 435: // Match `a + a`, extract `a` and `2`
> 436: Node* AddNode::find_simple_addition_pattern(Node* n, BasicType bt, jlong* multiplier) {
> 437:   // Look for pattern: AddNode(a, a)

Could also be added as method comment above. Same for other `find*` methods.

src/hotspot/share/opto/addnode.cpp line 446:

> 444: }
> 445:
> 446: // Match `a << CON`, extract `a` and `1 << CON`

"extract" was a bit confusing at first. So, what you mean is return `a` and set `multiplier` to `1 << CON`. Maybe you want to update the comment to make this more explicit? Maybe something like that:

    // Try to match `a << CON`. On success, return `a` and set `1 << CON` as `multiplier`.

You could do the same for the other `find*` methods.

src/hotspot/share/opto/addnode.cpp line 547:

> 545:
> 546:   return nullptr;
> 547: }

I think you could remove the new lines for more compactness here:
Suggestion:

        }
        return nullptr;
      }
      return nullptr;
    }

src/hotspot/share/opto/addnode.cpp line 567:

> 565:
> 566:     // We can't simply return the lshift node even if ((a << CON) - a) + a cancels out. Ideal() must return a new node.
> 567:     *multiplier = ((jlong) 1 << con->get_int()) - 1;

Can't this be an `Identity()` transformation where you can return existing nodes?

src/hotspot/share/opto/addnode.hpp line 46:

> 44:   virtual uint hash() const;
> 45:
> 46: private:

Can be removed since these methods are already private by default here since it's a `class` and not a `struct`.
Suggestion:

test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 61:

> 59:     @Arguments(values = {Argument.RANDOM_EACH})
> 60:     @IR(counts = { IRNode.ADD_I, "1" })
> 61:     @IR(failOn = {IRNode.LSHIFT_I})

Generally, for single strings, you can remove the braces:
Suggestion:

    @IR(failOn = IRNode.LSHIFT_I)

test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 64:

> 62:     private static void addTo2(int a) {
> 63:         int sum = a + a; // Simple additions like a + a should be kept as-is
> 64:         verifyResult(a, 2, sum);

Generally, we should move all verification code out of the `@Test` methods to avoid side effects and worrying about whether the result checking is now compiled or not (we must ensure that the result checking code is interpreted to catch wrong executions with miscompiled code).

I suggest the following (not tested):

Introduce a `@Run` method, which is never compiled, for your `@Test` methods. You can still call methods from there but then you should ensure that they are not compiled either with `@DontCompile`:

    static final Random RANDOM = Utils.getRandomInstance();

    ...

    @DontCompile
    private static void verifyResult(int base, int factor, int observed) { ... }

    ...

    @Test
    @IR(counts = {IRNode.ADD_I, "1", IRNode.LSHIFT_I, "1"})
    private static int addTo3(int a) {
        return a + a + a; // a*3 => (a<<1) + a
    }

    @Run(test = "addTo3")
    void runAddTo3() {
        int a = RANDOM.nextInt();
        int result = addTo3(a);
        verifyResult(a, 3, result);
    }

Since the tests are all very similar and require the same setup and verification, you could even go a step further and provide a single shared `@Run` method which is possible:

    @Test
    @IR(counts = {IRNode.ADD_I, "1", IRNode.LSHIFT_I, "1"})
    private static int addTo3(int a) {
        return a + a + a; // a*3 => (a<<1) + a
    }

    @Test
    @IR(failOn = IRNode.ADD_I)
    @IR(counts = {IRNode.LSHIFT_I, "1"})
    private static int addTo4(int a) {
        return a + a + a + a; // a*4 => a<<2
    }

    @Run(test = {"addTo3", "addTo4"}) // List all @Test methods here and make sure you call all of them below.
    void runTests() {
        int a = RANDOM.nextInt();
        verifyResult(a, 3, addTo3(a));
        verifyResult(a, 4, addTo4(a));
    }

This also allows you to run with some more edge case values like `a == 0` or `a == min_int` etc. which gives us even some more confidence.

test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 70:

> 68:     @Arguments(values = {Argument.RANDOM_EACH})
> 69:     @IR(counts = { IRNode.ADD_I, "1" })
> 70:     @IR(counts = {IRNode.LSHIFT_I, "1"})

Generally, you can merge these together:
Suggestion:

    @IR(counts = {IRNode.ADD_I, "1", IRNode.LSHIFT_I, "1"})

-------------

Changes requested by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20754#pullrequestreview-2351092284
PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789654678
PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789657284
PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789679835
PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789700721
PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789725214
PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789722502
PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789759013
PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789744964
PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789760417

From chagedorn at openjdk.org Mon Oct 7 09:01:41 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 7 Oct 2024 09:01:41 GMT
Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v14]
In-Reply-To:
References:
Message-ID: <5OZ8ScEw3a_dazfA83RTIOPQdBbn8ZctXj8mMbvlZv0=.23fe019f-7b74-4e58-9a77-ca183f5e4a9c@github.com>

On Mon, 7 Oct 2024 08:32:22 GMT, Daniel Skantz wrote:

>> This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)).
>>
>> We check two invariants.
>> a) Bailouts should be successful starting from any given `failing()` check.
>> b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long).
>>
>> a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability.
>>
>> The added flag should not have any effect in debug mode.
>>
>> Testing:
>>
>> T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases.
>
> Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision:
>
>   Apply suggestions from code review
>
>   Co-authored-by: Christian Hagedorn

Looks good, thanks for the updates!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19646#pullrequestreview-2351336877

From duke at openjdk.org Mon Oct 7 09:10:52 2024
From: duke at openjdk.org (Antón Seoane)
Date: Mon, 7 Oct 2024 09:10:52 GMT
Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging [v4]
In-Reply-To:
References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
Message-ID: <-VhlbMrk05I4TJjr4U_ejcmX02d8ywyaUyQlv8diCHE=.ccd4b85a-3cd4-4f04-95a0-ae9dd59a8c0f@github.com>

On Tue, 24 Sep 2024 16:43:57 GMT, Antón Seoane wrote:

>> Hi all,
>>
>> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient.
>>
>> To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following:
>>
>> - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied.
>> - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied.
>> - Additionally, defaults may target a specific log level.
>>
>> Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied.
>>
>> In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults.
>>
>> Please consider this PR, and thanks!
>
> Antón Seoane has updated the pull request incrementally with one additional commit since the last revision:
>
>   Removed whitespace

After some review and discussion, I am closing this PR and opening a new (simplified) version of this that aligns with the needed use cases in [8341622: Tag-specific disabled default decorators for UnifiedLogging](https://github.com/openjdk/jdk/pull/21383).
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2396352901

From duke at openjdk.org Mon Oct 7 09:10:52 2024
From: duke at openjdk.org (Antón Seoane)
Date: Mon, 7 Oct 2024 09:10:52 GMT
Subject: Withdrawn: 8340363: Tag-specific default decorators for UnifiedLogging
In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
Message-ID: <6FI0_lFOFEAktdR8fDEyglCSi_mL_zZv8QJdDvTJ5L8=.e0a68b15-9677-48df-8b0a-f263b2357bc5@github.com>

On Fri, 13 Sep 2024 09:03:55 GMT, Antón Seoane wrote:

> Hi all,
>
> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient.
>
> To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following:
>
> - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied.
> - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied.
> - Additionally, defaults may target a specific log level.
>
> Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied.
>
> In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults.
>
> Please consider this PR, and thanks!

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/20988

From mli at openjdk.org Mon Oct 7 09:32:39 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 7 Oct 2024 09:32:39 GMT
Subject: RFR: 8340880: RISC-V: add t3-t6 alias into assemler_riscv.hpp
In-Reply-To:
References:
Message-ID:

On Mon, 7 Oct 2024 08:20:21 GMT, Ludovic Henry wrote:

>> Hi,
>>
>> Can you help to review this simple patch to add t3-t6?
>> I also modified some particularly obvious places that use x28-x31 but actually intend t3-t6, and kept the others as they are because that is more consistent with the surrounding code.
>>
>> Thanks!
>
> Marked as reviewed by luhenry (Committer).

Thanks @luhenry @RealFYang for reviewing!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21349#issuecomment-2396403264

From mli at openjdk.org Mon Oct 7 09:35:39 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 7 Oct 2024 09:35:39 GMT
Subject: Integrated: 8340880: RISC-V: add t3-t6 alias into assemler_riscv.hpp
In-Reply-To:
References:
Message-ID:

On Fri, 4 Oct 2024 08:39:56 GMT, Hamlin Li wrote:

> Hi,
>
> Can you help to review this simple patch to add t3-t6?
> I also modified some particularly obvious places that use x28-x31 but actually intend t3-t6, and kept the others as they are because that is more consistent with the surrounding code.
>
> Thanks!

This pull request has now been integrated.
Changeset: 28977972
Author: Hamlin Li
URL: https://git.openjdk.org/jdk/commit/28977972a0129892543222eada4dc99f4cd62574
Stats: 14 lines in 2 files changed: 4 ins; 0 del; 10 mod

8340880: RISC-V: add t3-t6 alias into assemler_riscv.hpp

Reviewed-by: luhenry, fyang

-------------

PR: https://git.openjdk.org/jdk/pull/21349

From rcastanedalo at openjdk.org Mon Oct 7 09:44:38 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 7 Oct 2024 09:44:38 GMT
Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging
In-Reply-To:
References:
Message-ID:

On Mon, 7 Oct 2024 08:56:40 GMT, Antón Seoane wrote:

> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient.
>
> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level.
>
> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter.
>
> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket.

Labeling the PR as `hotspot-compiler` because it proposes disabling default decorators of `jit+inlining`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21383#issuecomment-2396429323

From jsjolen at openjdk.org Mon Oct 7 11:20:42 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Mon, 7 Oct 2024 11:20:42 GMT
Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging
In-Reply-To:
References:
Message-ID:

On Mon, 7 Oct 2024 08:56:40 GMT, Antón Seoane wrote:

> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient.
>
> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level.
>
> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter.
>
> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket.

src/hotspot/share/logging/logDecorators.cpp line 110:

> 108: bool LogDecorators::has_disabled_default_decorators(const LogSelection& selection, const DefaultUndecoratedSelection* defaults, size_t defaults_count) {
> 109:   for (size_t i = 0; i < defaults_count; ++i) {
> 110:     auto current_default = defaults[i];

Please expand with deduced type.

src/hotspot/share/logging/logSelectionList.cpp line 62:

> 60:     }
> 61:   }
> 62:   return LogDecorators(0);

Here I'd like to see either an explanation of 0 as `LogDecorators(0 /* comment */)` or a meaningful name.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1790018720
PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1790025195

From duke at openjdk.org Mon Oct 7 11:46:19 2024
From: duke at openjdk.org (Antón Seoane)
Date: Mon, 7 Oct 2024 11:46:19 GMT
Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v2]
In-Reply-To:
References:
Message-ID:

> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient.
>
> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level.
>
> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter.
>
> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket.

Antón Seoane has updated the pull request incrementally with one additional commit since the last revision:

  Review changes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21383/files
  - new: https://git.openjdk.org/jdk/pull/21383/files/3e0a0613..e1878be5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=00-01

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/21383.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21383/head:pull/21383

PR: https://git.openjdk.org/jdk/pull/21383

From duke at openjdk.org Mon Oct 7 11:46:19 2024
From: duke at openjdk.org (Antón Seoane)
Date: Mon, 7 Oct 2024 11:46:19 GMT
Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 7 Oct 2024 11:17:18 GMT, Johan Sjölen wrote:

>> Antón Seoane has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Review changes
>
> src/hotspot/share/logging/logSelectionList.cpp line 62:
>
>> 60:     }
>> 61:   }
>> 62:   return LogDecorators(0);
>
> Here I'd like to see either an explanation of 0 as `LogDecorators(0 /* comment */)` or a meaningful name.
I have used the mask_from_decorators function, I think it should be cleaner now

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1790064483

From aboldtch at openjdk.org Mon Oct 7 11:46:19 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 7 Oct 2024 11:46:19 GMT
Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 7 Oct 2024 11:42:37 GMT, Antón Seoane wrote:

>> src/hotspot/share/logging/logSelectionList.cpp line 62:
>>
>>> 60:     }
>>> 61:   }
>>> 62:   return LogDecorators(0);
>>
>> Here I'd like to see either an explanation of 0 as `LogDecorators(0 /* comment */)` or a meaningful name.
>
> I have used the mask_from_decorators function, I think it should be cleaner now

There is `LogDecorators::None`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1790065559

From jsjolen at openjdk.org Mon Oct 7 11:57:37 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Mon, 7 Oct 2024 11:57:37 GMT
Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 7 Oct 2024 11:46:19 GMT, Antón Seoane wrote:

>> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient.
>>
>> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level.
>>
>> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter.
>>
>> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket.
>
> Antón Seoane has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review changes

Code is OK, please consider Axel's advice and see if it's applicable.

-------------

Marked as reviewed by jsjolen (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21383#pullrequestreview-2351732452

From duke at openjdk.org Mon Oct 7 13:14:21 2024
From: duke at openjdk.org (Antón Seoane)
Date: Mon, 7 Oct 2024 13:14:21 GMT
Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v3]
In-Reply-To:
References:
Message-ID:

> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient.
>
> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults.
> Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level.
>
> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter.
>
> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket.

Antón Seoane has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 35 additional commits since the last revision:

 - Merge commit '19642bd3833fa96eb4bc7a8a11e902782e0b7844' into ul-defaults-simplified
 - Review changes
 - Final changes
 - Renaming, test adaptation
 - Renaming
 - Temporarily commenting out testing code
 - Preliminary simplification of UL tag-specific defaults to only target defaults on/off
 - Removed whitespace
 - Initialization of _decorators field in logDecorators
 - Test adaptations to new focus
 - ... and 25 more: https://git.openjdk.org/jdk/compare/02d8dd79...fdf6ac02

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21383/files
  - new: https://git.openjdk.org/jdk/pull/21383/files/e1878be5..fdf6ac02

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=01-02

  Stats: 209846 lines in 1840 files changed: 187264 ins; 12599 del; 9983 mod
  Patch: https://git.openjdk.org/jdk/pull/21383.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21383/head:pull/21383

PR: https://git.openjdk.org/jdk/pull/21383

From duke at openjdk.org Mon Oct 7 13:23:45 2024
From: duke at openjdk.org (Antón Seoane)
Date: Mon, 7 Oct 2024 13:23:45 GMT
Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 7 Oct 2024 11:43:23 GMT, Axel Boldt-Christmas wrote:

>> I have used the mask_from_decorators function, I think it should be cleaner now
>
> There is `LogDecorators::None`

LogDecorators::None is defined in the .cpp, so I'd either have to make it "visible" or use the alternative NoDecorators. Both options are fine for me

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1790215272

From duke at openjdk.org Mon Oct 7 13:26:56 2024
From: duke at openjdk.org (Antón Seoane)
Date: Mon, 7 Oct 2024 13:26:56 GMT
Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v4]
In-Reply-To:
References:
Message-ID: <60YluOey0QgBIIcU-QQk58zr3lioQUjkNETfoyR5yCA=.164295af-bc30-4ed6-96d4-cdb272134abf@github.com>

> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient.
>
> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level.
>
> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter.
>
> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket.
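The inclusion rule described above can be pictured with a small Java model. This is only a sketch under stated assumptions: the real code is C++ in `src/hotspot/share/logging`, and the class `UndecoratedDefaults`, the table `UNDECORATED`, and both method names are hypothetical names made up for illustration. It assumes that a default "matches" when every tag it names is part of the `-Xlog` selection, and that it only takes effect when the user supplied no decorators. The optional log-level targeting is left out.

```java
import java.util.List;
import java.util.Set;

class UndecoratedDefaults {
    // Hypothetical table: tag sets for which the default decorators are switched off.
    static final List<Set<String>> UNDECORATED = List.of(Set.of("jit"));

    // Inclusion rule (assumed): a default applies when all of its tags occur in the selection.
    static boolean matchesDefault(Set<String> selectionTags) {
        for (Set<String> defaultTags : UNDECORATED) {
            if (selectionTags.containsAll(defaultTags)) {
                return true;
            }
        }
        return false;
    }

    // Defaults never override user input: they only act when -Xlog carried no decorators.
    static boolean dropDefaultDecorators(Set<String> selectionTags, boolean userGaveDecorators) {
        return !userGaveDecorators && matchesDefault(selectionTags);
    }

    public static void main(String[] args) {
        // -Xlog:jit+compilation, no decorators given: the `jit` default is included in the selection.
        System.out.println(dropDefaultDecorators(Set.of("jit", "compilation"), false));
        // Same selection, but the user asked for decorators explicitly: user input wins.
        System.out.println(dropDefaultDecorators(Set.of("jit", "compilation"), true));
        // -Xlog:gc, no decorators given: no default matches, so uptime/level/tags stay.
        System.out.println(dropDefaultDecorators(Set.of("gc"), false));
    }
}
```

Modelling tag sets as `Set<String>` keeps the subset test to one `containsAll` call; the HotSpot implementation works on `LogSelection` objects instead.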
Antón Seoane has updated the pull request incrementally with one additional commit since the last revision:

  Update full name

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21383/files
  - new: https://git.openjdk.org/jdk/pull/21383/files/fdf6ac02..deef63ff

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=02-03

  Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/21383.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21383/head:pull/21383

PR: https://git.openjdk.org/jdk/pull/21383

From kbarrett at openjdk.org Mon Oct 7 15:16:43 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Mon, 7 Oct 2024 15:16:43 GMT
Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2]
In-Reply-To:
References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com>
Message-ID:

On Mon, 7 Oct 2024 05:43:47 GMT, Tobias Hartmann wrote:

> What about using `intptr_t` for `TypeRawPtr::_bits` instead?

That has more fanout, into code I'm not familiar with. The proposed change fixes the immediate "miscompilation". A change of the type could be done as a further enhancement, if that makes sense to do. I'd rather leave that to someone from the compiler team. If that approach is what's wanted to fix the immediate problem, then I'm going to want to hand this issue off.

Also, uintptr_t might be more appropriate than intptr_t.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21324#issuecomment-2397213908

From thartmann at openjdk.org Mon Oct 7 16:22:05 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 7 Oct 2024 16:22:05 GMT
Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching
Message-ID:

C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction being careful to flush caches and doing it in an MP-safe way.

Now the problem is that although the patch body is not executed, one thread can update the oop immediate of the `mov` in the patch body:
https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185

While another thread walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC and then encounters a half-written oop if the immediate crosses a page or cache-line boundary:
https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282

Updating the oop immediate is not atomic because the address of the immediate is not 8-byte aligned. I propose to simply align it in `PatchingStub::emit_code`.
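To make the alignment argument concrete, here is a minimal Java sketch that only models addresses; it is not the C1 assembler code. The 64-byte cache-line size and the names `ImmediateAlignment`, `crossesLine`, and `alignUp` are assumptions chosen for illustration. It shows that an 8-byte immediate at an unaligned address can straddle two cache lines, while the same immediate rounded up to an 8-byte boundary cannot, because 64 is a multiple of 8.

```java
class ImmediateAlignment {
    static final int CACHE_LINE = 64; // assumed line size, for illustration only
    static final int OOP_SIZE = 8;    // size of a 64-bit oop immediate

    // True when the bytes [addr, addr + size) span two cache lines.
    static boolean crossesLine(long addr, int size) {
        return (addr / CACHE_LINE) != ((addr + size - 1) / CACHE_LINE);
    }

    // Round addr up to the next multiple of alignment (alignment must be a power of two).
    static long alignUp(long addr, int alignment) {
        return (addr + alignment - 1) & -(long) alignment;
    }

    public static void main(String[] args) {
        long unaligned = 0x103DL;                     // immediate starts 3 bytes before a line boundary
        long aligned = alignUp(unaligned, OOP_SIZE);  // 0x1040

        System.out.println(crossesLine(unaligned, OOP_SIZE)); // true: bytes 0x103D..0x1044 span two lines
        System.out.println(crossesLine(aligned, OOP_SIZE));   // false: bytes 0x1040..0x1047 share one line
    }
}
```

In the actual fix, padding the emitted code moves the immediate itself; the sketch only shows why a naturally aligned 8-byte slot stays within a single line.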
The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well:
https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197

I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remove them again because they are x64 specific and I don't think it's worth the effort of adding a platform independent way of alignment checking.

AArch64 is not affected because we always deopt instead of patching but other platforms might be affected as well. Should this fix be accepted, I'll ping the maintainers of the respective platforms.

Thanks,
Tobias

-------------

Commit messages:
 - Increased timeout
 - Removed platform specific asserts from shared code
 - 8340313: Crash due to invalid oop in nmethod after C1 patching

Changes: https://git.openjdk.org/jdk/pull/21389/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21389&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8340313
  Stats: 152 lines in 3 files changed: 147 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/21389.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21389/head:pull/21389

PR: https://git.openjdk.org/jdk/pull/21389

From kxu at openjdk.org Mon Oct 7 18:44:37 2024
From: kxu at openjdk.org (Kangcheng Xu)
Date: Mon, 7 Oct 2024 18:44:37 GMT
Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v8]
In-Reply-To:
References:
Message-ID:

On Mon, 7 Oct 2024 07:28:35 GMT, Christian Hagedorn wrote:

>> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   update comments, use explicit opcode comparisons for LShift nodes
>
> src/hotspot/share/opto/addnode.cpp line 446:
>
>> 444: }
>> 445:
>> 446: // Match `a << CON`, extract `a` and `1 << CON`
>
> "extract" was a bit confusing at first. So, what you mean is return `a` and set `multiplier` to `1 << CON`. Maybe you want to update the comment to make this more explicit? Maybe something like that:
>
>     // Try to match `a << CON`. On success, return `a` and set `1 << CON` as `multiplier`.
>
> You could do the same for the other `find*` methods.

Updated comments. Thanks for the suggestion!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1790709000

From kxu at openjdk.org Mon Oct 7 18:50:57 2024
From: kxu at openjdk.org (Kangcheng Xu)
Date: Mon, 7 Oct 2024 18:50:57 GMT
Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v9]
In-Reply-To:
References:
Message-ID:

> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`.
>
> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `(a << C) + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible).
>
> Some notable examples of this transformation include:
> - `a + a + a` => `a*3` => `(a<<1) + a`
> - `a + a + a + a` => `a*4` => `a<<2`
> - `a*3 + a` => `a*4` => `a<<2`
> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2`
>
> See included IR unit tests for more.
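The rewrites listed above rely on Java's wrapping `int` arithmetic, so they should hold for every input, including values that overflow. The following is a minimal plain-Java sketch that spot-checks them on a few edge values. It does not use the IR test framework and says nothing about what C2 actually emits; the class `SerialAdditions` and its helper names are made up for this illustration.

```java
class SerialAdditions {
    static int addTo3(int a) { return a + a + a; }
    static int addTo4(int a) { return a + a + a + a; }

    // Throws if any rewritten form disagrees with the plain additions for this input.
    static void check(int a) {
        require(addTo3(a) == a * 3, "a + a + a == a*3", a);
        require(a * 3 == (a << 1) + a, "a*3 == (a<<1) + a", a);
        require(addTo4(a) == (a << 2), "a + a + a + a == a<<2", a);
        require(a * 3 + a == a * 4, "C*a + a == (C+1)*a with C = 3", a);
        require((a << 3) + a == 9 * a, "(a << C) + a == (2^C + 1)*a with C = 3", a);
        require((a << 1) + a + a == (a << 2), "(a << 1) + a + a == a<<2", a);
    }

    static void require(boolean ok, String what, int a) {
        if (!ok) {
            throw new AssertionError(what + " failed for a = " + a);
        }
    }

    public static void main(String[] args) {
        // Edge values where overflow is most likely to expose a wrong rewrite.
        int[] samples = {0, 1, -1, 12345, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int a : samples) {
            check(a);
        }
        System.out.println("all identities hold on the sampled values");
    }
}
```

A check like this covers only the arithmetic equivalence; whether the compiler performs the transformation is what the `@IR` rules in the PR's tests are for.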
Kangcheng Xu has updated the pull request incrementally with three additional commits since the last revision:

 - remove matching power-of-2 subtractions since it's already handled by Identity()
 - verify results with custom test methods
 - update comments to be more descriptive, remove unused can_reshape argument

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20754/files
  - new: https://git.openjdk.org/jdk/pull/20754/files/af6f8084..ecee68ce

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=07-08

  Stats: 234 lines in 3 files changed: 62 ins; 91 del; 81 mod
  Patch: https://git.openjdk.org/jdk/pull/20754.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20754/head:pull/20754

PR: https://git.openjdk.org/jdk/pull/20754

From kxu at openjdk.org Mon Oct 7 18:50:58 2024
From: kxu at openjdk.org (Kangcheng Xu)
Date: Mon, 7 Oct 2024 18:50:58 GMT
Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v8]
In-Reply-To:
References:
Message-ID:

On Mon, 7 Oct 2024 08:02:35 GMT, Christian Hagedorn wrote:

>> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   update comments, use explicit opcode comparisons for LShift nodes
>
> src/hotspot/share/opto/addnode.cpp line 567:
>
>> 565:
>> 566:   // We can't simply return the lshift node even if ((a << CON) - a) + a cancels out. Ideal() must return a new node.
>> 567:   *multiplier = ((jlong) 1 << con->get_int()) - 1;
>
> Can't this be an `Identity()` transformation where you can return existing nodes?

Good point. I realized `(x - y) + y => x` is already handled by `AddINode::Identity` and `AddLNode::Identity`, so I don't need to repeat it here.

> test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 64:
>
>> 62:     private static void addTo2(int a) {
>> 63:         int sum = a + a; // Simple additions like a + a should be kept as-is
>> 64:         verifyResult(a, 2, sum);
>
> Generally, we should move all verification code out of the `@Test` methods to avoid side effects and worrying about whether the result checking is now compiled or not (we must ensure that the result checking code is interpreted to catch wrong executions with miscompiled code).
>
> I suggest the following (not tested):
>
> Introduce a `@Run` method, which is never compiled, for your `@Test` methods. You can still call methods from there but then you should ensure that they are not compiled either with `@DontCompile`:
>
>     static final Random RANDOM = Utils.getRandomInstance();
>
>     ...
>
>     @DontCompile
>     private static void verifyResult(int base, int factor, int observed) { ... }
>
>     ...
>
>     @Test
>     @IR(counts = {IRNode.ADD_I, "1", IRNode.LSHIFT_I, "1"})
>     private static int addTo3(int a) {
>         return a + a + a; // a*3 => (a<<1) + a
>     }
>
>     @Run(test = "addTo3")
>     void runAddTo3() {
>         int a = RANDOM.nextInt();
>         int result = addTo3(a);
>         verifyResult(a, 3, result);
>     }
>
> Since the tests are all very similar and require the same setup and verification, you could even go a step further and provide a single shared `@Run` method which is possible:
>
>     @Test
>     @IR(counts = {IRNode.ADD_I, "1", IRNode.LSHIFT_I, "1"})
>     private static int addTo3(int a) {
>         return a + a + a; // a*3 => (a<<1) + a
>     }
>
>     @Test
>     @IR(failOn = IRNode.ADD_I)
>     @IR(counts = {IRNode.LSHIFT_I, "1"})
>     private static int addTo4(int a) {
>         return a + a + a + a; // a*4 => a<<2
>     }
>
>     @Run(test = {"addTo3", "addTo4"}) // List all @Test methods here and make sure you call all of them below.
>     void runTests() {
>         int a = RANDOM.nextInt();
>         verifyResult(a, 3, addTo3(a));
>         verifyResult(a, 4, addTo4(a));
>     }
>
>
> This also allows you to run with some more edge case values like `a == 0` or `a == min_int` etc. which gives us even some more confidence.

Thanks for the idea. Converted to custom `@Run` methods and test with `a = 0, 1, min, max, rand`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1790711922
PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1790713703

From kvn at openjdk.org Mon Oct 7 19:18:36 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 7 Oct 2024 19:18:36 GMT
Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching
In-Reply-To:
References:
Message-ID: <-W7RQShH8RK7mJ0_FNh-7nYeqCC_1IFiFRiATETFAaw=.9a5c31a8-e781-42f1-b877-7d5122b67730@github.com>

On Mon, 7 Oct 2024 13:39:28 GMT, Tobias Hartmann wrote:

> C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction being careful to flush caches and doing it in an MP-safe way.
> > Now the problem is that although the patch body is not executed, one thread can update the oop immediate of the `mov` in the patch body: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 > > While another thread walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC and then encounters a half-written oop if the immediate crosses a page or cache-line boundary: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 > > Updating the oop immediate is not atomic because the address of the immediate is not 8-byte aligned. > > I propose to simply align it in `PatchingStub::emit_code`. > > The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 > > I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remove them again because they are x64 specific and I don't think it's worth the effort of adding a platform independent way of alignment checking. > > AArch64 is not affected because we always deopt instead of patching but other platforms might be affected as well. Should this fix be accepted, I'll ping the maintainers of the respective platforms. > > Thanks, > Tobias src/hotspot/cpu/x86/c1_CodeStubs_x86.cpp line 334: > 332: // 8-byte align the address of the oop immediate to guarantee atomicity > 333: // when patching since the GC might walk nmethod oops concurrently. 
> 334: __ align(8, __ offset() + NativeMovConstReg::data_offset_rex); In 32-bit VM oops are 4 bytes so 8 bytes is overkill but I am fine with unified alignment. Should we align mov_metadata() too or it is guarantee aligned already? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21389#discussion_r1790750290 From dlong at openjdk.org Mon Oct 7 21:30:38 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 7 Oct 2024 21:30:38 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> Message-ID: On Fri, 4 Oct 2024 09:27:52 GMT, Kim Barrett wrote: >> Please review this change to TypeRawPtr::add_offset to prevent a compiler from >> inferring things based on prior pointer arithmetic not invoking UB. As noted in >> the bug report, clang is actually doing this. >> >> To accomplish this, changed to integral arithmetic. Also added over/underflow >> checks. >> >> Also made a couple of minor touchups. Replaced an implicit conversion to bool >> with an explicit compare to nullptr (per style guide). Removed a no longer >> needed dummy return after a (now) noreturn function. >> >> Testing: mach5 tier1-7 >> That testing was with calls to "fatal" for the over/underflow cases and the >> sum==0 case. There were no hits. I'm not sure how to construct a test that >> would hit those. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove surrounding whitespace src/hotspot/share/opto/type.cpp line 3226: > 3224: return this; > 3225: case TypePtr::Null: > 3226: return make( (address)offset ); Shouldn't this assert that _bits == 0? Looking at the code, however, I can't find anywhere that we actually create a TypeRawPtr with TypePtr::Null. 
We could probably remove this case and let it fall through to the default ShouldNotReachHere(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1790898473 From kbarrett at openjdk.org Mon Oct 7 22:01:27 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 7 Oct 2024 22:01:27 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> Message-ID: On Mon, 7 Oct 2024 21:27:58 GMT, Dean Long wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> remove surrounding whitespace > > src/hotspot/share/opto/type.cpp line 3226: > >> 3224: return this; >> 3225: case TypePtr::Null: >> 3226: return make( (address)offset ); > > Shouldn't this assert that _bits == 0? Looking at the code, however, I can't find anywhere that we actually create a TypeRawPtr with TypePtr::Null. We could probably remove this case and let it fall through to the default ShouldNotReachHere(). 
Initialization of `TypePtr::NULL_PTR` here:
https://github.com/openjdk/jdk/blob/4d50cbb5a73ad1f84ecd6a895045ecfdb0835adc/src/hotspot/share/opto/type.cpp#L538

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1790914960

From dlong at openjdk.org Mon Oct 7 22:08:35 2024
From: dlong at openjdk.org (Dean Long)
Date: Mon, 7 Oct 2024 22:08:35 GMT
Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2]
In-Reply-To:
References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com>
Message-ID: <5LaZcZeK2ShpVHDF5LuK1m_Z0RLeQOIdHYXd9t_Vl5c=.ed093c6b-3949-4757-ba4b-a963067ce0ab@github.com>

On Mon, 7 Oct 2024 21:45:31 GMT, Kim Barrett wrote:

>> src/hotspot/share/opto/type.cpp line 3226:
>>
>>> 3224:     return this;
>>> 3225:   case TypePtr::Null:
>>> 3226:     return make( (address)offset );
>>
>> Shouldn't this assert that _bits == 0? Looking at the code, however, I can't find anywhere that we actually create a TypeRawPtr with TypePtr::Null. We could probably remove this case and let it fall through to the default ShouldNotReachHere().
>
> Initialization of `TypePtr::NULL_PTR` here:
> https://github.com/openjdk/jdk/blob/4d50cbb5a73ad1f84ecd6a895045ecfdb0835adc/src/hotspot/share/opto/type.cpp#L538

I saw that too, but it creates a TypePtr, not a TypeRawPtr.
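For readers following the 8341178 thread: the PR description quoted earlier says the fix moves from pointer arithmetic to integral arithmetic with over/underflow checks. The C++ change itself is not shown in this thread, so the following is only a rough Java analogue of that idea, with hypothetical names. Java has no undefined behaviour here (the addition simply wraps), which is why the check has to be explicit.

```java
final class RawOffsetSketch {
    /**
     * Adds a signed offset to an address held as an unsigned 64-bit value.
     * Throws ArithmeticException if the result would wrap around.
     */
    static long addOffsetChecked(long bits, long offset) {
        long sum = bits + offset; // wraps silently in Java
        if (offset >= 0) {
            // Overflow: the unsigned sum ended up below the starting address.
            if (Long.compareUnsigned(sum, bits) < 0) {
                throw new ArithmeticException("address overflow");
            }
        } else {
            // Underflow: the unsigned sum ended up above the starting address.
            if (Long.compareUnsigned(sum, bits) > 0) {
                throw new ArithmeticException("address underflow");
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(addOffsetChecked(0x1000L, 16)));
        System.out.println(Long.toHexString(addOffsetChecked(0x1000L, -16)));
        try {
            addOffsetChecked(0L, -1); // would wrap below zero
        } catch (ArithmeticException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```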
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1790935162 From dlong at openjdk.org Mon Oct 7 23:33:02 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 7 Oct 2024 23:33:02 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 13:39:28 GMT, Tobias Hartmann wrote: > C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction being careful to flush caches and doing it in an MP-safe way. > > Now the problem is that although the patch body is not executed, one thread can update the oop immediate of the `mov` in the patch body: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 > > While another thread walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC and then encounters a half-written oop if the immediate crosses a page or cache-line boundary: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 > > Updating the oop immediate is not atomic because the address of the immediate is not 8-byte aligned. > > I propose to simply align it in `PatchingStub::emit_code`. 
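To make the alignment proposal concrete: the goal, as I read it, is to pad the code stream so that the 8-byte immediate of the `mov`, rather than the start of the instruction, sits on an 8-byte boundary. The sketch below models only that arithmetic. The names are hypothetical, the 2-byte offset of the immediate is an assumption standing in for `NativeMovConstReg::data_offset_rex`, and the 64-byte cache line is likewise assumed.

```java
final class PatchAlignSketch {
    // Assumed for illustration: the immediate starts 2 bytes into the instruction
    // (prefix byte plus opcode byte). The real value lives in HotSpot's NativeMovConstReg.
    static final int DATA_OFFSET = 2;

    // Assumed cache-line size in bytes.
    static final int CACHE_LINE = 64;

    /** Padding bytes to emit at `offset` so that `offset + DATA_OFFSET` becomes a multiple of `alignment`. */
    static int paddingFor(int offset, int alignment) {
        int target = offset + DATA_OFFSET;
        return (alignment - (target % alignment)) % alignment;
    }

    /** True if an 8-byte field starting at `address` lies entirely within one cache line. */
    static boolean withinOneCacheLine(long address) {
        return (address / CACHE_LINE) == ((address + 7) / CACHE_LINE);
    }

    public static void main(String[] args) {
        for (int offset = 58; offset <= 64; offset++) {
            int pad = paddingFor(offset, 8);
            long immediateAt = offset + pad + DATA_OFFSET;
            System.out.println("offset=" + offset + " pad=" + pad
                    + " immediate@" + immediateAt
                    + " singleLine=" + withinOneCacheLine(immediateAt));
        }
    }
}
```

An 8-byte-aligned 8-byte field can never straddle a 64-byte line, which is the property the proposed fix relies on for a single atomic store.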
> > The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 > > I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remove them again because they are x64 specific and I don't think it's worth the effort of adding a platform independent way of alignment checking. > > AArch64 is not affected because we always deopt instead of patching but other platforms might be affected as well. Should this fix be accepted, I'll ping the maintainers of the respective platforms. > > Thanks, > Tobias Wouldn't it be better to get rid of the concurrency? We could grab CodeCache_lock and Patching_lock in the same block, so we serialize the patching and register_nmethod. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2398150822 From duke at openjdk.org Tue Oct 8 06:15:01 2024 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 8 Oct 2024 06:15:01 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v14] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 08:32:22 GMT, Daniel Skantz wrote: >> This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). >> >> We check two invariants. 
>> a) Bailouts should be successful starting from any given `failing()` check. >> b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). >> >> a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. >> >> The added flag should not have any effect in debug mode. >> >> Testing: >> >> T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19646#issuecomment-2398935540 From duke at openjdk.org Tue Oct 8 06:15:01 2024 From: duke at openjdk.org (duke) Date: Tue, 8 Oct 2024 06:15:01 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v14] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 08:32:22 GMT, Daniel Skantz wrote: >> This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). >> >> We check two invariants. >> a) Bailouts should be successful starting from any given `failing()` check. >> b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). 
>> >> a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. >> >> The added flag should not have any effect in debug mode. >> >> Testing: >> >> T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn @danielogh Your change (at version b6eb9a843e18b05ff2a23a3faecbe28c9118aa79) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19646#issuecomment-2398937479 From thartmann at openjdk.org Tue Oct 8 06:16:02 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Oct 2024 06:16:02 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v2] In-Reply-To: References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: On Fri, 27 Sep 2024 19:46:15 GMT, Igor Veresov wrote: >> This is essentially a defensive forward port of a solution for an issue discovered in 11, where a use of pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependences()` >> >> The `insert_anti_dependences()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations but with pinned we don't. 
>> >> Compare the good graph shape (with control dependency set to `UnknownControl`): >> Good graph >> >> With the graph produce with the nodes pinned: >> Bad graph >> >> With the "bad graph" loads and store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependences()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there. > > Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: > > 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result > Summary: Make sure insert_anti_dependencies() starts from the right root Looks good to me otherwise. You might want to run performance testing just to make sure. src/hotspot/share/opto/gcm.cpp line 750: > 748: Node* initial_mem = load->in(MemNode::Memory); > 749: > 750: // We don't optimize memory graph for pinned loads, so we may need to raise the Suggestion: // We don't optimize the memory graph for pinned loads, so we may need to raise the ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21222#pullrequestreview-2353479727 PR Review Comment: https://git.openjdk.org/jdk/pull/21222#discussion_r1791256151 From thartmann at openjdk.org Tue Oct 8 06:16:03 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Oct 2024 06:16:03 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v2] In-Reply-To: References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> <9kF42tnA1qVGkpW1hRaObBtueDCnO0wgZfhpOArNLnI=.11131d42-f12f-4559-9c47-24aa4e527782@github.com> Message-ID: On Fri, 27 Sep 2024 19:13:48 GMT, Vladimir Kozlov wrote: >> Also if there is a MergeMem as a root for some weird reason then `insert_anti_dependencies()` may very well miss an interfering store. So we'd have to do this loop for correctness. > > Okay Would it still make sense to assert `load->control_dependency() == Pinned` here for now? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21222#discussion_r1791250944 From kbarrett at openjdk.org Tue Oct 8 06:25:57 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 8 Oct 2024 06:25:57 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: <5LaZcZeK2ShpVHDF5LuK1m_Z0RLeQOIdHYXd9t_Vl5c=.ed093c6b-3949-4757-ba4b-a963067ce0ab@github.com> References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> <5LaZcZeK2ShpVHDF5LuK1m_Z0RLeQOIdHYXd9t_Vl5c=.ed093c6b-3949-4757-ba4b-a963067ce0ab@github.com> Message-ID: On Mon, 7 Oct 2024 22:06:24 GMT, Dean Long wrote: >> Initialization of `TypePtr::NULL_PTR` here: >> https://github.com/openjdk/jdk/blob/4d50cbb5a73ad1f84ecd6a895045ecfdb0835adc/src/hotspot/share/opto/type.cpp#L538 > > I saw that too, but it creates a TypePtr, not a TypeRawPtr. Oh, you are right. And TypeRawPtr::make asserts the PTR is neither Constant nor Null. 
Which makes both switch cases under modification here supposedly unreachable. That would explain why I never hit either after running lots of tests. All of the change proposed here can be eliminated, and instead change both cases to fall through to the default ShouldNotReachHere(). (And that would be another way to remove the -Wzero-as-null-pointer-constant warning that was how I got here in the first place. :) ) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1791266904 From rcastanedalo at openjdk.org Tue Oct 8 07:02:50 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 8 Oct 2024 07:02:50 GMT Subject: RFR: 8341619: C2: remove unused StoreCM node Message-ID: This cleanup removes C2's `StoreCM` (store card mark) node and all its special handling code. This node used to model card mark stores in early-expanded G1 post-barriers, and is no longer needed after [JEP 475](https://openjdk.org/jeps/475). __Testing:__ tier1-5 (linux-x64, linux-aarch64, windows-x64, macosx-x64, and macosx-aarch64; release and debug mode). 
------------- Commit messages: - Remove StoreCM node Changes: https://git.openjdk.org/jdk/pull/21385/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21385&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341619 Stats: 388 lines in 23 files changed: 0 ins; 376 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/21385.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21385/head:pull/21385 PR: https://git.openjdk.org/jdk/pull/21385 From chagedorn at openjdk.org Tue Oct 8 07:13:01 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 8 Oct 2024 07:13:01 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v9] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 18:50:57 GMT, Kangcheng Xu wrote: >> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. >> >> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). >> >> Some notable examples of this transformation include: >> - `a + a + a` => `a*3` => `(a<<1) + a` >> - `a + a + a + a` => `a*4` => `a<<2` >> - `a*3 + a` => `a*4` => `a<<2` >> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` >> >> See included IR unit tests for more. 
>
> Kangcheng Xu has updated the pull request incrementally with three additional commits since the last revision:
>
>  - remove matching power-of-2 subtractions since it's already handled by Identity()
>  - verify results with custom test methods
>  - update comments to be more descriptive, remove unused can_reshape argument

Thanks for the updates! Good conversion of the tests. I'll give this another spin in our testing.

src/hotspot/share/opto/addnode.cpp line 439:

> 437:
> 438: // Try to match `a + a`. On success, return `a` and set `2` as `multiplier`.
> 439: // The method matches `n` to for pattern: AddNode(a, a).

Suggestion:

// The method matches `n` for pattern: AddNode(a, a).

test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 65:

> 63:         "mulAndAddToZero", //
> 64:         "mulAndAddToMinus1", //
> 65:         "mulAndAddToMinus42" //

Why did you add the trailing `//`?

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20754#pullrequestreview-2353577415
PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1791311803
PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1791313522

From chagedorn at openjdk.org Tue Oct 8 07:13:03 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 8 Oct 2024 07:13:03 GMT
Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v8]
In-Reply-To:
References:
Message-ID:

On Mon, 7 Oct 2024 18:44:41 GMT, Kangcheng Xu wrote:

>> src/hotspot/share/opto/addnode.cpp line 567:
>>
>>> 565:
>>> 566:   // We can't simply return the lshift node even if ((a << CON) - a) + a cancels out. Ideal() must return a new node.
>>> 567:   *multiplier = ((jlong) 1 << con->get_int()) - 1;
>>
>> Can't this be an `Identity()` transformation where you can return existing nodes?
>
> Good point. I realized `(x - y) + y => x` is already handled by `AddINode::Identity` and `AddLNode::Identity`. I don't need to repeat here.

That's great!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1791317004

From chagedorn at openjdk.org Tue Oct 8 07:16:02 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 8 Oct 2024 07:16:02 GMT
Subject: RFR: 8341619: C2: remove unused StoreCM node
In-Reply-To:
References:
Message-ID:

On Mon, 7 Oct 2024 11:28:19 GMT, Roberto Castañeda Lozano wrote:

> This cleanup removes C2's `StoreCM` (store card mark) node and all its special handling code. This node used to model card mark stores in early-expanded G1 post-barriers, and is no longer needed after [JEP 475](https://openjdk.org/jeps/475).
>
> __Testing:__ tier1-5 (linux-x64, linux-aarch64, windows-x64, macosx-x64, and macosx-aarch64; release and debug mode).

Looks good!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21385#pullrequestreview-2353597839

From rcastanedalo at openjdk.org Tue Oct 8 07:21:57 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 8 Oct 2024 07:21:57 GMT
Subject: RFR: 8341619: C2: remove unused StoreCM node
In-Reply-To:
References:
Message-ID:

On Tue, 8 Oct 2024 07:13:42 GMT, Christian Hagedorn wrote:

> Looks good!

Thanks, Christian!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21385#issuecomment-2399042363

From thartmann at openjdk.org Tue Oct 8 08:57:59 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Tue, 8 Oct 2024 08:57:59 GMT
Subject: RFR: 8341619: C2: remove unused StoreCM node
In-Reply-To:
References:
Message-ID: <66NAr7bZnmYCELHL00gm1ge8PgrXUF5MVC_I8--pLxw=.4350170b-b782-48c1-bd15-15df5f71d91b@github.com>

On Mon, 7 Oct 2024 11:28:19 GMT, Roberto Castañeda Lozano wrote:

> This cleanup removes C2's `StoreCM` (store card mark) node and all its special handling code. This node used to model card mark stores in early-expanded G1 post-barriers, and is no longer needed after [JEP 475](https://openjdk.org/jeps/475).
> > __Testing:__ tier1-5 (linux-x64, linux-aarch64, windows-x64, macosx-x64, and macosx-aarch64; release and debug mode). Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21385#pullrequestreview-2353857802 From chagedorn at openjdk.org Tue Oct 8 09:51:00 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 8 Oct 2024 09:51:00 GMT Subject: RFR: 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking [v3] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 07:42:54 GMT, Christian Hagedorn wrote: >> This patch introduces new predicate classes which implement a new `Predicate` interface. These classes represent the different predicates in the C2 IR. They are used in combination with new predicate iterator and visitors classes to provide an easy way to walk and process predicates in the IR. >> >> ### Predicate Interfaces and Implementing Classes >> - `Predicate` interface is implemented by four predicate classes: >> - `ParsePredicate` (existing class) >> - `RuntimePredicate` (existing and updated class) >> - `TemplateAssertionPredicate` (new class) >> - `InitializedAssertionPredicate` (new class, renamed old `InitializedAssertionPredicate` class to `InitializedAssertionPredicateCreator`) >> >> ### Predicate Iterator with Visitor classes >> There is a new `PredicateIterator` class which can be used to iterate through the predicates of a loop. For each predicate, a `PredicateVisitor` can be applied. The user can implement the `PredicateIterator` interface and override the default do-nothing implementations to the specific needs of the code. I've done this for a couple of places in the code by defining new visitors: >> - `ParsePredicateUsefulMarker`: This visitor marks all Parse Predicates as useful. >> - Replaces the old now retired `ParsePredicateIterator`. 
>> - `DominatedPredicates`: This visitor checks the dominance relation to an `early` node when trying to figure out the latest legal placement for a node. The goal is to skip as many predicates as possible to avoid interference with Loop Predication and/or creating a Loop Limit Check Predicate. >> - Replaces the old now retired `PredicateEntryIterator`. >> - `Predicates::dump()`: Newly added dumping code for the predicates above a loop which uses a new `PredicatePrinter` visitor. This helps debugging issues with predicates. >> >> #### To Be Replaced soon >> There are a couple of places where we use similar code to walk predicates and apply some transformation/modifications to the IR. The goal is to replace these locations with the new visitors as well. This will incrementally be done with the next couple of PRs. >> >> ### More Information >> More information about specific classes and changes can be found as code comments and PR comments. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add missing public for UnifiedPredicateVisitor Anyone for a second review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21161#issuecomment-2399381127 From rcastanedalo at openjdk.org Tue Oct 8 11:04:59 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 8 Oct 2024 11:04:59 GMT Subject: RFR: 8341619: C2: remove unused StoreCM node In-Reply-To: <66NAr7bZnmYCELHL00gm1ge8PgrXUF5MVC_I8--pLxw=.4350170b-b782-48c1-bd15-15df5f71d91b@github.com> References: <66NAr7bZnmYCELHL00gm1ge8PgrXUF5MVC_I8--pLxw=.4350170b-b782-48c1-bd15-15df5f71d91b@github.com> Message-ID: On Tue, 8 Oct 2024 08:55:07 GMT, Tobias Hartmann wrote: > Looks good to me too. Thanks, Tobias! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21385#issuecomment-2399535146 From rcastanedalo at openjdk.org Tue Oct 8 11:14:59 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 8 Oct 2024 11:14:59 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v4] In-Reply-To: <60YluOey0QgBIIcU-QQk58zr3lioQUjkNETfoyR5yCA=.164295af-bc30-4ed6-96d4-cdb272134abf@github.com> References: <60YluOey0QgBIIcU-QQk58zr3lioQUjkNETfoyR5yCA=.164295af-bc30-4ed6-96d4-cdb272134abf@github.com> Message-ID: On Mon, 7 Oct 2024 13:26:56 GMT, Antón Seoane wrote: >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. >> >> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter.
>> >> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. > > Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: > > Update full name Thanks for working on this, Antón! The new test files look good to me, I also agree on hiding decorators by default for `jit+inlining`. I just have a few minor comments. test/hotspot/gtest/logging/test_logDefaultDecorators.cpp line 29: > 27: #include "logging/logDecorators.hpp" > 28: #include "runtime/os.hpp" > 29: #include "unittest.hpp" Please sort the included files alphabetically (except for `precompiled.hpp` which should go first) for consistency with the other test files in the directory. Also, `runtime/os.hpp` is unused. Suggestion: #include "precompiled.hpp" #include "jvm.h" #include "logging/logDecorators.hpp" #include "logging/logTag.hpp" #include "unittest.hpp" test/hotspot/jtreg/runtime/logging/DefaultLogDecoratorsTest.java line 27: > 25: * @test > 26: * @requires vm.flagless > 27: * @summary Running -Xlog with tags which have default decorators should pick them This summary reflects the old proposal in JDK-8340363, please update. test/hotspot/jtreg/runtime/logging/DefaultLogDecoratorsTest.java line 51: > 49: for (String string : xlog) { > 50: argsList.add(string); > 51: } Suggestion: List<String> argsList = new ArrayList<>(Arrays.asList(xlog)); test/hotspot/jtreg/runtime/logging/DefaultLogDecoratorsTest.java line 69: > 67: doTest(false, "-Xlog:jit+inlining*=trace:decorators.log"); > 68: > 69: Nit: unnecessary extra line (same for the other lines below). ------------- Changes requested by rcastanedalo (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21383#pullrequestreview-2353875217 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1791493896 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1791501653 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1791672411 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1791512229 From duke at openjdk.org Tue Oct 8 11:57:01 2024 From: duke at openjdk.org (Antón Seoane) Date: Tue, 8 Oct 2024 11:57:01 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v4] In-Reply-To: References: <60YluOey0QgBIIcU-QQk58zr3lioQUjkNETfoyR5yCA=.164295af-bc30-4ed6-96d4-cdb272134abf@github.com> Message-ID: On Tue, 8 Oct 2024 09:07:08 GMT, Roberto Castañeda Lozano wrote: >> Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: >> >> Update full name > > test/hotspot/jtreg/runtime/logging/DefaultLogDecoratorsTest.java line 27: > >> 25: * @test >> 26: * @requires vm.flagless >> 27: * @summary Running -Xlog with tags which have default decorators should pick them > > This summary reflects the old proposal in JDK-8340363, please update. Oh, I missed that! Thanks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1791734747 From duke at openjdk.org Tue Oct 8 12:12:31 2024 From: duke at openjdk.org (Antón Seoane) Date: Tue, 8 Oct 2024 12:12:31 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v5] In-Reply-To: References: Message-ID: > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics.
For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. > > To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. > > The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. > > This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket.
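The inclusion rule described above can be sketched as a toy model. This is Java written only for illustration, not the HotSpot implementation (which is C++ in `src/hotspot/share/logging`). The names `DefaultDecoratorSketch`, `Rule` and `effectiveDecorators` are hypothetical, and treating a rule's level as an exact match is an assumption; the PR only says that defaults "may target a specific log level".

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Toy model of tag-specific disabling of default decorators (all names hypothetical).
final class DefaultDecoratorSketch {
    enum Decorator { UPTIME, LEVEL, TAGS }
    enum Level { TRACE, DEBUG, INFO, WARNING, ERROR }

    // A rule applies when the selected tag set includes all of the rule's tags
    // and, if the rule names a level, the selected level is the same (assumed).
    record Rule(Set<String> tags, Level level, Set<Decorator> disabled) {}

    // The three defaults named in the PR description: uptime, level, tags.
    static final Set<Decorator> DEFAULTS = EnumSet.allOf(Decorator.class);

    static Set<Decorator> effectiveDecorators(Set<String> selectedTags,
                                              Level selectedLevel,
                                              Set<Decorator> userDecorators,
                                              List<Rule> rules) {
        // Decorators supplied by the user are never overridden.
        if (userDecorators != null) {
            return userDecorators;
        }
        EnumSet<Decorator> result = EnumSet.copyOf(DEFAULTS);
        for (Rule rule : rules) {
            boolean tagsMatch = selectedTags.containsAll(rule.tags());
            boolean levelMatch = rule.level() == null || rule.level() == selectedLevel;
            if (tagsMatch && levelMatch) {
                result.removeAll(rule.disabled());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // One rule: selections that include the "jit" tag get no default decorators.
        List<Rule> rules = List.of(
            new Rule(Set.of("jit"), null, EnumSet.allOf(Decorator.class)));

        // Models -Xlog:jit+compilation=debug without decorators: the jit rule applies.
        System.out.println(effectiveDecorators(Set.of("jit", "compilation"), Level.DEBUG, null, rules));
        // Models -Xlog:gc without decorators: no rule matches, defaults are kept.
        System.out.println(effectiveDecorators(Set.of("gc"), Level.INFO, null, rules));
        // Models a selection with explicit decorators: user input wins.
        System.out.println(effectiveDecorators(Set.of("jit"), Level.DEBUG, EnumSet.of(Decorator.LEVEL), rules));
    }
}
```

In this model the first call yields an empty set, the second keeps all three defaults, and the third returns exactly what the user asked for.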
Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: Applying review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21383/files - new: https://git.openjdk.org/jdk/pull/21383/files/deef63ff..a80d5fe1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=03-04 Stats: 11 lines in 2 files changed: 0 ins; 8 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21383/head:pull/21383 PR: https://git.openjdk.org/jdk/pull/21383 From rcastanedalo at openjdk.org Tue Oct 8 12:53:03 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 8 Oct 2024 12:53:03 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v5] In-Reply-To: References: Message-ID: On Tue, 8 Oct 2024 12:12:31 GMT, Antón Seoane wrote: >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level.
>> >> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. >> >> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. > > Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: > > Applying review comments Thanks for your work and for addressing the comments, Antón! The test code and the decision to hide decorators by default for `jit+inlining` look good to me. Note that this is only a partial review; a second review of the `src/hotspot/share/logging` changes is still required. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21383#pullrequestreview-2354397406 From thartmann at openjdk.org Tue Oct 8 14:34:03 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Oct 2024 14:34:03 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 13:39:28 GMT, Tobias Hartmann wrote: > C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction being careful to flush caches and doing it in an MP-safe way.
> > Now the problem is that although the patch body is not executed, one thread can update the oop immediate of the `mov` in the patch body: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 > > While another thread walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC and then encounters a half-written oop if the immediate crosses a page or cache-line boundary: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 > > Updating the oop immediate is not atomic because the address of the immediate is not 8-byte aligned. > > I propose to simply align it in `PatchingStub::emit_code`. > > The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 > > I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remove them again because they are x64 specific and I don't think it's worth the effort of adding a platform independent way of alignment checking. > > AArch64 is not affected because we always deopt instead of patching but other platforms might be affected as well. Should this fix be accepted, I'll ping the maintainers of the respective platforms. > > Thanks, > Tobias Thanks for looking at this, Vladimir and Dean! > Wouldn't it be better to get rid of the concurrency? We could grab CodeCache_lock and Patching_lock in the same block, so we serialize the patching and register_nmethod. Yes, that would be an alternative solution. 
I went with the alignment because I thought it has the least impact. I'll ping the GC team, maybe they want to have a say in this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2400023164 From thartmann at openjdk.org Tue Oct 8 14:34:04 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Oct 2024 14:34:04 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching In-Reply-To: <-W7RQShH8RK7mJ0_FNh-7nYeqCC_1IFiFRiATETFAaw=.9a5c31a8-e781-42f1-b877-7d5122b67730@github.com> References: <-W7RQShH8RK7mJ0_FNh-7nYeqCC_1IFiFRiATETFAaw=.9a5c31a8-e781-42f1-b877-7d5122b67730@github.com> Message-ID: On Mon, 7 Oct 2024 19:16:27 GMT, Vladimir Kozlov wrote: >> C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction being careful to flush caches and doing it in an MP-safe way. >> >> Now the problem is that although the patch body is not executed, one thread can update the oop immediate of the `mov` in the patch body: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 >> >> While another thread walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC and then encounters a half-written oop if the immediate crosses a page or cache-line boundary: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 >> >> Updating the oop immediate is not atomic because the address of the immediate is not 8-byte aligned. >> >> I propose to simply align it in `PatchingStub::emit_code`. 
>> >> The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 >> >> I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remove them again because they are x64 specific and I don't think it's worth the effort of adding a platform independent way of alignment checking. >> >> AArch64 is not affected because we always deopt instead of patching but other platforms might be affected as well. Should this fix be accepted, I'll ping the maintainers of the respective platforms. >> >> Thanks, >> Tobias > > src/hotspot/cpu/x86/c1_CodeStubs_x86.cpp line 334: > >> 332: // 8-byte align the address of the oop immediate to guarantee atomicity >> 333: // when patching since the GC might walk nmethod oops concurrently. >> 334: __ align(8, __ offset() + NativeMovConstReg::data_offset_rex); > > In 32-bit VM oops are 4 bytes so 8 bytes is overkill but I am fine with unified alignment. > Should we align mov_metadata() too or it is guarantee aligned already? I don't think we need to guarantee atomicity for metadata because it's not observed concurrently as far as I know, right? 
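The point of `align(8, offset + data_offset)` in the quoted change is that the padding is chosen so the immediate, not the start of the instruction, ends up 8-byte aligned. The arithmetic can be sketched in plain Java. This is only an illustration, not HotSpot code: `paddingFor`, `crossesCacheLine`, the 64-byte line size and the example offsets are assumptions made for the example.

```java
// Toy arithmetic for aligning an instruction's immediate (all names and values hypothetical).
final class ImmediateAlignmentSketch {

    // Padding bytes to emit at 'offset' so that the immediate, which starts
    // 'dataOffset' bytes into the instruction, lands on a multiple of 'alignment'.
    static int paddingFor(int offset, int dataOffset, int alignment) {
        int immediateAddress = offset + dataOffset;
        return Math.floorMod(-immediateAddress, alignment);
    }

    // True if a 'size'-byte field starting at 'address' spans two cache lines.
    static boolean crossesCacheLine(int address, int size, int lineSize) {
        return address / lineSize != (address + size - 1) / lineSize;
    }

    public static void main(String[] args) {
        int dataOffset = 2;   // assumed: one prefix byte plus one opcode byte before the immediate
        int offset = 59;      // assumed position of the instruction in the code buffer

        int unalignedImmediate = offset + dataOffset;
        System.out.println("unaligned immediate at " + unalignedImmediate
            + ", crosses a 64-byte line: " + crossesCacheLine(unalignedImmediate, 8, 64));

        int padding = paddingFor(offset, dataOffset, 8);
        int alignedImmediate = offset + padding + dataOffset;
        System.out.println("padding " + padding + ", immediate at " + alignedImmediate
            + ", crosses a 64-byte line: " + crossesCacheLine(alignedImmediate, 8, 64));
    }
}
```

With these example numbers the immediate moves from byte 61, where an 8-byte write straddles the boundary at 64, to byte 64 after 3 bytes of padding. An 8-byte field at an 8-byte-aligned address can never span two 64-byte lines, which is why the alignment removes the half-written-oop window described above.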
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21389#discussion_r1791996064 From thartmann at openjdk.org Tue Oct 8 15:20:01 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Oct 2024 15:20:01 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> Message-ID: On Mon, 7 Oct 2024 15:13:47 GMT, Kim Barrett wrote: > > What about using `intptr_t` for `TypeRawPtr::_bits` instead? > > That has more fan-out, into code I'm not familiar with. The proposed change fixes the immediate "miscompilation". A change of the type could be done as a further enhancement, if that makes sense to do. I'd rather leave that to someone from the compiler team. If that approach is what's wanted to fix the immediate problem, then I'm going to want to hand this issue off. Also, uintptr_t might be more appropriate than intptr_t. Okay, that's fine with me.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21324#issuecomment-2400143674 From iveresov at openjdk.org Tue Oct 8 15:39:37 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 8 Oct 2024 15:39:37 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v3] In-Reply-To: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: > This is essentially a defensive forward port of a solution for an issue discovered in 11, where a use of pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependences()` > > The `insert_anti_dependences()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations but with pinned we don't. > > Compare the good graph shape (with control dependency set to `UnknownControl`): > [image: Good graph] > > With the graph produced with the nodes pinned: > [image: Bad graph] > > With the "bad graph" loads and store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependences()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there.
Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/gcm.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21222/files - new: https://git.openjdk.org/jdk/pull/21222/files/e9295d93..e80084ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21222&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21222&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21222.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21222/head:pull/21222 PR: https://git.openjdk.org/jdk/pull/21222 From iveresov at openjdk.org Tue Oct 8 15:41:59 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 8 Oct 2024 15:41:59 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v2] In-Reply-To: References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: On Tue, 8 Oct 2024 06:13:41 GMT, Tobias Hartmann wrote: > Looks good to me otherwise. You might want to run performance testing just to make sure. While working on it I inserted a printf in it and the loop almost never happens since Loads are typically normalized. So, I don't think there is any impact on performance and I don't think checking for controlled dependency is necessary. It would also require us to carry this information to mach nodes... 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21222#issuecomment-2400194839 From chagedorn at openjdk.org Tue Oct 8 15:48:02 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 8 Oct 2024 15:48:02 GMT Subject: RFR: 8341131: Some jdk/jfr/event/compiler tests shouldn't be executed with Xcomp In-Reply-To: <0s7WMhOzLNXKcDjZkg5NEzvaeCvswLtIbTVEdsezr8M=.56c5ea42-ac74-4be1-87ef-353373f5bd6b@github.com> References: <0s7WMhOzLNXKcDjZkg5NEzvaeCvswLtIbTVEdsezr8M=.56c5ea42-ac74-4be1-87ef-353373f5bd6b@github.com> Message-ID: On Sat, 28 Sep 2024 03:47:50 GMT, Leonid Mesnik wrote: > Few jdk/jfr/event/compiler tests sensitive to compile flags and shouldn't be executed with Xcomp. Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21239#pullrequestreview-2354895186 From chagedorn at openjdk.org Tue Oct 8 15:52:08 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 8 Oct 2024 15:52:08 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v9] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 18:50:57 GMT, Kangcheng Xu wrote: >> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. >> >> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). 
>> >> Some notable examples of this transformation include: >> - `a + a + a` => `a*3` => `(a<<1) + a` >> - `a + a + a + a` => `a*4` => `a<<2` >> - `a*3 + a` => `a*4` => `a<<2` >> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` >> >> See included IR unit tests for more. > > Kangcheng Xu has updated the pull request incrementally with three additional commits since the last revision: > > - remove matching power-of-2 subtractions since it's already handled by Identity() > - verify results with custom test methods > - update comments to be more descriptive, remove unused can_reshape argument Testing looked good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20754#issuecomment-2400216991 From kvn at openjdk.org Tue Oct 8 16:27:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 8 Oct 2024 16:27:00 GMT Subject: RFR: 8341619: C2: remove unused StoreCM node In-Reply-To: References: Message-ID: <-rmpO6zs0OfYl616EhcyeL1Izlx-kpx84VUmFjTZ3HM=.7ede4276-5e3e-479e-ae13-c8b2e8f34275@github.com> On Mon, 7 Oct 2024 11:28:19 GMT, Roberto Castañeda Lozano wrote: > This cleanup removes C2's `StoreCM` (store card mark) node and all its special handling code. This node used to model card mark stores in early-expanded G1 post-barriers, and is no longer needed after [JEP 475](https://openjdk.org/jeps/475). > > __Testing:__ tier1-5 (linux-x64, linux-aarch64, windows-x64, macosx-x64, and macosx-aarch64; release and debug mode). Good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21385#pullrequestreview-2354992010 From rcastanedalo at openjdk.org Tue Oct 8 16:36:05 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 8 Oct 2024 16:36:05 GMT Subject: RFR: 8341619: C2: remove unused StoreCM node In-Reply-To: <-rmpO6zs0OfYl616EhcyeL1Izlx-kpx84VUmFjTZ3HM=.7ede4276-5e3e-479e-ae13-c8b2e8f34275@github.com> References: <-rmpO6zs0OfYl616EhcyeL1Izlx-kpx84VUmFjTZ3HM=.7ede4276-5e3e-479e-ae13-c8b2e8f34275@github.com> Message-ID: <7Q8caYWlp3OGt-DLjZG65wnx1dKA4tllGuY2G8lmX50=.3f88d91c-53e4-4af5-8e9c-c697405dccb5@github.com> On Tue, 8 Oct 2024 16:24:32 GMT, Vladimir Kozlov wrote: > Good. Thanks, Vladimir! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21385#issuecomment-2400338019 From psandoz at openjdk.org Tue Oct 8 16:40:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 8 Oct 2024 16:40:17 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v22] In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 00:01:59 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e.
the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update VectorMath.java Java changes look good (see comments to fix some typos). Needs another HotSpot reviewer. Marked as reviewed by psandoz (Reviewer). src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 275: > 273: * @param b the second operand. > 274: * @return the saturating addition of the operands. > 275: * @see VectorOperators#SADD Suggestion: * @see VectorOperators#SUADD src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 301: > 299: * @param b the second operand. > 300: * @return the saturating difference of the operands. > 301: * @see VectorOperators#SSUB Suggestion: * @see VectorOperators#SUSUB src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 413: > 411: * @param b the second operand. > 412: * @return the saturating addition of the operands. 
> 413: * @see VectorOperators#SADD Suggestion: * @see VectorOperators#SUADD src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 439: > 437: * @param b the second operand. > 438: * @return the saturating difference of the operands. > 439: * @see VectorOperators#SSUB Suggestion: * @see VectorOperators#SUSUB src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 551: > 549: * @param b the second operand. > 550: * @return the saturating addition of the operands. > 551: * @see VectorOperators#SADD Suggestion: * @see VectorOperators#SUADD src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 577: > 575: * @param b the second operand. > 576: * @return the saturating difference of the operands. > 577: * @see VectorOperators#SSUB Suggestion: * @see VectorOperators#SUSUB ------------- PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2354993291 PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2355019508 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1792178593 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1792178872 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1792179260 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1792179485 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1792179780 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1792180281 From psandoz at openjdk.org Tue Oct 8 17:13:10 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 8 Oct 2024 17:13:10 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v16] In-Reply-To: <3OXPOIGRRD4KoZ21PsL1viyEDvZsh_8GtacPQHcuQq4=.e5f4f05d-d21f-4a6c-b41e-c78268b8e2fe@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> 
<3OXPOIGRRD4KoZ21PsL1viyEDvZsh_8GtacPQHcuQq4=.e5f4f05d-d21f-4a6c-b41e-c78268b8e2fe@github.com> Message-ID: On Thu, 3 Oct 2024 19:05:14 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Sharpening intrinsic exit check. test/jdk/jdk/incubator/vector/templates/Unit-header.template line 408: > 406: for (j = 0; j < vector_len; j++) { > 407: idx = i + j; > 408: wrapped_index =(((int)order[idx]) & (2 * vector_len -1)); This assumes a power of two, can we change to use `Math.floorMod`? 
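To illustrate the concern: masking with `2 * vector_len - 1` is only the same as a true modulo when `2 * vector_len` is a power of two, whereas `Math.floorMod` wraps correctly for any positive length. The sketch below compares the two; the class and method names are hypothetical and not part of the test template.

```java
// Compares mask-based and floorMod-based index wrapping (names are hypothetical).
final class IndexWrapSketch {

    // Wrapping as written in the template: correct only if 2 * vectorLen is a power of two.
    static int wrapWithMask(int index, int vectorLen) {
        return index & (2 * vectorLen - 1);
    }

    // Wrapping with a true modulo: result is always in [0, 2 * vectorLen).
    static int wrapWithFloorMod(int index, int vectorLen) {
        return Math.floorMod(index, 2 * vectorLen);
    }

    public static void main(String[] args) {
        // Power-of-two case (vectorLen = 8, range 16): both agree, including for negative indices.
        for (int i = -40; i <= 40; i++) {
            if (wrapWithMask(i, 8) != wrapWithFloorMod(i, 8)) {
                throw new AssertionError("mismatch at " + i);
            }
        }
        System.out.println("vectorLen 8: mask and floorMod agree on [-40, 40]");

        // Non-power-of-two case (vectorLen = 6, range 12): the mask 11 (binary 1011)
        // clears bit 2, so a valid index such as 4 is mapped to 0.
        System.out.println("vectorLen 6, index 4, mask:     " + wrapWithMask(4, 6));
        System.out.println("vectorLen 6, index 4, floorMod: " + wrapWithFloorMod(4, 6));
    }
}
```

For the power-of-two case the two forms are interchangeable, so switching to `Math.floorMod` would not change current results; it only removes the hidden assumption.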
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1792232986 From lmesnik at openjdk.org Tue Oct 8 17:47:06 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 8 Oct 2024 17:47:06 GMT Subject: Integrated: 8341131: Some jdk/jfr/event/compiler tests shouldn't be executed with Xcomp In-Reply-To: <0s7WMhOzLNXKcDjZkg5NEzvaeCvswLtIbTVEdsezr8M=.56c5ea42-ac74-4be1-87ef-353373f5bd6b@github.com> References: <0s7WMhOzLNXKcDjZkg5NEzvaeCvswLtIbTVEdsezr8M=.56c5ea42-ac74-4be1-87ef-353373f5bd6b@github.com> Message-ID: <4IPHgtEK4RfBTA2R1pMLvCCdzCB8A6jQLpQZFfebiCU=.400c4381-ee42-44ed-93dc-e24ba53b0b36@github.com> On Sat, 28 Sep 2024 03:47:50 GMT, Leonid Mesnik wrote: > Few jdk/jfr/event/compiler tests sensitive to compile flags and shouldn't be executed with Xcomp. This pull request has now been integrated. Changeset: 7312eea3 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/7312eea382eed048b6abdb6409c006fc8e8f45b4 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod 8341131: Some jdk/jfr/event/compiler tests shouldn't be executed with Xcomp Reviewed-by: chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/21239 From dlong at openjdk.org Tue Oct 8 18:36:12 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 8 Oct 2024 18:36:12 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> <5LaZcZeK2ShpVHDF5LuK1m_Z0RLeQOIdHYXd9t_Vl5c=.ed093c6b-3949-4757-ba4b-a963067ce0ab@github.com> Message-ID: On Tue, 8 Oct 2024 06:23:47 GMT, Kim Barrett wrote: >> I saw that too, but it creates a TypePtr, not a TypeRawPtr. > > Oh, you are right. And TypeRawPtr::make asserts the PTR is neither Constant nor Null. Which makes > both switch cases under modification here supposedly unreachable. That would explain why I never hit > either after running lots of tests. 
All of the changes proposed here can be eliminated, and instead change
> both cases to fall through to the default ShouldNotReachHere(). (And that would be another way to
> remove the -Wzero-as-null-pointer-constant warning that was how I got here in the first place. :) )

There's TypeRawPtr::make(enum PTR ptr) which doesn't allow Constant or Null, but we are using TypeRawPtr::make(address bits) here. We may need to keep the Constant case. I wouldn't be surprised if there was a way to trigger that path using Unsafe.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1792333243

From dlong at openjdk.org Tue Oct 8 19:15:57 2024
From: dlong at openjdk.org (Dean Long)
Date: Tue, 8 Oct 2024 19:15:57 GMT
Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471
In-Reply-To:
References:
Message-ID: <5t5vgIdcLZyg10tHz23C9NV1c4mFvsDrSDhBp49Ugk0=.b5ba825b-2373-4af5-ba69-062450711bbb@github.com>

On Wed, 25 Sep 2024 22:52:18 GMT, Dean Long wrote:

>> Instead of bailout in alternative approach we can change `cha_monomorphic_target` to `nullptr` in code which is looking for it in previous lines. `target` will be used for call and we will lose a little performance when JVMTI is used instead of skipping compilation. Am I missing something?
> 
> @vnkozlov , I like the alternative approach better. I went with the current approach because I was thinking it would be simpler than the bailout, but I changed my mind after writing both out.
> 
> Yes, we can change cha_monomorphic_target to nullptr instead of bailing out. But my understanding is any use of old/redefined methods will cause the compilation to be thrown out when we try to create the nmethod, so we are avoiding wasted work by bailing out early.

> @dean-long, maybe I misunderstand your statement. Are you rewriting the fix or keeping the current one?

Either one is fine with me. I could make a separate draft PR with the alternative solution if that helps reviewers decide.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2400626113 From dlong at openjdk.org Tue Oct 8 19:21:58 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 8 Oct 2024 19:21:58 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 21:23:28 GMT, Vladimir Ivanov wrote: > > JVMTI can add and delete methods > > Can you elaborate on that point, please? JVMTI spec states that redefinition/retransformation "must not add, remove or rename fields or methods" [1] [2]. > > [1] https://docs.oracle.com/en/java/javase/23/docs/specs/jvmti.html#RedefineClasses [2] https://docs.oracle.com/en/java/javase/23/docs/specs/jvmti.html#RetransformClasses It's because of the AllowRedefinitionToAddDeleteMethods flag: https://github.com/openjdk/jdk/blob/7312eea382eed048b6abdb6409c006fc8e8f45b4/src/hotspot/share/prims/jvmtiRedefineClasses.cpp#L928 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2400635053 From jbhateja at openjdk.org Tue Oct 8 19:25:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 8 Oct 2024 19:25:24 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v23] In-Reply-To: References: Message-ID: <0LkBvvNPq5jmWfOdjItIXGedRDtpiivJM06BAx7vP0I=.c5417544-edef-4623-beaa-08cd7c565361@github.com> > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. 
For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value.
> 
> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios.
> 
> Summary of changes:
> - Java side implementation of new vector operators.
> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it.
> - C2 compiler IR and inline expander changes.
> - Optimized x86 backend implementation for new vector operators and their predicated counterparts.
> - Extends existing VectorAPI Jtreg test suite to cover new operations.
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch.
> 
> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits:

 - Merge branch 'JDK-8338201' of http://github.com/jatin-bhateja/jdk into JDK-8338201
 - Update VectorMath.java
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201
 - Typographical error fixups
 - Doc fixups
 - Typographic error
 - Merge stashing and re-commit
 - Tuning extra spaces.
 - Tests for newly added VectorMath.* operations
 - Test cleanups.
 - ...
and 16 more: https://git.openjdk.org/jdk/compare/7312eea3...ce76c3e5 ------------- Changes: https://git.openjdk.org/jdk/pull/20507/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=22 Stats: 9206 lines in 51 files changed: 8778 ins; 27 del; 401 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From qamai at openjdk.org Tue Oct 8 19:50:32 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 8 Oct 2024 19:50:32 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* Message-ID: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> Hi, This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. Please take a look and leave your reviews, Thanks a lot. ------------- Commit messages: - more cleanup - copyright - fix tests - cleanup TypeVect Changes: https://git.openjdk.org/jdk/pull/21414/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21414&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341784 Stats: 188 lines in 18 files changed: 4 ins; 73 del; 111 mod Patch: https://git.openjdk.org/jdk/pull/21414.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21414/head:pull/21414 PR: https://git.openjdk.org/jdk/pull/21414 From dlong at openjdk.org Tue Oct 8 20:30:34 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 8 Oct 2024 20:30:34 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v2] In-Reply-To: References: Message-ID: > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. 
I'm open to suggestions for a better name than equals_ignore_version().
> 
> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==.

Dean Long has updated the pull request incrementally with one additional commit since the last revision:

  simplification based on reviewer comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21148/files
  - new: https://git.openjdk.org/jdk/pull/21148/files/3b258664..0705b33e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=00-01

  Stats: 45 lines in 3 files changed: 11 ins; 33 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/21148.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21148/head:pull/21148

PR: https://git.openjdk.org/jdk/pull/21148

From dlong at openjdk.org Tue Oct 8 20:30:34 2024
From: dlong at openjdk.org (Dean Long)
Date: Tue, 8 Oct 2024 20:30:34 GMT
Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471
In-Reply-To:
References:
Message-ID:

On Tue, 1 Oct 2024 16:33:45 GMT, Vladimir Kozlov wrote:

>> @vnkozlov , I like the alternative approach better. I went with the current approach because I was thinking it would be simpler than the bailout, but I changed my mind after writing both out.
>> 
>> Yes, we can change cha_monomorphic_target to nullptr instead of bailing out. But my understanding is any use of old/redefined methods will cause the compilation to be thrown out when we try to create the nmethod, so we are avoiding wasted work by bailing out early.
> 
> @dean-long, maybe I misunderstand your statement. Are you rewriting the fix or keeping the current one?

> I like @vnkozlov's suggestion to null out `cha_monomorphic_target`. Moreover, the validation can be performed inside `ciMethod::find_monomorphic_target()` which is used to compute `cha_monomorphic_target`.
I like this idea. I pushed a new version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2400759341 From iveresov at openjdk.org Tue Oct 8 20:40:59 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 8 Oct 2024 20:40:59 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v3] In-Reply-To: References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: On Tue, 8 Oct 2024 15:39:37 GMT, Igor Veresov wrote: >> This is essentially a defensive forward port of a solution for an issue discovered in 11, where a use of pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependences()` >> >> The `insert_anti_dependences()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations but with pinned we don't. >> >> Compare the good graph shape (with control dependency set to `UnknownControl`): >> Good graph >> >> With the graph produce with the nodes pinned: >> Bad graph >> >> With the "bad graph" loads and store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependences()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/gcm.cpp > > Co-authored-by: Tobias Hartmann Oh, I guess I need one of you guys to approve it again after I fixed the comment per Tobias' recommendation. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21222#issuecomment-2400776905 From kvn at openjdk.org Tue Oct 8 22:24:59 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 8 Oct 2024 22:24:59 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v2] In-Reply-To: References: Message-ID: On Tue, 8 Oct 2024 20:30:34 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > simplification based on reviewer comments This looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21148#pullrequestreview-2355641562 From kvn at openjdk.org Tue Oct 8 22:24:59 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 8 Oct 2024 22:24:59 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v3] In-Reply-To: References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: On Tue, 8 Oct 2024 15:39:37 GMT, Igor Veresov wrote: >> This is essentially a defensive forward port of a solution for an issue discovered in 11, where a use of pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependences()` >> >> The `insert_anti_dependences()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations but with pinned we don't. >> >> Compare the good graph shape (with control dependency set to `UnknownControl`): >> Good graph >> >> With the graph produce with the nodes pinned: >> Bad graph >> >> With the "bad graph" loads and store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependences()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/gcm.cpp > > Co-authored-by: Tobias Hartmann Re-approved. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21222#pullrequestreview-2355642340 From iveresov at openjdk.org Tue Oct 8 22:34:01 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 8 Oct 2024 22:34:01 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v3] In-Reply-To: References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: On Tue, 8 Oct 2024 15:39:37 GMT, Igor Veresov wrote: >> This is essentially a defensive forward port of a solution for an issue discovered in 11, where a use of pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependences()` >> >> The `insert_anti_dependences()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations but with pinned we don't. >> >> Compare the good graph shape (with control dependency set to `UnknownControl`): >> Good graph >> >> With the graph produce with the nodes pinned: >> Bad graph >> >> With the "bad graph" loads and store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependences()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/gcm.cpp > > Co-authored-by: Tobias Hartmann Thanks, Vladimir! 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21222#issuecomment-2400931753

From kvn at openjdk.org Tue Oct 8 23:07:58 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 8 Oct 2024 23:07:58 GMT
Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type*
In-Reply-To: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com>
References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com>
Message-ID:

On Tue, 8 Oct 2024 19:46:12 GMT, Quan Anh Mai wrote:

> Hi,
> 
> This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner.
> 
> Please take a look and leave your reviews,
> Thanks a lot.

Looks reasonable. Need to test it internally.

src/hotspot/share/opto/type.cpp line 2531:

> 2529: 
> 2530: //------------------------------meet-------------------------------------------
> 2531: // Compute the MEET of two types. It returns a new Type object.

It never returns a new type now.
------------- PR Review: https://git.openjdk.org/jdk/pull/21414#pullrequestreview-2355671466 PR Review Comment: https://git.openjdk.org/jdk/pull/21414#discussion_r1792593799 From iveresov at openjdk.org Tue Oct 8 23:25:05 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 8 Oct 2024 23:25:05 GMT Subject: Integrated: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result In-Reply-To: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: On Fri, 27 Sep 2024 16:02:29 GMT, Igor Veresov wrote: > This is essentially a defensive forward port of a solution for an issue discovered in 11, where a use of pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependences()` > > The `insert_anti_dependences()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations but with pinned we don't. > > Compare the good graph shape (with control dependency set to `UnknownControl`): > Good graph > > With the graph produce with the nodes pinned: > Bad graph > > With the "bad graph" loads and store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependences()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there. This pull request has now been integrated. 
Changeset: 7eab0a50
Author:    Igor Veresov
URL:       https://git.openjdk.org/jdk/commit/7eab0a506adffac7bed940cc020e37754f0adbdb
Stats:     59 lines in 2 files changed: 59 ins; 0 del; 0 mod

8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result

Reviewed-by: kvn, thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/21222

From sviswanathan at openjdk.org Wed Oct 9 00:14:00 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Wed, 9 Oct 2024 00:14:00 GMT
Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions
In-Reply-To:
References:
Message-ID:

On Wed, 4 Sep 2024 16:44:57 GMT, hanklo6 wrote:

> Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers.
> 
> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html.
> 
> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used.
> 
> ### Generate test instructions
> With `binutils = 2.43`
> * `python3 x86-asmtest.py > asmtest.out.h`
> ### Run test
> * `make test TEST="gtest:AssemblerX86"`

test/hotspot/gtest/x86/test_assemblerx86.cpp line 1:

> 1: #include "precompiled.hpp"

Need to add copyright header to this file at the beginning.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1792637049

From kxu at openjdk.org Wed Oct 9 02:52:38 2024
From: kxu at openjdk.org (Kangcheng Xu)
Date: Wed, 9 Oct 2024 02:52:38 GMT
Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v10]
In-Reply-To:
References:
Message-ID: <708mNBltFukvzi1tAy1jileWyCS80mGr6UJ2vBAds9E=.7b13b64f-eb7e-440d-9439-fca40b327032@github.com>

> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`.
> 
> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `(a << C) + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible).
> 
> Some notable examples of this transformation include:
> - `a + a + a` => `a*3` => `(a<<1) + a`
> - `a + a + a + a` => `a*4` => `a<<2`
> - `a*3 + a` => `a*4` => `a<<2`
> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4` => `a<<2`
> 
> See included IR unit tests for more.
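The rewrites listed above rely on Java `int` arithmetic wrapping modulo 2^32, so the additive, multiplicative and shifted forms stay equal even when the sum overflows. A minimal sketch that checks this at the source level follows. It is not taken from the PR or its IR tests, the class and method names are hypothetical, and it says nothing about which form C2 actually emits.

```java
// Hypothetical demo class, for illustration only (not part of the PR).
class SerialAddDemo {
    static int addThree(int a)     { return a + a + a; }
    static int shiftPlusOne(int a) { return (a << 1) + a; }      // a*3 as shift + add

    static int addFour(int a)      { return a + a + a + a; }
    static int mulThenAdd(int a)   { return a * 3 + a; }
    static int shiftThenAdd(int a) { return (a << 1) + a + a; }
    static int shiftByTwo(int a)   { return a << 2; }            // a*4 as a single shift

    public static void main(String[] args) {
        // Includes values whose sums overflow, to show the identities still hold.
        int[] samples = {0, 1, -1, 42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            boolean threeOk = addThree(a) == a * 3 && a * 3 == shiftPlusOne(a);
            boolean fourOk = addFour(a) == shiftByTwo(a)
                    && mulThenAdd(a) == shiftByTwo(a)
                    && shiftThenAdd(a) == shiftByTwo(a);
            System.out.println("a=" + a + " threeOk=" + threeOk + " fourOk=" + fourOk);
        }
    }
}
```

Because every form is equal for all `int` inputs, the compiler is free to pick whichever shape is cheapest without changing observable results.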
Kangcheng Xu has updated the pull request incrementally with two additional commits since the last revision: - remove trailing empty comments - fix comment grammar Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20754/files - new: https://git.openjdk.org/jdk/pull/20754/files/ecee68ce..b5bc4f92 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=08-09 Stats: 29 lines in 2 files changed: 0 ins; 0 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/20754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20754/head:pull/20754 PR: https://git.openjdk.org/jdk/pull/20754 From kxu at openjdk.org Wed Oct 9 02:52:38 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 9 Oct 2024 02:52:38 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v9] In-Reply-To: References: Message-ID: On Tue, 8 Oct 2024 15:48:55 GMT, Christian Hagedorn wrote: > Testing looked good. Thank you @chhagedorn. Could please grant an approval once again (after updates only to comments) so we can merge this? Thanks! > test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 65: > >> 63: "mulAndAddToZero", // >> 64: "mulAndAddToMinus1", // >> 65: "mulAndAddToMinus42" // > > Why did you add the trailing `//`? Those are added to prevent formatter from collapsing these lines to one. I've gone ahead to remove them. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20754#issuecomment-2401163656 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1792714150 From chagedorn at openjdk.org Wed Oct 9 05:29:59 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 9 Oct 2024 05:29:59 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v10] In-Reply-To: <708mNBltFukvzi1tAy1jileWyCS80mGr6UJ2vBAds9E=.7b13b64f-eb7e-440d-9439-fca40b327032@github.com> References: <708mNBltFukvzi1tAy1jileWyCS80mGr6UJ2vBAds9E=.7b13b64f-eb7e-440d-9439-fca40b327032@github.com> Message-ID: On Wed, 9 Oct 2024 02:52:38 GMT, Kangcheng Xu wrote: >> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. >> >> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). >> >> Some notable examples of this transformation include: >> - `a + a + a` => `a*3` => `(a<<1) + a` >> - `a + a + a + a` => `a*4` => `a<<2` >> - `a*3 + a` => `a*4` => `a<<2` >> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` >> >> See included IR unit tests for more. > > Kangcheng Xu has updated the pull request incrementally with two additional commits since the last revision: > > - remove trailing empty comments > - fix comment grammar > > Co-authored-by: Christian Hagedorn Still good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20754#pullrequestreview-2356049327 From thartmann at openjdk.org Wed Oct 9 07:03:02 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Oct 2024 07:03:02 GMT Subject: RFR: 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking [v3] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 07:42:54 GMT, Christian Hagedorn wrote: >> This patch introduces new predicate classes which implement a new `Predicate` interface. These classes represent the different predicates in the C2 IR. They are used in combination with new predicate iterator and visitors classes to provide an easy way to walk and process predicates in the IR. >> >> ### Predicate Interfaces and Implementing Classes >> - `Predicate` interface is implemented by four predicate classes: >> - `ParsePredicate` (existing class) >> - `RuntimePredicate` (existing and updated class) >> - `TemplateAssertionPredicate` (new class) >> - `InitializedAssertionPredicate` (new class, renamed old `InitializedAssertionPredicate` class to `InitializedAssertionPredicateCreator`) >> >> ### Predicate Iterator with Visitor classes >> There is a new `PredicateIterator` class which can be used to iterate through the predicates of a loop. For each predicate, a `PredicateVisitor` can be applied. The user can implement the `PredicateIterator` interface and override the default do-nothing implementations to the specific needs of the code. I've done this for a couple of places in the code by defining new visitors: >> - `ParsePredicateUsefulMarker`: This visitor marks all Parse Predicates as useful. >> - Replaces the old now retired `ParsePredicateIterator`. >> - `DominatedPredicates`: This visitor checks the dominance relation to an `early` node when trying to figure out the latest legal placement for a node. 
The goal is to skip as many predicates as possible to avoid interference with Loop Predication and/or creating a Loop Limit Check Predicate. >> - Replaces the old now retired `PredicateEntryIterator`. >> - `Predicates::dump()`: Newly added dumping code for the predicates above a loop which uses a new `PredicatePrinter` visitor. This helps debugging issues with predicates. >> >> #### To Be Replaced soon >> There are a couple of places where we use similar code to walk predicates and apply some transformation/modifications to the IR. The goal is to replace these locations with the new visitors as well. This will incrementally be done with the next couple of PRs. >> >> ### More Information >> More information about specific classes and changes can be found as code comments and PR comments. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add missing public for UnifiedPredicateVisitor Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21161#pullrequestreview-2356226816 From duke at openjdk.org Wed Oct 9 07:04:09 2024 From: duke at openjdk.org (Daniel Skantz) Date: Wed, 9 Oct 2024 07:04:09 GMT Subject: Integrated: 8330157: C2: Add a stress flag for bailouts In-Reply-To: References: Message-ID: On Tue, 11 Jun 2024 07:14:20 GMT, Daniel Skantz wrote: > This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). > > We check two invariants. > a) Bailouts should be successful starting from any given `failing()` check. 
> b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). > > a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. > > The added flag should not have any effect in debug mode. > > Testing: > > T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. This pull request has now been integrated. Changeset: d3f3c6a8 Author: Daniel Skantz URL: https://git.openjdk.org/jdk/commit/d3f3c6a8cdf862df3a72f60c824ce50d37231061 Stats: 201 lines in 17 files changed: 167 ins; 0 del; 34 mod 8330157: C2: Add a stress flag for bailouts Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/19646 From roland at openjdk.org Wed Oct 9 07:19:00 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 9 Oct 2024 07:19:00 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert [v2] In-Reply-To: References: Message-ID: <6ZPkkz5717Lsy7F4KF4qKkWJkM0qoOXgdcCUeFlwvm0=.e6b22c97-df8b-4b20-8cfc-7041d71450b2@github.com> On Tue, 1 Oct 2024 13:22:23 GMT, Roland Westrelin wrote: >> When converting a `LongCountedLoop` into a loop nest, c2 needs jvm >> state to add predicates to the inner loop. For that, it peels an >> iteration of the loop and uses the state of the safepoint at the end >> of the loop. That's only legal if there's no side effect between the >> safepoint and the backedge that goes back into the loop. The assert >> failure here happens in code that checks that. >> >> That code compares the memory states at the safepoint and at the >> backedge. If they are the same then there's no side effect. To check >> consistency, the `MergeMem` at the safepoint is cloned. 
As the logic >> iterates over the backedge state, it clears every component of the >> state it encounters from the `MergeMem`. Once done, the cloned >> `MergeMem` should be "empty". In the case of this failure, no side >> effect is found but the cloned `MergeMem` is not empty. That happens >> because of EA: it adds edges to the `MergeMem` at the safepoint that >> it doesn't add to the backedge `Phis`. >> >> So it's the verification code that fails and I propose dealing with >> this by ignoring memory state added by EA in the verification code. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8336702 > - test indentation > - fix & test Anyone else for a review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21009#issuecomment-2401525131 From roland at openjdk.org Wed Oct 9 07:19:00 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 9 Oct 2024 07:19:00 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v4] In-Reply-To: <4ckwGsbjH0VDbGmBBTuG4HUc6ARwbISQ7L8xsVCeqDs=.28d696f2-47ba-4323-855d-e71369242876@github.com> References: <4ckwGsbjH0VDbGmBBTuG4HUc6ARwbISQ7L8xsVCeqDs=.28d696f2-47ba-4323-855d-e71369242876@github.com> Message-ID: <9txXoz-5H9NuAz9aubEzaYp-VLYNXIJCFe_InxZ3-zQ=.94c9facc-c44d-4766-851a-4bf31e7ba76f@github.com> On Mon, 7 Oct 2024 06:00:57 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java >> >> Co-authored-by: Christian Hagedorn > > Marked as reviewed by thartmann (Reviewer). @TobiHartmann Do you have an update on testing? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21303#issuecomment-2401523971 From thartmann at openjdk.org Wed Oct 9 07:31:02 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Oct 2024 07:31:02 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v4] In-Reply-To: References: Message-ID: <3okDpS1wmMEyxWhDIJyCNb2jmSGic_7GWhB3KPt4VdA=.b1eb8167-0b28-4a57-af72-5e50cf3fea74@github.com> On Wed, 2 Oct 2024 13:58:14 GMT, Roland Westrelin wrote: >> The patch includes 2 test cases for this: test1() causes the assert >> failure in the bug description, test2() causes an incorrect execution >> where a load floats above a store that it should be dependent on. >> >> In the test cases, `field` is accessed on object `a` of type `A`. When >> the field is accessed, the type that c2 has for `a` is `A` with >> interface `I`. The holder of the field is class `A` which implements >> no interface. The reason the type of `a` and the type of the holder >> are slightly different is because `a` is the result of a merge of >> objects of subclasses `B` and `C` which implements `I`. >> >> The root cause of the bug is that `Compile::flatten_alias_type()` >> doesn't change `A` + interface `I` into `A`, the actual holder of the >> field. So `field` in `A` + interface `I` and `field` in `A` get >> different slices which is wrong. At parse time, the logic that creates >> the `Store` node uses: >> >> >> C->alias_type(field)->adr_type() >> >> >> to compute the slice which is the slice for `field` in `A`. So the >> slice used at parse time is the right one but during igvn, when the >> slice is computed from the input address, a different slice (the one >> for `A` + interface `I`) is used. That causes load/store nodes when >> they are processed by igvn to use the wrong memory state. 
>> >> In `Compile::flatten_alias_type()`: >> >> >> if (!ik->equals(canonical_holder) || tj->offset() != offset) { >> if( is_known_inst ) { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); >> } else { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); >> } >> } >> >> >> only flattens the type if it's not the canonical holder but it should >> test that the type doesn't implement interfaces that the canonical >> holder doesn't. To keep the logic simple, the fix I propose creates a >> new type whenever there's a chance that a type implements extra >> interfaces (the type is not exact). >> >> I also added asserts in `GraphKit::make_load()` and >> `GraphKit::store_to_memory()` to make sure the slice that is passed >> and the address type agree. Those asserts fire with the new test >> cases. When running testing, I found that they also catch a few cases >> in `library_call.cpp` where an incorrect slice is passed. >> >> As further clean up, maybe we want to drop the slice argument to >> `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to >> their callers) given it's redundant with th... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java > > Co-authored-by: Christian Hagedorn Sorry, that slipped through. Testing looked good. Let me re-run some quick testing with the latest updates. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21303#issuecomment-2401546792 From chagedorn at openjdk.org Wed Oct 9 08:03:09 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 9 Oct 2024 08:03:09 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert [v2] In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 13:22:23 GMT, Roland Westrelin wrote: >> When converting a `LongCountedLoop` into a loop nest, c2 needs jvm >> state to add predicates to the inner loop. For that, it peels an >> iteration of the loop and uses the state of the safepoint at the end >> of the loop. That's only legal if there's no side effect between the >> safepoint and the backedge that goes back into the loop. The assert >> failure here happens in code that checks that. >> >> That code compares the memory states at the safepoint and at the >> backedge. If they are the same then there's no side effect. To check >> consistency, the `MergeMem` at the safepoint is cloned. As the logic >> iterates over the backedge state, it clears every component of the >> state it encounters from the `MergeMem`. Once done, the cloned >> `MergeMem` should be "empty". In the case of this failure, no side >> effect is found but the cloned `MergeMem` is not empty. That happens >> because of EA: it adds edges to the `MergeMem` at the safepoint that >> it doesn't add to the backedge `Phis`. >> >> So it's the verification code that fails and I propose dealing with >> this by ignoring memory state added by EA in the verification code. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8336702 > - test indentation > - fix & test Looks good to me, too! 
------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21009#pullrequestreview-2356368812 From duke at openjdk.org Wed Oct 9 09:13:37 2024 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane) Date: Wed, 9 Oct 2024 09:13:37 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v6] In-Reply-To: References: Message-ID: > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. > > To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. > > The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. > > This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. 
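The inclusion rule described above can be sketched roughly as follows. This is a minimal model under stated assumptions: the names `DefaultDecoratorRule`, `rule_matches` and `effective_decorators` are hypothetical and are not taken from the patch, and targeting of a specific log level is left out. It only illustrates that a tag-specific default applies when every tag of the rule is part of the -Xlog selection, and that decorators supplied by the user are never overridden.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a tag-specific default; not HotSpot's actual data structure.
struct DefaultDecoratorRule {
    std::set<std::string> tags;            // tags the rule targets, e.g. {"jit"}
    std::vector<std::string> decorators;   // decorators to use on a match; empty means none
};

// Inclusion rule: every tag of the rule must appear in the -Xlog tag selection.
static bool rule_matches(const DefaultDecoratorRule& rule,
                         const std::set<std::string>& selection) {
    for (const auto& tag : rule.tags) {
        if (selection.count(tag) == 0) {
            return false;
        }
    }
    return true;
}

// user_decorators == nullptr models "no decorators supplied on the command line".
static std::vector<std::string> effective_decorators(
        const std::set<std::string>& selection,
        const std::vector<std::string>* user_decorators,
        const std::vector<DefaultDecoratorRule>& rules) {
    if (user_decorators != nullptr) {
        return *user_decorators;           // user input always wins
    }
    for (const auto& rule : rules) {
        if (rule_matches(rule, selection)) {
            return rule.decorators;        // first matching tag-specific default
        }
    }
    return {"uptime", "level", "tags"};    // the three general defaults named above
}
```

With a single rule for `jit` that disables decorators, a selection of `jit+compilation` would yield no decorators, while a selection of `gc` would keep the three general defaults.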
Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: Replace LogDecorators::Decorator::NoDecorator with LogDecorators::None ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21383/files - new: https://git.openjdk.org/jdk/pull/21383/files/a80d5fe1..5c933c06 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=04-05 Stats: 13 lines in 3 files changed: 0 ins; 11 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21383/head:pull/21383 PR: https://git.openjdk.org/jdk/pull/21383 From jbhateja at openjdk.org Wed Oct 9 09:59:11 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 9 Oct 2024 09:59:11 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: > This patch optimizes LongVector multiplication by inferring the VPMULUDQ instruction for the following IR patterns. > > > MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) > MulL (URShift SRC1 , 32) (URShift SRC2, 32) > MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) > MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) > > > > A 64x64 bit multiplication produces a 128 bit result, and can be performed by individually multiplying the upper and lower double words of the multiplier with the multiplicand and assembling the partial products to compute the full width result. Targets supporting vector quadword multiplication have separate instructions to compute the upper and lower quadwords of the 128 bit result. Therefore the existing VectorAPI multiplication operator expects shape conformance between source and result vectors.
> > If the upper 32 bits of the quadword multiplier and multiplicand are always set to zero, then the result of the multiplication depends only on the partial product of their lower double words and can be performed using an unsigned 32 bit multiplication instruction with quadword saturation. The patch matches this pattern in a target dependent manner without introducing a new IR node. > > The VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces a 64 bit result. It has much lower latency than the full 64 bit multiplication instruction "VPMULLQ". In addition, non-AVX512DQ targets do not support direct quadword multiplication, thus we can save the redundant partial products for zeroed out upper 32 bits. This results in throughput improvements on both P and E core Xeons. > > Please find below the performance of the [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:- > > > Sierra Forest :- > ============ > Baseline:- > Benchmark (SIZE) Mode Cnt Score Error Units > VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms > > With Optimization:- > Benchmark (SIZE) Mode Cnt Score Error Units > VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 1299.407 ops/ms > VectorXXH3HashingB... Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains two commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction ------------- Changes: https://git.openjdk.org/jdk/pull/21244/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=01 Stats: 354 lines in 12 files changed: 343 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/21244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21244/head:pull/21244 PR: https://git.openjdk.org/jdk/pull/21244 From jbhateja at openjdk.org Wed Oct 9 10:11:03 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 9 Oct 2024 10:11:03 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring the VPMULUDQ instruction for the following IR patterns. >> >> >> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> >> >> A 64x64 bit multiplication produces a 128 bit result, and can be performed by individually multiplying the upper and lower double words of the multiplier with the multiplicand and assembling the partial products to compute the full width result. Targets supporting vector quadword multiplication have separate instructions to compute the upper and lower quadwords of the 128 bit result. Therefore the existing VectorAPI multiplication operator expects shape conformance between source and result vectors.
>> >> If the upper 32 bits of the quadword multiplier and multiplicand are always set to zero, then the result of the multiplication depends only on the partial product of their lower double words and can be performed using an unsigned 32 bit multiplication instruction with quadword saturation. The patch matches this pattern in a target dependent manner without introducing a new IR node. >> >> The VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces a 64 bit result. It has much lower latency than the full 64 bit multiplication instruction "VPMULLQ". In addition, non-AVX512DQ targets do not support direct quadword multiplication, thus we can save the redundant partial products for zeroed out upper 32 bits. This results in throughput improvements on both P and E core Xeons. >> >> Please find below the performance of the [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimization:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 > - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction Hi @iwanowww, @sviswa7, @merykitty, Can you kindly review this?
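The arithmetic behind the four IR patterns quoted above can be checked with a small scalar model. This is only a sketch of one 64-bit lane, not the patch's matcher code; the helper names `mul_low32_unsigned`, `mul_full64` and `patterns_agree` are made up for illustration. It shows that once `And` with 0xFFFFFFFF or `URShift` by 32 has cleared the upper half of both operands, an unsigned 32x32 to 64 bit multiply gives the same value as the full 64-bit multiply.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one 64-bit lane: multiply the low 32 bits of each operand
// as unsigned values, keeping the full 64-bit product.
static uint64_t mul_low32_unsigned(uint64_t a, uint64_t b) {
    return static_cast<uint64_t>(static_cast<uint32_t>(a)) *
           static_cast<uint64_t>(static_cast<uint32_t>(b));
}

// Reference: ordinary 64-bit multiply (low 64 bits of the product).
static uint64_t mul_full64(uint64_t a, uint64_t b) {
    return a * b;
}

// True when the narrow and the full multiply agree for all four operand
// combinations whose upper 32 bits are known to be zero.
static bool patterns_agree(uint64_t src1, uint64_t src2) {
    const uint64_t mask = 0xFFFFFFFFULL;
    const uint64_t lo1 = src1 & mask;   // And SRC1, 0xFFFFFFFF
    const uint64_t lo2 = src2 & mask;   // And SRC2, 0xFFFFFFFF
    const uint64_t hi1 = src1 >> 32;    // URShift SRC1, 32
    const uint64_t hi2 = src2 >> 32;    // URShift SRC2, 32
    return mul_full64(lo1, lo2) == mul_low32_unsigned(lo1, lo2) &&
           mul_full64(hi1, hi2) == mul_low32_unsigned(hi1, hi2) &&
           mul_full64(hi1, lo2) == mul_low32_unsigned(hi1, lo2) &&
           mul_full64(lo1, hi2) == mul_low32_unsigned(lo1, hi2);
}
```

Without the masking or shifting the two multiplies differ in general, for example for the operands 0x100000001 and 2, which is why the transformation is restricted to these patterns.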
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2401895553 From jbhateja at openjdk.org Wed Oct 9 10:12:57 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 9 Oct 2024 10:12:57 GMT Subject: RFR: 8341832: Incorrect continuation address of synthetic SIGSEGV for APX in product builds In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 08:09:29 GMT, Jatin Bhateja wrote: > - Enable APX EGPRs state save restoration check which triggers synthetic SIGSEGV and verifies modified EGPRs state across OS signal handling for non-product builds to match with [corresponding logic in signal handlers.](https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp#L251) > > - Currently we haven't enabled APX support in product builds and intend to do so once entire planned support ([JDK-8329030](https://bugs.openjdk.org/browse/JDK-8329030)) is validated and checked into JDK-mainline, we are following incremental development approach for APX and hence don't want partial APX support to be enabled in intermediate releases. > > Kindly review. > > Best Regards, > Jatin Hi @TobiHartmann , @vnkozlov , @sviswa7 can you kindly check this small patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21419#issuecomment-2401895714 From thartmann at openjdk.org Wed Oct 9 10:36:58 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Oct 2024 10:36:58 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v4] In-Reply-To: References: Message-ID: On Wed, 2 Oct 2024 13:58:14 GMT, Roland Westrelin wrote: >> The patch includes 2 test cases for this: test1() causes the assert >> failure in the bug description, test2() causes an incorrect execution >> where a load floats above a store that it should be dependent on. >> >> In the test cases, `field` is accessed on object `a` of type `A`. 
When >> the field is accessed, the type that c2 has for `a` is `A` with >> interface `I`. The holder of the field is class `A` which implements >> no interface. The reason the type of `a` and the type of the holder >> are slightly different is because `a` is the result of a merge of >> objects of subclasses `B` and `C` which implements `I`. >> >> The root cause of the bug is that `Compile::flatten_alias_type()` >> doesn't change `A` + interface `I` into `A`, the actual holder of the >> field. So `field` in `A` + interface `I` and `field` in `A` get >> different slices which is wrong. At parse time, the logic that creates >> the `Store` node uses: >> >> >> C->alias_type(field)->adr_type() >> >> >> to compute the slice which is the slice for `field` in `A`. So the >> slice used at parse time is the right one but during igvn, when the >> slice is computed from the input address, a different slice (the one >> for `A` + interface `I`) is used. That causes load/store nodes when >> they are processed by igvn to use the wrong memory state. >> >> In `Compile::flatten_alias_type()`: >> >> >> if (!ik->equals(canonical_holder) || tj->offset() != offset) { >> if( is_known_inst ) { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); >> } else { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); >> } >> } >> >> >> only flattens the type if it's not the canonical holder but it should >> test that the type doesn't implement interfaces that the canonical >> holder doesn't. To keep the logic simple, the fix I propose creates a >> new type whenever there's a chance that a type implements extra >> interfaces (the type is not exact). >> >> I also added asserts in `GraphKit::make_load()` and >> `GraphKit::store_to_memory()` to make sure the slice that is passed >> and the address type agree. Those asserts fire with the new test >> cases. 
When running testing, I found that they also catch a few cases >> in `library_call.cpp` where an incorrect slice is passed. >> >> As further clean up, maybe we want to drop the slice argument to >> `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to >> their callers) given it's redundant with th... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java > > Co-authored-by: Christian Hagedorn All testing passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21303#issuecomment-2401946988 From thartmann at openjdk.org Wed Oct 9 11:24:59 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Oct 2024 11:24:59 GMT Subject: RFR: 8341832: Incorrect continuation address of synthetic SIGSEGV for APX in product builds In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 08:09:29 GMT, Jatin Bhateja wrote: > - Enable APX EGPRs state save restoration check which triggers synthetic SIGSEGV and verifies modified EGPRs state across OS signal handling for non-product builds to match with [corresponding logic in signal handlers.](https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp#L251) > > - Currently we haven't enabled APX support in product builds and intend to do so once entire planned support ([JDK-8329030](https://bugs.openjdk.org/browse/JDK-8329030)) is validated and checked into JDK-mainline, we are following incremental development approach for APX and hence don't want partial APX support to be enabled in intermediate releases. > > Kindly review. > > Best Regards, > Jatin Looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21419#pullrequestreview-2356853675 From chagedorn at openjdk.org Wed Oct 9 11:45:12 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 9 Oct 2024 11:45:12 GMT Subject: RFR: 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking [v3] In-Reply-To: References: Message-ID: <2g1Fk2CW1jURKwEvmKKbHsBWlBlTQzLsdOwwDRpoboM=.0bb892bf-74fe-4a34-9bce-1b19ec641b58@github.com> On Thu, 26 Sep 2024 07:42:54 GMT, Christian Hagedorn wrote: >> This patch introduces new predicate classes which implement a new `Predicate` interface. These classes represent the different predicates in the C2 IR. They are used in combination with new predicate iterator and visitors classes to provide an easy way to walk and process predicates in the IR. >> >> ### Predicate Interfaces and Implementing Classes >> - `Predicate` interface is implemented by four predicate classes: >> - `ParsePredicate` (existing class) >> - `RuntimePredicate` (existing and updated class) >> - `TemplateAssertionPredicate` (new class) >> - `InitializedAssertionPredicate` (new class, renamed old `InitializedAssertionPredicate` class to `InitializedAssertionPredicateCreator`) >> >> ### Predicate Iterator with Visitor classes >> There is a new `PredicateIterator` class which can be used to iterate through the predicates of a loop. For each predicate, a `PredicateVisitor` can be applied. The user can implement the `PredicateIterator` interface and override the default do-nothing implementations to the specific needs of the code. I've done this for a couple of places in the code by defining new visitors: >> - `ParsePredicateUsefulMarker`: This visitor marks all Parse Predicates as useful. >> - Replaces the old now retired `ParsePredicateIterator`. >> - `DominatedPredicates`: This visitor checks the dominance relation to an `early` node when trying to figure out the latest legal placement for a node. 
The goal is to skip as many predicates as possible to avoid interference with Loop Predication and/or creating a Loop Limit Check Predicate. >> - Replaces the old now retired `PredicateEntryIterator`. >> - `Predicates::dump()`: Newly added dumping code for the predicates above a loop which uses a new `PredicatePrinter` visitor. This helps debugging issues with predicates. >> >> #### To Be Replaced soon >> There are a couple of places where we use similar code to walk predicates and apply some transformation/modifications to the IR. The goal is to replace these locations with the new visitors as well. This will incrementally be done with the next couple of PRs. >> >> ### More Information >> More information about specific classes and changes can be found as code comments and PR comments. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add missing public for UnifiedPredicateVisitor Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21161#issuecomment-2402086101 From chagedorn at openjdk.org Wed Oct 9 11:45:13 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 9 Oct 2024 11:45:13 GMT Subject: Integrated: 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 14:19:41 GMT, Christian Hagedorn wrote: > This patch introduces new predicate classes which implement a new `Predicate` interface. These classes represent the different predicates in the C2 IR. They are used in combination with new predicate iterator and visitors classes to provide an easy way to walk and process predicates in the IR. 
> > ### Predicate Interfaces and Implementing Classes > - `Predicate` interface is implemented by four predicate classes: > - `ParsePredicate` (existing class) > - `RuntimePredicate` (existing and updated class) > - `TemplateAssertionPredicate` (new class) > - `InitializedAssertionPredicate` (new class, renamed old `InitializedAssertionPredicate` class to `InitializedAssertionPredicateCreator`) > > ### Predicate Iterator with Visitor classes > There is a new `PredicateIterator` class which can be used to iterate through the predicates of a loop. For each predicate, a `PredicateVisitor` can be applied. The user can implement the `PredicateIterator` interface and override the default do-nothing implementations to the specific needs of the code. I've done this for a couple of places in the code by defining new visitors: > - `ParsePredicateUsefulMarker`: This visitor marks all Parse Predicates as useful. > - Replaces the old now retired `ParsePredicateIterator`. > - `DominatedPredicates`: This visitor checks the dominance relation to an `early` node when trying to figure out the latest legal placement for a node. The goal is to skip as many predicates as possible to avoid interference with Loop Predication and/or creating a Loop Limit Check Predicate. > - Replaces the old now retired `PredicateEntryIterator`. > - `Predicates::dump()`: Newly added dumping code for the predicates above a loop which uses a new `PredicatePrinter` visitor. This helps debugging issues with predicates. > > #### To Be Replaced soon > There are a couple of places where we use similar code to walk predicates and apply some transformation/modifications to the IR. The goal is to replace these locations with the new visitors as well. This will incrementally be done with the next couple of PRs. > > ### More Information > More information about specific classes and changes can be found as code comments and PR comments. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 3fba1702 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/3fba1702cd8dc817b11bfa51077c41424d289281 Stats: 566 lines in 4 files changed: 420 ins; 62 del; 84 mod 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking Reviewed-by: roland, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/21161 From duke at openjdk.org Wed Oct 9 12:21:58 2024 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane) Date: Wed, 9 Oct 2024 12:21:58 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v6] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 13:20:36 GMT, Antón Seoane wrote: >> There is `LogDecorators::None` > > LogDecorators::None is defined in the .cpp, so I'd either have to make it "visible" or use the alternative NoDecorators. Both options are fine for me I am using `LogDecorators::None` now, I think it is cleaner than the `NoDecorators` alternative ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1793418398 From stefank at openjdk.org Wed Oct 9 13:42:12 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 9 Oct 2024 13:42:12 GMT Subject: RFR: 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 13:11:58 GMT, Fredrik Bredberg wrote: > This bug was created in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > `C2_MacroAssembler::fast_unlock_lightweight()` on x86 issues a `testl(monitor, monitor);` instruction for the sole purpose of clearing the zero-flag, which should force us to go into the slow path. > > However, this instruction incorrectly only checks the lower 32-bits, which results in setting the zero-flag if the ObjectMonitor has all-zeros in the lower 32-bits. For some reason this seems to be quite common on macosx-x64, where we tend to get an ObjectMonitor address that is 0x0000600000000000.
> > The reason we wanted to go into the slow path was that we've observed that there is a thread queued on either the EntryList or cxq, and there is no successor. However since we failed to clear the zero-flag, we will go into the fast path and no one will wake up the stranded thread. Thus the system will hang and any test system will timeout. Looks good to me. ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21422#pullrequestreview-2357165925 From fbredberg at openjdk.org Wed Oct 9 13:42:12 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 9 Oct 2024 13:42:12 GMT Subject: RFR: 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 Message-ID: This bug was created in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). `C2_MacroAssembler::fast_unlock_lightweight()` on x86 issues a `testl(monitor, monitor);` instruction for the sole purpose of clearing the zero-flag, which should force us to go into the slow path. However, this instruction incorrectly only checks the lower 32-bits, which results in setting the zero-flag if the ObjectMonitor has all-zeros in the lower 32-bits. For some reason this seems to be quite common on macosx-x64, where we tend to get an ObjectMonitor address that is 0x0000600000000000. The reason we wanted to go into the slow path was that we've observed that there is a thread queued on either the EntryList or cxq, and there is no successor. However since we failed to clear the zero-flag, we will go into the fast path and no one will wake up the stranded thread. Thus the system will hang and any test system will timeout. 
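The effect described above can be reproduced with a small scalar model of the flag computation. This is a sketch only: `zf_after_testl` and `zf_after_testq` are hypothetical helper names that mimic how a 32-bit and a 64-bit x86 `test reg, reg` set the zero flag, and the message does not say which instruction the fix actually uses, so the 64-bit variant is shown purely for contrast.

```cpp
#include <cassert>
#include <cstdint>

// Model of a 32-bit `test reg, reg`: only the low 32 bits are ANDed,
// and ZF is set when that result is zero.
static bool zf_after_testl(uint64_t reg) {
    return static_cast<uint32_t>(reg) == 0;
}

// Model of a 64-bit `test reg, reg`: the whole register is considered.
static bool zf_after_testq(uint64_t reg) {
    return reg == 0;
}

// The ObjectMonitor address quoted in the report: non-null, but with
// all zeros in the lower 32 bits.
static const uint64_t kMonitorAddress = 0x0000600000000000ULL;
```

For `kMonitorAddress` the 32-bit model reports ZF set even though the pointer is non-null, which corresponds to the case where the code takes the fast path by mistake; the 64-bit model reports ZF clear.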
------------- Commit messages: - 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 Changes: https://git.openjdk.org/jdk/pull/21422/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21422&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341854 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21422.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21422/head:pull/21422 PR: https://git.openjdk.org/jdk/pull/21422 From aboldtch at openjdk.org Wed Oct 9 13:42:12 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 9 Oct 2024 13:42:12 GMT Subject: RFR: 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 13:11:58 GMT, Fredrik Bredberg wrote: > This bug was created in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > `C2_MacroAssembler::fast_unlock_lightweight()` on x86 issues a `testl(monitor, monitor);` instruction for the sole purpose of clearing the zero-flag, which should force us to go into the slow path. > > However, this instruction incorrectly only checks the lower 32-bits, which results in setting the zero-flag if the ObjectMonitor has all-zeros in the lower 32-bits. For some reason this seems to be quite common on macosx-x64, where we tend to get an ObjectMonitor address that is 0x0000600000000000. > > The reason we wanted to go into the slow path was that we've observed that there is a thread queued on either the EntryList or cxq, and there is no successor. However since we failed to clear the zero-flag, we will go into the fast path and no one will wake up the stranded thread. Thus the system will hang and any test system will timeout. lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21422#pullrequestreview-2357186012 From pchilanomate at openjdk.org Wed Oct 9 13:44:59 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 9 Oct 2024 13:44:59 GMT Subject: RFR: 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 13:11:58 GMT, Fredrik Bredberg wrote: > This bug was created in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > `C2_MacroAssembler::fast_unlock_lightweight()` on x86 issues a `testl(monitor, monitor);` instruction for the sole purpose of clearing the zero-flag, which should force us to go into the slow path. > > However, this instruction incorrectly only checks the lower 32-bits, which results in setting the zero-flag if the ObjectMonitor has all-zeros in the lower 32-bits. For some reason this seems to be quite common on macosx-x64, where we tend to get an ObjectMonitor address that is 0x0000600000000000. > > The reason we wanted to go into the slow path was that we've observed that there is a thread queued on either the EntryList or cxq, and there is no successor. However since we failed to clear the zero-flag, we will go into the fast path and no one will wake up the stranded thread. Thus the system will hang and any test system will timeout. Looks good, thanks for fixing this. ------------- Marked as reviewed by pchilanomate (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21422#pullrequestreview-2357206121 From kbarrett at openjdk.org Wed Oct 9 14:57:37 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 9 Oct 2024 14:57:37 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v3] In-Reply-To: References: Message-ID: > Please review this change to TypeRawPtr::add_offset to prevent a compiler from > inferring things based on prior pointer arithmetic not invoking UB. As noted in > the bug report, clang is actually doing this. 
> > To accomplish this, changed to integral arithmetic. Also added over/underflow > checks. > > Also made a couple of minor touchups. Replaced an implicit conversion to bool > with an explicit compare to nullptr (per style guide). Removed a no longer > needed dummy return after a (now) noreturn function. > > Testing: mach5 tier1-7 > That testing was with calls to "fatal" for the over/underflow cases and the > sum==0 case. There were no hits. I'm not sure how to construct a test that > would hit those. Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: remove unreachable TypePtr::Null case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21324/files - new: https://git.openjdk.org/jdk/pull/21324/files/cc1f2ac8..c3dc62e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21324&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21324&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21324.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21324/head:pull/21324 PR: https://git.openjdk.org/jdk/pull/21324 From kbarrett at openjdk.org Wed Oct 9 14:57:38 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 9 Oct 2024 14:57:38 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> <5LaZcZeK2ShpVHDF5LuK1m_Z0RLeQOIdHYXd9t_Vl5c=.ed093c6b-3949-4757-ba4b-a963067ce0ab@github.com> Message-ID: On Tue, 8 Oct 2024 18:32:54 GMT, Dean Long wrote: >> Oh, you are right. And TypeRawPtr::make asserts the PTR is neither Constant nor Null. Which makes >> both switch cases under modification here supposedly unreachable. That would explain why I never hit >> either after running lots of tests. 
All of the change proposed here can be eliminated, and instead change >> both cases to fall through to the default ShouldNotReachHere(). (And that would be another way to >> remove the -Wzero-as-null-pointer-constant warning that was how I got here in the first place. :) ) > > There's TypeRawPtr::make(enum PTR ptr) which doesn't allow Constant or Null, but we are using TypeRawPtr::make(address bits) here. > We may need to keep the Constant case. I wouldn't be surprised if there was a way to trigger that path using Unsafe. Yeah, keeping it makes sense. I've removed the TypePtr::Null case, allowing that one to default to ShouldNotReachHere(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1793675908 From roland at openjdk.org Wed Oct 9 15:02:09 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 9 Oct 2024 15:02:09 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v4] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 10:33:57 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java >> >> Co-authored-by: Christian Hagedorn > > All testing passed. @TobiHartmann @chhagedorn thanks for the reviews.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21303#issuecomment-2402582761 From roland at openjdk.org Wed Oct 9 15:02:12 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 9 Oct 2024 15:02:12 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert [v2] In-Reply-To: References: Message-ID: <0TN0cCdVnmSXZIojKFYweszMATqugx7m7TSfXSSF5X8=.0db400f0-b313-41c4-8fdf-9f321217c250@github.com> On Wed, 2 Oct 2024 10:42:14 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8336702 >> - test indentation >> - fix & test > > Looks good to me. Testing passed. @TobiHartmann @chhagedorn thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21009#issuecomment-2402577712 From roland at openjdk.org Wed Oct 9 15:02:14 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 9 Oct 2024 15:02:14 GMT Subject: Integrated: 8336702: C2 compilation fails with "all memory state should have been processed" assert In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 08:34:44 GMT, Roland Westrelin wrote: > When converting a `LongCountedLoop` into a loop nest, c2 needs jvm > state to add predicates to the inner loop. For that, it peels an > iteration of the loop and uses the state of the safepoint at the end > of the loop. That's only legal if there's no side effect between the > safepoint and the backedge that goes back into the loop. The assert > failure here happens in code that checks that. > > That code compares the memory states at the safepoint and at the > backedge. If they are the same then there's no side effect. To check > consistency, the `MergeMem` at the safepoint is cloned. 
As the logic > iterates over the backedge state, it clears every component of the > state it encounters from the `MergeMem`. Once done, the cloned > `MergeMem` should be "empty". In the case of this failure, no side > effect is found but the cloned `MergeMem` is not empty. That happens > because of EA: it adds edges to the `MergeMem` at the safepoint that > it doesn't add to the backedge `Phis`. > > So it's the verification code that fails and I propose dealing with > this by ignoring memory state added by EA in the verification code. This pull request has now been integrated. Changeset: ecc77a5b Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/ecc77a5b4a84c84ffa1580174872af6df3a4f6ca Stats: 75 lines in 2 files changed: 73 ins; 0 del; 2 mod 8336702: C2 compilation fails with "all memory state should have been processed" assert Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/21009 From roland at openjdk.org Wed Oct 9 15:02:10 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 9 Oct 2024 15:02:10 GMT Subject: Integrated: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop In-Reply-To: References: Message-ID: <-9qrfoG6r9BG2K8oxSRpTImnaiQQCuXRiT4IggzqVkU=.2b3cf70a-76ec-46a8-b1f5-ecc63f407e53@github.com> On Wed, 2 Oct 2024 11:21:43 GMT, Roland Westrelin wrote: > The patch includes 2 test cases for this: test1() causes the assert > failure in the bug description, test2() causes an incorrect execution > where a load floats above a store that it should be dependent on. > > In the test cases, `field` is accessed on object `a` of type `A`. When > the field is accessed, the type that c2 has for `a` is `A` with > interface `I`. The holder of the field is class `A` which implements > no interface. 
The reason the type of `a` and the type of the holder > are slightly different is because `a` is the result of a merge of > objects of subclasses `B` and `C` which implements `I`. > > The root cause of the bug is that `Compile::flatten_alias_type()` > doesn't change `A` + interface `I` into `A`, the actual holder of the > field. So `field` in `A` + interface `I` and `field` in `A` get > different slices which is wrong. At parse time, the logic that creates > the `Store` node uses: > > > C->alias_type(field)->adr_type() > > > to compute the slice which is the slice for `field` in `A`. So the > slice used at parse time is the right one but during igvn, when the > slice is computed from the input address, a different slice (the one > for `A` + interface `I`) is used. That causes load/store nodes when > they are processed by igvn to use the wrong memory state. > > In `Compile::flatten_alias_type()`: > > > if (!ik->equals(canonical_holder) || tj->offset() != offset) { > if( is_known_inst ) { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); > } else { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); > } > } > > > only flattens the type if it's not the canonical holder but it should > test that the type doesn't implement interfaces that the canonical > holder doesn't. To keep the logic simple, the fix I propose creates a > new type whenever there's a chance that a type implements extra > interfaces (the type is not exact). > > I also added asserts in `GraphKit::make_load()` and > `GraphKit::store_to_memory()` to make sure the slice that is passed > and the address type agree. Those asserts fire with the new test > cases. When running testing, I found that they also catch a few cases > in `library_call.cpp` where an incorrect slice is passed. 
> > As further clean up, maybe we want to drop the slice argument to > `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to > their callers) given it's redundant with the address type and error > prone. This pull request has now been integrated. Changeset: ff2f39f2 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/ff2f39f24018436556a8956ec55da433dc697437 Stats: 124 lines in 4 files changed: 112 ins; 1 del; 11 mod 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/21303 From kxu at openjdk.org Wed Oct 9 15:11:12 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 9 Oct 2024 15:11:12 GMT Subject: Integrated: 8325495: C2: implement optimization for series of Add of unique value In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 19:27:29 GMT, Kangcheng Xu wrote: > This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. > > As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). > > Some notable examples of this transformation include: > - `a + a + a` => `a*3` => `(a<<1) + a` > - `a + a + a + a` => `a*4` => `a<<2` > - `a*3 + a` => `a*4` => `a<<2` > - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` > > See included IR unit tests for more. This pull request has now been integrated. 
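For readers following the 8325495 description quoted just above, the arithmetic identities behind the transformation can be checked at the Java level. This is only a minimal sketch of the equivalences (including under `int` overflow); it says nothing about what C2 actually emits, and the class and method names are hypothetical, not taken from the patch or its IR tests.

```java
public class AddSeriesIdentities {
    // a + a + a + a, the "series of Add of unique value" form
    static int addSeries(int a) { return a + a + a + a; }

    // n*a form the description says the series is converted into
    static int mul(int a) { return 4 * a; }

    // Power-of-two multiplications can be expressed as a shift
    static int shift(int a) { return a << 2; }

    // C * a + a, which the description says becomes (C+1) * a
    static int mulPlus(int a) { return a * 3 + a; }

    // (a << 1) + a + a; parenthesized because + binds tighter than << in Java
    static int shiftPlus(int a) { return (a << 1) + a + a; }

    // True when every form yields the same int for this input
    static boolean allAgree(int a) {
        int expected = addSeries(a);
        return mul(a) == expected
            && shift(a) == expected
            && mulPlus(a) == expected
            && shiftPlus(a) == expected;
    }

    public static void main(String[] args) {
        // Include extreme values so wrap-around behaviour is exercised too
        int[] samples = {0, 1, -1, 7, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            if (!allAgree(a)) {
                throw new AssertionError("forms disagree for a = " + a);
            }
        }
        System.out.println("all forms agree");
    }
}
```

Because Java `int` arithmetic wraps modulo 2^32, the forms agree for every input, which is what makes rewrites of this kind value-preserving.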
Changeset: c30ad012 Author: Kangcheng Xu URL: https://git.openjdk.org/jdk/commit/c30ad0124e7743f3a4c29ef901761f8fcc53de10 Stats: 414 lines in 3 files changed: 414 ins; 0 del; 0 mod 8325495: C2: implement optimization for series of Add of unique value Reviewed-by: chagedorn, roland ------------- PR: https://git.openjdk.org/jdk/pull/20754 From dcubed at openjdk.org Wed Oct 9 16:08:58 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 9 Oct 2024 16:08:58 GMT Subject: RFR: 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 13:11:58 GMT, Fredrik Bredberg wrote: > This bug was created in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > `C2_MacroAssembler::fast_unlock_lightweight()` on x86 issues a `testl(monitor, monitor);` instruction for the sole purpose of clearing the zero-flag, which should force us to go into the slow path. > > However, this instruction incorrectly only checks the lower 32-bits, which results in setting the zero-flag if the ObjectMonitor has all-zeros in the lower 32-bits. For some reason this seems to be quite common on macosx-x64, where we tend to get an ObjectMonitor address that is 0x0000600000000000. > > The reason we wanted to go into the slow path was that we've observed that there is a thread queued on either the EntryList or cxq, and there is no successor. However since we failed to clear the zero-flag, we will go into the fast path and no one will wake up the stranded thread. Thus the system will hang and any test system will timeout. > > Tested ok in tier1-3 on all x64 related platforms. Also ran the vm.lang.LockUnlock.testContendedLock test. Thumbs up. I think this is a trivial fix since the new instruction: `orl(t, 1)` is one of the well known ways to set ZF to 0. It's just the opposite of the well known way to set ZF to 1 used on L835 below: `xorl(t, t)`. ------------- Marked as reviewed by dcubed (Reviewer). 
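The zero-flag problem described in the 8341854 messages above can be modelled in plain Java. This is a hedged sketch of the flag arithmetic only, not HotSpot assembler code: the class and method names are hypothetical, and the helpers simply mirror how x86 sets ZF from the result of a 32-bit `test`, a 64-bit `test`, and `or` with the constant 1.

```java
public class ZeroFlagSketch {
    // Models ZF after a 32-bit `test r, r`: only the low 32 bits are examined,
    // so ZF is set whenever those bits are all zero.
    static boolean zfAfterTestl(long reg) {
        return (int) reg == 0;
    }

    // Models ZF after a 64-bit `test r, r`: the whole register is examined.
    static boolean zfAfterTestq(long reg) {
        return reg == 0L;
    }

    // Models ZF after a 32-bit `or r, 1`: bit 0 of the result is always set,
    // so the result is never zero and ZF is always cleared.
    static boolean zfAfterOrlOne(long reg) {
        return (((int) reg) | 1) == 0;
    }

    public static void main(String[] args) {
        // The ObjectMonitor address quoted in the bug description
        long monitor = 0x0000600000000000L;

        System.out.println("testl sets ZF: " + zfAfterTestl(monitor));
        System.out.println("testq sets ZF: " + zfAfterTestq(monitor));
        System.out.println("orl 1 sets ZF: " + zfAfterOrlOne(monitor));
    }
}
```

With the quoted address, the 32-bit test reports ZF set even though the pointer is non-null, which matches the failure mode in the description; the `or`-with-1 model clears ZF regardless of the register contents.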
PR Review: https://git.openjdk.org/jdk/pull/21422#pullrequestreview-2357597296 From sviswanathan at openjdk.org Wed Oct 9 16:29:06 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 9 Oct 2024 16:29:06 GMT Subject: RFR: 8341832: Incorrect continuation address of synthetic SIGSEGV for APX in product builds In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 08:09:29 GMT, Jatin Bhateja wrote: > - Enable APX EGPRs state save restoration check which triggers synthetic SIGSEGV and verifies modified EGPRs state across OS signal handling for non-product builds to match with [corresponding logic in signal handlers.](https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp#L251) > > - Currently we haven't enabled APX support in product builds and intend to do so once entire planned support ([JDK-8329030](https://bugs.openjdk.org/browse/JDK-8329030)) is validated and checked into JDK-mainline, we are following incremental development approach for APX and hence don't want partial APX support to be enabled in intermediate releases. > > Kindly review. > > Best Regards, > Jatin Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21419#pullrequestreview-2357643470 From fbredberg at openjdk.org Wed Oct 9 16:43:58 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 9 Oct 2024 16:43:58 GMT Subject: RFR: 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 13:11:58 GMT, Fredrik Bredberg wrote: > This bug was created in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > `C2_MacroAssembler::fast_unlock_lightweight()` on x86 issues a `testl(monitor, monitor);` instruction for the sole purpose of clearing the zero-flag, which should force us to go into the slow path. 
> > However, this instruction incorrectly only checks the lower 32-bits, which results in setting the zero-flag if the ObjectMonitor has all-zeros in the lower 32-bits. For some reason this seems to be quite common on macosx-x64, where we tend to get an ObjectMonitor address that is 0x0000600000000000. > > The reason we wanted to go into the slow path was that we've observed that there is a thread queued on either the EntryList or cxq, and there is no successor. However since we failed to clear the zero-flag, we will go into the fast path and no one will wake up the stranded thread. Thus the system will hang and any test system will timeout. > > Tested ok in tier1-3 on all x64 related platforms. Also ran the vm.lang.LockUnlock.testContendedLock test. Thanks everyone for the quick review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21422#issuecomment-2402812881 From fbredberg at openjdk.org Wed Oct 9 16:49:02 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 9 Oct 2024 16:49:02 GMT Subject: Integrated: 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 13:11:58 GMT, Fredrik Bredberg wrote: > This bug was created in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > `C2_MacroAssembler::fast_unlock_lightweight()` on x86 issues a `testl(monitor, monitor);` instruction for the sole purpose of clearing the zero-flag, which should force us to go into the slow path. > > However, this instruction incorrectly only checks the lower 32-bits, which results in setting the zero-flag if the ObjectMonitor has all-zeros in the lower 32-bits. For some reason this seems to be quite common on macosx-x64, where we tend to get an ObjectMonitor address that is 0x0000600000000000. > > The reason we wanted to go into the slow path was that we've observed that there is a thread queued on either the EntryList or cxq, and there is no successor. 
However since we failed to clear the zero-flag, we will go into the fast path and no one will wake up the stranded thread. Thus the system will hang and any test system will timeout. > > Tested ok in tier1-3 on all x64 related platforms. Also ran the vm.lang.LockUnlock.testContendedLock test. This pull request has now been integrated. Changeset: fcc9c8d5 Author: Fredrik Bredberg URL: https://git.openjdk.org/jdk/commit/fcc9c8d570396506068e0a1d4123e32b195e6653 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 Reviewed-by: stefank, aboldtch, pchilanomate, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/21422 From kvn at openjdk.org Wed Oct 9 16:59:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 9 Oct 2024 16:59:01 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* In-Reply-To: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> Message-ID: <0GutJI2Gd9s2w3wK9R4UJBYhbyrYZIakEGEaz34jvZw=.95efd71c-d350-4a34-bfda-4cbf06599c1e@github.com> On Tue, 8 Oct 2024 19:46:12 GMT, Quan Anh Mai wrote: > Hi, > > This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. > > Please take a look and leave your reviews, > Thanks a lot. My testing passed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21414#issuecomment-2402839331 From kvn at openjdk.org Wed Oct 9 17:10:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 9 Oct 2024 17:10:00 GMT Subject: RFR: 8341832: Incorrect continuation address of synthetic SIGSEGV for APX in product builds In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 08:09:29 GMT, Jatin Bhateja wrote: > - Enable APX EGPRs state save restoration check which triggers synthetic SIGSEGV and verifies modified EGPRs state across OS signal handling for non-product builds to match with [corresponding logic in signal handlers.](https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp#L251) > > - Currently we haven't enabled APX support in product builds and intend to do so once entire planned support ([JDK-8329030](https://bugs.openjdk.org/browse/JDK-8329030)) is validated and checked into JDK-mainline, we are following incremental development approach for APX and hence don't want partial APX support to be enabled in intermediate releases. > > Kindly review. > > Best Regards, > Jatin Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21419#pullrequestreview-2357732186 From qamai at openjdk.org Wed Oct 9 17:12:33 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 9 Oct 2024 17:12:33 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v2] In-Reply-To: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> Message-ID: > Hi, > > This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. > > Please take a look and leave your reviews, > Thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: style changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21414/files - new: https://git.openjdk.org/jdk/pull/21414/files/78b88e46..90f11d40 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21414&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21414&range=00-01 Stats: 22 lines in 2 files changed: 1 ins; 6 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/21414.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21414/head:pull/21414 PR: https://git.openjdk.org/jdk/pull/21414 From qamai at openjdk.org Wed Oct 9 17:12:34 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 9 Oct 2024 17:12:34 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* In-Reply-To: <0GutJI2Gd9s2w3wK9R4UJBYhbyrYZIakEGEaz34jvZw=.95efd71c-d350-4a34-bfda-4cbf06599c1e@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> <0GutJI2Gd9s2w3wK9R4UJBYhbyrYZIakEGEaz34jvZw=.95efd71c-d350-4a34-bfda-4cbf06599c1e@github.com> Message-ID: On Wed, 9 Oct 2024 16:55:55 GMT, Vladimir Kozlov wrote: >> Hi, >> >> This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. >> >> Please take a look and leave your reviews, >> Thanks a lot. > > My testing passed. @vnkozlov Thanks for your reviews and testings, the latest commit addresses your concern, as well as contains some minor style changes and a removal of switch case duplication. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21414#issuecomment-2402865288 From jbhateja at openjdk.org Wed Oct 9 17:47:07 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 9 Oct 2024 17:47:07 GMT Subject: Integrated: 8341832: Incorrect continuation address of synthetic SIGSEGV for APX in product builds In-Reply-To: References: Message-ID: <3LlcLPjSu70smZz7MpxZw4TGI9F7N3qWfarmBBA5ET8=.2bd0f976-836d-415c-ae2b-38357629db27@github.com> On Wed, 9 Oct 2024 08:09:29 GMT, Jatin Bhateja wrote: > - Enable APX EGPRs state save restoration check which triggers synthetic SIGSEGV and verifies modified EGPRs state across OS signal handling for non-product builds to match with [corresponding logic in signal handlers.](https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp#L251) > > - Currently we haven't enabled APX support in product builds and intend to do so once entire planned support ([JDK-8329030](https://bugs.openjdk.org/browse/JDK-8329030)) is validated and checked into JDK-mainline, we are following incremental development approach for APX and hence don't want partial APX support to be enabled in intermediate releases. > > Kindly review. > > Best Regards, > Jatin This pull request has now been integrated. 
Changeset: 3180aaa3 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/3180aaa370de16eb1835e1f57664b9fb15a6bb01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8341832: Incorrect continuation address of synthetic SIGSEGV for APX in product builds Reviewed-by: thartmann, sviswanathan, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21419 From kvn at openjdk.org Wed Oct 9 17:52:59 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 9 Oct 2024 17:52:59 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v2] In-Reply-To: References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> Message-ID: On Wed, 9 Oct 2024 17:12:33 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. >> >> Please take a look and leave your reviews, >> Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > style changes Good. You need second review because change is not trivial. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21414#pullrequestreview-2357817812 From kxu at openjdk.org Wed Oct 9 18:21:30 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 9 Oct 2024 18:21:30 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v22] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. 
> > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: remove <= test cases, disable StressLongCountedLoop and PerMethodTrapLimit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/32bedd00..845e34cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=20-21 Stats: 65 lines in 1 file changed: 6 ins; 37 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From kxu at openjdk.org Wed Oct 9 18:21:30 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 9 Oct 2024 18:21:30 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v21] In-Reply-To: <1R6tWNQF3WSJ4joPaoWR3N_JKy0W-BB0Zntg00N-mlU=.e8b25703-93ea-42be-ba1d-46e06c17987c@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> <1R6tWNQF3WSJ4joPaoWR3N_JKy0W-BB0Zntg00N-mlU=.e8b25703-93ea-42be-ba1d-46e06c17987c@github.com> Message-ID: On Mon, 7 Oct 2024 07:56:19 GMT, Tobias Hartmann wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> correctly verify outputs with custom @Run methods > > `compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java` times out in our testing both with `-XX:StressLongCountedLoop=200000000` and with `-XX:+UnlockExperimentalVMOptions -XX:PerMethodSpecTrapLimit=0 -XX:PerMethodTrapLimit=0`: > > > "main" #1 [2771172] prio=5 os_prio=0 
cpu=500187.70ms elapsed=503.08s allocated=6554K defined_classes=227 tid=0x0000ffff9002d550 nid=2771172 runnable [0x0000ffff972bf000] > java.lang.Thread.State: RUNNABLE > Thread: 0x0000ffff9002d550 [0x2a48e4] State: _at_safepoint _at_poll_safepoint 1 > JavaThread state: _thread_blocked > at compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.testIntCountedLoopWithIntIVLeq(TestParallelIvInIntCountedLoop.java:93) > at compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.runTestIntCountedLoopWithIntIVLeq(TestParallelIvInIntCountedLoop.java:103) > at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base at 24-internal/DirectMethodHandle$Holder) > at java.lang.invoke.LambdaForm$MH/0x0000ffff58460870.invoke(java.base at 24-internal/LambdaForm$MH) > at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base at 24-internal/Invokers$Holder) > at jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(java.base at 24-internal/DirectMethodHandleAccessor.java:154) > at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(java.base at 24-internal/DirectMethodHandleAccessor.java:104) > at java.lang.reflect.Method.invoke(java.base at 24-internal/Method.java:573) > at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:159) > at compiler.lib.ir_framework.test.AbstractTest.run(AbstractTest.java:98) > at compiler.lib.ir_framework.test.CustomRunTest.run(CustomRunTest.java:89) > at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:861) > at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:252) > at compiler.lib.ir_framework.test.TestVM.main(TestVM.java:165) @TobiHartmann Thanks for the feedback! I did some investigation; the timeouts have three causes: 1. Tests with `i <= stop` are not counted loops in the first place and should be removed: now I remember why I originally didn't test for them. Consider `for (int i = 0; i <= stop; i++);` when `stop = Integer.MAX_VALUE`.
Overflow in Java is well-defined, which means the code must loop indefinitely and optimizations of any kind can't break this. Therefore, loops using `<=` are not counted loops to begin with. `@IR(failOn = {IRNode.COUNTED_LOOP})` doesn't fail either. I removed these test cases. 2. It is normal to time out with `-XX:StressLongCountedLoop=200000000` for all test cases: a value other than `0` for this flag forcibly converts int counted loops to long counted loops, for which C2 doesn't do parallel IV replacement at this point. This is the same issue as [JDK-8294839](https://bugs.openjdk.org/browse/JDK-8294838). Loops are still loops. For a large random `stop` value, this will take a long time to loop through. 3. It is normal to time out with `-XX:PerMethodTrapLimit=0` for test cases with a stride other than `1`: take `for (int i = 0; i < stop; i += 2)` as an example. Since incrementing `i` may step past `stop` (and eventually overflow), there must be some sort of runtime check for `stop`. Normally, a `loop_limit_check` trap is compiled to take the slow path (deoptimization). However, the zero trap limit forces C2 to loop and check `i < stop` on every iteration. For a large random `stop` value, this will take a long time. For the latter two reasons, I added `runWithFlags()` to essentially disable the flags in question. https://github.com/openjdk/jdk/blob/845e34cc7a82ef5cb69620a12f487adaca9d2613/test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java#L47-L51 ------------- PR Comment: https://git.openjdk.org/jdk/pull/18489#issuecomment-2402984653 From azvegint at openjdk.org Wed Oct 9 18:24:40 2024 From: azvegint at openjdk.org (Alexander Zvegintsev) Date: Wed, 9 Oct 2024 18:24:40 GMT Subject: Integrated: 8341860: ProblemList applications/ctw/modules/java_base_2.java on linux-x64 In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 18:02:09 GMT, Daniel D.
Daugherty wrote: > A trivial fix to ProblemList applications/ctw/modules/java_base_2.java on linux-x64. Marked as reviewed by azvegint (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21430#pullrequestreview-2357847049 From dcubed at openjdk.org Wed Oct 9 18:24:40 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 9 Oct 2024 18:24:40 GMT Subject: Integrated: 8341860: ProblemList applications/ctw/modules/java_base_2.java on linux-x64 In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 18:06:25 GMT, Alexander Zvegintsev wrote: >> A trivial fix to ProblemList applications/ctw/modules/java_base_2.java on linux-x64. > > Marked as reviewed by azvegint (Reviewer). @azvegint - Thanks for the lightning fast review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21430#issuecomment-2402968716 From dcubed at openjdk.org Wed Oct 9 18:24:40 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 9 Oct 2024 18:24:40 GMT Subject: Integrated: 8341860: ProblemList applications/ctw/modules/java_base_2.java on linux-x64 In-Reply-To: References: Message-ID: <8xEj6FBjBgN_nnLs4fVRN-5v2nhuCdys87o7qQ5NInY=.5e43e62f-1728-4337-b1dc-c0a70cc9accd@github.com> On Wed, 9 Oct 2024 18:02:09 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList applications/ctw/modules/java_base_2.java on linux-x64. This pull request has now been integrated. Changeset: a45abf13 Author: Daniel D. Daugherty URL: https://git.openjdk.org/jdk/commit/a45abf131be9ee52828c5db18a18847c45ae6994 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8341860: ProblemList applications/ctw/modules/java_base_2.java on linux-x64 Reviewed-by: azvegint ------------- PR: https://git.openjdk.org/jdk/pull/21430 From dcubed at openjdk.org Wed Oct 9 18:24:40 2024 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Wed, 9 Oct 2024 18:24:40 GMT Subject: Integrated: 8341860: ProblemList applications/ctw/modules/java_base_2.java on linux-x64 Message-ID: A trivial fix to ProblemList applications/ctw/modules/java_base_2.java on linux-x64. ------------- Commit messages: - 8341860: ProblemList applications/ctw/modules/java_base_2.java on linux-x64 Changes: https://git.openjdk.org/jdk/pull/21430/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21430&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341860 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21430.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21430/head:pull/21430 PR: https://git.openjdk.org/jdk/pull/21430 From svkamath at openjdk.org Wed Oct 9 18:31:41 2024 From: svkamath at openjdk.org (Smita Kamath) Date: Wed, 9 Oct 2024 18:31:41 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v3] In-Reply-To: References: Message-ID: > 8341052: SHA-512 implementation using SHA-NI Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Addressed a review comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20633/files - new: https://git.openjdk.org/jdk/pull/20633/files/afeb5028..85c1aea9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20633&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20633&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20633.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20633/head:pull/20633 PR: https://git.openjdk.org/jdk/pull/20633 From dlong at openjdk.org Wed Oct 9 18:53:07 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 9 Oct 2024 18:53:07 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 16:33:45 GMT, Vladimir Kozlov wrote: >> @vnkozlov , I like the alternative approach 
better. I went with the current approach because I was thinking it would be simpler than the bailout, but I changed my mind after writing both out. >> >> Yes, we can change cha_monomorphic_target to nullptr instead of bailing out. But my understanding is any use of old/redefined methods will cause the compilation to be thrown out when we try to create the nmethod, so we are avoiding wasted work by bailing out early. > > @dean-long, may be I misunderstand your statement. Are you re-writing the fix or keep current? Thanks @vnkozlov. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2403058861 From jkarthikeyan at openjdk.org Wed Oct 9 19:29:18 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 9 Oct 2024 19:29:18 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v2] In-Reply-To: References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> Message-ID: On Wed, 9 Oct 2024 17:12:33 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. >> >> Please take a look and leave your reviews, >> Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > style changes I think this is a really nice cleanup, it makes working with vectors more clear. I've added some minor stylistic fixes you could do since you're already changing these lines. src/hotspot/share/opto/type.cpp line 2532: > 2530: //------------------------------meet------------------------------------------- > 2531: // Compute the MEET of two types. Since each TypeVect is the only instance of > 2532: // its species, meetting often returns itself Suggestion: // its species, meeting often returns itself. 
src/hotspot/share/opto/vectorIntrinsics.cpp line 602: > 600: } > 601: > 602: const TypeVect * vt = TypeVect::make(elem_bt, num_elem); Suggestion: const TypeVect* vt = TypeVect::make(elem_bt, num_elem); src/hotspot/share/opto/vectorIntrinsics.cpp line 624: > 622: > 623: Node * mod_val = gvn().makecon(TypeInt::make(num_elem-1)); > 624: Node * bcast_mod = gvn().transform(VectorNode::scalar2vector(mod_val, num_elem, elem_bt)); Suggestion: Node* bcast_mod = gvn().transform(VectorNode::scalar2vector(mod_val, num_elem, elem_bt)); src/hotspot/share/opto/vectorIntrinsics.cpp line 2202: > 2200: > 2201: // cast index vector from elem_bt vector to byte vector > 2202: const TypeVect * byte_vt = TypeVect::make(T_BYTE, num_elem); Suggestion: const TypeVect* byte_vt = TypeVect::make(T_BYTE, num_elem); ------------- Marked as reviewed by jkarthikeyan (Committer). PR Review: https://git.openjdk.org/jdk/pull/21414#pullrequestreview-2358047181 PR Review Comment: https://git.openjdk.org/jdk/pull/21414#discussion_r1794077620 PR Review Comment: https://git.openjdk.org/jdk/pull/21414#discussion_r1794068522 PR Review Comment: https://git.openjdk.org/jdk/pull/21414#discussion_r1794070557 PR Review Comment: https://git.openjdk.org/jdk/pull/21414#discussion_r1794069515 From dlong at openjdk.org Wed Oct 9 20:15:17 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 9 Oct 2024 20:15:17 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v3] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 14:57:37 GMT, Kim Barrett wrote: >> Please review this change to TypeRawPtr::add_offset to prevent a compiler from >> inferring things based on prior pointer arithmetic not invoking UB. As noted in >> the bug report, clang is actually doing this. >> >> To accomplish this, changed to integral arithmetic. Also added over/underflow >> checks. >> >> Also made a couple of minor touchups. 
Replaced an implicit conversion to bool >> with an explicit compare to nullptr (per style guide). Removed a no longer >> needed dummy return after a (now) noreturn function. >> >> Testing: mach5 tier1-7 >> That testing was with calls to "fatal" for the over/underflow cases and the >> sum==0 case. There were no hits. I'm not sure how to construct a test that >> would hit those. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove unreachable TypePtr::Null case Looks good. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21324#pullrequestreview-2358219138 From sviswanathan at openjdk.org Wed Oct 9 20:16:13 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 9 Oct 2024 20:16:13 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v3] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 18:31:41 GMT, Smita Kamath wrote: >> Hi, I want to submit an optimization for SHA-512 algorithm using SHA instructions (sha512msg1, sha512msg2 and sha512rnds2) . Kindly review the code and provide feedback. Thank you. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Addressed a review comment Marked as reviewed by sviswanathan (Reviewer). Looks good to me. ------------- PR Review: https://git.openjdk.org/jdk/pull/20633#pullrequestreview-2358219543 PR Comment: https://git.openjdk.org/jdk/pull/20633#issuecomment-2403346825 From vlivanov at openjdk.org Wed Oct 9 21:20:18 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 9 Oct 2024 21:20:18 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v2] In-Reply-To: References: Message-ID: On Tue, 8 Oct 2024 20:30:34 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. 
The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > simplification based on reviewer comments src/hotspot/share/ci/ciMethod.cpp line 692: > 690: > 691: // Redefinition support. > 692: if (this->get_Method()->is_old() || root_m->get_Method()->is_old()) { Is it safe to access raw `Method*` from a compiler thread which is not in VM state? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1794240981 From duke at openjdk.org Wed Oct 9 21:47:21 2024 From: duke at openjdk.org (hanklo6) Date: Wed, 9 Oct 2024 21:47:21 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v2] In-Reply-To: References: Message-ID: <8iiMADD8HDJ8-b_SluuLPn9795YmnKYiyQgw5OybGtY=.357a6919-be67-43e5-bc24-6431cf824084@github.com> > Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. 
These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > ### Generate test instructions > With `binutils = 2.43` > * `python3 x86-asmtest.py > asmtest.out.h` > ### Run test > * `make test TEST="gtest:AssemblerX86"` hanklo6 has updated the pull request incrementally with one additional commit since the last revision: Add copyright header ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20857/files - new: https://git.openjdk.org/jdk/pull/20857/files/2f258ba9..766582d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20857&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20857&range=00-01 Stats: 44 lines in 2 files changed: 44 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20857.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20857/head:pull/20857 PR: https://git.openjdk.org/jdk/pull/20857 From sviswanathan at openjdk.org Wed Oct 9 21:47:35 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 9 Oct 2024 21:47:35 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v2] In-Reply-To: <8iiMADD8HDJ8-b_SluuLPn9795YmnKYiyQgw5OybGtY=.357a6919-be67-43e5-bc24-6431cf824084@github.com> References: <8iiMADD8HDJ8-b_SluuLPn9795YmnKYiyQgw5OybGtY=.357a6919-be67-43e5-bc24-6431cf824084@github.com> Message-ID: On Wed, 9 Oct 2024 21:47:21 GMT, hanklo6 wrote: >> Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. 
For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. >> >> ### Generate test instructions >> With `binutils = 2.43` >> * `python3 x86-asmtest.py > asmtest.out.h` >> ### Run test >> * `make test TEST="gtest:AssemblerX86"` > > hanklo6 has updated the pull request incrementally with one additional commit since the last revision: > > Add copyright header Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20857#pullrequestreview-2358391276 From sviswanathan at openjdk.org Wed Oct 9 21:57:12 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 9 Oct 2024 21:57:12 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v2] In-Reply-To: <8iiMADD8HDJ8-b_SluuLPn9795YmnKYiyQgw5OybGtY=.357a6919-be67-43e5-bc24-6431cf824084@github.com> References: <8iiMADD8HDJ8-b_SluuLPn9795YmnKYiyQgw5OybGtY=.357a6919-be67-43e5-bc24-6431cf824084@github.com> Message-ID: On Wed, 9 Oct 2024 21:47:21 GMT, hanklo6 wrote: >> Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. 
These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. >> >> ### Generate test instructions >> With `binutils = 2.43` >> * `python3 x86-asmtest.py > asmtest.out.h` >> ### Run test >> * `make test TEST="gtest:AssemblerX86"` > > hanklo6 has updated the pull request incrementally with one additional commit since the last revision: > > Add copyright header @vnkozlov We look forward to your inputs on this encoding test PR. It takes care of the testing action item that came up during the review of APX instruction encoding PR (https://github.com/openjdk/jdk/pull/18476). ------------- PR Comment: https://git.openjdk.org/jdk/pull/20857#issuecomment-2403497773 From kvn at openjdk.org Wed Oct 9 22:18:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 9 Oct 2024 22:18:13 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v2] In-Reply-To: <8iiMADD8HDJ8-b_SluuLPn9795YmnKYiyQgw5OybGtY=.357a6919-be67-43e5-bc24-6431cf824084@github.com> References: <8iiMADD8HDJ8-b_SluuLPn9795YmnKYiyQgw5OybGtY=.357a6919-be67-43e5-bc24-6431cf824084@github.com> Message-ID: On Wed, 9 Oct 2024 21:47:21 GMT, hanklo6 wrote: >> Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
>>
>> ### Generate test instructions
>> With `binutils = 2.43`
>> * `python3 x86-asmtest.py > asmtest.out.h`
>> ### Run test
>> * `make test TEST="gtest:AssemblerX86"`
>
> hanklo6 has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add copyright header

Is this test for both 32- and 64-bit instructions/VMs? How complete is the set of instructions covered by the test?

test/hotspot/gtest/x86/test_assemblerx86.cpp line 26:

> 24: #include "precompiled.hpp"
> 25:
> 26: #if defined(X86)

You may add ` && !defined(ZERO)` similar to the `test_assembler_aarch64.cpp` test.

test/hotspot/gtest/x86/test_assemblerx86.cpp line 93:

> 91:   address entry = __ pc();
> 92:
> 93:   // python x86-asmtest.py | expand > asmtest.out.h

The PR description shows different instructions to build:

With binutils = 2.43
python3 x86-asmtest.py > asmtest.out.h

I would like to have a comment with correct and detailed instructions on how to build `asmtest.out.h`.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20857#pullrequestreview-2358422678
PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1794304617
PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1794301774

From qamai at openjdk.org Thu Oct 10 01:09:06 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Thu, 10 Oct 2024 01:09:06 GMT
Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v3]
In-Reply-To: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com>
References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com>
Message-ID: <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com>

> Hi,
>
> This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner.
> > Please take a look and leave your reviews, > Thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: more style changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21414/files - new: https://git.openjdk.org/jdk/pull/21414/files/90f11d40..a99a7434 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21414&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21414&range=01-02 Stats: 11 lines in 3 files changed: 0 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/21414.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21414/head:pull/21414 PR: https://git.openjdk.org/jdk/pull/21414 From qamai at openjdk.org Thu Oct 10 01:10:18 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 10 Oct 2024 01:10:18 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v2] In-Reply-To: References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> Message-ID: On Wed, 9 Oct 2024 19:26:50 GMT, Jasmine Karthikeyan wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> style changes > > I think this is a really nice cleanup, it makes working with vectors more clear. I've added some minor stylistic fixes you could do since you're already changing these lines. @jaskarth Nice suggestions, I have reviewed the patch and done similar changes to nearby lines. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21414#issuecomment-2403696783 From jkarthikeyan at openjdk.org Thu Oct 10 03:04:47 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 10 Oct 2024 03:04:47 GMT Subject: RFR: 8341781: Improve Min/Max node identities Message-ID: Hi all, This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. 
In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:

                                                          Baseline                    Patch
Benchmark                                 Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)

I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!
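To make the two identities described above concrete, here is a minimal Java sketch. It is not taken from the patch: the class and method names are hypothetical and chosen only for illustration. It checks, on a handful of `long` values including the extremes, that `Max(A, Max(A, B))` equals `Max(A, B)` and that `Max(A, Min(A, B))` and its dual `Min(A, Max(A, B))` both equal `A`. It only demonstrates the arithmetic facts and says nothing about how C2 matches these shapes in its IR.

```java
public class MinMaxLatticeIdentities {
    // Max(A, Max(A, B)): the repeated operand is redundant, so this equals Max(A, B).
    static long maxOfMax(long a, long b) {
        return Math.max(a, Math.max(a, b));
    }

    // Max(A, Min(A, B)): absorption law, always A for integral values.
    static long maxOfMin(long a, long b) {
        return Math.max(a, Math.min(a, b));
    }

    // Min(A, Max(A, B)): the dual absorption law, also always A.
    static long minOfMax(long a, long b) {
        return Math.min(a, Math.max(a, b));
    }

    // Returns true if all three identities hold for the given pair.
    static boolean identitiesHold(long a, long b) {
        return maxOfMax(a, b) == Math.max(a, b)
            && maxOfMin(a, b) == a
            && minOfMax(a, b) == a;
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -7L, 0L, 42L, Long.MAX_VALUE};
        for (long a : samples) {
            for (long b : samples) {
                if (!identitiesHold(a, b)) {
                    throw new AssertionError("identity violated for a=" + a + ", b=" + b);
                }
            }
        }
        System.out.println("all identities hold");
    }
}
```

Because `long` values are totally ordered, the absorption laws hold for every pair, which is why the nested node can be replaced by a single operand without any runtime check.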
------------- Commit messages: - Min/Max identities Changes: https://git.openjdk.org/jdk/pull/21439/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21439&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341781 Stats: 293 lines in 5 files changed: 287 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21439.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21439/head:pull/21439 PR: https://git.openjdk.org/jdk/pull/21439 From jkarthikeyan at openjdk.org Thu Oct 10 03:06:17 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 10 Oct 2024 03:06:17 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v3] In-Reply-To: <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> Message-ID: On Thu, 10 Oct 2024 01:09:06 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. >> >> Please take a look and leave your reviews, >> Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > more style changes Thanks, looks good to me! ------------- Marked as reviewed by jkarthikeyan (Committer). PR Review: https://git.openjdk.org/jdk/pull/21414#pullrequestreview-2358765443 From liach at openjdk.org Thu Oct 10 05:16:14 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 10 Oct 2024 05:16:14 GMT Subject: RFR: 8341781: Improve Min/Max node identities In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 02:59:14 GMT, Jasmine Karthikeyan wrote: > Hi all, > This patch implements some missing identities for Min/Max nodes. 
It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>
>                                                           Baseline                    Patch
> Benchmark                                 Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)
>
> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!

src/hotspot/share/opto/addnode.hpp line 270:

> 268:   virtual int Opcode() const = 0;
> 269:   virtual int max_opcode() const = 0;
> 270:   virtual int min_opcode() const = 0;

The old comment above

// all the behavior of addition on a ring. Only new thing is that we allow
// 2 equal inputs to be equal.

seems outdated.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1794661997

From chagedorn at openjdk.org Thu Oct 10 06:11:12 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 10 Oct 2024 06:11:12 GMT
Subject: RFR: 8341781: Improve Min/Max node identities
In-Reply-To:
References:
Message-ID:

On Thu, 10 Oct 2024 02:59:14 GMT, Jasmine Karthikeyan wrote:

> Hi all,
> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>
>                                                           Baseline                    Patch
> Benchmark                                 Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)
>
> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!

Few comments, otherwise, looks good to me.
src/hotspot/share/opto/addnode.cpp line 1478: > 1476: } > 1477: > 1478: // If the operations are different return the operand, as Max(A, Min(A, B)) == A if the value isn't a floating point value, Suggestion: // If the operations are different return the operand 'A', as Max(A, Min(A, B)) == A if the value isn't a floating point value, src/hotspot/share/opto/addnode.cpp line 1479: > 1477: > 1478: // If the operations are different return the operand, as Max(A, Min(A, B)) == A if the value isn't a floating point value, > 1479: // as if B == NaN the identity doesn't hold. Reads as "as if". Maybe rephrase to Suggestion: // For floating points, the identity does not hold if B == NaN. ? src/hotspot/share/opto/addnode.cpp line 1485: > 1483: } > 1484: > 1485: return nullptr; I guess you can remove this since we return nullptr below anyway. Suggestion: test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdentities.java line 116: > 114: > 115: @Test > 116: @IR(applyIfPlatform = { "riscv64", "false" }, phase = { CompilePhase.BEFORE_MACRO_EXPANSION }, counts = { IRNode.MIN_L, "1" }) Can you add a comment here why we cannot apply the rules for riscv? test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdentities.java line 122: > 120: > 121: @Test > 122: @IR(applyIfPlatform = { "riscv64", "false" }, failOn = { IRNode.MIN_L, IRNode.MAX_L }) Since `MinL/MaxL` are expanded in macro expansion, this rule will also succeed even if the optimization is not applied. I suggest to also add `phase = CompilePhase.BEFORE_MACRO_EXPANSION`. Same below. test/hotspot/jtreg/compiler/vectorization/runner/BasicShortOpTest.java line 220: > 218: short[] res = new short[SIZE]; > 219: for (int i = 0; i < SIZE; i++) { > 220: res[i] = (short) Math.min(a[i], b[i]); I guess without this change, this collapses to a constant which enables vectorization which was not expected before? 
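The NaN remark in the `addnode.cpp` comments above can be checked with a small Java sketch. This is an illustration only, not code from the patch, and the class and method names are hypothetical. It relies on the documented behaviour of `Math.min` and `Math.max` for `double`, which return NaN if either argument is NaN, so `Max(A, Min(A, B))` evaluates to NaN rather than `A` when `B` is NaN.

```java
public class MinMaxNaNCounterexample {
    // Max(A, Min(A, B)) evaluated with Java's double semantics.
    static double maxOfMin(double a, double b) {
        return Math.max(a, Math.min(a, b));
    }

    // Max(A, Max(A, B)) evaluated with Java's double semantics.
    static double maxOfMax(double a, double b) {
        return Math.max(a, Math.max(a, b));
    }

    public static void main(String[] args) {
        double a = 1.0;

        // With ordinary values the absorption law holds and the result is A.
        System.out.println(maxOfMin(a, 2.0));        // prints 1.0

        // With B == NaN the inner min is NaN, and so is the outer max.
        System.out.println(maxOfMin(a, Double.NaN)); // prints NaN

        // The same-operation shape still agrees with Max(A, B): both sides are NaN.
        boolean sameOpAgrees =
            Double.compare(maxOfMax(a, Double.NaN), Math.max(a, Double.NaN)) == 0;
        System.out.println(sameOpAgrees);            // prints true
    }
}
```

This is the arithmetic reason the mixed Min/Max simplification has to be restricted to non-floating-point values, while collapsing two nested operations of the same kind does not change the result even when NaN is involved.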
-------------

PR Review: https://git.openjdk.org/jdk/pull/21439#pullrequestreview-2359045864
PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1794700959
PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1794705393
PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1794707261
PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1794711053
PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1794714816
PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1794720021

From dnsimon at openjdk.org Thu Oct 10 07:42:12 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Thu, 10 Oct 2024 07:42:12 GMT
Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v5]
In-Reply-To: <4X5xzYDU45UnuXtoC0sEJ7dF5seNqQDJxhdNrdktsV8=.03a8e2b5-7e43-44b7-92b4-b4440dc26770@github.com>
References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> <4X5xzYDU45UnuXtoC0sEJ7dF5seNqQDJxhdNrdktsV8=.03a8e2b5-7e43-44b7-92b4-b4440dc26770@github.com>
Message-ID:

On Fri, 4 Oct 2024 16:34:54 GMT, Tomáš Zezula wrote:

>> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode.
>>
>> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java).
>>
>> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic.
>
> Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision:
>
>   Simplified C2V_BLOCK.

Looks good to me.

src/hotspot/share/compiler/compilerThread.cpp line 58:

> 56:
> 57: void CompilerThread::set_compiler(AbstractCompiler* c) {
> 58:   /*

The comment could be a little shorter:

/*
 * Compiler threads need to make Java upcalls to the jargraal compiler.
 * Java upcalls are also needed by the InterpreterRuntime when using jargraal.
 */

-------------

Marked as reviewed by dnsimon (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21285#pullrequestreview-2359296330
PR Review Comment: https://git.openjdk.org/jdk/pull/21285#discussion_r1794843319

From rcastanedalo at openjdk.org Thu Oct 10 08:37:16 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 10 Oct 2024 08:37:16 GMT
Subject: Integrated: 8341619: C2: remove unused StoreCM node
In-Reply-To:
References:
Message-ID:

On Mon, 7 Oct 2024 11:28:19 GMT, Roberto Castañeda Lozano wrote:

> This cleanup removes C2's `StoreCM` (store card mark) node and all its special handling code. This node used to model card mark stores in early-expanded G1 post-barriers, and is no longer needed after [JEP 475](https://openjdk.org/jeps/475).
>
> __Testing:__ tier1-5 (linux-x64, linux-aarch64, windows-x64, macosx-x64, and macosx-aarch64; release and debug mode).

This pull request has now been integrated.
Changeset: 16042556
Author: Roberto Castañeda Lozano
URL: https://git.openjdk.org/jdk/commit/16042556f394adfa93e54173944198397ad29dea
Stats: 388 lines in 23 files changed: 0 ins; 376 del; 12 mod

8341619: C2: remove unused StoreCM node

Reviewed-by: chagedorn, thartmann, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/21385

From chagedorn at openjdk.org Thu Oct 10 09:06:13 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 10 Oct 2024 09:06:13 GMT
Subject: RFR: 8341781: Improve Min/Max node identities
In-Reply-To:
References:
Message-ID:

On Thu, 10 Oct 2024 02:59:14 GMT, Jasmine Karthikeyan wrote:

> Hi all,
> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>
>                                                           Baseline                    Patch
> Benchmark                                 Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)
>
> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!

Your new test fails on Linux with `-XX:UseAVX=0`:

One or more @IR rules failed:

Failed IR Rules (8) of Methods (8)
----------------------------------
1) Method "public double compiler.c2.irTests.TestMinMaxIdentities.doubleMaxMax(double,double)" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MAX_D#_", "1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(MaxD.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 = 1 [given]
           - No nodes matched!

2) Method "public double compiler.c2.irTests.TestMinMaxIdentities.doubleMaxMin(double,double)" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_D#_", "1", "_#MAX_D#_", "1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(MinD.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 = 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(MaxD.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 = 1 [given]
           - No nodes matched!
3) Method "public double compiler.c2.irTests.TestMinMaxIdentities.doubleMinMax(double,double)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_D#_", "1", "_#MAX_D#_", "1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MinD.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(MaxD.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 4) Method "public double compiler.c2.irTests.TestMinMaxIdentities.doubleMinMin(double,double)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_D#_", "1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MinD.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 5) Method "public float compiler.c2.irTests.TestMinMaxIdentities.floatMaxMax(float,float)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MAX_F#_", "1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MaxF.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 
6) Method "public float compiler.c2.irTests.TestMinMaxIdentities.floatMaxMin(float,float)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_F#_", "1", "_#MAX_F#_", "1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MinF.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(MaxF.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 7) Method "public float compiler.c2.irTests.TestMinMaxIdentities.floatMinMax(float,float)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_F#_", "1", "_#MAX_F#_", "1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MinF.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(MaxF.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 
8) Method "public float compiler.c2.irTests.TestMinMaxIdentities.floatMinMin(float,float)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_F#_", "1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MinF.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! Looks like we do not emit `Min/MaxF/D` nodes with `UseAVX=0`. I quickly checked the code and indeed, the intrinsics are only enabled if `UseAVX >= 1`: https://github.com/openjdk/jdk/blob/16042556f394adfa93e54173944198397ad29dea/src/hotspot/cpu/x86/x86.ad#L1542-L1549 You can probably just update your tests to exclude IR matching for this setup. Maybe you also want to double check the other architectures. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21439#issuecomment-2404520133 From chagedorn at openjdk.org Thu Oct 10 09:14:39 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 10 Oct 2024 09:14:39 GMT Subject: RFR: 8341328: Refactor initial Assertion Predicate creation into separate classes Message-ID: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> This PR refactors the initial Assertion Predicate creation (i.e. when initially creating them, not when copy/copy-updating them from existing Template Assertion Predicates). The patch includes the following changes: - `PhaseIdealLoop::add_template_assertion_predicate()`, `add_range_check_elimination_assertion_predicate()`and the preparation code to call it, and `clone_template_assertion_predicate()` have similar code. 
I tried to share the common bits with new classes: - `TemplateAssertionPredicateCreator`: Creates a new Template Assertion Predicate either with an UCT (done in Loop Predication) or a Halt node (done in Range Check Elimination). - `InitializedAssertionPredicateCreator`: Creates a new Initialized Assertion Predicate with a Halt Node. This is an existing class which provided a method to clone a Template Assertion Predicate expression and create a new Initialized Assertion Predicate with it. Now it's extended to create one without an existing template. - `AssertionPredicateIfCreator`: Used by both classes above and also by `clone_template_assertion_predicate()` (it clones the Assertion Predicate expression first and then just needs to create the `If`) - `AssertionPredicateExpressionCreator`: Create a new Assertion Predicate expression, either with an `Opaque4` (for Template Assertion Predicates) or an `OpaqueInitializedAssertionPredicate` (for Initialized Assertion Predicates). - Some renaming to get more consistency (e.g. use `new_control` instead of `control` or `new_ctrl`) - Adding new `AssertionPredicateType::FinalIv` which was missed to account for in [JDK-8335393](https://bugs.openjdk.org/browse/JDK-8335393) where a new Initialized Assertion Predicate was added for the final IV in Range Check Elimination for a special case when removing an empty main loop. 
Thanks, Christian ------------- Commit messages: - 8341328: Refactor initial Assertion Predicate creation into separate classes Changes: https://git.openjdk.org/jdk/pull/21446/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21446&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341328 Stats: 529 lines in 6 files changed: 302 ins; 118 del; 109 mod Patch: https://git.openjdk.org/jdk/pull/21446.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21446/head:pull/21446 PR: https://git.openjdk.org/jdk/pull/21446 From chagedorn at openjdk.org Thu Oct 10 09:14:39 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 10 Oct 2024 09:14:39 GMT Subject: RFR: 8341328: Refactor initial Assertion Predicate creation into separate classes In-Reply-To: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> References: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> Message-ID: On Thu, 10 Oct 2024 09:04:21 GMT, Christian Hagedorn wrote: > This PR refactors the initial Assertion Predicate creation (i.e. when initially creating them, not when copy/copy-updating them from existing Template Assertion Predicates). > > The patch includes the following changes: > - `PhaseIdealLoop::add_template_assertion_predicate()`, `add_range_check_elimination_assertion_predicate()`and the preparation code to call it, and `clone_template_assertion_predicate()` have similar code. I tried to share the common bits with new classes: > - `TemplateAssertionPredicateCreator`: Creates a new Template Assertion Predicate either with an UCT (done in Loop Predication) or a Halt node (done in Range Check Elimination). > - `InitializedAssertionPredicateCreator`: Creates a new Initialized Assertion Predicate with a Halt Node. This is an existing class which provided a method to clone a Template Assertion Predicate expression and create a new Initialized Assertion Predicate with it. 
Now it's extended to create one without an existing template. > - `AssertionPredicateIfCreator`: Used by both classes above and also by `clone_template_assertion_predicate()` (it clones the Assertion Predicate expression first and then just needs to create the `If`) > - `AssertionPredicateExpressionCreator`: Create a new Assertion Predicate expression, either with an `Opaque4` (for Template Assertion Predicates) or an `OpaqueInitializedAssertionPredicate` (for Initialized Assertion Predicates). > - Some renaming to get more consistency (e.g. use `new_control` instead of `control` or `new_ctrl`) > - Adding new `AssertionPredicateType::FinalIv` which was missed to account for in [JDK-8335393](https://bugs.openjdk.org/browse/JDK-8335393) where a new Initialized Assertion Predicate was added for the final IV in Range Check Elimination for a special case when removing an empty main loop. > > Thanks, > Christian src/hotspot/share/opto/loopPredicate.cpp line 1277: > 1275: IfTrueNode* template_assertion_predicate_proj = > 1276: create_template_assertion_predicate(if_opcode, cl, parse_predicate_proj, upper_bound_proj, scale, offset, range, > 1277: deopt_reason); We only use the opcode from the `iff`. `init`, `limit` and `stride` can be fetched from the `CountedLoop` again. src/hotspot/share/opto/loopTransform.cpp line 3088: > 3086: set_ctrl(iffm->in(1), new_limit_ctrl); > 3087: > 3088: C->print_method(PHASE_AFTER_RANGE_CHECK_ELIMINATION, 4, cl); Moved this down because we missed some transformations when having this earlier. Additionally, if there are multiple range checks, we can see the intermediate state for one transformation with the next `PHASE_BEFORE_RANGE_CHECK_ELIMINATION` dump. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21446#discussion_r1795020752 PR Review Comment: https://git.openjdk.org/jdk/pull/21446#discussion_r1795027412 From jbhateja at openjdk.org Thu Oct 10 12:22:20 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 10 Oct 2024 12:22:20 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v3] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 18:31:41 GMT, Smita Kamath wrote: >> Hi, I want to submit an optimization for SHA-512 algorithm using SHA instructions (sha512msg1, sha512msg2 and sha512rnds2) . Kindly review the code and provide feedback. Thank you. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Addressed a review comment src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp line 1522: > 1520: } > 1521: > 1522: void MacroAssembler::sha512_update_ni_x1(Register arg_hash, Register arg_msg, Register ofs, Register limit, bool multi_block) { Please add a comment on this mentioning the source of algorithm. https://github.com/intel/intel-ipsec-mb/blob/main/lib/avx2_t4/sha512_x1_ni_avx2.asm src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp line 1602: > 1600: vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23] > 1601: vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17] > 1602: vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17] [Algorithm](https://github.com/intel/intel-ipsec-mb/blob/main/lib/avx2_t4/sha512_x1_ni_avx2.asm) is specifically crafted for 256 bit vectors and with 512 bit extension we modify it. Do you think we should factor out following pattern and add an alternative implementation for it ? 
```
vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23]
vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17]
vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17]
```

This is a fixed pattern seen 4 times within the computation loop and once outside the loop. We are permuting two vectors with a constant permutation mask and blending them using an immediate mask. This is a very valid use case for the two-table permutation instruction VPERMI2Q (available for AVX512VL targets). We can store the permutation pattern outside the loop into a vector and then re-use it within the loop.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 1587:

> 1585: __ sha512_AVX2(msg, state0, state1, msgtmp0, msgtmp1, msgtmp2, msgtmp3, msgtmp4,
> 1586: buf, state, ofs, limit, rsp, multi_block, shuf_mask);
> 1587: }

Suggestion:

    const XMMRegister msg = xmm0;
    const XMMRegister state0 = xmm1;
    const XMMRegister state1 = xmm2;
    const XMMRegister msgtmp0 = xmm3;
    const XMMRegister msgtmp1 = xmm4;
    const XMMRegister msgtmp2 = xmm5;
    const XMMRegister msgtmp3 = xmm6;
    const XMMRegister msgtmp4 = xmm7;
    const XMMRegister shuf_mask = xmm8;
    __ sha512_AVX2(msg, state0, state1, msgtmp0, msgtmp1, msgtmp2, msgtmp3, msgtmp4,
    buf, state, ofs, limit, rsp, multi_block, shuf_mask);
    }

src/hotspot/cpu/x86/stubRoutines_x86.cpp line 446:

> 444: 0x5fcb6fab3ad6faecULL, 0x6c44198c4a475817ULL,
> 445: };
> 446:

Remove this newline.
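
The permute-and-blend pattern from the `sha512_update_ni_x1` review comment can be checked with a small scalar model. The sketch below is not HotSpot code: the class and method names are hypothetical, each register is modelled as four 64-bit lanes with index 0 as the lowest lane, and the lane-selection rules for `vpermq`, `vpblendd` and a VPERMI2Q-style two-table permute are assumptions taken from my reading of the instruction descriptions. The index vector `{1, 2, 3, 4}` is likewise an assumption that should be verified against the instruction reference before it is used in a stub.

```java
import java.util.Arrays;

// Scalar model of 256-bit registers as four 64-bit lanes (index 0 = lowest lane).
// All names are illustrative; none of them are HotSpot or JDK APIs.
final class TwoTablePermuteSketch {

    // Models VPERMQ with an immediate: result lane i = src[(imm >> 2*i) & 3].
    static long[] permuteImm(long[] src, int imm) {
        long[] dst = new long[4];
        for (int i = 0; i < 4; i++) {
            dst[i] = src[(imm >>> (2 * i)) & 3];
        }
        return dst;
    }

    // Models VPBLENDD at 64-bit granularity. Each 64-bit lane is covered by two
    // immediate bits; this sketch only accepts masks where both bits agree.
    // A set bit pair selects the lane from 'second', a clear pair from 'first'.
    static long[] blendImm(long[] first, long[] second, int imm) {
        long[] dst = new long[4];
        for (int q = 0; q < 4; q++) {
            int bits = (imm >>> (2 * q)) & 3;
            if (bits != 0 && bits != 3) {
                throw new IllegalArgumentException("mask splits a 64-bit lane");
            }
            dst[q] = (bits == 3) ? second[q] : first[q];
        }
        return dst;
    }

    // Models a two-table permute: bit 2 of each index picks the table,
    // bits 1:0 pick the lane within that table.
    static long[] permuteTwoTables(long[] idx, long[] table1, long[] table2) {
        long[] dst = new long[4];
        for (int i = 0; i < 4; i++) {
            int sel = (int) (idx[i] & 7);
            dst[i] = ((sel & 4) != 0) ? table2[sel & 3] : table1[sel & 3];
        }
        return dst;
    }

    // The three-instruction sequence quoted in the review comment.
    static long[] viaPermuteAndBlend(long[] w16to19, long[] w20to23) {
        long[] t8 = permuteImm(w20to23, 0x1b);
        long[] t9 = permuteImm(w16to19, 0x39);
        return blendImm(t8, t9, 0x3f);
    }

    // Loop-invariant index vector: lanes 1..3 of table 1, then lane 0 of table 2.
    static final long[] INDICES = {1, 2, 3, 4};

    // The same selection expressed as a single two-table permute.
    static long[] viaTwoTable(long[] w16to19, long[] w20to23) {
        return permuteTwoTables(INDICES, w16to19, w20to23);
    }

    public static void main(String[] args) {
        long[] lo = {16, 17, 18, 19}; // stands in for W[16..19]
        long[] hi = {20, 21, 22, 23}; // stands in for W[20..23]
        System.out.println(Arrays.toString(viaPermuteAndBlend(lo, hi)));
        System.out.println(Arrays.toString(viaTwoTable(lo, hi)));
    }
}
```

In this model both paths select W[17], W[18], W[19] and W[20] from low to high lane, which matches the `ymm7 = W[20] W[19] W[18] W[17]` comment when read from high to low. The point of the single-permute form is that the index vector can be materialized once outside the loop.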
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20633#discussion_r1795316551 PR Review Comment: https://git.openjdk.org/jdk/pull/20633#discussion_r1795279620 PR Review Comment: https://git.openjdk.org/jdk/pull/20633#discussion_r1785638858 PR Review Comment: https://git.openjdk.org/jdk/pull/20633#discussion_r1785638760 From iveresov at openjdk.org Thu Oct 10 15:33:38 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 10 Oct 2024 15:33:38 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" Message-ID: `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. ------------- Commit messages: - 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" Changes: https://git.openjdk.org/jdk/pull/21455/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21455&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341831 Stats: 9 lines in 1 file changed: 8 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21455/head:pull/21455 PR: https://git.openjdk.org/jdk/pull/21455 From iveresov at openjdk.org Thu Oct 10 15:37:22 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 10 Oct 2024 15:37:22 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v2] In-Reply-To: References: Message-ID: > `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - Remove the test from the problem list - 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" Summary: Relax assert to deal with CacheWB nodes ------------- Changes: https://git.openjdk.org/jdk/pull/21455/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21455&range=01 Stats: 11 lines in 2 files changed: 8 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21455/head:pull/21455 PR: https://git.openjdk.org/jdk/pull/21455 From jbhateja at openjdk.org Thu Oct 10 16:27:25 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 10 Oct 2024 16:27:25 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v23] In-Reply-To: <0LkBvvNPq5jmWfOdjItIXGedRDtpiivJM06BAx7vP0I=.c5417544-edef-4623-beaa-08cd7c565361@github.com> References: <0LkBvvNPq5jmWfOdjItIXGedRDtpiivJM06BAx7vP0I=.c5417544-edef-4623-beaa-08cd7c565361@github.com> Message-ID: On Tue, 8 Oct 2024 19:25:24 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e.
the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: > > - Merge branch 'JDK-8338201' of http://github.com/jatin-bhateja/jdk into JDK-8338201 > - Update VectorMath.java > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - Typographical error fixups > - Doc fixups > - Typographic error > - Merge stashing and re-commit > - Tuning extra spaces. > - Tests for newly added VectorMath.* operations > - Test cleanups. > - ... and 16 more: https://git.openjdk.org/jdk/compare/7312eea3...ce76c3e5 Hi @vnkozlov , Can you kindly run this through your test infrastructure. We have two review approvals for Java and x86 backend code. 
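
The saturating and unsigned semantics described in the quoted pull request text can be illustrated at the scalar level. This is only a minimal sketch for `int`: the helper names are hypothetical and are not the scalar APIs or vector operators added by the pull request. It assumes the usual definitions, where a saturating result is clamped to the bounds of the result type instead of wrapping around, and where unsigned operations interpret the same 32 bits as a value in the range 0 to 2^32 - 1.

```java
// Hypothetical helper names for illustration; not the APIs added by the pull request.
final class SaturatingSketch {

    // Signed saturating add: compute in 64 bits, then clamp to the int range.
    static int saturatingAdd(int a, int b) {
        long r = (long) a + (long) b;
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, r));
    }

    // Signed saturating subtract: same clamping as above.
    static int saturatingSub(int a, int b) {
        long r = (long) a - (long) b;
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, r));
    }

    // Unsigned saturating add: an overflowing sum is capped at the all-ones pattern.
    static int saturatingUnsignedAdd(int a, int b) {
        long r = Integer.toUnsignedLong(a) + Integer.toUnsignedLong(b);
        return (r > 0xFFFF_FFFFL) ? -1 : (int) r; // -1 is the bit pattern of 2^32 - 1
    }

    // Unsigned saturating subtract: a result below zero is capped at zero.
    static int saturatingUnsignedSub(int a, int b) {
        return (Integer.compareUnsigned(a, b) < 0) ? 0 : a - b;
    }

    // Unsigned max and min compare the operands as unsigned 32-bit values.
    static int unsignedMax(int a, int b) {
        return (Integer.compareUnsigned(a, b) >= 0) ? a : b;
    }

    static int unsignedMin(int a, int b) {
        return (Integer.compareUnsigned(a, b) <= 0) ? a : b;
    }

    public static void main(String[] args) {
        // Plain addition wraps around, the saturating variant clamps instead.
        System.out.println(Integer.MAX_VALUE + 1);
        System.out.println(saturatingAdd(Integer.MAX_VALUE, 1));
        // -1 is the largest unsigned 32-bit value.
        System.out.println(Integer.toUnsignedString(saturatingUnsignedAdd(-1, 1)));
        System.out.println(Integer.toUnsignedString(unsignedMax(-1, 1)));
    }
}
```

Widening to `long` keeps the sketch short for `int`; a `long` variant would need a different overflow check, which is left out here.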
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2405554905 From qamai at openjdk.org Thu Oct 10 16:52:17 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 10 Oct 2024 16:52:17 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* In-Reply-To: <0GutJI2Gd9s2w3wK9R4UJBYhbyrYZIakEGEaz34jvZw=.95efd71c-d350-4a34-bfda-4cbf06599c1e@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> <0GutJI2Gd9s2w3wK9R4UJBYhbyrYZIakEGEaz34jvZw=.95efd71c-d350-4a34-bfda-4cbf06599c1e@github.com> Message-ID: <5vIqh2WNKarobKtio8JND9Yf81Mt67pTJ9YlTj59bFE=.069563d4-f269-4df8-9232-2cf1862147ff@github.com> On Wed, 9 Oct 2024 16:55:55 GMT, Vladimir Kozlov wrote: >> Hi, >> >> This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. >> >> Please take a look and leave your reviews, >> Thanks a lot. > > My testing passed. @vnkozlov Could you re-review this, please, it seems required. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21414#issuecomment-2405603659 From kvn at openjdk.org Thu Oct 10 17:00:17 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Oct 2024 17:00:17 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v2] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 15:37:22 GMT, Igor Veresov wrote: >> `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. > > Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: > > - Remove the test from the problem list > - 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" > Summary: Relax assert to deal with CacheWB nodes Please add comment into code explaining change. ------------- PR Review: https://git.openjdk.org/jdk/pull/21455#pullrequestreview-2360944637 From kvn at openjdk.org Thu Oct 10 17:05:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Oct 2024 17:05:13 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v3] In-Reply-To: <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> Message-ID: On Thu, 10 Oct 2024 01:09:06 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. >> >> Please take a look and leave your reviews, >> Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > more style changes Update is good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21414#pullrequestreview-2360953623 From iveresov at openjdk.org Thu Oct 10 17:26:25 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 10 Oct 2024 17:26:25 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v3] In-Reply-To: References: Message-ID: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> > `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. 
I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21455/files - new: https://git.openjdk.org/jdk/pull/21455/files/b8e0d8bd..ae69ee4b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21455&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21455&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21455/head:pull/21455 PR: https://git.openjdk.org/jdk/pull/21455 From iveresov at openjdk.org Thu Oct 10 17:26:25 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 10 Oct 2024 17:26:25 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v2] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 16:57:24 GMT, Vladimir Kozlov wrote: > Please add comment into code explaining change. Done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21455#issuecomment-2405665750 From svkamath at openjdk.org Thu Oct 10 18:52:30 2024 From: svkamath at openjdk.org (Smita Kamath) Date: Thu, 10 Oct 2024 18:52:30 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v4] In-Reply-To: References: Message-ID: > Hi, I want to submit an optimization for SHA-512 algorithm using SHA instructions (sha512msg1, sha512msg2 and sha512rnds2) . Kindly review the code and provide feedback. Thank you. 
Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Updated code as per review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20633/files - new: https://git.openjdk.org/jdk/pull/20633/files/85c1aea9..3cb9175a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20633&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20633&range=02-03 Stats: 13 lines in 2 files changed: 1 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/20633.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20633/head:pull/20633 PR: https://git.openjdk.org/jdk/pull/20633 From svkamath at openjdk.org Thu Oct 10 18:52:31 2024 From: svkamath at openjdk.org (Smita Kamath) Date: Thu, 10 Oct 2024 18:52:31 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v3] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 11:52:36 GMT, Jatin Bhateja wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed a review comment > > src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp line 1602: > >> 1600: vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23] >> 1601: vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17] >> 1602: vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17] > > I assume [Algorithm](https://github.com/intel/intel-ipsec-mb/blob/main/lib/avx2_t4/sha512_x1_ni_avx2.asm) is specifically crafted for 256 bit vectors and with 512 bit extension we modify it. Do you think we should factor out following pattern and add an alternative implementation for it ? 
>
> ```
> vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23]
> vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17]
> vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17]
> ```
>
> This is a fixed pattern seen 4 times within the computation loop and once outside the loop.
> We are permuting two vectors with a constant permutation mask and blending them using an immediate mask.
> This is a very valid use case for the two-table permutation instruction VPERMI2Q (available for AVX512VL targets).
> We can store the permutation pattern outside the loop into a vector and then re-use it within the loop.

We can do this change in a separate PR.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20633#discussion_r1795938470 From kvn at openjdk.org Thu Oct 10 19:21:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Oct 2024 19:21:12 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v3] In-Reply-To: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> References: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> Message-ID: On Thu, 10 Oct 2024 17:26:25 GMT, Igor Veresov wrote: >> `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Add comment LoadStore nodes should have the same issue. Why they are not affected?
------------- PR Comment: https://git.openjdk.org/jdk/pull/21455#issuecomment-2405863892 From iveresov at openjdk.org Thu Oct 10 21:04:11 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 10 Oct 2024 21:04:11 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v3] In-Reply-To: References: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> Message-ID: On Thu, 10 Oct 2024 19:18:51 GMT, Vladimir Kozlov wrote: > LoadStore nodes should have the same issue. Why they are not affected? Because LoadStore is an official store. It consumes a memory state and produces memory state. CacheWB is not really a store, that is it doesn't produce memory effects from the perspective of the backend (its match rule is not a Set). It's hard to tell what's the best way to model it, so I just decided not to mess with its semantics right now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21455#issuecomment-2406030612 From kvn at openjdk.org Thu Oct 10 21:21:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Oct 2024 21:21:11 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v3] In-Reply-To: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> References: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> Message-ID: On Thu, 10 Oct 2024 17:26:25 GMT, Igor Veresov wrote: >> `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Add comment Marked as reviewed by kvn (Reviewer). 
Okay ------------- PR Review: https://git.openjdk.org/jdk/pull/21455#pullrequestreview-2361441227 PR Comment: https://git.openjdk.org/jdk/pull/21455#issuecomment-2406055260 From dlong at openjdk.org Thu Oct 10 21:46:14 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 10 Oct 2024 21:46:14 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v2] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 21:17:46 GMT, Vladimir Ivanov wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> simplification based on reviewer comments > > src/hotspot/share/ci/ciMethod.cpp line 692: > >> 690: >> 691: // Redefinition support. >> 692: if (this->get_Method()->is_old() || root_m->get_Method()->is_old()) { > > Is it safe to access raw `Method*` from a compiler thread which is not in VM state? No, probably not. I'll fix it. I was assuming the whole function was in the VM state. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1796131749 From dlong at openjdk.org Thu Oct 10 22:40:44 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 10 Oct 2024 22:40:44 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). > > An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. 
Dean Long has updated the pull request incrementally with one additional commit since the last revision: make sure to be in VM state when checking is_old ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21148/files - new: https://git.openjdk.org/jdk/pull/21148/files/0705b33e..80c9ae67 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=01-02 Stats: 16 lines in 2 files changed: 10 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21148/head:pull/21148 PR: https://git.openjdk.org/jdk/pull/21148 From vlivanov at openjdk.org Thu Oct 10 22:56:12 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 10 Oct 2024 22:56:12 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 22:40:44 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > make sure to be in VM state when checking is_old src/hotspot/share/ci/ciMethod.cpp line 695: > 693: // Redefinition support. > 694: if (this->is_old() || root_m->is_old()) { > 695: return nullptr; IMO you can safely drop this particular check. 
The one after `Dependencies::find_unique_concrete_method()` should be enough to preserve the invariant (`target == cha_monomorphic_target`) . But thinking more about it, now I'm curious what happens when an old method is actually encountered. The fix conservatively rejects possible inlining opportunity, but it seems it doesn't invalidate resulting nmethod anymore. So, no recompilation attempt follows to recuperate that. We could either record a evol dependency on a stale `Method` (to fail during nmethod installation step) or fail-fast the compilation (probably, implies additional checks to propagate the failure status). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1796190689 From duke at openjdk.org Thu Oct 10 23:04:10 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 10 Oct 2024 23:04:10 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v3] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 18:31:41 GMT, Smita Kamath wrote: >> Hi, I want to submit an optimization for SHA-512 algorithm using SHA instructions (sha512msg1, sha512msg2 and sha512rnds2) . Kindly review the code and provide feedback. Thank you. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Addressed a review comment This implementation looks good to me. I went through the implementation of `sha512_update_ni_x1`. Looked at it line by line and compared it to the ipsec [implementation](https://github.com/intel/intel-ipsec-mb/blob/main/lib/avx2_t4/sha512_x1_ni_avx2.asm). Thanks, Srinivas Vamsi Parasa (Intel) ------------- Marked as reviewed by vamsi-parasa at github.com (no known OpenJDK username). 
PR Review: https://git.openjdk.org/jdk/pull/20633#pullrequestreview-2361568171 From dlong at openjdk.org Thu Oct 10 23:08:13 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 10 Oct 2024 23:08:13 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 22:53:03 GMT, Vladimir Ivanov wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> make sure to be in VM state when checking is_old > > src/hotspot/share/ci/ciMethod.cpp line 695: > >> 693: // Redefinition support. >> 694: if (this->is_old() || root_m->is_old()) { >> 695: return nullptr; > > IMO you can safely drop this particular check. The one after `Dependencies::find_unique_concrete_method()` should be enough to preserve the invariant (`target == cha_monomorphic_target`) . > > But thinking more about it, now I'm curious what happens when an old method is actually encountered. The fix conservatively rejects possible inlining opportunity, but it seems it doesn't invalidate resulting nmethod anymore. So, no recompilation attempt follows to recuperate that. > > We could either record a evol dependency on a stale `Method` (to fail during nmethod installation step) or fail-fast the compilation (probably, implies additional checks to propagate the failure status). Even if we check for stale Methods in various places, including invoke(), there is nothing to prevent the method from going stale after the last spot-check. My understanding was that we already handle stale metadata as a precondition to creating the nmethod. If we have a loophole there that lets stale metadata get through, then that's a separate existing bug for C1 and C2. I was tempted to add a bailout, but the reason would be as a performance improvement to short-circuit wasted work, not to correct a stale metadata problem. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1796199310 From dlong at openjdk.org Thu Oct 10 23:11:10 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 10 Oct 2024 23:11:10 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 23:05:18 GMT, Dean Long wrote: >> src/hotspot/share/ci/ciMethod.cpp line 695: >> >>> 693: // Redefinition support. >>> 694: if (this->is_old() || root_m->is_old()) { >>> 695: return nullptr; >> >> IMO you can safely drop this particular check. The one after `Dependencies::find_unique_concrete_method()` should be enough to preserve the invariant (`target == cha_monomorphic_target`) . >> >> But thinking more about it, now I'm curious what happens when an old method is actually encountered. The fix conservatively rejects possible inlining opportunity, but it seems it doesn't invalidate resulting nmethod anymore. So, no recompilation attempt follows to recuperate that. >> >> We could either record a evol dependency on a stale `Method` (to fail during nmethod installation step) or fail-fast the compilation (probably, implies additional checks to propagate the failure status). > > Even if we check for stale Methods in various places, including invoke(), there is nothing to prevent the method from going stale after the last spot-check. My understanding was that we already handle stale metadata as a precondition to creating the nmethod. If we have a loophole there that lets stale metadata get through, then that's a separate existing bug for C1 and C2. > I was tempted to add a bailout, but the reason would be as a performance improvement to short-circuit wasted work, not to correct a stale metadata problem. > IMO you can safely drop this particular check. The one after Dependencies::find_unique_concrete_method() should be enough to preserve the invariant (target == cha_monomorphic_target) . 
If I do that, then I can also revert the VM state changes. However, I wasn't able to convince myself that this check is not needed. If we end up returning root_m here as cha_monomorphic_target, it seems possible that it could be a new version of the method, and then target == cha_monomorphic_target would fail. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1796201992 From duke at openjdk.org Thu Oct 10 23:16:11 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 10 Oct 2024 23:16:11 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v3] In-Reply-To: References: Message-ID: <0EVgFAWZ9O9e_dlRj4Na8Xf7QEC6FGU03wAs1bwVq5c=.ff19b62d-3b2c-4320-8be2-c7dee7cafdae@github.com> On Thu, 10 Oct 2024 18:49:38 GMT, Smita Kamath wrote: >> src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp line 1602: >> >>> 1600: vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23] >>> 1601: vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17] >>> 1602: vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17] >> >> I assume [Algorithm](https://github.com/intel/intel-ipsec-mb/blob/main/lib/avx2_t4/sha512_x1_ni_avx2.asm) is specifically crafted for 256 bit vectors and with 512 bit extension we modify it. Do you think we should factor out the following pattern and add an alternative implementation for it?
>>
>> ```
>> vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23]
>> vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17]
>> vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17]
>> ```
>>
>> This is a fixed pattern seen 4 times within the computation loop and once outside the loop. >> We are permuting two vectors with a constant permutation mask and blending them using an immediate mask.
>> This is a very valid use case for two table permutation instruction VPERMI2Q (available for AVX512VL targets) >> We can store permutation pattern outside the loop into a vector and then re-use it within the loop. > > We can do this change in a separate PR. I agree with Smita. The current implementation has a one-to-one correspondence with the ipsec implementation. Any new changes or refactoring will require a new round of exhaustive testing and could be implemented as a separate PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20633#discussion_r1796204440 From dlong at openjdk.org Thu Oct 10 23:24:12 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 10 Oct 2024 23:24:12 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 23:09:00 GMT, Dean Long wrote: >> Even if we check for stale Methods in various places, including invoke(), there is nothing to prevent the method from going stale after the last spot-check. My understanding was that we already handle stale metadata as a precondition to creating the nmethod. If we have a loophole there that lets stale metadata get through, then that's a separate existing bug for C1 and C2. >> I was tempted to add a bailout, but the reason would be as a performance improvement to short-circuit wasted work, not to correct a stale metadata problem. > >> IMO you can safely drop this particular check. The one after Dependencies::find_unique_concrete_method() should be enough to preserve the invariant (target == cha_monomorphic_target) . > > If I do that, then I can also revert the VM state changes. However, I wasn't able to convince myself that this check is not needed. If we end up returning root_m here as cha_monomorphic_target, it seems possible that it could be a new version of the method, and then target == cha_monomorphic_target would fail. 
C1 does call `dependency_recorder()->assert_evol_method(inline_target);`, where `inline_target` in the CHA case would be `cha_monomorphic_target`, not `target`, so it looks like we may not detect if `target` is stale as long as `cha_monomorphic_target` is not. It seems like a minor loophole, but I'm not sure what kind of problems it could cause, especially if the bytecodes of `target` are not used. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1796211261 From dlong at openjdk.org Fri Oct 11 01:06:46 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 01:06:46 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v4] In-Reply-To: References: Message-ID: > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). > > An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==.
Dean Long has updated the pull request incrementally with one additional commit since the last revision: fix errors ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21148/files - new: https://git.openjdk.org/jdk/pull/21148/files/80c9ae67..55988fd3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=02-03 Stats: 15 lines in 2 files changed: 11 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21148/head:pull/21148 PR: https://git.openjdk.org/jdk/pull/21148 From dlong at openjdk.org Fri Oct 11 01:06:46 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 01:06:46 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 22:40:44 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > make sure to be in VM state when checking is_old Hold off on re-reviews. I need to fix some errors introduced by moving the VM state transitions. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2406329451 From dlong at openjdk.org Fri Oct 11 01:37:53 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 01:37:53 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v5] In-Reply-To: References: Message-ID: > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). > > An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. Dean Long has updated the pull request incrementally with one additional commit since the last revision: redo VM state ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21148/files - new: https://git.openjdk.org/jdk/pull/21148/files/55988fd3..2c7fc099 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=03-04 Stats: 29 lines in 2 files changed: 7 ins; 17 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21148/head:pull/21148 PR: https://git.openjdk.org/jdk/pull/21148 From dlong at openjdk.org Fri Oct 11 01:47:10 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 01:47:10 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v3] In-Reply-To: References: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> Message-ID: <-8mlP1ioQUoMffiB5nO9sDlyPH_9kEImNl9NqdVqC6s=.b2442fa3-cf7a-424f-9e70-cefd0eb70419@github.com> On Thu, 10 Oct 2024 21:01:26 GMT, Igor 
Veresov wrote: > > LoadStore nodes should have the same issue. Why they are not affected? > > Because LoadStore is an official store. It consumes a memory state and produces memory state. CacheWB is not really a store, that is it doesn't produce memory effects from the perspective of the backend (its match rule is not a Set). It's hard to tell what's the best way to model it, so I just decided not to mess with its semantics right now. Should it be treated like a memory barrier? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21455#issuecomment-2406371663 From dlong at openjdk.org Fri Oct 11 01:54:17 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 01:54:17 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v5] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 01:37:53 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > redo VM state OK, fixed version pushed. I moved the first group of is_old checks into resolve_invoke(). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2406382772 From iveresov at openjdk.org Fri Oct 11 03:09:11 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 11 Oct 2024 03:09:11 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v3] In-Reply-To: <-8mlP1ioQUoMffiB5nO9sDlyPH_9kEImNl9NqdVqC6s=.b2442fa3-cf7a-424f-9e70-cefd0eb70419@github.com> References: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> <-8mlP1ioQUoMffiB5nO9sDlyPH_9kEImNl9NqdVqC6s=.b2442fa3-cf7a-424f-9e70-cefd0eb70419@github.com> Message-ID: On Fri, 11 Oct 2024 01:44:22 GMT, Dean Long wrote: > > > LoadStore nodes should have the same issue. Why they are not affected? > > > > > > Because LoadStore is an official store. It consumes a memory state and produces memory state. CacheWB is not really a store, that is it doesn't produce memory effects from the perspective of the backend (its match rule is not a Set). It's hard to tell what's the best way to model it, so I just decided not to mess with its semantics right now. > > Should it be treated like a memory barrier? I'm not sure why it's not, I guess they wanted a more relaxed behavior? It's more like the opposite of prefetch really. I didn't want to touch the semantics of it in this bug fix because it feels like it will likely open another can of worms. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21455#issuecomment-2406471308 From jkarthikeyan at openjdk.org Fri Oct 11 04:30:11 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 11 Oct 2024 04:30:11 GMT Subject: RFR: 8341781: Improve Min/Max node identities In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 05:57:09 GMT, Christian Hagedorn wrote: >> Hi all, >> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. 
In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>>
>>                                                           Baseline                      Patch
>> Benchmark                                 Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
>> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
>> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
>> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
>> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
>> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
>> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)
>>
>> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated! > > test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdentities.java line 116: > >> 114: >> 115: @Test >> 116: @IR(applyIfPlatform = { "riscv64", "false" }, phase = { CompilePhase.BEFORE_MACRO_EXPANSION }, counts = { IRNode.MIN_L, "1" }) > > Can you add a comment here why we cannot apply the rules for riscv? This is a good call, the IR rules don't apply to RISC-V because it doesn't have support for CMoves so the MinL/MaxL nodes aren't made at all. Since `Math.min/max(LL)` isn't intrinsified it first needs to be matched into CMove, then Min/Max, and then the identity needs to be called. Since #20098 implements the intrinsic we could remove the special casing after it's merged.
I've added a comment to the source code as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1796408365 From jkarthikeyan at openjdk.org Fri Oct 11 04:35:12 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 11 Oct 2024 04:35:12 GMT Subject: RFR: 8341781: Improve Min/Max node identities In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 06:06:19 GMT, Christian Hagedorn wrote: >> Hi all, >> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>>
>>                                                           Baseline                      Patch
>> Benchmark                                 Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
>> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
>> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
>> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
>> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
>> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
>> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)
>>
>> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!
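The two simplifications named in the description follow from the idempotence and absorption laws of a lattice. As a minimal sketch, the identities can be checked on plain `long` values with `Math.min`/`Math.max`. This is not C2 code and does not touch IR nodes; the class and method names below are hypothetical and chosen only for illustration.

```java
// Illustrative only: checks the lattice identities on plain long values.
// MinMaxIdentityCheck and its methods are hypothetical names, not part of the patch.
final class MinMaxIdentityCheck {
    // Max(A, Max(A, B)) should equal Max(A, B) (associativity plus idempotence).
    static boolean nestedMaxCollapses(long a, long b) {
        return Math.max(a, Math.max(a, b)) == Math.max(a, b);
    }

    // Max(A, Min(A, B)) should equal A (absorption).
    static boolean maxAbsorbsMin(long a, long b) {
        return Math.max(a, Math.min(a, b)) == a;
    }

    // Dual form of absorption: Min(A, Max(A, B)) should equal A.
    static boolean minAbsorbsMax(long a, long b) {
        return Math.min(a, Math.max(a, b)) == a;
    }

    public static void main(String[] args) {
        long[] samples = { Long.MIN_VALUE, -7L, 0L, 3L, 42L, Long.MAX_VALUE };
        for (long a : samples) {
            for (long b : samples) {
                if (!nestedMaxCollapses(a, b) || !maxAbsorbsMin(a, b) || !minAbsorbsMax(a, b)) {
                    throw new AssertionError("identity failed for a=" + a + ", b=" + b);
                }
            }
        }
        System.out.println("all identities hold on the sample values");
    }
}
```

Because the right-hand side of each identity has fewer nodes than the left-hand side, a compiler that recognizes the pattern can drop one Min/Max operation without changing the result.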
> > test/hotspot/jtreg/compiler/vectorization/runner/BasicShortOpTest.java line 220: > >> 218: short[] res = new short[SIZE]; >> 219: for (int i = 0; i < SIZE; i++) { >> 220: res[i] = (short) Math.min(a[i], b[i]); > > I guess without this change, this collapses to a constant which enables vectorization which was not expected before? Yeah, exactly - since `65536` is larger than the maximum short value, with this patch it can optimize the MinI node away entirely. I changed it to use the `b` array, which is what the `vectorMax` test case below uses. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1796410477 From jkarthikeyan at openjdk.org Fri Oct 11 05:03:12 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 11 Oct 2024 05:03:12 GMT Subject: RFR: 8341781: Improve Min/Max node identities In-Reply-To: References: Message-ID: <1T3CTv_0tE6J0fU6QJx20vyfikrIxPG2uzTVRNJGPwA=.059935c2-32c8-43c5-ab9b-d9541f65e25f@github.com> On Fri, 11 Oct 2024 04:28:00 GMT, Jasmine Karthikeyan wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdentities.java line 116: >> >>> 114: >>> 115: @Test >>> 116: @IR(applyIfPlatform = { "riscv64", "false" }, phase = { CompilePhase.BEFORE_MACRO_EXPANSION }, counts = { IRNode.MIN_L, "1" }) >> >> Can you add a comment here why we cannot apply the rules for riscv? > > This is a good call, the IR rules don't apply to RISC-V because it doesn't have support for CMoves so the MinL/MaxL nodes aren't made at all. Since `Math.min/max(LL)` isn't intrinsified it first needs to be matched into CMove, then Min/Max, and then the identity needs to be called. Since #20098 implements the intrinsic we could remove the special casing after it's merged. I've added a comment to the source code as well. On closer inspection it seems that because of the CMove cost model, the outer min/max operation doesn't get turned into a CMove so the Long IR rules don't reliably get matched anywhere.
It must have slipped through the cracks because of the way that my IR rules were structured, I only realized this after I added the compile phase to the other 2 rules. I think for this to work it would need the intrinsics from the other PR. Do you think we should continue with this PR with the Long cases disabled and enable it afterwards, or should we wait for #20098 to be merged? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1796424851 From chagedorn at openjdk.org Fri Oct 11 07:06:10 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 11 Oct 2024 07:06:10 GMT Subject: RFR: 8341781: Improve Min/Max node identities In-Reply-To: <1T3CTv_0tE6J0fU6QJx20vyfikrIxPG2uzTVRNJGPwA=.059935c2-32c8-43c5-ab9b-d9541f65e25f@github.com> References: <1T3CTv_0tE6J0fU6QJx20vyfikrIxPG2uzTVRNJGPwA=.059935c2-32c8-43c5-ab9b-d9541f65e25f@github.com> Message-ID: On Fri, 11 Oct 2024 05:00:06 GMT, Jasmine Karthikeyan wrote: >> This is a good call, the IR rules don't apply to RISC-V because it doesn't have support for CMoves so the MinL/MaxL nodes aren't made at all. Since `Math.min/max(LL)` isn't intrinsified it first needs to be matched into CMove, then Min/Max, and then the identity needs to be called. Since #20098 implements the intrinsic we could remove the special casing after it's merged. I've added a comment to the source code as well. > > On closer inspection it seems that because of the CMove cost model, the outer min/max operation doesn't get turned into a CMove so the Long IR rules don't reliably get matched anywhere. It must have slipped through the cracks because of the way that my IR rules were structured, I only realized this after I added the compile phase to the other 2 rules. I think for this to work it would need the intrinsics from the other PR. Do you think we should continue with this PR with the Long cases disabled and enable it afterwards, or should we wait for #20098 to be merged? Thanks for sharing more details.
I think it's perfectly fine to still add them now but leave them disabled with a reference to JDK-8307513 since you already wrote them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1796513108 From thartmann at openjdk.org Fri Oct 11 07:56:50 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 11 Oct 2024 07:56:50 GMT Subject: RFR: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found Message-ID: Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses. Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already: https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965 Thanks, Tobias ------------- Commit messages: - Fix - 8336726: C2: assert(\!do_asserts || projs->fallthrough_ioproj \!= nullptr) failed: must be found Changes: https://git.openjdk.org/jdk/pull/21450/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21450&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336726 Stats: 83 lines in 4 files changed: 77 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21450.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21450/head:pull/21450 PR: https://git.openjdk.org/jdk/pull/21450 From thartmann at openjdk.org Fri Oct 11 08:36:12 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 11 Oct 2024 08:36:12 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 23:30:03 GMT, Dean Long wrote: >> C1 patching (explained in detail 
[here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. >> >> Now the problem is that there can be multiple patch sites in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 >> >> While another `Thread B` is done with patching another patch site and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC. `Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 >> >> In short: >> - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. >> - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of a `mov` in the patch body because the store is not atomic. >> - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. >> >> Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. >> I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity.
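The alignment idea can be sketched in isolation. Assuming an 8-byte oop immediate, and assuming (as the description does) that naturally aligned 8-byte stores are atomic on the target, the emitter only needs to insert padding until the address of the immediate is a multiple of 8. The helper below is a hypothetical Java illustration of that arithmetic, not the code in `PatchingStub::emit_code`. The immediate offset of 2 used in `main` corresponds to a REX-prefixed `mov r64, imm64` and is an assumption about which instruction is being patched.

```java
// Illustrative only: padding arithmetic for aligning an instruction's immediate.
// ImmediateAlignment and its methods are hypothetical names, not HotSpot APIs.
final class ImmediateAlignment {
    // Bytes of padding to emit before the instruction so that its immediate,
    // located immOffset bytes into the instruction, starts on an 'alignment' boundary.
    // 'alignment' must be a power of two.
    static int paddingFor(long instructionStart, int immOffset, int alignment) {
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0) {
            throw new IllegalArgumentException("alignment must be a power of two");
        }
        long immAddress = instructionStart + immOffset;
        long misalignment = immAddress & (alignment - 1);
        return (int) ((alignment - misalignment) & (alignment - 1));
    }

    // True if the byte range [address, address + size) spans two 'boundary'-sized blocks.
    // Assumes a non-negative address.
    static boolean crossesBoundary(long address, int size, int boundary) {
        return (address / boundary) != ((address + size - 1) / boundary);
    }

    public static void main(String[] args) {
        long start = 0x1003L;   // hypothetical code address
        int immOffset = 2;      // assumed: REX prefix + opcode precede the imm64
        int pad = paddingFor(start, immOffset, 8);
        long alignedImm = start + pad + immOffset;
        System.out.println("padding bytes: " + pad);
        System.out.println("immediate aligned: " + (alignedImm % 8 == 0));
        System.out.println("crosses 64-byte line: " + crossesBoundary(alignedImm, 8, 64));
    }
}
```

An 8-byte value that starts on an 8-byte boundary can never straddle a 64-byte cache line or a page, which is why aligning the immediate rules out the boundary-crossing case described above.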
>> >> The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 >> >> I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) t... > > Wouldn't it be better to get rid of the concurrency? We could grab CodeCache_lock and Patching_lock in the same block, so we serialize the patching and register_nmethod. @dean-long I discussed this with @tschatzl and, on his request, improved the PR description a bit. He would also prefer the alignment solution because it does not increase the scope of the lock (and we already rely on word-aligned word-sized memory accesses being atomic in many other places). What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2406908303 From chagedorn at openjdk.org Fri Oct 11 13:40:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 11 Oct 2024 13:40:11 GMT Subject: RFR: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 13:29:11 GMT, Tobias Hartmann wrote: > Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses. 
> > Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already: > https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965 > > Thanks, > Tobias That looks reasonable to me. As we have discussed offline, it's probably not worth/too complex to verify that we always end in an infinite loop afterwards. test/hotspot/jtreg/compiler/c2/TestCallDevirtualizationWithInfiniteLoop.java line 53: > 51: public static void test(boolean flag) { > 52: // Avoid executing endless loop > 53: if (flag) return; You should add braces here: Suggestion: if (flag) { return; } ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21450#pullrequestreview-2362827732 PR Review Comment: https://git.openjdk.org/jdk/pull/21450#discussion_r1796968384 From thartmann at openjdk.org Fri Oct 11 14:23:48 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 11 Oct 2024 14:23:48 GMT Subject: RFR: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found [v2] In-Reply-To: References: Message-ID: > Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses. 
> > Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already: > https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965 > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/c2/TestCallDevirtualizationWithInfiniteLoop.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21450/files - new: https://git.openjdk.org/jdk/pull/21450/files/8ffdbd01..ba06b702 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21450&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21450&range=00-01 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21450.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21450/head:pull/21450 PR: https://git.openjdk.org/jdk/pull/21450 From thartmann at openjdk.org Fri Oct 11 14:23:48 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 11 Oct 2024 14:23:48 GMT Subject: RFR: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 13:29:11 GMT, Tobias Hartmann wrote: > Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses. 
> > Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already: > https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965 > > Thanks, > Tobias Thanks for the review, Christian! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21450#issuecomment-2407518566 From chagedorn at openjdk.org Fri Oct 11 14:33:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 11 Oct 2024 14:33:11 GMT Subject: RFR: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found [v2] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 14:23:48 GMT, Tobias Hartmann wrote: >> Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses. >> >> Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already: >> https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965 >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/c2/TestCallDevirtualizationWithInfiniteLoop.java > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). 
-------------

PR Review: https://git.openjdk.org/jdk/pull/21450#pullrequestreview-2362954248

From jkarthikeyan at openjdk.org Fri Oct 11 15:15:54 2024
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Fri, 11 Oct 2024 15:15:54 GMT
Subject: RFR: 8341781: Improve Min/Max node identities [v2]
In-Reply-To:
References:
Message-ID:

> Hi all,
> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>
>                                                          Baseline                      Patch
> Benchmark                                 Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)
>
> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!
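
The two nested patterns named in the description, `Max(A, Max(A, B))` and `Max(A, Min(A, B))`, can be checked at the Java level with a small sketch. This is only an illustration of the identities for `long` values using `Math.max`/`Math.min`; the class and method names are made up for this example and it says nothing about how C2 implements the transformation.

```java
// Hypothetical illustration class; not part of the patch under review.
class MinMaxIdentities {
    // Max(A, Max(A, B)): the outer Max repeats an operand of the inner one.
    static long nestedMaxMax(long a, long b) {
        return Math.max(a, Math.max(a, b));
    }

    // Max(A, Min(A, B)): Min(A, B) is never larger than A, so A wins.
    static long nestedMaxMin(long a, long b) {
        return Math.max(a, Math.min(a, b));
    }

    // True if both simplifications give the same value as the nested form.
    static boolean identitiesHold(long a, long b) {
        return nestedMaxMax(a, b) == Math.max(a, b)
            && nestedMaxMin(a, b) == a;
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -7L, 0L, 1L, 42L, Long.MAX_VALUE};
        for (long a : samples) {
            for (long b : samples) {
                if (!identitiesHold(a, b)) {
                    throw new AssertionError("identity failed for " + a + ", " + b);
                }
            }
        }
        System.out.println("identities hold for all sample pairs");
    }
}
```

The sketch covers `long` operands only; other operand types are outside what it checks.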

Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:

  Suggestions from review

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21439/files
  - new: https://git.openjdk.org/jdk/pull/21439/files/af771cff..b4b96143

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21439&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21439&range=00-01

  Stats: 15 lines in 3 files changed: 5 ins; 3 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/21439.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21439/head:pull/21439

PR: https://git.openjdk.org/jdk/pull/21439

From jkarthikeyan at openjdk.org Fri Oct 11 15:15:54 2024
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Fri, 11 Oct 2024 15:15:54 GMT
Subject: RFR: 8341781: Improve Min/Max node identities
In-Reply-To:
References:
Message-ID:

On Thu, 10 Oct 2024 02:59:14 GMT, Jasmine Karthikeyan wrote:

> Hi all,
> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>
>                                                          Baseline                      Patch
> Benchmark                                 Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)
>
> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!

Thanks for the suggestions and testing, @liach and @chhagedorn! I've taken a look at the backend implementations, and it seems that aarch64 and RISC-V unconditionally support floating point Min/Max while x64 only supports them with `UseAVX >= 1`, as described. I made it so that the test only runs when it matches that criteria. I've pushed a commit that should address all the suggestions here.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21439#issuecomment-2407621359

From jkarthikeyan at openjdk.org Fri Oct 11 15:15:55 2024
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Fri, 11 Oct 2024 15:15:55 GMT
Subject: RFR: 8341781: Improve Min/Max node identities [v2]
In-Reply-To:
References: <1T3CTv_0tE6J0fU6QJx20vyfikrIxPG2uzTVRNJGPwA=.059935c2-32c8-43c5-ab9b-d9541f65e25f@github.com>
Message-ID:

On Fri, 11 Oct 2024 07:03:44 GMT, Christian Hagedorn wrote:

>> On closer inspection it seems that because of the CMove cost model, the outer min/max operation doesn't get turned into a CMove so the Long IR rules don't reliably get matched anywhere. It must have slipped through the cracks because of the way that my IR rules were structured, I only realized this after I added the compile phase to the other 2 rules. I think for this to work it would need the intrinsics from the other PR. Do you think we should continue with this PR with the Long cases disabled and enable it afterwards, or should we wait for #20098 to be merged?
>
> Thanks for sharing more details.
I think it's perfectly fine to still add them now but leave them disabled with a reference to JDK-8307513 since you already wrote them. Sounds good! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1797098107 From qamai at openjdk.org Fri Oct 11 15:31:17 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 11 Oct 2024 15:31:17 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v3] In-Reply-To: <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> Message-ID: On Thu, 10 Oct 2024 01:09:06 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. >> >> Please take a look and leave your reviews, >> Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > more style changes Thanks a lot for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/21414#issuecomment-2407651084 From qamai at openjdk.org Fri Oct 11 15:31:19 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 11 Oct 2024 15:31:19 GMT Subject: Integrated: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* In-Reply-To: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> Message-ID: On Tue, 8 Oct 2024 19:46:12 GMT, Quan Anh Mai wrote: > Hi, > > This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. 
This conveys the element information in a clearer and safer manner. > > Please take a look and leave your reviews, > Thanks a lot. This pull request has now been integrated. Changeset: 7276a1be Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/7276a1bec0d90f63e9e433fdcdfd6564b70dc9bb Stats: 208 lines in 18 files changed: 3 ins; 77 del; 128 mod 8341784: Refactor TypeVect to use a BasicType instead of a const Type* Reviewed-by: kvn, jkarthikeyan ------------- PR: https://git.openjdk.org/jdk/pull/21414 From jkarthikeyan at openjdk.org Fri Oct 11 15:34:15 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 11 Oct 2024 15:34:15 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR pallets. >> >> >> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. 
>> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. >> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimization:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 > - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction Hmm, do you think this pattern could be matched in the ad-files instead of the middle end? 
I think that might be a lot cleaner since the backend already has systems for matching node trees, which could avoid a lot of the complexity here. I think it could make the patch a lot smaller and simpler. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2407658405 From qamai at openjdk.org Fri Oct 11 15:54:47 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 11 Oct 2024 15:54:47 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop Message-ID: Hi, This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. My proposal is that if a node is not reassigned inside a loop, and will be spilt there, we spill it eagerly at the loop entry instead. This can lead to more reload inside the loop, but as the loop-carried dependencies are eliminated, a load is negligible. Please take a look and leave your reviews, thanks a lot. 
-------------

Commit messages:
 - add benchmark
 - don't eagerly spill if we are reassigned anyway
 - eagerly spill a node in the loop entry

Changes: https://git.openjdk.org/jdk/pull/21472/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8341697
  Stats: 87 lines in 3 files changed: 81 ins; 3 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/21472.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21472/head:pull/21472

PR: https://git.openjdk.org/jdk/pull/21472

From qamai at openjdk.org Fri Oct 11 16:01:09 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Fri, 11 Oct 2024 16:01:09 GMT
Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop
In-Reply-To:
References:
Message-ID:

On Fri, 11 Oct 2024 15:50:20 GMT, Quan Anh Mai wrote:

> Hi,
>
> This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. My proposal is that if a node is not reassigned inside a loop, and will be spilt there, we spill it eagerly at the loop entry instead. This can lead to more reload inside the loop, but as the loop-carried dependencies are eliminated, a load is negligible.
>
> Please take a look and leave your reviews, thanks a lot.

The benchmark result:

    Benchmark                      Mode  Cnt    Score     Error  Units
    LoopCounterBench.field_ret     avgt    3  417.865 ±   2.914  ns/op
    LoopCounterBench.localVar_ret  avgt    3  332.657 ± 109.310  ns/op

The inner loop is free of spills because the spill has been hoisted to the loop entry:

             0x00007fdf9821b546: mov r9d,DWORD PTR [r11+0xc]  ;*getfield increment {reexecute=0 rethrow=0 return_oop=0}
                                 ; - org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 1 (line 56)
                                 ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190)
      0.03%  0x00007fdf9821b54a: mov esi,DWORD PTR [r12+r8*8+0xc]  ; implicit exception: dispatches to 0x00007fdf9821b6f4
                                 ;*lastore {reexecute=0 rethrow=0 return_oop=0}
                                 ; - org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 27 (line 58)
                                 ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190)
             0x00007fdf9821b54f: lea rax,[r12+r14*8]
             0x00007fdf9821b553: lea r13,[r12+r8*8]
      0.03%  0x00007fdf9821b557: xor edi,edi
  THE SPILL  0x00007fdf9821b559: vmovq xmm0,rbp
             0x00007fdf9821b55e: xchg ax,ax  ;*aload_0 {reexecute=0 rethrow=0 return_oop=0}
                                 ; - org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 16 (line 58)
                                 ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190)
             0x00007fdf9821b560: cmp edi,r10d
      1.66%  0x00007fdf9821b563: jae 0x00007fdf9821b587
             0x00007fdf9821b565: mov rbp,QWORD PTR [rax+rdi*8+0x10]  ;*laload {reexecute=0 rethrow=0 return_oop=0}
                                 ; - org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 26 (line 58)
                                 ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190)
      5.43%  0x00007fdf9821b56a: cmp edi,esi
      0.17%  0x00007fdf9821b56c: jae 0x00007fdf9821b5c8
             0x00007fdf9821b56e: mov QWORD PTR [r13+rdi*8+0x10],rbp  ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                 ; - org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 32 (line 57)
                                 ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190)
      1.40%  0x00007fdf9821b573: add edi,r9d  ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                 ; - org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 30 (line 57)
                                 ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190)
      3.40%  0x00007fdf9821b576: mov rbp,QWORD PTR [r15+0x450]  ; ImmutableOopMap {r11=Oop r8=NarrowOop rcx=Oop rbx=Oop rdx=Oop rax=Oop r13=Oop r14=NarrowOop }
                                 ;*goto {reexecute=1 rethrow=0 return_oop=0}
                                 ; - (reexecute) org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 32 (line 57)
                                 ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190)
      1.80%  0x00007fdf9821b57d: test DWORD PTR [rbp+0x0],eax  ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                 ; - org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 32 (line 57)
                                 ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190)
                                 ; {poll}
     84.42%  0x00007fdf9821b580: cmp edi,r10d
      0.30%  0x00007fdf9821b583: jl 0x00007fdf9821b560  ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                 ; - org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 13 (line 57)
                                 ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2407697430

From qamai at openjdk.org Fri Oct 11 16:05:11 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Fri, 11 Oct 2024 16:05:11 GMT
Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop
In-Reply-To:
References:
Message-ID:

On Fri, 11 Oct 2024 15:50:20 GMT, Quan Anh Mai wrote:

> Hi,
>
> This patch improves the spill placement in the presence of loops.
Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. My proposal is that if a node is not reassigned inside a loop, and will be spilt there, we spill it eagerly at the loop entry instead. This can lead to more reload inside the loop, but as the loop-carried dependencies are eliminated, a load is negligible. > > Please take a look and leave your reviews, thanks a lot. Thanks to @shipilev for the benchmark, could you verify that this can solve the issue in the original benchmark as I imagine this is a simplified version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2407710352 From jbhateja at openjdk.org Fri Oct 11 16:19:17 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 11 Oct 2024 16:19:17 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v3] In-Reply-To: References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> Message-ID: On Fri, 11 Oct 2024 15:27:34 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> more style changes > > Thanks a lot for your reviews Hi @merykitty , LGTM. Best Regards. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21414#issuecomment-2407733997 From qamai at openjdk.org Fri Oct 11 16:34:18 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 11 Oct 2024 16:34:18 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v3] In-Reply-To: References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> Message-ID: <4k1WvfgPtwKa4RSDzjGnJYo2_O1dzDKdfHQrbLX5730=.040ea20c-7318-43e8-b39d-d0c2d44b3a27@github.com> On Fri, 11 Oct 2024 16:16:46 GMT, Jatin Bhateja wrote: >> Thanks a lot for your reviews > > Hi @merykitty , LGTM. > > Best Regards. @jatin-bhateja Thanks a lot for your reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21414#issuecomment-2407757842 From qamai at openjdk.org Fri Oct 11 16:57:13 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 11 Oct 2024 16:57:13 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR pallets. >> >> >> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. 
Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. >> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimization:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits:
>
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137
>  - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction

Another approach is to do similarly to `MacroLogicVNode`. You can make another node and transform `MulVL` to it before matching, this is more flexible than using match rules.

I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering`. It can be used to do e.g split `ExtractI` into the 128-bit lane extraction and the element extraction from that lane. This allows us to do `GVN` on those and `v.lane(5) + v.lane(7)` can be compiled nicely as:

    vextracti128 xmm0, ymm1, 1
    pextrd eax, xmm0, 1
    // vextracti128 xmm0, ymm1, 1 here will be gvn-ed
    pextrd ecx, xmm0, 3
    add eax, ecx

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2407793168

From jkarthikeyan at openjdk.org Fri Oct 11 17:15:08 2024
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Fri, 11 Oct 2024 17:15:08 GMT
Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2]
In-Reply-To:
References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com>
Message-ID:

On Fri, 11 Oct 2024 16:54:23 GMT, Quan Anh Mai wrote:

> I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering`

I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for better performance with bit masks, but ran into a problem where it doesn't have an immediate encoding so we'd need to manifest a constant into a temporary register every time. With an (x86-specific) ideal node, we could simply let the register allocator handle placing the constant. It would also be nice to avoid needing to put similar backend-specific lowerings (such as `MacroLogicV`) in shared code.
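
The lane split that Quan Anh Mai describes for `v.lane(5) + v.lane(7)` can be modelled in scalar Java. This is a rough sketch under the assumption of a 256-bit vector of eight 32-bit ints split into 128-bit chunks of four; `LaneSplit` and its methods are hypothetical names for illustration, not HotSpot code. It shows why both lanes share the same 128-bit extraction (chunk 1) and differ only in the element index (1 and 3), matching the immediates in the assembly above.

```java
import java.util.Arrays;

// Hypothetical scalar model of splitting a lane extraction into two steps.
class LaneSplit {
    // Assumption: 128-bit chunks of 32-bit ints, so four ints per chunk.
    static final int INTS_PER_CHUNK = 4;

    // Which 128-bit chunk holds the lane (immediate of the chunk extraction).
    static int chunkIndex(int lane) {
        return lane / INTS_PER_CHUNK;
    }

    // Position of the lane inside its chunk (immediate of the element extraction).
    static int elementIndex(int lane) {
        return lane % INTS_PER_CHUNK;
    }

    // Two-step extraction: copy out the chunk, then pick the element from it.
    static int extract(int[] vector, int lane) {
        int start = chunkIndex(lane) * INTS_PER_CHUNK;
        int[] chunk = Arrays.copyOfRange(vector, start, start + INTS_PER_CHUNK);
        return chunk[elementIndex(lane)];
    }

    public static void main(String[] args) {
        int[] v = {10, 11, 12, 13, 14, 15, 16, 17};
        for (int lane : new int[] {5, 7}) {
            System.out.println("lane " + lane + " -> chunk " + chunkIndex(lane)
                + ", element " + elementIndex(lane));
        }
        System.out.println("sum = " + (extract(v, 5) + extract(v, 7)));
    }
}
```

Because lanes 5 and 7 map to the same chunk, the first step is identical for both, which is the redundancy the quoted comment expects GVN to remove.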
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2407821557 From kvn at openjdk.org Fri Oct 11 18:34:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 11 Oct 2024 18:34:15 GMT Subject: RFR: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found [v2] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 14:23:48 GMT, Tobias Hartmann wrote: >> Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses. >> >> Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already: >> https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965 >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/c2/TestCallDevirtualizationWithInfiniteLoop.java > > Co-authored-by: Christian Hagedorn Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21450#pullrequestreview-2363370518 From jrose at openjdk.org Fri Oct 11 18:50:16 2024 From: jrose at openjdk.org (John R Rose) Date: Fri, 11 Oct 2024 18:50:16 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v6] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 09:13:37 GMT, Ant?n Seoane wrote: >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. 
However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient.
>>
>> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level.
>>
>> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter.
>>
>> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket.
>
> Antón Seoane has updated the pull request incrementally with one additional commit since the last revision:
>
> Replace LogDecorators::Decorator::NoDecorator with LogDecorators::None

For the compiler outputs which have no tags, what happens with (a) lines that begin with something like `[42] ` and (b) multi-line outputs? In both cases a log parser could (on a bad day) struggle to interpret the UL records correctly. I see that strict compatibility with existing compiler outputs can lead to additional parsing ambiguities, which will have to be dealt with at some point in the future. (Is there a leading space? I think not. So a leading `[42]` could be a problem if it crops up.
Perhaps we need a targeted way to discriminate such things, such as injecting one leading space in some cases TBD.) Note that I am not advocating, here, for an immediate solution for parsing ambiguities, but I do want us to track such issues. Another side note, just FTR: There is a third issue with UL output from compilation, which is the grouping of logically connected log outputs. In the compiler logs we use XML nesting today for such logical grouping. This grouping, in addition to unambiguous delimiting of decorations, is yet another use, by compilation logs, of a basic property of XML: The syntax is not only somewhat readable, but also well defined. I suppose if XML syntax is encapsulated in UL syntax, that would provide a parseable ("tool-friendly") solution. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21383#issuecomment-2407955314 From qamai at openjdk.org Fri Oct 11 18:56:39 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 11 Oct 2024 18:56:39 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v2] In-Reply-To: References: Message-ID: > Hi, > > This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. My proposal is that if a node is not reassigned inside a loop, and will be spilt there, we spill it eagerly at the loop entry instead. This can lead to more reload inside the loop, but as the loop-carried dependencies are eliminated, a load is negligible. > > Please take a look and leave your reviews, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: refinement ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21472/files - new: https://git.openjdk.org/jdk/pull/21472/files/21600d7d..85a2c266 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=00-01 Stats: 44 lines in 2 files changed: 34 ins; 3 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/21472.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21472/head:pull/21472 PR: https://git.openjdk.org/jdk/pull/21472 From vlivanov at openjdk.org Fri Oct 11 19:04:15 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 11 Oct 2024 19:04:15 GMT Subject: RFR: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found [v2] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 14:23:48 GMT, Tobias Hartmann wrote: >> Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses. >> >> Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already: >> https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965 >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/c2/TestCallDevirtualizationWithInfiniteLoop.java > > Co-authored-by: Christian Hagedorn Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21450#pullrequestreview-2363412880 From dlong at openjdk.org Fri Oct 11 19:12:13 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 19:12:13 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v3] In-Reply-To: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> References: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> Message-ID: On Thu, 10 Oct 2024 17:26:25 GMT, Igor Veresov wrote: >> `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Add comment src/hotspot/share/opto/gcm.cpp line 773: > 771: if (use_mem_state->is_Mach()) { > 772: int ideal_op = use_mem_state->as_Mach()->ideal_Opcode(); > 773: is_cache_wb = (ideal_op == Op_CacheWB || ideal_op == Op_CacheWBPostSync || ideal_op == Op_CacheWBPreSync); The match rules for CacheWBPostSync and CacheWBPreSync don't have memory operands. Is needs_anti_dependence_check() really returning true for them? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21455#discussion_r1797333904 From qamai at openjdk.org Fri Oct 11 19:14:28 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 11 Oct 2024 19:14:28 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v3] In-Reply-To: References: Message-ID: > Hi, > > This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. 
This introduces loop-carried dependencies, greatly reduces loop throughput. My proposal is that if a node is not reassigned inside a loop, and will be spilt there, we spill it eagerly at the loop entry instead. This can lead to more reload inside the loop, but as the loop-carried dependencies are eliminated, a load is negligible. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: refactor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21472/files - new: https://git.openjdk.org/jdk/pull/21472/files/85a2c266..b6e78eb8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=01-02 Stats: 30 lines in 1 file changed: 16 ins; 4 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/21472.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21472/head:pull/21472 PR: https://git.openjdk.org/jdk/pull/21472 From dlong at openjdk.org Fri Oct 11 20:00:17 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 20:00:17 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 13:39:28 GMT, Tobias Hartmann wrote: > C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. 
>
> Now the problem is that there can be multiple patch sites in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body:
> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185
>
> While another `Thread B` is done with patching another patch site and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC. `Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary:
> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282
>
> In short:
> - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops.
> - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of a `mov` in the patch body because the store is not atomic.
> - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently.
>
> Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned.
> I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity.
> > The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 > > I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remo... Sorry, I'm still not convinced this is safe. I took another look at C1 patching, and not only are we trying to scan oops while they are being patched, we are also patching the reloc information at the same time (see the call to change_reloc_info_for_address()). So that means there is a window where the instruction is patched but the reloc information is stale. If the scope of the lock is the only issue, then we could try to address that with a finer-grained lock or even a per-nmethod lock. @tschatzl , when we call register_nmethod(), do we really need to scan the oops immediately, or could that be delayed until the next safepoint? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2408043697 From vlivanov at openjdk.org Fri Oct 11 20:31:14 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 11 Oct 2024 20:31:14 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 23:21:03 GMT, Dean Long wrote: >>> IMO you can safely drop this particular check. The one after Dependencies::find_unique_concrete_method() should be enough to preserve the invariant (target == cha_monomorphic_target) . >> >> If I do that, then I can also revert the VM state changes. However, I wasn't able to convince myself that this check is not needed. 
If we end up returning root_m here as cha_monomorphic_target, it seems possible that it could be a new version of the method, and then target == cha_monomorphic_target would fail. > > C1 does call > > 2159 dependency_recorder()->assert_evol_method(inline_target); > > which in the CHA case would be `cha_monomorphic_target`, not `target`, so it looks like we may not detect if `target` is stale as long as `cha_monomorphic_target` is not. It seems like a minor loophole, but I'm not sure what kind of problems it could cause, especially if the bytecodes of `target` are not used. The scenario which concerns me is performance-related. If CHA conservatively disables inlining when concurrent class redefinition takes place during parsing, then there is no mechanism in place to recuperate possible loss of performance. Normally, if inlined method is redefined later during compilation, nmethod installation fails during dependency validation. But here no inlining happens (missed optimization opportunity) and call sites in generated code are resolved based on symbolic information (except rare cases when resolved method is attached to the call site, see `SharedRuntime::find_callee_info_helper()` for details), so there are no guarantees the stale `Method*` is recorded. I agree that the window for such sequence of events is narrow, but it may be a source of surprising performance anomalies in rare cases. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1797396108 From iveresov at openjdk.org Fri Oct 11 20:38:24 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 11 Oct 2024 20:38:24 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v4] In-Reply-To: References: Message-ID: > `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. 
Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Address Dean's comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21455/files - new: https://git.openjdk.org/jdk/pull/21455/files/ae69ee4b..914b97ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21455&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21455&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21455/head:pull/21455 PR: https://git.openjdk.org/jdk/pull/21455 From iveresov at openjdk.org Fri Oct 11 20:38:25 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 11 Oct 2024 20:38:25 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v3] In-Reply-To: References: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> Message-ID: On Fri, 11 Oct 2024 19:09:06 GMT, Dean Long wrote: >> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comment > > src/hotspot/share/opto/gcm.cpp line 773: > >> 771: if (use_mem_state->is_Mach()) { >> 772: int ideal_op = use_mem_state->as_Mach()->ideal_Opcode(); >> 773: is_cache_wb = (ideal_op == Op_CacheWB || ideal_op == Op_CacheWBPostSync || ideal_op == Op_CacheWBPreSync); > > The match rules for CacheWBPostSync and CacheWBPreSync don't have memory operands. Is needs_anti_dependence_check() really returning true for them? Yes, you're right. I'll remove those. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21455#discussion_r1797399651 From dlong at openjdk.org Fri Oct 11 21:11:15 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 21:11:15 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 20:28:31 GMT, Vladimir Ivanov wrote: >> C1 does call >> >> 2159 dependency_recorder()->assert_evol_method(inline_target); >> >> which in the CHA case would be `cha_monomorphic_target`, not `target`, so it looks like we may not detect if `target` is stale as long as `cha_monomorphic_target` is not. It seems like a minor loophole, but I'm not sure what kind of problems it could cause, especially if the bytecodes of `target` are not used. > > The scenario which concerns me is performance-related. If CHA conservatively disables inlining when concurrent class redefinition takes place during parsing, then there is no mechanism in place to recuperate possible loss of performance. Normally, if inlined method is redefined later during compilation, nmethod installation fails during dependency validation. But here no inlining happens (missed optimization opportunity) and call sites in generated code are resolved based on symbolic information (except rare cases when resolved method is attached to the call site, see `SharedRuntime::find_callee_info_helper()` for details), so there are no guarantees the stale `Method*` is recorded. > > I agree that the window for such sequence of events is narrow, but it may be a source of surprising performance anomalies in rare cases. OK, it sounds like we have two choices. Either record an evol dependency every time we short-circuit an optimization based on is_old, or bail out. I vote for bailing out. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1797425086 From kbarrett at openjdk.org Fri Oct 11 21:15:27 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 11 Oct 2024 21:15:27 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: <_UfXDCsUh7XlJS1APDR6uEdAFZyDktB56D3l5idS0OA=.f6ed2d6d-251d-4237-ad0d-3dd8298e538b@github.com> References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> <_UfXDCsUh7XlJS1APDR6uEdAFZyDktB56D3l5idS0OA=.f6ed2d6d-251d-4237-ad0d-3dd8298e538b@github.com> Message-ID: On Fri, 4 Oct 2024 16:03:36 GMT, Vladimir Kozlov wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> remove surrounding whitespace > > Side note: please enable GHA testing for your repo. Thanks for reviews, @vnkozlov and @dean-long ------------- PR Comment: https://git.openjdk.org/jdk/pull/21324#issuecomment-2408124892 From kbarrett at openjdk.org Fri Oct 11 21:15:27 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 11 Oct 2024 21:15:27 GMT Subject: Integrated: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB In-Reply-To: References: Message-ID: On Thu, 3 Oct 2024 12:50:55 GMT, Kim Barrett wrote: > Please review this change to TypeRawPtr::add_offset to prevent a compiler from > inferring things based on prior pointer arithmetic not invoking UB. As noted in > the bug report, clang is actually doing this. > > To accomplish this, changed to integral arithmetic. Also added over/underflow > checks. > > Also made a couple of minor touchups. Replaced an implicit conversion to bool > with an explicit compare to nullptr (per style guide). Removed a no longer > needed dummy return after a (now) noreturn function. > > Testing: mach5 tier1-7 > That testing was with calls to "fatal" for the over/underflow cases and the > sum==0 case. There were no hits. 
I'm not sure how to construct a test that > would hit those. This pull request has now been integrated. Changeset: 0a57fe1d Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/0a57fe1df6f3431cfb2d5d868597c61ef6af3806 Stats: 15 lines in 1 file changed: 8 ins; 2 del; 5 mod 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB Reviewed-by: dlong, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21324 From kvn at openjdk.org Fri Oct 11 21:19:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 11 Oct 2024 21:19:13 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v5] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 01:37:53 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > redo VM state Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21148#pullrequestreview-2363574423 From kvn at openjdk.org Fri Oct 11 21:19:14 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 11 Oct 2024 21:19:14 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 21:08:56 GMT, Dean Long wrote: >> The scenario which concerns me is performance-related. 
If CHA conservatively disables inlining when concurrent class redefinition takes place during parsing, then there is no mechanism in place to recuperate possible loss of performance. Normally, if inlined method is redefined later during compilation, nmethod installation fails during dependency validation. But here no inlining happens (missed optimization opportunity) and call sites in generated code are resolved based on symbolic information (except rare cases when resolved method is attached to the call site, see `SharedRuntime::find_callee_info_helper()` for details), so there are no guarantees the stale `Method*` is recorded. >> >> I agree that the window for such sequence of events is narrow, but it may be a source of surprising performance anomalies in rare cases. > > OK, it sounds like we have two choices. Either record an evol dependency every time we short-circuit an optimization based on is_old, or bail out. I vote for bailing out. I vote for bail out. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1797429171 From dlong at openjdk.org Fri Oct 11 21:34:17 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 21:34:17 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v4] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 20:38:24 GMT, Igor Veresov wrote: >> `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Address Dean's comment Marked as reviewed by dlong (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21455#pullrequestreview-2363588573 From vlivanov at openjdk.org Fri Oct 11 22:20:20 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 11 Oct 2024 22:20:20 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 21:15:21 GMT, Vladimir Kozlov wrote: >> OK, it sounds like we have two choices. Either record an evol dependency every time we short-circuit an optimization based on is_old, or bail out. I vote for bailing out. > > I vote for bail out. I prefer bailing out as well, but, please, check it doesn't mark the root method as non-compilable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1797464343 From sviswanathan at openjdk.org Fri Oct 11 23:35:06 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 11 Oct 2024 23:35:06 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 Message-ID: When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. Also a regression test case is added accordingly. 
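
A minimal Java sketch of the kind of loop involved, not the regression test from this PR: only two adjacent elements out of every four are converted, so a store that writes 8 bytes instead of 4 would overwrite the two neighbouring slots. The class name, the sentinel value, and the stride-4 shape are assumptions made for illustration. Whether C2 actually selects a 2-element vector for this exact loop is also an assumption. `Float.floatToFloat16` requires JDK 20 or later.

```java
// Hypothetical illustration; names and loop shape are not taken from the PR.
final class F2HFPairSketch {
    // Marker for slots the loop must never write.
    static final short SENTINEL = (short) 0x1234;

    // Converts elements i and i+1 of every group of four; i+2 and i+3 stay untouched.
    static short[] convertPairs(float[] src) {
        short[] dst = new short[src.length];
        java.util.Arrays.fill(dst, SENTINEL);
        for (int i = 0; i + 1 < src.length; i += 4) {
            dst[i]     = Float.floatToFloat16(src[i]);
            dst[i + 1] = Float.floatToFloat16(src[i + 1]);
        }
        return dst;
    }

    // Returns true if every slot the loop skips still holds the sentinel.
    static boolean untouchedSlotsIntact(short[] dst) {
        for (int i = 0; i < dst.length; i++) {
            if (i % 4 >= 2 && dst[i] != SENTINEL) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        float[] src = {1.0f, 2.0f, 3.0f, 4.0f, 0.5f, -2.0f, 7.0f, 8.0f};
        short[] dst = convertPairs(src);
        System.out.println("untouched slots intact: " + untouchedSlotsIntact(dst));
    }
}
```

Checking the skipped slots as well as the converted ones is the design point here: an over-wide store can leave the converted values correct while still corrupting adjacent memory.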
Best Regards, Sandhya ------------- Commit messages: - 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 Changes: https://git.openjdk.org/jdk/pull/21480/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21480&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338126 Stats: 20 lines in 2 files changed: 19 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21480.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21480/head:pull/21480 PR: https://git.openjdk.org/jdk/pull/21480 From qamai at openjdk.org Sat Oct 12 10:30:51 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 12 Oct 2024 10:30:51 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v4] In-Reply-To: References: Message-ID: > Hi, > > This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. My proposal is that if a node is not reassigned inside a loop, and will be spilt there, we spill it eagerly at the loop entry instead. This can lead to more reload inside the loop, but as the loop-carried dependencies are eliminated, a load is negligible. > > Please take a look and leave your reviews, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:

add LoopAwaredSpilling flag, refine implementation

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/21472/files
- new: https://git.openjdk.org/jdk/pull/21472/files/b6e78eb8..5f572bbb

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=03
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=02-03

Stats: 167 lines in 3 files changed: 97 ins; 6 del; 64 mod
Patch: https://git.openjdk.org/jdk/pull/21472.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21472/head:pull/21472

PR: https://git.openjdk.org/jdk/pull/21472

From qamai at openjdk.org Sat Oct 12 10:52:50 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Sat, 12 Oct 2024 10:52:50 GMT
Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v5]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput.
>
> My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites, this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the other hand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it.
This eliminates loop-carried dependencies. > > A downfall of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt. > > - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive. > - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly. > - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: add comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21472/files - new: https://git.openjdk.org/jdk/pull/21472/files/5f572bbb..74fbc7d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=03-04 Stats: 22 lines in 1 file changed: 19 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21472.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21472/head:pull/21472 PR: https://git.openjdk.org/jdk/pull/21472 From qamai at openjdk.org Sat Oct 12 10:55:10 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 12 Oct 2024 10:55:10 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v4] In-Reply-To: References: Message-ID: On Sat, 12 Oct 2024 10:30:51 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. 
This introduces loop-carried dependencies, which greatly reduces loop throughput.
>>
>> My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites; this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the other hand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies.
>>
>> A drawback of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt.
>>
>> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive.
>> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly.
>> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that.
>>
>> Please take a look and leave your reviews, thanks a lot.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
>   add LoopAwaredSpilling flag, refine implementation

New benchmark results:

                                                         Before                  After
Benchmark                           (prob)  Mode  Cnt       Score      Error        Score     Error  Units
LoopCounterBench.field_ret             N/A  avgt    5     425.678 ±    5.086      419.819 ±   1.965  ns/op
LoopCounterBench.localVar_ret          N/A  avgt    5    1126.937 ±    1.078      325.651 ±   5.309  ns/op
LoopCounterBench.reloadAtEntry_ret     N/A  avgt    5     582.465 ±    2.649      491.421 ±   0.909  ns/op
LoopCounterBench.spillUncommon_ret     0.0  avgt    5     490.901 ±    5.505      490.981 ±   2.118  ns/op
LoopCounterBench.spillUncommon_ret    0.01  avgt    5    2491.557 ±    4.837     1912.170 ±  19.208  ns/op
LoopCounterBench.spillUncommon_ret     0.1  avgt    5   21316.571 ±   88.198    10518.618 ± 183.380  ns/op
LoopCounterBench.spillUncommon_ret     0.2  avgt    5   42095.064 ±  210.995    19908.240 ± 313.108  ns/op
LoopCounterBench.spillUncommon_ret     0.5  avgt    5  113825.492 ± 1637.428    48194.341 ± 719.049  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2408520138

From qamai at openjdk.org  Sun Oct 13 07:03:04 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Sun, 13 Oct 2024 07:03:04 GMT
Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v6]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, which greatly reduces loop throughput.
>
> My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites; this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the other hand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies.
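As a rough picture of the loop shape discussed above, the following Java sketch has a value (`hits`) that is live across the whole loop and a rarely taken branch that contains a call. The class and method names are hypothetical and are not taken from the patch or from `LoopCounterBench`. Whether C2 really spills `hits` around the call depends on inlining, the platform's register set and the profile, so this only illustrates "spilt in the uncommon path, used in the common path" and is not a reproducer.

```java
// Hypothetical sketch; not code from the patch or its benchmarks.
final class LoopSpillShape {
    // Side effect so the call in the uncommon path cannot be removed.
    static long sink;

    // Stands in for any out-of-line call. A real experiment would have to
    // keep this method from being inlined; that is not done here.
    static void uncommonPath(int value) {
        sink += value;
    }

    static int countOdd(int[] data, int threshold) {
        int hits = 0;                       // live across every iteration
        for (int i = 0; i < data.length; i++) {
            if (data[i] > threshold) {      // assumed to be rarely taken
                uncommonPath(data[i]);      // values live across the call may be spilt
            }
            hits += data[i] & 1;            // the common path uses the live range
        }
        return hits;
    }
}
```

For example, `LoopSpillShape.countOdd(new int[] {1, 2, 3, 10}, 5)` counts the two odd elements and takes the uncommon path once, for the value 10.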
> > A downfall of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt. > > - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive. > - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly. > - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: refine comments + typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21472/files - new: https://git.openjdk.org/jdk/pull/21472/files/74fbc7d5..12d1a2b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=04-05 Stats: 10 lines in 2 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/21472.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21472/head:pull/21472 PR: https://git.openjdk.org/jdk/pull/21472 From shade at openjdk.org Sun Oct 13 08:04:15 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Sun, 13 Oct 2024 08:04:15 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v6] In-Reply-To: References: Message-ID: On Sun, 13 Oct 2024 07:03:04 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. 
This introduces loop-carried dependencies, which greatly reduces loop throughput.
>>
>> My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites; this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the other hand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies.
>>
>> A drawback of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt.
>>
>> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive.
>> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly.
>> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that.
>>
>> Please take a look and leave your reviews, thanks a lot.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
>   refine comments + typo

This was really found by @rschwietzke, maybe he would like to test it :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2408872608

From jbhateja at openjdk.org  Sun Oct 13 09:57:00 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sun, 13 Oct 2024 09:57:00 GMT
Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v24]
In-Reply-To:
References:
Message-ID:

> Hi All,
>
> As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new vector operators.
>
>     . SUADD : Saturating unsigned addition.
>     . SADD  : Saturating signed addition.
>     . SUSUB : Saturating unsigned subtraction.
>     . SSUB  : Saturating signed subtraction.
>     . UMAX  : Unsigned max.
>     . UMIN  : Unsigned min.
>
> The new vector operators are applicable only to integral types, since their values wrap around in overflowing and underflowing scenarios after setting appropriate status flags. For floating point types, as per the IEEE 754 specs there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value.
>
> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios.
>
> Summary of changes:
> - Java side implementation of new vector operators.
> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them.
> - C2 compiler IR and inline expander changes.
> - Optimized x86 backend implementation for new vector operators and their predicated counterparts.
> - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Update adlc changes. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/ce76c3e5..506ae299 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=22-23 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Sun Oct 13 11:18:01 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 13 Oct 2024 11:18:01 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. 
The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values held in the index vector lanes must lie within the valid two-vector index range [0, 2*VLEN); otherwise an IndexOutOfBoundsException is thrown.
>
> Summary of changes:
> - Java side implementation of new selectFrom API.
> - C2 compiler IR and inline expander changes.
> - In absence of a direct two-vector permutation instruction in the target ISA, a lowering transformation dismantles the new IR into constituent IR supported by target platforms.
> - Optimized x86 backend implementation for AVX512 and legacy targets.
> - Functional tests covering the new API.
>
> The JMH micro included with this patch shows around 10-15x gain over the existing rearrange API:
> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>
> Benchmark                                     (size)   Mode  Cnt      Score   Error   Units
> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762          ops/ms
> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550          ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605          ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004          ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758          ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192          ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459          ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556          ops/ms
> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830          ops/ms
> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174          ops/ms
> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804          ops/ms
> SelectFromBenchmark.selectFromIntVector         2048  thrpt    2   5398.2...
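The two-vector `selectFrom` semantics described in the message above can be modelled on plain arrays. This is only a scalar sketch of the behaviour as stated in this message (indexes in [0, VLEN) pick from `v1`, indexes in [VLEN, 2*VLEN) pick from `v2`, anything else throws); it does not use the Vector API itself, the class and method names are made up for illustration, and the behaviour of the API as finally integrated may differ.

```java
// Scalar model of the two-vector selectFrom described above.
// Hypothetical helper; not part of jdk.incubator.vector.
final class SelectFromSketch {
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = indexes.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            int idx = indexes[lane];
            // Valid two-vector index range is [0, 2*VLEN).
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException(
                        "index " + idx + " outside [0, " + (2 * vlen) + ")");
            }
            // The two inputs act as one table of 2*VLEN entries.
            result[lane] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }
}
```

With indexes `{0, 5, 2, 7}`, `v1 = {10, 11, 12, 13}` and `v2 = {20, 21, 22, 23}`, lanes 0 and 2 read from `v1` and lanes 1 and 3 read from `v2`.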
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Updating tests to use floorMod ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/1cca8e24..79ee29c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=15-16 Stats: 31 lines in 31 files changed: 0 ins; 0 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Sun Oct 13 17:12:11 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 13 Oct 2024 17:12:11 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v4] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 18:52:30 GMT, Smita Kamath wrote: >> Hi, I want to submit an optimization for SHA-512 algorithm using SHA instructions (sha512msg1, sha512msg2 and sha512rnds2) . Kindly review the code and provide feedback. Thank you. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated code as per review comments Marked as reviewed by jbhateja (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20633#pullrequestreview-2364957828 From jbhateja at openjdk.org Sun Oct 13 17:12:12 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 13 Oct 2024 17:12:12 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v3] In-Reply-To: <0EVgFAWZ9O9e_dlRj4Na8Xf7QEC6FGU03wAs1bwVq5c=.ff19b62d-3b2c-4320-8be2-c7dee7cafdae@github.com> References: <0EVgFAWZ9O9e_dlRj4Na8Xf7QEC6FGU03wAs1bwVq5c=.ff19b62d-3b2c-4320-8be2-c7dee7cafdae@github.com> Message-ID: On Thu, 10 Oct 2024 23:13:11 GMT, Srinivas Vamsi Parasa wrote: >> We can do this change in a separate PR. > > I agree with Smita. 
The current implementation has a one-to-one correspondence with the ipsec implementation. Any new changes or refactoring could be implemented as a separate PR.

I agree. In principle, any optimization crafted for AVX2 is also applicable to AVX512 targets. In the future, with AVX10.2 (converged ISA), we will have 256-bit flavors of the two-table permute for non-AVX512 targets. For now, AVX-SHA512 is only available on client parts (upcoming Lunar Lake), so it's OK to follow the IPsec algorithm in toto.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20633#discussion_r1798463742

From thartmann at openjdk.org  Mon Oct 14 05:30:23 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 14 Oct 2024 05:30:23 GMT
Subject: RFR: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 11 Oct 2024 14:23:48 GMT, Tobias Hartmann wrote:

>> Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses.
>>
>> Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already:
>> https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965
>>
>> Thanks,
>> Tobias
>
> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update test/hotspot/jtreg/compiler/c2/TestCallDevirtualizationWithInfiniteLoop.java
>
>   Co-authored-by: Christian Hagedorn

Thanks for the reviews, Vladimir and Vladimir!
------------- PR Comment: https://git.openjdk.org/jdk/pull/21450#issuecomment-2409974472 From thartmann at openjdk.org Mon Oct 14 05:30:24 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 14 Oct 2024 05:30:24 GMT Subject: Integrated: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 13:29:11 GMT, Tobias Hartmann wrote: > Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses. > > Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already: > https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965 > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 8d0975a2
Author:    Tobias Hartmann
URL:       https://git.openjdk.org/jdk/commit/8d0975a27d826f7aa487a612131827586abaefd5
Stats:     85 lines in 4 files changed: 79 ins; 0 del; 6 mod

8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found

Reviewed-by: chagedorn, kvn, vlivanov

-------------

PR: https://git.openjdk.org/jdk/pull/21450

From duke at openjdk.org  Mon Oct 14 06:41:12 2024
From: duke at openjdk.org (Rene Schwietzke)
Date: Mon, 14 Oct 2024 06:41:12 GMT
Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v4]
In-Reply-To:
References:
Message-ID:

On Sat, 12 Oct 2024 10:52:11 GMT, Quan Anh Mai wrote:

>> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   add LoopAwaredSpilling flag, refine implementation
>
> New benchmark results:
>
>                                                          Before                  After
> Benchmark                           (prob)  Mode  Cnt       Score      Error        Score     Error  Units
> LoopCounterBench.field_ret             N/A  avgt    5     425.678 ±    5.086      419.819 ±   1.965  ns/op
> LoopCounterBench.localVar_ret          N/A  avgt    5    1126.937 ±    1.078      325.651 ±   5.309  ns/op
> LoopCounterBench.reloadAtEntry_ret     N/A  avgt    5     582.465 ±    2.649      491.421 ±   0.909  ns/op
> LoopCounterBench.spillUncommon_ret     0.0  avgt    5     490.901 ±    5.505      490.981 ±   2.118  ns/op
> LoopCounterBench.spillUncommon_ret    0.01  avgt    5    2491.557 ±    4.837     1912.170 ±  19.208  ns/op
> LoopCounterBench.spillUncommon_ret     0.1  avgt    5   21316.571 ±   88.198    10518.618 ± 183.380  ns/op
> LoopCounterBench.spillUncommon_ret     0.2  avgt    5   42095.064 ±  210.995    19908.240 ± 313.108  ns/op
> LoopCounterBench.spillUncommon_ret     0.5  avgt    5  113825.492 ± 1637.428    48194.341 ± 719.049  ns/op

Sure thing, I will give it a try in the coming days. @merykitty's results look promising.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2409100311

From epeter at openjdk.org  Mon Oct 14 07:50:37 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 14 Oct 2024 07:50:37 GMT
Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v22]
In-Reply-To:
References:
Message-ID:

> **Motivation**
>
> I want to write small dedicated fuzzers:
> - Generate `java` and `jasm` source code: just some `String`.
> - Quickly compile it (with this framework).
> - Execute the compiled code.
>
> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**.
>
> **The CompileFramework**
>
> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`.
> An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests.
>
> I implemented a first, simple version of the framework. I added some tests and examples.
>
> **Example**
>
>     CompileFramework comp = new CompileFramework();
>     comp.add(SourceCode.newJavaSourceCode("XYZ", ""));
>     comp.compile();
>     comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5);
>
> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74
>
> **Below are some use cases: tests that would have been better with the CompileFramework**
>
> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java**
>
> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`.
>
> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`,
>
> to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW...
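The `MethodHandles.constant` technique mentioned in the use case above can be sketched roughly as follows. This is a minimal illustration and not code from `TestAlignVectorFuzzer`: the class name, the field names and the stride range are assumptions. The idea is that a value is picked once per JVM run and held in a `static final` `MethodHandle`, which the JIT can typically treat as a compile-time constant when it compiles the loop.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.Random;

// Hypothetical sketch of the technique; names and ranges are illustrative.
final class ConstantHandleSketch {
    // Chosen once per JVM run, before sumWithStride is compiled.
    static final int STRIDE_VALUE = 1 + new Random().nextInt(4);

    // A static final handle that always returns STRIDE_VALUE.
    private static final MethodHandle STRIDE =
            MethodHandles.constant(int.class, STRIDE_VALUE);

    static int sumWithStride(int[] data) {
        final int stride;
        try {
            // The handle has type ()int, so the exact invocation returns int.
            stride = (int) STRIDE.invokeExact();
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
        int sum = 0;
        for (int i = 0; i < data.length; i += stride) {
            sum += data[i];
        }
        return sum;
    }
}
```

Every run exercises one randomly chosen stride. As the message notes, the drawback is that a failure cannot easily be turned into a standalone reproducer, because the constant does not appear in the source; generating the source text with the value written as a literal avoids that.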
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Add example where I use the framework with VM flags ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/ad3865bb..2d4a8ff0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=20-21 Stats: 132 lines in 2 files changed: 132 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Mon Oct 14 08:36:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 14 Oct 2024 08:36:12 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v23] In-Reply-To: References: Message-ID: > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. 
At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**.
>
> **The CompileFramework**
>
> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`.
> An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests.
>
> I implemented a first, simple version of the framework. I added some tests and examples.
>
> **Example**
>
>     CompileFramework comp = new CompileFramework();
>     comp.add(SourceCode.newJavaSourceCode("XYZ", ""));
>     comp.compile();
>     comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5);
>
> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74
>
> **Below are some use cases: tests that would have been better with the CompileFramework**
>
> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java**
>
> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`.
>
> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`,
>
> to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW...

Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 76 additional commits since the last revision: - Merge branch 'master' into fuzzer-test - test refactoring - Add example where I use the framework with VM flags - Apply suggestions from code review Co-authored-by: Christian Hagedorn - move some code for Christian - more for Christian - Apply suggestions from code review Co-authored-by: Christian Hagedorn - another small suggestion from Christian - more fixup for Christian - Apply suggestions from code review Co-authored-by: Christian Hagedorn - ... and 66 more: https://git.openjdk.org/jdk/compare/3e81a0a4...5178e7c2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/2d4a8ff0..5178e7c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=21-22 Stats: 211263 lines in 1584 files changed: 195772 ins; 7833 del; 7658 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From thartmann at openjdk.org Mon Oct 14 08:54:28 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 14 Oct 2024 08:54:28 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type Message-ID: After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`: https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723 Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` because the type is not a primitive array type: https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971 -> 
https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded: https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979 I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded. 
Thanks,
Tobias

-------------

Commit messages:
 - First prototype

Changes: https://git.openjdk.org/jdk/pull/21470/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21470&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8339694
  Stats: 111 lines in 4 files changed: 108 ins; 2 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/21470.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21470/head:pull/21470

PR: https://git.openjdk.org/jdk/pull/21470

From duke at openjdk.org  Mon Oct 14 08:59:18 2024
From: duke at openjdk.org (Rene Schwietzke)
Date: Mon, 14 Oct 2024 08:59:18 GMT
Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v6]
In-Reply-To:
References:
Message-ID: <3rG5xHJ4ikWxAeo6XP_XCbrUCuRa9M8KDUpU7L1iEOU=.fb54b7ad-056a-4f06-b034-1d3123a44db7@github.com>

On Sun, 13 Oct 2024 07:03:04 GMT, Quan Anh Mai wrote:

>> Hi,
>>
>> This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, which greatly reduces loop throughput.
>>
>> My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites; this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the other hand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies.
>>
>> A drawback of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt.
>>
>> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive.
>> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly.
>> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that.
>>
>> Please take a look and leave your reviews, thanks a lot.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
>   refine comments + typo

Fix confirmed. Performance matches the user expectation when pulling data local. I will look into the runtime difference for the plain loop and `systemCopy`.

### Old - JDK 23.0.0

Benchmark                                   (SIZE)  Mode  Cnt     Score     Error  Units
Example8ArrayCopying.manualCopy1              1000  avgt   10    70.222 ±   3.549  ns/op
Example8ArrayCopying.manualCopy2              1000  avgt   10    70.011 ±   0.880  ns/op
Example8ArrayCopying.manualCopyAntiUnroll1    1000  avgt   10   394.275 ±  20.067  ns/op
Example8ArrayCopying.manualCopyAntiUnroll2    1000  avgt   10   636.158 ± 101.505  ns/op
Example8ArrayCopying.manualCopyAntiUnroll3    1000  avgt   10  1646.330 ±  23.042  ns/op
Example8ArrayCopying.systemCopy               1000  avgt   10    74.845 ±   1.535  ns/op

### New - JDK 24-internal (merrykitty/improveregalloc, 12d1a2b21fc62145dac04fecf43f267f539b2aa5)

Example8ArrayCopying.manualCopy1              1000  avgt   10    80.155 ±   4.504  ns/op
Example8ArrayCopying.manualCopy2              1000  avgt   10    81.122 ±   3.074  ns/op
Example8ArrayCopying.manualCopyAntiUnroll1    1000  avgt   10   394.094 ±   6.809  ns/op
Example8ArrayCopying.manualCopyAntiUnroll2    1000  avgt   10   626.155 ±  13.055  ns/op
Example8ArrayCopying.manualCopyAntiUnroll3    1000  avgt   10   564.199 ±  23.854  ns/op
Example8ArrayCopying.systemCopy               1000  avgt   10    99.393 ±   0.634  ns/op

Source code for reference: https://github.com/Xceptance/jmh-training/blob/1dbcc9c38553b0e8b683c6f70475a25150b66635/src/main/java/org/xc/jmh/Example8ArrayCopying.java

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2410501449

From amitkumar at openjdk.org  Mon Oct 14 09:03:10 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 14 Oct 2024 09:03:10 GMT
Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double
In-Reply-To:
References:
Message-ID: <1pH2MHT0z7llvLP9DFnu1H9V1YKdEHOfDvksC1nEhVk=.800bafbe-598d-47e7-b047-ad4cab5d73e5@github.com>

On Fri, 4 Oct 2024 10:39:25 GMT, Amit Kumar wrote:

> This is a trivial PR to change the data type of some "*Threshold" variables from `intx` to `double`.

Hi, can I get reviews for this trivial change?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21354#issuecomment-2410511223

From thartmann at openjdk.org  Mon Oct 14 09:31:33 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 14 Oct 2024 09:31:33 GMT
Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v2]
In-Reply-To:
References:
Message-ID:

> After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`:
>
> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723
>
> Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` because the type is not a primitive array type:
>
> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971
>
> ->
>
>
https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 > > The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. > > Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded: > https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979 > > I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded. 
> > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Missed a return ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21470/files - new: https://git.openjdk.org/jdk/pull/21470/files/94259abe..0e9e0219 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21470&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21470&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21470.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21470/head:pull/21470 PR: https://git.openjdk.org/jdk/pull/21470 From thartmann at openjdk.org Mon Oct 14 10:46:50 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 14 Oct 2024 10:46:50 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v3] In-Reply-To: References: Message-ID: > After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`: > > https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723 > > Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` because the type is not a primitive array type: > > https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971 > > -> > https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 > > The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. 
> > Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded: > https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979 > > I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Modified ciTypeFlow::can_trap ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21470/files - new: https://git.openjdk.org/jdk/pull/21470/files/0e9e0219..4a48a793 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21470&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21470&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21470.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21470/head:pull/21470 PR: https://git.openjdk.org/jdk/pull/21470 From chagedorn at openjdk.org Mon Oct 14 11:06:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 14 Oct 2024 11:06:30 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v23] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 08:36:12 GMT, Emanuel Peter wrote: >> **Motivation** >> >> I want to write small dedicated 
fuzzers:
>> - Generate `java` and `jasm` source code: just some `String`.
>> - Quickly compile it (with this framework).
>> - Execute the compiled code.
>>
>> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**.
>>
>> **The CompileFramework**
>>
>> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`.
>> An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests.
>>
>> I implemented a first, simple version of the framework. I added some tests and examples.
>>
>> **Example**
>>
>>
>> CompileFramework comp = new CompileFramework();
>> comp.add(SourceCode.newJavaSourceCode("XYZ", ""));
>> comp.compile();
>> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5);
>>
>>
>> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74
>>
>> **Below some use cases: tests that would have been better with the CompileFramework**
>>
>> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java**
>>
>> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`.
>>
>> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`,
>>
>> to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something i...
>
> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 76 additional commits since the last revision:
>
>  - Merge branch 'master' into fuzzer-test
>  - test refactoring
>  - Add example where I use the framework with VM flags
>  - Apply suggestions from code review
>
>    Co-authored-by: Christian Hagedorn
>  - move some code for Christian
>  - more for Christian
>  - Apply suggestions from code review
>
>    Co-authored-by: Christian Hagedorn
>  - another small suggestion from Christian
>  - more fixup for Christian
>  - Apply suggestions from code review
>
>    Co-authored-by: Christian Hagedorn
>  - ... and 66 more: https://git.openjdk.org/jdk/compare/bcd1673b...5178e7c2

Nice that you added an additional test. Still looks good.
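
For readers without the jtreg test library at hand, the generate/compile/invoke flow from the quoted description can be approximated with the standard `javax.tools` API. This is only a rough stand-alone sketch, not the CompileFramework itself: the class name `MiniCompileSketch`, the helper `compileAndInvoke`, and the generated `XYZ` source are hypothetical, and the sketch assumes it runs on a full JDK (where a system Java compiler is available), with no jasm support and no IR Framework integration.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class MiniCompileSketch {
    // Hypothetical helper: compile one generated class held in a String,
    // load it, and invoke a public static method on it by name.
    public static Object compileAndInvoke(String className, String source,
                                          String methodName, Object... args) throws Exception {
        // Write the generated source into a fresh temporary directory.
        Path dir = Files.createTempDirectory("mini-compile");
        Path file = dir.resolve(className + ".java");
        Files.writeString(file, source);

        // The system compiler is null when running on a runtime without javac.
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no system Java compiler available");
        }
        int status = javac.run(null, null, null, "-d", dir.toString(), file.toString());
        if (status != 0) {
            throw new IllegalStateException("compilation failed with status " + status);
        }

        // Load the freshly compiled class from the output directory and call the method.
        try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
            Class<?> clazz = loader.loadClass(className);
            for (Method m : clazz.getDeclaredMethods()) {
                if (m.getName().equals(methodName)) {
                    return m.invoke(null, args);
                }
            }
            throw new NoSuchMethodException(methodName);
        }
    }

    public static void main(String[] args) throws Exception {
        // The "generated" test: in a fuzzer this String would be produced by a script.
        String src = "public class XYZ { public static int test(int i) { return i * 2; } }";
        System.out.println(compileAndInvoke("XYZ", src, "test", 5));
    }
}
```

A fuzzer built this way would vary the source String on every run; the framework discussed in this thread adds the pieces the sketch leaves out, such as jasm sources and the class path handling needed when further test VMs are launched.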
test/hotspot/jtreg/compiler/lib/compile_framework/README.md line 51: > 49: Should one require the modified classpath that includes the compiled classes, this is available with `compileFramework.getEscapedClassPathOfCompiledClasses()`. This can be necessary if the test launches any other VMs that also access the compiled classes. This is for example necessary when using the IR Framework. > 50: > 51: ### Running the compiled code in a new VM Following the capital letter style from the other titles: Suggestion: ### Running the Compiled Code in a New VM test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/RunWithFlagsExample.java line 66: > 64: CompileFramework comp = new CompileFramework(); > 65: > 66: // Add a java source file. Suggestion: // Add a Java source file. test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/RunWithFlagsExample.java line 78: > 76: comp.getEscapedClassPathOfCompiledClasses(), > 77: // Pass additional flags here. > 78: // "-Xbatch" is a harmless VM flag, so this example runs everywhere without issue. Suggestion: // "-Xbatch" is a harmless VM flag, so this example runs everywhere without issues. test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/RunWithFlagsExample.java line 87: > 85: > 86: // Execute the command, and capture the output. > 87: // The JTREG VM options are automatically passed to the test VM. Suggestion: // The JTREG Java and VM options are automatically passed to the test VM. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20184#pullrequestreview-2366239197
PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1799268346
PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1799272237
PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1799272889
PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1799274775

From duke at openjdk.org  Mon Oct 14 11:25:20 2024
From: duke at openjdk.org (Antón Seoane)
Date: Mon, 14 Oct 2024 11:25:20 GMT
Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v6]
In-Reply-To: 
References: 
Message-ID: 

On Wed, 9 Oct 2024 09:13:37 GMT, Antón Seoane wrote:

>> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient.
>>
>> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level.
>>
>> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter.
>>
>> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket.
>
> Antón Seoane has updated the pull request incrementally with one additional commit since the last revision:
>
>   Replace LogDecorators::Decorator::NoDecorator with LogDecorators::None

Right now, no space is being added at the beginning. I agree that it should be important to at least distinguish decorators from the message itself and I hope to address that, alongside multiline/grouping, in a future PR (adding a space between them seems an easy and sensible choice IMO, and in the case for no decorators we would just have a starting space which does not affect human readability).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21383#issuecomment-2410918753

From jbhateja at openjdk.org  Mon Oct 14 12:15:11 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 14 Oct 2024 12:15:11 GMT
Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2]
In-Reply-To: 
References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com>
Message-ID: 

On Fri, 11 Oct 2024 17:12:49 GMT, Jasmine Karthikeyan wrote:

> > I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering`
>
> I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for better performance with bit masks, but ran into a problem where it doesn't have an immediate encoding so we'd need to manifest a constant into a temporary register every time. With an (x86-specific) ideal node, we could simply let the register allocator handle placing the constant. It would also be nice to avoid needing to put similar backend-specific lowerings (such as `MacroLogicV`) in shared code.

Hey @jaskarth , @merykitty , we already have an infrastructure where during parsing we create Macro Nodes which can be lowered / expanded to multiple IRs nodes during macro expansion, what we need in this case is a target specific IR pattern check since not all targets may support 32x32 multiplication with quadword saturation, idea is to avoid creating a new IR and piggyback needed on existing MulVL IR, we already use such tricks for relaxed unsafe reductions. Patch is performing point optimization for specific set of constrained multiplication patterns. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411053693 From thartmann at openjdk.org Mon Oct 14 12:18:11 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 14 Oct 2024 12:18:11 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 23:27:35 GMT, Sandhya Viswanathan wrote: > When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. > Also a regression test case is added accordingly. > > Best Regards, > Sandhya That looks good to me. @eme64 should have a look as well. I submitted testing and will report back once it passed. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21480#pullrequestreview-2366448884 From epeter at openjdk.org Mon Oct 14 12:21:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 14 Oct 2024 12:21:16 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 23:27:35 GMT, Sandhya Viswanathan wrote: > When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. > Also a regression test case is added accordingly. > > Best Regards, > Sandhya test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java line 76: > 74: sout[i+1] = Float.floatToFloat16(finp[i+1]); > 75: } > 76: } Your test looks different than the one that I added on JIRA. Can you please add that one as well? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1799399301 From epeter at openjdk.org Mon Oct 14 12:26:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 14 Oct 2024 12:26:14 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 23:27:35 GMT, Sandhya Viswanathan wrote: > When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. > Also a regression test case is added accordingly. 
>
> Best Regards,
> Sandhya

src/hotspot/cpu/x86/x86.ad line 3679:

> 3677:
> 3678: instruct vconvF2HF_mem_reg(memory mem, vec src) %{
> 3679:   predicate(Matcher::vector_length_in_bytes(n->in(3)->in(1)) >= 16);

Ok, and so what alternative path is the matcher now going to take if we have `vector_length = 8`?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1799405906

From epeter at openjdk.org  Mon Oct 14 12:33:44 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 14 Oct 2024 12:33:44 GMT
Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v24]
In-Reply-To: 
References: 
Message-ID: 

> **Motivation**
>
> I want to write small dedicated fuzzers:
> - Generate `java` and `jasm` source code: just some `String`.
> - Quickly compile it (with this framework).
> - Execute the compiled code.
>
> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**.
>
> **The CompileFramework**
>
> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`.
> An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests.
>
> I implemented a first, simple version of the framework. I added some tests and examples.
>
> **Example**
>
>
> CompileFramework comp = new CompileFramework();
> comp.add(SourceCode.newJavaSourceCode("XYZ", ""));
> comp.compile();
> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5);
>
>
> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74
>
> **Below some use cases: tests that would have been better with the CompileFramework**
>
> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java**
>
> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`.
>
> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`,
>
> to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW...

Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Update test/hotspot/jtreg/compiler/lib/compile_framework/README.md Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/5178e7c2..4eeab363 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=22-23 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From qamai at openjdk.org Mon Oct 14 13:45:23 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 14 Oct 2024 13:45:23 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v6] In-Reply-To: References: Message-ID: On Sun, 13 Oct 2024 07:03:04 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. >> >> My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites, this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. 
On the other hand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies.
>>
>> A downfall of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt.
>>
>> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive.
>> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly.
>> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that.
>>
>> Please take a look and leave your reviews, thanks a lot.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
>   refine comments + typo

Thanks for the source code. That's really interesting, running the benchmark multiple times may give different results, and even when there is a difference in the observed throughputs, the 2 compiled methods are exactly the same. So I think we are running into different quirks here, probably due to the fact that this benchmark saturates the memory bandwidth.

------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2411312474 From qamai at openjdk.org Mon Oct 14 14:14:13 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 14 Oct 2024 14:14:13 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: <2g_Hm5UuVBqoklekkaxtnYn05JYKmosnzaMefQi_q3s=.aea039bb-d80c-4863-986b-d73d7cf71fcc@github.com> On Mon, 14 Oct 2024 12:12:58 GMT, Jatin Bhateja wrote: >>> I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering` >> >> I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for better performance with bit masks, but ran into a problem where it doesn't have an immediate encoding so we'd need to manifest a constant into a temporary register every time. With an (x86-specific) ideal node, we could simply let the register allocator handle placing the constant. It would also be nice to avoid needing to put similar backend-specific lowerings (such as `MacroLogicV`) in shared code. > >> > I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering` >> >> I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for better performance with bit masks, but ran into a problem where it doesn't have an immediate encoding so we'd need to manifest a constant into a temporary register every time. With an (x86-specific) ideal node, we could simply let the register allocator handle placing the constant. It would also be nice to avoid needing to put similar backend-specific lowerings (such as `MacroLogicV`) in shared code. 
> > Hey @jaskarth , @merykitty , we already have an infrastructure where during parsing we create Macro Nodes which can be lowered / expanded to multiple IRs nodes during macro expansion, what we need in this case is a target specific IR pattern check since not all targets may support 32x32 multiplication with quadword saturation, idea is to avoid creating a new IR and piggyback needed information on existing MulVL IR, we already use such tricks for relaxed unsafe reductions. Going forward, infusion of KnownBits into our data flow analysis infrastructure will streamline such optimizations, this patch is performing point optimization for specific set of constrained multiplication patterns. @jatin-bhateja That is machine-independent lowering, we are talking about machine-dependent lowering to which `MacroLogicV` transformation belongs. You can have `phaselowering_x86` and not have to add another method to `Matcher` as well as add default implementations to various architecture files. You can reuse `MulVL` node for that but I believe these transformations should be done as late as possible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411389030 From qamai at openjdk.org Mon Oct 14 14:17:09 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 14 Oct 2024 14:17:09 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v7] In-Reply-To: References: Message-ID: > Hi, > > This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. > > My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. 
In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites, this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the other hand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies.
>
> A downfall of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt.
>
> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive.
> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly.
> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that.
>
> Please take a look and leave your reviews, thanks a lot.

Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:

  fix uncommon_freq

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21472/files
  - new: https://git.openjdk.org/jdk/pull/21472/files/12d1a2b2..1d36cb4c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=05-06

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/21472.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21472/head:pull/21472

PR: https://git.openjdk.org/jdk/pull/21472

From jkarthikeyan at openjdk.org  Mon Oct 14 15:07:21 2024
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Mon, 14 Oct 2024 15:07:21 GMT
Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2]
In-Reply-To: 
References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com>
Message-ID: 

On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote:

>> This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR patterns.
>>
>>
>> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF)
>> MulL (URShift SRC1 , 32) (URShift SRC2, 32)
>> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF)
>> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32)
>>
>>
>>
>> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors.
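
The partial-product decomposition in the quoted paragraph, and what the four listed patterns buy, can be checked with scalar `long` arithmetic. The following is a small illustrative sketch only: the class and method names are hypothetical, it models a single 64-bit lane rather than a vector, and it says nothing about how C2 or the x86 backend actually implement the match.

```java
public class MulPartialProducts {
    static final long LOW32 = 0xFFFFFFFFL;

    // Low 64 bits of a 64x64 multiply, assembled from 32-bit partial products.
    public static long mulViaPartialProducts(long a, long b) {
        long aLo = a & LOW32, aHi = a >>> 32;
        long bLo = b & LOW32, bHi = b >>> 32;
        long lowProduct = aLo * bLo;        // unsigned 32x32 -> 64 product
        long cross = aHi * bLo + aLo * bHi; // only its low 32 bits survive the shift
        // aHi * bHi would be shifted left by 64 and so never reaches the low quadword.
        return lowProduct + (cross << 32);
    }

    // When both upper halves are known to be zero, the cross terms vanish and
    // one unsigned 32x32 -> 64 multiply of the low halves gives the full result.
    public static long mulLowHalvesOnly(long a, long b) {
        return (a & LOW32) * (b & LOW32);
    }

    public static void main(String[] args) {
        long a = 0x1234_5678_9ABC_DEF0L;
        long b = 0x0FED_CBA9_8765_4321L;
        // General case: the decomposition agrees with the ordinary long multiply.
        System.out.println(mulViaPartialProducts(a, b) == a * b);

        // Operand shapes from the quoted patterns: And with 0xFFFFFFFF, URShift by 32.
        long x = a & LOW32;
        long y = b >>> 32;
        System.out.println(mulLowHalvesOnly(x, y) == x * y);
    }
}
```

Both operand shapes in the patterns (`And` with `0xFFFFFFFF` and `URShift` by 32) leave the upper 32 bits zero, which is why the cross terms can be dropped in that case.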
>>
>> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node.
>>
>> VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons.
>>
>> Please find below the performance of [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:-
>>
>>
>> Sierra Forest :-
>> ============
>> Baseline:-
>> Benchmark                                 (SIZE)   Mode  Cnt    Score   Error   Units
>> VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2  806.228          ops/ms
>> VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2  403.044          ops/ms
>> VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2  200.641          ops/ms
>> VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2  100.664          ops/ms
>>
>> With Optimization:-
>> Benchmark                                 (SIZE)   Mode  Cnt    Score   Error   Units
>> VectorXXH3HashingBenchmark.hashingKernel ...
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137
>  - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction

For the record I think in this PR we could simply match the IR patterns in the ad file, since (from my understanding) the patterns we are matching could be supported there.
We should do platform-specific lowering in a separate patch because it is pretty nuanced, and we could potentially move it to the new system afterwards. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411538179 From psandoz at openjdk.org Mon Oct 14 15:37:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 14 Oct 2024 15:37:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Sun, 13 Oct 2024 11:18:01 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
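For readers following the two-vector selectFrom semantics quoted above, a minimal scalar sketch may help. It assumes plain `int[]` arrays in place of Vector API types, and the class and method names (`TwoVectorSelectSketch`, `selectFrom`) are hypothetical and not part of the patch. It follows the range check as stated in the quoted description; the API as finally integrated may treat out-of-range indices differently.

```java
import java.util.Arrays;

// Hypothetical scalar model of the two-vector selectFrom described above.
// Not the Vector API implementation; arrays stand in for vectors.
final class TwoVectorSelectSketch {

    // Lane i of the result is read from the 2*VLEN-entry table formed by
    // v1 followed by v2, at position index[i].
    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = index[i];
            // Range check as stated in the quoted description: [0, 2*VLEN).
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException(
                    "index " + idx + " outside [0, " + (2 * vlen) + ")");
            }
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] index = {0, 5, 3, 6};
        // prints [10, 21, 13, 22]
        System.out.println(Arrays.toString(selectFrom(index, v1, v2)));
    }
}
```

Indices below VLEN pick from the first table vector and the rest pick from the second, which is the behaviour the description equates with `v1.rearrange(this.toShuffle(), v2)`.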
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Updating tests to use floorMod Marked as reviewed by psandoz (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2367019017 From iveresov at openjdk.org Mon Oct 14 16:48:27 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Mon, 14 Oct 2024 16:48:27 GMT Subject: Integrated: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" In-Reply-To: References: Message-ID: <1Oi70hebODn90MIbP6HFaaHObA0zX57DCgsh-4LnJK8=.400ceef1-812d-4cea-8f75-50f3d36a210c@github.com> On Thu, 10 Oct 2024 15:22:20 GMT, Igor Veresov wrote: > `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. This pull request has now been integrated. 
Changeset: a8a8b2de Author: Igor Veresov URL: https://git.openjdk.org/jdk/commit/a8a8b2deba854ac105ed760c09e65701c4d0f6fc Stats: 13 lines in 2 files changed: 10 ins; 2 del; 1 mod 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" Reviewed-by: dlong, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21455 From aboldtch at openjdk.org Mon Oct 14 17:36:13 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 14 Oct 2024 17:36:13 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v6] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 09:13:37 GMT, Ant?n Seoane wrote: >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. >> >> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. 
>> >> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. > > Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: > > Replace LogDecorators::Decorator::NoDecorator with LogDecorators::None I have a few smaller nits, and one larger issue. The rest of the implementation and logic looks fine. src/hotspot/share/logging/logDecorators.cpp line 30: > 28: const LogLevelType AnyLevel = LogLevelType::NotMentioned; > 29: #define UNDECORATED_DEFAULTS \ > 30: UNDECORATED_DEFAULT(AnyLevel, LOG_TAGS(jit, inlining)) Maybe move this down next to where it is used and then `#undef UNDECORATED_DEFAULTS` src/hotspot/share/logging/logDecorators.cpp line 55: > 53: #define UNDECORATED_DEFAULT(level, ...) LogDecorators::DefaultUndecoratedSelection(level, __VA_ARGS__), > 54: UNDECORATED_DEFAULTS > 55: #undef UNDECORATED_TAGSET Suggestion: #undef UNDECORATED_DEFAULT Typo, I think this was meant to match with the `#define`. src/hotspot/share/logging/logDecorators.cpp line 57: > 55: #undef UNDECORATED_TAGSET > 56: }; > 57: const size_t LogDecorators::number_of_default_decorators = sizeof(default_decorators) / sizeof(LogDecorators::DefaultUndecoratedSelection); I think this reads better and is less error-prone. Suggestion: const size_t LogDecorators::number_of_default_decorators = ARRAY_SIZE(default_decorators); src/hotspot/share/logging/logDecorators.hpp line 142: > 140: // Check if we have some default decorators for a given LogSelection.
If that is the case, > 141: // the output parameter mask will contain the defaults-specified decorators mask > 142: static bool has_disabled_default_decorators(const LogSelection& selection, const DefaultUndecoratedSelection* defaults = default_decorators, size_t defaults_count = number_of_default_decorators); I was trying to think if we could make this mockable without the incomplete object types (`const DefaultUndecoratedSelection* defaults = default_decorators, size_t defaults_count = number_of_default_decorators`). Maybe have the mockable part private (we already friend the gtest) and only have a `static bool has_disabled_default_decorators(const LogSelection& selection)` public (which calls this on the inside). But I am fine with this as it currently is. ------------- Changes requested by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21383#pullrequestreview-2367213509 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1799860489 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1799862505 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1799845556 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1799869065 From aboldtch at openjdk.org Mon Oct 14 17:39:12 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 14 Oct 2024 17:39:12 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v6] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 09:13:37 GMT, Ant?n Seoane wrote: >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. 
>> >> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. >> >> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. >> >> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. > > Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: > > Replace LogDecorators::Decorator::NoDecorator with LogDecorators::None Seems like one of my comments (the large one) got lost. Trying this again. :) src/hotspot/share/logging/logDecorators.hpp line 96: > 94: > 95: const LogSelection& selection() const { return _selection; } > 96: }; I am uncomfortable with this type erasure. `LogTagType[LogTag::MaxTags + 1 /* = 6 */]` -> `LogTagType*` -> `LogTagType[LogTag::MaxTags /* = 5 */]`. I think this should be rewritten so that `tag_arr` is typed as a `LogTagType[5]`. I think everywhere we have a `const LogTagType parameter[LogTag::MaxTags]` really should have been `const LogTagType (&parameter)[LogTag::MaxTags]` so that this would have been a compile error.
My suggestion is to either do the following: Suggestion: public: DefaultUndecoratedSelection(LogLevelType level, LogTagType t0, LogTagType t1 = LogTag::__NO_TAG, LogTagType t2 = LogTag::__NO_TAG, LogTagType t3 = LogTag::__NO_TAG, LogTagType t4 = LogTag::__NO_TAG, LogTagType guard_tag = LogTag::__NO_TAG) : _selection(LogSelection::Invalid) { assert(guard_tag == LogTag::__NO_TAG, "Too many tags specified!"); LogTagType tag_arr[LogTag::MaxTags] = { t0, t1, t2, t3, t4 }; _selection = LogSelection(tag_arr, false, level); } const LogSelection& selection() const { return _selection; } }; or maybe even better, do what we do for the `LogTagSet` and have a static helper and a private constructor, so that we can turn all the asserts into compile errors. Something like: Suggestion: DefaultUndecoratedSelection(LogLevelType level, LogTagType t0, LogTagType t1, LogTagType t2, LogTagType t3, LogTagType t4) : _selection(LogSelection::Invalid) { LogTagType tag_arr[LogTag::MaxTags] = { t0, t1, t2, t3, t4 }; _selection = LogSelection(tag_arr, false, level); } public: template static DefaultUndecoratedSelection make() { STATIC_ASSERT(GuardTag == LogTag::__NO_TAG); return DefaultUndecoratedSelection(Level, T0, T1, T2, T3, T4); } const LogSelection& selection() const { return _selection; } }; And we can then use `LogDecorators::DefaultUndecoratedSelection::make()` to create them. ------------- Changes requested by aboldtch (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21383#pullrequestreview-2367260954 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1799872623 From jbhateja at openjdk.org Mon Oct 14 17:50:14 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 14 Oct 2024 17:50:14 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Mon, 14 Oct 2024 15:04:54 GMT, Jasmine Karthikeyan wrote: > For the record I think in this PR we could simply match the IR patterns in the ad file, since (from my understanding) the patterns we are matching could be supported there. We should do platform-specific lowering in a separate patch because it is pretty nuanced, and we could potentially move it to the new system afterwards. Hi @jaskarth, matching bigger patterns is sensitive to [IR level node sharing](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/matcher.cpp#L1724), so it may not be foolproof for the above 4 patterns. The current patch takes care of this limitation. > @jatin-bhateja That is machine-independent lowering, we are talking about machine-dependent lowering to which `MacroLogicV` transformation belongs. You can have `phaselowering_x86` and not have to add another method to `Matcher` as well as add default implementations to various architecture files. You can reuse `MulVL` node for that but I believe these transformations should be done as late as possible. Hi @merykitty, I see some scope for refactoring and carving out a separate target-specific lowering pass going forward; I have brought this up in the past too. Existing optimizations are in line with the current infrastructure and guard target-specific optimizations with target-specific match_rule_supported checks, e.g. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L2898.
As @jaskarth suggests we can pick this up going forward. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411884206 From duke at openjdk.org Mon Oct 14 17:56:28 2024 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 14 Oct 2024 17:56:28 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn Message-ID: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) Crash occurs in C1 during OSR when copying locks from interpreter frame to compiled frame. All loads used immediate offset regardless of offset size causing crash when it is over the max size for the instruction (32760). Fix is to check the size before performing the load and storing the offset in a register if needed. I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction. The instruction is only updated when the `offset_ok_for_immed` check fails which would cause the crash anyways ------------- Commit messages: - Add regression test - Remove unnecessary use of rscratch2 - 8335662: [AArch64] C2: guarantee(val < (1ULL << nbits)) failed: Field too big for insn Changes: https://git.openjdk.org/jdk/pull/21473/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21473&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335662 Stats: 46 lines in 3 files changed: 43 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21473.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21473/head:pull/21473 PR: https://git.openjdk.org/jdk/pull/21473 From aph at openjdk.org Mon Oct 14 17:56:28 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 14 Oct 2024 17:56:28 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn In-Reply-To:
<8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> Message-ID: On Fri, 11 Oct 2024 16:51:16 GMT, Chad Rakoczy wrote: > [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) > > Crash occurs in C1 during OSR when copying locks from interpreter frame to compiled frame. All loads used immediate offset regardless of offset size causing crash when it is over the max size for the instruction (32760). Fix is to check the size before performing the load and storing the offset in a register if needed. > > I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction. The instruction is only updated when the `offset_ok_for_immed` check fails which would cause the crash anyways Thanks. Fix looks reasonable, but I think we need a regression test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21473#issuecomment-2408568858 From duke at openjdk.org Mon Oct 14 17:56:28 2024 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 14 Oct 2024 17:56:28 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn In-Reply-To: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> Message-ID: <1hTqbs0Xtv8J3MbMfHiGxmktyEjwnJ49jK20ojCc27I=.1994823d-94f9-4ae9-96e8-3c527be72825@github.com> On Fri, 11 Oct 2024 16:51:16 GMT, Chad Rakoczy wrote: > [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) > > Crash occurs in C1 during OSR when copying locks from interpreter frame to compiled frame. All loads used immediate offset regardless of offset size causing crash when it is over the max size for the instruction (32760).
Fix is to check the size before performing the load and storing the offset in a register if needed. > > I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction. The instruction is only updated when the `offset_ok_for_immed` check fails which would cause the crash anyways Yeah, working on adding a regression test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21473#issuecomment-2411744674 From sviswanathan at openjdk.org Mon Oct 14 18:38:12 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 14 Oct 2024 18:38:12 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 12:23:25 GMT, Emanuel Peter wrote: >> When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. >> Also a regression test case is added accordingly. >> >> Best Regards, >> Sandhya > > src/hotspot/cpu/x86/x86.ad line 3679: > >> 3677: >> 3678: instruct vconvF2HF_mem_reg(memory mem, vec src) %{ >> 3679: predicate(Matcher::vector_length_in_bytes(n->in(3)->in(1)) >= 16); > > Ok, and so what alternative path is the matcher now going to take if we have `vector_length = 8`? @eme64 It is now going to load from memory into a register, use the register/register version for the conversion, and then store the result to memory. Please see before and after code snippets below.
Generated code snippet for 2 element float vector to float16 vector conversion Before: vmovq 0x10(%rdx,%rbx,4),%xmm2 ; load 8 bytes from memory into xmm2 (correct) vcvtps2ph $0x4,%xmm2,0x10(%rsi,%rbx,2) ; convert to float16 and store 8 bytes to memory (incorrect) After: vmovq 0x10(%rdx,%rbx,4),%xmm15 ; load 8 bytes from memory into xmm15 (correct) vcvtps2ph $0x4,%xmm15,%xmm0 ; convert to float16 into register (correct) vmovd %xmm0,0x10(%rsi,%rbx,2) ; store 4 byte into memory (correct) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1799938212 From kvn at openjdk.org Mon Oct 14 19:32:14 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 14 Oct 2024 19:32:14 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v3] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 10:46:50 GMT, Tobias Hartmann wrote: >> After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`: >> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723 >> >> Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` because the type is not a primitive array type: >> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971 >> >> -> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 >> >> The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. 
>> >> Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded: >> https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979 >> >> I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Modified ciTypeFlow::can_trap src/hotspot/share/ci/ciTypeFlow.cpp line 2220: > 2218: case Bytecodes::_ldc_w: > 2219: case Bytecodes::_ldc2_w: > 2220: return str.is_in_error() || !str.get_constant().is_loaded(); There is also `con.is_valid()` check in `do_ldc()`. But I do know what memory is referenced in "OutOfMemoryError in the CI while loading a String constant" when it is invalid. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21470#discussion_r1799984194 From kvn at openjdk.org Mon Oct 14 19:42:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 14 Oct 2024 19:42:13 GMT Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 10:39:25 GMT, Amit Kumar wrote: > This is trivial PR to change data type of some "*Threshold" variables form `intx` to `double`. 
Any reasons `Tier2*Threshold` flags were bit changed? For consistency. ------------- PR Review: https://git.openjdk.org/jdk/pull/21354#pullrequestreview-2367471555 From svkamath at openjdk.org Mon Oct 14 20:54:12 2024 From: svkamath at openjdk.org (Smita Kamath) Date: Mon, 14 Oct 2024 20:54:12 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v4] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 18:52:30 GMT, Smita Kamath wrote: >> Hi, I want to submit an optimization for SHA-512 algorithm using SHA instructions (sha512msg1, sha512msg2 and sha512rnds2) . Kindly review the code and provide feedback. Thank you. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated code as per review comments @ascarpino, I have approvals for this PR. Would it be possible for you to run tests and let me know the results? I appreciate your help. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20633#issuecomment-2412323595 From vlivanov at openjdk.org Mon Oct 14 21:19:12 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 14 Oct 2024 21:19:12 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v3] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 10:46:50 GMT, Tobias Hartmann wrote: >> After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`: >> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723 >> >> Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` because the type is not a primitive array type: >> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971 >> >> -> >> 
https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 >> >> The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. >> >> Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded: >> https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979 >> >> I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Modified ciTypeFlow::can_trap The proposed fix is broader than strictly needed to fix the immediate problem observed with condy. It affects all LDCs with not-yet-resolved CP entries. IMO it should be fine, but I haven't thought it through. (Also, the comment at https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciTypeFlow.cpp#L2210 is outdated now.) And [`Parse::do_one_bytecode()`](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/parse2.cpp#L1962) should always see the resolved case now (`constant.is_loaded() == true`).
------------- PR Comment: https://git.openjdk.org/jdk/pull/21470#issuecomment-2412358307 From jkarthikeyan at openjdk.org Mon Oct 14 21:53:15 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 14 Oct 2024 21:53:15 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR pallets. >> >> >> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. 
It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. >> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimization:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 > - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction > Hi @jaskarth , Bigger pattern matching is sensitive to [IR level node sharing](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/matcher.cpp#L1724), thus it may not full proof for above 4 patterns. Current patch takes care of this limitation. I think this is a good point. I've taken a look at the patch and added some comments below. src/hotspot/cpu/x86/matcher_x86.hpp line 184: > 182: // Does the CPU supports doubleword multiplication with quadword saturation. > 183: static constexpr bool supports_double_word_mult_with_quadword_staturation(void) { > 184: return true; Should this be `UseAVX > 0`? I'm wondering since we have a `MulVL` rule that applies when `UseAVX == 0`. 
src/hotspot/share/opto/vectornode.cpp line 2089: > 2087: if (Matcher::supports_double_word_mult_with_quadword_staturation() && > 2088: !is_mult_lower_double_word()) { > 2089: auto is_clear_upper_double_word_uright_shift_op = [](const Node *n) { Suggestion: auto is_clear_upper_double_word_uright_shift_op = [](const Node* n) { src/hotspot/share/opto/vectornode.cpp line 2093: > 2091: n->in(2)->Opcode() == Op_RShiftCntV && n->in(2)->in(1)->is_Con() && > 2092: n->in(2)->in(1)->bottom_type()->isa_int() && > 2093: n->in(2)->in(1)->bottom_type()->is_int()->get_con() == 32L; Suggestion: n->in(2)->in(1)->bottom_type()->is_int()->get_con() == 32; Since you are comparing with a `TypeInt` I think this shouldn't be `32L`. src/hotspot/share/opto/vectornode.cpp line 2098: > 2096: auto is_lower_double_word_and_mask_op = [](const Node *n) { > 2097: if (n->Opcode() == Op_AndV) { > 2098: Node *replicate_operand = n->in(1)->Opcode() == Op_Replicate ? n->in(1) Suggestion: Node* replicate_operand = n->in(1)->Opcode() == Op_Replicate ? n->in(1) src/hotspot/share/opto/vectornode.cpp line 2124: > 2122: // MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) > 2123: if ((is_lower_double_word_and_mask_op(in(1)) || > 2124: is_lower_double_word_and_mask_op(in(1)) || `is_lower_double_word_and_mask_op(in(1)) || is_lower_double_word_and_mask_op(in(1))` is redundant, right? Shouldn't you only need it once? Same for the other 3 calls, which are similarly repeated. test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java line 41: > 39: */ > 40: > 41: public class VectorMultiplyOpt { Could it be possible to also do IR verification in this test? It would be good to check that we don't generate `AndVL` or `URShiftVL` with this transform. test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java line 43: > 41: public class VectorMultiplyOpt { > 42: > 43: public static long [] src1; Suggestion: public static long[] src1; And for the rest of the `long []` in this file too. 
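As a side note on why the four quoted `MulL` patterns are safe to narrow, here is a small scalar sketch of the arithmetic, assuming plain `long` values rather than vector lanes. The class and method names are hypothetical and nothing here models the C2 matching itself. It splits a 64-bit multiply into partial products and shows that the cross terms vanish once the upper doubleword of both operands has been cleared by `And 0xFFFFFFFF` or `URShift 32`, so a single unsigned 32x32 -> 64 product suffices.

```java
// Hypothetical scalar illustration; not code from the patch.
final class LowDoublewordMultiplySketch {
    static final long LOW_MASK = 0xFFFFFFFFL;

    // Low 64 bits of a*b assembled from 32-bit partial products.
    // The hi*hi term only affects bits 64 and above, so it is dropped.
    static long schoolbook(long a, long b) {
        long aLo = a & LOW_MASK, aHi = a >>> 32;
        long bLo = b & LOW_MASK, bHi = b >>> 32;
        return aLo * bLo + crossTerms(a, b);
    }

    // Contribution of the mixed hi*lo partial products to the low 64 bits.
    static long crossTerms(long a, long b) {
        long aLo = a & LOW_MASK, aHi = a >>> 32;
        long bLo = b & LOW_MASK, bHi = b >>> 32;
        return (aHi * bLo + aLo * bHi) << 32;
    }

    // Unsigned 32x32 -> 64 product of the low doublewords, which is the
    // operation the quoted description attributes to VPMULUDQ per lane.
    static long lowDoublewordProduct(long a, long b) {
        return Integer.toUnsignedLong((int) a) * Integer.toUnsignedLong((int) b);
    }

    public static void main(String[] args) {
        long x = 0x123456789ABCDEF0L;
        long y = 0xFEDCBA9876543210L;
        long[] xs = {x & LOW_MASK, x >>> 32}; // And / URShift forms of SRC1
        long[] ys = {y & LOW_MASK, y >>> 32}; // And / URShift forms of SRC2
        for (long a : xs) {
            for (long b : ys) {
                // Upper doublewords are zero, so no cross terms remain.
                System.out.println(crossTerms(a, b) == 0
                    && a * b == lowDoublewordProduct(a, b));
            }
        }
    }
}
```

Each of the four operand combinations in `main` corresponds to one of the quoted patterns; in all of them the full 64-bit product equals the product of the low doublewords.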
test/micro/org/openjdk/bench/jdk/incubator/vector/VectorXXH3HashingBenchmark.java line 39: > 37: @Param({"1024", "2048", "4096", "8192"}) > 38: private int SIZE; > 39: private long [] accumulators; Suggestion: private long[] accumulators; ------------- PR Review: https://git.openjdk.org/jdk/pull/21244#pullrequestreview-2367683334 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800159123 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800153755 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800153568 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800153842 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800151177 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800167403 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800165261 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800169840 From sviswanathan at openjdk.org Mon Oct 14 23:35:43 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 14 Oct 2024 23:35:43 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2] In-Reply-To: References: Message-ID: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> > When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. > Also a regression test case is added accordingly. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Update test case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21480/files - new: https://git.openjdk.org/jdk/pull/21480/files/dedb4a0a..ed299327 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21480&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21480&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21480.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21480/head:pull/21480 PR: https://git.openjdk.org/jdk/pull/21480 From sviswanathan at openjdk.org Mon Oct 14 23:35:43 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 14 Oct 2024 23:35:43 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 12:18:30 GMT, Emanuel Peter wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test case > > test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java line 76: > >> 74: sout[i+1] = Float.floatToFloat16(finp[i+1]); >> 75: } >> 76: } > > Your test looks different than the one that I added on JIRA. Can you please add that one as well? Thanks for pointing that out. I have modified the contents of the loop kernel to match your testcase loop kernel now. I also verified that it fails before the fix and passes after the fix. 
Before the fix the test fails:

Test results: failed: 1

And the jtr file shows the following:

Custom Run Test: @Run: kernel_test_float_float16 - @Tests: {test_float_float16,test_float_float16_strided,test_float_float16_short_vector}:
compiler.lib.ir_framework.shared.TestRunException: There was an error while invoking @Run method public void compiler.vectorization.TestFloatConversionsVector.kernel_test_float_float16()
    at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:162)
    at compiler.lib.ir_framework.test.CustomRunTest.run(CustomRunTest.java:87)
    at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:861)
    at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:252)
    at compiler.lib.ir_framework.test.TestVM.main(TestVM.java:165)
Caused by: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:119)
    at java.base/java.lang.reflect.Method.invoke(Method.java:573)
    at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:159)
    ... 4 more
Caused by: java.lang.RuntimeException: assertEquals expected: 18483 but was: 0
    at jdk.test.lib.Asserts.fail(Asserts.java:691)
    at jdk.test.lib.Asserts.assertEquals(Asserts.java:204)
    at jdk.test.lib.Asserts.assertEquals(Asserts.java:191)
    at compiler.vectorization.TestFloatConversionsVector.kernel_test_float_float16(TestFloatConversionsVector.java:112)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
    ... 6 more

After the fix the test passes with no failures:

Test results: passed: 1

Please let me know if this works or you would like to see any other change.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1800232110

From vlivanov at openjdk.org Tue Oct 15 00:31:12 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 15 Oct 2024 00:31:12 GMT
Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2]
In-Reply-To:
References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com>
Message-ID:

On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote:

>> This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for the following IR patterns.
>>
>>
>> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF)
>> MulL (URShift SRC1 , 32) (URShift SRC2, 32)
>> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF)
>> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32)
>>
>>
>>
>> A 64x64 bit multiplication produces a 128 bit result, and can be performed by individually multiplying the upper and lower double word of the multiplier with the multiplicand and assembling the partial products to compute the full width result. Targets supporting vector quadword multiplication have separate instructions to compute the upper and lower quadwords of the 128 bit result. Therefore the existing VectorAPI multiplication operator expects shape conformance between source and result vectors.
>>
>> If the upper 32 bits of the quadword multiplier and multiplicand are always set to zero then the result of the multiplication is only dependent on the partial product of their lower double words and can be performed using an unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing a new IR node.
>>
>> VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result.
It has much lower latency compared to the full 64 bit multiplication instruction "VPMULLQ"; in addition, non-AVX512DQ targets do not support direct quadword multiplication, thus we can save the redundant partial product for the zeroed out upper 32 bits. This results in throughput improvements on both P and E core Xeons.
>>
>> Please find below the performance of [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:-
>>
>>
>> Sierra Forest :-
>> ============
>> Baseline:-
>> Benchmark                                 (SIZE)   Mode  Cnt    Score   Error   Units
>> VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2  806.228          ops/ms
>> VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2  403.044          ops/ms
>> VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2  200.641          ops/ms
>> VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2  100.664          ops/ms
>>
>> With Optimization:-
>> Benchmark                                 (SIZE)   Mode  Cnt    Score   Error   Units
>> VectorXXH3HashingBenchmark.hashingKernel ...
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137
> - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction

Some time ago, there was a relevant experiment to optimize vectorized Poly1305 implementation by utilizing VPMULDQ instruction on x86 (see [JDK-8219881](https://bugs.openjdk.org/browse/JDK-8219881) for details). The implementation used int-to-long vector casts and produced the following IR shape: `MulVL (VectorCastI2X src1) (VectorCastI2X src2)`.

Does it make sense to cover it as part of this particular enhancement?
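
For reference, the arithmetic behind both shapes can be checked in scalar Java. This is only a minimal sketch and not code from the patch: the class and method names are hypothetical, a `long` stands in for a single 64 bit vector lane, and the lane behaviour of VPMULUDQ and VPMULDQ is assumed to be an unsigned, respectively signed, 32x32->64 bit multiplication of the low doublewords.

```java
public class LowDoublewordMultiply {
    private static final long LOW_MASK = 0xFFFFFFFFL;

    /** Shape MulL (And a, 0xFFFFFFFF) (And b, 0xFFFFFFFF), done as a full 64 bit multiply. */
    public static long mulMasked(long a, long b) {
        return (a & LOW_MASK) * (b & LOW_MASK);
    }

    /** Shape MulL (URShift a, 32) (URShift b, 32), done as a full 64 bit multiply. */
    public static long mulShifted(long a, long b) {
        return (a >>> 32) * (b >>> 32);
    }

    /** Unsigned 32x32->64 bit multiply of two doublewords (assumed VPMULUDQ lane semantics). */
    public static long mulUnsignedDoublewords(int a, int b) {
        return Integer.toUnsignedLong(a) * Integer.toUnsignedLong(b);
    }

    /** Signed 32x32->64 bit multiply, the scalar analogue of MulVL (VectorCastI2X a) (VectorCastI2X b). */
    public static long mulSignedDoublewords(int a, int b) {
        return (long) a * (long) b;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, 0x123456789ABCDEF0L, 0xFEDCBA9876543210L,
                          Long.MIN_VALUE, Long.MAX_VALUE};
        for (long a : samples) {
            for (long b : samples) {
                // Upper halves are zero after masking, so only the low doublewords contribute.
                if (mulMasked(a, b) != mulUnsignedDoublewords((int) a, (int) b)) {
                    throw new AssertionError("masked pattern differs");
                }
                // After the unsigned shift the former upper halves are the low doublewords.
                if (mulShifted(a, b) != mulUnsignedDoublewords((int) (a >>> 32), (int) (b >>> 32))) {
                    throw new AssertionError("shifted pattern differs");
                }
                // Operands that are sign-extended ints multiply like a signed 32x32->64 product.
                long sa = (int) a;
                long sb = (int) b;
                if (sa * sb != mulSignedDoublewords((int) sa, (int) sb)) {
                    throw new AssertionError("sign-extended pattern differs");
                }
            }
        }
        System.out.println("all identities hold");
    }
}
```

The mixed `And`/`URShift` patterns follow from the same argument, since each operand independently has its upper 32 bits cleared. The sign-extended case is separate: it needs the signed variant of the instruction, which is why it is listed as its own method here.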
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2412582542

From dlong at openjdk.org Tue Oct 15 04:34:09 2024
From: dlong at openjdk.org (Dean Long)
Date: Tue, 15 Oct 2024 04:34:09 GMT
Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn
In-Reply-To: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com>
References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com>
Message-ID:

On Fri, 11 Oct 2024 16:51:16 GMT, Chad Rakoczy wrote:

> [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662)
>
> Crash occurs in C1 during OSR when copying locks from interpreter frame to compiled frame. All loads used immediate offset regardless of offset size causing crash when it is over the max size for the instruction (32760). Fix is to check the size before performing the load and storing the offset in a register if needed.
>
> I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction. The instruction is only updated when the `offset_ok_for_immed` check fails which would cause the crash anyway
>
> Confirmed that added test fails before patch and passes after

Why is part of the test a binary .class file?

-------------

Changes requested by dlong (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21473#pullrequestreview-2368140781

From rrich at openjdk.org Tue Oct 15 06:36:52 2024
From: rrich at openjdk.org (Richard Reingruber)
Date: Tue, 15 Oct 2024 06:36:52 GMT
Subject: RFR: 8341862: PPC64: C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR
Message-ID:

Make sure `LIR_Assembler::emit_unwind_handler()` jumps to the slow path directly for unlocking a synchronized method if `LM_MONITOR` is used.
On the fast paths assertions are added that the mode is actually handled.

The change passed our CI testing:
Tier 1-4 of hotspot and jdk.
All of Langtools and jaxp. Renaissance Suite and SAP specific tests.
Testing was done on the main platforms and also on Linux/PPC64le and AIX.

-------------

Commit messages:
 - C1: fix unlock in unwind handler for LM_MONITOR

Changes: https://git.openjdk.org/jdk/pull/21497/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21497&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8341862
  Stats: 9 lines in 2 files changed: 8 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/21497.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21497/head:pull/21497

PR: https://git.openjdk.org/jdk/pull/21497

From mdoerr at openjdk.org Tue Oct 15 06:36:52 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Tue, 15 Oct 2024 06:36:52 GMT
Subject: RFR: 8341862: PPC64: C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR
In-Reply-To:
References:
Message-ID:

On Mon, 14 Oct 2024 13:51:21 GMT, Richard Reingruber wrote:

> Make sure `LIR_Assembler::emit_unwind_handler()` jumps to the slow path directly for unlocking a synchronized method if `LM_MONITOR` is used.
> On the fast paths assertions are added that the mode is actually handled.
>
> The change passed our CI testing:
> Tier 1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests.
> Testing was done on the main platforms and also on Linux/PPC64le and AIX.

Good catch! LGTM.

-------------

Marked as reviewed by mdoerr (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21497#pullrequestreview-2366762944

From amitkumar at openjdk.org Tue Oct 15 06:42:51 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 15 Oct 2024 06:42:51 GMT
Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double [v2]
In-Reply-To:
References:
Message-ID:

> This is a trivial PR to change the data type of some "*Threshold" variables from `intx` to `double`.
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: updates tier2 threshold datatype ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21354/files - new: https://git.openjdk.org/jdk/pull/21354/files/a53535f5..ce4ff580 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21354&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21354&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21354.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21354/head:pull/21354 PR: https://git.openjdk.org/jdk/pull/21354 From amitkumar at openjdk.org Tue Oct 15 06:49:10 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 15 Oct 2024 06:49:10 GMT Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double [v2] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 19:39:07 GMT, Vladimir Kozlov wrote: >Any reasons Tier2*Threshold flags were bit changed? For consistency. I guess you're asking why I left them unchanged? I looked into the project, and couldn't find where those flags are being used, so I left them unchanged at first. However, I've now updated them to `double` as well. Thanks for the suggestion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21354#issuecomment-2413030380 From rrich at openjdk.org Tue Oct 15 06:55:37 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 15 Oct 2024 06:55:37 GMT Subject: RFR: 8341715: PPC64: ObjectMonitor::_owner should be reset unconditionally in nmethod unlocking Message-ID: This removes the `ObjectMonitor::_owner` check when a nmethod unlocks an inflated monitor on ppc64. Monitor operations by nmethods are guaranteed to be balanced (see JBS-item for a reference) therefore the check is redundant. Other platforms don't have it either. 
I've removed the assertion that the unlocking thread owns the monitor again because it won't work with vthread monitor support in the loom repository. The fix passed our CI testing with `LockingMode` set to `LM_LEGACY` Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. Testing was done on the main platforms and also on Linux/PPC64le and AIX. The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with `LM_LEGACY`. ------------- Commit messages: - Remove assertion - compiler_fast_unlock_object: no need to check ObjectMonitor::_owner Changes: https://git.openjdk.org/jdk/pull/21494/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21494&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341715 Stats: 33 lines in 1 file changed: 26 ins; 7 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21494.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21494/head:pull/21494 PR: https://git.openjdk.org/jdk/pull/21494 From rrich at openjdk.org Tue Oct 15 06:59:39 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 15 Oct 2024 06:59:39 GMT Subject: RFR: 8342042: PPC64: compiler_fast_unlock_object flags failure instead of success Message-ID: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> This change inverts the `EQ` bit in `flag` with a condition register nand instruction in order to meet the post condition given at L2751-L2752: // flag == EQ indicates success, decrement held monitor count // flag == NE indicates failure The fix passed our CI testing with LockingMode set to LM_LEGACY Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. Testing was done on the main platforms and also on Linux/PPC64le and AIX. The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with LM_LEGACY. 
------------- Commit messages: - Must reach success with flag == EQ Changes: https://git.openjdk.org/jdk/pull/21496/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21496&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342042 Stats: 6 lines in 1 file changed: 2 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21496.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21496/head:pull/21496 PR: https://git.openjdk.org/jdk/pull/21496 From mdoerr at openjdk.org Tue Oct 15 06:59:39 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 15 Oct 2024 06:59:39 GMT Subject: RFR: 8342042: PPC64: compiler_fast_unlock_object flags failure instead of success In-Reply-To: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> References: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> Message-ID: <3GAiIctv0lvgogniDgMHaVVuiRorbnLQgN7GwgxN-ek=.b58817d5-2d00-4507-8451-2e2313aa561f@github.com> On Mon, 14 Oct 2024 13:29:43 GMT, Richard Reingruber wrote: > This change inverts the `EQ` bit in `flag` with a condition register nand instruction in order to meet the post condition given at L2751-L2752: > > > // flag == EQ indicates success, decrement held monitor count > // flag == NE indicates failure > > > The fix passed our CI testing with LockingMode set to LM_LEGACY > Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. > > The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with LM_LEGACY. Good catch! LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21496#pullrequestreview-2366744824

From epeter at openjdk.org Tue Oct 15 07:00:11 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 15 Oct 2024 07:00:11 GMT
Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 14 Oct 2024 18:35:52 GMT, Sandhya Viswanathan wrote:

>> src/hotspot/cpu/x86/x86.ad line 3679:
>>
>>> 3677:
>>> 3678: instruct vconvF2HF_mem_reg(memory mem, vec src) %{
>>> 3679: predicate(Matcher::vector_length_in_bytes(n->in(3)->in(1)) >= 16);
>>
>> Ok, and so what alternative path is the matcher now going to take if we have `vector_length = 8`?
>
> @eme64 It is now going to do the load from memory into a register, use the register/register version for the conversion, and then use the store to memory. Please see before and after code snippets below.
>
> Generated code snippet for 2 element float vector to float16 vector conversion
> Before:
> vmovq 0x10(%rdx,%rbx,4),%xmm2 ; load 8 bytes from memory into xmm2 (correct)
> vcvtps2ph $0x4,%xmm2,0x10(%rsi,%rbx,2) ; convert to float16 and store 8 bytes to memory (incorrect)
>
> After:
> vmovq 0x10(%rdx,%rbx,4),%xmm15 ; load 8 bytes from memory into xmm15 (correct)
> vcvtps2ph $0x4,%xmm15,%xmm0 ; convert to float16 into register (correct)
> vmovd %xmm0,0x10(%rsi,%rbx,2) ; store 4 bytes into memory (correct)

Ah, I see. You are using a 4-element register-only `vcvtps2ph` instruction, but only use the first 2 elements of it.
Great :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1800564054 From rrich at openjdk.org Tue Oct 15 06:59:39 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 15 Oct 2024 06:59:39 GMT Subject: RFR: 8342042: PPC64: compiler_fast_unlock_object flags failure instead of success In-Reply-To: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> References: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> Message-ID: On Mon, 14 Oct 2024 13:29:43 GMT, Richard Reingruber wrote: > This change inverts the `EQ` bit in `flag` with a condition register nand instruction in order to meet the post condition given at L2751-L2752: > > > // flag == EQ indicates success, decrement held monitor count > // flag == NE indicates failure > > > The fix passed our CI testing with LockingMode set to LM_LEGACY > Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. > > The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with LM_LEGACY. Thanks for the quick review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21496#issuecomment-2411369527 From chagedorn at openjdk.org Tue Oct 15 07:03:10 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Oct 2024 07:03:10 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v2] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 15:15:54 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. 
These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>>
>>                                                         Baseline                      Patch
>> Benchmark                                 Mode  Cnt     Score     Error  Units      Score    Error  Units  Improvement
>> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op     19.214 ±  0.521  ns/op  (+ 26.5%)
>> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op    370.054 ± 26.106  ns/op  (+ 20.3%)
>> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op    810.560 ± 38.437  ns/op  (+ 12.3%)
>> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op   1224.460 ± 28.220  ns/op  (+ 18.7%)
>> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op    743.729 ± 22.877  ns/op  (+ 14.6%)
>> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op    321.911 ± 11.661  ns/op  (+ 26.6%)
>>
>> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!
>
> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>
> Suggestions from review

Looks good, thanks for the update! I'll give this another spin in our testing.

-------------

Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21439#pullrequestreview-2368344706 From epeter at openjdk.org Tue Oct 15 07:04:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Oct 2024 07:04:11 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2] In-Reply-To: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> References: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> Message-ID: On Mon, 14 Oct 2024 23:35:43 GMT, Sandhya Viswanathan wrote: >> When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. >> Also a regression test case is added accordingly. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Update test case Thanks for the updates! It looks good to me now. I have one more wish: Could you allow to run the test on all platforms please? `test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java` Currently, it only runs on selected platforms, see `@requires`. We should really run this on all platforms and compilers. The IR rules can be limited to platforms and features. That ensures that we do not have similar bugs elsewhere, and our test-coverage would be increased. ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21480#pullrequestreview-2368347957

From chagedorn at openjdk.org Tue Oct 15 07:08:17 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 15 Oct 2024 07:08:17 GMT
Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v24]
In-Reply-To:
References:
Message-ID:

On Mon, 14 Oct 2024 12:33:44 GMT, Emanuel Peter wrote:

>> **Motivation**
>>
>> I want to write small dedicated fuzzers:
>> - Generate `java` and `jasm` source code: just some `String`.
>> - Quickly compile it (with this framework).
>> - Execute the compiled code.
>>
>> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**.
>>
>> **The CompileFramework**
>>
>> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`.
>> An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests.
>>
>> I implemented a first, simple version of the framework. I added some tests and examples.
>>
>> **Example**
>>
>>
>> CompileFramework comp = new CompileFramework();
>> comp.add(SourceCode.newJavaSourceCode("XYZ", ""));
>> comp.compile();
>> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5);
>>
>>
>> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74
>>
>> **Below some use cases: tests that would have been better with the CompileFramework**
>>
>> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java**
>>
>> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`.
>>
>> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`,
>>
>> to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something i...
>
> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:
>
> - Update test/hotspot/jtreg/compiler/lib/compile_framework/README.md
>
>   Co-authored-by: Christian Hagedorn
> - Apply suggestions from code review
>
>   Co-authored-by: Christian Hagedorn

Marked as reviewed by chagedorn (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20184#pullrequestreview-2368354538

From mli at openjdk.org Tue Oct 15 08:06:25 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 15 Oct 2024 08:06:25 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF
Message-ID:

Hi,
Can you help to review the patch?
Previously it's https://github.com/openjdk/jdk/pull/18605.
This pr is based on https://github.com/openjdk/jdk/pull/20781.

Thanks!
## Test
### tests:
* test/jdk/jdk/incubator/vector/
* test/hotspot/jtreg/compiler/vectorapi/

### options:
* -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
* -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
* -XX:+EnableVectorSupport -XX:-UseVectorStubs

## Performance

### Tests
jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector.
It's good to have these tests in jdk main stream, I will do it in a separate pr later. (These tests are auto-generated from some script&template; it's good to also have those script&template in jdk main stream, but they generate more tests than what we need here, so it is better to add these tests in another pr).

### Options
* +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
* -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'

### Performance data
I have re-tested, there is not much difference from https://github.com/openjdk/jdk/pull/18605, so please check performance data in that pr.
------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/21502/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21502&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312425 Stats: 161 lines in 6 files changed: 156 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21502.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21502/head:pull/21502 PR: https://git.openjdk.org/jdk/pull/21502 From thartmann at openjdk.org Tue Oct 15 08:07:13 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Oct 2024 08:07:13 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v3] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 19:29:52 GMT, Vladimir Kozlov wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Modified ciTypeFlow::can_trap > > src/hotspot/share/ci/ciTypeFlow.cpp line 2220: > >> 2218: case Bytecodes::_ldc_w: >> 2219: case Bytecodes::_ldc2_w: >> 2220: return str.is_in_error() || !str.get_constant().is_loaded(); > > There is also `con.is_valid()` check in `do_ldc()`. But I do know what memory is referenced in "OutOfMemoryError in the CI while loading a String constant" when it is invalid. But in that case no exception is installed and we bail out from compilation, right? 
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciTypeFlow.cpp#L746 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21470#discussion_r1800656778 From tschatzl at openjdk.org Tue Oct 15 08:11:20 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 15 Oct 2024 08:11:20 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 19:57:25 GMT, Dean Long wrote: > @tschatzl , when we call register_nmethod(), do we really need to scan the oops immediately, or could that be delayed until the next safepoint? Could be delayed at least for the STW collectors, but we want to avoid doing any work during gc as much as possible. This may be more tricky with concurrent gcs. After some talk with @TobiHartmann we think that it is best and safer to extend the lock scope. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2413184441 From thartmann at openjdk.org Tue Oct 15 08:15:10 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Oct 2024 08:15:10 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v3] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 21:16:51 GMT, Vladimir Ivanov wrote: > Proposed fix is broader than strictly needed to fix the immediate problem observed with condy Right, to be on the safe side, I could add a `str.is_dynamic_constant()` check to limit the trap to condy. What do you think? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21470#issuecomment-2413191943 From jbhateja at openjdk.org Tue Oct 15 08:20:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 15 Oct 2024 08:20:24 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2] In-Reply-To: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> References: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> Message-ID: On Mon, 14 Oct 2024 23:35:43 GMT, Sandhya Viswanathan wrote: >> When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. >> Also a regression test case is added accordingly. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Update test case src/hotspot/cpu/x86/x86.ad line 3679: > 3677: > 3678: instruct vconvF2HF_mem_reg(memory mem, vec src) %{ > 3679: predicate(Matcher::vector_length_in_bytes(n->in(3)->in(1)) >= 16); You can add an elegant predicate check like the following instead of accessing bare inputs. n->as_StoreVector()->memory_size() >= 16. test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java line 110: > 108: } > 109: > 110: // Verifying the result Since we are using the IR framework, we can leverage the existing [@Check](https://github.com/openjdk/jdk/blob/521effe017b9b6322036f1851220056a637d6b1c/test/hotspot/jtreg/compiler/lib/ir_framework/Check.java#L32) annotation for verification. It works in conjunction with the @Test method and automatically invokes validation after the test method executes. We may need a little refactoring for this.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1800662857 PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1800673072 From thartmann at openjdk.org Tue Oct 15 09:17:50 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Oct 2024 09:17:50 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 13:39:28 GMT, Tobias Hartmann wrote: > C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. > > Now the problem is that there can be multiple patch sides in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 > > While another `Thread B` is done with patching another patch side and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC. `Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 > > In short: > - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. 
> - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of an `mov` in the patch body because the store is not atomic. > - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. > > Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. > I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity. > > The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 > > I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remo... Thanks for the discussions! I updated the PR to extend the scope of the `Patching_lock`. I also had to decrease the iterations in the test due to timeouts with debug on slow machines. 
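The effect of extending the lock scope can be illustrated outside HotSpot. The following is only a plain-Java analogy under stated assumptions, not the VM code: `PatchingLockSketch`, `NmethodModel`, `patch` and `walkSeesTornValue` are hypothetical names, and the oop immediate is modelled as two 32-bit halves so that a non-atomic store becomes observable. Because the writer and the walker take the same lock, the walker can never see a half-written value.

```java
import java.util.concurrent.locks.ReentrantLock;

final class PatchingLockSketch {
    // Hypothetical stand-in for one oop immediate inside a patch body.
    // The two halves model a store that is not atomic.
    static final class NmethodModel {
        private final ReentrantLock patchingLock = new ReentrantLock();
        private int lo;
        private int hi;

        // Writer: updates both halves while holding the lock.
        void patch(int value) {
            patchingLock.lock();
            try {
                lo = value;
                hi = value;
            } finally {
                patchingLock.unlock();
            }
        }

        // Reader: models the oop walk; it takes the same lock, so it
        // only ever observes fully written values.
        boolean walkSeesTornValue() {
            patchingLock.lock();
            try {
                return lo != hi;
            } finally {
                patchingLock.unlock();
            }
        }
    }

    // Runs one patching thread and one walking thread concurrently and
    // returns how many half-written values the walker observed.
    static int countTornReads(int iterations) {
        NmethodModel nm = new NmethodModel();
        int[] torn = new int[1];
        Thread writer = new Thread(() -> {
            for (int i = 1; i <= iterations; i++) {
                nm.patch(i);
            }
        });
        Thread walker = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                if (nm.walkSeesTornValue()) {
                    torn[0]++;
                }
            }
        });
        writer.start();
        walker.start();
        try {
            writer.join();
            walker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return torn[0];
    }

    public static void main(String[] args) {
        System.out.println("torn reads: " + countTornReads(100_000));
    }
}
```

If `walkSeesTornValue` read the two fields without taking the lock, nothing would prevent it from running between the two stores in `patch`, which is the analogue of the crash described in this thread.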
------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2413338038 From thartmann at openjdk.org Tue Oct 15 09:17:50 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Oct 2024 09:17:50 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching [v2] In-Reply-To: References: Message-ID: <5-ppux2sUtARhvrnb09TdA53oUPC8yX1j020dmfWw00=.e76e4077-c448-470c-a1c9-a2f43906e754@github.com> > C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. > > Now the problem is that there can be multiple patch sides in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 > > While another `Thread B` is done with patching another patch side and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC. `Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 > > In short: > - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. 
> - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of an `mov` in the patch body because the store is not atomic. > - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. > > Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. > I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity. > > The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 > > I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remo... Tobias Hartmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Extend the Patching_lock instead - Merge branch 'master' into 8340313 - Extending patching lock - Increased timeout - Removed platform specific asserts from shared code - 8340313: Crash due to invalid oop in nmethod after C1 patching ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21389/files - new: https://git.openjdk.org/jdk/pull/21389/files/050e2c8f..ec5d105b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21389&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21389&range=00-01 Stats: 16085 lines in 245 files changed: 13548 ins; 1161 del; 1376 mod Patch: https://git.openjdk.org/jdk/pull/21389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21389/head:pull/21389 PR: https://git.openjdk.org/jdk/pull/21389 From epeter at openjdk.org Tue Oct 15 09:38:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Oct 2024 09:38:22 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Sun, 13 Oct 2024 11:18:01 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. 
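The semantics above can be sketched as a scalar reference in plain Java. This is only an illustration of the description as quoted, not the Vector API implementation: `SelectFromSketch` and its array-based `selectFrom` are hypothetical, lanes are modelled as `int` arrays, and the out-of-range handling follows the text above (an `IndexOutOfBoundsException`), which may differ from what the final API does.

```java
import java.util.Arrays;

final class SelectFromSketch {
    // Scalar model: v1 and v2 together form a table of 2*VLEN entries.
    // An index in [0, VLEN) selects from v1, one in [VLEN, 2*VLEN) from v2.
    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            int idx = index[lane];
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("lane " + lane + ": index " + idx);
            }
            result[lane] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] index = {0, 5, 2, 7};
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        // Lanes 0 and 2 come from v1, lanes 1 and 3 from v2.
        System.out.println(Arrays.toString(selectFrom(index, v1, v2)));
    }
}
```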
>> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Updating tests to use floorMod I gave it a quick scan, and I have no further comments. LGTM. ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2368730929 From epeter at openjdk.org Tue Oct 15 10:22:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Oct 2024 10:22:21 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v24] In-Reply-To: References: Message-ID: <704HcEgAdeR1380vEK4z0bG0KiJ1kjRVSBCa9EFrt0w=.bee85693-033c-4d85-9f89-3e186ca3c2fb@github.com> On Sun, 13 Oct 2024 09:57:00 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. 
>> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update adlc changes. Are there any IR rules that verify that the correct C2 nodes are used? Is that a thing you generally do with the VectorAPI, just to make sure things get correctly intrinsified? src/hotspot/share/opto/vectornode.hpp line 161: > 159: // Needed for proper cloning. > 160: virtual uint size_of() const { return sizeof(*this); } > 161: bool is_unsigned() { return _is_unsigned; } Can you put this in the `print_spec`, so the IR dump shows if it is unsigned? ------------- PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2368845862 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1800870852 From ihse at openjdk.org Tue Oct 15 11:07:16 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 15 Oct 2024 11:07:16 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 14:57:46 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. > This pr is based on https://github.com/openjdk/jdk/pull/20781. > > Thanks! 
> > ## Test > ### tests: > * test/jdk/jdk/incubator/vector/ > * test/hotspot/jtreg/compiler/vectorapi/ > > ### options: > * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs > * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs > * -XX:+EnableVectorSupport -XX:-UseVectorStubs > > ## Performance > > ### Tests > jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector. It's good to have these tests in jdk main stream, I will do it in a separate pr later. (These tests are auto-generated tests from some script&template, it's good to also have those scrip&template in jdk main stream, but those scrip&template generates more other tests than what we need here, so better to add these tests and script&template in another pr). > > ### Options > * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' > * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' > > ### Performance data > I have re-tested, there is no much difference from https://github.com/openjdk/jdk/pull/18605, so please check performance data in that pr. make/autoconf/flags-cflags.m4 line 920: > 918: # ACLE and this flag are required to build the aarch64 SVE related functions in > 919: # libvectormath. 
> 920: if test "x${OPENJDK_TARGET_CPU}" = "xaarch64"; then Suggestion: if test "x$OPENJDK_TARGET_CPU" = "xaarch64"; then ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21502#discussion_r1800940513 From eosterlund at openjdk.org Tue Oct 15 11:50:12 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 15 Oct 2024 11:50:12 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching [v2] In-Reply-To: <5-ppux2sUtARhvrnb09TdA53oUPC8yX1j020dmfWw00=.e76e4077-c448-470c-a1c9-a2f43906e754@github.com> References: <5-ppux2sUtARhvrnb09TdA53oUPC8yX1j020dmfWw00=.e76e4077-c448-470c-a1c9-a2f43906e754@github.com> Message-ID: On Tue, 15 Oct 2024 09:17:50 GMT, Tobias Hartmann wrote: >> C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. >> >> Now the problem is that there can be multiple patch sides in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 >> >> While another `Thread B` is done with patching another patch side and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC.
`Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 >> >> In short: >> - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. >> - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of an `mov` in the patch body because the store is not atomic. >> - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. >> >> Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. >> I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity. >> >> The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 >> >> I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) t... > > Tobias Hartmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Extend the Patching_lock instead > - Merge branch 'master' into 8340313 > - Extending patching lock > - Increased timeout > - Removed platform specific asserts from shared code > - 8340313: Crash due to invalid oop in nmethod after C1 patching I'm fine with the fix. 
I can't help but reflect, though, on the ever-diminishing role of the Patching_lock. It used to be used quite a lot, but has had its lunch eaten by the CompiledMethod_lock, CompiledIC_lock and CodeCache_lock over time. Today, the Patching_lock is used in exactly one place: in this exact C1 patching that we are looking at now. And now we found that holding that lock wasn't enough because we need the CodeCache_lock as well. We could instead extend the CodeCache_lock critical section a bit, and then there is no need for the Patching_lock at all. Is it time for this lock to retire? It's had a good run. Thoughts? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2413691556 From thartmann at openjdk.org Tue Oct 15 11:58:10 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Oct 2024 11:58:10 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching [v2] In-Reply-To: References: <5-ppux2sUtARhvrnb09TdA53oUPC8yX1j020dmfWw00=.e76e4077-c448-470c-a1c9-a2f43906e754@github.com> Message-ID: On Tue, 15 Oct 2024 11:47:22 GMT, Erik Österlund wrote: > We could instead extend the CodeCache_lock critical section a bit, and then there is no need for the Patching_lock at all. Is it time for this lock to retire? +1 to retiring the `Patching_lock` and using the `CodeCache_lock` instead. Let's see what others think. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2413709220 From mli at openjdk.org Tue Oct 15 12:16:28 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 15 Oct 2024 12:16:28 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. > This pr is based on https://github.com/openjdk/jdk/pull/20781. > > Thanks!
> > ## Test > ### tests: > * test/jdk/jdk/incubator/vector/ > * test/hotspot/jtreg/compiler/vectorapi/ > > ### options: > * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs > * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs > * -XX:+EnableVectorSupport -XX:-UseVectorStubs > > ## Performance > > ### Tests > jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector. It's good to have these tests in jdk main stream, I will do it in a separate pr later. (These tests are auto-generated tests from some script&template, it's good to also have those scrip&template in jdk main stream, but those scrip&template generates more other tests than what we need here, so better to add these tests and script&template in another pr). > > ### Options > * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' > * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' > > ### Performance data > I have re-tested, there is no much difference from https://github.com/openjdk/jdk/pull/18605, so please check performance data in that pr. 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Update make/autoconf/flags-cflags.m4 Co-authored-by: Magnus Ihse Bursie ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21502/files - new: https://git.openjdk.org/jdk/pull/21502/files/9baa41d9..3aaf1c46 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21502&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21502&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21502.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21502/head:pull/21502 PR: https://git.openjdk.org/jdk/pull/21502 From mli at openjdk.org Tue Oct 15 12:16:28 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 15 Oct 2024 12:16:28 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: Message-ID: On Tue, 15 Oct 2024 11:04:40 GMT, Magnus Ihse Bursie wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Update make/autoconf/flags-cflags.m4 >> >> Co-authored-by: Magnus Ihse Bursie > > make/autoconf/flags-cflags.m4 line 920: > >> 918: # ACLE and this flag are required to build the aarch64 SVE related functions in >> 919: # libvectormath. 
>> 920: if test "x${OPENJDK_TARGET_CPU}" = "xaarch64"; then > > Suggestion: > > if test "x$OPENJDK_TARGET_CPU" = "xaarch64"; then Thanks, Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21502#discussion_r1801048212 From thartmann at openjdk.org Tue Oct 15 12:33:11 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Oct 2024 12:33:11 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn In-Reply-To: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> Message-ID: On Fri, 11 Oct 2024 16:51:16 GMT, Chad Rakoczy wrote: > [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) > > The crash occurs in C1 during OSR when copying locks from the interpreter frame to the compiled frame. All loads used an immediate offset regardless of the offset size, causing a crash when the offset exceeds the maximum for the instruction (32760). The fix is to check the size before performing the load and to store the offset in a register if needed. > > I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction. The instruction is only updated when the `offset_ok_for_immed` check fails, which would cause the crash anyway. > > Confirmed that the added test fails before the patch and passes after. The class file is from the original bug report; it should be converted to a jasm file. test/hotspot/jtreg/compiler/c1/Test8335662.java line 27: > 25: * @test > 26: * @bug 8335662 > 27: * @summary Execute main() method Please use a more descriptive summary of the test. test/hotspot/jtreg/compiler/c1/Test8335662.java line 35: > 33: import java.lang.reflect.Method; > 34: > 35: public class Test8335662 { We don't use bug numbers for test names (anymore). ------------- Changes requested by thartmann (Reviewer).
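The offset check described for JDK-8335662 can be modelled with a little arithmetic. The sketch below is a simplified assumption about how such a check behaves for an AArch64 load, not a copy of HotSpot's `offset_ok_for_immed`: the class `ImmediateOffsetSketch` and the methods `fitsImmediate` and `planLoad` are hypothetical, and the register names in the returned strings are purely illustrative. It assumes two encodable forms: an unsigned 12-bit offset scaled by the access size, and an unscaled signed 9-bit offset. For 8-byte accesses the scaled form tops out at 4095 * 8 = 32760, which matches the limit quoted above.

```java
final class ImmediateOffsetSketch {
    // Simplified model of "does this offset fit in the load's immediate field?".
    // shift is log2 of the access size in bytes (3 for a 64-bit load).
    static boolean fitsImmediate(long offset, int shift) {
        long mask = (1L << shift) - 1;
        if (offset < 0 || (offset & mask) != 0) {
            // Negative or unaligned: only the unscaled signed 9-bit form applies.
            return offset >= -256 && offset <= 255;
        }
        // Aligned and non-negative: scaled unsigned 12-bit form.
        return (offset >> shift) < (1L << 12);
    }

    // Illustrative only: choose between an immediate-offset load and
    // materializing the offset in a scratch register first.
    static String planLoad(long offset) {
        if (fitsImmediate(offset, 3)) {
            return "ldr x0, [x1, #" + offset + "]";
        }
        return "mov x9, #" + offset + "; ldr x0, [x1, x9]";
    }

    public static void main(String[] args) {
        System.out.println(planLoad(32760));
        System.out.println(planLoad(32768));
    }
}
```

With this model, an offset of 32760 still uses the immediate form, while 32768 falls back to the register form, which is the case the fix adds.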
PR Review: https://git.openjdk.org/jdk/pull/21473#pullrequestreview-2369186037 PR Review Comment: https://git.openjdk.org/jdk/pull/21473#discussion_r1801070515 PR Review Comment: https://git.openjdk.org/jdk/pull/21473#discussion_r1801073113 From thartmann at openjdk.org Tue Oct 15 12:49:15 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Oct 2024 12:49:15 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v22] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Wed, 9 Oct 2024 18:21:30 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > remove <= test cases, disable StressLongCountedLoop and PerMethodTrapLimit Thanks for the detailed investigation and feedback. The changes look good to me, I'll re-run testing and report back. 
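For readers unfamiliar with the term, a long-typed parallel induction variable in an int counted loop looks roughly like the sketch below. This is a minimal illustration, assuming a constant stride of 3 and a zero initial value; `ParallelIvSketch`, `withParallelIv` and `closedForm` are hypothetical names, and the closed form only shows the kind of replacement such an optimization can make, not the exact IR that C2 produces.

```java
final class ParallelIvSketch {
    // Int counted loop: i is the primary induction variable,
    // x is a long-typed variable that advances in lockstep with it.
    static long withParallelIv(int n) {
        long x = 0;
        for (int i = 0; i < n; i++) {
            x += 3;
        }
        return x;
    }

    // Equivalent closed form: x depends only on the trip count,
    // so it does not need to be carried through the loop.
    static long closedForm(int n) {
        return n > 0 ? 3L * n : 0L;
    }

    public static void main(String[] args) {
        for (int n : new int[] {-1, 0, 1, 10, 1_000_000}) {
            System.out.println(n + ": " + withParallelIv(n) + " " + closedForm(n));
        }
    }
}
```

The multiplication is done in `long` arithmetic (`3L * n`), so the closed form matches the loop even when the product would not fit in an `int`.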
------------- PR Comment: https://git.openjdk.org/jdk/pull/18489#issuecomment-2413818288 From thartmann at openjdk.org Tue Oct 15 13:03:10 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Oct 2024 13:03:10 GMT Subject: RFR: 8341328: Refactor initial Assertion Predicate creation into separate classes In-Reply-To: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> References: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> Message-ID: <1hBYun5fdCgGojbidnasoaJ7r0qYYQOXu4pYaIOukqU=.26ecdf8f-e40e-4ae1-90ff-5ac52fc318c4@github.com> On Thu, 10 Oct 2024 09:04:21 GMT, Christian Hagedorn wrote: > This PR refactors the initial Assertion Predicate creation (i.e. when initially creating them, not when copy/copy-updating them from existing Template Assertion Predicates). > > The patch includes the following changes: > - `PhaseIdealLoop::add_template_assertion_predicate()`, `add_range_check_elimination_assertion_predicate()`and the preparation code to call it, and `clone_template_assertion_predicate()` have similar code. I tried to share the common bits with new classes: > - `TemplateAssertionPredicateCreator`: Creates a new Template Assertion Predicate either with an UCT (done in Loop Predication) or a Halt node (done in Range Check Elimination). > - `InitializedAssertionPredicateCreator`: Creates a new Initialized Assertion Predicate with a Halt Node. This is an existing class which provided a method to clone a Template Assertion Predicate expression and create a new Initialized Assertion Predicate with it. Now it's extended to create one without an existing template. 
> - `AssertionPredicateIfCreator`: Used by both classes above and also by `clone_template_assertion_predicate()` (it clones the Assertion Predicate expression first and then just needs to create the `If`) > - `AssertionPredicateExpressionCreator`: Create a new Assertion Predicate expression, either with an `Opaque4` (for Template Assertion Predicates) or an `OpaqueInitializedAssertionPredicate` (for Initialized Assertion Predicates). > - Some renaming to get more consistency (e.g. use `new_control` instead of `control` or `new_ctrl`) > - Adding new `AssertionPredicateType::FinalIv` which was missed to account for in [JDK-8335393](https://bugs.openjdk.org/browse/JDK-8335393) where a new Initialized Assertion Predicate was added for the final IV in Range Check Elimination for a special case when removing an empty main loop. > > Thanks, > Christian Difficult to review but looks good to me overall. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21446#pullrequestreview-2369272515 From chagedorn at openjdk.org Tue Oct 15 13:13:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Oct 2024 13:13:11 GMT Subject: RFR: 8341328: Refactor initial Assertion Predicate creation into separate classes In-Reply-To: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> References: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> Message-ID: On Thu, 10 Oct 2024 09:04:21 GMT, Christian Hagedorn wrote: > This PR refactors the initial Assertion Predicate creation (i.e. when initially creating them, not when copy/copy-updating them from existing Template Assertion Predicates). > > The patch includes the following changes: > - `PhaseIdealLoop::add_template_assertion_predicate()`, `add_range_check_elimination_assertion_predicate()`and the preparation code to call it, and `clone_template_assertion_predicate()` have similar code. 
I tried to share the common bits with new classes: > - `TemplateAssertionPredicateCreator`: Creates a new Template Assertion Predicate either with an UCT (done in Loop Predication) or a Halt node (done in Range Check Elimination). > - `InitializedAssertionPredicateCreator`: Creates a new Initialized Assertion Predicate with a Halt Node. This is an existing class which provided a method to clone a Template Assertion Predicate expression and create a new Initialized Assertion Predicate with it. Now it's extended to create one without an existing template. > - `AssertionPredicateIfCreator`: Used by both classes above and also by `clone_template_assertion_predicate()` (it clones the Assertion Predicate expression first and then just needs to create the `If`) > - `AssertionPredicateExpressionCreator`: Create a new Assertion Predicate expression, either with an `Opaque4` (for Template Assertion Predicates) or an `OpaqueInitializedAssertionPredicate` (for Initialized Assertion Predicates). > - Some renaming to get more consistency (e.g. use `new_control` instead of `control` or `new_ctrl`) > - Adding new `AssertionPredicateType::FinalIv` which was missed to account for in [JDK-8335393](https://bugs.openjdk.org/browse/JDK-8335393) where a new Initialized Assertion Predicate was added for the final IV in Range Check Elimination for a special case when removing an empty main loop. > > Thanks, > Christian Thanks Tobias for your review! I agree, it ended up more on the complex side than originally anticipated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21446#issuecomment-2413875570 From ihse at openjdk.org Tue Oct 15 13:53:11 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 15 Oct 2024 13:53:11 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: Message-ID: On Tue, 15 Oct 2024 12:16:28 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? 
Previously it was https://github.com/openjdk/jdk/pull/18605. >> This pr is based on https://github.com/openjdk/jdk/pull/20781. >> >> Thanks! >> >> ## Test >> ### tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> ### options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Tests >> jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector. It's good to have these tests in jdk main stream, I will do it in a separate pr later. (These tests are auto-generated from some script&template; it would be good to also have that script&template in jdk main stream, but it generates more tests than we need here, so it is better to add these tests and the script&template in another pr). >> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Performance data >> I have re-tested, there is not much difference from https://github.com/openjdk/jdk/pull/18605, so please check performance data in that pr. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Update make/autoconf/flags-cflags.m4 > > Co-authored-by: Magnus Ihse Bursie Build changes look fine. ------------- Marked as reviewed by ihse (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21502#pullrequestreview-2369422806 From enikitin at openjdk.org Tue Oct 15 14:11:23 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Tue, 15 Oct 2024 14:11:23 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v24] In-Reply-To: References: Message-ID: <49ybhAOwbGMbp4G0gdR9cj14L20sWrSLrSIrxKmzfsw=.7eceb93c-be34-428c-b531-a3ce592bcb9a@github.com> On Mon, 14 Oct 2024 12:33:44 GMT, Emanuel Peter wrote: >> **Motivation** >> >> I want to write small dedicated fuzzers: >> - Generate `java` and `jasm` source code: just some `String`. >> - Quickly compile it (with this framework). >> - Execute the compiled code. >> >> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. >> >> **The CompileFramework** >> >> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. >> An important factor: Integration with the IR-Framwrork (`TestFramework`): we want to be able to generate IR-rules for our tests. >> >> I implemented a first, simple version of the framework. 
I added some tests and examples. >> >> **Example** >> >> >> CompileFramework comp = new CompileFramework(); >> comp.add(SourceCode.newJavaSourceCode("XYZ", "")); >> comp.compile(); >> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); >> >> >> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 >> >> **Below some use cases: tests that would have been better with the CompileFramework** >> >> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** >> >> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. >> >> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, >> >> to be able to chose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something i... > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/hotspot/jtreg/compiler/lib/compile_framework/README.md > > Co-authored-by: Christian Hagedorn > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn test/hotspot/jtreg/compiler/lib/compile_framework/Compile.java line 65: > 63: List command = new ArrayList<>(); > 64: > 65: command.add("%s/bin/javac".formatted(System.getProperty("compile.jdk"))); 1. Use ```jdk.test.lib.JDKToolFinder.getJDKTool("javac");``` ? 2. Store in a static variable once during initialization? To not request properties / call format string parsing every time? test/hotspot/jtreg/compiler/lib/compile_framework/Compile.java line 101: > 99: List command = new ArrayList<>(); > 100: > 101: command.add("%s/bin/java".formatted(System.getProperty("compile.jdk"))); 1. Use ```jdk.test.lib.JDKToolFinder.getJDKTool("java");``` ? 2. 
Store in a static variable once during initialization? To not request properties / call format string parsing every time? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1801209700 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1801210692 From psandoz at openjdk.org Tue Oct 15 16:06:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 15 Oct 2024 16:06:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 15 Oct 2024 09:35:23 GMT, Emanuel Peter wrote: > I gave it a quick scan, and I have no further comments. LGTM. Thank you, I will kick off an internal test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2414431367 From epeter at openjdk.org Tue Oct 15 16:09:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Oct 2024 16:09:15 GMT Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double [v2] In-Reply-To: References: Message-ID: On Tue, 15 Oct 2024 06:42:51 GMT, Amit Kumar wrote: >> This is a trivial PR to change the data type of some "*Threshold" variables from `intx` to `double`. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > updates tier2 threshold datatype Instead of changing the `product` flags (is a CSR needed for that?), you could also just cast to `double` at every use site. Would that also work? src/hotspot/share/opto/bytecodeInfo.cpp line 316: > 314: int call_site_count = caller_method->scale_count(profile.count()); > 315: int invoke_count = caller_method->interpreter_invocation_count(); > 316: assert(invoke_count >= 0, "require invocation count greater than zero"); Technically, the comment is now wrong.
It is no longer "greater than" but "greater than or equal to zero". Is that intended? Otherwise you should use `>`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21354#issuecomment-2414437954 PR Review Comment: https://git.openjdk.org/jdk/pull/21354#discussion_r1801504146 From epeter at openjdk.org Tue Oct 15 16:32:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Oct 2024 16:32:15 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: On Sun, 6 Oct 2024 08:32:20 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. >> >> Regarding the related issues: >> >> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. >> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` >> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. >> >> Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. >> >> Please take a look and leave reviews. Thanks a lot. >> >> The description of the original PR: >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> Redundant expansions in `rearrange` operations. 
On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byt... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > [vectorapi] Refactor VectorShuffle implementation src/hotspot/cpu/x86/x86.ad line 2172: > 2170: > 2171: // Return true if Vector::rearrange needs preparation of the shuffle argument > 2172: bool Matcher::vector_needs_load_shuffle(BasicType elem_bt, int vlen) { I think the name needs to be more expressive. If I read it alone, then I would think that it is about all kinds of vectors ... and it is confusing because what is a "load shuffle"? Are we shuffling loads or loading shuffles? src/hotspot/share/opto/vectornode.hpp line 1618: > 1616: public: > 1617: VectorLoadShuffleNode(Node* in, const TypeVect* vt) > 1618: : VectorNode(in, vt) {} Can you add a comment above "class VectorLoadShuffleNode" to say what its semantics are? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1801531980 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1801536233 From qamai at openjdk.org Tue Oct 15 16:33:20 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 15 Oct 2024 16:33:20 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR pallets. >> >> >> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. 
It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. >> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimization:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 > - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction Personally, I think this optimization is not essential, so we should proceed with introducing lowering first, then add this transformation to that phase, instead of trying to integrate this transformation then refactor it into phase lowering, which seems like a net extra step. 
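For reference, the arithmetic identity the matched patterns rely on can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names are hypothetical and do not come from the patch, and `unsignedMul32To64` is a scalar model of one 64-bit lane of an unsigned 32x32 -> 64 bit multiply, not the matcher code.

```java
import java.math.BigInteger;

// Hypothetical illustration; none of these names come from the patch.
class MulLowDwordSketch {
    static final long LOW_MASK = 0xFFFFFFFFL;

    // Scalar model of one 64-bit lane of an unsigned 32x32 -> 64 bit multiply:
    // only the low doubleword of each input participates.
    static long unsignedMul32To64(long a, long b) {
        BigInteger lo1 = BigInteger.valueOf(a & LOW_MASK);
        BigInteger lo2 = BigInteger.valueOf(b & LOW_MASK);
        // The product is below 2^64, so its low 64 bits are the exact result.
        return lo1.multiply(lo2).longValue();
    }

    // MulL (And SRC1, 0xFFFFFFFF) (And SRC2, 0xFFFFFFFF)
    static long mulAndAnd(long a, long b) {
        return (a & LOW_MASK) * (b & LOW_MASK);
    }

    // MulL (URShift SRC1, 32) (URShift SRC2, 32)
    static long mulShiftShift(long a, long b) {
        return (a >>> 32) * (b >>> 32);
    }

    public static void main(String[] args) {
        long a = 0xDEADBEEFCAFEBABEL;
        long b = 0x123456789ABCDEF0L;
        // Both full 64-bit multiplies agree with the 32x32 -> 64 bit model.
        System.out.println(mulAndAnd(a, b) == unsignedMul32To64(a, b));
        System.out.println(mulShiftShift(a, b) == unsignedMul32To64(a >>> 32, b >>> 32));
    }
}
```

Because both operands have their upper 32 bits cleared, the full 64-bit `MulL` and the doubleword multiply produce the same lane value, which is why the pattern can be matched either directly in the backend or in a separate lowering phase without changing results.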
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2414491182 From psandoz at openjdk.org Tue Oct 15 16:42:18 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 15 Oct 2024 16:42:18 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v23] In-Reply-To: References: <0LkBvvNPq5jmWfOdjItIXGedRDtpiivJM06BAx7vP0I=.c5417544-edef-4623-beaa-08cd7c565361@github.com> Message-ID: On Thu, 10 Oct 2024 16:24:35 GMT, Jatin Bhateja wrote: > Hi @vnkozlov , Can you kindly run this through your test infrastructure. We have two review approvals for Java and x86 backend code. I have kicked off some internal tests (FYI @vnkozlov) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2414510216 From jkarthikeyan at openjdk.org Tue Oct 15 17:03:15 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 15 Oct 2024 17:03:15 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR pallets. >> >> >> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. 
Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. >> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimization:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 > - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction I'm pretty ambivalent, I think implementing it either way would be alright. 
Especially with unit tests, I think the lowering implementation wouldn't be that difficult. Maybe another reviewer has an opinion? About PhaseLowering though, I've found some more interesting things we could do with it, especially with improving vectorization support in the backend. @merykitty have you already started to work on it? I was thinking about prototyping it soon. Just wanted to make sure we're not doing the same work twice :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2414553899 From epeter at openjdk.org Tue Oct 15 17:05:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Oct 2024 17:05:40 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v25] In-Reply-To: References: Message-ID: <6wHlTOx8Nc4wPZg0L7XSvTjWX_WTET45i_YsvV2ByKY=.ce572319-e1ca-47e8-9be5-95c823fb32e4@github.com> > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. 
At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. > > **The CompileFramework** > > Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. > An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. > > I implemented a first, simple version of the framework. I added some tests and examples. > > **Example** > > > CompileFramework comp = new CompileFramework(); > comp.add(SourceCode.newJavaSourceCode("XYZ", "")); > comp.compile(); > comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); > > > https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 > > **Below some use cases: tests that would have been better with the CompileFramework** > > **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** > > I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. > > For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, > > to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: use JDKToolFinder for Evgeny ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/4eeab363..d50b6e1a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=23-24 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Tue Oct 15 17:10:19 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Oct 2024 17:10:19 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v25] In-Reply-To: <6wHlTOx8Nc4wPZg0L7XSvTjWX_WTET45i_YsvV2ByKY=.ce572319-e1ca-47e8-9be5-95c823fb32e4@github.com> References: <6wHlTOx8Nc4wPZg0L7XSvTjWX_WTET45i_YsvV2ByKY=.ce572319-e1ca-47e8-9be5-95c823fb32e4@github.com> Message-ID: <7Qlg20-QukNORu89brmvzlj6IyyOIf8taAfUHQF5Ve4=.33d85c10-749a-4226-83ef-e0ee35d79a60@github.com> On Tue, 15 Oct 2024 17:05:40 GMT, Emanuel Peter wrote: >> **Motivation** >> >> I want to write small dedicated fuzzers: >> - Generate `java` and `jasm` source code: just some `String`. >> - Quickly compile it (with this framework). >> - Execute the compiled code. >> >> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? 
It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. >> >> **The CompileFramework** >> >> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. >> An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. >> >> I implemented a first, simple version of the framework. I added some tests and examples. >> >> **Example** >> >> >> CompileFramework comp = new CompileFramework(); >> comp.add(SourceCode.newJavaSourceCode("XYZ", "")); >> comp.compile(); >> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); >> >> >> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 >> >> **Below some use cases: tests that would have been better with the CompileFramework** >> >> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** >> >> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. >> >> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, >> >> to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something i...
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > use JDKToolFinder for Evgeny @lepestock thanks for the hint! I applied your suggestion :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20184#issuecomment-2414567493 From qamai at openjdk.org Tue Oct 15 17:29:12 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 15 Oct 2024 17:29:12 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Tue, 15 Oct 2024 17:00:26 GMT, Jasmine Karthikeyan wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 >> - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction > > I'm pretty ambivalent, I think implementing it either way would be alright. Especially with unit tests, I think the lowering implementation wouldn't be that difficult. Maybe another reviewer has an opinion? > > About PhaseLowering though, I've found some more interesting things we could do with it, especially with improving vectorization support in the backend. @merykitty have you already started to work on it? I was thinking about prototyping it soon. Just wanted to make sure we're not doing the same work twice :) @jaskarth Please proceed with it, I have a really simple prototype for it but I don't have any plan to proceed further soon. 
Thanks a lot :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2414605470 From kvn at openjdk.org Tue Oct 15 17:36:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Oct 2024 17:36:12 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v6] In-Reply-To: References: Message-ID: <02XI0hQTUSx-TDvEN78_ZYXqES3q9hXXLQ8gqJINUNs=.2220892b-aa87-4247-a749-253288a33996@github.com> On Mon, 14 Oct 2024 13:42:45 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> refine comments + typo > > Thanks for the source code. That's really interesting, running the benchmark multiple times may give different results, and even when there is a difference in the observed throughputs, the 2 compiled methods are exactly the same. So I think we are running into different quirks here, probably due to the fact that this benchmark saturates the memory bandwidth. @merykitty can you run this with regular Java benchmarks (SPECjvm, SPECjbb, Renaissance, DaCapo) to see if they are affected? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2414619344 From kvn at openjdk.org Tue Oct 15 17:55:14 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Oct 2024 17:55:14 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v6] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 13:42:45 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> refine comments + typo > > Thanks for the source code. That's really interesting, running the benchmark multiple times may give different results, and even when there is a difference in the observed throughputs, the 2 compiled methods are exactly the same. 
So I think we are running into different quirks here, probably due to the fact that this benchmark saturates the memory bandwidth. > @merykitty can you run this with regular Java benchmarks (SPECjvm, SPECjbb, Renaissance, DaCapo) to see if they are affected? We will also run our set of benchmarks to make sure there is no regression. If we see a significant regression only in some benchmarks and an improvement in others, we can set `LoopAwareSpilling` to false in these changes and address the regression in a following PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2414657865 From duke at openjdk.org Tue Oct 15 18:53:45 2024 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 15 Oct 2024 18:53:45 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn [v2] In-Reply-To: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> Message-ID: > [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) > > Crash occurs in C1 during OSR when copying locks from interpreter frame to compiled frame. All loads used an immediate offset regardless of the offset size, causing a crash when the offset is over the max size for the instruction (32760). Fix is to check the size before performing the load and to store the offset in a register if needed. > > I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction.
The instruction is only updated when the `offset_ok_for_immed` check fails, which would cause the crash anyway. > > Confirmed that the added test fails before the patch and passes after. Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: - Add blank line at end of test - Add jasm and update test description ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21473/files - new: https://git.openjdk.org/jdk/pull/21473/files/a6d2f814..51298397 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21473&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21473&range=00-01 Stats: 278 lines in 4 files changed: 235 ins; 43 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21473.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21473/head:pull/21473 PR: https://git.openjdk.org/jdk/pull/21473 From aph at openjdk.org Tue Oct 15 19:35:18 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 15 Oct 2024 19:35:18 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn [v2] In-Reply-To: References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> Message-ID: On Tue, 15 Oct 2024 18:53:45 GMT, Chad Rakoczy wrote: >> [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) >> >> The crash occurs in C1 during OSR when copying locks from the interpreter frame to the compiled frame. All loads used an immediate offset regardless of the offset size, causing a crash when the offset exceeds the maximum for the instruction (32760). The fix is to check the size before performing the load and to store the offset in a register if needed. >> >> I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction.
The instruction is only updated when the `offset_ok_for_immed` check fails, which would cause the crash anyway. >> >> Confirmed that the added test fails before the patch and passes after. > > Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: > > - Add blank line at end of test > - Add jasm and update test description One thing for you to think about, if you are interested in some further work in this area: this is a generic problem. It might be very beneficial to look for every base + immediate offset instruction, see if there is a possibility that there may be an overflow, and insert a `form_address()`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21473#issuecomment-2414846924 From kvn at openjdk.org Tue Oct 15 19:39:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Oct 2024 19:39:11 GMT Subject: RFR: 8341328: Refactor initial Assertion Predicate creation into separate classes In-Reply-To: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> References: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> Message-ID: On Thu, 10 Oct 2024 09:04:21 GMT, Christian Hagedorn wrote: > This PR refactors the initial Assertion Predicate creation (i.e. when initially creating them, not when copy/copy-updating them from existing Template Assertion Predicates). > > The patch includes the following changes: > - `PhaseIdealLoop::add_template_assertion_predicate()`, `add_range_check_elimination_assertion_predicate()` and the preparation code to call it, and `clone_template_assertion_predicate()` have similar code. I tried to share the common bits with new classes: > - `TemplateAssertionPredicateCreator`: Creates a new Template Assertion Predicate either with an UCT (done in Loop Predication) or a Halt node (done in Range Check Elimination).
> - `InitializedAssertionPredicateCreator`: Creates a new Initialized Assertion Predicate with a Halt Node. This is an existing class which provided a method to clone a Template Assertion Predicate expression and create a new Initialized Assertion Predicate with it. Now it's extended to create one without an existing template. > - `AssertionPredicateIfCreator`: Used by both classes above and also by `clone_template_assertion_predicate()` (it clones the Assertion Predicate expression first and then just needs to create the `If`) > - `AssertionPredicateExpressionCreator`: Create a new Assertion Predicate expression, either with an `Opaque4` (for Template Assertion Predicates) or an `OpaqueInitializedAssertionPredicate` (for Initialized Assertion Predicates). > - Some renaming to get more consistency (e.g. use `new_control` instead of `control` or `new_ctrl`) > - Adding new `AssertionPredicateType::FinalIv` which was missed to account for in [JDK-8335393](https://bugs.openjdk.org/browse/JDK-8335393) where a new Initialized Assertion Predicate was added for the final IV in Range Check Elimination for a special case when removing an empty main loop. > > Thanks, > Christian Good refactoring. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21446#pullrequestreview-2370392250 From psandoz at openjdk.org Tue Oct 15 19:43:24 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 15 Oct 2024 19:43:24 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v24] In-Reply-To: References: Message-ID: On Sun, 13 Oct 2024 09:57:00 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . 
UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs, there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update adlc changes. The compiler test `test/hotspot/jtreg/compiler/vectorapi/VectorCompareWithZeroTest.java` fails to compile and needs to be updated to use the renamed constants (`UNSIGNED_GT` -> `UGT` and `UNSIGNED_GE` -> `UGE`). This test is only compiled and run on aarch64.
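For readers following the quoted PR description, the saturating and unsigned semantics it lists (SADD, SSUB, SUADD, SUSUB, UMAX, UMIN) can be sketched in scalar Java for `int` lanes. This is only an illustration: the class and method names below are hypothetical and are not the scalar APIs added by the PR, and the sketch says nothing about how the vector operators are actually implemented.

```java
// Hypothetical scalar sketch of saturating/unsigned int operations.
// Names are illustrative only; they are NOT the APIs added by the PR.
final class SaturatingOpsSketch {

    // Signed saturating add: clamp the exact sum to [Integer.MIN_VALUE, Integer.MAX_VALUE].
    static int saturatingAdd(int a, int b) {
        long sum = (long) a + (long) b; // exact, cannot overflow a long
        if (sum > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (sum < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) sum;
    }

    // Signed saturating subtract: same clamping applied to the exact difference.
    static int saturatingSub(int a, int b) {
        long diff = (long) a - (long) b;
        if (diff > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (diff < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) diff;
    }

    // Unsigned saturating add: the wrapped sum is smaller (unsigned) than an
    // operand exactly when the addition overflowed; clamp to 0xFFFFFFFF then.
    static int saturatingUnsignedAdd(int a, int b) {
        int sum = a + b;
        return Integer.compareUnsigned(sum, a) < 0 ? 0xFFFFFFFF : sum;
    }

    // Unsigned saturating subtract: clamp to 0 when b is larger (unsigned) than a.
    static int saturatingUnsignedSub(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    // Unsigned max/min: compare the bit patterns as unsigned 32-bit values.
    static int unsignedMax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    static int unsignedMin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(saturatingAdd(Integer.MAX_VALUE, 1));
        System.out.println(Integer.toUnsignedString(saturatingUnsignedAdd(-1, 1)));
    }
}
```

The unsigned variants reuse the signed `int` bit pattern, which is why `-1` stands in for the largest unsigned value.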
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2414859793 From kvn at openjdk.org Tue Oct 15 19:55:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Oct 2024 19:55:13 GMT Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double [v2] In-Reply-To: References: Message-ID: On Tue, 15 Oct 2024 16:06:04 GMT, Emanuel Peter wrote: > Instead of changing the `product` flags (is a CSR needed for that?), you could also just cast to `double` at every use site. Would that also work? Yes, we need a CSR for these changes as they are now. Using a cast or assigning to a local variable is preferable, I agree. > src/hotspot/share/opto/bytecodeInfo.cpp line 316: > >> 314: int call_site_count = caller_method->scale_count(profile.count()); >> 315: int invoke_count = caller_method->interpreter_invocation_count(); >> 316: assert(invoke_count >= 0, "require invocation count greater than zero"); > > Technically, the comment is now wrong. It is no longer "greater than" but "greater than or equal to zero". Is that intended? Otherwise you should use `>`. Actually it should be `>` because we divide by it in the next line.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21354#issuecomment-2414880608 PR Review Comment: https://git.openjdk.org/jdk/pull/21354#discussion_r1801823705 From chagedorn at openjdk.org Tue Oct 15 20:47:12 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Oct 2024 20:47:12 GMT Subject: RFR: 8341328: Refactor initial Assertion Predicate creation into separate classes In-Reply-To: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> References: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> Message-ID: <0R2dApC5dSwIjJozNvYCW46NIzArP6ZGLKVrr5Zn4XI=.edcf29b9-0693-47c3-af89-07e9b6d364aa@github.com> On Thu, 10 Oct 2024 09:04:21 GMT, Christian Hagedorn wrote: > This PR refactors the initial Assertion Predicate creation (i.e. when initially creating them, not when copy/copy-updating them from existing Template Assertion Predicates). > > The patch includes the following changes: > - `PhaseIdealLoop::add_template_assertion_predicate()`, `add_range_check_elimination_assertion_predicate()`and the preparation code to call it, and `clone_template_assertion_predicate()` have similar code. I tried to share the common bits with new classes: > - `TemplateAssertionPredicateCreator`: Creates a new Template Assertion Predicate either with an UCT (done in Loop Predication) or a Halt node (done in Range Check Elimination). > - `InitializedAssertionPredicateCreator`: Creates a new Initialized Assertion Predicate with a Halt Node. This is an existing class which provided a method to clone a Template Assertion Predicate expression and create a new Initialized Assertion Predicate with it. Now it's extended to create one without an existing template. 
> - `AssertionPredicateIfCreator`: Used by both classes above and also by `clone_template_assertion_predicate()` (it clones the Assertion Predicate expression first and then just needs to create the `If`) > - `AssertionPredicateExpressionCreator`: Create a new Assertion Predicate expression, either with an `Opaque4` (for Template Assertion Predicates) or an `OpaqueInitializedAssertionPredicate` (for Initialized Assertion Predicates). > - Some renaming to get more consistency (e.g. use `new_control` instead of `control` or `new_ctrl`) > - Adding new `AssertionPredicateType::FinalIv` which was missed to account for in [JDK-8335393](https://bugs.openjdk.org/browse/JDK-8335393) where a new Initialized Assertion Predicate was added for the final IV in Range Check Elimination for a special case when removing an empty main loop. > > Thanks, > Christian Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21446#issuecomment-2415048929 From psandoz at openjdk.org Tue Oct 15 21:00:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 15 Oct 2024 21:00:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 15 Oct 2024 16:03:13 GMT, Paul Sandoz wrote: > > I gave it a quick scan, and I have no further comments. LGTM. > > Thank you, i will kick off an internal test. Tier 1 to 3 tests pass. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2415121395 From psandoz at openjdk.org Tue Oct 15 21:00:25 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 15 Oct 2024 21:00:25 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v23] In-Reply-To: References: <0LkBvvNPq5jmWfOdjItIXGedRDtpiivJM06BAx7vP0I=.c5417544-edef-4623-beaa-08cd7c565361@github.com> Message-ID: On Tue, 15 Oct 2024 16:39:57 GMT, Paul Sandoz wrote: > > Hi @vnkozlov , Can you kindly run this through your test infrastructure. We have two review approvals for Java and x86 backend code. > > I have kicked off some internal tests (FYI @vnkozlov) Tier 1 to 3 tests pass, except for the trivial source compilation error previously mentioned. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2415124207 From psandoz at openjdk.org Tue Oct 15 21:40:20 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 15 Oct 2024 21:40:20 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v24] In-Reply-To: <704HcEgAdeR1380vEK4z0bG0KiJ1kjRVSBCa9EFrt0w=.bee85693-033c-4d85-9f89-3e186ca3c2fb@github.com> References: <704HcEgAdeR1380vEK4z0bG0KiJ1kjRVSBCa9EFrt0w=.bee85693-033c-4d85-9f89-3e186ca3c2fb@github.com> Message-ID: On Tue, 15 Oct 2024 10:19:46 GMT, Emanuel Peter wrote: > Are there any IR rules that verify that the correct C2 nodes are used? Is that a thing you generally do with the VectorAPI, just to make sure things get correctly intrinsified? Not systematically. We have some IR testing for more complex areas, located under `test/hotspot/jtreg/compiler/vectorapi/`. When we started out testing there was no IR testing framework, so we relied on classic unit tests running a test N times for C2 to kick in. That is still the case for the majority of tests.
It would be nice to have a better balance, and a way to systematically generate IR tests for the various vector operations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2415212261 From dlong at openjdk.org Wed Oct 16 01:33:30 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 16 Oct 2024 01:33:30 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v6] In-Reply-To: References: Message-ID: <-JXW7rxwUFheUwXdmlnVo_MhlJDct8NlANLLBE4Triw=.1227da53-0ff8-4ab3-ae6a-33f1ed904755@github.com> > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). > > An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. 
Dean Long has updated the pull request incrementally with one additional commit since the last revision: bail out on old methods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21148/files - new: https://git.openjdk.org/jdk/pull/21148/files/2c7fc099..701373f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=04-05 Stats: 24 lines in 7 files changed: 15 ins; 2 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/21148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21148/head:pull/21148 PR: https://git.openjdk.org/jdk/pull/21148 From sviswanathan at openjdk.org Wed Oct 16 01:39:50 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 16 Oct 2024 01:39:50 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v3] In-Reply-To: References: Message-ID: > When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. > Also a regression test case is added accordingly. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Run test on all platforms ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21480/files - new: https://git.openjdk.org/jdk/pull/21480/files/ed299327..f2981374 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21480&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21480&range=01-02 Stats: 11 lines in 2 files changed: 5 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21480.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21480/head:pull/21480 PR: https://git.openjdk.org/jdk/pull/21480 From sviswanathan at openjdk.org Wed Oct 16 01:39:50 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 16 Oct 2024 01:39:50 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2] In-Reply-To: References: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> Message-ID: On Tue, 15 Oct 2024 07:02:00 GMT, Emanuel Peter wrote: > Thanks for the updates! It looks good to me now. > > I have one more wish: Could you allow to run the test on all platforms please? `test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java` > > Currently, it only runs on selected platforms, see `@requires`. We should really run this on all platforms and compilers. The IR rules can be limited to platforms and features. That ensures that we do not have similar bugs elsewhere, and our test-coverage would be increased. @eme64 I have attempted to update the test accordingly. Please take a look. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21480#issuecomment-2415546350 From sviswanathan at openjdk.org Wed Oct 16 01:39:51 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 16 Oct 2024 01:39:51 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2] In-Reply-To: References: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> Message-ID: <-5pIV32xfg2DMexItjaDQWkkTK4FVIfbB7G73LKFoxA=.1a7c8ada-6641-486f-ba83-6b9f5b5eb7ec@github.com> On Tue, 15 Oct 2024 08:08:59 GMT, Jatin Bhateja wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test case > > src/hotspot/cpu/x86/x86.ad line 3679: > >> 3677: >> 3678: instruct vconvF2HF_mem_reg(memory mem, vec src) %{ >> 3679: predicate(Matcher::vector_length_in_bytes(n->in(3)->in(1)) >= 16); > > You can add an elegant predicate check like the following instead of accessing bare inputs. > > n->as_StoreVector()->memory_size() >= 16. We have used bare inputs at many places in the ad file in the predicate. > test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java line 110: > >> 108: } >> 109: >> 110: // Verifying the result > > Since we are using the IR framework, we can leverage the existing [@Check](https://github.com/openjdk/jdk/blob/521effe017b9b6322036f1851220056a637d6b1c/test/hotspot/jtreg/compiler/lib/ir_framework/Check.java#L32) annotation for verification, which works in conjunction with the @Test method; it will automatically invoke validation after test method execution. We may need a little refactoring for this. The added test follows the verification mechanism used already in the test. I would prefer not to get into refactoring.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1802214527 PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1802214007 From dlong at openjdk.org Wed Oct 16 01:44:42 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 16 Oct 2024 01:44:42 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v7] In-Reply-To: References: Message-ID: > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). > > An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. Dean Long has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: - remove blank line - Merge master - bail out on old methods - redo VM state - fix errors - make sure to be in VM state when checking is_old - simplification based on reviewer comments - rename and restrict usage ------------- Changes: https://git.openjdk.org/jdk/pull/21148/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=06 Stats: 62 lines in 9 files changed: 34 ins; 20 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/21148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21148/head:pull/21148 PR: https://git.openjdk.org/jdk/pull/21148 From dlong at openjdk.org Wed Oct 16 01:49:16 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 16 Oct 2024 01:49:16 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v7] In-Reply-To: References: Message-ID: <8dgg6gjL38JLjdMSorUpsW6c8YEsaNmFV3vnAh1zqyQ=.4006ed15-8ebe-4d83-a568-cf33b7d8a23b@github.com> On Wed, 16 Oct 2024 01:44:42 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > Dean Long has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - remove blank line > - Merge master > - bail out on old methods > - redo VM state > - fix errors > - make sure to be in VM state when checking is_old > - simplification based on reviewer comments > - rename and restrict usage Added bailouts. 
Because we record failure in the CI layer, C1 and C2 need to check for failure there. C2 already did that, but C1 did not. For C1 I decided to delegate all failure recording to the CI layer. This is a step towards implementing JDK-8132354, and avoids string copying issues like those in JDK-8325095. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2415558355 From jbhateja at openjdk.org Wed Oct 16 01:57:20 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Oct 2024 01:57:20 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2] In-Reply-To: <-5pIV32xfg2DMexItjaDQWkkTK4FVIfbB7G73LKFoxA=.1a7c8ada-6641-486f-ba83-6b9f5b5eb7ec@github.com> References: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> <-5pIV32xfg2DMexItjaDQWkkTK4FVIfbB7G73LKFoxA=.1a7c8ada-6641-486f-ba83-6b9f5b5eb7ec@github.com> Message-ID: On Wed, 16 Oct 2024 01:36:49 GMT, Sandhya Viswanathan wrote: >> src/hotspot/cpu/x86/x86.ad line 3679: >> >>> 3677: >>> 3678: instruct vconvF2HF_mem_reg(memory mem, vec src) %{ >>> 3679: predicate(Matcher::vector_length_in_bytes(n->in(3)->in(1)) >= 16); >> >> You can add an elegant predicate check like the following instead of accessing bare inputs. >> >> n->as_StoreVector()->memory_size() >= 16. > > We have used bare inputs at many places in the ad file in the predicate. I think it's OK to use the safe cast if it's available, at least for newly added code.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1802223433 From jbhateja at openjdk.org Wed Oct 16 01:59:18 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Oct 2024 01:59:18 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 15 Oct 2024 20:57:05 GMT, Paul Sandoz wrote: >>> I gave it a quick scan, and I have no further comments. LGTM. >> >> Thank you, i will kick off an internal test. > >> > I gave it a quick scan, and I have no further comments. LGTM. >> >> Thank you, i will kick off an internal test. > > Tier 1 to 3 tests pass. Thanks @PaulSandoz , @sviswa7 and @eme64 for review suggestions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2415566209 From dlong at openjdk.org Wed Oct 16 02:13:13 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 16 Oct 2024 02:13:13 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching [v2] In-Reply-To: <5-ppux2sUtARhvrnb09TdA53oUPC8yX1j020dmfWw00=.e76e4077-c448-470c-a1c9-a2f43906e754@github.com> References: <5-ppux2sUtARhvrnb09TdA53oUPC8yX1j020dmfWw00=.e76e4077-c448-470c-a1c9-a2f43906e754@github.com> Message-ID: On Tue, 15 Oct 2024 09:17:50 GMT, Tobias Hartmann wrote: >> C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. 
>> >> Now the problem is that there can be multiple patch sites in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 >> >> While another `Thread B` is done with patching another patch site and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC. `Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 >> >> In short: >> - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. >> - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of a `mov` in the patch body because the store is not atomic. >> - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. >> >> Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. >> I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity.
>> >> The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 >> >> I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) t... > > Tobias Hartmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Extend the Patching_lock instead > - Merge branch 'master' into 8340313 > - Extending patching lock > - Increased timeout > - Removed platform specific asserts from shared code > - 8340313: Crash due to invalid oop in nmethod after C1 patching Yes, please retire Patching_lock. It doesn't matter to me if it's part of this PR or a follow-up. 
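As a side note on the alignment idea from the quoted description (the later commits in this PR extend the `Patching_lock` instead), the arithmetic behind "align the immediate so the 8-byte store cannot be torn" can be sketched as follows. This is a hedged illustration only: the class, method names, and the example addresses are hypothetical, and the immediate offset of 2 assumes the 10-byte x86_64 `mov r64, imm64` form (prefix byte, opcode byte, 8-byte immediate); it is not HotSpot's `PatchingStub::emit_code` logic.

```java
// Hypothetical sketch: how much padding makes an 8-byte immediate naturally aligned.
// Not HotSpot code; names, offsets, and addresses are illustrative assumptions.
final class ImmediateAlignmentSketch {

    static final int OOP_IMMEDIATE_SIZE = 8; // bytes in a 64-bit oop immediate

    // True if an 8-byte value stored at this address sits in one aligned 8-byte
    // slot, so it cannot straddle a cache-line or page boundary.
    static boolean isNaturallyAligned(long address) {
        return (address & (OOP_IMMEDIATE_SIZE - 1)) == 0;
    }

    // Number of padding bytes to emit before an instruction starting at
    // instructionStart so that its immediate, located immediateOffset bytes
    // into the instruction, ends up 8-byte aligned.
    static int paddingFor(long instructionStart, int immediateOffset) {
        long immediateAddress = instructionStart + immediateOffset;
        long misalignment = immediateAddress & (OOP_IMMEDIATE_SIZE - 1);
        return (int) ((OOP_IMMEDIATE_SIZE - misalignment) & (OOP_IMMEDIATE_SIZE - 1));
    }

    public static void main(String[] args) {
        long start = 0x1000L;     // made-up code address
        int immediateOffset = 2;  // assumed: prefix + opcode precede the imm64
        int pad = paddingFor(start, immediateOffset);
        System.out.println("padding bytes: " + pad);
        System.out.println("aligned after padding: "
                + isNaturallyAligned(start + pad + immediateOffset));
    }
}
```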
------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2415581949 From jbhateja at openjdk.org Wed Oct 16 02:25:11 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Oct 2024 02:25:11 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2] In-Reply-To: <-5pIV32xfg2DMexItjaDQWkkTK4FVIfbB7G73LKFoxA=.1a7c8ada-6641-486f-ba83-6b9f5b5eb7ec@github.com> References: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> <-5pIV32xfg2DMexItjaDQWkkTK4FVIfbB7G73LKFoxA=.1a7c8ada-6641-486f-ba83-6b9f5b5eb7ec@github.com> Message-ID: On Wed, 16 Oct 2024 01:35:48 GMT, Sandhya Viswanathan wrote: >> test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java line 110: >> >>> 108: } >>> 109: >>> 110: // Verifying the result >> >> Since we are using IR framework, we can leverage existing[ @Check](https://github.com/openjdk/jdk/blob/521effe017b9b6322036f1851220056a637d6b1c/test/hotspot/jtreg/compiler/lib/ir_framework/Check.java#L32) annotation for verification which works in conjunction with @Test method, it will automatically invoke validation after test method execution. We may need little refactoring for this. > > The added test follows the verification mechanism used already in the test. I would prefer not to get into refactoring. Okay, we can do it separately. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1802242680 From thartmann at openjdk.org Wed Oct 16 05:15:14 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 16 Oct 2024 05:15:14 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v21] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> <1R6tWNQF3WSJ4joPaoWR3N_JKy0W-BB0Zntg00N-mlU=.e8b25703-93ea-42be-ba1d-46e06c17987c@github.com> Message-ID: On Wed, 9 Oct 2024 18:18:21 GMT, Kangcheng Xu wrote: >> `compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java` times out in our testing both with `-XX:StressLongCountedLoop=200000000` and with `-XX:+UnlockExperimentalVMOptions -XX:PerMethodSpecTrapLimit=0 -XX:PerMethodTrapLimit=0`: >> >> >> "main" #1 [2771172] prio=5 os_prio=0 cpu=500187.70ms elapsed=503.08s allocated=6554K defined_classes=227 tid=0x0000ffff9002d550 nid=2771172 runnable [0x0000ffff972bf000] >> java.lang.Thread.State: RUNNABLE >> Thread: 0x0000ffff9002d550 [0x2a48e4] State: _at_safepoint _at_poll_safepoint 1 >> JavaThread state: _thread_blocked >> at compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.testIntCountedLoopWithIntIVLeq(TestParallelIvInIntCountedLoop.java:93) >> at compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.runTestIntCountedLoopWithIntIVLeq(TestParallelIvInIntCountedLoop.java:103) >> at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base at 24-internal/DirectMethodHandle$Holder) >> at java.lang.invoke.LambdaForm$MH/0x0000ffff58460870.invoke(java.base at 24-internal/LambdaForm$MH) >> at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base at 24-internal/Invokers$Holder) >> at jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(java.base at 24-internal/DirectMethodHandleAccessor.java:154) >> at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(java.base at 
24-internal/DirectMethodHandleAccessor.java:104) >> at java.lang.reflect.Method.invoke(java.base at 24-internal/Method.java:573) >> at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:159) >> at compiler.lib.ir_framework.test.AbstractTest.run(AbstractTest.java:98) >> at compiler.lib.ir_framework.test.CustomRunTest.run(CustomRunTest.java:89) >> at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:861) >> at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:252) >> at compiler.lib.ir_framework.test.TestVM.main(TestVM.java:165) > > @TobiHartmann, thanks for the feedback! I did some investigation; the timeouts have three causes: > > 1. Loops with `i <= stop` are not counted loops in the first place, so those tests should be removed: > > Now I remember why I originally didn't test for it. Consider `for (int i = 0; i <= stop; i++);` when `stop = Integer.MAX_VALUE`. Overflow in Java is well-defined, which means the code must loop indefinitely and optimizations of any kind can't break this. Therefore, loops with `<=` are not counted loops to begin with. `@IR(failOn = {IRNode.COUNTED_LOOP})` doesn't fail either. I removed these test cases. > > 2. It is normal to time out with `-XX:StressLongCountedLoop=200000000` for all test cases: > > A value other than `0` for this flag forcibly converts int counted loops to long counted loops, for which C2 doesn't do parallel IV at this point. This is the same issue as [JDK-8294839](https://bugs.openjdk.org/browse/JDK-8294838). Loops are still loops. For a large random `stop` value, this will take a long time to loop through. > > 3. It is normal to time out with `-XX:PerMethodTrapLimit=0` for test cases with a stride other than `1`: > > Take `for (int i = 0; i < stop; i += 2)` as an example. Since incrementing `i` may step past `stop` (and eventually overflow), there must be some sort of runtime check for `stop`.
Normally, a `loop_limit_check` trap is compiled to take the slow path (deoptimization). However, the zero trap limit forces C2 to loop and check `i < stop` on every iteration. For a large random `stop` value, this will take a long time. > > For the latter two reasons, I added `runWithFlags()` to essentially disable the flags in question. > > https://github.com/openjdk/jdk/blob/845e34cc7a82ef5cb69620a12f487adaca9d2613/test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java#L47-L51 @tabjy We are still seeing timeouts with `-XX:+UnlockDiagnosticVMOptions -XX:TieredStopAtLevel=3 -XX:+StressLoopInvariantCodeMotion -XX:+StressRangeCheckElimination -XX:+StressLinearScan`. Maybe the test should be enabled only if C2 is available. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18489#issuecomment-2415748974 From jbhateja at openjdk.org Wed Oct 16 05:36:19 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Oct 2024 05:36:19 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v24] In-Reply-To: <704HcEgAdeR1380vEK4z0bG0KiJ1kjRVSBCa9EFrt0w=.bee85693-033c-4d85-9f89-3e186ca3c2fb@github.com> References: <704HcEgAdeR1380vEK4z0bG0KiJ1kjRVSBCa9EFrt0w=.bee85693-033c-4d85-9f89-3e186ca3c2fb@github.com> Message-ID: On Tue, 15 Oct 2024 10:17:40 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Update adlc changes. > > src/hotspot/share/opto/vectornode.hpp line 161: > >> 159: // Needed for proper cloning. >> 160: virtual uint size_of() const { return sizeof(*this); } >> 161: bool is_unsigned() { return _is_unsigned; } > > Can you put this in the `print_spec`, so the IR dump shows if it is unsigned?
Hi @eme64, I see `print_spec` is mainly used for dumping information about VTransformVectorNode. Please note that the newly added saturating operations are not being auto-vectorized and are emitted through the VectorAPI-based explicit vectorization flow. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1802382779 From thartmann at openjdk.org Wed Oct 16 06:26:47 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 16 Oct 2024 06:26:47 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching [v3] In-Reply-To: References: Message-ID: > C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. > > Now the problem is that there can be multiple patch sites in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 > > While another `Thread B` is done with patching another patch site and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC.
`Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 > > In short: > - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. > - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of an `mov` in the patch body because the store is not atomic. > - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. > > Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. > I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity. > > The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 > > I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remo... 
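The alignment argument above can be illustrated with a small, self-contained Java sketch. This is not the HotSpot code: the class name `ImmediateAlignment` and the helpers `paddingFor` and `isAligned` are hypothetical names made up for illustration, and the addresses are arbitrary. It only shows the arithmetic of padding an address so that an 8-byte immediate starts on an 8-byte boundary, which is the precondition the description relies on for an atomic store.

```java
public class ImmediateAlignment {
    // Hypothetical helper: number of padding bytes needed so that
    // (address + padding) is a multiple of 'alignment'.
    // Assumes 'alignment' is a power of two and 'address' is non-negative.
    static int paddingFor(long address, int alignment) {
        return (int) ((-address) & (alignment - 1));
    }

    // Hypothetical helper: true if 'address' is a multiple of 'alignment'
    // (again assuming a power-of-two alignment).
    static boolean isAligned(long address, int alignment) {
        return (address & (alignment - 1)) == 0;
    }

    public static void main(String[] args) {
        long immediateAddress = 0x1003L; // arbitrary example address, not 8-byte aligned
        int padding = paddingFor(immediateAddress, 8);
        long aligned = immediateAddress + padding;
        System.out.println("padding = " + padding);
        System.out.println("aligned address = 0x" + Long.toHexString(aligned));
        System.out.println("is aligned = " + isAligned(aligned, 8));
    }
}
```

With the padding applied, the 8-byte immediate no longer spans two aligned 8-byte slots, so it cannot cross a cache-line or page boundary either.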
Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Remove Patching_lock ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21389/files - new: https://git.openjdk.org/jdk/pull/21389/files/ec5d105b..53135b83 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21389&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21389&range=01-02 Stats: 27 lines in 9 files changed: 0 ins; 9 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/21389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21389/head:pull/21389 PR: https://git.openjdk.org/jdk/pull/21389 From thartmann at openjdk.org Wed Oct 16 06:26:48 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 16 Oct 2024 06:26:48 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching [v2] In-Reply-To: <5-ppux2sUtARhvrnb09TdA53oUPC8yX1j020dmfWw00=.e76e4077-c448-470c-a1c9-a2f43906e754@github.com> References: <5-ppux2sUtARhvrnb09TdA53oUPC8yX1j020dmfWw00=.e76e4077-c448-470c-a1c9-a2f43906e754@github.com> Message-ID: On Tue, 15 Oct 2024 09:17:50 GMT, Tobias Hartmann wrote: >> C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. >> >> Now the problem is that there can be multiple patch sites in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently.
Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 >> >> While another `Thread B` is done with patching another patch site and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC. `Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 >> >> In short: >> - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. >> - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of an `mov` in the patch body because the store is not atomic. >> - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. >> >> Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. >> I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity. >> >> The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 >> >> I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) t... > > Tobias Hartmann has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Extend the Patching_lock instead > - Merge branch 'master' into 8340313 > - Extending patching lock > - Increased timeout > - Removed platform specific asserts from shared code > - 8340313: Crash due to invalid oop in nmethod after C1 patching Sounds good. I pushed a new change that completely removes the `Patching_lock` and uses the `CodeCache_lock` instead. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2415833999 From epeter at openjdk.org Wed Oct 16 07:06:23 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 16 Oct 2024 07:06:23 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v21] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 15:05:12 GMT, Tobias Holenstein wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > Nice framework! Looks good to me so far. > Could you add an Example how to use the framework with VM flags? @tobiasholenstein @chhagedorn @lepestock can I get another approval so I can integrate, please? ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20184#issuecomment-2415904623 From amitkumar at openjdk.org Wed Oct 16 07:17:13 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 16 Oct 2024 07:17:13 GMT Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double [v2] In-Reply-To: References: Message-ID: On Tue, 15 Oct 2024 19:52:51 GMT, Vladimir Kozlov wrote: > you could also just cast to double at every use site. Would that also work? is that required ? 
Aren't integers, by default, treated as double when they are multiplied by a value of type double? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21354#issuecomment-2415922636 From chagedorn at openjdk.org Wed Oct 16 07:22:16 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 16 Oct 2024 07:22:16 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v25] In-Reply-To: <6wHlTOx8Nc4wPZg0L7XSvTjWX_WTET45i_YsvV2ByKY=.ce572319-e1ca-47e8-9be5-95c823fb32e4@github.com> References: <6wHlTOx8Nc4wPZg0L7XSvTjWX_WTET45i_YsvV2ByKY=.ce572319-e1ca-47e8-9be5-95c823fb32e4@github.com> Message-ID: On Tue, 15 Oct 2024 17:05:40 GMT, Emanuel Peter wrote: >> **Motivation** >> >> I want to write small dedicated fuzzers: >> - Generate `java` and `jasm` source code: just some `String`. >> - Quickly compile it (with this framework). >> - Execute the compiled code. >> >> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**.
>> >> **The CompileFramework** >> >> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. >> An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. >> >> I implemented a first, simple version of the framework. I added some tests and examples. >> >> **Example** >> >> >> CompileFramework comp = new CompileFramework(); >> comp.add(SourceCode.newJavaSourceCode("XYZ", "")); >> comp.compile(); >> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); >> >> >> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 >> >> **Below some use cases: tests that would have been better with the CompileFramework** >> >> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** >> >> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. >> >> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, >> >> to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something i... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > use JDKToolFinder for Evgeny Still good! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20184#pullrequestreview-2371420798 From epeter at openjdk.org Wed Oct 16 07:25:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 16 Oct 2024 07:25:25 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v25] In-Reply-To: References: <6wHlTOx8Nc4wPZg0L7XSvTjWX_WTET45i_YsvV2ByKY=.ce572319-e1ca-47e8-9be5-95c823fb32e4@github.com> Message-ID: On Wed, 16 Oct 2024 07:19:13 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> use JDKToolFinder for Evgeny > > Still good! Thanks @chhagedorn @tobiasholenstein @lepestock for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20184#issuecomment-2415937500 From epeter at openjdk.org Wed Oct 16 07:25:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 16 Oct 2024 07:25:25 GMT Subject: Integrated: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing In-Reply-To: References: Message-ID: On Mon, 15 Jul 2024 15:56:10 GMT, Emanuel Peter wrote: > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. 
In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. > > **The CompileFramework** > > Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. > An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. > > I implemented a first, simple version of the framework. I added some tests and examples. > > **Example** > > > CompileFramework comp = new CompileFramework(); > comp.add(SourceCode.newJavaSourceCode("XYZ", "")); > comp.compile(); > comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); > > > https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 > > **Below some use cases: tests that would have been better with the CompileFramework** > > **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** > > I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. > > For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, > > to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW... This pull request has now been integrated.
Changeset: b9b0bd08 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/b9b0bd0871886eb65f87864f262424b119f2c748 Stats: 1604 lines in 18 files changed: 1604 ins; 0 del; 0 mod 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing Reviewed-by: chagedorn, tholenstein ------------- PR: https://git.openjdk.org/jdk/pull/20184 From aboldtch at openjdk.org Wed Oct 16 07:42:15 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 16 Oct 2024 07:42:15 GMT Subject: RFR: 8342042: PPC64: compiler_fast_unlock_object flags failure instead of success In-Reply-To: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> References: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> Message-ID: On Mon, 14 Oct 2024 13:29:43 GMT, Richard Reingruber wrote: > This change inverts the `EQ` bit in `flag` with a condition register nand instruction in order to meet the post condition given at L2751-L2752: > > > // flag == EQ indicates success, decrement held monitor count > // flag == NE indicates failure > > > The fix passed our CI testing with LockingMode set to LM_LEGACY > Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. > > The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with LM_LEGACY. Seems like `compiler_fast_unlock_lightweight_object` could do the same. It does the same branch, but correctly, by going through a `crorc(CCR0, Assembler::equal, CCR0, Assembler::equal)` in the success path. 
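To make the two condition-register operations mentioned in this thread concrete, here is a minimal Java sketch that models them as plain boolean functions. It is only a truth-table model under the usual Power ISA definitions (`crnand` computes NOT (a AND b), `crorc` computes a OR NOT b); the class name `CrBitOps` and the method names are illustrative and not part of HotSpot's assembler API.

```java
public class CrBitOps {
    // Model of the condition-register NAND: target = !(a && b).
    // With both sources set to the same bit, this inverts that bit.
    static boolean crnand(boolean a, boolean b) {
        return !(a && b);
    }

    // Model of the condition-register OR-with-complement: target = a || !b.
    // With both sources set to the same bit, the result is always true.
    static boolean crorc(boolean a, boolean b) {
        return a || !b;
    }

    public static void main(String[] args) {
        for (boolean eq : new boolean[] {false, true}) {
            System.out.println("eq=" + eq
                + "  crnand(eq, eq)=" + crnand(eq, eq)
                + "  crorc(eq, eq)=" + crorc(eq, eq));
        }
    }
}
```

In this model, `crnand(eq, eq)` flips the `EQ` bit, which matches the inversion described in the fix, while `crorc(eq, eq)` unconditionally sets it, which matches the success path of the lightweight variant.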
------------- PR Comment: https://git.openjdk.org/jdk/pull/21496#issuecomment-2415968958 From thartmann at openjdk.org Wed Oct 16 07:48:35 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 16 Oct 2024 07:48:35 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v4] In-Reply-To: References: Message-ID: <0h3TEfNHFDT6g59BudHi_SJHLoxyhY3rDMirzHviNgY=.b85485e1-effd-4644-9a9b-e023ed6e4305@github.com> > After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`: > > https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723 > > Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` because the type is not a primitive array type: > > https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971 > > -> > https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 > > The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. > > Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. 
However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded: > https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979 > > I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: More cleanups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21470/files - new: https://git.openjdk.org/jdk/pull/21470/files/4a48a793..8cbfef69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21470&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21470&range=02-03 Stats: 22 lines in 2 files changed: 2 ins; 16 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21470.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21470/head:pull/21470 PR: https://git.openjdk.org/jdk/pull/21470 From duke at openjdk.org Wed Oct 16 07:53:30 2024 From: duke at openjdk.org (Antón Seoane) Date: Wed, 16 Oct 2024 07:53:30 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v7] In-Reply-To: References: Message-ID: > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. > > To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL.
These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. > > The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. > > This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. Antón Seoane has updated the pull request incrementally with two additional commits since the last revision: - Format - Review comments applied ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21383/files - new: https://git.openjdk.org/jdk/pull/21383/files/5c933c06..3f85b901 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=05-06 Stats: 35 lines in 3 files changed: 14 ins; 12 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/21383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21383/head:pull/21383 PR: https://git.openjdk.org/jdk/pull/21383 From duke at openjdk.org Wed Oct 16 07:53:31 2024 From: duke at openjdk.org (Antón Seoane) Date: Wed, 16 Oct 2024 07:53:31 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v6] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 17:36:16 GMT, Axel Boldt-Christmas wrote: >> Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: >> >> Replace LogDecorators::Decorator::NoDecorator with
LogDecorators::None > > src/hotspot/share/logging/logDecorators.hpp line 96: > >> 94: >> 95: const LogSelection& selection() const { return _selection; } >> 96: }; > > I am uncomfortable with this type erasure. `LogTagType[LogTag::MaxTags + 1 /* = 6 */]` -> `LogTagType*` -> `LogTagType[LogTag::MaxTags /* = 5 */]`. I think this should be rewritten so that `tag_arr` is typed as a `LogTagType[5]`. I think everywhere we have a `const LogTagType parameter[LogTag::MaxTags]` really should have been `const LogTagType (&parameter)[LogTag::MaxTags]` so that this would have been a compile error. > > My suggestion is to either do the following: > Suggestion: > > public: > DefaultUndecoratedSelection(LogLevelType level, LogTagType t0, LogTagType t1 = LogTag::__NO_TAG, > LogTagType t2 = LogTag::__NO_TAG, LogTagType t3 = LogTag::__NO_TAG, > LogTagType t4 = LogTag::__NO_TAG, LogTagType guard_tag = LogTag::__NO_TAG) : _selection(LogSelection::Invalid) { > assert(guard_tag == LogTag::__NO_TAG, "Too many tags specified!"); > > LogTagType tag_arr[LogTag::MaxTags] = { t0, t1, t2, t3, t4 }; > > _selection = LogSelection(tag_arr, false, level); > } > > const LogSelection& selection() const { return _selection; } > }; > > > or maybe even better, do what we do for the `LogTagSet` and have a static helper and a private constructor, so that we can turn all the asserts into compile errors.
> > Something like: > Suggestion: > > DefaultUndecoratedSelection(LogLevelType level, LogTagType t0, LogTagType t1, LogTagType t2, > LogTagType t3, LogTagType t4) : _selection(LogSelection::Invalid) { > LogTagType tag_arr[LogTag::MaxTags] = { t0, t1, t2, t3, t4 }; > _selection = LogSelection(tag_arr, false, level); > } > public: > > template <LogLevelType Level, LogTagType T0, LogTagType T1 = LogTag::__NO_TAG, LogTagType T2 = LogTag::__NO_TAG, > LogTagType T3 = LogTag::__NO_TAG, LogTagType T4 = LogTag::__NO_TAG, > LogTagType GuardTag = LogTag::__NO_TAG> > static DefaultUndecoratedSelection make() { > STATIC_ASSERT(GuardTag == LogTag::__NO_TAG); > return DefaultUndecoratedSelection(Level, T0, T1, T2, T3, T4); > } > > const LogSelection& selection() const { return _selection; } > }; > > > And we can then use `LogDecorators::DefaultUndecoratedSelection::make()` to create them. Thanks for your comments! I think having some static selection maker that calls a private constructor is the neatest choice here. I'll go for that ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1802537723 From rrich at openjdk.org Wed Oct 16 07:58:09 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 16 Oct 2024 07:58:09 GMT Subject: RFR: 8342042: PPC64: compiler_fast_unlock_object flags failure instead of success In-Reply-To: References: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> Message-ID: On Wed, 16 Oct 2024 07:39:09 GMT, Axel Boldt-Christmas wrote: > Seems like `compiler_fast_unlock_lightweight_object` could do the same. > > It does the same branch, but correctly, by going through a `crorc(CCR0, Assembler::equal, CCR0, Assembler::equal)` in the success path. I think you're right. I wanted to see if [`set_eq_unlocked`](https://github.com/openjdk/jdk/blob/dcac4b0a532f2ca6cb374da7ece331e8266ab351/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp#L3087) could be eliminated and then forgot about it.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21496#issuecomment-2416001910 From thartmann at openjdk.org Wed Oct 16 08:03:12 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 16 Oct 2024 08:03:12 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v4] In-Reply-To: <0h3TEfNHFDT6g59BudHi_SJHLoxyhY3rDMirzHviNgY=.b85485e1-effd-4644-9a9b-e023ed6e4305@github.com> References: <0h3TEfNHFDT6g59BudHi_SJHLoxyhY3rDMirzHviNgY=.b85485e1-effd-4644-9a9b-e023ed6e4305@github.com> Message-ID: On Wed, 16 Oct 2024 07:48:35 GMT, Tobias Hartmann wrote: >> After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`: >> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723 >> >> Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` because the type is not a primitive array type: >> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971 >> >> -> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 >> >> The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. >> >> Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. 
However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded:
>> https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979
>>
>> I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded.
>>
>> Thanks,
>> Tobias
>
> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision:
>
>   More cleanups

I cleaned this up a bit more. The `iter().is_in_error()` branch in parsing is already dead without my changes because type flow will always trap. With my changes, the `!constant.is_loaded()` branch is now dead as well. I removed both and added asserts.

Given that even without my changes, we would always trap for an unresolved constant during parsing, I don't think continuing type flow analysis makes sense. In theory, it could even have a negative impact. Here's an example:

    Class obj = ...;
    if (b) {
      obj = LoadedClass.class;
    } else {
      // We always trap here during parsing
      obj = UnloadedClass.class;
    }

If we don't trap in the else branch during type flow analysis, the type of `obj` will be set to unloaded after the if. This is suboptimal because we always trap in the else branch during parsing, so the type can never be unloaded.
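
To make the effect at the join point concrete, here is a minimal toy model in Java. It is only a sketch under stated assumptions: `KlassState` and `TypeFlowJoinSketch` are hypothetical names and not part of ciTypeFlow, and the merge rule (any unloaded input makes the result unloaded) is a simplification of what the message above describes. A branch that traps never reaches the join, so it contributes no state.

```java
import java.util.Optional;

// Toy model of merging per-branch type states at a control-flow join.
// KlassState and TypeFlowJoinSketch are hypothetical; this is not ciTypeFlow.
enum KlassState { LOADED, UNLOADED }

final class TypeFlowJoinSketch {
    // A branch that ends in a trap never reaches the join: Optional.empty().
    static Optional<KlassState> merge(Optional<KlassState> a, Optional<KlassState> b) {
        if (!a.isPresent()) return b;
        if (!b.isPresent()) return a;
        // Pessimistic merge: if either incoming state is unloaded, so is the result.
        boolean bothLoaded = a.get() == KlassState.LOADED && b.get() == KlassState.LOADED;
        return Optional.of(bothLoaded ? KlassState.LOADED : KlassState.UNLOADED);
    }
}
```

In this model, merging the two branches without a trap, `merge(Optional.of(LOADED), Optional.of(UNLOADED))`, gives `UNLOADED`, while trapping early in the else branch, `merge(Optional.of(LOADED), Optional.empty())`, keeps `LOADED`.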
------------- PR Comment: https://git.openjdk.org/jdk/pull/21470#issuecomment-2416013432 From aboldtch at openjdk.org Wed Oct 16 08:23:12 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 16 Oct 2024 08:23:12 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v7] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 07:53:30 GMT, Antón Seoane wrote: >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. >> >> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. >> >> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. > > Antón Seoane has updated the pull request incrementally with two additional commits since the last revision: > > - Format > - Review comments applied lgtm.
------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21383#pullrequestreview-2371586841 From dqu at openjdk.org Wed Oct 16 08:41:30 2024 From: dqu at openjdk.org (Daohan Qu) Date: Wed, 16 Oct 2024 08:41:30 GMT Subject: RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi [v5] In-Reply-To: References: Message-ID: <1YFs6guITV4h9BzVV7NSV-TG1xOI3ONLoCwwRP7i6Qc=.7f5e6b47-a04c-4c44-b480-7cca21ba68f4@github.com> > # Description > > [JDK-6934604](https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194) introduces the flag `AggressiveUnboxing` in jdk8u, and [JDK-8217919](https://github.com/openjdk/jdk/commit/71759e3177fcd6926bb36a30a26f9f3976f2aae8) enables it by default in jdk13u. > > But it seems that JDK-6934604 forgets to check duplicate `PhiNode` generated in the function `LoadNode::split_through_phi(PhaseGVN *phase)` (in `memnode.cpp`) in the case that `base` is phi but `mem` is not phi. More exactly, `LoadNode::Identity(PhaseTransform *phase)` doesn't search for `PhiNode` in the correct region in that case. > > This might cause infinite split in the case of a loop, which is similar to the bugs fixed in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7). The infinite split results in "Out of nodes" and make the method "not compilable". > > Since JDK-8217919 (in jdk13u), all the later versions of jdks are affected by this bug when the expected optimization pattern appears in the code. 
For example, the following three micro-benchmarks, run with
>
>
> make test \
> TEST="micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef" \
> MICRO="FORK=1;WARMUP_ITER=2" \
> TEST_OPTS="VM_OPTIONS=-XX:+UseParallelGC"
>
>
> show a performance improvement after this PR is applied. (`-XX:+UseParallelGC` is only needed to reproduce this bug; all the benchmarks in the following table were run with this option.)
>
> | benchmark (throughput, unit: ops/s) | jdk-before-this-patch | jdk-after-this-patch |
> |---|---|---|
> | org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner | 26.452 ±(99.9%) 0.185 ops/s | 62.060 ±(99.9%) 0.878 ops/s |
> | org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda | 26.922 ±(99.9%) 1.710 ops/s | 67.961 ±(99.9%) 0.850 ops/s |
> | org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef | 28.382 ±(99.9%) 1.021 ops/s | 67.998 ±(99.9%) 0.751 ops/s |
>
> # Reproduction
>
> Compile and run the reduced test case `Test.java` in the appendix below using
>
>
> java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=comp.log -XX:+UseParallelGC Test
>
>
> and you will find that `Test$Obj.calc` is tagged with `make_not_compilable` and see some output like
>
>
>
> I've removed the assertion that the unlocking thread owns the monitor again because it won't work with vthread monitor support in the loom repository.
>
> The fix passed our CI testing with `LockingMode` set to `LM_LEGACY`
> Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests.
> Testing was done on the main platforms and also on Linux/PPC64le and AIX.
>
> The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with `LM_LEGACY`.

LGTM.
I think it should get integrated after https://github.com/openjdk/jdk/pull/21496. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21494#pullrequestreview-2371675657 From dqu at openjdk.org Wed Oct 16 09:02:12 2024 From: dqu at openjdk.org (Daohan Qu) Date: Wed, 16 Oct 2024 09:02:12 GMT Subject: RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi [v4] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 07:29:22 GMT, Christian Hagedorn wrote: >> Daohan Qu has updated the pull request incrementally with one additional commit since the last revision: >> >> Add jtreg requirements and fix some format issues > > No worries! Take your time, there is no rush :-) Thanks for letting me know. Hi, @chhagedorn, I just update with a minimal fix, which: 1. Doesn't affect the execution that didn't cause infinite split before this patch 2. Avoids infinite splitting for the execution that triggered it before and allow them to be compiled by C2 This fix makes some of [code above](https://github.com/quadhier/jdk/blob/2b2c7d5b3e8193db706470725743003a5d25759c/src/hotspot/share/opto/memnode.cpp#L1683-L1710) dead. If this fix makes sense, I could do some cleanup. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21134#issuecomment-2416158202 From rrich at openjdk.org Wed Oct 16 09:19:48 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 16 Oct 2024 09:19:48 GMT Subject: RFR: 8342042: PPC64: compiler_fast_unlock_object flags failure instead of success [v2] In-Reply-To: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> References: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> Message-ID: > This change inverts the `EQ` bit in `flag` with a condition register nand instruction in order to meet the post condition given at L2751-L2752: > > > // flag == EQ indicates success, decrement held monitor count > // flag == NE indicates failure > > > The fix passed our CI testing with LockingMode set to LM_LEGACY > Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. > > The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with LM_LEGACY. 
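
For readers less familiar with the PPC64 condition-register logical instructions mentioned in this thread, the following is a minimal sketch in Java (not HotSpot code) that models a single CR bit as a boolean. It assumes the usual semantics, where `crnand` computes NOT(a AND b) and `crorc` computes a OR NOT(b). The class and method names are made up for illustration.

```java
// Toy model of two PPC64 condition-register logical operations on one bit.
// CrBitModel and its methods are illustrative names, not HotSpot API.
final class CrBitModel {
    // crnand bt, ba, bb  =>  bt = !(ba & bb)
    static boolean crnand(boolean ba, boolean bb) {
        return !(ba && bb);
    }

    // crorc bt, ba, bb  =>  bt = ba | !bb
    static boolean crorc(boolean ba, boolean bb) {
        return ba || !bb;
    }

    // Same bit as both sources inverts it: !(eq & eq) == !eq.
    static boolean invertEq(boolean eq) {
        return crnand(eq, eq);
    }

    // Same bit as both sources always sets it: eq | !eq == true.
    static boolean setEq(boolean eq) {
        return crorc(eq, eq);
    }
}
```

Under these assumptions, a nand of the `EQ` bit with itself flips failure into success and vice versa, whereas the `crorc(CCR0, Assembler::equal, CCR0, Assembler::equal)` form discussed for the lightweight variant unconditionally sets `EQ`.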
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Same enhancement for compiler_fast_unlock_lightweight_object as suggested by Axel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21496/files - new: https://git.openjdk.org/jdk/pull/21496/files/1963e884..b24743a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21496&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21496&range=00-01 Stats: 11 lines in 1 file changed: 2 ins; 6 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21496.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21496/head:pull/21496 PR: https://git.openjdk.org/jdk/pull/21496 From mdoerr at openjdk.org Wed Oct 16 09:33:11 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 16 Oct 2024 09:33:11 GMT Subject: RFR: 8342042: PPC64: compiler_fast_unlock_object flags failure instead of success [v2] In-Reply-To: References: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> Message-ID: On Wed, 16 Oct 2024 09:19:48 GMT, Richard Reingruber wrote: >> This change inverts the `EQ` bit in `flag` with a condition register nand instruction in order to meet the post condition given at L2751-L2752: >> >> >> // flag == EQ indicates success, decrement held monitor count >> // flag == NE indicates failure >> >> >> The fix passed our CI testing with LockingMode set to LM_LEGACY >> Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. >> Testing was done on the main platforms and also on Linux/PPC64le and AIX. >> >> The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with LM_LEGACY. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Same enhancement for compiler_fast_unlock_lightweight_object as suggested by Axel LGTM. Thanks for improving it! 
------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21496#pullrequestreview-2371813436 From aboldtch at openjdk.org Wed Oct 16 09:47:11 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 16 Oct 2024 09:47:11 GMT Subject: RFR: 8342042: PPC64: compiler_fast_unlock_object flags failure instead of success [v2] In-Reply-To: References: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> Message-ID: On Wed, 16 Oct 2024 09:19:48 GMT, Richard Reingruber wrote: >> This change inverts the `EQ` bit in `flag` with a condition register nand instruction in order to meet the post condition given at L2751-L2752: >> >> >> // flag == EQ indicates success, decrement held monitor count >> // flag == NE indicates failure >> >> >> The fix passed our CI testing with LockingMode set to LM_LEGACY >> Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. >> Testing was done on the main platforms and also on Linux/PPC64le and AIX. >> >> The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with LM_LEGACY. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Same enhancement for compiler_fast_unlock_lightweight_object as suggested by Axel Marked as reviewed by aboldtch (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21496#pullrequestreview-2371853894 From tschatzl at openjdk.org Wed Oct 16 10:09:13 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 16 Oct 2024 10:09:13 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching [v3] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 06:26:47 GMT, Tobias Hartmann wrote: >> C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. >> >> Now the problem is that there can be multiple patch sides in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 >> >> While another `Thread B` is done with patching another patch side and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC. `Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 >> >> In short: >> - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. 
>> - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of an `mov` in the patch body because the store is not atomic. >> - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. >> >> Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. >> I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity. >> >> The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 >> >> I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) t... > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Remove Patching_lock Seems good to me. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21389#pullrequestreview-2371909629 From thartmann at openjdk.org Wed Oct 16 10:24:12 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 16 Oct 2024 10:24:12 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching [v3] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 06:26:47 GMT, Tobias Hartmann wrote: >> C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. >> >> Now the problem is that there can be multiple patch sides in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 >> >> While another `Thread B` is done with patching another patch side and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC. `Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 >> >> In short: >> - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. 
>> - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of an `mov` in the patch body because the store is not atomic. >> - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. >> >> Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. >> I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity. >> >> The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 >> >> I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) t... > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Remove Patching_lock Thanks Thomas! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2416376434 From fgao at openjdk.org Wed Oct 16 10:33:14 2024 From: fgao at openjdk.org (Fei Gao) Date: Wed, 16 Oct 2024 10:33:14 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: Message-ID: On Tue, 15 Oct 2024 12:16:28 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. >> This pr is based on https://github.com/openjdk/jdk/pull/20781. >> >> Thanks! 
>> >> ## Test >> ### tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> ### options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Tests >> jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector. It's good to have these tests in jdk main stream, I will do it in a separate pr later. (These tests are auto-generated tests from some script&template, it's good to also have those scrip&template in jdk main stream, but those scrip&template generates more other tests than what we need here, so better to add these tests and script&template in another pr). >> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Performance data >> I have re-tested, there is no much difference from https://github.com/openjdk/jdk/pull/18605, so please check performance data in that pr. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Update make/autoconf/flags-cflags.m4 > > Co-authored-by: Magnus Ihse Bursie src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8218: > 8216: > 8217: snprintf(ebuf, sizeof(ebuf), "%sdx_%ssve", VectorSupport::mathname[op], ulf); > 8218: StubRoutines::_vector_d_math[VectorSupport::VEC_SIZE_SCALABLE][op] = (address)os::dll_lookup(libsleef, ebuf); May I ask why `aarch64` doesn't have C file including macro expansion of function names while `RISC-V` needs it, see added in https://github.com/openjdk/jdk/pull/21083/files#diff-65f5198005719e644115782e7f4dd5a17c0969b01cbb50a1224b6800bbf8f177? 
Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21502#discussion_r1802837028 From yzheng at openjdk.org Wed Oct 16 11:50:37 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 16 Oct 2024 11:50:37 GMT Subject: RFR: 8342332: [JVMCI] Export CompilerToVM::Data::dtanh Message-ID: https://github.com/openjdk/jdk/pull/20657 adds x86_64 intrinsic for tanh. Exporting CompilerToVM::Data::dtanh allows JVMCI compiler to reuse the same HotSpot stub. ------------- Commit messages: - [JVMCI] Export CompilerToVM::Data::dtanh Changes: https://git.openjdk.org/jdk/pull/21535/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21535&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342332 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21535/head:pull/21535 PR: https://git.openjdk.org/jdk/pull/21535 From thartmann at openjdk.org Wed Oct 16 12:12:12 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 16 Oct 2024 12:12:12 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn [v2] In-Reply-To: References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> Message-ID: <1DFWV4Lskaxs2UF7xBN9RUqnpipLyrSKPmsSHfR0XKo=.458f23e9-62bb-4013-af6f-f05df89ee7ad@github.com> On Tue, 15 Oct 2024 18:53:45 GMT, Chad Rakoczy wrote: >> [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) >> >> Crash occurs in C1 during OSR when copying locks from interpreter frame to compiled frame. All loads used immediate offset regardless of offset size causing crash when it is over the max size for the instruction (32760). Fix is to check the size before preforming the load and storing the offset in a register if needed. >> >> I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction. 
The instruction is only updated when the `offset_ok_for_immed` check fails which would cause the crash anyways >> >> Confirmed that added test fails before patch and passes after > > Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: > > - Add blank line at end of test > - Add jasm and update test description test/hotspot/jtreg/compiler/c1/ComplexLockingAndMultiThreading.jasm line 13: > 11: return; > 12: } > 13: public static Method main:"([Ljava/lang/String;)V" There's no need to have the main method in jasm, right? I think it's only `synchronizedMethod` which triggers the issue. So ideally, the jasm file is as compact as possible to ease future maintenance of the test. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21473#discussion_r1802980004 From luhenry at openjdk.org Wed Oct 16 12:18:11 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 16 Oct 2024 12:18:11 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 10:29:58 GMT, Fei Gao wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Update make/autoconf/flags-cflags.m4 >> >> Co-authored-by: Magnus Ihse Bursie > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8218: > >> 8216: >> 8217: snprintf(ebuf, sizeof(ebuf), "%sdx_%ssve", VectorSupport::mathname[op], ulf); >> 8218: StubRoutines::_vector_d_math[VectorSupport::VEC_SIZE_SCALABLE][op] = (address)os::dll_lookup(libsleef, ebuf); > > May I ask why `aarch64` doesn't have C file including macro expansion of function names while `RISC-V` needs it, see added in https://github.com/openjdk/jdk/pull/21083/files#diff-65f5198005719e644115782e7f4dd5a17c0969b01cbb50a1224b6800bbf8f177? Thanks. 
Agreed, it seems to be missing `src/jdk.incubator.vector/linux/native/libsleef/lib/vector_math_sve.c` in this PR ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21502#discussion_r1802992418 From mli at openjdk.org Wed Oct 16 13:47:21 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 16 Oct 2024 13:47:21 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 12:15:51 GMT, Ludovic Henry wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8218: >> >>> 8216: >>> 8217: snprintf(ebuf, sizeof(ebuf), "%sdx_%ssve", VectorSupport::mathname[op], ulf); >>> 8218: StubRoutines::_vector_d_math[VectorSupport::VEC_SIZE_SCALABLE][op] = (address)os::dll_lookup(libsleef, ebuf); >> >> May I ask why `aarch64` doesn't have C file including macro expansion of function names while `RISC-V` needs it, see added in https://github.com/openjdk/jdk/pull/21083/files#diff-65f5198005719e644115782e7f4dd5a17c0969b01cbb50a1224b6800bbf8f177? Thanks. > > Agreed, it seems to be missing `src/jdk.incubator.vector/linux/native/libsleef/lib/vector_math_sve.c` in this PR Thanks for catching! Yes, somehow I missed this file and another one for neon in this pr when commit, will add it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21502#discussion_r1803137855 From yzheng at openjdk.org Wed Oct 16 13:49:38 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 16 Oct 2024 13:49:38 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses [v2] In-Reply-To: References: Message-ID: > https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class metaspace. JVMCI has to respect this and avoids compressing abstract and interface Klasses Yudi Zheng has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - address comments. - Merge master - trim trailing whitespace - make JVMCI aware that some klass pointers are not compressible ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20949/files - new: https://git.openjdk.org/jdk/pull/20949/files/712272bb..92001a87 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20949&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20949&range=00-01 Stats: 230149 lines in 1914 files changed: 207890 ins; 11763 del; 10496 mod Patch: https://git.openjdk.org/jdk/pull/20949.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20949/head:pull/20949 PR: https://git.openjdk.org/jdk/pull/20949 From mbaesken at openjdk.org Wed Oct 16 13:54:12 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 16 Oct 2024 13:54:12 GMT Subject: RFR: 8341862: PPC64: C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 13:51:21 GMT, Richard Reingruber wrote: > Make sure `LIR_Assembler::emit_unwind_handler()` jumps to the slow path directly for unlocking a synchronized method if `LM_MONITOR` is used. > On the fast paths assertions are added that the mode is actually handled. > > The change passed our CI testing: > Tier 1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. Marked as reviewed by mbaesken (Reviewer). Do you think src/hotspot/cpu/s390/c1_LIRAssembler_s390.cpp `LIR_Assembler::emit_unwind_handler() ` should be adjusted in a similar way (separate issue/change however). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21497#pullrequestreview-2372556029 PR Comment: https://git.openjdk.org/jdk/pull/21497#issuecomment-2416912098 From mli at openjdk.org Wed Oct 16 14:00:37 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 16 Oct 2024 14:00:37 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: > Hi, > Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. > This pr is based on https://github.com/openjdk/jdk/pull/20781. > > Thanks! > > ## Test > ### tests: > * test/jdk/jdk/incubator/vector/ > * test/hotspot/jtreg/compiler/vectorapi/ > > ### options: > * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs > * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs > * -XX:+EnableVectorSupport -XX:-UseVectorStubs > > ## Performance > > ### Tests > jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector. It's good to have these tests in jdk main stream, I will do it in a separate pr later. (These tests are auto-generated tests from some script&template, it's good to also have those scrip&template in jdk main stream, but those scrip&template generates more other tests than what we need here, so better to add these tests and script&template in another pr). > > ### Options > * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' > * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' > > ### Performance data > I have re-tested, there is no much difference from https://github.com/openjdk/jdk/pull/18605, so please check performance data in that pr. 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: add missing files ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21502/files - new: https://git.openjdk.org/jdk/pull/21502/files/3aaf1c46..e4b98bfb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21502&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21502&range=01-02 Stats: 180 lines in 2 files changed: 180 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21502.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21502/head:pull/21502 PR: https://git.openjdk.org/jdk/pull/21502 From rrich at openjdk.org Wed Oct 16 14:16:11 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 16 Oct 2024 14:16:11 GMT Subject: RFR: 8341862: PPC64: C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 13:51:21 GMT, Richard Reingruber wrote: > Make sure `LIR_Assembler::emit_unwind_handler()` jumps to the slow path directly for unlocking a synchronized method if `LM_MONITOR` is used. > On the fast paths assertions are added that the mode is actually handled. > > The change passed our CI testing: > Tier 1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. Thanks for the review! > Do you think src/hotspot/cpu/s390/c1_LIRAssembler_s390.cpp > > `LIR_Assembler::emit_unwind_handler() ` > > should be adjusted in a similar way (separate issue/change however). You are right indeed. @offamitkumar as Matthias noticed it looks like s390 is affected by the issue too. You might want to check. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21497#issuecomment-2416971346 From eastigeevich at openjdk.org Wed Oct 16 14:48:12 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 16 Oct 2024 14:48:12 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn [v2] In-Reply-To: <1DFWV4Lskaxs2UF7xBN9RUqnpipLyrSKPmsSHfR0XKo=.458f23e9-62bb-4013-af6f-f05df89ee7ad@github.com> References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> <1DFWV4Lskaxs2UF7xBN9RUqnpipLyrSKPmsSHfR0XKo=.458f23e9-62bb-4013-af6f-f05df89ee7ad@github.com> Message-ID: On Wed, 16 Oct 2024 12:09:22 GMT, Tobias Hartmann wrote: >> Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add blank line at end of test >> - Add jasm and update test description > > test/hotspot/jtreg/compiler/c1/ComplexLockingAndMultiThreading.jasm line 13: > >> 11: return; >> 12: } >> 13: public static Method main:"([Ljava/lang/String;)V" > > There's no need to have the main method in jasm, right? I think it's only `synchronizedMethod` which triggers the issue. So ideally, the jasm file is as compact as possible to ease future maintenance of the test. IMO we even don't need to run the code to trigger compilation. We can use WhiteBox to compile it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21473#discussion_r1803261299 From yzheng at openjdk.org Wed Oct 16 15:02:27 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 16 Oct 2024 15:02:27 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses [v3] In-Reply-To: References: Message-ID: > https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class metaspace. 
JVMCI has to respect this and avoids compressing abstract and interface Klasses Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: Fix JIT error. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20949/files - new: https://git.openjdk.org/jdk/pull/20949/files/92001a87..e44d98a8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20949&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20949&range=01-02 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20949.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20949/head:pull/20949 PR: https://git.openjdk.org/jdk/pull/20949 From amitkumar at openjdk.org Wed Oct 16 15:17:10 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 16 Oct 2024 15:17:10 GMT Subject: RFR: 8341862: PPC64: C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 13:51:21 GMT, Richard Reingruber wrote: > Make sure `LIR_Assembler::emit_unwind_handler()` jumps to the slow path directly for unlocking a synchronized method if `LM_MONITOR` is used. > On the fast paths assertions are added that the mode is actually handled. > > The change passed our CI testing: > Tier 1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. Thanks Richard, Matthias for pointing it out. I have opened https://bugs.openjdk.org/browse/JDK-8342409 and will fix it soon. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21497#issuecomment-2417132837 From kxu at openjdk.org Wed Oct 16 15:18:52 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 16 Oct 2024 15:18:52 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v23] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: requires c2 enabled for IR tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/845e34cc..04b2c6ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=21-22 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From kxu at openjdk.org Wed Oct 16 15:18:52 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 16 Oct 2024 15:18:52 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v21] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> 
<1R6tWNQF3WSJ4joPaoWR3N_JKy0W-BB0Zntg00N-mlU=.e8b25703-93ea-42be-ba1d-46e06c17987c@github.com> Message-ID: On Wed, 16 Oct 2024 05:13:01 GMT, Tobias Hartmann wrote: >> @TobiHartmann. Thanks for the feedback! I did some investigation; the timeouts have three causes: >> >> 1. A loop with `i <= stop` is not a counted loop in the first place, so those tests should be removed: >> >> Now I remember why I originally didn't test for it. Consider `for (int i = 0; i <= stop; i++);` when `stop = Integer.MAX_VALUE`. Overflow in Java is well-defined, which means the code must loop indefinitely and optimizations of any kind can't break this. Therefore, loops with `<=` are not counted loops to begin with. `@IR(failOn = {IRNode.COUNTED_LOOP})` doesn't fail either. I removed these test cases. >> >> 2. It is normal to time out with `-XX:StressLongCountedLoop=200000000` for all test cases: >> >> A value other than `0` for this flag will forcefully convert int counted loops to long counted loops, for which C2 doesn't do parallel IV replacement at this point. This is the same issue as [JDK-8294839](https://bugs.openjdk.org/browse/JDK-8294838). Loops are still loops. For a large random `stop` value, this will take a long time to loop through. >> >> 3. It is normal to time out with `-XX:PerMethodTrapLimit=0` for test cases with a stride other than `1`: >> >> Take `for (int i = 0; i < stop; i += 2)` as an example. Since there is a chance for the increment of `i` to go beyond `stop` (and eventually overflow), there must be some sort of runtime check for `stop`. Normally, a `loop_limit_check` trap is compiled to take the slow path (deoptimization). However, the zero trap limit forces C2 to loop and check `i < stop` on every iteration. For a large random `stop` value, this will take a long time. >> >> For the latter two reasons, I added `runWithFlags()` to essentially disable the flags in question.
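Editor's sketch for cases 1 and 3 above: the following plain-Java model only illustrates the integer-overflow arithmetic behind the two loop shapes. It is not C2 code; the class and method names are hypothetical, and it assumes a positive stride and a loop starting at 0, as in the quoted examples.

```java
public class CountedLoopOverflow {
    // For `for (int i = 0; i < stop; i += stride)` with stride > 0:
    // the first index value that fails `i < stop`, computed in long
    // arithmetic so that the model itself cannot wrap around.
    static long exitValueLessThan(int stop, int stride) {
        if (stop <= 0) {
            return 0; // the loop body never runs, i stays 0
        }
        long trips = ((long) stop + stride - 1) / stride; // ceil(stop / stride)
        return trips * stride;
    }

    // True if the int index would have to exceed Integer.MAX_VALUE
    // (and therefore wrap) before the `i < stop` test can fail.
    static boolean lessThanLoopOverflows(int stop, int stride) {
        return exitValueLessThan(stop, stride) > Integer.MAX_VALUE;
    }

    // For `for (int i = 0; i <= stop; i++)`: the loop can only exit at
    // stop + 1, which is not representable when stop == Integer.MAX_VALUE.
    static boolean lessOrEqualLoopOverflows(int stop) {
        return (long) stop + 1 > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(lessThanLoopOverflows(Integer.MAX_VALUE, 1));
        System.out.println(lessThanLoopOverflows(Integer.MAX_VALUE, 2));
        System.out.println(lessOrEqualLoopOverflows(Integer.MAX_VALUE));
    }
}
```

With stride 1 and `<`, the exit value equals `stop` and always fits in an int. With stride 2, or with `<=`, some `stop` values push the exit value past `Integer.MAX_VALUE`, which is the situation the message describes as needing a runtime limit check or ruling out a counted loop.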
>> >> https://github.com/openjdk/jdk/blob/845e34cc7a82ef5cb69620a12f487adaca9d2613/test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java#L47-L51 > > @tabjy We are still seeing timeouts with `-XX:+UnlockDiagnosticVMOptions -XX:TieredStopAtLevel=3 -XX:+StressLoopInvariantCodeMotion -XX:+StressRangeCheckElimination -XX:+StressLinearScan`. Maybe the test should be enabled only if C2 is available. @TobiHartmann Yes `-XX:TieredStopAtLevel=3` will cause timeouts. I added the `@requires vm.compiler2.enabled` option to the test header. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18489#issuecomment-2417134416 From jbhateja at openjdk.org Wed Oct 16 16:11:26 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Oct 2024 16:11:26 GMT Subject: Integrated: 8338023: Support two vector selectFrom API In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Thu, 8 Aug 2024 06:57:28 GMT, Jatin Bhateja wrote: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. 
> - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... This pull request has now been integrated. 
Changeset: 709914fc Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/709914fc92dd180c8f081ff70ef476554a04f4ce Stats: 2805 lines in 89 files changed: 2786 ins; 18 del; 1 mod 8338023: Support two vector selectFrom API Reviewed-by: psandoz, epeter, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/20508 From sviswanathan at openjdk.org Wed Oct 16 16:07:38 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 16 Oct 2024 16:07:38 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Sun, 13 Oct 2024 11:18:01 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Updating tests to use floorMod Changes look good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2372983898 From sviswanathan at openjdk.org Wed Oct 16 16:28:50 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 16 Oct 2024 16:28:50 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v4] In-Reply-To: References: Message-ID: <9LrhezvsYwS32PEUN9wn6hKDJJn0wybl3YXSHuohUC8=.eded7969-c03c-4a75-a2c2-2d0e9682722d@github.com> > When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. 
This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. > Also a regression test case is added accordingly. > > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21480/files - new: https://git.openjdk.org/jdk/pull/21480/files/f2981374..c876dc95 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21480&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21480&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21480.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21480/head:pull/21480 PR: https://git.openjdk.org/jdk/pull/21480 From sviswanathan at openjdk.org Wed Oct 16 16:28:50 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 16 Oct 2024 16:28:50 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2] In-Reply-To: References: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> <-5pIV32xfg2DMexItjaDQWkkTK4FVIfbB7G73LKFoxA=.1a7c8ada-6641-486f-ba83-6b9f5b5eb7ec@github.com> Message-ID: On Wed, 16 Oct 2024 01:52:20 GMT, Jatin Bhateja wrote: >> We have used bare inputs at many places in the ad file in the predicate. > > I think its ok to use safe cast if its available atleast for newly added code. @jatin-bhateja I have made this change. Please take a look. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1803442065 From fbredberg at openjdk.org Wed Oct 16 16:38:15 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 16 Oct 2024 16:38:15 GMT Subject: RFR: 8342042: PPC64: compiler_fast_unlock_object flags failure instead of success In-Reply-To: References: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> Message-ID: On Wed, 16 Oct 2024 07:55:06 GMT, Richard Reingruber wrote: >> Seems like `compiler_fast_unlock_lightweight_object` could do the same. >> >> It does the same branch, but correctly, by going through a `crorc(CCR0, Assembler::equal, CCR0, Assembler::equal)` in the success path. > >> Seems like `compiler_fast_unlock_lightweight_object` could do the same. >> >> It does the same branch, but correctly, by going through a `crorc(CCR0, Assembler::equal, CCR0, Assembler::equal)` in the success path. > > I think you're right. I wanted to see if [`set_eq_unlocked`](https://github.com/openjdk/jdk/blob/dcac4b0a532f2ca6cb374da7ece331e8266ab351/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp#L3087) could be eliminated and then forgot about it. @reinrich Thank you for finding and fixing this. Also sorry for not finding this when I tested my PR. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21496#issuecomment-2417341796 From fbredberg at openjdk.org Wed Oct 16 16:38:14 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 16 Oct 2024 16:38:14 GMT Subject: RFR: 8342042: PPC64: compiler_fast_unlock_object flags failure instead of success [v2] In-Reply-To: References: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> Message-ID: On Wed, 16 Oct 2024 09:19:48 GMT, Richard Reingruber wrote: >> This change inverts the `EQ` bit in `flag` with a condition register nand instruction in order to meet the post condition given at L2751-L2752: >> >> >> // flag == EQ indicates success, decrement held monitor count >> // flag == NE indicates failure >> >> >> The fix passed our CI testing with LockingMode set to LM_LEGACY >> Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. >> Testing was done on the main platforms and also on Linux/PPC64le and AIX. >> >> The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with LM_LEGACY. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Same enhancement for compiler_fast_unlock_lightweight_object as suggested by Axel Marked as reviewed by fbredberg (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21496#pullrequestreview-2373071793 From jbhateja at openjdk.org Wed Oct 16 17:29:04 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Oct 2024 17:29:04 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v25] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . 
SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch.
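Editor's sketch: the operator list above can be read against a scalar model for `int` lanes. The method names below are hypothetical and are not the scalar APIs this PR adds (their names do not appear in this message); the sketch relies only on `Integer.compareUnsigned` from the standard library and assumes the usual clamp-to-range meaning of "saturating".

```java
public class SaturatingOps {
    // SADD model: signed add clamped to [Integer.MIN_VALUE, Integer.MAX_VALUE].
    static int saturatingAdd(int a, int b) {
        long r = (long) a + b; // cannot overflow in long
        if (r > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (r < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) r;
    }

    // SSUB model: signed subtract clamped to the int range.
    static int saturatingSub(int a, int b) {
        long r = (long) a - b;
        if (r > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (r < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) r;
    }

    // SUADD model: unsigned add clamped to 0xFFFFFFFF (which is -1 as an int).
    static int saturatingUnsignedAdd(int a, int b) {
        int r = a + b;
        // An unsigned add wrapped iff the result is unsigned-less-than an operand.
        return Integer.compareUnsigned(r, a) < 0 ? -1 : r;
    }

    // SUSUB model: unsigned subtract clamped to 0.
    static int saturatingUnsignedSub(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    // UMAX / UMIN models: compare the operands as unsigned 32-bit values.
    static int unsignedMax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    static int unsignedMin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }
}
```

For example, `saturatingAdd(Integer.MAX_VALUE, 1)` stays at `Integer.MAX_VALUE` where a plain `+` would wrap to `Integer.MIN_VALUE`, and `unsignedMax(-1, 1)` returns `-1` because `-1` is the largest unsigned 32-bit value.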
> > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: New IR tests + additional IR transformations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/506ae299..c5650889 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=23-24 Stats: 1010 lines in 7 files changed: 1007 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From liach at openjdk.org Wed Oct 16 17:55:24 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 16 Oct 2024 17:55:24 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Sun, 13 Oct 2024 11:18:01 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. 
>> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Updating tests to use floorMod This patch failed on the latest master. Another reason the OpenJDK guide asks to merge master despite all the commit churn...
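Editor's sketch: for readers following the thread, the two-vector selectFrom semantics quoted above can be modeled with plain arrays. This is a scalar illustration only, with a hypothetical class and method name; it does not use the incubating Vector API, and it follows the quoted description in throwing on out-of-range indices (the integrated API may treat such indices differently).

```java
public class SelectFromModel {
    // Scalar model of index.selectFrom(v1, v2): lane i of the result is read
    // from the 2*VLEN-entry table formed by v1 followed by v2.
    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = index[i];
            // Valid two-vector index range is [0, 2*VLEN), per the quoted text.
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("lane " + i + ": index " + idx);
            }
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] index = {0, 5, 2, 7};
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        System.out.println(java.util.Arrays.toString(selectFrom(index, v1, v2)));
    }
}
```

With `VLEN = 4`, indices 0..3 pick from `v1` and indices 4..7 pick from `v2`, so the index vector `{0, 5, 2, 7}` interleaves lanes from the two tables.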
------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2417516748 From jbhateja at openjdk.org Wed Oct 16 18:22:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Oct 2024 18:22:38 GMT Subject: RFR: 8342439: Build failure after 8338023 Message-ID: <6nSmTXRBMkyaZ-YoRacPTyUU_3NcASt9q-47e5OO05Q=.59cf3942-a9e1-4a97-8cff-33da4cdf75a8@github.com> Fix failing build. Due to notification problem with integration system. Thanks, Jatin ------------- Commit messages: - Build fix Changes: https://git.openjdk.org/jdk/pull/21547/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21547&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342439 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21547.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21547/head:pull/21547 PR: https://git.openjdk.org/jdk/pull/21547 From liach at openjdk.org Wed Oct 16 18:22:38 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 16 Oct 2024 18:22:38 GMT Subject: RFR: 8342439: Build failure after 8338023 In-Reply-To: <6nSmTXRBMkyaZ-YoRacPTyUU_3NcASt9q-47e5OO05Q=.59cf3942-a9e1-4a97-8cff-33da4cdf75a8@github.com> References: <6nSmTXRBMkyaZ-YoRacPTyUU_3NcASt9q-47e5OO05Q=.59cf3942-a9e1-4a97-8cff-33da4cdf75a8@github.com> Message-ID: <8sbmlX0WSV7GVEbvwioogNRnteuplsd5LBbEWiMGc5I=.611ce77c-cf94-4440-8e0d-a0a14ba73fae@github.com> On Wed, 16 Oct 2024 18:14:26 GMT, Jatin Bhateja wrote: > Fix failing build. > > Due to notification problem with integration system. > > Thanks, > Jatin Can we change the title of the JBS issue to be more descriptive, like "Build failure after 8338023"? So the PR title will be like `8342439: Build failure after 8338023`. This is in part our fault, that we should test in CI after merging master. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21547#issuecomment-2417582684 From liach at openjdk.org Wed Oct 16 18:30:14 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 16 Oct 2024 18:30:14 GMT Subject: RFR: 8342439: Build failure after 8338023 In-Reply-To: <6nSmTXRBMkyaZ-YoRacPTyUU_3NcASt9q-47e5OO05Q=.59cf3942-a9e1-4a97-8cff-33da4cdf75a8@github.com> References: <6nSmTXRBMkyaZ-YoRacPTyUU_3NcASt9q-47e5OO05Q=.59cf3942-a9e1-4a97-8cff-33da4cdf75a8@github.com> Message-ID: On Wed, 16 Oct 2024 18:14:26 GMT, Jatin Bhateja wrote: > Fix failing build. > > Due to notification problem with integration system. > > Thanks, > Jatin Build fixed in my local OracleJDK linux-x64 setup. Yep, running a build locally with your branch merged into master. Will approve once build passes. ------------- Marked as reviewed by liach (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21547#pullrequestreview-2373330332 PR Comment: https://git.openjdk.org/jdk/pull/21547#issuecomment-2417592187 From jbhateja at openjdk.org Wed Oct 16 18:30:14 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Oct 2024 18:30:14 GMT Subject: RFR: 8342439: Build failure after 8338023 In-Reply-To: <6nSmTXRBMkyaZ-YoRacPTyUU_3NcASt9q-47e5OO05Q=.59cf3942-a9e1-4a97-8cff-33da4cdf75a8@github.com> References: <6nSmTXRBMkyaZ-YoRacPTyUU_3NcASt9q-47e5OO05Q=.59cf3942-a9e1-4a97-8cff-33da4cdf75a8@github.com> Message-ID: On Wed, 16 Oct 2024 18:14:26 GMT, Jatin Bhateja wrote: > Fix failing build. > > Due to notification problem with integration system. > > Thanks, > Jatin @vnkozlov , @liach , Kindly approve. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21547#issuecomment-2417590659 From jbhateja at openjdk.org Wed Oct 16 18:30:15 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Oct 2024 18:30:15 GMT Subject: Integrated: 8342439: Build failure after 8338023 In-Reply-To: <6nSmTXRBMkyaZ-YoRacPTyUU_3NcASt9q-47e5OO05Q=.59cf3942-a9e1-4a97-8cff-33da4cdf75a8@github.com> References: <6nSmTXRBMkyaZ-YoRacPTyUU_3NcASt9q-47e5OO05Q=.59cf3942-a9e1-4a97-8cff-33da4cdf75a8@github.com> Message-ID: On Wed, 16 Oct 2024 18:14:26 GMT, Jatin Bhateja wrote: > Fix failing build. > > Due to notification problem with integration system. > > Thanks, > Jatin This pull request has now been integrated. Changeset: d4f0ba73 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/d4f0ba73f653a3886b17f283b9b6a92db1af52aa Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8342439: Build failure after 8338023 Reviewed-by: liach ------------- PR: https://git.openjdk.org/jdk/pull/21547 From jbhateja at openjdk.org Wed Oct 16 19:04:10 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Oct 2024 19:04:10 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v4] In-Reply-To: <9LrhezvsYwS32PEUN9wn6hKDJJn0wybl3YXSHuohUC8=.eded7969-c03c-4a75-a2c2-2d0e9682722d@github.com> References: <9LrhezvsYwS32PEUN9wn6hKDJJn0wybl3YXSHuohUC8=.eded7969-c03c-4a75-a2c2-2d0e9682722d@github.com> Message-ID: On Wed, 16 Oct 2024 16:28:50 GMT, Sandhya Viswanathan wrote: >> When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. >> Also a regression test case is added accordingly. 
>> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Marked as reviewed by jbhateja (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21480#pullrequestreview-2373423446 From rrich at openjdk.org Wed Oct 16 19:20:14 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 16 Oct 2024 19:20:14 GMT Subject: RFR: 8341862: PPC64: C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 13:51:21 GMT, Richard Reingruber wrote: > Make sure `LIR_Assembler::emit_unwind_handler()` jumps to the slow path directly for unlocking a synchronized method if `LM_MONITOR` is used. > On the fast paths assertions are added that the mode is actually handled. > > The change passed our CI testing: > Tier 1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21497#issuecomment-2417743803 From rrich at openjdk.org Wed Oct 16 19:20:15 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 16 Oct 2024 19:20:15 GMT Subject: Integrated: 8341862: PPC64: C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 13:51:21 GMT, Richard Reingruber wrote: > Make sure `LIR_Assembler::emit_unwind_handler()` jumps to the slow path directly for unlocking a synchronized method if `LM_MONITOR` is used. > On the fast paths assertions are added that the mode is actually handled. > > The change passed our CI testing: > Tier 1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. 
This pull request has now been integrated. Changeset: ed680966 Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/ed6809666b12b0de66f68d5e7e389dde1708aaf3 Stats: 9 lines in 2 files changed: 8 ins; 0 del; 1 mod 8341862: PPC64: C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR Reviewed-by: mdoerr, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/21497 From rrich at openjdk.org Wed Oct 16 19:29:11 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 16 Oct 2024 19:29:11 GMT Subject: RFR: 8342042: PPC64: compiler_fast_unlock_object flags failure instead of success In-Reply-To: References: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> Message-ID: <27ImL-UJuqGoNl2ty7-tV952yEv4ssNKpIHkQ9SYSuE=.1a5fd01b-3a8c-4b97-83e9-82c5476ae02f@github.com> On Wed, 16 Oct 2024 07:55:06 GMT, Richard Reingruber wrote: >> Seems like `compiler_fast_unlock_lightweight_object` could do the same. >> >> It does the same branch, but correctly, by going through a `crorc(CCR0, Assembler::equal, CCR0, Assembler::equal)` in the success path. > >> Seems like `compiler_fast_unlock_lightweight_object` could do the same. >> >> It does the same branch, but correctly, by going through a `crorc(CCR0, Assembler::equal, CCR0, Assembler::equal)` in the success path. > > I think you're right. I wanted to see if [`set_eq_unlocked`](https://github.com/openjdk/jdk/blob/dcac4b0a532f2ca6cb374da7ece331e8266ab351/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp#L3087) could be eliminated and then forgot about it. > @reinrich Thank you for finding and fixing this. Also sorry for not finding this when I tested my PR. Thanks for looking at this @fbredber. Even with the assertions from https://github.com/openjdk/jdk/pull/21494 this didn't cause failures testing jdk 24. So no worries about the testing of your PR. I found the issue working on the PPC port of the vthread monitor support in the loom repo. 
There, an assertion in the runtime failed when the slow path was taken. I don't know why, though. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21496#issuecomment-2417759654 From psandoz at openjdk.org Wed Oct 16 19:47:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 16 Oct 2024 19:47:17 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v25] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 17:29:04 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], the patch adds support for the following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max. >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable only to integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. 
>> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > New IR tests + additional IR transformations Rather than adding more IR test functionality to this PR, which would require additional review, my recommendation would be to follow up in another PR or rethink our approach beforehand. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2417790144 From never at openjdk.org Wed Oct 16 19:53:13 2024 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 16 Oct 2024 19:53:13 GMT Subject: RFR: 8342332: [JVMCI] Export CompilerToVM::Data::dtanh In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 11:44:53 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/pull/20657 adds an x86_64 intrinsic for tanh. Exporting CompilerToVM::Data::dtanh allows the JVMCI compiler to reuse the same HotSpot stub. Looks good. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21535#pullrequestreview-2373529713 From yzheng at openjdk.org Wed Oct 16 20:01:15 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 16 Oct 2024 20:01:15 GMT Subject: RFR: 8342332: [JVMCI] Export CompilerToVM::Data::dtanh In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 11:44:53 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/pull/20657 adds an x86_64 intrinsic for tanh. Exporting CompilerToVM::Data::dtanh allows the JVMCI compiler to reuse the same HotSpot stub. Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21535#issuecomment-2417826824 From yzheng at openjdk.org Wed Oct 16 20:01:16 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 16 Oct 2024 20:01:16 GMT Subject: Integrated: 8342332: [JVMCI] Export CompilerToVM::Data::dtanh In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 11:44:53 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/pull/20657 adds x86_64 intrinsic for tanh. Exporting CompilerToVM::Data::dtanh allows JVMCI compiler to reuse the same HotSpot stub. This pull request has now been integrated. Changeset: 28538524 Author: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/285385247aaa262866697ed848040f05f4d94988 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8342332: [JVMCI] Export CompilerToVM::Data::dtanh Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/21535 From kvn at openjdk.org Wed Oct 16 20:12:21 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 16 Oct 2024 20:12:21 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching [v3] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 06:26:47 GMT, Tobias Hartmann wrote: >> C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. >> >> Now the problem is that there can be multiple patch sides in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. 
Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 >> >> While another `Thread B` is done with patching another patch side and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC. `Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 >> >> In short: >> - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. >> - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of an `mov` in the patch body because the store is not atomic. >> - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. >> >> Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. >> I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity. >> >> The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 >> >> I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) t... > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Remove Patching_lock Looks good. 
------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21389#pullrequestreview-2373570233 From duke at openjdk.org Wed Oct 16 20:29:09 2024 From: duke at openjdk.org (hanklo6) Date: Wed, 16 Oct 2024 20:29:09 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v3] In-Reply-To: References: Message-ID: > Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
> > ### Generate test instructions > With `binutils = 2.43` > * `python3 x86-asmtest.py > asmtest.out.h` > ### Run test > * `make test TEST="gtest:AssemblerX86"` hanklo6 has updated the pull request incrementally with one additional commit since the last revision: Add comment and defined ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20857/files - new: https://git.openjdk.org/jdk/pull/20857/files/766582d8..e6b4abd8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20857&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20857&range=01-02 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20857.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20857/head:pull/20857 PR: https://git.openjdk.org/jdk/pull/20857 From duke at openjdk.org Wed Oct 16 20:29:10 2024 From: duke at openjdk.org (hanklo6) Date: Wed, 16 Oct 2024 20:29:10 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v2] In-Reply-To: References: <8iiMADD8HDJ8-b_SluuLPn9795YmnKYiyQgw5OybGtY=.357a6919-be67-43e5-bc24-6431cf824084@github.com> Message-ID: On Wed, 9 Oct 2024 22:14:29 GMT, Vladimir Kozlov wrote: >> hanklo6 has updated the pull request incrementally with one additional commit since the last revision: >> >> Add copyright header > > test/hotspot/gtest/x86/test_assemblerx86.cpp line 26: > >> 24: #include "precompiled.hpp" >> 25: >> 26: #if defined(X86) > > You may add ` && !defined(ZERO)` similar to `test_assembler_aarch64.cpp` test. Thanks, done. > test/hotspot/gtest/x86/test_assemblerx86.cpp line 93: > >> 91: address entry = __ pc(); >> 92: >> 93: // python x86-asmtest.py | expand > asmtest.out.h > > The PR description shows different instructions to build: > > With binutils = 2.43 > python3 x86-asmtest.py > asmtest.out.h > > > I would like to have comment with correct and detailed instructions how to build `asmtest.out.h` Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1803755497 PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1803755724 From duke at openjdk.org Wed Oct 16 20:43:10 2024 From: duke at openjdk.org (hanklo6) Date: Wed, 16 Oct 2024 20:43:10 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v2] In-Reply-To: References: <8iiMADD8HDJ8-b_SluuLPn9795YmnKYiyQgw5OybGtY=.357a6919-be67-43e5-bc24-6431cf824084@github.com> Message-ID: <0gJXjGw1Z4Bzh-jAn7foyanvzdckcJ6nWTc3GEmDZFU=.f71c7632-de09-431f-ac71-cf5806b045ae@github.com> On Wed, 9 Oct 2024 22:15:23 GMT, Vladimir Kozlov wrote: >> hanklo6 has updated the pull request incrementally with one additional commit since the last revision: >> >> Add copyright header > > Is this test for both 32- and 64-bits instructions/VMs? > > How complete the set of instructions covered by the test? @vnkozlov Yes, it tests both 32-bit and 64-bit instructions. This tool currently covers all legacy instructions in map0 and map1 that have explicit GPR or memory operands. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20857#issuecomment-2417910830 From duke at openjdk.org Wed Oct 16 21:10:13 2024 From: duke at openjdk.org (hanklo6) Date: Wed, 16 Oct 2024 21:10:13 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v2] In-Reply-To: References: <8iiMADD8HDJ8-b_SluuLPn9795YmnKYiyQgw5OybGtY=.357a6919-be67-43e5-bc24-6431cf824084@github.com> Message-ID: On Wed, 9 Oct 2024 22:15:23 GMT, Vladimir Kozlov wrote: >> hanklo6 has updated the pull request incrementally with one additional commit since the last revision: >> >> Add copyright header > > Is this test for both 32- and 64-bits instructions/VMs? > > How complete the set of instructions covered by the test? @vnkozlov Please let me know if I need to make any other change. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20857#issuecomment-2417966605 From duke at openjdk.org Wed Oct 16 21:13:57 2024 From: duke at openjdk.org (hanklo6) Date: Wed, 16 Oct 2024 21:13:57 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v4] In-Reply-To: References: Message-ID: <_tdMDW6ViTZsDgks0k2uWDHJUum3VL13PX0piUiQBtk=.4a06f43d-037f-4725-869d-287813bc8750@github.com> > Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > ### Generate test instructions > With `binutils = 2.43` > * `python3 x86-asmtest.py > asmtest.out.h` > ### Run test > * `make test TEST="gtest:AssemblerX86"` hanklo6 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: - Merge branch 'master' of https://git.openjdk.java.net/jdk into apx-test-tool - Add comment and defined - Add copyright header - Remove tab - Remove whitespace - Replace whitespace with tab - Add flag before testing - Fix assertion error on MacOS - Add _LP64 flag - Add missing header - ... 
and 6 more: https://git.openjdk.org/jdk/compare/6194359b...ca48f240 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20857/files - new: https://git.openjdk.org/jdk/pull/20857/files/e6b4abd8..ca48f240 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20857&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20857&range=02-03 Stats: 236533 lines in 2002 files changed: 213511 ins; 11992 del; 11030 mod Patch: https://git.openjdk.org/jdk/pull/20857.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20857/head:pull/20857 PR: https://git.openjdk.org/jdk/pull/20857 From svkamath at openjdk.org Wed Oct 16 23:13:45 2024 From: svkamath at openjdk.org (Smita Kamath) Date: Wed, 16 Oct 2024 23:13:45 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v5] In-Reply-To: References: Message-ID: > Hi, I want to submit an optimization for SHA-512 algorithm using SHA instructions (sha512msg1, sha512msg2 and sha512rnds2) . Kindly review the code and provide feedback. Thank you. Smita Kamath has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: - Merge branch 'master' of https://git.openjdk.org/jdk into sha-512 - Updated code as per review comments - Addressed a review comment - Updated code as per review comment & updated test case - Updated AMD64.java - Merge master - SHA-512 implementation using SHA-NI instructions ------------- Changes: https://git.openjdk.org/jdk/pull/20633/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20633&range=04 Stats: 271 lines in 10 files changed: 252 ins; 11 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20633.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20633/head:pull/20633 PR: https://git.openjdk.org/jdk/pull/20633 From dlong at openjdk.org Wed Oct 16 23:14:13 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 16 Oct 2024 23:14:13 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching [v3] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 06:26:47 GMT, Tobias Hartmann wrote: >> C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. >> >> Now the problem is that there can be multiple patch sides in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. 
Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 >> >> While another `Thread B` is done with patching another patch side and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC. `Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 >> >> In short: >> - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. >> - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of an `mov` in the patch body because the store is not atomic. >> - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. >> >> Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. >> I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity. >> >> The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 >> >> I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) t... 
> > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Remove Patching_lock Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21389#pullrequestreview-2373835987 From kvn at openjdk.org Wed Oct 16 23:53:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 16 Oct 2024 23:53:15 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v4] In-Reply-To: <_tdMDW6ViTZsDgks0k2uWDHJUum3VL13PX0piUiQBtk=.4a06f43d-037f-4725-869d-287813bc8750@github.com> References: <_tdMDW6ViTZsDgks0k2uWDHJUum3VL13PX0piUiQBtk=.4a06f43d-037f-4725-869d-287813bc8750@github.com> Message-ID: On Wed, 16 Oct 2024 21:13:57 GMT, hanklo6 wrote: >> Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. >> >> ### Generate test instructions >> With `binutils = 2.43` >> * `python3 x86-asmtest.py > asmtest.out.h` >> ### Run test >> * `make test TEST="gtest:AssemblerX86"` > > hanklo6 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 16 additional commits since the last revision: > > - Merge branch 'master' of https://git.openjdk.java.net/jdk into apx-test-tool > - Add comment and defined > - Add copyright header > - Remove tab > - Remove whitespace > - Replace whitespace with tab > - Add flag before testing > - Fix assertion error on MacOS > - Add _LP64 flag > - Add missing header > - ... and 6 more: https://git.openjdk.org/jdk/compare/1ffcb678...ca48f240 Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20857#pullrequestreview-2373870962 From jrose at openjdk.org Thu Oct 17 00:29:24 2024 From: jrose at openjdk.org (John R Rose) Date: Thu, 17 Oct 2024 00:29:24 GMT Subject: RFR: 8276162: Optimise unsigned comparison pattern [v4] In-Reply-To: References: Message-ID: On Sat, 13 Nov 2021 05:22:07 GMT, Quan Anh Mai wrote: >> This patch changes operations in the form `x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE`, which is a pattern used to do unsigned comparisons, into `x u<=> y`. >> >> In addition to being basic operations, they may be utilised to implement range checks such as the methods in `jdk.internal.util.Preconditions`, or in places where the compiler cannot deduce the non-negativeness of the bound as in `java.util.ArrayList`. >> >> Thank you very much. 
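Editorial illustration of the idiom named in the quoted description above: a minimal sketch, assuming plain `int` operands, of why comparing `x + Integer.MIN_VALUE` with `y + Integer.MIN_VALUE` gives the unsigned ordering of `x` and `y`. The class and method names are hypothetical and do not come from the patch; `Integer.compareUnsigned` is the standard library reference used for checking. The subtraction and xor variants are included because this thread also mentions them as equivalent ways of flipping the sign bit.

```java
// Hypothetical demo class; not part of the PR under review.
class UnsignedCompareIdiom {
    // Adding MIN_VALUE flips the sign bit (wrapping arithmetic), so a signed
    // compare of the results orders the inputs as unsigned values.
    static boolean lessViaAdd(int x, int y) {
        return x + Integer.MIN_VALUE < y + Integer.MIN_VALUE;
    }

    // Subtracting MIN_VALUE flips the same bit modulo 2^32.
    static boolean lessViaSub(int x, int y) {
        return x - Integer.MIN_VALUE < y - Integer.MIN_VALUE;
    }

    // Xor with MIN_VALUE flips the sign bit directly.
    static boolean lessViaXor(int x, int y) {
        return (x ^ Integer.MIN_VALUE) < (y ^ Integer.MIN_VALUE);
    }

    // Reference result from the standard library.
    static boolean lessUnsigned(int x, int y) {
        return Integer.compareUnsigned(x, y) < 0;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, -42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            for (int y : samples) {
                boolean expected = lessUnsigned(x, y);
                if (lessViaAdd(x, y) != expected
                        || lessViaSub(x, y) != expected
                        || lessViaXor(x, y) != expected) {
                    throw new AssertionError("mismatch for x=" + x + ", y=" + y);
                }
            }
        }
        System.out.println("all idioms agree with Integer.compareUnsigned");
    }
}
```

The sketch only demonstrates the source-level equivalence; whether and how C2 rewrites each form is determined by the patch itself, not by this example.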
> > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - add tests cover constant comparison and calling library > - add eq/ne, add correction test, refine micro More completely, it is x +-^ Integer.MIN_VALUE That is, add, sub, xor are all legitimate idioms for flipping the sign ------------- PR Comment: https://git.openjdk.org/jdk/pull/6101#issuecomment-2418223166 From dqu at openjdk.org Thu Oct 17 02:33:58 2024 From: dqu at openjdk.org (Daohan Qu) Date: Thu, 17 Oct 2024 02:33:58 GMT Subject: RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi [v6] In-Reply-To: References: Message-ID: > # Description > > [JDK-6934604](https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194) introduces the flag `AggressiveUnboxing` in jdk8u, and [JDK-8217919](https://github.com/openjdk/jdk/commit/71759e3177fcd6926bb36a30a26f9f3976f2aae8) enables it by default in jdk13u. > > But it seems that JDK-6934604 forgets to check duplicate `PhiNode` generated in the function `LoadNode::split_through_phi(PhaseGVN *phase)` (in `memnode.cpp`) in the case that `base` is phi but `mem` is not phi. More exactly, `LoadNode::Identity(PhaseTransform *phase)` doesn't search for `PhiNode` in the correct region in that case. > > This might cause infinite split in the case of a loop, which is similar to the bugs fixed in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7). The infinite split results in "Out of nodes" and make the method "not compilable". > > Since JDK-8217919 (in jdk13u), all the later versions of jdks are affected by this bug when the expected optimization pattern appears in the code. 
For example, the following three micro-benchmarks, run with > > > make test \ > TEST="micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef" \ > TEST_OPTS="VM_OPTIONS=-XX:+UseParallelGC" > > > show a performance improvement after this PR is applied. (`-XX:+UseParallelGC` is only needed to reproduce this bug; all the benchmarks in the following table are run with this option.) > > |benchmark (throughput, unit: ops/s)| jdk-before-this-patch | jdk-after-this-patch | > |---|---|---| > |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner | 26.678 ±(99.9%) 0.574 ops/s | 55.692 ±(99.9%) 4.419 ops/s | > |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda | 26.792 ±(99.9%) 1.924 ops/s | 64.882 ±(99.9%) 4.175 ops/s | > |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef | 27.023 ±(99.9%) 1.116 ops/s | 66.313 ±(99.9%) 0.802 ops/s | > > # Reproduction > > Compile and run the reduced test case `Test.java` in the appendix below using > > > java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=comp.log -XX:+UseParallelGC Test > > > and you will find that `Test$Obj.calc` is tagged with `make_not_compilable` and see some output like > > > " > > > And when `-XX:+AbortVMOn... 
Daohan Qu has updated the pull request incrementally with one additional commit since the last revision: Bug fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21134/files - new: https://git.openjdk.org/jdk/pull/21134/files/2b2c7d5b..0afff42d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21134&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21134&range=04-05 Stats: 8 lines in 1 file changed: 2 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21134.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21134/head:pull/21134 PR: https://git.openjdk.org/jdk/pull/21134 From vlivanov at openjdk.org Thu Oct 17 02:39:14 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 17 Oct 2024 02:39:14 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v4] In-Reply-To: <0h3TEfNHFDT6g59BudHi_SJHLoxyhY3rDMirzHviNgY=.b85485e1-effd-4644-9a9b-e023ed6e4305@github.com> References: <0h3TEfNHFDT6g59BudHi_SJHLoxyhY3rDMirzHviNgY=.b85485e1-effd-4644-9a9b-e023ed6e4305@github.com> Message-ID: On Wed, 16 Oct 2024 07:48:35 GMT, Tobias Hartmann wrote: >> After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`: >> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723 >> >> Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` because the type is not a primitive array type: >> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971 >> >> -> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 >> >> The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. 
>> >> Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded: >> https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979 >> >> I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > More cleanups Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21470#pullrequestreview-2374016859 From jbhateja at openjdk.org Thu Oct 17 02:41:21 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Oct 2024 02:41:21 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v4] In-Reply-To: <_tdMDW6ViTZsDgks0k2uWDHJUum3VL13PX0piUiQBtk=.4a06f43d-037f-4725-869d-287813bc8750@github.com> References: <_tdMDW6ViTZsDgks0k2uWDHJUum3VL13PX0piUiQBtk=.4a06f43d-037f-4725-869d-287813bc8750@github.com> Message-ID: On Wed, 16 Oct 2024 21:13:57 GMT, hanklo6 wrote: >> Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. 
For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. >> >> ### Generate test instructions >> With `binutils = 2.43` >> * `python3 x86-asmtest.py > asmtest.out.h` >> ### Run test >> * `make test TEST="gtest:AssemblerX86"` > > hanklo6 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - Merge branch 'master' of https://git.openjdk.java.net/jdk into apx-test-tool > - Add comment and defined > - Add copyright header > - Remove tab > - Remove whitespace > - Replace whitespace with tab > - Add flag before testing > - Fix assertion error on MacOS > - Add _LP64 flag > - Add missing header > - ... and 6 more: https://git.openjdk.org/jdk/compare/8405335e...ca48f240 test/hotspot/gtest/x86/x86-asmtest.py line 655: > 653: } > 654: > 655: for RegOp, ops in instruction_set.items(): Rest of the code is modular, can you kindly refactor following code into a top level routine called from __main__ method. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1804008474 From jbhateja at openjdk.org Thu Oct 17 02:41:21 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Oct 2024 02:41:21 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v4] In-Reply-To: References: <_tdMDW6ViTZsDgks0k2uWDHJUum3VL13PX0piUiQBtk=.4a06f43d-037f-4725-869d-287813bc8750@github.com> Message-ID: On Thu, 17 Oct 2024 02:06:05 GMT, Jatin Bhateja wrote: >> hanklo6 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: >> >> - Merge branch 'master' of https://git.openjdk.java.net/jdk into apx-test-tool >> - Add comment and defined >> - Add copyright header >> - Remove tab >> - Remove whitespace >> - Replace whitespace with tab >> - Add flag before testing >> - Fix assertion error on MacOS >> - Add _LP64 flag >> - Add missing header >> - ... and 6 more: https://git.openjdk.org/jdk/compare/8405335e...ca48f240 > > test/hotspot/gtest/x86/x86-asmtest.py line 655: > >> 653: } >> 654: >> 655: for RegOp, ops in instruction_set.items(): > > Rest of the code is modular, can you kindly refactor following code into a top level routine called from __main__ method. The Binutils toolset is specific to Linux. 
Please add relevant OS.name check and exit ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1804009409 From vlivanov at openjdk.org Thu Oct 17 02:43:13 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 17 Oct 2024 02:43:13 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v3] In-Reply-To: References: Message-ID: On Tue, 15 Oct 2024 08:12:20 GMT, Tobias Hartmann wrote: > Right, to be on the safe side, I could add a str.is_dynamic_constant() check to limit the trap to condy. What do you think? No need to. It makes sense to improve ciTypeFlow analysis for all not-yet-loaded cases. Probably, additional test cases for such cases would help, but I'd expect existing tests to provide enough coverage. So, up to you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21470#issuecomment-2418373855 From thartmann at openjdk.org Thu Oct 17 05:06:24 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 17 Oct 2024 05:06:24 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching [v3] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 06:26:47 GMT, Tobias Hartmann wrote: >> C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. >> >> Now the problem is that there can be multiple patch sides in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. 
Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 >> >> While another `Thread B` is done with patching another patch side and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC. `Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 >> >> In short: >> - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. >> - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of an `mov` in the patch body because the store is not atomic. >> - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. >> >> Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. >> I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity. >> >> The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 >> >> I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) t... 
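The alignment argument in the description above can be illustrated with a little address arithmetic. This is a sketch under stated assumptions, not HotSpot code: the 64-byte cache-line size and the helper names `align_up`, `crosses_cache_line`, and `padding_before_immediate` are chosen only for illustration.

```python
CACHE_LINE = 64  # assumed cache-line size in bytes
OOP_SIZE = 8     # size of a 64-bit oop immediate in bytes


def align_up(addr, alignment=OOP_SIZE):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def crosses_cache_line(addr, size=OOP_SIZE, line=CACHE_LINE):
    """True if the bytes [addr, addr + size) span two cache lines."""
    return addr // line != (addr + size - 1) // line


def padding_before_immediate(imm_addr):
    """Bytes of padding needed so the immediate starts 8-byte aligned."""
    return align_up(imm_addr) - imm_addr


# An immediate at offset 60 straddles the boundary at 64 ...
print(crosses_cache_line(60))
# ... while an 8-byte-aligned immediate never straddles a 64-byte line.
print(any(crosses_cache_line(a) for a in range(0, 4096, 8)))
```

The point of the sketch is only that an 8-byte value starting at a multiple of 8 always fits inside one cache line, which is the property the proposed alignment in `PatchingStub::emit_code` relies on.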
> > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Remove Patching_lock Thanks for the reviews, Vladimir and Dean! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2418504320 From thartmann at openjdk.org Thu Oct 17 05:06:24 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 17 Oct 2024 05:06:24 GMT Subject: Integrated: 8340313: Crash due to invalid oop in nmethod after C1 patching In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 13:39:28 GMT, Tobias Hartmann wrote: > C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. > > Now the problem is that there can be multiple patch sides in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 > > While another `Thread B` is done with patching another patch side and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC. 
`Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 > > In short: > - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. > - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of an `mov` in the patch body because the store is not atomic. > - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. > > Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. > I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity. > > The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 > > I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remo... This pull request has now been integrated. 
Changeset: 58d39c31 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/58d39c317e332fda994f66529fcd1a0ea0e53151 Stats: 167 lines in 10 files changed: 145 ins; 7 del; 15 mod 8340313: Crash due to invalid oop in nmethod after C1 patching Reviewed-by: tschatzl, kvn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/21389 From chagedorn at openjdk.org Thu Oct 17 05:12:14 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 17 Oct 2024 05:12:14 GMT Subject: Integrated: 8341328: Refactor initial Assertion Predicate creation into separate classes In-Reply-To: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> References: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> Message-ID: <6oRBL0iQhdITH-Sz-fszWyjk1Ok9U4ugQHRNeK1w-KQ=.eb85628d-b16a-4faf-917e-aa96a8c842b8@github.com> On Thu, 10 Oct 2024 09:04:21 GMT, Christian Hagedorn wrote: > This PR refactors the initial Assertion Predicate creation (i.e. when initially creating them, not when copy/copy-updating them from existing Template Assertion Predicates). > > The patch includes the following changes: > - `PhaseIdealLoop::add_template_assertion_predicate()`, `add_range_check_elimination_assertion_predicate()`and the preparation code to call it, and `clone_template_assertion_predicate()` have similar code. I tried to share the common bits with new classes: > - `TemplateAssertionPredicateCreator`: Creates a new Template Assertion Predicate either with an UCT (done in Loop Predication) or a Halt node (done in Range Check Elimination). > - `InitializedAssertionPredicateCreator`: Creates a new Initialized Assertion Predicate with a Halt Node. This is an existing class which provided a method to clone a Template Assertion Predicate expression and create a new Initialized Assertion Predicate with it. Now it's extended to create one without an existing template. 
> - `AssertionPredicateIfCreator`: Used by both classes above and also by `clone_template_assertion_predicate()` (it clones the Assertion Predicate expression first and then just needs to create the `If`) > - `AssertionPredicateExpressionCreator`: Create a new Assertion Predicate expression, either with an `Opaque4` (for Template Assertion Predicates) or an `OpaqueInitializedAssertionPredicate` (for Initialized Assertion Predicates). > - Some renaming to get more consistency (e.g. use `new_control` instead of `control` or `new_ctrl`) > - Adding new `AssertionPredicateType::FinalIv` which was missed to account for in [JDK-8335393](https://bugs.openjdk.org/browse/JDK-8335393) where a new Initialized Assertion Predicate was added for the final IV in Range Check Elimination for a special case when removing an empty main loop. > > Thanks, > Christian This pull request has now been integrated. Changeset: 22a1feea Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/22a1feea7484c9d640eeac22943d237a0e549942 Stats: 529 lines in 6 files changed: 302 ins; 118 del; 109 mod 8341328: Refactor initial Assertion Predicate creation into separate classes Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21446 From thartmann at openjdk.org Thu Oct 17 05:17:09 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 17 Oct 2024 05:17:09 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v4] In-Reply-To: <0h3TEfNHFDT6g59BudHi_SJHLoxyhY3rDMirzHviNgY=.b85485e1-effd-4644-9a9b-e023ed6e4305@github.com> References: <0h3TEfNHFDT6g59BudHi_SJHLoxyhY3rDMirzHviNgY=.b85485e1-effd-4644-9a9b-e023ed6e4305@github.com> Message-ID: On Wed, 16 Oct 2024 07:48:35 GMT, Tobias Hartmann wrote: >> After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`: >> >> 
https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723 >> >> Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` because the type is not a primitive array type: >> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971 >> >> -> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 >> >> The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. >> >> Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded: >> https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979 >> >> I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > More cleanups Thanks for the review Vladimir. I think coverage of existing tests is sufficient. I did quite a few experiments and even running `java -version` with `-Xcomp -XX:-TieredCompilation` triggers such cases. It's mostly constant `.class` references to not-yet-loaded classes. 
I verified manually that we would already bail out during parsing in these cases even without my changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21470#issuecomment-2418518881 From chagedorn at openjdk.org Thu Oct 17 05:28:17 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 17 Oct 2024 05:28:17 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v2] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 15:15:54 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>>
>>                                                       Baseline                       Patch
>> Benchmark                                 Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
>> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
>> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
>> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
>> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
>> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
>> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)
>>
>> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!
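The two simplifications quoted above, together with their `Min` duals, can be checked by brute force over a small range of integers. The sketch below is only a sanity check of the algebra and says nothing about how C2 implements the `Identity` transformations; the helper name `absorption_holds` is made up for illustration.

```python
from itertools import product


def absorption_holds(values):
    """Check the Min/Max simplifications for every pair drawn from values."""
    for a, b in product(values, repeat=2):
        # Max(A, Max(A, B)) == Max(A, B)
        if max(a, max(a, b)) != max(a, b):
            return False
        # Max(A, Min(A, B)) == A
        if max(a, min(a, b)) != a:
            return False
        # Dual forms, assumed to follow from the same lattice rules.
        if min(a, min(a, b)) != min(a, b):
            return False
        if min(a, max(a, b)) != a:
            return False
    return True


print(absorption_holds(range(-5, 6)))
```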
> > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Suggestions from review Testing passed! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21439#issuecomment-2418529527 From amitkumar at openjdk.org Thu Oct 17 06:43:42 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 17 Oct 2024 06:43:42 GMT Subject: RFR: 8342409: [s390x] C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR Message-ID: Make sure LIR_Assembler::emit_unwind_handler() jumps to the slow path directly for unlocking a synchronized method if LM_MONITOR is used. On the fast paths assertions are added that the mode is actually handled. Testing: Tier1 test for fastdebug vm showed no regression. ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/21557/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21557&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342409 Stats: 9 lines in 2 files changed: 8 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21557.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21557/head:pull/21557 PR: https://git.openjdk.org/jdk/pull/21557 From lucy at openjdk.org Thu Oct 17 07:13:17 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 17 Oct 2024 07:13:17 GMT Subject: RFR: 8341715: PPC64: ObjectMonitor::_owner should be reset unconditionally in nmethod unlocking In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 13:14:52 GMT, Richard Reingruber wrote: > This removes the `ObjectMonitor::_owner` check when a nmethod unlocks an inflated monitor on ppc64. > Monitor operations by nmethods are guaranteed to be balanced (see JBS-item for a reference) therefore the check is redundant. Other platforms don't have it either. > > I've removed the assertion that the unlocking thread owns the monitor again because it won't work with vthread monitor support in the loom repository. 
> > The fix passed our CI testing with `LockingMode` set to `LM_LEGACY` > Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. > > The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with `LM_LEGACY`. LGTM. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21494#pullrequestreview-2374347279 From rrich at openjdk.org Thu Oct 17 07:22:18 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 17 Oct 2024 07:22:18 GMT Subject: RFR: 8342042: PPC64: compiler_fast_unlock_object flags failure instead of success [v2] In-Reply-To: References: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> Message-ID: On Wed, 16 Oct 2024 09:19:48 GMT, Richard Reingruber wrote: >> This change inverts the `EQ` bit in `flag` with a condition register nand instruction in order to meet the post condition given at L2751-L2752: >> >> >> // flag == EQ indicates success, decrement held monitor count >> // flag == NE indicates failure >> >> >> The fix passed our CI testing with LockingMode set to LM_LEGACY >> Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. >> Testing was done on the main platforms and also on Linux/PPC64le and AIX. >> >> The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with LM_LEGACY. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Same enhancement for compiler_fast_unlock_lightweight_object as suggested by Axel Thanks again for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21496#issuecomment-2418756602 From rrich at openjdk.org Thu Oct 17 07:22:19 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 17 Oct 2024 07:22:19 GMT Subject: Integrated: 8342042: PPC64: compiler_fast_unlock_object flags failure instead of success In-Reply-To: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> References: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> Message-ID: On Mon, 14 Oct 2024 13:29:43 GMT, Richard Reingruber wrote: > This change inverts the `EQ` bit in `flag` with a condition register nand instruction in order to meet the post condition given at L2751-L2752: > > > // flag == EQ indicates success, decrement held monitor count > // flag == NE indicates failure > > > The fix passed our CI testing with LockingMode set to LM_LEGACY > Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. > > The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with LM_LEGACY. This pull request has now been integrated. 
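For readers unfamiliar with the trick described in the quoted change: a condition-register NAND whose two source operands name the same bit acts as a NOT of that bit. The sketch below models a single CR bit as a Python integer; the function names are illustrative, and the assumption that both sources are the `EQ` bit is inferred from the word "inverts" in the description rather than taken from the patch.

```python
def crnand(ba, bb):
    """Model of a one-bit NAND: result is 1 unless both inputs are 1."""
    return 1 - (ba & bb)


def invert_bit(bit):
    """NAND of a bit with itself yields its complement."""
    return crnand(bit, bit)


for eq in (0, 1):
    print(eq, "->", invert_bit(eq))
```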
Changeset: fa39e84d Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/fa39e84d64d79f6c66f98110e98d2562f35681e1 Stats: 16 lines in 1 file changed: 4 ins; 8 del; 4 mod 8342042: PPC64: compiler_fast_unlock_object flags failure instead of success Reviewed-by: mdoerr, aboldtch, fbredberg ------------- PR: https://git.openjdk.org/jdk/pull/21496 From rrich at openjdk.org Thu Oct 17 07:24:23 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 17 Oct 2024 07:24:23 GMT Subject: RFR: 8341715: PPC64: ObjectMonitor::_owner should be reset unconditionally in nmethod unlocking In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 13:14:52 GMT, Richard Reingruber wrote: > This removes the `ObjectMonitor::_owner` check when a nmethod unlocks an inflated monitor on ppc64. > Monitor operations by nmethods are guaranteed to be balanced (see JBS-item for a reference) therefore the check is redundant. Other platforms don't have it either. > > I've removed the assertion that the unlocking thread owns the monitor again because it won't work with vthread monitor support in the loom repository. > > The fix passed our CI testing with `LockingMode` set to `LM_LEGACY` > Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. > > The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with `LM_LEGACY`. Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21494#issuecomment-2418759878 From rrich at openjdk.org Thu Oct 17 07:24:23 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 17 Oct 2024 07:24:23 GMT Subject: Integrated: 8341715: PPC64: ObjectMonitor::_owner should be reset unconditionally in nmethod unlocking In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 13:14:52 GMT, Richard Reingruber wrote: > This removes the `ObjectMonitor::_owner` check when a nmethod unlocks an inflated monitor on ppc64. > Monitor operations by nmethods are guaranteed to be balanced (see JBS-item for a reference) therefore the check is redundant. Other platforms don't have it either. > > I've removed the assertion that the unlocking thread owns the monitor again because it won't work with vthread monitor support in the loom repository. > > The fix passed our CI testing with `LockingMode` set to `LM_LEGACY` > Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. > > The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with `LM_LEGACY`. This pull request has now been integrated. 
Changeset: f9208fad Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/f9208fadde8141e18a025ddb6ce28423861ba391 Stats: 33 lines in 1 file changed: 26 ins; 7 del; 0 mod 8341715: PPC64: ObjectMonitor::_owner should be reset unconditionally in nmethod unlocking Reviewed-by: mdoerr, lucy ------------- PR: https://git.openjdk.org/jdk/pull/21494 From rcastanedalo at openjdk.org Thu Oct 17 07:35:12 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 17 Oct 2024 07:35:12 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v7] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 07:53:30 GMT, Antón Seoane wrote: >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. >> >> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter.
>> >> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. > > Antón Seoane has updated the pull request incrementally with two additional commits since the last revision: > > - Format > - Review comments applied Looks good, thanks for doing this! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21383#pullrequestreview-2374393267 From duke at openjdk.org Thu Oct 17 07:45:51 2024 From: duke at openjdk.org (Antón Seoane) Date: Thu, 17 Oct 2024 07:45:51 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v8] In-Reply-To: References: Message-ID: > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. > > To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. > > The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter.
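The inclusion rule described in the PR text can be sketched as a subset test on tag sets. Everything below is a toy model under stated assumptions rather than the UL implementation: the table layout, the function name `effective_decorators`, and the choice that a matching entry removes all three default decorators are hypothetical.

```python
DEFAULT_DECORATORS = ("uptime", "level", "tags")

# Hypothetical table of disabled defaults: (tag set, level or None for any).
DISABLED_DEFAULTS = [
    (frozenset({"jit"}), None),
]


def effective_decorators(selection_tags, level, user_decorators=None,
                         table=DISABLED_DEFAULTS):
    """Return the decorators to print for one -Xlog selection.

    User-supplied decorators always win. Otherwise, if some table entry's
    tags are all included in the selection (and its level matches, when
    one is given), the default decorators are dropped.
    """
    if user_decorators is not None:
        return tuple(user_decorators)
    tags = frozenset(selection_tags)
    for default_tags, default_level in table:
        if default_tags <= tags and default_level in (None, level):
            return ()
    return DEFAULT_DECORATORS


# -Xlog:jit+compilation includes the tag "jit", so the entry applies.
print(effective_decorators({"jit", "compilation"}, "debug"))
# An unrelated selection keeps the usual three decorators.
print(effective_decorators({"gc"}, "info"))
```

In this model an entry with a non-`None` level only matches selections at exactly that level, which is one possible reading of "defaults may target a specific log level".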
> > This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: Fix unused using, update copyright years ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21383/files - new: https://git.openjdk.org/jdk/pull/21383/files/3f85b901..3bb0044b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=06-07 Stats: 7 lines in 7 files changed: 0 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21383/head:pull/21383 PR: https://git.openjdk.org/jdk/pull/21383 From jsjolen at openjdk.org Thu Oct 17 07:45:51 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 17 Oct 2024 07:45:51 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v8] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 07:42:46 GMT, Antón Seoane wrote: >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied.
Additionally, defaults may target a specific log level. >> >> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. >> >> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. > > Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: > > Fix unused using, update copyright years Still LGTM, thank you. ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21383#pullrequestreview-2374410349 From aboldtch at openjdk.org Thu Oct 17 07:45:51 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 17 Oct 2024 07:45:51 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v8] In-Reply-To: References: Message-ID: <6bRWUtLvuHPs397_9Uwy5cR68SD4HmadYZym9Vofp6w=.b27dde92-7a6a-491a-b3df-db684caefbc3@github.com> On Thu, 17 Oct 2024 07:42:46 GMT, Antón Seoane wrote: >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults.
Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. >> >> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. >> >> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. > > Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: > > Fix unused using, update copyright years Marked as reviewed by aboldtch (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21383#pullrequestreview-2374413569 From rcastanedalo at openjdk.org Thu Oct 17 08:00:14 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 17 Oct 2024 08:00:14 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v8] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 07:45:51 GMT, Antón Seoane wrote: >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL.
These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level.
>>
>> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter.
>>
>> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket.
>
> Antón Seoane has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix unused using, update copyright years

Marked as reviewed by rcastanedalo (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/21383#pullrequestreview-2374448073

From duke at openjdk.org Thu Oct 17 09:21:25 2024 From: duke at openjdk.org (Antón Seoane) Date: Thu, 17 Oct 2024 09:21:25 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v8] In-Reply-To: References: Message-ID:

On Thu, 17 Oct 2024 07:45:51 GMT, Antón Seoane wrote:

>> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient.
>>
>> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level.
>>
>> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter.
>>
>> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket.
>
> Antón Seoane has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix unused using, update copyright years

Thanks to all for your input on this!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21383#issuecomment-2419000802

From duke at openjdk.org Thu Oct 17 09:21:27 2024 From: duke at openjdk.org (Antón Seoane) Date: Thu, 17 Oct 2024 09:21:27 GMT Subject: Integrated: 8341622: Tag-specific disabled default decorators for UnifiedLogging In-Reply-To: References: Message-ID:

On Mon, 7 Oct 2024 08:56:40 GMT, Antón Seoane wrote:

> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient.
>
> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level.
>
> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter.
>
> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket.

This pull request has now been integrated.

Changeset: 9bdface1
Author: Antón Seoane Ampudia
Committer: Johan Sjölen
URL: https://git.openjdk.org/jdk/commit/9bdface14719d53f40a6572f1c3d4b816c32438b
Stats: 301 lines in 9 files changed: 287 ins; 2 del; 12 mod

8341622: Tag-specific disabled default decorators for UnifiedLogging

Reviewed-by: jsjolen, rcastanedalo, aboldtch

-------------

PR: https://git.openjdk.org/jdk/pull/21383

From amitkumar at openjdk.org Thu Oct 17 09:50:44 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 17 Oct 2024 09:50:44 GMT Subject: RFR: 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long Message-ID:

Add match rules for UDivI, UModI, UDivL, and UModL. Also adds the `dlr` and `dlgr` instructions.

Tier1 tests are clean for the fastdebug VM.

Without patch:

Benchmark                                  (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score     Error  Units
IntegerDivMod.testDivideRemainderUnsigned           1024          mixed  avgt   15  1935.176 ±   2.191  ns/op
IntegerDivMod.testDivideRemainderUnsigned           1024       positive  avgt   15  1934.915 ±   3.207  ns/op
IntegerDivMod.testDivideRemainderUnsigned           1024       negative  avgt   15  1934.325 ±   1.108  ns/op
IntegerDivMod.testDivideUnsigned                    1024          mixed  avgt   15  1809.782 ±  49.341  ns/op
IntegerDivMod.testDivideUnsigned                    1024       positive  avgt   15  1769.326 ±   2.607  ns/op
IntegerDivMod.testDivideUnsigned                    1024       negative  avgt   15  1784.053 ±  71.190  ns/op
IntegerDivMod.testRemainderUnsigned                 1024          mixed  avgt   15  2026.978 ±   1.534  ns/op
IntegerDivMod.testRemainderUnsigned                 1024       positive  avgt   15  2028.039 ±   3.812  ns/op
IntegerDivMod.testRemainderUnsigned                 1024       negative  avgt   15  2437.843 ± 636.808  ns/op
Finished running test 'micro:java.lang.IntegerDivMod'

Benchmark                                  (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score     Error  Units
LongDivMod.testDivideRemainderUnsigned              1024          mixed  avgt   15  4524.897 ±  16.566  ns/op
LongDivMod.testDivideRemainderUnsigned              1024       positive  avgt   15  4373.714 ±   9.514  ns/op
LongDivMod.testDivideRemainderUnsigned              1024       negative  avgt   15  2018.309 ±   1.788  ns/op
LongDivMod.testDivideUnsigned                       1024          mixed  avgt   15  4320.382 ±  19.055  ns/op
LongDivMod.testDivideUnsigned                       1024       positive  avgt   15  3988.953 ±   8.770  ns/op
LongDivMod.testDivideUnsigned                       1024       negative  avgt   15  1069.703 ±   1.525  ns/op
LongDivMod.testRemainderUnsigned                    1024          mixed  avgt   15  5589.319 ±   4.247  ns/op
LongDivMod.testRemainderUnsigned                    1024       positive  avgt   15  3904.555 ±   3.191  ns/op
LongDivMod.testRemainderUnsigned                    1024       negative  avgt   15  1765.761 ±   1.539  ns/op
Finished running test 'micro:java.lang.LongDivMod'

With patch:

Benchmark                                  (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score     Error  Units
IntegerDivMod.testDivideRemainderUnsigned           1024          mixed  avgt   15  1999.134 ±  35.303  ns/op
IntegerDivMod.testDivideRemainderUnsigned           1024       positive  avgt   15  2020.517 ±   2.988  ns/op
IntegerDivMod.testDivideRemainderUnsigned           1024       negative  avgt   15  2037.983 ±   4.524  ns/op
IntegerDivMod.testDivideUnsigned                    1024          mixed  avgt   15  2053.458 ±   0.893  ns/op
IntegerDivMod.testDivideUnsigned                    1024       positive  avgt   15  2049.918 ±   1.635  ns/op
IntegerDivMod.testDivideUnsigned                    1024       negative  avgt   15  2050.901 ±   3.557  ns/op
IntegerDivMod.testRemainderUnsigned                 1024          mixed  avgt   15  2908.612 ±   1.366  ns/op
IntegerDivMod.testRemainderUnsigned                 1024       positive  avgt   15  2909.734 ±   2.879  ns/op
IntegerDivMod.testRemainderUnsigned                 1024       negative  avgt   15  2908.976 ±   1.950  ns/op
Finished running test 'micro:java.lang.IntegerDivMod'

Benchmark                                  (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score     Error  Units
LongDivMod.testDivideRemainderUnsigned              1024          mixed  avgt   15  2647.412 ±  36.127  ns/op
LongDivMod.testDivideRemainderUnsigned              1024       positive  avgt   15  2632.466 ±   1.573  ns/op
LongDivMod.testDivideRemainderUnsigned              1024       negative  avgt   15  2631.312 ±   2.185  ns/op
LongDivMod.testDivideUnsigned                       1024          mixed  avgt   15  2052.435 ±   0.971  ns/op
LongDivMod.testDivideUnsigned                       1024       positive  avgt   15  2053.224 ±   3.066  ns/op
LongDivMod.testDivideUnsigned                       1024       negative  avgt   15  2052.801 ±   1.749  ns/op
LongDivMod.testRemainderUnsigned                    1024          mixed  avgt   15  2904.972 ±   3.510  ns/op
LongDivMod.testRemainderUnsigned                    1024       positive  avgt   15  2904.937 ±   2.190  ns/op
LongDivMod.testRemainderUnsigned                    1024       negative  avgt   15  2905.771 ±   6.689  ns/op
Finished running test 'micro:java.lang.LongDivMod'

-------------

Commit messages:
 - adds unsigned division & modulus intrinsic

Changes: https://git.openjdk.org/jdk/pull/21559/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21559&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8341068
Stats: 101 lines in 3 files changed: 99 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/21559.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21559/head:pull/21559

PR: https://git.openjdk.org/jdk/pull/21559

From rrich at openjdk.org Thu Oct 17 09:52:12 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 17 Oct 2024 09:52:12 GMT Subject: RFR: 8342409: [s390x] C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR In-Reply-To: References: Message-ID:

On Thu, 17 Oct 2024 06:38:57 GMT, Amit Kumar wrote:

> Make sure LIR_Assembler::emit_unwind_handler() jumps to the slow path directly for unlocking a synchronized method if LM_MONITOR is used.
> On the fast paths assertions are added that the mode is actually handled.
>
> Testing: Tier1 test for fastdebug vm showed no regression.

Looks good to me.

Richard.

-------------

Marked as reviewed by rrich (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21557#pullrequestreview-2374720766

From galder at openjdk.org Thu Oct 17 10:13:18 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 17 Oct 2024 10:13:18 GMT Subject: RFR: 8340272: C2 SuperWord: JMH benchmark for Reduction vectorization In-Reply-To: References: Message-ID:

On Tue, 17 Sep 2024 07:53:40 GMT, Emanuel Peter wrote:

> I'm adding some proper JMH benchmarks for vectorized reductions. There are already some others, but they are not comprehensive or not JMH.
>
> Plus, I wanted to do a performance-investigation, hopefully leading to some improvements. **See Future Work below**.
>
> **How I run my benchmarks**
>
> All benchmarks:
> `make test TEST="micro:vm.compiler.VectorReduction2" CONF=linux-x64`
>
> Some specific benchmark, with a profiler that tells me which code snippet is hottest:
> `make test TEST="micro:vm.compiler.VectorReduction2.*doubleMinDotProduct" CONF=linux-x64 MICRO="OPTIONS=-prof perfasm"`
>
> **JMH logs**
>
> Run on my AVX512 laptop, with master:
> [run_avx512_master.txt](https://github.com/user-attachments/files/17025111/run_avx512_master.txt)
>
> Run on remote asimd (aarch64, NEON) machine:
> [run_asimd_master.txt](https://github.com/user-attachments/files/17025579/run_asimd_master.txt)
>
> **Results**
>
> I ran it on 2 machines so far. Left on my AVX512 machine, right on an ASIMD/NEON/aarch64 machine.
>
> Here are the interesting `int / long / float / double` results, discussion further below:
> ![image](https://github.com/user-attachments/assets/20abfa7b-aee6-4654-bf4d-e3abc4bbfc8b)
>
> And here are the less spectacular `byte / char / short` results. There is no vectorization of these cases. But there seems to be some issue with over-unrolling on my AVX512 machine: one case I looked at would only unroll 4x without SuperWord, but 16x with, and that seems to be unfavourable.
>
> ![image](https://github.com/user-attachments/assets/6e1c69cf-db6c-4d33-8750-c8797ffc39a2)
>
> Here is the PDF:
> [benchmark_results.pdf](https://github.com/user-attachments/files/17027695/benchmark_results.pdf)
>
> **Why are all the ...Simple benchmarks not vectorizing, i.e. "not profitable"?**
>
> Apparently, there must be sufficient "work" vectors to outweigh the "reduction" vectors.
> The idea used to be that one should have at least 2 work vectors which tend to be profitable, to outweigh the cost of a single reduction vector.
>
>     // Check if reductions are connected
>     if (is_marked_reduction(p0)) {
>       Node* second_in = p0->in(2);
>       Node_List* second_pk = get_pack(second_in);
>       if ((second_pk == nullptr) || (_num_work_vecs == _num_reductions)) {
>         // No parent pack or not enough work
>         // to cover reduction expansion overhead
>         return false;
>       } else if (second_pk->size() != p->size()) {
>         return false;
>       }
>     }
>
> But when I disable this code, then I see on the aarch64/ASIMD machine:
>
> VectorReduction2.NoSuperword.intAddSimpl...

> @galderz You can use this JMH benchmark for your work in #20098 if you want.

@eme64 Thanks for building this. I ended up creating a min/max specific benchmark in #20098. The main reason I created something different was to be able to control the data in the arrays such that the branching factors could be pre-determined. The results can vary depending on that. Then I used the opportunity to add both reduction and non-reduction vector benchmarks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21032#issuecomment-2419117787

From epeter at openjdk.org Thu Oct 17 11:48:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 17 Oct 2024 11:48:44 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing Message-ID: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com>

**Background**

I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather.

**Details**

The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset.

This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them.

More details can be found in the description in `mempointer.hpp`. Please read them when reviewing!

`MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.

**Dealing with Overflows**

We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks.

-------------

Commit messages:
 - Merge branch 'master' into JDK-8335392-MemPointer
 - Merge branch 'master' into JDK-8335392-MemPointer
 - fix build and test
 - add precompiled.hpp to gtest
 - finishing up more proofs
 - rm scaleL, was not even necessary!
 - more proof
 - improve the proof
 - move proof to hpp
 - first part of the proof
 - ...
and 66 more: https://git.openjdk.org/jdk/compare/1ea1f33f...3c333baf

Changes: https://git.openjdk.org/jdk/pull/19970/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8335392
Stats: 2421 lines in 15 files changed: 2160 ins; 212 del; 49 mod
Patch: https://git.openjdk.org/jdk/pull/19970.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970

PR: https://git.openjdk.org/jdk/pull/19970

From chagedorn at openjdk.org Thu Oct 17 11:48:44 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 17 Oct 2024 11:48:44 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID:

On Mon, 1 Jul 2024 13:32:01 GMT, Emanuel Peter wrote:

> **Background**
> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather.
>
> **Details**
>
> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset.
>
> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them.
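
The linear form and the adjacency question in the description quoted above can be illustrated with a small model. This is only a sketch in Java, not the C++ code from `mempointer.hpp`: the class `LinearPointerSketch`, its nested `LinearPointer` type, and all method names are hypothetical, variables are identified by plain strings, and `Math.subtractExact` merely stands in for the overflow tracking that the description attributes to `NoOverflowInt`.

```java
import java.util.Map;
import java.util.OptionalInt;

public class LinearPointerSketch {
    /** Hypothetical model of: pointer = con + sum_i(scale_i * variable_i). */
    static final class LinearPointer {
        final int con;                      // constant offset
        final Map<String, Integer> scales;  // variable name -> scale

        LinearPointer(int con, Map<String, Integer> scales) {
            this.con = con;
            this.scales = scales;
        }
    }

    /**
     * Returns b - a if both pointers have identical scaled variables, so that
     * their distance is a constant. Returns empty if the variable parts differ
     * or if the subtraction overflows an int.
     */
    static OptionalInt constantDistance(LinearPointer a, LinearPointer b) {
        if (!a.scales.equals(b.scales)) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Math.subtractExact(b.con, a.con));
        } catch (ArithmeticException overflow) {
            return OptionalInt.empty();
        }
    }

    /** True if b starts exactly where an access of sizeOfA bytes at a ends. */
    static boolean isAdjacent(LinearPointer a, int sizeOfA, LinearPointer b) {
        OptionalInt distance = constantDistance(a, b);
        return distance.isPresent() && distance.getAsInt() == sizeOfA;
    }

    public static void main(String[] args) {
        // Two 4-byte stores: base + 16 + 4*i and base + 20 + 4*i.
        Map<String, Integer> vars = Map.of("base", 1, "i", 4);
        LinearPointer first = new LinearPointer(16, vars);
        LinearPointer second = new LinearPointer(20, vars);
        System.out.println(isAdjacent(first, 4, second)); // prints true

        // A different index variable gives no constant distance.
        LinearPointer other = new LinearPointer(20, Map.of("base", 1, "j", 4));
        System.out.println(constantDistance(first, other).isPresent()); // prints false
    }
}
```

In this model an overflow simply makes the answer "unknown", which loosely mirrors the idea of tracking overflow in one place instead of checking it at every arithmetic step.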
>
> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing!
>
> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
>
> **Dealing with Overflows**
>
> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks.

src/hotspot/share/opto/mempointer.cpp line 114:

> 112: // Decompose subtraction.
> 113: Node* a = n->in((opc == Op_AddP) ? 2 : 1);
> 114: Node* b = n->in((opc == Op_AddP) ? 3 : 2);

rm AddP

src/hotspot/share/opto/mempointer.cpp line 263:

> 261: // Compute distance:
> 262: NoOverflowInt distance = other.con() - con();
> 263: distance = distance.truncate_to_30_bits();

naming could be an issue, why could it be NaN

src/hotspot/share/opto/traceMergeStoresTag.hpp line 31:

> 29: #include "utilities/stringUtils.hpp"
> 30:
> 31: namespace TraceMergeStores {

can this be a class?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1718491966
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1718501402
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1718510434

From epeter at openjdk.org Thu Oct 17 11:48:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 17 Oct 2024 11:48:44 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID:

On Thu, 15 Aug 2024 14:34:18 GMT, Christian Hagedorn wrote:

>> **Background**
>> I am introducing the `MemPointer`, for enhanced pointer parsing.
For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather.
>>
>> **Details**
>>
>> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset.
>>
>> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them.
>>
>> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing!
>>
>> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
>>
>> **Dealing with Overflows**
>>
>> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks.
>
> src/hotspot/share/opto/mempointer.cpp line 114:
>
>> 112: // Decompose subtraction.
>> 113: Node* a = n->in((opc == Op_AddP) ? 2 : 1);
>> 114: Node* b = n->in((opc == Op_AddP) ? 3 : 2);
>
> rm AddP

Good catch, will remove it here.

> src/hotspot/share/opto/mempointer.cpp line 263:
>
>> 261: // Compute distance:
>> 262: NoOverflowInt distance = other.con() - con();
>> 263: distance = distance.truncate_to_30_bits();
>
> naming could be an issue, why could it be NaN

Ok, I have now replaced it with a `bool` method `is_abs_less_than_2_to_30`. I think that is clearer.

> src/hotspot/share/opto/traceMergeStoresTag.hpp line 31:
>
>> 29: #include "utilities/stringUtils.hpp"
>> 30:
>> 31: namespace TraceMergeStores {
>
> can this be a class?

Not easily. Without the namespace, I get a clash with `traceAutoVectorization.hpp` names. A class could work, but then I need to initialize the arrays elsewhere - probably in a new cpp file. Annoying.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1718560644
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1718587908
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1718601006

From chagedorn at openjdk.org Thu Oct 17 11:57:29 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 17 Oct 2024 11:57:29 GMT Subject: RFR: 8342287: C2 fails with "assert(is_IfTrue()) failed: invalid node class: IfFalse" due to Template Assertion Predicate with two UCTs Message-ID:

### Assertion Predicates Have the True Projection on the Success Path

By design, Template and Initialized Assertion Predicates have the true projection on the success path and false projection on the failing path.

### Is a Node a Template Assertion Predicate?

Template Assertion Predicates can have an uncommon trap or a `Halt` node following the failing/false projection. When trying to find out if a node is a Template Assertion Predicate, we call `TemplateAssertionPredicate::is_predicate()` which checks if the provided node is the success projection of a Template Assertion Predicate.
This involves checking if the other projection has a `Halt` node or a UCT (L141):
https://github.com/openjdk/jdk/blob/7a64fbbb9292f4d65a6970206dec1a7d7645046b/src/hotspot/share/opto/predicates.cpp#L135-L144

### New `PredicateIterator` Class

[JDK-8340786](https://bugs.openjdk.org/browse/JDK-8340786) introduced a new `PredicateIterator` class to simplify iteration over predicates and replaced code that was doing some custom iteration.

#### Usual Usage

Normally, we always start from a loop entry and follow the predicates (i.e. reaching a predicate over its success path).

#### Special Usage

However, I also replaced the predicate skipping in `PhaseIdealLoop::build_loop_late_post_work()` with the new `PredicateIterator` class which could start at any node, including a failing path false projection.

### Problem: Two Uncommon Traps for a Template Assertion Predicate

The fuzzer now found a case where a Template Assertion Predicate has a predicate UCT on the failing **and** the success path due to folding away some dead nodes:

![image](https://github.com/user-attachments/assets/c2a9395e-9eaf-46a0-b978-8847d8e21945)

In the Special Usage mentioned above, we could be using the `PredicateIterator` with `505 IfFalse` as the starting node. In the core method `RegularPredicateBlockIterator::for_each()` for the traversal, we call the following with `current` = `505 IfFalse`:
https://github.com/openjdk/jdk/blob/1ea1f33f66326804ca2892fe0659a9acb7ee72ae/src/hotspot/share/opto/predicates.hpp#L628-L629

`TemplateAssertionPredicate::is_predicate()` succeeds because we have a valid Template Assertion Predicate If node and the other projection `504 IfTrue` has a predicate UCT. We then fail when trying to convert the `IfFalse` to an `IfTrue` with `current->as_IfTrue()` on L629.
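
The mismatch described above can be reproduced in a toy model. The following Java sketch is not HotSpot code: `PredicateMatchSketch`, `Projection`, `makeIf`, `looksLikeSuccessProjection` and `asIfTrue` are hypothetical names, and the check is reduced to the single property named in the text (whether the other projection leads to an uncommon trap).

```java
public class PredicateMatchSketch {
    /** Hypothetical stand-in for one projection of an If node. */
    static final class Projection {
        final boolean isIfTrue;         // true projection or false projection
        final boolean hasUncommonTrap;  // does this path end in a predicate UCT?
        Projection other;               // sibling projection of the same If

        Projection(boolean isIfTrue, boolean hasUncommonTrap) {
            this.isIfTrue = isIfTrue;
            this.hasUncommonTrap = hasUncommonTrap;
        }
    }

    /** Builds both projections of one If; index 0 is IfTrue, index 1 is IfFalse. */
    static Projection[] makeIf(boolean trueHasTrap, boolean falseHasTrap) {
        Projection ifTrue = new Projection(true, trueHasTrap);
        Projection ifFalse = new Projection(false, falseHasTrap);
        ifTrue.other = ifFalse;
        ifFalse.other = ifTrue;
        return new Projection[] { ifTrue, ifFalse };
    }

    /** Models a check that only inspects the other projection. */
    static boolean looksLikeSuccessProjection(Projection p) {
        return p.other.hasUncommonTrap;
    }

    /** Models current->as_IfTrue(): only valid for a true projection. */
    static Projection asIfTrue(Projection p) {
        if (!p.isIfTrue) {
            throw new IllegalStateException("invalid node class: IfFalse");
        }
        return p;
    }

    public static void main(String[] args) {
        // Usual shape: only the false projection leads to an uncommon trap.
        Projection[] usual = makeIf(false, true);
        System.out.println(looksLikeSuccessProjection(usual[0])); // prints true
        System.out.println(looksLikeSuccessProjection(usual[1])); // prints false

        // Folded shape from the report: both projections lead to an uncommon trap,
        // so the false projection is accepted and the conversion then fails.
        Projection ifFalse = makeIf(true, true)[1];
        System.out.println(looksLikeSuccessProjection(ifFalse));  // prints true
        try {
            asIfTrue(ifFalse);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints invalid node class: IfFalse
        }
    }
}
```

In the usual shape the one-sided check is enough to tell the two projections apart; it only becomes ambiguous once both paths end in an uncommon trap.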

### Solution

The fix is straightforward: `TemplateAssertionPredicate::is_predicate()` (and `InitializedAssertionPredicate::is_predicate()`) should additionally check whether the provided node is an `IfTrue` projection, which by definition is the success projection. This avoids wrongly matching a failing path projection as a success projection.

### Additional Tweaks - Probably Not Worth Doing Separately

- While working on this, I noticed that predicate iteration can be limited if `UseLoopPredicate/UseProfiledLoopPredicate` is disabled.
- Made `AssertionPredicatesWithHalt` work with only non-null nodes and moved the null check to the only usage from `CountedLoopNode::skip_assertion_predicates_with_halt()`. There, it could be possible that we have a dying `CountedLoopNode` where the entry control was already replaced with null.

Thanks,
Christian

-------------

Commit messages:
 - 8342287: C2 fails with "assert(is_IfTrue()) failed: invalid node class: IfFalse" due to Template Assertion Predicate with two UCTs

Changes: https://git.openjdk.org/jdk/pull/21561/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21561&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8342287
Stats: 88 lines in 4 files changed: 82 ins; 0 del; 6 mod
Patch: https://git.openjdk.org/jdk/pull/21561.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21561/head:pull/21561

PR: https://git.openjdk.org/jdk/pull/21561

From aph at openjdk.org Thu Oct 17 12:56:11 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 17 Oct 2024 12:56:11 GMT Subject: RFR: 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long In-Reply-To: References: Message-ID:

On Thu, 17 Oct 2024 09:45:19 GMT, Amit Kumar wrote:

> Add match rules for UDivI, UModI, UDivL, and UModL. Also adds the `dlr` and `dlgr` instructions.
>
> Tier1 tests are clean for the fastdebug VM.
>
> Before this patch, `compiler/c2/TestDivModNodes.java` was failing (see the JBS issue), but with this patch the test is passing.
>
> Without patch:
>
> Benchmark                                  (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score     Error  Units
> IntegerDivMod.testDivideRemainderUnsigned           1024          mixed  avgt   15  1935.176 ±   2.191  ns/op
> IntegerDivMod.testDivideRemainderUnsigned           1024       positive  avgt   15  1934.915 ±   3.207  ns/op
> IntegerDivMod.testDivideRemainderUnsigned           1024       negative  avgt   15  1934.325 ±   1.108  ns/op
> IntegerDivMod.testDivideUnsigned                    1024          mixed  avgt   15  1809.782 ±  49.341  ns/op
> IntegerDivMod.testDivideUnsigned                    1024       positive  avgt   15  1769.326 ±   2.607  ns/op
> IntegerDivMod.testDivideUnsigned                    1024       negative  avgt   15  1784.053 ±  71.190  ns/op
> IntegerDivMod.testRemainderUnsigned                 1024          mixed  avgt   15  2026.978 ±   1.534  ns/op
> IntegerDivMod.testRemainderUnsigned                 1024       positive  avgt   15  2028.039 ±   3.812  ns/op
> IntegerDivMod.testRemainderUnsigned                 1024       negative  avgt   15  2437.843 ± 636.808  ns/op
> Finished running test 'micro:java.lang.IntegerDivMod'
>
> Benchmark                                  (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score     Error  Units
> LongDivMod.testDivideRemainderUnsigned              1024          mixed  avgt   15  4524.897 ±  16.566  ns/op
> LongDivMod.testDivideRemainderUnsigned              1024       positive  avgt   15  4373.714 ±   9.514  ns/op
> LongDivMod.testDivideRemainderUnsigned              1024       negative  avgt   15  2018.309 ±   1.788  ns/op
> LongDivMod.testDivideUnsigned                       1024          mixed  avgt   15  4320.382 ±  19.055  ns/op
> LongDivMod.testDivideUnsigned                       1024       positive  avgt   15  3988.953 ±   8.770  ns/op
> LongDivMod.testDivideUnsigned                       1024       negative  avgt   15  1069.703 ±   1.525  ns/op
> LongDivMod.testRemainderUnsigned                    1024          mixed  avgt   15  5589.319 ±   4.247  ns/op
> LongDivMod.testRemainderUnsigned                    1024       positive  avgt   15  3904.555 ±   3.191  ns/op
> LongDivMod.testRemainderUnsigned                    1024       negative  avgt   15  1765.761 ±   1.539  ns/op
> Finished ...

src/hotspot/cpu/s390/s390.ad line 6258:

> 6256: // Unsigned Integer Register Division
> 6257: // NOTE: z_dlr requires even-odd pair. remainder will be in even register(r4) & quotient will be stored in odd register(r5)
> 6258: // for dividend, leftmost 32bits will be in r4 and rightmost 32bits will be in r5 register.

Suggestion:

// for dividend, upper 32bits will be in r4 and lower 32bits will be in r5 register.

This PR uses the terms "leftmost" and "rightmost" for numerically higher and lower parts of an integer. I wondered if this perhaps was common in s390.ad, but it is not. Please say "higher" and "lower".

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21559#discussion_r1804731105

From aph at openjdk.org Thu Oct 17 12:56:11 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 17 Oct 2024 12:56:11 GMT Subject: RFR: 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long In-Reply-To: References: Message-ID:

On Thu, 17 Oct 2024 12:52:26 GMT, Andrew Haley wrote:

>> Add match rules for UDivI, UModI, UDivL, and UModL. Also adds the `dlr` and `dlgr` instructions.
>>
>> Tier1 tests are clean for the fastdebug VM.
>>
>> Before this patch, `compiler/c2/TestDivModNodes.java` was failing (see the JBS issue), but with this patch the test is passing.
>>
>> Without patch:
>>
>> Benchmark                                  (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score     Error  Units
>> IntegerDivMod.testDivideRemainderUnsigned           1024          mixed  avgt   15  1935.176 ±   2.191  ns/op
>> IntegerDivMod.testDivideRemainderUnsigned           1024       positive  avgt   15  1934.915 ±   3.207  ns/op
>> IntegerDivMod.testDivideRemainderUnsigned           1024       negative  avgt   15  1934.325 ±   1.108  ns/op
>> IntegerDivMod.testDivideUnsigned                    1024          mixed  avgt   15  1809.782 ±  49.341  ns/op
>> IntegerDivMod.testDivideUnsigned                    1024       positive  avgt   15  1769.326 ±   2.607  ns/op
>> IntegerDivMod.testDivideUnsigned                    1024       negative  avgt   15  1784.053 ±  71.190  ns/op
>> IntegerDivMod.testRemainderUnsigned                 1024          mixed  avgt   15  2026.978 ±   1.534  ns/op
>> IntegerDivMod.testRemainderUnsigned                 1024       positive  avgt   15  2028.039 ±   3.812  ns/op
>> IntegerDivMod.testRemainderUnsigned                 1024       negative  avgt   15  2437.843 ± 636.808  ns/op
>> Finished running test 'micro:java.lang.IntegerDivMod'
>>
>> Benchmark                                  (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score     Error  Units
>> LongDivMod.testDivideRemainderUnsigned              1024          mixed  avgt   15  4524.897 ±  16.566  ns/op
>> LongDivMod.testDivideRemainderUnsigned              1024       positive  avgt   15  4373.714 ±   9.514  ns/op
>> LongDivMod.testDivideRemainderUnsigned              1024       negative  avgt   15  2018.309 ±   1.788  ns/op
>> LongDivMod.testDivideUnsigned                       1024          mixed  avgt   15  4320.382 ±  19.055  ns/op
>> LongDivMod.testDivideUnsigned                       1024       positive  avgt   15  3988.953 ±   8.770  ns/op
>> LongDivMod.testDivideUnsigned                       1024       negative  avgt   15  1069.703 ±   1.525  ns/op
>> LongDivMod.testRemainderUnsigned                    1024          mixed  avgt   15  5589.319 ±   4.247  ns/op
>> LongDivMod.testRemainderUnsigned                    1024       positive  avgt   15  3904.555 ±   3.191  ns/op
>> LongDivMod.testRemainderUnsigned 10...
>
> src/hotspot/cpu/s390/s390.ad line 6258:
>
>> 6256: // Unsigned Integer Register Division
>> 6257: // NOTE: z_dlr requires even-odd pair. remainder will be in even register(r4) & quotient will be stored in odd register(r5)
>> 6258: // for dividend, leftmost 32bits will be in r4 and rightmost 32bits will be in r5 register.
>
> Suggestion:
>
> // for dividend, upper 32bits will be in r4 and lower 32bits will be in r5 register.
>
> This PR uses the terms "leftmost" and "rightmost" for numerically higher and lower parts of an integer. I wondered if this perhaps was common in s390.ad, but it is not. Please say "higher" and "lower".

Or even "low" and "high".
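
The register convention in the quoted comment (dividend split into an upper and a lower 32-bit half, remainder returned in the even register, quotient in the odd register) can be modeled in plain Java. This is a sketch of the arithmetic only, not of the s390 instruction or of the match rule in `s390.ad`: the class `UnsignedDivSketch` and the method `divideLogical` are hypothetical names, and the example assumes the upper half is zero, as it would be for a 32-bit Java dividend, so that the quotient fits in an `int`.

```java
public class UnsignedDivSketch {
    /**
     * Models the even-odd pair from the quoted comment: the 64-bit dividend is
     * (upper:lower) and the result is {remainder, quotient}, i.e. {even, odd}.
     * Assumes upper == 0, so the quotient always fits in 32 bits.
     */
    static int[] divideLogical(int upper, int lower, int divisor) {
        long dividend = ((long) upper << 32) | (lower & 0xFFFFFFFFL);
        long unsignedDivisor = divisor & 0xFFFFFFFFL;
        long quotient = Long.divideUnsigned(dividend, unsignedDivisor);
        long remainder = Long.remainderUnsigned(dividend, unsignedDivisor);
        return new int[] { (int) remainder, (int) quotient };
    }

    public static void main(String[] args) {
        int dividend = -1;  // 4294967295 when read as an unsigned 32-bit value
        int divisor = 2;
        int[] pair = divideLogical(0, dividend, divisor);

        // The modeled pair agrees with the library methods the intrinsic targets.
        System.out.println(pair[1] == Integer.divideUnsigned(dividend, divisor));    // prints true
        System.out.println(pair[0] == Integer.remainderUnsigned(dividend, divisor)); // prints true
        System.out.println(Integer.toUnsignedString(dividend) + " / 2 = " + pair[1] + " rem " + pair[0]);
        // prints 4294967295 / 2 = 2147483647 rem 1
    }
}
```

Writing the dividend as `(upper:lower)` also shows why "upper" and "lower" describe the halves by numeric significance rather than by position.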
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21559#discussion_r1804733207 From amitkumar at openjdk.org Thu Oct 17 13:07:49 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 17 Oct 2024 13:07:49 GMT Subject: RFR: 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v2] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 12:53:47 GMT, Andrew Haley wrote: >> src/hotspot/cpu/s390/s390.ad line 6258: >> >>> 6256: // Unsigned Integer Register Division >>> 6257: // NOTE: z_dlr requires even-odd pair. remainder will be in even register(r4) & quotient will be stored in odd register(r5) >>> 6258: // for dividend, leftmost 32bits will be in r4 and rightmost 32bits will be in r5 register. >> >> Suggestion: >> >> // for dividend, upper 32bits will be in r4 and lower 32bits will be in r5 register. >> >> >> This PR uses the terms "leftmost" and "rightmost" for numerically higher and lower parts of an integer. I wondered if this perhaps was common in s390.ad, but it is not. Please say "higher" and "lower". > > Or even "low" and "high". I followed the PofZ book and took "leftmost" and "rightmost" from it. But now I have updated it to "upper" and "lower" that feels more intuitive. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21559#discussion_r1804749927 From amitkumar at openjdk.org Thu Oct 17 13:07:49 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 17 Oct 2024 13:07:49 GMT Subject: RFR: 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v2] In-Reply-To: References: Message-ID: > Add match rules for UDivI, UModI, UDivL, UModL. And also adds `dlr` and `dlgr` instruction. > > Tier1 test are clean for fastdebug vm; > > Before this patch, `compiler/c2/TestDivModNodes.java` was failing (see jbs issue) but with this patch test is passing. 
>
> Without Patch:
>
>
> Benchmark                                  (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score     Error  Units
> IntegerDivMod.testDivideRemainderUnsigned           1024          mixed  avgt   15  1935.176 ±   2.191  ns/op
> IntegerDivMod.testDivideRemainderUnsigned           1024       positive  avgt   15  1934.915 ±   3.207  ns/op
> IntegerDivMod.testDivideRemainderUnsigned           1024       negative  avgt   15  1934.325 ±   1.108  ns/op
> IntegerDivMod.testDivideUnsigned                    1024          mixed  avgt   15  1809.782 ±  49.341  ns/op
> IntegerDivMod.testDivideUnsigned                    1024       positive  avgt   15  1769.326 ±   2.607  ns/op
> IntegerDivMod.testDivideUnsigned                    1024       negative  avgt   15  1784.053 ±  71.190  ns/op
> IntegerDivMod.testRemainderUnsigned                 1024          mixed  avgt   15  2026.978 ±   1.534  ns/op
> IntegerDivMod.testRemainderUnsigned                 1024       positive  avgt   15  2028.039 ±   3.812  ns/op
> IntegerDivMod.testRemainderUnsigned                 1024       negative  avgt   15  2437.843 ± 636.808  ns/op
> Finished running test 'micro:java.lang.IntegerDivMod'
>
>
> Benchmark                                  (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score     Error  Units
> LongDivMod.testDivideRemainderUnsigned              1024          mixed  avgt   15  4524.897 ±  16.566  ns/op
> LongDivMod.testDivideRemainderUnsigned              1024       positive  avgt   15  4373.714 ±   9.514  ns/op
> LongDivMod.testDivideRemainderUnsigned              1024       negative  avgt   15  2018.309 ±   1.788  ns/op
> LongDivMod.testDivideUnsigned                       1024          mixed  avgt   15  4320.382 ±  19.055  ns/op
> LongDivMod.testDivideUnsigned                       1024       positive  avgt   15  3988.953 ±   8.770  ns/op
> LongDivMod.testDivideUnsigned                       1024       negative  avgt   15  1069.703 ±   1.525  ns/op
> LongDivMod.testRemainderUnsigned                    1024          mixed  avgt   15  5589.319 ±   4.247  ns/op
> LongDivMod.testRemainderUnsigned                    1024       positive  avgt   15  3904.555 ±   3.191  ns/op
> LongDivMod.testRemainderUnsigned                    1024       negative  avgt   15  1765.761 ±   1.539  ns/op
> Finished ...

Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:

  changes leftmost->upper and rightmost -> lower

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21559/files
  - new: https://git.openjdk.org/jdk/pull/21559/files/b39788e2..c94cc448

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21559&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21559&range=00-01

  Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/21559.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21559/head:pull/21559

PR: https://git.openjdk.org/jdk/pull/21559

From roland at openjdk.org Thu Oct 17 13:17:46 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 17 Oct 2024 13:17:46 GMT
Subject: RFR: 8342496: C2/Shenandoah: SEGV in compiled code when running jcstress
Message-ID:

The reason for the crash is that compiled code reads from an object
that's null. All field loads from an object are guarded by a null
check. Where is the null check in that case? After the field load:


0x00007ffaac91261f:   44 8B 69 0C    mov   r13d, dword ptr [rcx + 0xc]   <- field load
0x00007ffaac912623:   85 C9          test  ecx, ecx                      <- null check (oops!)
0x00007ffaac912625:   74 5C          je    0x7ffaac912683


When the IR graph is constructed for the test case, the field load is
correctly made dependent on the null check (through a `CastPP` node)
but then something happens that's shenandoah specific and that causes
the field load to become dependent on another check so it can execute
before the null check.

There are several load barriers involved in the process. One of them
is expanded at the null check projection. In the process, control for
the nodes that are control dependent on the null check is updated to
be the region at the end of the just expanded barrier. The `CastPP`
node for the null check gets the `Region` as new control.

Another barrier is expanded right after that one. The 2 are back to
back.
They are merged. The `Region` that the `CastPP` depends on goes
away, the `CastPP` is cloned in both branches at the `Region` and one
of them becomes control dependent on the heap stable test of the first
expanded barrier. At this point, one of the `CastPP` nodes is control
dependent on a heap stable test that's after the null check. But then,
the heap stable test is moved out of the loop and 2 copies of the loop
are made so one can run without any overhead from barriers. When that
happens, the `CastPP` becomes dependent on a test that dominates the
null check and so the field load that depends on the `CastPP` can be
scheduled before the null check.

The fix I propose is to not update the control when the barrier is
expanded for nodes that can float when the test they depend on
moves. This way the `CastPP` remains dependent on the null check.

-------------

Commit messages:
 - more
 - fix & test

Changes: https://git.openjdk.org/jdk/pull/21562/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21562&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8342496
  Stats: 70 lines in 2 files changed: 70 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/21562.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21562/head:pull/21562

PR: https://git.openjdk.org/jdk/pull/21562

From shade at openjdk.org Thu Oct 17 13:43:15 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 17 Oct 2024 13:43:15 GMT
Subject: RFR: 8342496: C2/Shenandoah: SEGV in compiled code when running jcstress
In-Reply-To:
References:
Message-ID:

On Thu, 17 Oct 2024 13:12:11 GMT, Roland Westrelin wrote:

> The reason for the crash is that compiled code reads from an object
> that's null. All field loads from an object are guarded by a null
> check. Where is the null check in that case? After the field load:
>
>
> 0x00007ffaac91261f:   44 8B 69 0C    mov   r13d, dword ptr [rcx + 0xc]   <- field load
> 0x00007ffaac912623:   85 C9          test  ecx, ecx                      <- null check (oops!)
> 0x00007ffaac912625:   74 5C          je    0x7ffaac912683
>
>
> When the IR graph is constructed for the test case, the field load is
> correctly made dependent on the null check (through a `CastPP` node)
> but then something happens that's shenandoah specific and that causes
> the field load to become dependent on another check so it can execute
> before the null check.
>
> There are several load barriers involved in the process. One of them
> is expanded at the null check projection. In the process, control for
> the nodes that are control dependent on the null check is updated to
> be the region at the end of the just expanded barrier. The `CastPP`
> node for the null check gets the `Region` as new control.
>
> Another barrier is expanded right after that one. The 2 are back to
> back. They are merged. The `Region` that the `CastPP` depends on goes
> away, the `CastPP` is cloned in both branches at the `Region` and one
> of them becomes control dependent on the heap stable test of the first
> expanded barrier. At this point, one of the `CastPP` nodes is control
> dependent on a heap stable test that's after the null check. But then,
> the heap stable test is moved out of the loop and 2 copies of the loop
> are made so one can run without any overhead from barriers. When that
> happens, the `CastPP` becomes dependent on a test that dominates the
> null check and so the field load that depends on the `CastPP` can be
> scheduled before the null check.
>
> The fix I propose is to not update the control when the barrier is
> expanded for nodes that can float when the test they depend on
> moves. This way the `CastPP` remains dependent on the null check.

Thanks! I am running tests with this patch.

------------- PR Comment: https://git.openjdk.org/jdk/pull/21562#issuecomment-2419581013 From jbhateja at openjdk.org Thu Oct 17 13:44:26 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Oct 2024 13:44:26 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v4] In-Reply-To: <_tdMDW6ViTZsDgks0k2uWDHJUum3VL13PX0piUiQBtk=.4a06f43d-037f-4725-869d-287813bc8750@github.com> References: <_tdMDW6ViTZsDgks0k2uWDHJUum3VL13PX0piUiQBtk=.4a06f43d-037f-4725-869d-287813bc8750@github.com> Message-ID: On Wed, 16 Oct 2024 21:13:57 GMT, hanklo6 wrote: >> Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. >> >> ### Generate test instructions >> With `binutils = 2.43` >> * `python3 x86-asmtest.py > asmtest.out.h` >> ### Run test >> * `make test TEST="gtest:AssemblerX86"` > > hanklo6 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 16 additional commits since the last revision:
>
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into apx-test-tool
>  - Add comment and defined
>  - Add copyright header
>  - Remove tab
>  - Remove whitespace
>  - Replace whitespace with tab
>  - Add flag before testing
>  - Fix assertion error on MacOS
>  - Add _LP64 flag
>  - Add missing header
>  - ... and 6 more: https://git.openjdk.org/jdk/compare/cbb32376...ca48f240

test/hotspot/gtest/x86/asmtest.out.h line 1:

> 1: // BEGIN Generated code -- do not edit

All the memory operand instructions being validated check only one kind of memory addressing mode, which is
- BASE + INDEX

We should also check the following flavors for at least some instructions:
- BASE
- INDEX * SCALE + DISPLACEMENT
- BASE + INDEX + DISPLACEMENT
- BASE + INDEX * SCALE + DISPLACEMENT

where BASE and INDEX are EGPRs.

test/hotspot/gtest/x86/asmtest.out.h line 1:

> 1: // BEGIN Generated code -- do not edit

Can you also emit the instruction IDs in the comments against each row in the insns_strs and insns_lens tables, e.g.

    // Generated by x86-asmtest.py
    __ shldl(rcx, rdx); // {load}shld ecx, edx IID0
    __ shldl(rdx, rbx); // {load}shld edx, ebx IID1
    ......
    .....
    static const uint8_t insns[] = {
      0x0f, 0xa5, 0xd1, // IID0
      0x0f, 0xa5, 0xda, // IID1
    ...
    static const unsigned int insns_lens[] = {
      3, // IID0
      3, // IID1
    #ifdef _LP64
    ......
    static const char* insns_strs[] = {
      "__ shldl(rcx, rdx);", // IID0
      "__ shldl(rdx, rbx);", // IID1
    #ifdef _LP64

It will ease correlating and manually inspecting these statically emitted tables.

test/hotspot/gtest/x86/test_assemblerx86.cpp line 44:

> 42:     // Different encoding for GCC and OpenJDK
> 43:     {"shll", {'\xd3', '\xd1'}},
> 44:     {"shlq", {'\xd3', '\xd1'}},

For the record.

    // For single register operand salq, C2 assumes shift will be passed through CL register and emits the encoding with opcode set to 'D3'.
    void Assembler::shlq(Register dst) {
      int encode = prefixq_and_encode(dst->encoding());
      emit_int16((unsigned char)0xD3, (0xE0 | encode));
    }

    // With immediate shift operand we explicitly handle special case of shift by '1' bit... and emit D1 opcode.
    void Assembler::shlq(Register dst, int imm8) {
      assert(isShiftCount(imm8 >> 1), "illegal shift count");
      int encode = prefixq_and_encode(dst->encoding());
      if (imm8 == 1) {
        emit_int16((unsigned char)0xD1, (0xE0 | encode));
      } else {
        emit_int24((unsigned char)0xC1, (0xE0 | encode), imm8);
      }
    }

So, the GCC toolchain follows a different convention than C2, but both emit correct encodings. Our test infrastructure is biased toward C2 and hence does not comply with the GCC encoding, thus we are skipping over the following cases.

Please see below a small inline assembly snippet and its corresponding encoding.

    void micro(){
      asm volatile(
        "shlq $1, %%r11 \n\t"       [InstID 1]
        "shlq %%r11 \n\t"           [InstID 2]
        "shlq %%cl, %%r11 \n\t"     [InstID 3]
        : : : "%r11", "%rcx"
      );
    }

    CPROMPT>objdump -D shl.o

    Disassembly of section .text:

    0000000000000000 :
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   49 d1 e3                shl    $1,%r11    [InstID 1]
       7:   49 d1 e3                shl    $1,%r11    [InstID 2]
       a:   49 d3 e3                shl    %cl,%r11   [InstID 3]

test/hotspot/gtest/x86/test_assemblerx86.cpp line 99:

> 97:     asm_check((const uint8_t *)entry, (const uint8_t *)insns, insns_lens, insns_strs, sizeof(insns_lens) / sizeof(insns_lens[0]));
> 98:     BufferBlob::free(b);
> 99: }

The following MAP0 and MAP1 instructions are missing:

bsfl bsfq bsrl bsrq bswapl bswapq btq call cmpb cmpl cmpq cmpw cmpxchgb cmpxchgl cmpxchgq cmpxchgw cvttsd2siq incl incq lea leal leaq mov mov64 movb movl movq movsbl movsbq movslq movswl movswq movw movzbl movzbq movzwl movzwq orw sall salq testb testl testq xaddb xaddl xaddq xaddw xchgb xchgl xchgq xchgw

But, given that all assembly routines share the same leaf-level prefix-emitting routines, we can skip them for the time being or validate just one from each row.

Please do add following new
MAP4 APX instructions since you are already taking care of their two operand counterparts with PPX. 1. popp 2. pushp ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1804498306 PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1804565059 PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1804651875 PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1804707138 From roland at openjdk.org Thu Oct 17 14:08:30 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 17 Oct 2024 14:08:30 GMT Subject: RFR: 8341407: C2: assert(main_limit == cl->limit() || get_ctrl(main_limit) == new_limit_ctrl) failed: wrong control for added limit Message-ID: That assert checks that during RC elimination, we have either: - not updated the limit of the main loop - or that the new limit is at the expected control The assert fires because the limit was updated but is not at the expected control. That happens because `new_limit_ctrl` is updated for a test that it attempts to eliminate before it actually proceeds with the elimination: if the test can't be eliminated, `new_limit_ctrl` gets updated anyway. While the assert could, maybe, be relaxed (it fires in this case but nothing is going wrong), it's better, I think, to simply not uselessly restrict the control of the limit. 
------------- Commit messages: - more - more - more Changes: https://git.openjdk.org/jdk/pull/21564/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21564&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341407 Stats: 73 lines in 2 files changed: 64 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/21564.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21564/head:pull/21564 PR: https://git.openjdk.org/jdk/pull/21564 From thartmann at openjdk.org Thu Oct 17 14:13:28 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 17 Oct 2024 14:13:28 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v5] In-Reply-To: References: Message-ID: > After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`: > > https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723 > > Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` because the type is not a primitive array type: > > https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971 > > -> > https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 > > The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. > > Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. 
However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded: > https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979 > > I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded. > > Thanks, > Tobias Tobias Hartmann has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge with master - More cleanups - Modified ciTypeFlow::can_trap - Missed a return - First prototype Fix ------------- Changes: https://git.openjdk.org/jdk/pull/21470/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21470&range=04 Stats: 135 lines in 5 files changed: 111 ins; 18 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21470.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21470/head:pull/21470 PR: https://git.openjdk.org/jdk/pull/21470 From roland at openjdk.org Thu Oct 17 14:54:51 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 17 Oct 2024 14:54:51 GMT Subject: RFR: 8341407: C2: assert(main_limit == cl->limit() || get_ctrl(main_limit) == new_limit_ctrl) failed: wrong control for added limit [v2] In-Reply-To: References: Message-ID: > That assert checks that during RC elimination, we have either: > > - not updated the limit of the main loop > > - or that the new limit is at the expected control > > The assert fires because the limit was updated but is not at the > expected control. That happens because `new_limit_ctrl` is updated for > a test that it attempts to eliminate before it actually proceeds with > the elimination: if the test can't be eliminated, `new_limit_ctrl` > gets updated anyway. 
>
> While the assert could, maybe, be relaxed (it fires in this case but
> nothing is going wrong), it's better, I think, to simply not uselessly
> restrict the control of the limit.

Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:

  fix & test

-------------

Changes: https://git.openjdk.org/jdk/pull/21564/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21564&range=01
  Stats: 72 lines in 2 files changed: 64 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/21564.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21564/head:pull/21564

PR: https://git.openjdk.org/jdk/pull/21564

From jbhateja at openjdk.org Thu Oct 17 15:33:38 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 17 Oct 2024 15:33:38 GMT
Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v26]
In-Reply-To:
References:
Message-ID:

> Hi All,
>
> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators.
>
>
>      . SUADD   : Saturating unsigned addition.
>      . SADD    : Saturating signed addition.
>      . SUSUB   : Saturating unsigned subtraction.
>      . SSUB    : Saturating signed subtraction.
>      . UMAX    : Unsigned max
>      . UMIN    : Unsigned min.
>
>
> New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value.
>
> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios.
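
As a hedged illustration of the wrapping versus saturating behaviour described in the quoted text, the sketch below implements scalar saturating signed addition, saturating unsigned addition and unsigned max for `int`, using only long-standing `java.lang` methods. The names `SaturatingSketch`, `saddSat`, `suaddSat` and `umax` are made up for this example; they are not the API added by the patch.

```java
class SaturatingSketch {
    // Signed saturating add: clamp to Integer.MIN_VALUE / Integer.MAX_VALUE
    // instead of wrapping around.
    static int saddSat(int a, int b) {
        long r = (long) a + (long) b;
        if (r > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (r < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) r;
    }

    // Unsigned saturating add: clamp to 0xFFFFFFFF (bit pattern -1).
    static int suaddSat(int a, int b) {
        int r = a + b;
        // Unsigned overflow happened iff the wrapped sum is below an operand.
        return Integer.compareUnsigned(r, a) < 0 ? -1 : r;
    }

    // Unsigned max: compare the operands as unsigned 32-bit values.
    static int umax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE + 1);                      // wraps: -2147483648
        System.out.println(saddSat(Integer.MAX_VALUE, 1));              // saturates: 2147483647
        System.out.println(Integer.toUnsignedString(suaddSat(-2, 5)));  // saturates: 4294967295
        System.out.println(Integer.toUnsignedString(umax(-1, 1)));      // 4294967295
    }
}
```

The unsigned variants reinterpret the same 32 bits rather than using a wider type, which is why `-1` stands for the unsigned maximum here.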
> > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 - New IR tests + additional IR transformations - Update adlc changes. - Merge branch 'JDK-8338201' of http://github.com/jatin-bhateja/jdk into JDK-8338201 - Update VectorMath.java - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 - Typographical error fixups - Doc fixups - Typographic error - Merge stashing and re-commit - ... 
and 19 more: https://git.openjdk.org/jdk/compare/44151f47...c92084d9 ------------- Changes: https://git.openjdk.org/jdk/pull/20507/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=25 Stats: 10221 lines in 55 files changed: 9784 ins; 27 del; 410 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Thu Oct 17 15:45:19 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Oct 2024 15:45:19 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v25] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 19:44:08 GMT, Paul Sandoz wrote: > Rather than adding more IR test functionality to this PR that requires additional review my recommendation would be to follow up in another PR or before hand rethink our approach. Agree, I am thinking of developing an automated IR validation infrastructure for all vector API operations, till then and for the sake of completeness of this patch we can let newly created IR based tests be part of this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2419895522 From sviswanathan at openjdk.org Thu Oct 17 16:19:22 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 17 Oct 2024 16:19:22 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v25] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 15:41:58 GMT, Jatin Bhateja wrote: > > Rather than adding more IR test functionality to this PR that requires additional review my recommendation would be to follow up in another PR or before hand rethink our approach. 
> > Agree, I am thinking of developing an automated IR validation infrastructure for all vector API operations, till then and for the sake of completeness of this patch we can let newly created IR based tests be part of this PR. @jatin-bhateja I agree with Paul, it would be good to remove the newly added IR test changes from this PR to reduce the load on reviewers. You can always send it as a separate PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2419973468 From sviswanathan at openjdk.org Thu Oct 17 16:25:21 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 17 Oct 2024 16:25:21 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2] In-Reply-To: References: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> Message-ID: On Tue, 15 Oct 2024 07:02:00 GMT, Emanuel Peter wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test case > > Thanks for the updates! It looks good to me now. > > I have one more wish: > Could you allow to run the test on all platforms please? > `test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java` > > Currently, it only runs on selected platforms, see `@requires`. We should really run this on all platforms and compilers. The IR rules can be limited to platforms and features. That ensures that we do not have similar bugs elsewhere, and our test-coverage would be increased. > That looks good to me. @eme64 should have a look as well. > > I submitted testing and will report back once it passed. @TobiHartmann Please do let me know if the testing passed. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21480#issuecomment-2419985872

From jbhateja at openjdk.org Thu Oct 17 16:25:34 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 17 Oct 2024 16:25:34 GMT
Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v27]
In-Reply-To:
References:
Message-ID:

> Hi All,
>
> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators.
>
>
>      . SUADD   : Saturating unsigned addition.
>      . SADD    : Saturating signed addition.
>      . SUSUB   : Saturating unsigned subtraction.
>      . SSUB    : Saturating signed subtraction.
>      . UMAX    : Unsigned max
>      . UMIN    : Unsigned min.
>
>
> New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value.
>
> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios.
>
> Summary of changes:
> - Java side implementation of new vector operators.
> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it.
> - C2 compiler IR and inline expander changes.
> - Optimized x86 backend implementation for new vector operators and their predicated counterparts.
> - Extends existing VectorAPI Jtreg test suite to cover new operations.
>
> Kindly review and share your feedback.
> > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 - Prod build fix - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 - New IR tests + additional IR transformations - Update adlc changes. - Merge branch 'JDK-8338201' of http://github.com/jatin-bhateja/jdk into JDK-8338201 - Update VectorMath.java - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 - Typographical error fixups - Doc fixups - ... and 21 more: https://git.openjdk.org/jdk/compare/236c71ca...d9a379b2 ------------- Changes: https://git.openjdk.org/jdk/pull/20507/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=26 Stats: 10219 lines in 55 files changed: 9784 ins; 28 del; 407 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From kvn at openjdk.org Thu Oct 17 16:58:59 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 17 Oct 2024 16:58:59 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v5] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:13:28 GMT, Tobias Hartmann wrote: >> After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`: >> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723 >> >> Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` 
because the type is not a primitive array type: >> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971 >> >> -> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 >> >> The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. >> >> Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded: >> https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979 >> >> I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge with master > - More cleanups > - Modified ciTypeFlow::can_trap > - Missed a return > - First prototype > > Fix Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21470#pullrequestreview-2375788874 From kvn at openjdk.org Thu Oct 17 16:58:59 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 17 Oct 2024 16:58:59 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v3] In-Reply-To: References: Message-ID: On Tue, 15 Oct 2024 08:04:54 GMT, Tobias Hartmann wrote: >> src/hotspot/share/ci/ciTypeFlow.cpp line 2220: >> >>> 2218: case Bytecodes::_ldc_w: >>> 2219: case Bytecodes::_ldc2_w: >>> 2220: return str.is_in_error() || !str.get_constant().is_loaded(); >> >> There is also `con.is_valid()` check in `do_ldc()`. But I do know what memory is referenced in "OutOfMemoryError in the CI while loading a String constant" when it is invalid. > > But in that case no exception is installed and we bail out from compilation, right? > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciTypeFlow.cpp#L746 Right. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21470#discussion_r1805115873 From kvn at openjdk.org Thu Oct 17 17:20:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 17 Oct 2024 17:20:52 GMT Subject: RFR: 8342287: C2 fails with "assert(is_IfTrue()) failed: invalid node class: IfFalse" due to Template Assertion Predicate with two UCTs In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 11:52:02 GMT, Christian Hagedorn wrote: > ### Assertion Predicates Have the True Projection on the Success Path > By design, Template and Initialized Assertion Predicates have the true projection on the success path and false projection on the failing path. > > ### Is a Node a Template Assertion Predicate? > Template Assertion Predicates can have an uncommon trap on or a `Halt` node following the failing/false projection. 
When trying to find out if a node is a Template Assertion Predicate, we call `TemplateAssertionPredicate::is_predicate()` which checks if the provided node is the success projection of a Template Assertion Predicate. This involves checking if the other projection has a `Halt` node or an UCT (L141): > https://github.com/openjdk/jdk/blob/7a64fbbb9292f4d65a6970206dec1a7d7645046b/src/hotspot/share/opto/predicates.cpp#L135-L144 > > ### New `PredicateIterator` Class > > [JDK-8340786](https://bugs.openjdk.org/browse/JDK-8340786) introduced a new `PredicateIterator` class to simplify iteration over predicates and replaced code that was doing some custom iteration. > > #### Usual Usage > Normally, we always start from a loop entry and follow the predicates (i.e. reaching a predicate over its success path). > > #### Special Usage > However, I also replaced the predicate skipping in `PhaseIdealLoop::build_loop_late_post_work()` with the new `PredicateIterator` class which could start at any node, including a failing path false projection. > > ### Problem: Two Uncommon Traps for a Template Assertion Predicate > The fuzzer now found a case where a Template Assertion Predicate has a predicate UCT on the failing **and** the success path due to folding away some dead nodes: > > ![image](https://github.com/user-attachments/assets/c2a9395e-9eaf-46a0-b978-8847d8e21945) > > In the Special Usage mentioned above, we could be using the `PredicateIterator` with `505 IfFalse` as starting node. In the core method `RegularPredicateBlockIterator::for_each()` for the traversal, we call the following with `current` = `505 IfFalse`: > https://github.com/openjdk/jdk/blob/1ea1f33f66326804ca2892fe0659a9acb7ee72ae/src/hotspot/share/opto/predicates.hpp#L628-L629 > `TemplateAssertionPredicate::is_predicate()` succeeds because we have a valid Template Assertion Predicate If node and the other projection `504 IfTrue` has a predicate UCT. 
We then fail when trying to convert the `IfFalse` to an `IfTrue` with `current->as_IfTrue()` on L629. > > ### Solution > The fix is straightforward: `TemplateAssertionPredicate::is_predicate()` (and `InitiliazedAssertionPredi... src/hotspot/share/opto/predicates.cpp line 136: > 134: // An Assertion Predicate has always a true projection on the success path. > 135: bool AssertionPredicate::may_be_predicate_if(const Node* node) { > 136: return node->is_IfTrue() && RegularPredicate::may_be_predicate_if(node->as_IfProj()); Can you add `assert(node != nullptr)` here in case this method will be used in other places? src/hotspot/share/opto/predicates.hpp line 377: > 375: static bool may_be_predicate_if(const Node* node); > 376: }; > 377: Do you really need a separate class for one static method? Can the method be a local static in the `predicates.cpp` file? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21561#discussion_r1805130461 PR Review Comment: https://git.openjdk.org/jdk/pull/21561#discussion_r1805132161 From jbhateja at openjdk.org Thu Oct 17 18:26:03 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Oct 2024 18:26:03 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v28] In-Reply-To: References: Message-ID: <-wd5wj8QkFZ6vORqWPFZdV_CYCQl2Y7zPNSdW_luNSY=.3c848ace-6cdd-40a5-b924-8feef4346e2d@github.com> > Hi All, > > As per the discussion on panama-dev mailing list[1], this patch adds support for the following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. 
For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of which is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operators in corresponding primitive box classes, fallback implementation of vector operators is based on them. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Restrict IR validation to newly added UMin/UMax transforms. 
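For readers following the thread, the scalar semantics behind the operators listed in the description (SADD, SSUB, SUADD, SUSUB, UMAX, UMIN) can be sketched for `int` as below. This is only an illustration of "saturating" versus "wrapping" behaviour; the class and method names are hypothetical and are not the API names added by the patch.

```java
// Illustrative scalar semantics only; names are hypothetical, not the patch's API.
final class SaturatingOpsSketch {
    // Saturating signed addition: compute in 64 bits, then clamp to the int range.
    static int saturatingAdd(int a, int b) {
        long wide = (long) a + (long) b;
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, wide));
    }

    // Saturating signed subtraction: same idea as above.
    static int saturatingSub(int a, int b) {
        long wide = (long) a - (long) b;
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, wide));
    }

    // Saturating unsigned addition: treat both ints as values in [0, 2^32 - 1]
    // and cap the sum at 2^32 - 1 (bit pattern 0xFFFFFFFF).
    static int saturatingUnsignedAdd(int a, int b) {
        long wide = Integer.toUnsignedLong(a) + Integer.toUnsignedLong(b);
        return wide > 0xFFFF_FFFFL ? 0xFFFF_FFFF : (int) wide;
    }

    // Saturating unsigned subtraction: results below zero are capped at 0.
    static int saturatingUnsignedSub(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    // Unsigned max/min: compare the bit patterns as unsigned values.
    static int unsignedMax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    static int unsignedMin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    public static void main(String[] args) {
        // Plain addition wraps around; the saturating variant stays at the bound.
        System.out.println(Integer.MAX_VALUE + 1);
        System.out.println(saturatingAdd(Integer.MAX_VALUE, 1));
        System.out.println(Integer.toUnsignedString(saturatingUnsignedAdd(-1, 1)));
    }
}
```

Note that `-1` viewed as an unsigned `int` is the largest value, which is why `unsignedMax(-1, 1)` returns `-1` in this sketch.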
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/d9a379b2..2b0fa016 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=26-27 Stats: 1373 lines in 3 files changed: 507 ins; 866 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Thu Oct 17 18:26:04 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Oct 2024 18:26:04 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v25] In-Reply-To: References: Message-ID: <28MMdFkPrGQkT3nD8dUA2GAx6uZyRwERWn3bVvEPSR8=.79c2b5e5-7317-47c5-8878-30c4ac3171ab@github.com> On Thu, 17 Oct 2024 16:17:02 GMT, Sandhya Viswanathan wrote: > > > Rather than adding more IR test functionality to this PR that requires additional review my recommendation would be to follow up in another PR or before hand rethink our approach. > > > > > > Agree, I am thinking of developing an automated IR validation infrastructure for all vector API operations, till then and for the sake of completeness of this patch we can let newly created IR based tests be part of this PR. > > @jatin-bhateja I agree with Paul, it would be good to remove the newly added IR test changes from this PR to reduce the load on reviewers. You can always send it as a separate PR. I have restricted the IR validation to just newly added UMin/UMax transformations. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2420238761 From jbhateja at openjdk.org Thu Oct 17 18:32:53 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Oct 2024 18:32:53 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v29] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], this patch adds support for the following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of which is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operators in corresponding primitive box classes, fallback implementation of vector operators is based on them. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. 
> > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Remove Saturating IRNode patterns. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/2b0fa016..3ee0de07 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=27-28 Stats: 40 lines in 1 file changed: 0 ins; 40 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Thu Oct 17 19:43:21 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Oct 2024 19:43:21 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v3] In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: <0DYInN_-F8fysBMmkpd_MNYVqRE6gPeVqqsphPdKneQ=.6cc6ca2d-5d0b-4e20-a291-b5ba3f9f8717@github.com> > This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR pallets. > > > MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) > MulL (URShift SRC1 , 32) (URShift SRC2, 32) > MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) > MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) > > > > A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. 
Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. > > If the upper 32 bits of the quadword multiplier and multiplicand are always set to zero then the result of the multiplication depends only on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing a new IR node. > > VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ". In addition, non-AVX512DQ targets do not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results in throughput improvements on both P and E core Xeons. > > Please find below the performance of [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:-
>
> Sierra Forest :-
> ============
> Baseline:-
> Benchmark                                 (SIZE)   Mode  Cnt     Score   Error   Units
> VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2   806.228          ops/ms
> VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2   403.044          ops/ms
> VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2   200.641          ops/ms
> VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2   100.664          ops/ms
>
> With Optimization:-
> Benchmark                                 (SIZE)   Mode  Cnt     Score   Error   Units
> VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2  1299.407          ops/ms
> VectorXXH3HashingB...
Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: - Merge branch 'JDK-8341137' of http://github.com/jatin-bhateja/jdk into JDK-8341137 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 - Review resoultions - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction ------------- Changes: https://git.openjdk.org/jdk/pull/21244/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=02 Stats: 349 lines in 12 files changed: 338 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/21244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21244/head:pull/21244 PR: https://git.openjdk.org/jdk/pull/21244 From jbhateja at openjdk.org Thu Oct 17 19:43:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Oct 2024 19:43:22 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Tue, 15 Oct 2024 00:28:25 GMT, Vladimir Ivanov wrote: > MulVL (VectorCastI2X src1) (VectorCastI2X src2) It looks unsafe to me, since VectorCastI2L sign-extends integer lanes, thus we may not be able to neglect partial products of upper doublewords while performing 64x64 bit multiplication. Existing patterns guarantee clearing of the upper double words, so the result depends only on the lower doubleword multiplication. > Personally, I think this optimization is not essential, so we should proceed with introducing lowering first, then add this transformation to that phase, instead of trying to integrate this transformation then refactor it into phase lowering, which seems like a net extra step. I think we should not block in-flight patches in anticipation of new refactoring. We can always tune it later. > I'm pretty ambivalent, I think implementing it either way would be alright. 
Especially with unit tests, I think the lowering implementation wouldn't be that difficult. Maybe another reviewer has an opinion? > > About PhaseLowering though, I've found some more interesting things we could do with it, especially with improving vectorization support in the backend. @merykitty have you already started to work on it? I was thinking about prototyping it soon. Just wanted to make sure we're not doing the same work twice :) It will be good to float an RFP with some use-cases upfront before development. As @jaskarth pointed out some vectorization improvements. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2420384086 From jbhateja at openjdk.org Thu Oct 17 19:43:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 17 Oct 2024 19:43:24 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Mon, 14 Oct 2024 21:26:41 GMT, Jasmine Karthikeyan wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 >> - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction > > src/hotspot/share/opto/vectornode.cpp line 2124: > >> 2122: // MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> 2123: if ((is_lower_double_word_and_mask_op(in(1)) || >> 2124: is_lower_double_word_and_mask_op(in(1)) || > > `is_lower_double_word_and_mask_op(in(1)) || is_lower_double_word_and_mask_op(in(1))` is redundant, right? Shouldn't you only need it once? Same for the other 3 calls, which are similarly repeated. Ah, these harmless cunning typos :-), but we should not rely on c-compiler's short circuiting. 
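The arithmetic behind the sign-extension concern in this thread can be checked with scalar `long` values. The sketch below is a plain-Java illustration, not HotSpot code, and its names are hypothetical: when the upper 32 bits of both operands are zero, multiplying only the lower doublewords as unsigned values gives the same 64-bit result as the full multiply, but for sign-extended negative inputs it does not, whereas a signed 32x32->64 multiply still does.

```java
// Scalar illustration only; class and method names are hypothetical.
final class LowerDoubleWordMulSketch {
    static final long LOW_MASK = 0xFFFF_FFFFL;

    // Low 64 bits of a full 64x64 multiplication (Java long multiply).
    static long fullMul(long a, long b) {
        return a * b;
    }

    // Unsigned 32x32->64 multiply of the lower doublewords only.
    static long lowUnsignedMul(long a, long b) {
        return (a & LOW_MASK) * (b & LOW_MASK);
    }

    // Signed 32x32->64 multiply of the lower doublewords only.
    static long lowSignedMul(long a, long b) {
        return (long) (int) a * (long) (int) b;
    }

    public static void main(String[] args) {
        // Upper halves are zero: the unsigned low multiply matches the full multiply.
        long x = 0xFFFF_FFFFL, y = 0x1234_5678L;
        System.out.println(fullMul(x, y) == lowUnsignedMul(x, y));

        // Sign-extended int values: upper halves are all ones for negatives,
        // so the unsigned low multiply no longer matches, the signed one does.
        long p = (long) -3, q = (long) 5;
        System.out.println(fullMul(p, q) == lowUnsignedMul(p, q));
        System.out.println(fullMul(p, q) == lowSignedMul(p, q));
    }
}
```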
> test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java line 41: > >> 39: */ >> 40: >> 41: public class VectorMultiplyOpt { > > Could it be possible to also do IR verification in this test? It would be good to check that we don't generate `AndVL` or `URShiftVL` with this transform. We do need those nodes to chop off the upper double words of quadword lanes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805324915 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805324796 From chagedorn at openjdk.org Thu Oct 17 21:12:28 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 17 Oct 2024 21:12:28 GMT Subject: RFR: 8342287: C2 fails with "assert(is_IfTrue()) failed: invalid node class: IfFalse" due to Template Assertion Predicate with two UCTs In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 11:52:02 GMT, Christian Hagedorn wrote: > ### Assertion Predicates Have the True Projection on the Success Path > By design, Template and Initialized Assertion Predicates have the true projection on the success path and false projection on the failing path. > > ### Is a Node a Template Assertion Predicate? > Template Assertion Predicates can have an uncommon trap on or a `Halt` node following the failing/false projection. When trying to find out if a node is a Template Assertion Predicate, we call `TemplateAssertionPredicate::is_predicate()` which checks if the provided node is the success projection of a Template Assertion Predicate. This involves checking if the other projection has a `Halt` node or an UCT (L141): > https://github.com/openjdk/jdk/blob/7a64fbbb9292f4d65a6970206dec1a7d7645046b/src/hotspot/share/opto/predicates.cpp#L135-L144 > > ### New `PredicateIterator` Class > > [JDK-8340786](https://bugs.openjdk.org/browse/JDK-8340786) introduced a new `PredicateIterator` class to simplify iteration over predicates and replaced code that was doing some custom iteration. 
> > #### Usual Usage > Normally, we always start from a loop entry and follow the predicates (i.e. reaching a predicate over its success path). > > #### Special Usage > However, I also replaced the predicate skipping in `PhaseIdealLoop::build_loop_late_post_work()` with the new `PredicateIterator` class which could start at any node, including a failing path false projection. > > ### Problem: Two Uncommon Traps for a Template Assertion Predicate > The fuzzer now found a case where a Template Assertion Predicate has a predicate UCT on the failing **and** the success path due to folding away some dead nodes: > > ![image](https://github.com/user-attachments/assets/c2a9395e-9eaf-46a0-b978-8847d8e21945) > > In the Special Usage mentioned above, we could be using the `PredicateIterator` with `505 IfFalse` as starting node. In the core method `RegularPredicateBlockIterator::for_each()` for the traversal, we call the following with `current` = `505 IfFalse`: > https://github.com/openjdk/jdk/blob/1ea1f33f66326804ca2892fe0659a9acb7ee72ae/src/hotspot/share/opto/predicates.hpp#L628-L629 > `TemplateAssertionPredicate::is_predicate()` succeeds because we have a valid Template Assertion Predicate If node and the other projection `504 IfTrue` has a predicate UCT. We then fail when trying to convert the `IfFalse` to an `IfTrue` with `current->as_IfTrue()` on L629. > > ### Solution > The fix is straight forward: `TemplateAssertionPredicate::is_predicate()` (and `InitiliazedAssertionPredi... Thanks Vladimir for your review. I've pushed an update with your suggestions. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21561#issuecomment-2420578187 From chagedorn at openjdk.org Thu Oct 17 21:12:28 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 17 Oct 2024 21:12:28 GMT Subject: RFR: 8342287: C2 fails with "assert(is_IfTrue()) failed: invalid node class: IfFalse" due to Template Assertion Predicate with two UCTs [v2] In-Reply-To: References: Message-ID: > ### Assertion Predicates Have the True Projection on the Success Path > By design, Template and Initialized Assertion Predicates have the true projection on the success path and false projection on the failing path. > > ### Is a Node a Template Assertion Predicate? > Template Assertion Predicates can have an uncommon trap on or a `Halt` node following the failing/false projection. When trying to find out if a node is a Template Assertion Predicate, we call `TemplateAssertionPredicate::is_predicate()` which checks if the provided node is the success projection of a Template Assertion Predicate. This involves checking if the other projection has a `Halt` node or an UCT (L141): > https://github.com/openjdk/jdk/blob/7a64fbbb9292f4d65a6970206dec1a7d7645046b/src/hotspot/share/opto/predicates.cpp#L135-L144 > > ### New `PredicateIterator` Class > > [JDK-8340786](https://bugs.openjdk.org/browse/JDK-8340786) introduced a new `PredicateIterator` class to simplify iteration over predicates and replaced code that was doing some custom iteration. > > #### Usual Usage > Normally, we always start from a loop entry and follow the predicates (i.e. reaching a predicate over its success path). > > #### Special Usage > However, I also replaced the predicate skipping in `PhaseIdealLoop::build_loop_late_post_work()` with the new `PredicateIterator` class which could start at any node, including a failing path false projection. 
> > ### Problem: Two Uncommon Traps for a Template Assertion Predicate > The fuzzer now found a case where a Template Assertion Predicate has a predicate UCT on the failing **and** the success path due to folding away some dead nodes: > > ![image](https://github.com/user-attachments/assets/c2a9395e-9eaf-46a0-b978-8847d8e21945) > > In the Special Usage mentioned above, we could be using the `PredicateIterator` with `505 IfFalse` as starting node. In the core method `RegularPredicateBlockIterator::for_each()` for the traversal, we call the following with `current` = `505 IfFalse`: > https://github.com/openjdk/jdk/blob/1ea1f33f66326804ca2892fe0659a9acb7ee72ae/src/hotspot/share/opto/predicates.hpp#L628-L629 > `TemplateAssertionPredicate::is_predicate()` succeeds because we have a valid Template Assertion Predicate If node and the other projection `504 IfTrue` has a predicate UCT. We then fail when trying to convert the `IfFalse` to an `IfTrue` with `current->as_IfTrue()` on L629. > > ### Solution > The fix is straight forward: `TemplateAssertionPredicate::is_predicate()` (and `InitiliazedAssertionPredi... 
Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Review Vladimir ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21561/files - new: https://git.openjdk.org/jdk/pull/21561/files/aeb74099..9c65f9a8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21561&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21561&range=00-01 Stats: 19 lines in 2 files changed: 6 ins; 11 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21561.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21561/head:pull/21561 PR: https://git.openjdk.org/jdk/pull/21561 From duke at openjdk.org Thu Oct 17 21:12:35 2024 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 17 Oct 2024 21:12:35 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn [v3] In-Reply-To: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> Message-ID: <4yh-9E3dPvmUGKVoo3sEHCoc_TdBbGdgMJbGQbti50s=.2290e266-e224-4e10-a282-9627f19e3c0c@github.com> > [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) > > The crash occurs in C1 during OSR when copying locks from the interpreter frame to the compiled frame. All loads used an immediate offset regardless of its size, causing a crash when the offset exceeds the maximum for the instruction (32760). The fix is to check the size before performing the load and to store the offset in a register if needed. > > I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction. 
The instruction is only updated when the `offset_ok_for_immed` check fails which would cause the crash anyways > > Confirmed that added test fails before patch and passes after Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: - Remove jasm file - Update test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21473/files - new: https://git.openjdk.org/jdk/pull/21473/files/51298397..dfbe03de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21473&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21473&range=01-02 Stats: 337 lines in 3 files changed: 102 ins; 235 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21473.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21473/head:pull/21473 PR: https://git.openjdk.org/jdk/pull/21473 From kvn at openjdk.org Thu Oct 17 21:46:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 17 Oct 2024 21:46:41 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 1 Jul 2024 13:32:01 GMT, Emanuel Peter wrote: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. 
> > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. For me it is confusing to call `pointer = con + sum_i(scale_i * variable_i)` a "pointer" unless it is an Unsafe address which has the base address as a constant. It misses the base address. All our pointer types correspond to an address of some object in the Java heap, out of heap, a VM object or some native (C heap) VM object. This looks like `address_offset`, `displacement`, ... ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2420654868 From kvn at openjdk.org Thu Oct 17 21:50:47 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 17 Oct 2024 21:50:47 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 1 Jul 2024 13:32:01 GMT, Emanuel Peter wrote: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. 
For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. src/hotspot/share/opto/mempointer.hpp line 353: > 351: // > 352: // array[j] -> array_base + j + con -> 2 summands > 353: // nativeMemorySegment.get(j) -> null + address + offset + j + con -> 3 summands Does this mean it supports only Unsafe and Array access? 
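To make the linear form `pointer = con + sum_i(scale_i * variable_i)` from the PR description concrete, here is a rough Java model of the adjacency check it enables. This is a sketch under stated assumptions, not the C++ code in `mempointer.hpp`: the class, its fields and the use of string names for variables are all hypothetical, and overflow handling is deliberately left out.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical model of a decomposed pointer: con + sum_i(scale_i * variable_i).
// Variables are identified by name here purely for illustration.
final class LinearPointerSketch {
    final long con;                    // constant offset
    final Map<String, Long> summands;  // variable -> scale

    LinearPointerSketch(long con, Map<String, Long> summands) {
        this.con = con;
        this.summands = Map.copyOf(summands);
    }

    // If both pointers have exactly the same scale * variable terms, their
    // distance is a compile-time constant; otherwise nothing is known.
    OptionalLong constantDistanceTo(LinearPointerSketch other) {
        if (!summands.equals(other.summands)) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(other.con - con);
    }

    // Two accesses are adjacent if the second starts right where the first ends.
    boolean isAdjacentBefore(LinearPointerSketch other, long accessSizeInBytes) {
        OptionalLong distance = constantDistanceTo(other);
        return distance.isPresent() && distance.getAsLong() == accessSizeInBytes;
    }

    public static void main(String[] args) {
        // Models byte stores to array[j] and array[j + 1] with an assumed header offset of 16.
        Map<String, Long> terms = Map.of("array_base", 1L, "j", 1L);
        LinearPointerSketch first = new LinearPointerSketch(16, terms);
        LinearPointerSketch second = new LinearPointerSketch(17, terms);
        System.out.println(first.isAdjacentBefore(second, 1));
    }
}
```

In this model two stores can only be proven adjacent when their variable terms match exactly, which mirrors the "always at a constant offset" wording in the description.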
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1805484515 From vlivanov at openjdk.org Thu Oct 17 21:57:06 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 17 Oct 2024 21:57:06 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Thu, 17 Oct 2024 19:40:52 GMT, Jatin Bhateja wrote: >> MulVL (VectorCastI2X src1) (VectorCastI2X src2) > It looks unsafe to me, since VectorCastI2L sign-extends integer lanes, ... Hm, I don't see any problems with it if `VPMULDQ` is used. Sign extension becomes redundant when 64-bit multiplication is strength-reduced to 32-bit one (32x32->64). Am I missing something important here? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2420668490 From kvn at openjdk.org Thu Oct 17 22:06:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 17 Oct 2024 22:06:15 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 1 Jul 2024 13:32:01 GMT, Emanuel Peter wrote: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. 
> > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. src/hotspot/share/opto/mempointer.hpp line 43: > 41: // Where each summand_i in summands has the form: > 42: // > 43: // summand_i = scale_i * variable_i I see you treat `scale` as compile time integer value (NoOverflowInt) and not Node. Why is the check/filter for that?
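
As an aside on the `NoOverflowInt` idea quoted above, the following is a rough Java analogue of an int that remembers whether any operation on it ever overflowed. It is a sketch under assumptions: the real `NoOverflowInt` is a C++ class in HotSpot, and the name `OverflowTrackingInt` and its methods are invented here and may not match the actual interface.

```java
// Hypothetical Java analogue of an overflow-tracking int. Once a value has
// overflowed, every result computed from it is also marked as overflowed.
final class OverflowTrackingInt {
    private static final OverflowTrackingInt OVERFLOWED = new OverflowTrackingInt(0, true);

    private final int value;
    private final boolean overflowed;

    private OverflowTrackingInt(int value, boolean overflowed) {
        this.value = value;
        this.overflowed = overflowed;
    }

    static OverflowTrackingInt of(int value) {
        return new OverflowTrackingInt(value, false);
    }

    boolean hasOverflowed() {
        return overflowed;
    }

    int value() {
        if (overflowed) {
            throw new IllegalStateException("value is not available after an overflow");
        }
        return value;
    }

    OverflowTrackingInt add(OverflowTrackingInt other) {
        if (overflowed || other.overflowed) {
            return OVERFLOWED;
        }
        // Compute in 64 bits, where int + int cannot overflow.
        return fromLong((long) value + other.value);
    }

    OverflowTrackingInt mul(OverflowTrackingInt other) {
        if (overflowed || other.overflowed) {
            return OVERFLOWED;
        }
        // Compute in 64 bits, where int * int cannot overflow.
        return fromLong((long) value * other.value);
    }

    private static OverflowTrackingInt fromLong(long wide) {
        return wide == (int) wide ? of((int) wide) : OVERFLOWED;
    }

    public static void main(String[] args) {
        OverflowTrackingInt scale = of(4);
        OverflowTrackingInt index = of(Integer.MAX_VALUE);
        OverflowTrackingInt offset = scale.mul(index).add(of(16));
        // The overflow from the multiplication is still visible after the add.
        System.out.println(offset.hasOverflowed()); // true
        System.out.println(of(6).mul(of(7)).value()); // 42
    }
}
```

The design choice illustrated is that callers check `hasOverflowed()` once at the end of a computation instead of guarding every intermediate operation.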
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1805496171 From kvn at openjdk.org Thu Oct 17 22:11:50 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 17 Oct 2024 22:11:50 GMT Subject: RFR: 8342287: C2 fails with "assert(is_IfTrue()) failed: invalid node class: IfFalse" due to Template Assertion Predicate with two UCTs [v2] In-Reply-To: References: Message-ID: <43JQCtakHQi1S_JPiZ8Ztq9xV9BCE7JgZheORh1Ja18=.a7eef304-ee01-459d-bd3d-3f34a0941383@github.com> On Thu, 17 Oct 2024 21:12:28 GMT, Christian Hagedorn wrote: >> ### Assertion Predicates Have the True Projection on the Success Path >> By design, Template and Initialized Assertion Predicates have the true projection on the success path and false projection on the failing path. >> >> ### Is a Node a Template Assertion Predicate? >> Template Assertion Predicates can have an uncommon trap on or a `Halt` node following the failing/false projection. When trying to find out if a node is a Template Assertion Predicate, we call `TemplateAssertionPredicate::is_predicate()` which checks if the provided node is the success projection of a Template Assertion Predicate. This involves checking if the other projection has a `Halt` node or an UCT (L141): >> https://github.com/openjdk/jdk/blob/7a64fbbb9292f4d65a6970206dec1a7d7645046b/src/hotspot/share/opto/predicates.cpp#L135-L144 >> >> ### New `PredicateIterator` Class >> >> [JDK-8340786](https://bugs.openjdk.org/browse/JDK-8340786) introduced a new `PredicateIterator` class to simplify iteration over predicates and replaced code that was doing some custom iteration. >> >> #### Usual Usage >> Normally, we always start from a loop entry and follow the predicates (i.e. reaching a predicate over its success path). 
>> >> #### Special Usage >> However, I also replaced the predicate skipping in `PhaseIdealLoop::build_loop_late_post_work()` with the new `PredicateIterator` class which could start at any node, including a failing path false projection. >> >> ### Problem: Two Uncommon Traps for a Template Assertion Predicate >> The fuzzer now found a case where a Template Assertion Predicate has a predicate UCT on the failing **and** the success path due to folding away some dead nodes: >> >> ![image](https://github.com/user-attachments/assets/c2a9395e-9eaf-46a0-b978-8847d8e21945) >> >> In the Special Usage mentioned above, we could be using the `PredicateIterator` with `505 IfFalse` as starting node. In the core method `RegularPredicateBlockIterator::for_each()` for the traversal, we call the following with `current` = `505 IfFalse`: >> https://github.com/openjdk/jdk/blob/1ea1f33f66326804ca2892fe0659a9acb7ee72ae/src/hotspot/share/opto/predicates.hpp#L628-L629 >> `TemplateAssertionPredicate::is_predicate()` succeeds because we have a valid Template Assertion Predicate If node and the other projection `504 IfTrue` has a predicate UCT. We then fail when trying to convert the `IfFalse` to an `IfTrue` with `current->as_IfTrue()` on L629. >> >> ### Solution >> The fix is straight forward: `TemplateAssertionPr... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Review Vladimir Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21561#pullrequestreview-2376395105 From kvn at openjdk.org Thu Oct 17 22:12:57 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 17 Oct 2024 22:12:57 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Thu, 17 Oct 2024 22:01:40 GMT, Vladimir Kozlov wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow.
This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > src/hotspot/share/opto/mempointer.hpp line 43: > >> 41: // Where each summand_i in summands has the form: >> 42: // >> 43: // summand_i = scale_i * variable_i > > I see you treat `scale` as compile time integer value (NoOverflowInt) and not Node. Why is the check/filter for that? I mean: _Where_ is the check/filter for that? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1805501990 From vlivanov at openjdk.org Thu Oct 17 22:36:23 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 17 Oct 2024 22:36:23 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Tue, 15 Oct 2024 17:26:49 GMT, Quan Anh Mai wrote: >> I'm pretty ambivalent, I think implementing it either way would be alright. Especially with unit tests, I think the lowering implementation wouldn't be that difficult. Maybe another reviewer has an opinion? >> >> About PhaseLowering though, I've found some more interesting things we could do with it, especially with improving vectorization support in the backend. @merykitty have you already started to work on it? I was thinking about prototyping it soon. Just wanted to make sure we're not doing the same work twice :) > > @jaskarth Please proceed with it, I have a really simple prototype for it but I don't have any plan to proceed further soon. Thanks a lot :) @merykitty The approach @jatin-bhateja proposes looks well-justified to me. Matching is essentially a lowering step which transforms platform-independent Ideal IR into platform-specific Mach IR. And collapsing non-trivial IR trees into platform-specific instructions is a well-established pattern in the code. 
Indeed, there are some constraints matching imposes, so it may not be flexible enough to cover all use cases. In particular, for `VPTERNLOGD`/`VPTERNLOGQ` it was decided it's worth the effort to handle them specially (see `Compile::optimize_logic_cones()`). As it is implemented now, it's part of the shared code, but if there's platform-specific custom lowering phase available one day, it can be moved there, of course. But speaking of `VPMULDQ`/`VPMULUDQ`, what kind of benefits do you see from custom logic to support them? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2420732705 From psandoz at openjdk.org Thu Oct 17 23:38:28 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 17 Oct 2024 23:38:28 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v25] In-Reply-To: <28MMdFkPrGQkT3nD8dUA2GAx6uZyRwERWn3bVvEPSR8=.79c2b5e5-7317-47c5-8878-30c4ac3171ab@github.com> References: <28MMdFkPrGQkT3nD8dUA2GAx6uZyRwERWn3bVvEPSR8=.79c2b5e5-7317-47c5-8878-30c4ac3171ab@github.com> Message-ID: On Thu, 17 Oct 2024 18:23:12 GMT, Jatin Bhateja wrote: > I have restricted the IR validation to just newly added UMin/UMax transformations. Even then i think it better to do so in follow on PR, otherwise it is a moving target for review and testing. This new test fails on aarch64 e.g., STATUS:Failed.`main' threw exception: compiler.lib.ir_framework.shared.TestFormatException: Violations (16) --------------- - Could not find VM flag "UseAVX" in @IR rule 1 at public void compiler.vectorapi.VectorUnsignedMinMaxIRTransformsTest.umin_ir_transform1_byte() ... Testing tier 1 to 3 with latest PR looks good otherwise. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2420845538 From duke at openjdk.org Thu Oct 17 23:39:35 2024 From: duke at openjdk.org (hanklo6) Date: Thu, 17 Oct 2024 23:39:35 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v4] In-Reply-To: References: <_tdMDW6ViTZsDgks0k2uWDHJUum3VL13PX0piUiQBtk=.4a06f43d-037f-4725-869d-287813bc8750@github.com> Message-ID: <3e0pIvFYNGmrXzKkP0MQcGATfEVfV0ZWWbSWXwV2H-0=.36137749-b27d-47cc-b31b-124f2e76227f@github.com> On Thu, 17 Oct 2024 12:04:55 GMT, Jatin Bhateja wrote: >> hanklo6 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: >> >> - Merge branch 'master' of https://git.openjdk.java.net/jdk into apx-test-tool >> - Add comment and defined >> - Add copyright header >> - Remove tab >> - Remove whitespace >> - Replace whitespace with tab >> - Add flag before testing >> - Fix assertion error on MacOS >> - Add _LP64 flag >> - Add missing header >> - ... and 6 more: https://git.openjdk.org/jdk/compare/a9a1795a...ca48f240 > > test/hotspot/gtest/x86/test_assemblerx86.cpp line 44: > >> 42: // Different encoding for GCC and OpenJDK >> 43: {"shll", {'\xd3', '\xd1'}}, >> 44: {"shlq", {'\xd3', '\xd1'}}, > > For the record. > > // For single register operand salq, C2 assumes shift will be passed through CL register and emits the encoding with opcode set to 'D3". > void Assembler::shlq(Register dst) { > int encode = prefixq_and_encode(dst->encoding()); > emit_int16((unsigned char)0xD3, (0xE0 | encode)); > } > > // With immediate shift operand we explicitly handle special case of shift by '1' bit... and emit D1 opcode. 
> void Assembler::shlq(Register dst, int imm8) { > assert(isShiftCount(imm8 >> 1), "illegal shift count"); > int encode = prefixq_and_encode(dst->encoding()); > if (imm8 == 1) { > emit_int16((unsigned char)0xD1, (0xE0 | encode)); > } else { > emit_int24((unsigned char)0xC1, (0xE0 | encode), imm8); > } > } > > So, GCC toolchain is following a different convention than C2, but both are emitting correct encodings. > Our test infrastructure is biased toward C2 and hence does not comply with GCC encoding, thus we are > skipping over the following cases. Please see below a small inline assembly snippet and its corresponding encoding. > > void micro(){ > asm volatile( > "shlq $1, %%r11 \n\t" [InstID 1] > "shlq %%r11 \n\t" [InstID 2] > "shlq %%cl, %%r11 \n\t" [InstID 3] > : : : "%r11", "%rcx" > ); > } > CPROMPT>objdump -D shl.o > Disassembly of section .text: > 0000000000000000 : > 0: 55 push %rbp > 1: 48 89 e5 mov %rsp,%rbp > 4: 49 d1 e3 shl $1,%r11 [InstID 1] > 7: 49 d1 e3 shl $1,%r11 [InstID 2] > a: 49 d3 e3 shl %cl,%r11 [InstID 3] I think I can put the `cl` register in the GCC assembly to align it with the JDK assembler. This will allow us to remove these parts of checking. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1805590978 From duke at openjdk.org Fri Oct 18 00:07:05 2024 From: duke at openjdk.org (hanklo6) Date: Fri, 18 Oct 2024 00:07:05 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v5] In-Reply-To: References: Message-ID: > Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html.
> > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > ### Generate test instructions > With `binutils = 2.43` > * `python3 x86-asmtest.py > asmtest.out.h` > ### Run test > * `make test TEST="gtest:AssemblerX86"` hanklo6 has updated the pull request incrementally with two additional commits since the last revision: - Add cl register in shift/rotate instructions - Refactor and add instruction id ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20857/files - new: https://git.openjdk.org/jdk/pull/20857/files/ca48f240..da9c54bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20857&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20857&range=03-04 Stats: 60623 lines in 3 files changed: 14578 ins; 2252 del; 43793 mod Patch: https://git.openjdk.org/jdk/pull/20857.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20857/head:pull/20857 PR: https://git.openjdk.org/jdk/pull/20857 From jbhateja at openjdk.org Fri Oct 18 02:03:21 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 18 Oct 2024 02:03:21 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4] In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: > This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR pallets. 
> > > MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) > MulL (URShift SRC1 , 32) (URShift SRC2, 32) > MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) > MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) > > > > A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. > > If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. > > VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. 
> > Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- > > > Sierra Forest :- > ============ > Baseline:- > Benchmark (SIZE) Mode Cnt Score Error Units > VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms > > With Optimization:- > Benchmark (SIZE) Mode Cnt Score Error Units > VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 1299.407 ops/ms > VectorXXH3HashingB... Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Review resolutions - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction ------------- Changes: https://git.openjdk.org/jdk/pull/21244/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=03 Stats: 351 lines in 12 files changed: 339 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/21244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21244/head:pull/21244 PR: https://git.openjdk.org/jdk/pull/21244 From jbhateja at openjdk.org Fri Oct 18 02:27:06 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 18 Oct 2024 02:27:06 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Thu, 17 Oct 2024 21:53:16 GMT, Vladimir Ivanov wrote: > > > MulVL (VectorCastI2X src1) (VectorCastI2X src2) > > > It looks unsafe to me, since VectorCastI2L sign-extends integer lanes, ... > > Hm, I don't see any problems with it if `VPMULDQ` is used. 
Sign extension becomes redundant when 64-bit multiplication is strength-reduced to 32-bit one (32x32->64). Am I missing something important here? @iwanowww, agreed! I missed that you were talking about **VPMULDQ**; it's a signed doubleword multiplier with quadword saturation, so it should be OK to include the suggested pattern. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421132055 From fyang at openjdk.org Fri Oct 18 02:39:51 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 18 Oct 2024 02:39:51 GMT Subject: RFR: 8342579: RISC-V: C2: Cleanup effect of killing flag register for call nodes Message-ID: Previously found and discussed at: https://github.com/openjdk/jdk/pull/21406#discussion_r1803232561 For C2 call nodes, it's not necessary to add an effect listing the flag register as being killed. This cleans them up and thus aligns with other CPU platforms. Testing on linux-riscv64: - [x] Tier1 (release build) ------------- Commit messages: - 8342579: RISC-V: C2: Cleanup effect of killing flag register for call nodes Changes: https://git.openjdk.org/jdk/pull/21576/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21576&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342579 Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/21576.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21576/head:pull/21576 PR: https://git.openjdk.org/jdk/pull/21576 From qamai at openjdk.org Fri Oct 18 02:44:52 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 18 Oct 2024 02:44:52 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Fri, 18 Oct 2024 02:03:21 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR pallets.
>> >> >> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. 
>> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimization:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Review resolutions > - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction The issues I have with this patch are that: - It convolutes the graph with machine-dependent nodes early in the compiling process. - It overloads `MulVL` with alternative behaviours, it is fine now as we do not perform much analysis on this node but it would be problematic later. I think it is more preferable to have a separate IR node for this like `MulVLowIToLNode`, or have this transformation be done only just before matching, or both. 
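
For readers following the pattern under discussion, the arithmetic identity behind it can be sketched on scalars. The Java below is an illustration only: `LowWordMultiplySketch` and its method names are made up for this example, and the methods model a single 64-bit lane rather than the C2 matcher or the vector instructions themselves. It shows that a full 64-bit multiply of operands with zeroed upper halves equals an unsigned 32x32->64 multiply of the low words, and that the sign-extended case corresponds to a signed 32x32->64 multiply.

```java
// Scalar model of one 64-bit vector lane; names are hypothetical.
final class LowWordMultiplySketch {
    // The shape the patch looks for: MulL (And x, 0xFFFFFFFF) (And y, 0xFFFFFFFF).
    static long maskedMultiply(long x, long y) {
        return (x & 0xFFFFFFFFL) * (y & 0xFFFFFFFFL);
    }

    // Unsigned 32x32 -> 64 multiply of the low doublewords,
    // which is how this sketch models a VPMULUDQ lane.
    static long unsignedLowWordMultiply(long x, long y) {
        return Integer.toUnsignedLong((int) x) * Integer.toUnsignedLong((int) y);
    }

    // Signed 32x32 -> 64 multiply of the low doublewords,
    // which is how this sketch models a VPMULDQ lane.
    static long signedLowWordMultiply(long x, long y) {
        return (long) (int) x * (long) (int) y;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, 0xDEADBEEFCAFEBABEL, 0x0123456789ABCDEFL, Long.MIN_VALUE};
        for (long x : samples) {
            for (long y : samples) {
                if (maskedMultiply(x, y) != unsignedLowWordMultiply(x, y)) {
                    throw new AssertionError("unsigned mismatch for " + x + ", " + y);
                }
                // Sign-extending the low word first (the I2L cast case) gives
                // the same product as the signed 32x32 -> 64 multiply.
                long sx = (int) x;
                long sy = (int) y;
                if (sx * sy != signedLowWordMultiply(x, y)) {
                    throw new AssertionError("signed mismatch for " + x + ", " + y);
                }
            }
        }
        System.out.println("all lanes agree");
    }
}
```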
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421157206 From jbhateja at openjdk.org Fri Oct 18 02:47:55 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 18 Oct 2024 02:47:55 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v4] In-Reply-To: <3e0pIvFYNGmrXzKkP0MQcGATfEVfV0ZWWbSWXwV2H-0=.36137749-b27d-47cc-b31b-124f2e76227f@github.com> References: <_tdMDW6ViTZsDgks0k2uWDHJUum3VL13PX0piUiQBtk=.4a06f43d-037f-4725-869d-287813bc8750@github.com> <3e0pIvFYNGmrXzKkP0MQcGATfEVfV0ZWWbSWXwV2H-0=.36137749-b27d-47cc-b31b-124f2e76227f@github.com> Message-ID: <010De2p3_0FDAr6yBuz22bhrXSeY-LG13oCKzycjl3U=.69df1b15-5375-4450-8272-c12895ecfc21@github.com> On Thu, 17 Oct 2024 23:35:06 GMT, hanklo6 wrote: > I think I can put the `cl` register in the GCC assembly to align it with the JDK assembler. This will allow us to remove these parts of checking. Good, this should prevent skipping over their validation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1805758365 From amitkumar at openjdk.org Fri Oct 18 03:49:25 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 18 Oct 2024 03:49:25 GMT Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double [v3] In-Reply-To: References: Message-ID: > This is a trivial PR to change the data type of some "*Threshold" variables from `intx` to `double`.
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: review/comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21354/files - new: https://git.openjdk.org/jdk/pull/21354/files/ce4ff580..97d63a29 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21354&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21354&range=01-02 Stats: 19 lines in 4 files changed: 0 ins; 0 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/21354.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21354/head:pull/21354 PR: https://git.openjdk.org/jdk/pull/21354 From amitkumar at openjdk.org Fri Oct 18 03:49:25 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 18 Oct 2024 03:49:25 GMT Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double [v2] In-Reply-To: References: Message-ID: On Tue, 15 Oct 2024 19:49:40 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/opto/bytecodeInfo.cpp line 316: >> >>> 314: int call_site_count = caller_method->scale_count(profile.count()); >>> 315: int invoke_count = caller_method->interpreter_invocation_count(); >>> 316: assert(invoke_count >= 0, "require invocation count greater than zero"); >> >> Technically, the comment is now wrong. It is no longer "greater than" but "greater than or equal to zero". Is that intended? Otherwise you should use `>`. > > Actually it should be `>` because we divide by it in next line. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21354#discussion_r1805804832 From amitkumar at openjdk.org Fri Oct 18 04:02:38 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 18 Oct 2024 04:02:38 GMT Subject: RFR: 8342409: [s390x] C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 06:38:57 GMT, Amit Kumar wrote: > Make sure LIR_Assembler::emit_unwind_handler() jumps to the slow path directly for unlocking a synchronized method if LM_MONITOR is used. > On the fast paths assertions are added that the mode is actually handled. > > Testing: Tier1 test for fastdebug vm showed no regression. @RealLucy can I get another approval for this :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21557#issuecomment-2421285274 From jbhateja at openjdk.org Fri Oct 18 04:20:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 18 Oct 2024 04:20:22 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Fri, 18 Oct 2024 02:41:47 GMT, Quan Anh Mai wrote: > The issues I have with this patch are that: > > * It convolutes the graph with machine-dependent nodes early in the compiling process. MulVL is a machine independent IR, we create a machine dependent IR post matching. > * It overloads `MulVL` with alternative behaviours, it is fine now as we do not perform much analysis on this node but it would be problematic later. I think it is more preferable to have a separate IR node for this like `MulVLowIToLNode`, or have this transformation be done only just before matching, or both. I see this is as a twostep optimization, in the first step we do analysis and annotate additional information on existing IR, which is later used by instruction selector. 
I plan to subsume first stage with enhanced dataflow analysis going forward. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421300738 From vlivanov at openjdk.org Fri Oct 18 05:03:22 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 18 Oct 2024 05:03:22 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: <34KZVRjCMAl5-KAG6hLnJUe2RZF2fThQAWuresTL5Pk=.83d5f516-4a5a-4f0e-9eeb-67b78cfc074b@github.com> On Fri, 18 Oct 2024 04:16:15 GMT, Jatin Bhateja wrote: > It convolutes the graph with machine-dependent nodes early in the compiling process. Ah, I see your point now! I took a closer look at the patch and indeed `MulVLNode::_mult_lower_double_word` with `MulVLNode::Ideal()` don't look pretty. @jatin-bhateja why don't you turn the logic into match rules instead? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421372120 From qamai at openjdk.org Fri Oct 18 05:09:34 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 18 Oct 2024 05:09:34 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Fri, 18 Oct 2024 04:16:15 GMT, Jatin Bhateja wrote: > I see this is as a twostep optimization, in the first step we do analysis and annotate additional information on existing IR, which is later used by instruction selector. I plan to subsume first stage with enhanced dataflow analysis going forward. The issue is that a node is not immutable. This puts a burden on every place to keep the annotation sane when doing transformations, which is easily missed when there are a lot of kinds of `Node`s out there.
That's why I think it is most suitable to be done only right before matching. `Node::Ideal` is invoked in a really generous manner so I would prefer not to add analysis to it that can be done more efficiently somewhere else. Additionally, if you have a separate IR node for this operation, you can do some more beneficial transformations such as `MulVL (AndV x max_juint) (AndV y max_juint)` into `MulVLowIToL x y`. My suggestions are based on this PR as a standalone, so they may not be optimal when looking at a wider perspective, in case you think this approach would fit more nicely into a larger landscape of your planned enhancements please let us know. Thanks for your patience. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421376285 From vlivanov at openjdk.org Fri Oct 18 05:19:39 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 18 Oct 2024 05:19:39 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Fri, 18 Oct 2024 05:05:16 GMT, Quan Anh Mai wrote: > The issue is that a node is not immutable. I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNode::Ideal()` does.) But I agree with you that a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421412061 From qamai at openjdk.org Fri Oct 18 05:41:20 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 18 Oct 2024 05:41:20 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Fri, 18 Oct 2024 05:16:04 GMT, Vladimir Ivanov wrote: >>> I see this is as a twostep optimization, in the first step we do analysis and annotate additional information on existing IR, which is later used by instruction selector. I plan to subsume first stage with enhanced dataflow analysis going forward. >> >> The issue is that a node is not immutable. This puts a burden on every place to keep the annotation sane when doing transformations, which is easily missed when there are a lot of kinds of `Node`s out there. That's why I think it is most suitable to be done only right before matching. `Node::Ideal` is invoked in a really generous manner so I would prefer not to add analysis to it that can be done more efficiently somewhere else. Additionally, if you have a separate IR node for this operation, you can do some more beneficial transformations such as `MulVL (AndV x max_juint) (AndV y max_juint)` into `MulVLowIToL x y`. >> >> My suggestions are based on this PR as a standalone, so they may not be optimal when looking at a wider perspective, in case you think this approach would fit more nicely into a larger landscape of your planned enhancements please let us know. Thanks for your patience. > >> The issue is that a node is not immutable. > > I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNode::Ideal()` does.) 
But I agree with you that a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead. @iwanowww IMO there are 2 ways to view this: - You can see a `MulVL` nodes with `_mult_lower_double_word` being an entirely different kind of nodes which do a different thing (a.k.a throw away the upper bits and only multiply the lower bits), in this case it is a machine-dependent IR node hiding behind the opcode of `MulVL` and changing the inputs of it is not worrying because the node does not care about that anyway, its semantics is predetermined already. - Or you can see `_mult_lower_double_word` being an annotation that adds information to `MulVL`, which means it is still a `MulVL` but annotated with information saying that all upper bits of the operands are 0. I think this is Jatin's point of view right now. The issue here would be to keep the annotation sane when the node inputs may be changed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421441405 From vlivanov at openjdk.org Fri Oct 18 05:41:41 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 18 Oct 2024 05:41:41 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Fri, 18 Oct 2024 02:03:21 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR pallets. 
>> >> >> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. 
>>
>> Please find below the performance of [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:-
>>
>>
>> Sierra Forest :-
>> ============
>> Baseline:-
>> Benchmark                                 (SIZE)   Mode  Cnt    Score   Error   Units
>> VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2  806.228          ops/ms
>> VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2  403.044          ops/ms
>> VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2  200.641          ops/ms
>> VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2  100.664          ops/ms
>>
>> With Optimization:-
>> Benchmark                                 (SIZE)   Mode  Cnt    Score   Error   Units
>> VectorXXH3HashingBenchmark.hashingKernel ...
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>
> - Review resolutions
> - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction

src/hotspot/share/opto/vectornode.cpp line 2122:

> 2120: // MulL (URShift SRC1 , 32) (URShift SRC2, 32)
> 2121: // MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF)
> 2122: // MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32)

I don't understand how it works... According to the documentation, `VPMULDQ`/`VPMULUDQ` consume vectors of double words and produce a vector of quadwords. But it looks like `SRC1`/`SRC2` are always vectors of longs (quadwords). And `vmuludq_reg` in `x86.ad` just takes the immediate operands and passes them into `vpmuludq` which doesn't look right...
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805886268 From qamai at openjdk.org Fri Oct 18 05:41:41 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 18 Oct 2024 05:41:41 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Fri, 18 Oct 2024 05:35:28 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Review resolutions >> - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction > > src/hotspot/share/opto/vectornode.cpp line 2122: > >> 2120: // MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> 2121: // MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> 2122: // MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) > > I don't understand how it works... According to the documentation, `VPMULDQ`/`VPMULUDQ` consume vectors of double words and produce a vector of quadwords. But it looks like `SRC1`/`SRC2` are always vectors of longs (quadwords). And `vmuludq_reg` in `x86.ad` just takes the immedate operands and pass them into `vpmuludq` which doesn't look right... 
`vpmuludq` does a long multiplication but throws away the upper bits of the operands, effectively does a `(x & max_juint) * (y & max_juint)` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805887594 From qamai at openjdk.org Fri Oct 18 05:41:41 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 18 Oct 2024 05:41:41 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Fri, 18 Oct 2024 05:37:16 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/vectornode.cpp line 2122: >> >>> 2120: // MulL (URShift SRC1 , 32) (URShift SRC2, 32) >>> 2121: // MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >>> 2122: // MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> I don't understand how it works... According to the documentation, `VPMULDQ`/`VPMULUDQ` consume vectors of double words and produce a vector of quadwords. But it looks like `SRC1`/`SRC2` are always vectors of longs (quadwords). And `vmuludq_reg` in `x86.ad` just takes the immedate operands and pass them into `vpmuludq` which doesn't look right... 
> > `vpmuludq` does a long multiplication but throws away the upper bits of the operands, effectively does a `(x & max_juint) * (y & max_juint)`

You can see its pseudocode here https://www.felixcloutier.com/x86/pmuludq

VPMULUDQ (VEX.256 Encoded Version)
DEST[63:0] := SRC1[31:0] * SRC2[31:0]
DEST[127:64] := SRC1[95:64] * SRC2[95:64]
DEST[191:128] := SRC1[159:128] * SRC2[159:128]
DEST[255:192] := SRC1[223:192] * SRC2[223:192]
DEST[MAXVL-1:256] := 0

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805888984

From jbhateja at openjdk.org Fri Oct 18 05:44:22 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 18 Oct 2024 05:44:22 GMT
Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4]
In-Reply-To:
References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com>
Message-ID:

On Fri, 18 Oct 2024 05:16:04 GMT, Vladimir Ivanov wrote:

>>> I see this as a two-step optimization: in the first step we do analysis and annotate additional information on the existing IR, which is later used by the instruction selector. I plan to subsume the first stage with enhanced dataflow analysis going forward.
>>
>> The issue is that a node is not immutable. This puts a burden on every place to keep the annotation sane when doing transformations, which is easily missed when there are a lot of kinds of `Node`s out there. That's why I think it is most suitable to be done only right before matching. `Node::Ideal` is invoked in a really generous manner so I would prefer not to add analysis to it that can be done more efficiently somewhere else. Additionally, if you have a separate IR node for this operation, you can do some more beneficial transformations such as `MulVL (AndV x max_juint) (AndV y max_juint)` into `MulVLowIToL x y`.
>> >> My suggestions are based on this PR as a standalone, so they may not be optimal when looking at a wider perspective, in case you think this approach would fit more nicely into a larger landscape of your planned enhancements please let us know. Thanks for your patience. > >> The issue is that a node is not immutable. > > I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNode::Ideal()` does.) But I agree with you that a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead. Hi @iwanowww , @merykitty , I am in process of addressing all your concerns. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421448784 From vlivanov at openjdk.org Fri Oct 18 05:49:54 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 18 Oct 2024 05:49:54 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Fri, 18 Oct 2024 05:39:08 GMT, Quan Anh Mai wrote: >> `vpmuludq` does a long multiplication but throws away the upper bits of the operands, effectively does a `(x & max_juint) * (y & max_juint)` > > You can see its pseudocode here https://www.felixcloutier.com/x86/pmuludq > > VPMULUDQ (VEX.256 Encoded Version)[ ?](https://www.felixcloutier.com/x86/pmuludq#vpmuludq--vex-256-encoded-version-) > DEST[63:0] := SRC1[31:0] * SRC2[31:0] > DEST[127:64] := SRC1[95:64] * SRC2[95:64] > DEST[191:128] := SRC1[159:128] * SRC2[159:128] > DEST[255:192] := SRC1[223:192] * SRC2[223:192] > DEST[MAXVL-1:256] := 0 Got it. Now it makes perfect sense. Thanks for the clarifications! 
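
To make the lane semantics in the pseudocode concrete, here is a minimal Java sketch. It is not code from the patch: `LowDwordMul` and its methods are hypothetical names chosen for illustration. It models one 64-bit lane of `VPMULUDQ` and shows that the result agrees with a plain 64-bit multiply whenever the upper 32 bits of both operands are already zero, which is the property the matched patterns appear to rely on.

```java
// Illustrative model only; LowDwordMul and its methods are hypothetical names.
final class LowDwordMul {
    static final long LOW_32 = 0xFFFFFFFFL; // plays the role of max_juint

    // One 64-bit lane of VPMULUDQ: only bits 31:0 of each source are read,
    // and the unsigned 32x32 product fills the whole 64-bit destination lane.
    static long vpmuludqLane(long src1, long src2) {
        return (src1 & LOW_32) * (src2 & LOW_32);
    }

    // True when a value has its upper 32 bits cleared, for example after
    // (x & 0xFFFFFFFF) or (x >>> 32).
    static boolean upperHalfIsZero(long v) {
        return (v >>> 32) == 0;
    }

    public static void main(String[] args) {
        long x = 0x1234567800000003L;
        long y = 0xFFFFFFFF00000005L;

        // The upper halves of x and y do not influence the result.
        System.out.println(vpmuludqLane(x, y));

        // With cleared upper halves, a full 64-bit multiply yields the same bits.
        long a = x >>> 32;
        long b = y & LOW_32;
        System.out.println(upperHalfIsZero(a) && upperHalfIsZero(b));
        System.out.println(a * b == vpmuludqLane(a, b));
    }
}
```

The equality holds because the product of two values below 2^32 is below 2^64, so Java's wrapping `long` multiplication loses no bits in that case.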
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805894106 From vlivanov at openjdk.org Fri Oct 18 05:55:07 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 18 Oct 2024 05:55:07 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Fri, 18 Oct 2024 05:46:25 GMT, Vladimir Ivanov wrote: >> You can see its pseudocode here https://www.felixcloutier.com/x86/pmuludq >> >> VPMULUDQ (VEX.256 Encoded Version)[ ?](https://www.felixcloutier.com/x86/pmuludq#vpmuludq--vex-256-encoded-version-) >> DEST[63:0] := SRC1[31:0] * SRC2[31:0] >> DEST[127:64] := SRC1[95:64] * SRC2[95:64] >> DEST[191:128] := SRC1[159:128] * SRC2[159:128] >> DEST[255:192] := SRC1[223:192] * SRC2[223:192] >> DEST[MAXVL-1:256] := 0 > > Got it. Now it makes perfect sense. Thanks for the clarifications! Actually, it makes detecting the pattern during matching even simpler than I initially thought. Since there's no need to match any non-trivial ideal IR tree, AD instruction can just match a single `MulVL`, but detect operand shapes using a predicate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805903273 From vlivanov at openjdk.org Fri Oct 18 06:08:48 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 18 Oct 2024 06:08:48 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Fri, 18 Oct 2024 05:35:27 GMT, Quan Anh Mai wrote: >>> The issue is that a node is not immutable. >> >> I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. 
(And that's exactly what `MulVLNode::Ideal()` does.) But I agree with you that a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead. > > @iwanowww IMO there are 2 ways to view this: > > - You can see a `MulVL` nodes with `_mult_lower_double_word` being an entirely different kind of nodes which do a different thing (a.k.a throw away the upper bits and only multiply the lower bits), in this case it is a machine-dependent IR node hiding behind the opcode of `MulVL` and changing the inputs of it is not worrying because the node does not care about that anyway, its semantics is predetermined already. > - Or you can see `_mult_lower_double_word` being an annotation that adds information to `MulVL`, which means it is still a `MulVL` but annotated with information saying that all upper bits of the operands are 0. I think this is Jatin's point of view right now. The issue here would be to keep the annotation sane when the node inputs may be changed. @merykitty I was under an erroneous impression that `MulVL::Ideal()` folds operands of particular shapes into `MulVL::_mult_lower_double_word == true`. Now I see it's not the case. Indeed, what `MulVL::Ideal()` does is it caches the info about operand shapes in `MulVL::_mult_lower_double_word` which introduces unnecessary redundancy. I doubt it is possible for IR to diverge so much (through a sequence of equivalent transformations) that the bit gets out of sync (unless there's a bug in compiler or a paradoxical situation in effectively dead code occurs). 
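
The redundancy point can be illustrated with a toy model. The sketch below is plain Java, not HotSpot code, and every type and method name in it is hypothetical. It assumes the eligibility check is written as a pure function of the current inputs, roughly what a matcher predicate on a single `MulVL` would do, so there is no cached bit that could go stale when inputs are rewired.

```java
// Toy expression IR for illustration; this is not C2's node hierarchy.
final class ToyIr {
    enum Op { INPUT, CONST, AND, URSHIFT, MUL }

    static final class Node {
        final Op op;
        final long value;   // only meaningful for CONST
        Node in1, in2;      // inputs may be rewired, as in a mutable IR graph

        Node(Op op, long value, Node in1, Node in2) {
            this.op = op;
            this.value = value;
            this.in1 = in1;
            this.in2 = in2;
        }
    }

    static Node input()                 { return new Node(Op.INPUT, 0, null, null); }
    static Node con(long v)             { return new Node(Op.CONST, v, null, null); }
    static Node and(Node a, Node b)     { return new Node(Op.AND, 0, a, b); }
    static Node urshift(Node a, Node b) { return new Node(Op.URSHIFT, 0, a, b); }
    static Node mul(Node a, Node b)     { return new Node(Op.MUL, 0, a, b); }

    private static boolean isConst(Node n, long v) {
        return n != null && n.op == Op.CONST && n.value == v;
    }

    // Operand shapes from the PR description: (And x, 0xFFFFFFFF) or (URShift x, 32).
    // The mask is accepted on either side of the And in this toy model.
    static boolean hasZeroUpperHalf(Node n) {
        if (n.op == Op.AND) {
            return isConst(n.in1, 0xFFFFFFFFL) || isConst(n.in2, 0xFFFFFFFFL);
        }
        if (n.op == Op.URSHIFT) {
            return isConst(n.in2, 32);
        }
        return false;
    }

    // Recomputed from the inputs on every call; nothing is cached on the node.
    static boolean canUseLowDwordMultiply(Node n) {
        return n.op == Op.MUL && hasZeroUpperHalf(n.in1) && hasZeroUpperHalf(n.in2);
    }
}
```

In this model, replacing an input of a multiply with an unmasked value makes `canUseLowDwordMultiply` return false on the next query without any bookkeeping, whereas a flag stored on the node would have to be updated by whoever changed the input.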
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421504978 From thartmann at openjdk.org Fri Oct 18 06:14:11 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 18 Oct 2024 06:14:11 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v5] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:13:28 GMT, Tobias Hartmann wrote: >> After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`: >> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723 >> >> Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` because the type is not a primitive array type: >> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971 >> >> -> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 >> >> The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. >> >> Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. 
However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded: >> https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979 >> >> I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge with master > - More cleanups > - Modified ciTypeFlow::can_trap > - Missed a return > - First prototype > > Fix Thanks for the review Vladimir! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21470#issuecomment-2421510555 From jbhateja at openjdk.org Fri Oct 18 06:22:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 18 Oct 2024 06:22:38 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Fri, 18 Oct 2024 05:35:27 GMT, Quan Anh Mai wrote: >>> The issue is that a node is not immutable. >> >> I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNode::Ideal()` does.) But I agree with you that a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead. 
> > @iwanowww IMO there are 2 ways to view this: > > - You can see a `MulVL` nodes with `_mult_lower_double_word` being an entirely different kind of nodes which do a different thing (a.k.a throw away the upper bits and only multiply the lower bits), in this case it is a machine-dependent IR node hiding behind the opcode of `MulVL` and changing the inputs of it is not worrying because the node does not care about that anyway, its semantics is predetermined already. > - Or you can see `_mult_lower_double_word` being an annotation that adds information to `MulVL`, which means it is still a `MulVL` but annotated with information saying that all upper bits of the operands are 0. I think this is Jatin's point of view right now. The issue here would be to keep the annotation sane when the node inputs may be changed. > @merykitty I was under an erroneous impression that `MulVL::Ideal()` folds operands of particular shapes into `MulVL::_mult_lower_double_word == true`. Now I see it's not the case. Indeed, what `MulVL::Ideal()` does is it caches the info about operand shapes in `MulVL::_mult_lower_double_word` which introduces unnecessary redundancy. I doubt it is possible for IR to diverge so much (through a sequence of equivalent transformations) that the bit gets out of sync (unless there's a bug in compiler or a paradoxical situation in effectively dead code occurs). Hi @iwanowww , @merykitty , Thanks for your inputs!! I still feel idealization is the right place to execute this pattern detection, we just need to re-wire the effective inputs bypassing doubleword clearing logic to newly annotated MulVL node and allow clearing IR to sweepout during successive passes, moving it to final graph reshaping just before instruction selection will prevent dead IR cleanups. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421511006 From qamai at openjdk.org Fri Oct 18 06:22:38 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 18 Oct 2024 06:22:38 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: <8p95gYaAnNAIfqVBosZgvMMCVhHn2M0fQx7FLLgCn9U=.94cd8db9-894c-4590-ae08-45afecfae2ad@github.com> On Fri, 18 Oct 2024 06:10:54 GMT, Jatin Bhateja wrote: >> @iwanowww IMO there are 2 ways to view this: >> >> - You can see a `MulVL` nodes with `_mult_lower_double_word` being an entirely different kind of nodes which do a different thing (a.k.a throw away the upper bits and only multiply the lower bits), in this case it is a machine-dependent IR node hiding behind the opcode of `MulVL` and changing the inputs of it is not worrying because the node does not care about that anyway, its semantics is predetermined already. >> - Or you can see `_mult_lower_double_word` being an annotation that adds information to `MulVL`, which means it is still a `MulVL` but annotated with information saying that all upper bits of the operands are 0. I think this is Jatin's point of view right now. The issue here would be to keep the annotation sane when the node inputs may be changed. > >> @merykitty I was under an erroneous impression that `MulVL::Ideal()` folds operands of particular shapes into `MulVL::_mult_lower_double_word == true`. Now I see it's not the case. Indeed, what `MulVL::Ideal()` does is it caches the info about operand shapes in `MulVL::_mult_lower_double_word` which introduces unnecessary redundancy. I doubt it is possible for IR to diverge so much (through a sequence of equivalent transformations) that the bit gets out of sync (unless there's a bug in compiler or a paradoxical situation in effectively dead code occurs). 
> > Hi @iwanowww , @merykitty , Thanks for your inputs!! > > I still feel idealization is the right place to execute this pattern detection, we just need to re-wire the effective inputs bypassing doubleword clearing logic to newly annotated MulVL node and allow clearing IR to sweepout during successive passes, moving it to final graph reshaping just before instruction selection will prevent dead IR cleanups. @jatin-bhateja I think you can do it at the same place as `Compile::optimize_logic_cones`, we do perform IGVN there. Unless you think this information is needed early in the compiling process, currently I see it is used during matching only, which makes it unnecessary to repeatedly checking it in `Node::Ideal` ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421519087 From vlivanov at openjdk.org Fri Oct 18 06:30:38 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 18 Oct 2024 06:30:38 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Fri, 18 Oct 2024 02:03:21 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR pallets. >> >> >> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. 
Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. >> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimization:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: > > - Review resolutions > - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction IMO until C2 type system starts to track bitwise constant information ([JDK-8001436](https://bugs.openjdk.org/browse/JDK-8001436) et al), there are not enough benefits to rely on IGVN here. So far, all the discussed patterns are simple enough for matcher to handle them without too much tweaking. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421529658 From thartmann at openjdk.org Fri Oct 18 06:49:52 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 18 Oct 2024 06:49:52 GMT Subject: Integrated: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type In-Reply-To: References: Message-ID: <94P7KX24Xn3Q0lx6H5p2qDGZlJj4IcVuDgyc8uU8JCo=.b3181b6d-136e-4141-b5e8-b6439a61b184@github.com> On Fri, 11 Oct 2024 13:44:46 GMT, Tobias Hartmann wrote: > After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`: > > https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723 > > Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` because the type is not a primitive array type: > > https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971 > > -> > https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 > > The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. 
> > Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded: > https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979 > > I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded. > > Thanks, > Tobias This pull request has now been integrated. Changeset: c51a086c Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/c51a086ce32dd4e97aa83dfba3bcf9b0636193cc Stats: 135 lines in 5 files changed: 111 ins; 18 del; 6 mod 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type Reviewed-by: kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/21470 From rehn at openjdk.org Fri Oct 18 06:52:59 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 18 Oct 2024 06:52:59 GMT Subject: RFR: 8342579: RISC-V: C2: Cleanup effect of killing flag register for call instructs In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 02:24:16 GMT, Fei Yang wrote: > Previously found and discussed at: https://github.com/openjdk/jdk/pull/21406#discussion_r1803232561 > For C2 call nodes, it's not necessary to add effect listing flag register as being killed. > This cleans them up and thus aligns with other CPU platforms. > > Testing on linux-riscv64: > - [x] Tier1 (release build) Thank you, looks good! 
------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21576#pullrequestreview-2377150885 From chagedorn at openjdk.org Fri Oct 18 06:55:00 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 18 Oct 2024 06:55:00 GMT Subject: RFR: 8342287: C2 fails with "assert(is_IfTrue()) failed: invalid node class: IfFalse" due to Template Assertion Predicate with two UCTs [v2] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 21:12:28 GMT, Christian Hagedorn wrote: >> ### Assertion Predicates Have the True Projection on the Success Path >> By design, Template and Initialized Assertion Predicates have the true projection on the success path and false projection on the failing path. >> >> ### Is a Node a Template Assertion Predicate? >> Template Assertion Predicates can have an uncommon trap on or a `Halt` node following the failing/false projection. When trying to find out if a node is a Template Assertion Predicate, we call `TemplateAssertionPredicate::is_predicate()` which checks if the provided node is the success projection of a Template Assertion Predicate. This involves checking if the other projection has a `Halt` node or an UCT (L141): >> https://github.com/openjdk/jdk/blob/7a64fbbb9292f4d65a6970206dec1a7d7645046b/src/hotspot/share/opto/predicates.cpp#L135-L144 >> >> ### New `PredicateIterator` Class >> >> [JDK-8340786](https://bugs.openjdk.org/browse/JDK-8340786) introduced a new `PredicateIterator` class to simplify iteration over predicates and replaced code that was doing some custom iteration. >> >> #### Usual Usage >> Normally, we always start from a loop entry and follow the predicates (i.e. reaching a predicate over its success path). >> >> #### Special Usage >> However, I also replaced the predicate skipping in `PhaseIdealLoop::build_loop_late_post_work()` with the new `PredicateIterator` class which could start at any node, including a failing path false projection. 
>> >> ### Problem: Two Uncommon Traps for a Template Assertion Predicate >> The fuzzer now found a case where a Template Assertion Predicate has a predicate UCT on the failing **and** the success path due to folding away some dead nodes: >> >> ![image](https://github.com/user-attachments/assets/c2a9395e-9eaf-46a0-b978-8847d8e21945) >> >> In the Special Usage mentioned above, we could be using the `PredicateIterator` with `505 IfFalse` as starting node. In the core method `RegularPredicateBlockIterator::for_each()` for the traversal, we call the following with `current` = `505 IfFalse`: >> https://github.com/openjdk/jdk/blob/1ea1f33f66326804ca2892fe0659a9acb7ee72ae/src/hotspot/share/opto/predicates.hpp#L628-L629 >> `TemplateAssertionPredicate::is_predicate()` succeeds because we have a valid Template Assertion Predicate If node and the other projection `504 IfTrue` has a predicate UCT. We then fail when trying to convert the `IfFalse` to an `IfTrue` with `current->as_IfTrue()` on L629. >> >> ### Solution >> The fix is straight forward: `TemplateAssertionPr... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Review Vladimir Thanks Vladimir! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21561#issuecomment-2421573996 From thartmann at openjdk.org Fri Oct 18 06:58:10 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 18 Oct 2024 06:58:10 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v4] In-Reply-To: <9LrhezvsYwS32PEUN9wn6hKDJJn0wybl3YXSHuohUC8=.eded7969-c03c-4a75-a2c2-2d0e9682722d@github.com> References: <9LrhezvsYwS32PEUN9wn6hKDJJn0wybl3YXSHuohUC8=.eded7969-c03c-4a75-a2c2-2d0e9682722d@github.com> Message-ID: On Wed, 16 Oct 2024 16:28:50 GMT, Sandhya Viswanathan wrote: >> When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. >> Also a regression test case is added accordingly. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Marked as reviewed by thartmann (Reviewer). Sorry for the delay. I re-submitted testing with the latest version and it all passed. 
------------- PR Review: https://git.openjdk.org/jdk/pull/21480#pullrequestreview-2377166421 PR Comment: https://git.openjdk.org/jdk/pull/21480#issuecomment-2421582354 From thartmann at openjdk.org Fri Oct 18 07:07:40 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 18 Oct 2024 07:07:40 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn [v3] In-Reply-To: <4yh-9E3dPvmUGKVoo3sEHCoc_TdBbGdgMJbGQbti50s=.2290e266-e224-4e10-a282-9627f19e3c0c@github.com> References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> <4yh-9E3dPvmUGKVoo3sEHCoc_TdBbGdgMJbGQbti50s=.2290e266-e224-4e10-a282-9627f19e3c0c@github.com> Message-ID: On Thu, 17 Oct 2024 21:12:35 GMT, Chad Rakoczy wrote: >> [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) >> >> Crash occurs in C1 during OSR when copying locks from interpreter frame to compiled frame. All loads used an immediate offset regardless of offset size, causing a crash when the offset is over the max size for the instruction (32760). Fix is to check the size before performing the load and to store the offset in a register if needed. >> >> I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction. The instruction is only updated when the `offset_ok_for_immed` check fails, which would cause the crash anyway. >> >> Confirmed that added test fails before patch and passes after > > Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: > > - Remove jasm file > - Update test Thanks for simplifying the test, very nice! The changes look good to me. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21473#pullrequestreview-2377182867 From lucy at openjdk.org Fri Oct 18 07:19:22 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 18 Oct 2024 07:19:22 GMT Subject: RFR: 8342409: [s390x] C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 06:38:57 GMT, Amit Kumar wrote: > Make sure LIR_Assembler::emit_unwind_handler() jumps to the slow path directly for unlocking a synchronized method if LM_MONITOR is used. > On the fast paths assertions are added that the mode is actually handled. > > Testing: Tier1 test for fastdebug vm showed no regression. LGTM. Done. ------------- Marked as reviewed by lucy (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21557#pullrequestreview-2377206159 PR Comment: https://git.openjdk.org/jdk/pull/21557#issuecomment-2421623119 From thartmann at openjdk.org Fri Oct 18 07:27:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 18 Oct 2024 07:27:21 GMT Subject: RFR: 8341407: C2: assert(main_limit == cl->limit() || get_ctrl(main_limit) == new_limit_ctrl) failed: wrong control for added limit [v2] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:54:51 GMT, Roland Westrelin wrote: >> That assert checks that during RC elimination, we have either: >> >> - not updated the limit of the main loop >> >> - or that the new limit is at the expected control >> >> The assert fires because the limit was updated but is not at the >> expected control. That happens because `new_limit_ctrl` is updated for >> a test that it attempts to eliminate before it actually proceeds with >> the elimination: if the test can't be eliminated, `new_limit_ctrl` >> gets updated anyway. >> >> While the assert could, maybe, be relaxed (it fires in this case but >> nothing is going wrong), it's better, I think, to simply not uselessly >> restrict the control of the limit. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > fix & test Looks good to me. src/hotspot/share/opto/loopTransform.cpp line 3004: > 3002: } > 3003: } > 3004: // Only updated variable tracking control for new nodes if it's indeed a range check that can be eliminated (and Suggestion: // Only update variable tracking control for new nodes if it's indeed a range check that can be eliminated (and ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21564#pullrequestreview-2377224737 PR Review Comment: https://git.openjdk.org/jdk/pull/21564#discussion_r1806006352 From roland at openjdk.org Fri Oct 18 07:30:54 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 18 Oct 2024 07:30:54 GMT Subject: RFR: 8341407: C2: assert(main_limit == cl->limit() || get_ctrl(main_limit) == new_limit_ctrl) failed: wrong control for added limit [v3] In-Reply-To: References: Message-ID: <7LdbqP4RgXKa964-BAUdRkIdB8iBUTiuX3dyB21uB-4=.3ab80aff-56dd-4877-9cd8-d8b669645b9c@github.com> > That assert checks that during RC elimination, we have either: > > - not updated the limit of the main loop > > - or that the new limit is at the expected control > > The assert fires because the limit was updated but is not at the > expected control. That happens because `new_limit_ctrl` is updated for > a test that it attempts to eliminate before it actually proceeds with > the elimination: if the test can't be eliminated, `new_limit_ctrl` > gets updated anyway. > > While the assert could, maybe, be relaxed (it fires in this case but > nothing is going wrong), it's better, I think, to simply not uselessly > restrict the control of the limit. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21564/files - new: https://git.openjdk.org/jdk/pull/21564/files/ccb5c07c..58b95c1f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21564&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21564&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21564.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21564/head:pull/21564 PR: https://git.openjdk.org/jdk/pull/21564 From fjiang at openjdk.org Fri Oct 18 07:59:07 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 18 Oct 2024 07:59:07 GMT Subject: RFR: 8342579: RISC-V: C2: Cleanup effect of killing flag register for call instructs In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 02:24:16 GMT, Fei Yang wrote: > Previously found and discussed at: https://github.com/openjdk/jdk/pull/21406#discussion_r1803232561 > For C2 call nodes, it's not necessary to add effect listing flag register as being killed. > This cleans them up and thus aligns with other CPU platforms. > > Testing on linux-riscv64: > - [x] Tier1 (release build) Looks reasonable. Thanks for the cleanups! ------------- Marked as reviewed by fjiang (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/21576#pullrequestreview-2377301574 From epeter at openjdk.org Fri Oct 18 08:06:49 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 18 Oct 2024 08:06:49 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v2] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`.
It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more examples and comments for Vladimir ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/3c333baf..d716e9a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=00-01 Stats: 120 lines in 2 files changed: 115 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Fri Oct 18 08:06:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 18 Oct 2024 08:06:50 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Thu, 17 Oct 2024 21:42:33 GMT, Vladimir Kozlov wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. 
>> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > For me it is confusing to call `pointer = con + sum_i(scale_i * variable_i)` as "pointer" unless it is Unsafe address which has base address as constant. It misses base address. All our pointer types correspond to an address of some object in Java heap, out of heap, VM's object or some native (C heap) VM object. > This looks like `address_offset`, `displacement`, ... @vnkozlov thanks for looking at this! >For me it is confusing to call pointer = con + sum_i(scale_i * variable_i) as "pointer" unless it is Unsafe address which has base address as constant. It misses base address. All our pointer types correspond to an address of some object in Java heap, out of heap, VM's object or some native (C heap) VM object. This looks like address_offset, displacement, ... I added some explanations and examples in the code now. But essentially, any `base` is just another `variable`, with `scale = 1`. Just for adjacency, it does not matter if the variable is some offset or a base address.
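To make the point about `base` concrete, here is a minimal sketch in Java (not HotSpot's C++) of a pointer kept as `con + sum_i(scale_i * variable_i)`, where the base address is just one more variable with scale 1. The names `LinearForm`, `constantDistanceTo` and `isAdjacentBefore` are hypothetical and are not taken from `mempointer.hpp`; the sketch only assumes that two pointers with identical summands differ by the difference of their constants.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.TreeMap;

// Hypothetical illustration only: this is NOT the MemPointer class from mempointer.hpp.
final class LinearForm {
    final long con;                    // constant offset
    final Map<String, Long> summands;  // variable name -> scale; "base" is a variable with scale 1

    LinearForm(long con, Map<String, Long> summands) {
        this.con = con;
        this.summands = new TreeMap<>(summands);
    }

    // If both forms have exactly the same summands, the two pointers are
    // always a constant distance apart; otherwise nothing is known.
    OptionalLong constantDistanceTo(LinearForm other) {
        if (!summands.equals(other.summands)) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Math.subtractExact(other.con, con));
    }

    // A store of sizeInBytes at this pointer ends exactly where the other one begins.
    boolean isAdjacentBefore(LinearForm other, int sizeInBytes) {
        OptionalLong distance = constantDistanceTo(other);
        return distance.isPresent() && distance.getAsLong() == sizeInBytes;
    }
}
```

With this shape, `base + 4*i + 16` and `base + 4*i + 20` are adjacent for a 4-byte store regardless of whether `base` names an array or a native address, which is the only property the adjacency question needs.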
Of course, there may be some other aliasing analysis tasks that do care if it is an array or not. We can add such detection later, if we need it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2421757827 From epeter at openjdk.org Fri Oct 18 08:06:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 18 Oct 2024 08:06:50 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v2] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Thu, 17 Oct 2024 22:09:56 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/opto/mempointer.hpp line 43: >> >>> 41: // Where each summand_i in summands has the form: >>> 42: // >>> 43: // summand_i = scale_i * variable_i >> >> I see you treat `scale` as compile time integer value (NoOverflowInt) and not Node. Why is the check/filter for that? > > I mean: _Where_ is the check/filter for that? The meat of the parsing happens in `MemPointerDecomposedFormParser::parse_sub_expression`. One good example is this: ![image](https://github.com/user-attachments/assets/a83fd973-6407-45c6-8597-ef51edd7325d) I added the `default` case now with a comment. 
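For readers who have not seen the `NoOverflowInt` idea mentioned in this thread, the following is a rough Java analogue, assuming only what the PR description states: ordinary int arithmetic plus a flag that records whether an overflow ever happened. `CheckedInt` and its methods are hypothetical names; the real class is C++ and may differ in detail.

```java
// Hypothetical Java analogue of the idea; HotSpot's NoOverflowInt is a C++ class and may differ.
final class CheckedInt {
    private static final CheckedInt OVERFLOW = new CheckedInt(0, true);

    private final int value;
    private final boolean overflowed;

    CheckedInt(int value) {
        this(value, false);
    }

    private CheckedInt(int value, boolean overflowed) {
        this.value = value;
        this.overflowed = overflowed;
    }

    // Compute in 64 bits, then check that the result still fits in an int.
    private static CheckedInt fromLong(long result) {
        return (result == (int) result) ? new CheckedInt((int) result) : OVERFLOW;
    }

    CheckedInt add(CheckedInt other) {
        if (overflowed || other.overflowed) return OVERFLOW;  // overflow is sticky
        return fromLong((long) value + other.value);
    }

    CheckedInt mul(CheckedInt other) {
        if (overflowed || other.overflowed) return OVERFLOW;  // overflow is sticky
        return fromLong((long) value * other.value);
    }

    boolean hasOverflowed() {
        return overflowed;
    }

    int value() {
        if (overflowed) throw new IllegalStateException("value is not valid after overflow");
        return value;
    }
}
```

The design choice is that callers chain operations freely and test `hasOverflowed()` once at the end, instead of checking after every step.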
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1806057740 From epeter at openjdk.org Fri Oct 18 08:06:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 18 Oct 2024 08:06:50 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v2] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Thu, 17 Oct 2024 21:48:00 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more examples and comments for Vladimir > > src/hotspot/share/opto/mempointer.hpp line 353: > >> 351: // >> 352: // array[j] -> array_base + j + con -> 2 summands >> 353: // nativeMemorySegment.get(j) -> null + address + offset + j + con -> 3 summands > > Does this means it supports only Unsafe and Array access? I removed these examples, but instead added extensive examples at the beginning of `mempointer.hpp` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1806058387 From epeter at openjdk.org Fri Oct 18 08:09:32 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 18 Oct 2024 08:09:32 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v19] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 23:23:12 GMT, Shaojin Wen wrote: >> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: > > fix build error Ok, now I have a patch out for review: https://github.com/openjdk/jdk/pull/19970 Can you see if this speeds up the `PutBytesTest` benchmark for you?
------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2421773875 From thartmann at openjdk.org Fri Oct 18 08:21:40 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 18 Oct 2024 08:21:40 GMT Subject: RFR: 8342287: C2 fails with "assert(is_IfTrue()) failed: invalid node class: IfFalse" due to Template Assertion Predicate with two UCTs [v2] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 21:12:28 GMT, Christian Hagedorn wrote: >> ### Assertion Predicates Have the True Projection on the Success Path >> By design, Template and Initialized Assertion Predicates have the true projection on the success path and false projection on the failing path. >> >> ### Is a Node a Template Assertion Predicate? >> Template Assertion Predicates can have an uncommon trap on or a `Halt` node following the failing/false projection. When trying to find out if a node is a Template Assertion Predicate, we call `TemplateAssertionPredicate::is_predicate()` which checks if the provided node is the success projection of a Template Assertion Predicate. This involves checking if the other projection has a `Halt` node or an UCT (L141): >> https://github.com/openjdk/jdk/blob/7a64fbbb9292f4d65a6970206dec1a7d7645046b/src/hotspot/share/opto/predicates.cpp#L135-L144 >> >> ### New `PredicateIterator` Class >> >> [JDK-8340786](https://bugs.openjdk.org/browse/JDK-8340786) introduced a new `PredicateIterator` class to simplify iteration over predicates and replaced code that was doing some custom iteration. >> >> #### Usual Usage >> Normally, we always start from a loop entry and follow the predicates (i.e. reaching a predicate over its success path). >> >> #### Special Usage >> However, I also replaced the predicate skipping in `PhaseIdealLoop::build_loop_late_post_work()` with the new `PredicateIterator` class which could start at any node, including a failing path false projection. 
>> >> ### Problem: Two Uncommon Traps for a Template Assertion Predicate >> The fuzzer now found a case where a Template Assertion Predicate has a predicate UCT on the failing **and** the success path due to folding away some dead nodes: >> >> ![image](https://github.com/user-attachments/assets/c2a9395e-9eaf-46a0-b978-8847d8e21945) >> >> In the Special Usage mentioned above, we could be using the `PredicateIterator` with `505 IfFalse` as starting node. In the core method `RegularPredicateBlockIterator::for_each()` for the traversal, we call the following with `current` = `505 IfFalse`: >> https://github.com/openjdk/jdk/blob/1ea1f33f66326804ca2892fe0659a9acb7ee72ae/src/hotspot/share/opto/predicates.hpp#L628-L629 >> `TemplateAssertionPredicate::is_predicate()` succeeds because we have a valid Template Assertion Predicate If node and the other projection `504 IfTrue` has a predicate UCT. We then fail when trying to convert the `IfFalse` to an `IfTrue` with `current->as_IfTrue()` on L629. >> >> ### Solution >> The fix is straight forward: `TemplateAssertionPr... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Review Vladimir Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21561#pullrequestreview-2377367273 From thartmann at openjdk.org Fri Oct 18 08:22:52 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 18 Oct 2024 08:22:52 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v23] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Wed, 16 Oct 2024 15:18:52 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. 
>> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > requires c2 enabled for IR tests Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18489#pullrequestreview-2377370387 From amitkumar at openjdk.org Fri Oct 18 08:34:22 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 18 Oct 2024 08:34:22 GMT Subject: RFR: 8342409: [s390x] C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 07:16:43 GMT, Lutz Schmidt wrote: >> Make sure LIR_Assembler::emit_unwind_handler() jumps to the slow path directly for unlocking a synchronized method if LM_MONITOR is used. >> On the fast paths assertions are added that the mode is actually handled. >> >> Testing: Tier1 test for fastdebug vm showed no regression. > > Done. Thanks @RealLucy @reinrich for the approval.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21557#issuecomment-2421823424 From amitkumar at openjdk.org Fri Oct 18 08:34:22 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 18 Oct 2024 08:34:22 GMT Subject: Integrated: 8342409: [s390x] C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 06:38:57 GMT, Amit Kumar wrote: > Make sure LIR_Assembler::emit_unwind_handler() jumps to the slow path directly for unlocking a synchronized method if LM_MONITOR is used. > On the fast paths assertions are added that the mode is actually handled. > > Testing: Tier1 test for fastdebug vm showed no regression. This pull request has now been integrated. Changeset: 9201e9fc Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/9201e9fcc28cff37cf9996e8db38f9aee7511b1c Stats: 9 lines in 2 files changed: 8 ins; 0 del; 1 mod 8342409: [s390x] C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR Reviewed-by: rrich, lucy ------------- PR: https://git.openjdk.org/jdk/pull/21557 From chagedorn at openjdk.org Fri Oct 18 08:58:18 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 18 Oct 2024 08:58:18 GMT Subject: RFR: 8342287: C2 fails with "assert(is_IfTrue()) failed: invalid node class: IfFalse" due to Template Assertion Predicate with two UCTs [v2] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 21:12:28 GMT, Christian Hagedorn wrote: >> ### Assertion Predicates Have the True Projection on the Success Path >> By design, Template and Initialized Assertion Predicates have the true projection on the success path and false projection on the failing path. >> >> ### Is a Node a Template Assertion Predicate? >> Template Assertion Predicates can have an uncommon trap on or a `Halt` node following the failing/false projection. 
When trying to find out if a node is a Template Assertion Predicate, we call `TemplateAssertionPredicate::is_predicate()` which checks if the provided node is the success projection of a Template Assertion Predicate. This involves checking if the other projection has a `Halt` node or an UCT (L141): >> https://github.com/openjdk/jdk/blob/7a64fbbb9292f4d65a6970206dec1a7d7645046b/src/hotspot/share/opto/predicates.cpp#L135-L144 >> >> ### New `PredicateIterator` Class >> >> [JDK-8340786](https://bugs.openjdk.org/browse/JDK-8340786) introduced a new `PredicateIterator` class to simplify iteration over predicates and replaced code that was doing some custom iteration. >> >> #### Usual Usage >> Normally, we always start from a loop entry and follow the predicates (i.e. reaching a predicate over its success path). >> >> #### Special Usage >> However, I also replaced the predicate skipping in `PhaseIdealLoop::build_loop_late_post_work()` with the new `PredicateIterator` class which could start at any node, including a failing path false projection. >> >> ### Problem: Two Uncommon Traps for a Template Assertion Predicate >> The fuzzer now found a case where a Template Assertion Predicate has a predicate UCT on the failing **and** the success path due to folding away some dead nodes: >> >> ![image](https://github.com/user-attachments/assets/c2a9395e-9eaf-46a0-b978-8847d8e21945) >> >> In the Special Usage mentioned above, we could be using the `PredicateIterator` with `505 IfFalse` as starting node. In the core method `RegularPredicateBlockIterator::for_each()` for the traversal, we call the following with `current` = `505 IfFalse`: >> https://github.com/openjdk/jdk/blob/1ea1f33f66326804ca2892fe0659a9acb7ee72ae/src/hotspot/share/opto/predicates.hpp#L628-L629 >> `TemplateAssertionPredicate::is_predicate()` succeeds because we have a valid Template Assertion Predicate If node and the other projection `504 IfTrue` has a predicate UCT. 
We then fail when trying to convert the `IfFalse` to an `IfTrue` with `current->as_IfTrue()` on L629. >> >> ### Solution >> The fix is straight forward: `TemplateAssertionPr... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Review Vladimir Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21561#issuecomment-2421878137 From chagedorn at openjdk.org Fri Oct 18 08:58:21 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 18 Oct 2024 08:58:21 GMT Subject: RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi [v6] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 02:33:58 GMT, Daohan Qu wrote: >> # Description >> >> [JDK-6934604](https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194) introduces the flag `AggressiveUnboxing` in jdk8u, and [JDK-8217919](https://github.com/openjdk/jdk/commit/71759e3177fcd6926bb36a30a26f9f3976f2aae8) enables it by default in jdk13u. >> >> But it seems that JDK-6934604 forgets to check duplicate `PhiNode` generated in the function `LoadNode::split_through_phi(PhaseGVN *phase)` (in `memnode.cpp`) in the case that `base` is phi but `mem` is not phi. More exactly, `LoadNode::Identity(PhaseTransform *phase)` doesn't search for `PhiNode` in the correct region in that case. >> >> This might cause infinite split in the case of a loop, which is similar to the bugs fixed in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7). The infinite split results in "Out of nodes" and make the method "not compilable". >> >> Since JDK-8217919 (in jdk13u), all the later versions of jdks are affected by this bug when the expected optimization pattern appears in the code. 
For example, the following three micro-benchmarks running with >> >> >> make test \ >> TEST="micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef" \ >> TEST_OPTS="VM_OPTIONS=-XX:+UseParallelGC" >> >> >> shows performance improvement after this PR applied. (`-XX:+UseParallelGC` is only for reproducing this bug; all the benchmarks in the following table are run with this option.) >> >> |benchmark (throughput, unit: ops/s)| jdk-before-this-patch | jdk-after-this-patch | >> |---|---|---| >> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner | 26.678 ±(99.9%) 0.574 ops/s | 55.692 ±(99.9%) 4.419 ops/s | >> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda | 26.792 ±(99.9%) 1.924 ops/s | 64.882 ±(99.9%) 4.175 ops/s | >> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef | 27.023 ±(99.9%) 1.116 ops/s | 66.313 ±(99.9%) 0.802 ops/s | >> >> # Reproduction >> >> Compile and run the reduced test case `Test.java` in the appendix below using >> >> >> java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=comp.log -XX:+UseParallelGC Test >> >> >> and you could find that `Test$Obj.calc` is tagged with `make_not_compilable` and see some output like >> >> >> > Daohan Qu has updated the pull request incrementally with one additional commit since the last revision: > > Bug fix Sorry for the delay to get back to this. But doesn't your fix now completely disable the `base_is_phi` case altogether?
------------- PR Comment: https://git.openjdk.org/jdk/pull/21134#issuecomment-2421873832 From dqu at openjdk.org Fri Oct 18 09:36:56 2024 From: dqu at openjdk.org (Daohan Qu) Date: Fri, 18 Oct 2024 09:36:56 GMT Subject: RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi [v6] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 02:33:58 GMT, Daohan Qu wrote: >> # Description >> >> [JDK-6934604](https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194) introduces the flag `AggressiveUnboxing` in jdk8u, and [JDK-8217919](https://github.com/openjdk/jdk/commit/71759e3177fcd6926bb36a30a26f9f3976f2aae8) enables it by default in jdk13u. >> >> But it seems that JDK-6934604 forgets to check duplicate `PhiNode` generated in the function `LoadNode::split_through_phi(PhaseGVN *phase)` (in `memnode.cpp`) in the case that `base` is phi but `mem` is not phi. More exactly, `LoadNode::Identity(PhaseTransform *phase)` doesn't search for `PhiNode` in the correct region in that case. >> >> This might cause infinite split in the case of a loop, which is similar to the bugs fixed in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7). The infinite split results in "Out of nodes" and make the method "not compilable". >> >> Since JDK-8217919 (in jdk13u), all the later versions of jdks are affected by this bug when the expected optimization pattern appears in the code. For example, the following three micro-benchmarks running with >> >> >> make test \ >> TEST="micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef" \ >> TEST_OPTS="VM_OPTIONS=-XX:+UseParallelGC" >> >> >> shows performance improvement after this PR applied. 
(`-XX:+UseParallelGC` is only for reproducing this bug; all the benchmarks in the following table are run with this option.) >> >> |benchmark (throughput, unit: ops/s)| jdk-before-this-patch | jdk-after-this-patch | >> |---|---|---| >> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner | 26.678 ±(99.9%) 0.574 ops/s | 55.692 ±(99.9%) 4.419 ops/s | >> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda | 26.792 ±(99.9%) 1.924 ops/s | 64.882 ±(99.9%) 4.175 ops/s | >> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef | 27.023 ±(99.9%) 1.116 ops/s | 66.313 ±(99.9%) 0.802 ops/s | >> >> # Reproduction >> >> Compile and run the reduced test case `Test.java` in the appendix below using >> >> >> java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=comp.log -XX:+UseParallelGC Test >> >> >> and you could find that `Test$Obj.calc` is tagged with `make_not_compilable` and see some output like >> >> >> > Daohan Qu has updated the pull request incrementally with one additional commit since the last revision: > > Bug fix It still works for the cases where `mem->in(0) == base->in(0)`. It seems that the code that splits through base phi in [`LoadNode::split_through_phi()`](https://github.com/openjdk/jdk/blob/b4977e887a53c898b96a7d37a3bf94742c7cc194/hotspot/src/share/vm/opto/memnode.cpp#L1284) was moved from [`LoadNode::eliminate_autobox()`](https://github.com/openjdk/jdk/blob/7c367a6025f519bf12b5b57c807470555eb0a673/hotspot/src/share/vm/opto/memnode.cpp#L1187) in the commit https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194 .
And `mem->in(0) == base->in(0)` is what the original code requires: https://github.com/openjdk/jdk/blob/7c367a6025f519bf12b5b57c807470555eb0a673/hotspot/src/share/vm/opto/memnode.cpp#L1205-L1240 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21134#issuecomment-2421969464 From eastigeevich at openjdk.org Fri Oct 18 09:42:32 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 18 Oct 2024 09:42:32 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn [v3] In-Reply-To: <4yh-9E3dPvmUGKVoo3sEHCoc_TdBbGdgMJbGQbti50s=.2290e266-e224-4e10-a282-9627f19e3c0c@github.com> References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> <4yh-9E3dPvmUGKVoo3sEHCoc_TdBbGdgMJbGQbti50s=.2290e266-e224-4e10-a282-9627f19e3c0c@github.com> Message-ID: On Thu, 17 Oct 2024 21:12:35 GMT, Chad Rakoczy wrote: >> [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) >> >> Crash occurs in C1 during OSR when copying locks from interpreter frame to compiled frame. All loads used immediate offset regardless of offset size causing crash when it is over the max size for the instruction (32760). Fix is to check the size before performing the load and storing the offset in a register if needed. >> >> I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction. The instruction is only updated when the `offset_ok_for_immed` check fails, which would cause the crash anyway. >> >> Confirmed that added test fails before patch and passes after > > Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: > > - Remove jasm file > - Update test lgtm ------------- Marked as reviewed by eastigeevich (Committer).
PR Review: https://git.openjdk.org/jdk/pull/21473#pullrequestreview-2377574826 From epeter at openjdk.org Fri Oct 18 09:50:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 18 Oct 2024 09:50:27 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v3] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`.
It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: some unsafe and native benchmarks added ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/d716e9a3..53150059 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=01-02 Stats: 60 lines in 1 file changed: 56 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Fri Oct 18 10:08:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 18 Oct 2024 10:08:14 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v3] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <0v6RyV_T1ldF8PyS3TA7cmTPJhjGWHV7kVFkWkL0c08=.deb20bcf-c7f7-40d7-b19e-b1e545c59d23@github.com> On Fri, 18 Oct 2024 09:50:27 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. 
>> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. >> >> **Benchmarks** >> >> I added a few new benchmarks, to show the merging of `Unsafe` and `native` stores. We can see that 8 byte stores are now merged, and have the same performance as a long store. The same for 4 char stores that are merged into a single long store. >> >> ![image](https://github.com/user-attachments/assets/33b5cfcb-919b-46f4-bfa8-69fdff3acf1a) > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > some unsafe and native benchmarks added I added this section to the description above. **Benchmarks** I added a few new benchmarks, to show the merging of `Unsafe` and `native` stores. We can see that 8 byte stores are now merged, and have the same performance as a long store.
The same for 4 char stores that are merged into a single long store. ![image](https://github.com/user-attachments/assets/33b5cfcb-919b-46f4-bfa8-69fdff3acf1a) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2422067951 From shade at openjdk.org Fri Oct 18 10:09:15 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 18 Oct 2024 10:09:15 GMT Subject: RFR: 8342496: C2/Shenandoah: SEGV in compiled code when running jcstress In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 13:12:11 GMT, Roland Westrelin wrote: > The reason for the crash is that compiled code reads from an object > that's null. All field loads from an object are guarded by a null > check. Where is the null check in that case? After the field load: > > > 0x00007ffaac91261f: 44 8B 69 0C mov r13d, dword ptr [rcx + 0xc] <- field load > 0x00007ffaac912623: 85 C9 test ecx, ecx <- null check (oops!) > 0x00007ffaac912625: 74 5C je 0x7ffaac912683 > > > When the IR graph is constructed for the test case, the field load is > correctly made dependent on the null check (through a `CastPP` node) > but then something happens that's shenandoah specific and that causes > the field load to become dependent on another check so it can execute > before the null check. > > There are several load barriers involved in the process. One of them > is expanded at the null check projection. In the process, control for > the nodes that are control dependent on the null check is updated to > be the region at the end of the just expanded barrier. The `CastPP` > node for the null check gets the `Region` as new control. > > Another barrier is expanded right after that one. The 2 are back to > back. They are merged. The `Region` that the `CastPP` depends on goes > away, the `CastPP` is cloned in both branches at the `Region` and one > of them becomes control dependent on the heap stable test of the first > expanded barrier. 
At this point, one of the `CastPP` is control > dependent on a heap stable test that's after the null check. But then, > the heap stable test is moved out of loop and 2 copies of the loop are > made so one can run without any overhead from barriers. When that > happens, the `CastPP` becomes dependent on a test that dominates the > null check and so the field load that depends on the `CastPP` can be > scheduled before the null check. > > The fix I propose is to not update the control when the barrier is > expanded for nodes that can float when the test they depend on > moves. This way the `CastPP` remains dependent on the null check. Testing says no functional/performance regressions with this patch. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21562#pullrequestreview-2377642539 From duke at openjdk.org Fri Oct 18 10:32:10 2024 From: duke at openjdk.org (Abdelhak Zaaim) Date: Fri, 18 Oct 2024 10:32:10 GMT Subject: RFR: 8342579: RISC-V: C2: Cleanup effect of killing flag register for call instructs In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 02:24:16 GMT, Fei Yang wrote: > Previously found and discussed at: https://github.com/openjdk/jdk/pull/21406#discussion_r1803232561 > For C2 call nodes, it's not necessary to add effect listing flag register as being killed. > This cleans them up and thus aligns with other CPU platforms. > > Testing on linux-riscv64: > - [x] Tier1 (release build) Marked as reviewed by abdelhak-zaaim at github.com (no known OpenJDK username).
------------- PR Review: https://git.openjdk.org/jdk/pull/21576#pullrequestreview-2377713558 From aph at openjdk.org Fri Oct 18 11:59:06 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 18 Oct 2024 11:59:06 GMT Subject: RFR: 8342540: InterfaceCalls micro-benchmark gives misleading results Message-ID: `InterfaceCalls.java` makes highly predictable memory accesses, which leads to a gross time underestimate of the case where a megamorphic access is unpredictable. Here's one example, with and without randomization. The unpredictable megamorphic call takes more than 4 times as long as the benchmark reports. Benchmark (randomized) Mode Cnt Score Error Units InterfaceCalls.test2ndInt3Types false avgt 4 5.034 ± 0.219 ns/op InterfaceCalls.test2ndInt3Types true avgt 4 23.407 ± 0.475 ns/op This patch adds the "randomized" parameter, which allows the measurement of predictable and unpredictable megamorphic calls. ------------- Commit messages: - 8342540: InterfaceCalls micro-benchmark gives misleading results Changes: https://git.openjdk.org/jdk/pull/21581/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21581&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342540 Stats: 33 lines in 1 file changed: 21 ins; 6 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21581.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21581/head:pull/21581 PR: https://git.openjdk.org/jdk/pull/21581 From epeter at openjdk.org Fri Oct 18 13:17:31 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 18 Oct 2024 13:17:31 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v3] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Fri, 18 Oct 2024 09:50:27 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing.
For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow.
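
As a rough illustration of the overflow-tracking idea described above, here is a minimal Java sketch. It is not the HotSpot code: the real `NoOverflowInt` is a C++ class, and the class name `NoOverflowIntSketch` and its methods (`of`, `add`, `mul`, `isOverflowed`, `value`) are hypothetical names chosen for this example. The sketch assumes the flag is sticky, i.e. once any operation overflows, every later result stays marked as overflowed.

```java
// Illustrative only: a Java analogue of an int wrapper that remembers overflow.
// The real NoOverflowInt lives in HotSpot (C++); all names here are hypothetical.
final class NoOverflowIntSketch {
    private static final NoOverflowIntSketch OVERFLOW = new NoOverflowIntSketch(0, true);

    private final int value;          // only meaningful while overflowed == false
    private final boolean overflowed; // once set, later operations keep it set

    private NoOverflowIntSketch(int value, boolean overflowed) {
        this.value = value;
        this.overflowed = overflowed;
    }

    static NoOverflowIntSketch of(int value) {
        return new NoOverflowIntSketch(value, false);
    }

    boolean isOverflowed() {
        return overflowed;
    }

    int value() {
        if (overflowed) {
            throw new IllegalStateException("value is not available after an overflow");
        }
        return value;
    }

    NoOverflowIntSketch add(NoOverflowIntSketch other) {
        if (overflowed || other.overflowed) {
            return OVERFLOW;
        }
        try {
            return of(Math.addExact(value, other.value));
        } catch (ArithmeticException e) {
            return OVERFLOW;
        }
    }

    NoOverflowIntSketch mul(NoOverflowIntSketch other) {
        if (overflowed || other.overflowed) {
            return OVERFLOW;
        }
        try {
            return of(Math.multiplyExact(value, other.value));
        } catch (ArithmeticException e) {
            return OVERFLOW;
        }
    }

    public static void main(String[] args) {
        // scale * index + constant, with a single overflow check at the end
        NoOverflowIntSketch offset = of(1000).mul(of(4)).add(of(16));
        System.out.println(offset.isOverflowed() ? "overflow" : String.valueOf(offset.value()));

        // The overflow in the first step is still visible after multiplying by zero
        NoOverflowIntSketch bad = of(Integer.MAX_VALUE).add(of(1)).mul(of(0));
        System.out.println(bad.isOverflowed());
    }
}
```

The point of the design, as far as the description goes, is that callers chain arithmetic freely and test the flag once at the end instead of checking after every step.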
This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. >> >> **Benchmarks** >> >> I added a few new benchmarks, to show the merging of `Unsafe` and `native` stores. We can see that 8 byte stores are now merged, and have the same performance as a long store. The same for 4 char stores that are merged into a single long store. >> >> ![image](https://github.com/user-attachments/assets/33b5cfcb-919b-46f4-bfa8-69fdff3acf1a) >> >> And here the whole `MergeStores` benchma... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > some unsafe and native benchmarks added I added this to the description: **What this change enables** Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). Now we can do: Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. Merging `Unsafe` stores to native memory. Merging `MemorySegment`: with array, native, ByteBuffer backing types. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2422454623 From lucy at openjdk.org Fri Oct 18 13:44:26 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 18 Oct 2024 13:44:26 GMT Subject: RFR: 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v2] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 13:07:49 GMT, Amit Kumar wrote: >> Add match rules for UDivI, UModI, UDivL, UModL. Also adds the `dlr` and `dlgr` instructions. >> >> Tier1 tests are clean for fastdebug vm; >> >> Before this patch, `compiler/c2/TestDivModNodes.java` was failing (see jbs issue) but with this patch test is passing.
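
For readers less familiar with the Java methods that the quoted PR intrinsifies, the following sketch shows what `Integer.divideUnsigned`, `Integer.remainderUnsigned`, `Long.divideUnsigned` and `Long.remainderUnsigned` compute. It says nothing about the s390x match rules themselves. The class `UnsignedDivModSketch` and its two helper methods are hypothetical names used only for this illustration; the helpers widen to `long` to make the unsigned interpretation explicit.

```java
// Illustrative only: the semantics of the unsigned divide/remainder library methods.
// UnsignedDivModSketch and its helpers are hypothetical names for this example.
final class UnsignedDivModSketch {
    // Reference computation: reinterpret both ints as unsigned 32-bit values, then divide.
    static int divideUnsignedViaLong(int dividend, int divisor) {
        return (int) (Integer.toUnsignedLong(dividend) / Integer.toUnsignedLong(divisor));
    }

    // Reference computation for the unsigned remainder.
    static int remainderUnsignedViaLong(int dividend, int divisor) {
        return (int) (Integer.toUnsignedLong(dividend) % Integer.toUnsignedLong(divisor));
    }

    public static void main(String[] args) {
        int x = -1; // bit pattern 0xFFFFFFFF, i.e. 4294967295 when read as unsigned

        System.out.println(x / 2);                            // signed division
        System.out.println(Integer.divideUnsigned(x, 2));     // unsigned division
        System.out.println(Integer.remainderUnsigned(x, 2));  // unsigned remainder
        System.out.println(Long.divideUnsigned(-1L, 2L));     // same idea for 64-bit values
        System.out.println(Long.remainderUnsigned(-1L, 2L));

        // The library methods agree with the widened reference computation.
        System.out.println(Integer.divideUnsigned(x, 2) == divideUnsignedViaLong(x, 2));
        System.out.println(Integer.remainderUnsigned(x, 2) == remainderUnsignedViaLong(x, 2));
    }
}
```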
>> >> Without Patch: >> >> >> Benchmark (BUFFER_SIZE) (divisorType) Mode Cnt Score Error Units >> IntegerDivMod.testDivideRemainderUnsigned 1024 mixed avgt 15 1935.176 ± 2.191 ns/op >> IntegerDivMod.testDivideRemainderUnsigned 1024 positive avgt 15 1934.915 ± 3.207 ns/op >> IntegerDivMod.testDivideRemainderUnsigned 1024 negative avgt 15 1934.325 ± 1.108 ns/op >> IntegerDivMod.testDivideUnsigned 1024 mixed avgt 15 1809.782 ± 49.341 ns/op >> IntegerDivMod.testDivideUnsigned 1024 positive avgt 15 1769.326 ± 2.607 ns/op >> IntegerDivMod.testDivideUnsigned 1024 negative avgt 15 1784.053 ± 71.190 ns/op >> IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 15 2026.978 ± 1.534 ns/op >> IntegerDivMod.testRemainderUnsigned 1024 positive avgt 15 2028.039 ± 3.812 ns/op >> IntegerDivMod.testRemainderUnsigned 1024 negative avgt 15 2437.843 ± 636.808 ns/op >> Finished running test 'micro:java.lang.IntegerDivMod' >> >> >> Benchmark (BUFFER_SIZE) (divisorType) Mode Cnt Score Error Units >> LongDivMod.testDivideRemainderUnsigned 1024 mixed avgt 15 4524.897 ± 16.566 ns/op >> LongDivMod.testDivideRemainderUnsigned 1024 positive avgt 15 4373.714 ± 9.514 ns/op >> LongDivMod.testDivideRemainderUnsigned 1024 negative avgt 15 2018.309 ± 1.788 ns/op >> LongDivMod.testDivideUnsigned 1024 mixed avgt 15 4320.382 ± 19.055 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 15 3988.953 ± 8.770 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 15 1069.703 ± 1.525 ns/op >> LongDivMod.testRemainderUnsigned 1024 mixed avgt 15 5589.319 ± 4.247 ns/op >> LongDivMod.testRemainderUnsigned 1024 positive avgt 15 3904.555 ± 3.191 ns/op >> LongDivMod.testRemainderUnsigned 10... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > changes leftmost->upper and rightmost -> lower LGTM. One minor formatting suggestion.
src/hotspot/cpu/s390/s390.ad line 6263: > 6261: effect(TEMP r4_reven_tmp, KILL cr); > 6262: // TODO: size(4); > 6263: format %{ "UDIV $r5_rodd_dst, $r5_rodd_dst,$src2" %} Suggestion: no whitespace between instruction operands. src/hotspot/cpu/s390/s390.ad line 6343: > 6341: ins_cost(DEFAULT_COST); > 6342: // TODO: size(4); > 6343: format %{ "UDIVG $r5_rodd_dst, $r5_rodd_dst, $src" %} Suggestion: no whitespace between instruction operands. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21559#pullrequestreview-2378073572 PR Review Comment: https://git.openjdk.org/jdk/pull/21559#discussion_r1806490141 PR Review Comment: https://git.openjdk.org/jdk/pull/21559#discussion_r1806490519 From duke at openjdk.org Fri Oct 18 13:59:23 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Fri, 18 Oct 2024 13:59:23 GMT Subject: RFR: 8342295: compiler/jvmci/TestJVMCISavedProperties.java fails due to garbage in output Message-ID: The `compiler/jvmci/TestJVMCISavedProperties` test fails due to overlapping output from the saved system properties. The initialization of `savedProperties` in `jdk.vm.ci.services.Services` is correctly synchronized, so the issue suggests that two separate libjvmci compiler isolates are each printing their own set of saved properties. In a successful test run, the `CompileBroker` thread aborts the VM before it completes initialization, displaying the error message `Cannot use JVMCI compiler: Value of jvmci.Compiler is "null"` (due to the `-Djvmci.Compiler=null` setting), and the message `DONE IN MAIN` is never printed. However, in the failed test output, the `DONE IN MAIN` message appears, indicating that the VM initialization completed and created the `JVMCIRuntime` instance. The `CompileBroker` thread might have concurrently initialized `JVMCIRuntime` in another isolate. Since each `JVMCIRuntime` initialization outputs system properties, this is likely the cause of the overlapping output.
The proposed solution is to use the `-XX:+EnableJVMCI` flag instead of `-XX:+UseJVMCICompiler`, to avoid this issue. ------------- Commit messages: - 8342295: compiler/jvmci/TestJVMCISavedProperties.java fails due to garbage in output Changes: https://git.openjdk.org/jdk/pull/21583/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21583&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342295 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21583.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21583/head:pull/21583 PR: https://git.openjdk.org/jdk/pull/21583 From dnsimon at openjdk.org Fri Oct 18 14:07:09 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 18 Oct 2024 14:07:09 GMT Subject: RFR: 8342295: compiler/jvmci/TestJVMCISavedProperties.java fails due to garbage in output In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 13:54:14 GMT, Tomáš Zezula wrote: > The `compiler/jvmci/TestJVMCISavedProperties` test fails due to overlapping output from the saved system properties. The initialization of `savedProperties` in `jdk.vm.ci.services.Services` is correctly synchronized, so the issue suggests that two separate libjvmci compiler isolates are each printing their own set of saved properties. > > In a successful test run, the `CompileBroker` thread aborts the VM before it completes initialization, displaying the error message `Cannot use JVMCI compiler: Value of jvmci.Compiler is "null"` (due to the `-Djvmci.Compiler=null` setting), and the message `DONE IN MAIN` is never printed. However, in the failed test output, the `DONE IN MAIN` message appears, indicating that the VM initialization completed and created the `JVMCIRuntime` instance. The `CompileBroker` thread might have concurrently initialized `JVMCIRuntime` in another isolate. Since each `JVMCIRuntime` initialization outputs system properties, this is likely the cause of the overlapping output.
> > The proposed solution is to use the `-XX:+EnableJVMCI` flag instead of `-XX:+UseJVMCICompiler`, to avoid this issue. Looks good to me - thanks for fixing it. ------------- PR Review: https://git.openjdk.org/jdk/pull/21583#pullrequestreview-2378172430 From shade at openjdk.org Fri Oct 18 14:08:49 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 18 Oct 2024 14:08:49 GMT Subject: RFR: 8342540: InterfaceCalls micro-benchmark gives misleading results In-Reply-To: References: Message-ID: <9FV9ocK5WUg_Oer8NsJ1nHAatz7omIYcHH4ltbAvv-I=.2a4d5b2e-e446-4824-a76d-8458013692c5@github.com> On Fri, 18 Oct 2024 11:53:06 GMT, Andrew Haley wrote: > `InterfaceCalls.java` makes highly predictable memory accesses, which leads to a gross time underestimate of the case where a megamorphic access is unpredictable. > > Here's one example, with and without randomization. The unpredictable megamorphic call takes more than 4 times as long as the benchmark reports. > > > Benchmark (randomized) Mode Cnt Score Error Units > InterfaceCalls.test2ndInt3Types false avgt 4 5.013 ± 0.081 ns/op > InterfaceCalls.test2ndInt3Types true avgt 4 23.421 ± 0.102 ns/op > > This patch adds the "randomized" parameter, which allows the measurement of predictable and unpredictable megamorphic calls. Is there even a point to do non-randomized test then? test/micro/org/openjdk/bench/vm/compiler/InterfaceCalls.java line 51: > 49: // Whether to step iteratively through the list of interfaces, or > 50: // to select one in an unpredictable way.
> 51: @Param({"false", "true"}) private boolean randomized; Suggestion: @Param({"false", "true"}) private boolean randomized; ------------- PR Review: https://git.openjdk.org/jdk/pull/21581#pullrequestreview-2378163030 PR Review Comment: https://git.openjdk.org/jdk/pull/21581#discussion_r1806543858 From roland at openjdk.org Fri Oct 18 14:09:56 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 18 Oct 2024 14:09:56 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v23] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: <2RUCEWiXrc3iP_TGtOhRu5A6w8tqzMxeUsnQpUt5uJ0=.b6f3a627-977d-44ac-a2ae-f17c548991b0@github.com> On Wed, 16 Oct 2024 15:18:52 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > requires c2 enabled for IR tests Marked as reviewed by roland (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/18489#pullrequestreview-2378178492 From duke at openjdk.org Fri Oct 18 14:09:56 2024 From: duke at openjdk.org (duke) Date: Fri, 18 Oct 2024 14:09:56 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v23] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: <4ZoZx1UjRwb2-GP3e7mGPJWtYt3huW9WAVovTWCayg4=.25437eca-89d6-422b-88e3-5d388b319041@github.com> On Wed, 16 Oct 2024 15:18:52 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > requires c2 enabled for IR tests @tabjy Your change (at version 04b2c6adcacaeb5e372055419606b71d6fc84e49) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18489#issuecomment-2422561935 From rkennke at openjdk.org Fri Oct 18 14:31:36 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 18 Oct 2024 14:31:36 GMT Subject: RFR: 8342496: C2/Shenandoah: SEGV in compiled code when running jcstress In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 13:12:11 GMT, Roland Westrelin wrote: > The reason for the crash is that compiled code reads from an object > that's null. All field loads from an object are guarded by a null > check. Where is the null check in that case? 
After the field load: > > > 0x00007ffaac91261f: 44 8B 69 0C mov r13d, dword ptr [rcx + 0xc] <- field load > 0x00007ffaac912623: 85 C9 test ecx, ecx <- null check (oops!) > 0x00007ffaac912625: 74 5C je 0x7ffaac912683 > > > When the IR graph is constructed for the test case, the field load is > correctly made dependent on the null check (through a `CastPP` node) > but then something happens that's Shenandoah-specific and that causes > the field load to become dependent on another check so it can execute > before the null check. > > There are several load barriers involved in the process. One of them > is expanded at the null check projection. In the process, control for > the nodes that are control dependent on the null check is updated to > be the region at the end of the just expanded barrier. The `CastPP` > node for the null check gets the `Region` as new control. > > Another barrier is expanded right after that one. The 2 are back to > back. They are merged. The `Region` that the `CastPP` depends on goes > away, the `CastPP` is cloned in both branches at the `Region` and one > of them becomes control dependent on the heap stable test of the first > expanded barrier. At this point, one of the `CastPP` is control > dependent on a heap stable test that's after the null check. But then, > the heap stable test is moved out of the loop and 2 copies of the loop are > made so one can run without any overhead from barriers. When that > happens, the `CastPP` becomes dependent on a test that dominates the > null check and so the field load that depends on the `CastPP` can be > scheduled before the null check. > > The fix I propose is to not update the control when the barrier is > expanded for nodes that can float when the test they depend on > moves. This way the `CastPP` remains dependent on the null check. Looks good to me. Thank you! ------------- Marked as reviewed by rkennke (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21562#pullrequestreview-2378292732 From jbhateja at openjdk.org Fri Oct 18 14:44:04 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 18 Oct 2024 14:44:04 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Fri, 18 Oct 2024 02:03:21 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring the VPMULUDQ instruction for the following IR patterns. >> >> >> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> >> >> A 64x64 bit multiplication produces a 128 bit result, and can be performed by individually multiplying the upper and lower doublewords of the multiplier with the multiplicand and assembling the partial products to compute the full width result. Targets supporting vector quadword multiplication have separate instructions to compute the upper and lower quadwords of the 128 bit result. Therefore the existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If the upper 32 bits of the quadword multiplier and multiplicand are always zero then the result of the multiplication depends only on the partial product of their lower doublewords and can be performed using an unsigned 32 bit multiplication instruction with quadword saturation. The patch matches this pattern in a target dependent manner without introducing a new IR node. >> >> The VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces a 64 bit result.
It has much lower latency compared to the full 64 bit multiplication instruction "VPMULLQ"; in addition, non-AVX512DQ targets do not support direct quadword multiplication, thus we can save the redundant partial products for the zeroed out upper 32 bits. This results in throughput improvements on both P and E core Xeons. >> >> Please find below the performance of the [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimization:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Review resolutions > - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction I re-evaluated the solution and feel that a lowering pass will complement such a transformation, especially in light of the re-wiring logic that directly feeds the pattern inputs to the multiplier. While x86 VPMULUDQ expects to operate on the lower doubleword of each quadword lane, AARCH64 SVE has instructions which consider the upper doubleword of the quadword multiplier and multiplicand and hence can optimize the following pattern too ` MulVL ( SRC1 << 32 ) * ( SRC2 << 32 ) ` https://www.felixcloutier.com/x86/pmuludq https://dougallj.github.io/asil/doc/umullt_z_zz_32.html I am in the process of introducing a PhaseLowering which will have target specific IR transformations for nodes of interest; until then I am moving the PR to draft stage.
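The partial-product argument in the PR description can be checked with scalar code. What follows is a minimal sketch and not code from the patch: the class and method names are hypothetical, and it models a single 64-bit lane with plain `long` arithmetic rather than the Vector API. It assembles the low 64 bits of a product from 32-bit partial products and shows that, once both upper doublewords are zero, only the low partial product remains.

```java
// Illustrative scalar model of one quadword lane; all names here are
// hypothetical and are not part of the patch under review.
final class PartialProductSketch {
    static final long LOW_MASK = 0xFFFFFFFFL;

    // Low 64 bits of a * b, assembled from 32-bit partial products.
    static long mulViaPartialProducts(long a, long b) {
        long aLo = a & LOW_MASK, aHi = a >>> 32;
        long bLo = b & LOW_MASK, bHi = b >>> 32;
        long low = aLo * bLo;               // unsigned 32x32 -> 64 bit product
        long cross = aHi * bLo + aLo * bHi; // only its low 32 bits survive the shift
        // aHi * bHi is scaled by 2^64 and so never reaches the low 64 bits.
        return low + (cross << 32);
    }

    // With both upper doublewords zero, one widening multiply is enough.
    static long mulLowDoublewords(long a, long b) {
        return (a & LOW_MASK) * (b & LOW_MASK);
    }

    public static void main(String[] args) {
        long a = 0x00000000DEADBEEFL, b = 0x00000000CAFEBABEL;
        System.out.println(mulViaPartialProducts(a, b) == a * b);
        System.out.println(mulLowDoublewords(a, b) == a * b);
    }
}
```

The second method corresponds to the `MulL (And SRC1, 0xFFFFFFFF) (And SRC2, 0xFFFFFFFF)` shape listed in the description; the other shapes only differ in which doubleword is moved into the low position first.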
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2422634178 From jbhateja at openjdk.org Fri Oct 18 14:56:29 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 18 Oct 2024 14:56:29 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v30] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > The new vector operators are applicable only to integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per the IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback.
> > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Replacing flag based checks with CPU feature checks in IR validation test. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/3ee0de07..dacc9313 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=28-29 Stats: 16 lines in 1 file changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From amitkumar at openjdk.org Fri Oct 18 15:55:24 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 18 Oct 2024 15:55:24 GMT Subject: RFR: 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v3] In-Reply-To: References: Message-ID: > Add match rules for UDivI, UModI, UDivL, UModL. And also adds `dlr` and `dlgr` instruction. > > Tier1 test are clean for fastdebug vm; > > Before this patch, `compiler/c2/TestDivModNodes.java` was failing (see jbs issue) but with this patch test is passing. > > Without Patch: > > > Benchmark (BUFFER_SIZE) (divisorType) Mode Cnt Score Error Units > IntegerDivMod.testDivideRemainderUnsigned 1024 mixed avgt 15 1935.176 ? 2.191 ns/op > IntegerDivMod.testDivideRemainderUnsigned 1024 positive avgt 15 1934.915 ? 3.207 ns/op > IntegerDivMod.testDivideRemainderUnsigned 1024 negative avgt 15 1934.325 ? 1.108 ns/op > IntegerDivMod.testDivideUnsigned 1024 mixed avgt 15 1809.782 ? 49.341 ns/op > IntegerDivMod.testDivideUnsigned 1024 positive avgt 15 1769.326 ? 
2.607 ns/op > IntegerDivMod.testDivideUnsigned 1024 negative avgt 15 1784.053 ± 71.190 ns/op > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 15 2026.978 ± 1.534 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 15 2028.039 ± 3.812 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 15 2437.843 ± 636.808 ns/op > Finished running test 'micro:java.lang.IntegerDivMod' > > > Benchmark (BUFFER_SIZE) (divisorType) Mode Cnt Score Error Units > LongDivMod.testDivideRemainderUnsigned 1024 mixed avgt 15 4524.897 ± 16.566 ns/op > LongDivMod.testDivideRemainderUnsigned 1024 positive avgt 15 4373.714 ± 9.514 ns/op > LongDivMod.testDivideRemainderUnsigned 1024 negative avgt 15 2018.309 ± 1.788 ns/op > LongDivMod.testDivideUnsigned 1024 mixed avgt 15 4320.382 ± 19.055 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 15 3988.953 ± 8.770 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 15 1069.703 ± 1.525 ns/op > LongDivMod.testRemainderUnsigned 1024 mixed avgt 15 5589.319 ± 4.247 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 15 3904.555 ± 3.191 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 15 1765.761 ± 1.539 ns/op > Finished ...
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: removes extra whitespaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21559/files - new: https://git.openjdk.org/jdk/pull/21559/files/c94cc448..1750571a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21559&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21559&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21559.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21559/head:pull/21559 PR: https://git.openjdk.org/jdk/pull/21559 From amitkumar at openjdk.org Fri Oct 18 15:55:25 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 18 Oct 2024 15:55:25 GMT Subject: RFR: 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v2] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 13:25:46 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> changes leftmost->upper and rightmost -> lower > > src/hotspot/cpu/s390/s390.ad line 6263: > >> 6261: effect(TEMP r4_reven_tmp, KILL cr); >> 6262: // TODO: size(4); >> 6263: format %{ "UDIV $r5_rodd_dst, $r5_rodd_dst,$src2" %} > > Suggestion: no whitespace between instruction operands. removed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21559#discussion_r1806714036 From syan at openjdk.org Fri Oct 18 16:00:13 2024 From: syan at openjdk.org (SendaoYan) Date: Fri, 18 Oct 2024 16:00:13 GMT Subject: RFR: 8342612: Increase memory usage of compiler/c2/TestScalarReplacementMaxLiveNodes.java Message-ID: Hi all, The test `test/hotspot/jtreg/compiler/c2/TestScalarReplacementMaxLiveNodes.java` fails on linux-x64/macos-x64/macos-aarch64/windows-x64. To make CI less noisy, we can simply increase the max memory usage until the root cause of the failure has been fixed.
The change has been verified locally. Test-fix only, no risk. ------------- Commit messages: - 8342612: Increase memory usage of compiler/c2/TestScalarReplacementMaxLiveNodes.java Changes: https://git.openjdk.org/jdk/pull/21586/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21586&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342612 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21586/head:pull/21586 PR: https://git.openjdk.org/jdk/pull/21586 From aph at openjdk.org Fri Oct 18 16:18:02 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 18 Oct 2024 16:18:02 GMT Subject: RFR: 8342540: InterfaceCalls micro-benchmark gives misleading results [v2] In-Reply-To: References: Message-ID: > `InterfaceCalls.java` makes highly predictable memory accesses, which leads to a gross time underestimate of the case where a megamorphic access is unpredictable. > > Here's one example, with and without randomization. The unpredictable megamorphic call takes more than 4* as long as the benchmark. > > > Benchmark (randomized) Mode Cnt Score Error Units > InterfaceCalls.test2ndInt3Types false avgt 4 5.013 ± 0.081 ns/op > InterfaceCalls.test2ndInt3Types true avgt 4 23.421 ± 0.102 ns/op > > > This patch adds the "randomized" parameter, which allows the measurement of predictable and unpredictable megamorphic calls.
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Update test/micro/org/openjdk/bench/vm/compiler/InterfaceCalls.java Co-authored-by: Aleksey Shipilëv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21581/files - new: https://git.openjdk.org/jdk/pull/21581/files/c3fa63c5..52f80b28 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21581&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21581&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21581.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21581/head:pull/21581 PR: https://git.openjdk.org/jdk/pull/21581 From aph at openjdk.org Fri Oct 18 16:24:50 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 18 Oct 2024 16:24:50 GMT Subject: RFR: 8342540: InterfaceCalls micro-benchmark gives misleading results [v2] In-Reply-To: <9FV9ocK5WUg_Oer8NsJ1nHAatz7omIYcHH4ltbAvv-I=.2a4d5b2e-e446-4824-a76d-8458013692c5@github.com> References: <9FV9ocK5WUg_Oer8NsJ1nHAatz7omIYcHH4ltbAvv-I=.2a4d5b2e-e446-4824-a76d-8458013692c5@github.com> Message-ID: On Fri, 18 Oct 2024 14:05:48 GMT, Aleksey Shipilev wrote: > Is there even a point to do non-randomized test then? That's a very interesting question, and one that has been occupying me since I discovered this problem a few days ago. I have no idea how often well-predicted megamorphic calls occur. I can speculate that the "typical" megamorphic case lies somewhere between these extremes, but that is all. It may well be that normal behaviour is chaotic, but I strongly suspect that the cases are unlikely to be equally probable, as they are here. So, it is possible that an utterly unpredictable access pattern is just as unrealistic as a perfectly predictable one.
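The two extremes discussed here can be made concrete with a small sketch. This is not the code of `InterfaceCalls.java`; every name below is hypothetical and the example leaves out the JMH harness entirely. It only shows how the same set of receivers can be presented to one megamorphic call site either in a repeating order or in a seeded shuffled order, which is presumably the kind of difference a `randomized` parameter switches between.

```java
import java.util.Random;

// Hypothetical illustration only; not taken from the benchmark under review.
final class ReceiverOrderSketch {
    interface Shape { int id(); }
    static final class A implements Shape { public int id() { return 1; } }
    static final class B implements Shape { public int id() { return 2; } }
    static final class C implements Shape { public int id() { return 3; } }

    // Cyclic order (A, B, C, A, ...) when not randomized; otherwise the same
    // receivers shuffled with a seeded Fisher-Yates pass.
    static Shape[] receivers(int n, boolean randomized, long seed) {
        Shape[] types = { new A(), new B(), new C() };
        Shape[] out = new Shape[n];
        for (int i = 0; i < n; i++) {
            out[i] = types[i % types.length];
        }
        if (randomized) {
            Random r = new Random(seed);
            for (int i = n - 1; i > 0; i--) {
                int j = r.nextInt(i + 1);
                Shape t = out[i];
                out[i] = out[j];
                out[j] = t;
            }
        }
        return out;
    }

    // The call site of interest: three receiver types reach r.id().
    static int sum(Shape[] rs) {
        int s = 0;
        for (Shape r : rs) {
            s += r.id();
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sum(receivers(300, false, 42L)));
        System.out.println(sum(receivers(300, true, 42L)));
    }
}
```

Both orders do the same work and return the same sum, so any timing difference comes from how well the receiver type can be predicted. A skewed distribution, closer to the "somewhere between these extremes" case, could be modelled by drawing the types with unequal probabilities instead of shuffling an even mix.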
------------- PR Comment: https://git.openjdk.org/jdk/pull/21581#issuecomment-2422827754 From chagedorn at openjdk.org Fri Oct 18 16:32:56 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 18 Oct 2024 16:32:56 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v23] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Wed, 16 Oct 2024 15:18:52 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > requires c2 enabled for IR tests Two suggestions but otherwise, looks good to me, too. src/hotspot/share/opto/loopnode.cpp line 3951: > 3949: // int a = init2 > 3950: // for (int phi = init; phi < limit; phi += stride_con) { > 3951: // a = init2 + (phi - init) * (stride_con2 / stride_con) A nit but in this pseudo code, shouldn't it be the post-incremented iv (i.e. `phi + stride_con`)? Because at that point, `phi` is not incremented, yet. You eventually replace the "parallel iv" (`phi2` in the actual code) which in the graph is the pre-incremented iv. Maybe naming the iv `phi` is a little bit confusing. You could also extend this to make it more explicit: int iv2 = init2 int iv = init loop: if (phi >= limit) goto exit phi += stride_con iv2 = init2 + (iv - init) * (stride_con2 / stride_con) goto loop exit: ... 
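The closed form in the comment under discussion can be sanity-checked with scalar code. The following is a minimal sketch under stated assumptions, not code from the PR: the class and method names are hypothetical, the stride is assumed positive, and the induction variable is sampled before it is incremented, which is the pre-increment reading the review comment points out.

```java
// Hypothetical check of the parallel-iv closed form; not part of the PR.
final class ParallelIvSketch {
    // Returns true if, at the top of every iteration, the separately
    // incremented iv2 equals init2 + (iv - init) * (stride2 / stride).
    static boolean closedFormHolds(int init, int limit, int stride,
                                   long init2, long stride2) {
        if (stride <= 0 || stride2 % stride != 0) {
            // Mirrors the "exact ratio" requirement: stride2 must be an
            // integer multiple of stride for the replacement to be valid.
            throw new IllegalArgumentException("stride2 must be a multiple of a positive stride");
        }
        long ratio = stride2 / stride;
        long iv2 = init2;
        for (int iv = init; iv < limit; iv += stride) {
            long derived = init2 + ((long) iv - init) * ratio;
            if (iv2 != derived) {
                return false;
            }
            iv2 += stride2; // the parallel induction variable being replaced
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(closedFormHolds(0, 100, 2, 5L, 6L));
        System.out.println(closedFormHolds(-10, 50, 3, 1_000_000_000_000L, -9L));
    }
}
```

The `long` arithmetic on `iv2` reflects the case this PR adds, a long-typed parallel iv inside an int counted loop; the sketch uses small bounds and does not try to model overflow of the int counter near `Integer.MAX_VALUE`.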
src/hotspot/share/opto/loopnode.cpp line 4036: > 4034: jlong ratio_con = stride_con2 / stride_con; > 4035: > 4036: if ((ratio_con * stride_con) == stride_con2) { // Check for exact Was like that before but it might be easier to read when we negate this into an explicit skip. Otherwise, you first need to find the closing brace to ensure that we are not doing more things for the case when it's not equal: if ((ratio_con * stride_con) != stride_con2) { // Not an integer multiple. continue; } ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18489#pullrequestreview-2378529239 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1806747334 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1806748330 From duke at openjdk.org Fri Oct 18 16:43:42 2024 From: duke at openjdk.org (duke) Date: Fri, 18 Oct 2024 16:43:42 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn [v3] In-Reply-To: <4yh-9E3dPvmUGKVoo3sEHCoc_TdBbGdgMJbGQbti50s=.2290e266-e224-4e10-a282-9627f19e3c0c@github.com> References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> <4yh-9E3dPvmUGKVoo3sEHCoc_TdBbGdgMJbGQbti50s=.2290e266-e224-4e10-a282-9627f19e3c0c@github.com> Message-ID: On Thu, 17 Oct 2024 21:12:35 GMT, Chad Rakoczy wrote: >> [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) >> >> Crash occurs in C1 during OSR when copying locks from interpreter frame to compiled frame. All loads used an immediate offset regardless of offset size, causing a crash when it is over the max size for the instruction (32760). Fix is to check the size before performing the load and to store the offset in a register if needed. >> >> I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction.
The instruction is only updated when the `offset_ok_for_immed` check fails which would cause the crash anyways >> >> Confirmed that added test fails before patch and passes after > > Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: > > - Remove jasm file > - Update test @chadrako Your change (at version dfbe03de3548e1b4549d44be8a4712bdd94950d4) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21473#issuecomment-2422857987 From psandoz at openjdk.org Fri Oct 18 16:55:05 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 18 Oct 2024 16:55:05 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v25] In-Reply-To: References: <28MMdFkPrGQkT3nD8dUA2GAx6uZyRwERWn3bVvEPSR8=.79c2b5e5-7317-47c5-8878-30c4ac3171ab@github.com> Message-ID: On Thu, 17 Oct 2024 23:35:36 GMT, Paul Sandoz wrote: > > I have restricted the IR validation to just newly added UMin/UMax transformations. > > Even then i think it better to do so in follow on PR, otherwise it is a moving target for review and testing. This new test fails on aarch64 e.g., > > ``` > STATUS:Failed.`main' threw exception: compiler.lib.ir_framework.shared.TestFormatException: Violations (16) --------------- - Could not find VM flag "UseAVX" in @IR rule 1 at public void compiler.vectorapi.VectorUnsignedMinMaxIRTransformsTest.umin_ir_transform1_byte() > ... > ``` > > Testing tier 1 to 3 with latest PR looks good otherwise. The updated test passes with latest changes. You will need a HotSpot re-review including the new test. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2422875427 From duke at openjdk.org Fri Oct 18 17:03:28 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Fri, 18 Oct 2024 17:03:28 GMT Subject: RFR: 8342295: compiler/jvmci/TestJVMCISavedProperties.java fails due to garbage in output [v2] In-Reply-To: References: Message-ID: <3HZPah5ZpaIbiYIwkxanAJc5MJdHuBbTwZkRdoJ6zZg=.dfacac9e-8616-4e2b-a438-9bc872777598@github.com> > The `compiler/jvmci/TestJVMCISavedProperties` test fails due to overlapping output from the saved system properties. The initialization of `savedProperties` in `jdk.vm.ci.services.Services` is correctly synchronized, so the issue suggests that two separate libjvmci compiler isolates are each printing their own set of saved properties. > > In a successful test run, the `CompileBroker` thread aborts the VM before it completes initialization, displaying the error message `Cannot use JVMCI compiler: Value of jvmci.Compiler is "null"` (due to the `-Djvmci.Compiler=null` setting), and the message `DONE IN MAIN` is never printed. However, in the failed test output, the `DONE IN MAIN` message appears, indicating that the VM initialization completed and created the `JVMCIRuntime` instance. The `CompileBroker` thread might have concurrently initialized `JVMCIRuntime` in another isolate. Since each `JVMCIRuntime` initialization outputs system properties, this is likely the cause of the overlapping output. > > The proposed solution is to use the `-XX:+EnableJVMCI` flag instead of `-XX:+UseJVMCICompiler`, to avoid this issue. Tomáš Zezula has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: 8342295: compiler/jvmci/TestJVMCISavedProperties.java fails due to garbage in output ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21583/files - new: https://git.openjdk.org/jdk/pull/21583/files/306efecb..7849ba67 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21583&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21583&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21583.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21583/head:pull/21583 PR: https://git.openjdk.org/jdk/pull/21583 From dnsimon at openjdk.org Fri Oct 18 17:03:29 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 18 Oct 2024 17:03:29 GMT Subject: RFR: 8342295: compiler/jvmci/TestJVMCISavedProperties.java fails due to garbage in output [v2] In-Reply-To: <3HZPah5ZpaIbiYIwkxanAJc5MJdHuBbTwZkRdoJ6zZg=.dfacac9e-8616-4e2b-a438-9bc872777598@github.com> References: <3HZPah5ZpaIbiYIwkxanAJc5MJdHuBbTwZkRdoJ6zZg=.dfacac9e-8616-4e2b-a438-9bc872777598@github.com> Message-ID: On Fri, 18 Oct 2024 16:58:30 GMT, Tomáš Zezula wrote: >> The `compiler/jvmci/TestJVMCISavedProperties` test fails due to overlapping output from the saved system properties. The initialization of `savedProperties` in `jdk.vm.ci.services.Services` is correctly synchronized, so the issue suggests that two separate libjvmci compiler isolates are each printing their own set of saved properties. >> >> In a successful test run, the `CompileBroker` thread aborts the VM before it completes initialization, displaying the error message `Cannot use JVMCI compiler: Value of jvmci.Compiler is "null"` (due to the `-Djvmci.Compiler=null` setting), and the message `DONE IN MAIN` is never printed. However, in the failed test output, the `DONE IN MAIN` message appears, indicating that the VM initialization completed and created the `JVMCIRuntime` instance.
The `CompileBroker` thread might have concurrently initialized `JVMCIRuntime` in another isolate. Since each `JVMCIRuntime` initialization outputs system properties, this is likely the cause of the overlapping output. >> >> The proposed solution is to use the `-XX:+EnableJVMCI` flag instead of `-XX:+UseJVMCICompiler`, to avoid this issue. > > Tomáš Zezula has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8342295: compiler/jvmci/TestJVMCISavedProperties.java fails due to garbage in output Still looks fine. Marked as reviewed by dnsimon (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21583#pullrequestreview-2378590755 PR Review: https://git.openjdk.org/jdk/pull/21583#pullrequestreview-2378591709 From kvn at openjdk.org Fri Oct 18 18:12:07 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 18 Oct 2024 18:12:07 GMT Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double [v2] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 07:14:03 GMT, Amit Kumar wrote: > > you could also just cast to double at every use site. Would that also work? > > is that required ? Aren't integers, by default, will be treated as double if they are multiplied by a double data type value ? Yes, you are right. This RFE could be a NOP. My suggestion in [JDK-8333098 PR](https://github.com/openjdk/jdk/pull/20615) was based on the assumption that these flags may cause rounding issues if they are used in integer expressions, or if the result of a double expression is converted into an integer. But you show that they are used with the double value `scale`, which converts these flag values to double, and only in compare expressions.
The only other place I found is: compilationPolicy.hpp: static int min_invocations() { return Tier4MinInvocationThreshold; } But it is again used only in a double expression: bytecodeInfo.cpp: double min_freq = MAX2(MinInlineFrequencyRatio, 1.0 / CompilationPolicy::min_invocations()); ------------- PR Comment: https://git.openjdk.org/jdk/pull/21354#issuecomment-2422988943 From kvn at openjdk.org Fri Oct 18 18:17:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 18 Oct 2024 18:17:11 GMT Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double [v3] In-Reply-To: References: Message-ID: <-QDrBuNXRgjUv44lHKDOyIlO5kCjLii3OoQdnbh-8N4=.96977439-1cd9-466e-8bf7-e997f6674c21@github.com> On Fri, 18 Oct 2024 03:49:25 GMT, Amit Kumar wrote: >> This is a trivial PR to change the data type of some "*Threshold" variables from `intx` to `double`. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > review/comments I think we should close this RFE as "Not an Issue". Changing `!=0` to `>` in the assert may not be needed either, because there is a check for a negative value in ciMethod.cpp: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciMethod.cpp#L148 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21354#issuecomment-2422995219 From amitkumar at openjdk.org Fri Oct 18 18:43:05 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 18 Oct 2024 18:43:05 GMT Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double [v3] In-Reply-To: <-QDrBuNXRgjUv44lHKDOyIlO5kCjLii3OoQdnbh-8N4=.96977439-1cd9-466e-8bf7-e997f6674c21@github.com> References: <-QDrBuNXRgjUv44lHKDOyIlO5kCjLii3OoQdnbh-8N4=.96977439-1cd9-466e-8bf7-e997f6674c21@github.com> Message-ID: On Fri, 18 Oct 2024 18:12:58 GMT, Vladimir Kozlov wrote: > I think we should close this RFE as "Not an Issue". Sure I will close it.
Thanks for the inputs :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21354#issuecomment-2423034795 From amitkumar at openjdk.org Fri Oct 18 18:43:05 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 18 Oct 2024 18:43:05 GMT Subject: Withdrawn: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double In-Reply-To: References: Message-ID: <_bIGA67KoatNdF8B9lZhEDlSqV8VLfUeb8lG4L_fAcE=.99d0eeca-9586-4692-a2ef-cd9867ef50fc@github.com> On Fri, 4 Oct 2024 10:39:25 GMT, Amit Kumar wrote: > This is a trivial PR to change the data type of some "*Threshold" variables from `intx` to `double`. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/21354 From kvn at openjdk.org Fri Oct 18 18:44:03 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 18 Oct 2024 18:44:03 GMT Subject: RFR: 8342612: Increase memory usage of compiler/c2/TestScalarReplacementMaxLiveNodes.java In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 15:55:24 GMT, SendaoYan wrote: > Hi all, > The test `test/hotspot/jtreg/compiler/c2/TestScalarReplacementMaxLiveNodes.java` fails on linux-x64/macos-x64/macos-aarch64/windows-x64. To make CI less noisy, we can simply increase the max memory usage until the root cause of the failure has been fixed. > The change has been verified locally. Test-fix only, no risk. Good and trivial. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21586#pullrequestreview-2378759447 From kvn at openjdk.org Fri Oct 18 18:50:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 18 Oct 2024 18:50:41 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v3] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Fri, 18 Oct 2024 07:59:37 GMT, Emanuel Peter wrote: >> I mean: _Where_ is the check/filter for that?
> > The meat of the parsing happens in `MemPointerDecomposedFormParser::parse_sub_expression`. > > One good example is this: > ![image](https://github.com/user-attachments/assets/a83fd973-6407-45c6-8597-ef51edd7325d) > > I added the `default` case now with a comment. okay ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1806896907 From kvn at openjdk.org Fri Oct 18 18:58:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 18 Oct 2024 18:58:00 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v3] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <0VARjc68yQOvAyOPenxR3gtrhE-n9nwr3lCL5RQgKI4=.fd02333d-cbd3-4f74-ad1d-e5f23d4f644e@github.com> On Fri, 18 Oct 2024 09:50:27 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! 
>> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > some unsafe and native benchmarks added This is clearer now. src/hotspot/share/opto/mempointer.hpp line 39: > 37: // We parse / decompose pointers into a linear form: > 38: // > 39: // pointer = con + sum_i(scale_i * variable_i) Maybe you can swap them: `sum_i(variable_i * scale_i) + con` src/hotspot/share/opto/mempointer.hpp line 45: > 43: // > 44: // For the MemPointer, we do not explicitly track base address.
For Java heap pointers, the > 45: // base address is just a variable. For native memory (C heap) pointers, the base address is Maybe: "For Java heap pointers, the base address is just a variable in a summand with `scale` == 1." ------------- PR Review: https://git.openjdk.org/jdk/pull/19970#pullrequestreview-2378776493 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1806902389 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1806900522 From duke at openjdk.org Fri Oct 18 19:17:53 2024 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 18 Oct 2024 19:17:53 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory Message-ID: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> [JDK-8342601](https://bugs.openjdk.org/browse/JDK-8342601) Fix minor inefficiency in `copy_memory` by adding check before doing bit shift to see if we are able to do a move instruction instead. Change is low risk because of the low complexity of the change. Ran array copy and tier 1 on aarch64 machine Test summary ============================== TEST TOTAL PASS FAIL ERROR jtreg:test/hotspot/jtreg/compiler/arraycopy 49 49 0 0 ============================== ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR jtreg:test/hotspot/jtreg:tier1 2591 2591 0 0 jtreg:test/jdk:tier1 2436 2436 0 0 jtreg:test/langtools:tier1 4577 4577 0 0 jtreg:test/jaxp:tier1 0 0 0 0 jtreg:test/lib-test:tier1 34 34 0 0 ============================== ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/21589/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21589&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342601 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21589/head:pull/21589 PR: https://git.openjdk.org/jdk/pull/21589 From
duke at openjdk.org Fri Oct 18 20:23:46 2024 From: duke at openjdk.org (hanklo6) Date: Fri, 18 Oct 2024 20:23:46 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v6] In-Reply-To: References: Message-ID: > Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
> > ### Generate test instructions > With `binutils = 2.43` > * `python3 x86-asmtest.py > asmtest.out.h` > ### Run test > * `make test TEST="gtest:AssemblerX86"` hanklo6 has updated the pull request incrementally with three additional commits since the last revision: - Add pushp and popq - Format instructions - Support other addressing mode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20857/files - new: https://git.openjdk.org/jdk/pull/20857/files/da9c54bc..bfd44632 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20857&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20857&range=04-05 Stats: 50894 lines in 2 files changed: 229 ins; 1 del; 50664 mod Patch: https://git.openjdk.org/jdk/pull/20857.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20857/head:pull/20857 PR: https://git.openjdk.org/jdk/pull/20857 From kxu at openjdk.org Fri Oct 18 21:04:38 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Fri, 18 Oct 2024 21:04:38 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v24] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. 
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: update comment pseudo code, improve readability with explicit skip ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/04b2c6ad..c37484ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=22-23 Stats: 53 lines in 1 file changed: 9 ins; 2 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From kxu at openjdk.org Fri Oct 18 21:10:55 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Fri, 18 Oct 2024 21:10:55 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v23] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Fri, 18 Oct 2024 16:21:33 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> requires c2 enabled for IR tests > > src/hotspot/share/opto/loopnode.cpp line 3951: > >> 3949: // int a = init2 >> 3950: // for (int phi = init; phi < limit; phi += stride_con) { >> 3951: // a = init2 + (phi - init) * (stride_con2 / stride_con) > > A nit but in this pseudo code, shouldn't it be the post-incremented iv (i.e. `phi + stride_con`)? Because at that point, `phi` is not incremented, yet. You eventually replace the "parallel iv" (`phi2` in the actual code) which in the graph is the pre-incremented iv. Maybe naming the iv `phi` is a little bit confusing. 
> > You could also extend this to make it more explicit: > > int iv2 = init2 > int iv = init > loop: > if (phi >= limit) goto exit > phi += stride_con > iv2 = init2 + (iv - init) * (stride_con2 / stride_con) > goto loop > exit: > ... Good catch! I found it quite difficult to write pseudo code expressing the IR structure after transformation. I've updated the comment (mostly taking your example). Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1807017876 From duke at openjdk.org Fri Oct 18 21:14:22 2024 From: duke at openjdk.org (hanklo6) Date: Fri, 18 Oct 2024 21:14:22 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v4] In-Reply-To: References: <_tdMDW6ViTZsDgks0k2uWDHJUum3VL13PX0piUiQBtk=.4a06f43d-037f-4725-869d-287813bc8750@github.com> Message-ID: On Thu, 17 Oct 2024 02:07:59 GMT, Jatin Bhateja wrote: >> test/hotspot/gtest/x86/x86-asmtest.py line 655: >> >>> 653: } >>> 654: >>> 655: for RegOp, ops in instruction_set.items(): >> >> Rest of the code is modular, can you kindly refactor below code into a top level routine called from __main__ method. > > Binutils toolset is specific to Linux. Should we not add relevant OS.name check and exit? Thanks, done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1807019816 From ascarpino at openjdk.org Fri Oct 18 21:22:59 2024 From: ascarpino at openjdk.org (Anthony Scarpino) Date: Fri, 18 Oct 2024 21:22:59 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v5] In-Reply-To: References: Message-ID: <37LXaInlMtR31NG9ziFUASr-bCMyYGsrlBQyoQjTXLQ=.71c6b447-8671-4631-b668-195760179a09@github.com> On Wed, 16 Oct 2024 23:13:45 GMT, Smita Kamath wrote: >> Hi, I want to submit an optimization for SHA-512 algorithm using SHA instructions (sha512msg1, sha512msg2 and sha512rnds2) . Kindly review the code and provide feedback. Thank you. 
> > Smita Kamath has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge branch 'master' of https://git.openjdk.org/jdk into sha-512 > - Updated code as per review comments > - Addressed a review comment > - Updated code as per review comment & updated test case > - Updated AMD64.java > - Merge master > - SHA-512 implementation using SHA-NI instructions Tier 1-3 passed on windows-x64, linux-x64, and macos-aarch64 ------------- Marked as reviewed by ascarpino (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20633#pullrequestreview-2378979993 From duke at openjdk.org Fri Oct 18 21:23:02 2024 From: duke at openjdk.org (hanklo6) Date: Fri, 18 Oct 2024 21:23:02 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v4] In-Reply-To: References: <_tdMDW6ViTZsDgks0k2uWDHJUum3VL13PX0piUiQBtk=.4a06f43d-037f-4725-869d-287813bc8750@github.com> Message-ID: On Thu, 17 Oct 2024 10:15:36 GMT, Jatin Bhateja wrote: >> hanklo6 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: >> >> - Merge branch 'master' of https://git.openjdk.java.net/jdk into apx-test-tool >> - Add comment and defined >> - Add copyright header >> - Remove tab >> - Remove whitespace >> - Replace whitespace with tab >> - Add flag before testing >> - Fix assertion error on MacOS >> - Add _LP64 flag >> - Add missing header >> - ... 
and 6 more: https://git.openjdk.org/jdk/compare/890d6691...ca48f240 > > test/hotspot/gtest/x86/asmtest.out.h line 1: > >> 1: // BEGIN Generated code -- do not edit > > All the memory operand instructions being validated are checking for only one kind of memory addressing mode which is > `- BASE + INDEX` > We should also check for following flavors for at least some instructions :- > > - BASE > - INDEX * SCALE + DISPLACEMENT > - BASE + INDEX + DISPLACEMENT > - BASE + INDEX * SCALE + DISPLACEMENT > > > Where BASE and INDEX are EGPRs. Done. I randomly generated different scales and displacements for an instruction. Please let me know if we need to test all possible scales for an instruction. > test/hotspot/gtest/x86/asmtest.out.h line 1: > >> 1: // BEGIN Generated code -- do not edit > > Can you also emit the instruction IDs in the comments against each row in insns_strs and insns_lens tables, it > e.g. > > > // Generated by x86-asmtest.py > __ shldl(rcx, rdx); // {load}shld ecx, edx IID0 > __ shldl(rdx, rbx); // {load}shld edx, ebx IID1 > ...... > ..... > static const uint8_t insns[] = > { > 0x0f, 0xa5, 0xd1, // IID0 > 0x0f, 0xa5, 0xda, // IID1 > ... > static const unsigned int insns_lens[] = > { > 3, // IID0 > 3, // IID1 > #ifdef _LP64 > ...... > static const char* insns_strs[] = > { > "__ shldl(rcx, rdx);", // IID0 > "__ shldl(rdx, rbx);", // IID1 > #ifdef _LP64 > > It will ease correlating and manually inspecting these statically emitted tables. Thanks, done. 
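The addressing-mode flavors requested in the review above (BASE, INDEX * SCALE + DISPLACEMENT, BASE + INDEX + DISPLACEMENT, BASE + INDEX * SCALE + DISPLACEMENT) can be illustrated with a small formatter. This is only a sketch and not part of `x86-asmtest.py` or the gtest: `MemOperand` and `format_operand` are hypothetical names, the bracketed output syntax is an assumption chosen for readability, and displacements are assumed to be non-negative.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of an x86 memory operand (illustration only).
struct MemOperand {
  std::string base;    // empty means "no base register"
  std::string index;   // empty means "no index register"
  int scale = 1;       // x86 allows 1, 2, 4, or 8
  unsigned disp = 0;   // non-negative displacement (assumption of this sketch)
};

// Render the operand as "[base + index*scale + disp]", skipping absent parts.
std::string format_operand(const MemOperand& m) {
  assert(m.scale == 1 || m.scale == 2 || m.scale == 4 || m.scale == 8);
  std::string out;
  auto add_term = [&out](const std::string& term) {
    if (!out.empty()) out += " + ";
    out += term;
  };
  if (!m.base.empty()) add_term(m.base);
  if (!m.index.empty()) {
    if (m.scale == 1) {
      add_term(m.index);
    } else {
      add_term(m.index + "*" + std::to_string(m.scale));
    }
  }
  if (m.disp != 0) add_term(std::to_string(m.disp));
  return "[" + out + "]";
}
```

With extended registers such as `r16` and `r17` as base and index, each of the four flavors maps to one combination of the fields, so a generator could cover them by varying which fields are set.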
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1807025312 PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1807025446 From duke at openjdk.org Fri Oct 18 21:23:03 2024 From: duke at openjdk.org (hanklo6) Date: Fri, 18 Oct 2024 21:23:03 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v4] In-Reply-To: <010De2p3_0FDAr6yBuz22bhrXSeY-LG13oCKzycjl3U=.69df1b15-5375-4450-8272-c12895ecfc21@github.com> References: <_tdMDW6ViTZsDgks0k2uWDHJUum3VL13PX0piUiQBtk=.4a06f43d-037f-4725-869d-287813bc8750@github.com> <3e0pIvFYNGmrXzKkP0MQcGATfEVfV0ZWWbSWXwV2H-0=.36137749-b27d-47cc-b31b-124f2e76227f@github.com> <010De2p3_0FDAr6yBuz22bhrXSeY-LG13oCKzycjl3U=.69df1b15-5375-4450-8272-c12895ecfc21@github.com> Message-ID: On Fri, 18 Oct 2024 02:44:42 GMT, Jatin Bhateja wrote: >> I think I can put the `cl` register in the GCC assembly to align it with the JDK assembler. This will allow us to remove these parts of checking. > >> I think I can put the `cl` register in the GCC assembly to align it with the JDK assembler. This will allow us to remove these parts of checking. > > Good, this should prevent skipping over their validation. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1807025632 From swen at openjdk.org Fri Oct 18 21:56:53 2024 From: swen at openjdk.org (Shaojin Wen) Date: Fri, 18 Oct 2024 21:56:53 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v20] In-Reply-To: References: Message-ID: > After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores. > > This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2. Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 26 additional commits since the last revision: - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 - fix build error - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 - revert test - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 - ... and 16 more: https://git.openjdk.org/jdk/compare/2370f48a...457735c9 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19626/files - new: https://git.openjdk.org/jdk/pull/19626/files/ae054771..457735c9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19626&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19626&range=18-19 Stats: 90483 lines in 1658 files changed: 73190 ins; 9277 del; 8016 mod Patch: https://git.openjdk.org/jdk/pull/19626.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19626/head:pull/19626 PR: https://git.openjdk.org/jdk/pull/19626 From duke at openjdk.org Fri Oct 18 22:55:55 2024 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 18 Oct 2024 22:55:55 GMT Subject: Integrated: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn In-Reply-To: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> Message-ID: On Fri, 
11 Oct 2024 16:51:16 GMT, Chad Rakoczy wrote: > [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) > > Crash occurs in C1 during OSR when copying locks from interpreter frame to compiled frame. All loads used immediate offset regardless of offset size causing crash when it is over the max size for the instruction (32760). Fix is to check the size before performing the load and storing the offset in a register if needed. > > I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction. The instruction is only updated when the `offset_ok_for_immed` check fails which would cause the crash anyway. > > Confirmed that added test fails before patch and passes after This pull request has now been integrated. Changeset: 401d0d6b Author: Chad Rakoczy Committer: Paul Hohensee URL: https://git.openjdk.org/jdk/commit/401d0d6b09ea422eacecda2900793a416097dc9b Stats: 105 lines in 2 files changed: 102 ins; 0 del; 3 mod 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn Reviewed-by: thartmann, eastigeevich ------------- PR: https://git.openjdk.org/jdk/pull/21473 From swen at openjdk.org Fri Oct 18 23:33:17 2024 From: swen at openjdk.org (Shaojin Wen) Date: Fri, 18 Oct 2024 23:33:17 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v20] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 21:56:53 GMT, Shaojin Wen wrote: >> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 26 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - fix build error > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - revert test > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - ... and 16 more: https://git.openjdk.org/jdk/compare/2cc7136e...457735c9 After PR 19970, the performance has been significantly improved. Below are the performance numbers for AMD CPU (x64) ## Script git remote add wenshao git@github.com:wenshao/jdk.git git fetch wenshao # pr 19626 git checkout 58dae7888eceb1c61243f658b67c208e6c30f7f2 make test TEST="micro:java.lang.StringBuilders.appendWithNull8Latin1" # pr 19626 + 19970 git checkout 457735c920aad822557e68e75ba1e76811c855a4 make test TEST="micro:java.lang.StringBuilders.appendWithNull8Latin1" ## performance numbers -Benchmark Mode Cnt Score Error Units (pr 19626) -StringBuilders.appendWithNull8Latin1 avgt 15 8.316 ± 0.512 ns/op +Benchmark Mode Cnt Score Error Units (pr 19626 + 19970) +StringBuilders.appendWithNull8Latin1 avgt 15 5.891 ±
0.043 ns/op ------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2423368506 From dlong at openjdk.org Fri Oct 18 23:36:40 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 18 Oct 2024 23:36:40 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory In-Reply-To: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> Message-ID: On Fri, 18 Oct 2024 18:35:02 GMT, Chad Rakoczy wrote: > [JDK-8342601](https://bugs.openjdk.org/browse/JDK-8342601) > > Fix minor inefficiency in `copy_memory` by adding check before doing bit shift to see if we are able to do a move instruction instead. Change is low risk because of the low complexity of the change > > Ran array copy and tier 1 on aarch64 machine > > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/arraycopy 49 49 0 0 > ============================== > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier1 2591 2591 0 0 > jtreg:test/jdk:tier1 2436 2436 0 0 > jtreg:test/langtools:tier1 4577 4577 0 0 > jtreg:test/jaxp:tier1 0 0 0 0 > jtreg:test/lib-test:tier1 34 34 0 0 > ============================== Please quantify the performance improvement, if any. I would expect this to make 0 difference, but if it does, then perhaps we should consider checking for a shift of 0 inside `lsr`? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21589#issuecomment-2423370781 From duke at openjdk.org Fri Oct 18 23:42:04 2024 From: duke at openjdk.org (hanklo6) Date: Fri, 18 Oct 2024 23:42:04 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v7] In-Reply-To: References: Message-ID: > Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
> > ### Generate test instructions > With `binutils = 2.43` > * `python3 x86-asmtest.py > asmtest.out.h` > ### Run test > * `make test TEST="gtest:AssemblerX86"` hanklo6 has updated the pull request incrementally with two additional commits since the last revision: - Refactor - Add missing instructions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20857/files - new: https://git.openjdk.org/jdk/pull/20857/files/bfd44632..b0f60df7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20857&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20857&range=05-06 Stats: 116954 lines in 2 files changed: 60528 ins; 34836 del; 21590 mod Patch: https://git.openjdk.org/jdk/pull/20857.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20857/head:pull/20857 PR: https://git.openjdk.org/jdk/pull/20857 From duke at openjdk.org Fri Oct 18 23:49:37 2024 From: duke at openjdk.org (hanklo6) Date: Fri, 18 Oct 2024 23:49:37 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v4] In-Reply-To: References: <_tdMDW6ViTZsDgks0k2uWDHJUum3VL13PX0piUiQBtk=.4a06f43d-037f-4725-869d-287813bc8750@github.com> Message-ID: On Thu, 17 Oct 2024 12:36:27 GMT, Jatin Bhateja wrote: >> hanklo6 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: >> >> - Merge branch 'master' of https://git.openjdk.java.net/jdk into apx-test-tool >> - Add comment and defined >> - Add copyright header >> - Remove tab >> - Remove whitespace >> - Replace whitespace with tab >> - Add flag before testing >> - Fix assertion error on MacOS >> - Add _LP64 flag >> - Add missing header >> - ... 
and 6 more: https://git.openjdk.org/jdk/compare/0ac393c7...ca48f240 > > test/hotspot/gtest/x86/test_assemblerx86.cpp line 99: > >> 97: asm_check((const uint8_t *)entry, (const uint8_t *)insns, insns_lens, insns_strs, sizeof(insns_lens) / sizeof(insns_lens[0])); >> 98: BufferBlob::free(b); >> 99: } > > Following MAP0 and MAP1 instructions are missing :- > > bsfl bsfq bsrl bsrq bswapl bswapq > btq > call > cmpb cmpl cmpq cmpw > cmpxchgb cmpxchgl cmpxchgq cmpxchgw > cvttsd2siq > incl incq > lea leal leaq > mov mov64 movb movl movq > movsbl movsbq movslq movswl movswq > movw > movzbl movzbq movzwl movzwq > orw > sall salq > testb testl testq > xaddb xaddl xaddq xaddw > xchgb xchgl xchgq xchgw > > > But, given that all assembly routines share same leaf level prefix emitting routines, we can skip them for the time being or add validate just one from each row > > Please do add following new MAP4 APX instructions since you are already taking care of their two operand counterparts with PPX. > > 1. popp > 2. pushp Thanks, I added the missing instructions. For the `incl` and `incq`, I tested them through the `incrementl` and `incrementq` functions, which calls them. The `orw` was accidentally re-added, I will remove it in another PR. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1807109544 From jrose at openjdk.org Sat Oct 19 00:03:36 2024 From: jrose at openjdk.org (John R Rose) Date: Sat, 19 Oct 2024 00:03:36 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory In-Reply-To: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> Message-ID: On Fri, 18 Oct 2024 18:35:02 GMT, Chad Rakoczy wrote: > [JDK-8342601](https://bugs.openjdk.org/browse/JDK-8342601) > > Fix minor inefficiency in `copy_memory` by adding check before doing bit shift to see if we are able to do a move instruction instead. Change is low risk because of the low complexity of the change > > Ran array copy and tier 1 on aarch64 machine > > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/arraycopy 49 49 0 0 > ============================== > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier1 2591 2591 0 0 > jtreg:test/jdk:tier1 2436 2436 0 0 > jtreg:test/langtools:tier1 4577 4577 0 0 > jtreg:test/jaxp:tier1 0 0 0 0 > jtreg:test/lib-test:tier1 34 34 0 0 > ============================== A general principle for our work here: Keep it simple. Simplicity is maintainability. Only make it more complex if you can demonstrate a practical win. Otherwise, keep the hands away from the keyboard. It probably never ever makes a difference, to replace shift-by-0 with move, likewise an add-with-0 or an xor-with-0. On most RISC machines for the last 4 decades, mov is just a macro for such an ALU operation, and all of them take the same number of cycles and go through the same circuitry (with different mode lines enabled). 
In other words, the cost model justifying this supposed improvement is probably about a half century out of date. Maybe an expert on AARCH64 can correct me on this point? So I'm against this change as unnecessary, unless there is a performance test that shows a significant benefit. Changes like this only make the sources harder to read and maintain. If we think multiple people will have an overwhelming urge to tweak the code, even after the tweak is proven unnecessary, then add a comment to future maintainers explaining why there is no need here for a tweak. A semi-useless comment is easier to maintain than a useless if/then/else. Dean's suggestion is somewhat reasonable, to have the assembler desugar shift-by-zero to mov. The only effect of that would be to make the disassembly slightly easier to read, for less experienced readers of disassembly. We did some things like that with the SPARC port, but did so under new assembler API points (what used to be called pseudo-instructions). For example, we had a mov-or-nop instruction, which nopped itself if the source and destination of the move were the same register. There is huge benefit to having the assembler API do exactly what it says, and not second-guess the user (swapping other instructions behind the user's back). Using that process, you'd introduce a shift-or-mov instruction, explicitly. Which is obviously not worth it. So, again, I'd say no change is needed here, not even Dean's. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21589#issuecomment-2423387296 From dlong at openjdk.org Sat Oct 19 00:30:56 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 19 Oct 2024 00:30:56 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory In-Reply-To: References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> Message-ID: On Fri, 18 Oct 2024 23:59:28 GMT, John R Rose wrote: > In other words, the cost model justifying this supposed improvement is probably about a half century out of date. Maybe an expert on AARCH64 can correct me on this point? Apparently some moves are "0 latency" and skip the pipeline on some aarch64 hardware. Yes, we usually expect Assembler APIs to do exactly what we ask, but when the API is in MacroAssembler and doesn't correspond to an actual aarch64 instruction or alias, I think we normally allow optimizations. As LSR is an alias, I think we would expect it to generate the underlying `ubfm` encoding, so if we were going to optimize based on the shift value, we could introduce a new API with a name like shift_right(). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21589#issuecomment-2423403430 From jrose at openjdk.org Sat Oct 19 01:01:10 2024 From: jrose at openjdk.org (John R Rose) Date: Sat, 19 Oct 2024 01:01:10 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory In-Reply-To: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> Message-ID: On Fri, 18 Oct 2024 18:35:02 GMT, Chad Rakoczy wrote: > [JDK-8342601](https://bugs.openjdk.org/browse/JDK-8342601) > > Fix minor inefficiency in `copy_memory` by adding check before doing bit shift to see if we are able to do a move instruction instead. 
Change is low risk because of the low complexity of the change
>
> Ran array copy and tier 1 on aarch64 machine
>
> Test summary
> ==============================
> TEST                                          TOTAL  PASS  FAIL  ERROR
> jtreg:test/hotspot/jtreg/compiler/arraycopy      49    49     0      0
> ==============================
>
> ==============================
> Test summary
> ==============================
> TEST                             TOTAL  PASS  FAIL  ERROR
> jtreg:test/hotspot/jtreg:tier1    2591  2591     0      0
> jtreg:test/jdk:tier1              2436  2436     0      0
> jtreg:test/langtools:tier1        4577  4577     0      0
> jtreg:test/jaxp:tier1                0     0     0      0
> jtreg:test/lib-test:tier1           34    34     0      0
> ==============================

Thanks Dean. If there is a specific forwarding mechanism for some moves but not all move-like instructions, then a micro-optimization like this is worth considering. (We'd still want evidence from perf tests.) I think it would belong inside the macro-assembler, though, so we don't play whack-a-mole finding all the places where we could reduce a quasi-move to a real forwardable move. Also the comments documenting this decision belong in the macro-assembler, and not just in PR conversations or comments on random use-sites of the macro-assembler.

Again, for SPARC, we made the distinction between assembler and macro-assembler. The assembler never did anything "smart", that was left to the macro-assembler. Occasionally the same name would mean different things at the two levels of assembler. That may or may not be good C++ practice.
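
To make the assembler / macro-assembler split above concrete, here is a minimal toy sketch in Java. It is not HotSpot code: `ToyAssembler`, `ToyMacroAssembler` and `shiftRightOrMov` are hypothetical names invented for illustration, and instructions are modelled as plain strings. The low-level layer emits exactly what it is asked to emit; only the explicitly named macro-level entry point is allowed to turn a shift by zero into a move.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical classes, not HotSpot's Assembler/MacroAssembler.
class ToyAssembler {
    final List<String> code = new ArrayList<>();

    // Low level: emits exactly the instruction that was requested, even for shift == 0.
    void lsr(String dst, String src, int shift) {
        code.add("lsr " + dst + ", " + src + ", #" + shift);
    }

    void mov(String dst, String src) {
        code.add("mov " + dst + ", " + src);
    }
}

class ToyMacroAssembler extends ToyAssembler {
    // Macro level: a separately named entry point (hypothetical) whose contract
    // says that a zero shift is emitted as a plain register move.
    void shiftRightOrMov(String dst, String src, int shift) {
        if (shift == 0) {
            mov(dst, src);
        } else {
            lsr(dst, src, shift);
        }
    }
}
```

With this shape, a call site that wants the optimization says so by calling the macro-level method, while a direct `lsr` call still produces a shift instruction, so the decision and its documentation live in one place.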
------------- PR Comment: https://git.openjdk.org/jdk/pull/21589#issuecomment-2423416746 From aph at openjdk.org Sat Oct 19 13:21:32 2024 From: aph at openjdk.org (Andrew Haley) Date: Sat, 19 Oct 2024 13:21:32 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory In-Reply-To: References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> Message-ID: <2kQx2r_Z7ibQj6Kp3W_R5tL_Ljmj6s2q7gkAwxtr3mE=.69d56ad7-efd3-4e20-912a-280f43f78836@github.com> On Sat, 19 Oct 2024 00:57:42 GMT, John R Rose wrote: > Thanks Dean. If there is a specific forwarding mechanism for some moves but not all move-like instructions, then a micro-optimization like this is worth considering. (We'd still want evidence from perf tests.) Here's how it works, on recent high-end Arm and Apple silicon. Full-width mov instructions do not issue at all: instead, they are handled by the renamer at decode time. In effect they have no latency at execution time, although they do occupy slots in the decoder. Partial width (e.g. 32-bit) mov instructions do issue because they do some work: they clear the top half of the destination register, and they need an ALU to do that. There is some theoretical advantage to turning a full-width shift of 0 into a full-width mov. For example, Apple M1 can decode 8 instructions and can execute 6 integer ops per clock. Shift instructions have a latency of one clock. But given that these CPUs have very wide issue as well as many integer ALUs, it may be impossible to gain any performance advantage in real-world code. It is possible, with a carefully-written assembly-code benchmark, to measure some performance advantage, but it is unlikely to gain much in practice. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21589#issuecomment-2423845405 From jbhateja at openjdk.org Sat Oct 19 17:10:31 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 19 Oct 2024 17:10:31 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v7] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 23:42:04 GMT, hanklo6 wrote: >> Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. >> >> ### Generate test instructions >> With `binutils = 2.43` >> * `python3 x86-asmtest.py > asmtest.out.h` >> ### Run test >> * `make test TEST="gtest:AssemblerX86"` > > hanklo6 has updated the pull request incrementally with two additional commits since the last revision: > > - Refactor > - Add missing instructions Thanks Hank!!, I think we have pretty much covered all MAP0 and MAP1 instructions supporting APX REX2 encoding along with MAP4 based PUSP/POP with PPX. LGTM ------------- Marked as reviewed by jbhateja (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20857#pullrequestreview-2379647882 From duke at openjdk.org Sat Oct 19 18:51:03 2024 From: duke at openjdk.org (duke) Date: Sat, 19 Oct 2024 18:51:03 GMT Subject: RFR: 8342295: compiler/jvmci/TestJVMCISavedProperties.java fails due to garbage in output [v2] In-Reply-To: <3HZPah5ZpaIbiYIwkxanAJc5MJdHuBbTwZkRdoJ6zZg=.dfacac9e-8616-4e2b-a438-9bc872777598@github.com> References: <3HZPah5ZpaIbiYIwkxanAJc5MJdHuBbTwZkRdoJ6zZg=.dfacac9e-8616-4e2b-a438-9bc872777598@github.com> Message-ID: On Fri, 18 Oct 2024 17:03:28 GMT, Tomáš Zezula wrote: >> The `compiler/jvmci/TestJVMCISavedProperties` test fails due to overlapping output from the saved system properties. The initialization of `savedProperties` in `jdk.vm.ci.services.Services` is correctly synchronized, the issue suggests that two separate libjvmci compiler isolates are each printing their own set of saved properties. >> >> In a successful test run, the `CompileBroker` thread aborts the VM before it completes initialization, displaying the error message `Cannot use JVMCI compiler: Value of jvmci.Compiler is 'null'` (due to the `-Djvmci.Compiler=null` setting), and the message `DONE IN MAIN` is never printed. However, in the failed test output, the `DONE IN MAIN` message appears, indicating that the VM initialization completed and created the `JVMCIRuntime` instance. The `CompileBroker` thread might have concurrently initialized `JVMCIRuntime` in another isolate. Since each `JVMCIRuntime` initialization outputs system properties, this is likely the cause of the overlapping output. >> >> The proposed solution is to use the `-XX:+EnableJVMCI` flag instead of `-XX:+UseJVMCICompiler`, to avoid this issue. > > Tomáš Zezula has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: > > 8342295: compiler/jvmci/TestJVMCISavedProperties.java fails due to garbage in output @tzezula Your change (at version 7849ba677388c1f2c10507d98affcfdc3cd54229) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21583#issuecomment-2424138147 From redestad at openjdk.org Sat Oct 19 23:24:40 2024 From: redestad at openjdk.org (Claes Redestad) Date: Sat, 19 Oct 2024 23:24:40 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v20] In-Reply-To: References: Message-ID: <5gmz0_Piseb_o1SzBeDw0nYJVYlpitOLTcZJTC_xr5I=.e85f6998-68de-4f36-95da-d5c12566d8fa@github.com> On Fri, 18 Oct 2024 23:29:58 GMT, Shaojin Wen wrote: > After PR 19970, the performance has been significantly improved. Below are the performance numbers for AMD CPU (x64) It'd be interesting to check performance on this micro with #19970 alone ------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2424281717 From swen at openjdk.org Sun Oct 20 07:54:43 2024 From: swen at openjdk.org (Shaojin Wen) Date: Sun, 20 Oct 2024 07:54:43 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v20] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 21:56:53 GMT, Shaojin Wen wrote: >> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 26 additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406
> - fix build error
> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
> - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406
> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
> - revert test
> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
> - ... and 16 more: https://git.openjdk.org/jdk/compare/9d5a1c38...457735c9

I ran performance tests on MacBook M1 (aarch64) and aliyun c8a (AMD CPU x64). There was no significant performance difference between pr #19970 and master, but pr #19626 combined with #19970 significantly improved performance.

## Script

git remote add wenshao git@github.com:wenshao/jdk.git
git fetch wenshao

# master
git checkout 85582d7a88bd5f79f5991ce22bc3bc75e514aec9
make test TEST="micro:java.lang.StringBuilders.appendWithNull8Latin1"

# pr 19970
git checkout 3b89956957085e134a05c05876f40b85d949227e
make test TEST="micro:java.lang.StringBuilders.appendWithNull8Latin1"

# pr 19626 + 19970
git checkout 58dae7888eceb1c61243f658b67c208e6c30f7f2
make test TEST="micro:java.lang.StringBuilders.appendWithNull8Latin1"

## MacBook M1 Max Performance Numbers

# master
Benchmark                             Mode  Cnt   Score   Error  Units
StringBuilders.appendWithNull8Latin1  avgt   15   6.950 ± 0.027  ns/op

# pr 19970
Benchmark                             Mode  Cnt   Score   Error  Units
StringBuilders.appendWithNull8Latin1  avgt   15   6.945 ± 0.008  ns/op

# pr 19626 + 19970
Benchmark                             Mode  Cnt   Score   Error  Units
StringBuilders.appendWithNull8Latin1  avgt   15   6.441 ± 0.059  ns/op

## AMD x64 Performance Numbers

# master
Benchmark                             Mode  Cnt   Score   Error  Units
StringBuilders.appendWithNull8Latin1  avgt   15  17.522 ± 8.113  ns/op

# pr 19970
Benchmark                             Mode  Cnt   Score   Error  Units
StringBuilders.appendWithNull8Latin1  avgt   15  17.487 ± 8.127  ns/op

# pr 19626 + 19970
Benchmark                             Mode  Cnt   Score   Error  Units
StringBuilders.appendWithNull8Latin1  avgt   15   5.983 ± 0.113  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2424709734 From qamai at openjdk.org Sun Oct 20 11:44:19 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 20 Oct 2024 11:44:19 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte Message-ID:

Hi,

This small patch refactors array constants in C2 to use an array of `jbyte`s instead of an array of `jvalue`. The former is much easier to work with and we can do `memcpy` with them trivially.

Since code buffers support alignment of the constant section, I have also allowed constant tables to be aligned more than 8 bytes and used it for constant vectors on machines not supporting `SSE3`. I also fixed an issue with code buffer relocation where the temporary buffer is not correctly aligned.

This patch is extracted from https://github.com/openjdk/jdk/pull/21229. Tests passed with `UseSSE=2` where 16-byte constants would be generated, as well as normal testing routines.

Please take a look and leave your reviews, thanks a lot.
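
As a rough illustration of why a flat byte array is convenient for array constants, the following Java sketch models a constant table as raw bytes: each constant is copied in with a single bulk copy (the analogue of `memcpy`) at an offset rounded up to the requested alignment. This is only a toy under stated assumptions; `ByteConstantTableSketch`, `alignUp`, `add` and `ofInts` are hypothetical names and do not mirror the C2 constant table code in the patch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Toy model only: hypothetical names, not C2's constant table implementation.
final class ByteConstantTableSketch {
    private byte[] table = new byte[0];

    // Rounds offset up to the next multiple of alignment (alignment must be a power of two).
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & -alignment;
    }

    // Appends the constant's raw bytes at an aligned offset and returns that offset.
    int add(byte[] constant, int alignment) {
        int offset = alignUp(table.length, alignment);
        byte[] grown = Arrays.copyOf(table, offset + constant.length); // padding is zero-filled
        System.arraycopy(constant, 0, grown, offset, constant.length); // bulk copy, like memcpy
        table = grown;
        return offset;
    }

    // Flattens int elements into little-endian bytes; other element types would flatten the same way.
    static byte[] ofInts(int... values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    int size() {
        return table.length;
    }

    byte byteAt(int index) {
        return table[index];
    }
}
```

For example, adding a 3-byte constant with alignment 1 and then a four-int vector constant with alignment 16 places the vector at offset 16, which is the kind of larger-than-8-byte alignment the message describes for 16-byte constants.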
------------- Commit messages: - refactor array constant, fix codebuffer reallocation Changes: https://git.openjdk.org/jdk/pull/21596/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21596&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342651 Stats: 169 lines in 8 files changed: 74 ins; 44 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/21596.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21596/head:pull/21596 PR: https://git.openjdk.org/jdk/pull/21596 From dhanalla at openjdk.org Sun Oct 20 14:48:48 2024 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Sun, 20 Oct 2024 14:48:48 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v3] In-Reply-To: References: Message-ID: > In the debug build, the assert is triggered during the parsing (before Code_Gen). In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. > > When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both debug and Release builds exhibited the same behavior: the compilation bails out, and execution completes without any issues. > > The assert statement is not essential, as it is causing unnecessary failures in the debug build. 
Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: CR feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20504/files - new: https://git.openjdk.org/jdk/pull/20504/files/7d3367f6..8a414baf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=01-02 Stats: 62 lines in 4 files changed: 59 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20504/head:pull/20504 PR: https://git.openjdk.org/jdk/pull/20504 From qamai at openjdk.org Sun Oct 20 16:40:32 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 20 Oct 2024 16:40:32 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v2] In-Reply-To: References: Message-ID: > Hi, > > This small patch refactors array constants in C2 to use an array of `jbyte`s instead of an array of `jvalue`. The former is much easier to work with and we can do `memcpy` with them trivially. > > Since code buffers support alignment of the constant section, I have also allowed constant tables to be aligned more than 8 bytes and used it for constant vectors on machines not supporting `SSE3`. I also fixed an issue with code buffer relocation where the temporary buffer is not correctly aligned. > > This patch is extracted from https://github.com/openjdk/jdk/pull/21229. Tests passed with `UseSSE=2` where 16-byte constants would be generated, as well as normal testing routines. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' into constanttable - refactor array constant, fix codebuffer reallocation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21596/files - new: https://git.openjdk.org/jdk/pull/21596/files/bcd0457c..2efa68db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21596&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21596&range=00-01 Stats: 22112 lines in 670 files changed: 17395 ins; 2460 del; 2257 mod Patch: https://git.openjdk.org/jdk/pull/21596.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21596/head:pull/21596 PR: https://git.openjdk.org/jdk/pull/21596 From qamai at openjdk.org Sun Oct 20 16:41:19 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 20 Oct 2024 16:41:19 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v25] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: - Merge branch 'master' into unsignedbounds - address reviews - comment adjust_lo empty case - formality - address reviews - add comments, refactor functions to helper class - refine comments - remove leftover code - add doc to TypeInt, rename parameters, remove unused methods - change (~v & ones) == 0 to (v & ones) == ones - ... and 22 more: https://git.openjdk.org/jdk/compare/309b9291...7f3316fa ------------- Changes: https://git.openjdk.org/jdk/pull/17508/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=24 Stats: 1945 lines in 10 files changed: 1385 ins; 325 del; 235 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From redestad at openjdk.org Sun Oct 20 20:22:37 2024 From: redestad at openjdk.org (Claes Redestad) Date: Sun, 20 Oct 2024 20:22:37 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v20] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 21:56:53 GMT, Shaojin Wen wrote: >> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 26 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - fix build error > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - revert test > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - ... and 16 more: https://git.openjdk.org/jdk/compare/ea332e42...457735c9

Hmm, I would have hoped that for `appendNull` the pre-existing code would have allowed for merging stores with #19970. Can you run with `+TraceMergeStores` on the #19970 branch? Perhaps we'd need to minimally change from `count++` increments to constant offsets:

    val[count] = 'n';
    val[count + 1] = 'u';
    val[count + 2] = 'l';
    val[count + 3] = 'l';

(I think it would be good to explore if we can trigger the optimization without resorting to `Unsafe`. Any ideas, @eme64?)

------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2425202938 From swen at openjdk.org Sun Oct 20 22:50:23 2024 From: swen at openjdk.org (Shaojin Wen) Date: Sun, 20 Oct 2024 22:50:23 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v20] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 21:56:53 GMT, Shaojin Wen wrote: >> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores.
>> >> This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 26 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - fix build error > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - revert test > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - ... and 16 more: https://git.openjdk.org/jdk/compare/d6725bc9...457735c9 Simplifying the implementation of appendNull can improve performance, but it is still not as good as using Unsafe.putByte. 
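
For readers following along, a stand-alone Java sketch of a "simplified" `appendNull` shape that writes at constant offsets from `count` could look like the code below. It is an assumption about the general form only, not the code at the commit measured in this thread and not the JDK's `AbstractStringBuilder`; the class name `AppendNullSketch` is hypothetical, and whether C2 actually merges these four stores into one wider store depends on the JIT and is not shown here.

```java
// Stand-alone toy, not the JDK's AbstractStringBuilder.
final class AppendNullSketch {
    // Writes "null" as four Latin-1 bytes starting at index count and returns the new count.
    // The caller is assumed to have ensured val.length >= count + 4 beforehand.
    static int appendNull(byte[] val, int count) {
        val[count]     = 'n';
        val[count + 1] = 'u';
        val[count + 2] = 'l';
        val[count + 3] = 'l';
        return count + 4;
    }
}
```

Unlike an `Unsafe`-based variant, each of these array stores keeps its normal bounds check unless the compiler can prove it redundant.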
git remote add wenshao git@github.com:wenshao/jdk.git
git fetch wenshao

# master
git checkout 85582d7a88bd5f79f5991ce22bc3bc75e514aec9
make test TEST="micro:java.lang.StringBuilders.appendWithNull8Latin1"

# pr 19970
git checkout 3b89956957085e134a05c05876f40b85d949227e
make test TEST="micro:java.lang.StringBuilders.appendWithNull8Latin1"

# pr 19970 + simplify append null
git checkout a43be33a6cc67ac72058d1819ee3008fb6f76211
make test TEST="micro:java.lang.StringBuilders.appendWithNull8Latin1"

# pr 19626 + 19970
git checkout 58dae7888eceb1c61243f658b67c208e6c30f7f2
make test TEST="micro:java.lang.StringBuilders.appendWithNull8Latin1"

## MacBook M1 Max Performance Numbers

# master
Benchmark                             Mode  Cnt  Score   Error  Units
StringBuilders.appendWithNull8Latin1  avgt   15  6.950 ± 0.027  ns/op

# pr 19970
Benchmark                             Mode  Cnt  Score   Error  Units
StringBuilders.appendWithNull8Latin1  avgt   15  6.945 ± 0.008  ns/op

# pr 19970 + simplify append null
Benchmark                             Mode  Cnt  Score   Error  Units
StringBuilders.appendWithNull8Latin1  avgt   15  6.766 ± 0.012  ns/op

# pr 19626 + 19970
Benchmark                             Mode  Cnt  Score   Error  Units
StringBuilders.appendWithNull8Latin1  avgt   15  6.441 ± 0.059  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2425269311 From swen at openjdk.org Mon Oct 21 01:33:50 2024 From: swen at openjdk.org (Shaojin Wen) Date: Mon, 21 Oct 2024 01:33:50 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v20] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 21:56:53 GMT, Shaojin Wen wrote: >> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 26 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - fix build error > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - revert test > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - ... and 16 more: https://git.openjdk.org/jdk/compare/a86af3e4...457735c9 There is no array out-of-bounds check when using Unsafe. In the appendNull method, there is already a call to ensureCapacityInternal. It is also safe to use Unsafe in this scenario. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2425373747 From fyang at openjdk.org Mon Oct 21 01:49:47 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 21 Oct 2024 01:49:47 GMT Subject: RFR: 8342579: RISC-V: C2: Cleanup effect of killing flag register for call instructs In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 02:24:16 GMT, Fei Yang wrote: > Previously found and discussed at: https://github.com/openjdk/jdk/pull/21406#discussion_r1803232561 > For C2 call nodes, it's not necessary to add effect listing flag register as being killed. > This cleans them up and thus aligns with other CPU platforms. > > Testing on linux-riscv64: > - [x] Tier1 (release build) Thanks all for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21576#issuecomment-2425389391 From fyang at openjdk.org Mon Oct 21 01:49:48 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 21 Oct 2024 01:49:48 GMT Subject: Integrated: 8342579: RISC-V: C2: Cleanup effect of killing flag register for call instructs In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 02:24:16 GMT, Fei Yang wrote: > Previously found and discussed at: https://github.com/openjdk/jdk/pull/21406#discussion_r1803232561 > For C2 call nodes, it's not necessary to add effect listing flag register as being killed. > This cleans them up and thus aligns with other CPU platforms. > > Testing on linux-riscv64: > - [x] Tier1 (release build) This pull request has now been integrated. Changeset: 76ae072a Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/76ae072a1fec5f2af4ac4c633bc67a0c4c756a90 Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod 8342579: RISC-V: C2: Cleanup effect of killing flag register for call instructs Reviewed-by: rehn, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/21576 From dhanalla at openjdk.org Mon Oct 21 02:39:00 2024 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Mon, 21 Oct 2024 02:39:00 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v4] In-Reply-To: References: Message-ID: > In the debug build, the assert is triggered during the parsing (before Code_Gen). In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. > > When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both debug and Release builds exhibited the same behavior: the compilation bails out, and execution completes without any issues. > > The assert statement is not essential, as it is causing unnecessary failures in the debug build. 
Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: change CRLF to LF ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20504/files - new: https://git.openjdk.org/jdk/pull/20504/files/8a414baf..8f9cd174 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=02-03 Stats: 51 lines in 1 file changed: 0 ins; 1 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/20504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20504/head:pull/20504 PR: https://git.openjdk.org/jdk/pull/20504 From jkarthikeyan at openjdk.org Mon Oct 21 04:16:12 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 21 Oct 2024 04:16:12 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering Message-ID: Hi all, This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! 
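
The pass described above (lower a node into a backend-specific form, then value-number it so that duplicates collapse, without running `Ideal()` or `Identity()` again) can be pictured with the following toy Java sketch. Everything in it is hypothetical: `LoweringSketch`, its `Node` class, and the `Add(Mul(a, b), c)` to `FusedMulAdd(a, b, c)` rule are stand-ins chosen for illustration and are not taken from the patch; only the overall shape (lower, then look up or insert in a hash table) follows the description.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical names, not C2's Node or PhaseLowering classes.
final class LoweringSketch {
    static final class Node {
        private static int nextId = 0;
        final int id = nextId++;   // unique identity used when hashing a node's inputs
        final String op;           // leaves carry a distinguishing name, e.g. "Param:a"
        final Node[] in;

        Node(String op, Node... in) {
            this.op = op;
            this.in = in;
        }

        // Two nodes with the same opcode and the same inputs are considered duplicates.
        String key() {
            StringBuilder sb = new StringBuilder(op);
            for (Node n : in) {
                sb.append(',').append(n.id);
            }
            return sb.toString();
        }
    }

    private final Map<String, Node> valueTable = new HashMap<>();

    // Returns an already recorded equivalent node if one exists, otherwise records n.
    Node hashFindInsert(Node n) {
        Node existing = valueTable.putIfAbsent(n.key(), n);
        return existing != null ? existing : n;
    }

    // Hypothetical backend rule: fuse Add(Mul(a, b), c) into one FusedMulAdd(a, b, c) node.
    Node lowerNode(Node n) {
        if (n.op.equals("Add") && n.in.length == 2 && n.in[0].op.equals("Mul")) {
            Node mul = n.in[0];
            return new Node("FusedMulAdd", mul.in[0], mul.in[1], n.in[1]);
        }
        return n;
    }

    // One step of the pass: lower, then de-duplicate. No further idealization is attempted,
    // so the lowered form cannot be rewritten back into the original pattern.
    Node transform(Node n) {
        return hashFindInsert(lowerNode(n));
    }
}
```

Lowering the same pattern twice yields the same node object the second time, which is the de-duplication effect the GVN step is there to provide.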
------------- Commit messages: - Implement PhaseLowering Changes: https://git.openjdk.org/jdk/pull/21599/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21599&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342662 Stats: 292 lines in 21 files changed: 279 ins; 12 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21599.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21599/head:pull/21599 PR: https://git.openjdk.org/jdk/pull/21599 From amitkumar at openjdk.org Mon Oct 21 04:27:51 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 21 Oct 2024 04:27:51 GMT Subject: RFR: 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v2] In-Reply-To: References: Message-ID: <2PMRVsGJHC6sQsMumqKgg9eEoX30OnUYA8BlvTWYs2U=.9cbef728-014b-4eba-ba40-641680741059@github.com> On Fri, 18 Oct 2024 13:25:46 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> changes leftmost->upper and rightmost -> lower > > src/hotspot/cpu/s390/s390.ad line 6263: > >> 6261: effect(TEMP r4_reven_tmp, KILL cr); >> 6262: // TODO: size(4); >> 6263: format %{ "UDIV $r5_rodd_dst, $r5_rodd_dst,$src2" %} > > Suggestion: no whitespace between instruction operands. Hi @RealLucy, was that for consistency or there is issue with space in the format section? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21559#discussion_r1808085902 From qamai at openjdk.org Mon Oct 21 06:16:07 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 21 Oct 2024 06:16:07 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 04:11:03 GMT, Jasmine Karthikeyan wrote: > Hi all, > This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. 
Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. > > In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated!

src/hotspot/cpu/aarch64/c2_lowering_aarch64.cpp line 29:

> 27: #include "opto/phaseX.hpp"
> 28:
> 29: Node* PhaseLowering::lower_node(Node* in) {

You need to wrap all these in `#ifdef COMPILER2`

src/hotspot/share/opto/compile.cpp line 2464:

> 2462: {
> 2463: TracePhase tp("lower", &timers[_t_lower]);
> 2464: print_method(PHASE_BEFORE_LOWERING, 3);

Isn't `BEFORE_LOWERING` the same as `AFTER_BARRIER_EXPANSION` right above?

src/hotspot/share/opto/phaseX.cpp line 2277:

> 2275:
> 2276: // Try to find an existing version of the same node
> 2277: Node* existing = _igvn->hash_find_insert(n);

I think it would be easier if you have a switch in `gvn` that says you passed the point of doing `Ideal`, moving forward you will probably want to have a `IdealLowering` to transform nodes during this phase. `Identity` I think is fine since it returns an existing node.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1808154845 PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1808157225 PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1808156752 From chagedorn at openjdk.org Mon Oct 21 06:24:01 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 21 Oct 2024 06:24:01 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v24] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Fri, 18 Oct 2024 21:04:38 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > update comment pseudo code, improve readability with explicit skip src/hotspot/share/opto/loopnode.cpp line 3973: > 3971: // conversions are required: > 3972: // > 3973: // long iv2 = ((long) phi * stride_con2 / stride_con) + (init2 - ((long) init * stride_con2 / stride_con)) Thanks for updating the example. I guess we should use `iv` consistently - missed that to do in my example before: Suggestion: // int a = init2; // for (int iv = init; iv < limit; iv += stride_con) { // a += stride_con2; // } // // and transforms it to: // // int iv2 = init2 // int iv = init // loop: // if ( iv >= limit ) goto exit // iv += stride_con // iv2 = init2 + (iv - init) * (stride_con2 / stride_con) // goto loop // exit: // ... // // Such transformation introduces more optimization opportunities. 
In this // particular example, the loop can be eliminated entirely given that // `stride_con2 / stride_con` is exact (i.e., no remainder). Checks are in // place to only perform this optimization if such a division is exact. This // example will be transformed into its semantic equivalence: // // int iv2 = (iv * stride_con2 / stride_con) + (init2 - (init * stride_con2 / stride_con)) // // which corresponds to the structure of transformed subgraph. // // However, if there is a mismatch between types of the loop and the parallel // induction variable (e.g., a long-typed IV in an int-typed loop), type // conversions are required: // // long iv2 = ((long) iv * stride_con2 / stride_con) + (init2 - ((long) init * stride_con2 / stride_con)) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1808152983 From chagedorn at openjdk.org Mon Oct 21 06:31:34 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 21 Oct 2024 06:31:34 GMT Subject: Integrated: 8342287: C2 fails with "assert(is_IfTrue()) failed: invalid node class: IfFalse" due to Template Assertion Predicate with two UCTs In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 11:52:02 GMT, Christian Hagedorn wrote: > ### Assertion Predicates Have the True Projection on the Success Path > By design, Template and Initialized Assertion Predicates have the true projection on the success path and false projection on the failing path. > > ### Is a Node a Template Assertion Predicate? > Template Assertion Predicates can have an uncommon trap on or a `Halt` node following the failing/false projection. When trying to find out if a node is a Template Assertion Predicate, we call `TemplateAssertionPredicate::is_predicate()` which checks if the provided node is the success projection of a Template Assertion Predicate. 
This involves checking if the other projection has a `Halt` node or an UCT (L141): > https://github.com/openjdk/jdk/blob/7a64fbbb9292f4d65a6970206dec1a7d7645046b/src/hotspot/share/opto/predicates.cpp#L135-L144 > > ### New `PredicateIterator` Class > > [JDK-8340786](https://bugs.openjdk.org/browse/JDK-8340786) introduced a new `PredicateIterator` class to simplify iteration over predicates and replaced code that was doing some custom iteration. > > #### Usual Usage > Normally, we always start from a loop entry and follow the predicates (i.e. reaching a predicate over its success path). > > #### Special Usage > However, I also replaced the predicate skipping in `PhaseIdealLoop::build_loop_late_post_work()` with the new `PredicateIterator` class which could start at any node, including a failing path false projection. > > ### Problem: Two Uncommon Traps for a Template Assertion Predicate > The fuzzer now found a case where a Template Assertion Predicate has a predicate UCT on the failing **and** the success path due to folding away some dead nodes: > > ![image](https://github.com/user-attachments/assets/c2a9395e-9eaf-46a0-b978-8847d8e21945) > > In the Special Usage mentioned above, we could be using the `PredicateIterator` with `505 IfFalse` as starting node. In the core method `RegularPredicateBlockIterator::for_each()` for the traversal, we call the following with `current` = `505 IfFalse`: > https://github.com/openjdk/jdk/blob/1ea1f33f66326804ca2892fe0659a9acb7ee72ae/src/hotspot/share/opto/predicates.hpp#L628-L629 > `TemplateAssertionPredicate::is_predicate()` succeeds because we have a valid Template Assertion Predicate If node and the other projection `504 IfTrue` has a predicate UCT. We then fail when trying to convert the `IfFalse` to an `IfTrue` with `current->as_IfTrue()` on L629. > > ### Solution > The fix is straight forward: `TemplateAssertionPredicate::is_predicate()` (and `InitiliazedAssertionPredi... This pull request has now been integrated. 
Changeset: d61f56a3 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/d61f56a3001f2f574f49c36f5bb40e96bb6b827d Stats: 83 lines in 4 files changed: 77 ins; 0 del; 6 mod 8342287: C2 fails with "assert(is_IfTrue()) failed: invalid node class: IfFalse" due to Template Assertion Predicate with two UCTs Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/21561 From kxu at openjdk.org Mon Oct 21 06:36:30 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 21 Oct 2024 06:36:30 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v25] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. 
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/c37484ab..4314d739 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=23-24 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From kxu at openjdk.org Mon Oct 21 06:36:30 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 21 Oct 2024 06:36:30 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v24] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: <-wOSP3lnzWkUmb0l77CBHHVKQnJV2Kl7jqbAaYXkG6M=.7aca9cbe-8377-4703-9d2a-a2a80775d72c@github.com> On Mon, 21 Oct 2024 06:07:14 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> update comment pseudo code, improve readability with explicit skip > > src/hotspot/share/opto/loopnode.cpp line 3973: > >> 3971: // conversions are required: >> 3972: // >> 3973: // long iv2 = ((long) phi * stride_con2 / stride_con) + (init2 - ((long) init * stride_con2 / stride_con)) > > Thanks for updating the example. 
I guess we should use `iv` consistently - missed that to do in my example before: > Suggestion: > > // int a = init2; > // for (int iv = init; iv < limit; iv += stride_con) { > // a += stride_con2; > // } > // > // and transforms it to: > // > // int iv2 = init2 > // int iv = init > // loop: > // if ( iv >= limit ) goto exit > // iv += stride_con > // iv2 = init2 + (iv - init) * (stride_con2 / stride_con) > // goto loop > // exit: > // ... > // > // Such transformation introduces more optimization opportunities. In this > // particular example, the loop can be eliminated entirely given that > // `stride_con2 / stride_con` is exact (i.e., no remainder). Checks are in > // place to only perform this optimization if such a division is exact. This > // example will be transformed into its semantic equivalence: > // > // int iv2 = (iv * stride_con2 / stride_con) + (init2 - (init * stride_con2 / stride_con)) > // > // which corresponds to the structure of transformed subgraph. > // > // However, if there is a mismatch between types of the loop and the parallel > // induction variable (e.g., a long-typed IV in an int-typed loop), type > // conversions are required: > // > // long iv2 = ((long) iv * stride_con2 / stride_con) + (init2 - ((long) init * stride_con2 / stride_con)) Sorry I thought you were only refering to transform code in that last review. Yes, it's better to keep names consistent across. Thanks for pointing out! 
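The equivalence stated in the suggested comment can be checked with a small standalone Java sketch. This is an illustration only, not code from the PR: the class and method names are made up, and it assumes a positive `stride_con`, an exact `stride_con2 / stride_con`, and no int overflow of `iv`.

```java
// Compares the loop form of a parallel induction variable with the
// closed form from the comment. Hypothetical names, not HotSpot code.
final class ParallelIvSketch {
    // Original shape: iv2 is advanced in lockstep with the int loop variable iv.
    static long loopForm(int init, int limit, int strideCon, long init2, long strideCon2) {
        long iv2 = init2;
        for (int iv = init; iv < limit; iv += strideCon) {
            iv2 += strideCon2;
        }
        return iv2;
    }

    // Closed form: iv2 = (long) iv * ratio + (init2 - (long) init * ratio),
    // evaluated at the value iv has when the loop exits.
    static long closedForm(int init, int limit, int strideCon, long init2, long strideCon2) {
        if (strideCon <= 0 || strideCon2 % strideCon != 0) {
            throw new IllegalArgumentException("sketch needs a positive stride and an exact ratio");
        }
        long ratio = strideCon2 / strideCon;
        // Number of iterations, rounded up; zero if the loop never runs.
        long trips = init >= limit ? 0 : ((long) limit - init + strideCon - 1) / strideCon;
        long ivAtExit = init + trips * strideCon;
        return ivAtExit * ratio + (init2 - (long) init * ratio);
    }

    public static void main(String[] args) {
        System.out.println(loopForm(3, 20, 4, 100L, 12L));
        System.out.println(closedForm(3, 20, 4, 100L, 12L));
    }
}
```

Both calls in `main` should print the same value, since the second variable advances by `ratio` for every unit that `iv` advances.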
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1808176476 From epeter at openjdk.org Mon Oct 21 06:48:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 21 Oct 2024 06:48:27 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v4] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`. > > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). 
> > Now we can do: > - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: updates for Vladimir ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/53150059..a911b630 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=02-03 Stats: 13 lines in 1 file changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From kxu at openjdk.org Mon Oct 21 06:58:38 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 21 Oct 2024 06:58:38 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v26] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. 
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/4314d739..81bce8ff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=24-25 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From chagedorn at openjdk.org Mon Oct 21 06:58:38 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 21 Oct 2024 06:58:38 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v25] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Mon, 21 Oct 2024 06:36:30 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn src/hotspot/share/opto/loopnode.cpp line 3952: > 3950: // int iv = init > 3951: // loop: > 3952: // if ( iv >= limit ) goto exit Last nit, then we are good to go from my side :-) Thanks for the updates! 
Suggestion: // if (iv >= limit) goto exit ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1808199243 From chagedorn at openjdk.org Mon Oct 21 06:58:38 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 21 Oct 2024 06:58:38 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v24] In-Reply-To: <-wOSP3lnzWkUmb0l77CBHHVKQnJV2Kl7jqbAaYXkG6M=.7aca9cbe-8377-4703-9d2a-a2a80775d72c@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> <-wOSP3lnzWkUmb0l77CBHHVKQnJV2Kl7jqbAaYXkG6M=.7aca9cbe-8377-4703-9d2a-a2a80775d72c@github.com> Message-ID: On Mon, 21 Oct 2024 06:32:58 GMT, Kangcheng Xu wrote: >> src/hotspot/share/opto/loopnode.cpp line 3973: >> >>> 3971: // conversions are required: >>> 3972: // >>> 3973: // long iv2 = ((long) phi * stride_con2 / stride_con) + (init2 - ((long) init * stride_con2 / stride_con)) >> >> Thanks for updating the example. I guess we should use `iv` consistently - missed that to do in my example before: >> Suggestion: >> >> // int a = init2; >> // for (int iv = init; iv < limit; iv += stride_con) { >> // a += stride_con2; >> // } >> // >> // and transforms it to: >> // >> // int iv2 = init2 >> // int iv = init >> // loop: >> // if ( iv >= limit ) goto exit >> // iv += stride_con >> // iv2 = init2 + (iv - init) * (stride_con2 / stride_con) >> // goto loop >> // exit: >> // ... >> // >> // Such transformation introduces more optimization opportunities. In this >> // particular example, the loop can be eliminated entirely given that >> // `stride_con2 / stride_con` is exact (i.e., no remainder). Checks are in >> // place to only perform this optimization if such a division is exact. 
This >> // example will be transformed into its semantic equivalence: >> // >> // int iv2 = (iv * stride_con2 / stride_con) + (init2 - (init * stride_con2 / stride_con)) >> // >> // which corresponds to the structure of transformed subgraph. >> // >> // However, if there is a mismatch between types of the loop and the parallel >> // induction variable (e.g., a long-typed IV in an int-typed loop), type >> // conversions are required: >> // >> // long iv2 = ((long) iv * stride_con2 / stride_con) + (init2 - ((long) init * stride_con2 / stride_con)) > > Sorry I thought you were only refering to transform code in that last review. Yes, it's better to keep names consistent across. Thanks for pointing out! Was first my intention. But when reading it again, I think it's better to go with `iv` everywhere. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1808198822 From chagedorn at openjdk.org Mon Oct 21 06:58:47 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 21 Oct 2024 06:58:47 GMT Subject: RFR: 8341407: C2: assert(main_limit == cl->limit() || get_ctrl(main_limit) == new_limit_ctrl) failed: wrong control for added limit [v3] In-Reply-To: <7LdbqP4RgXKa964-BAUdRkIdB8iBUTiuX3dyB21uB-4=.3ab80aff-56dd-4877-9cd8-d8b669645b9c@github.com> References: <7LdbqP4RgXKa964-BAUdRkIdB8iBUTiuX3dyB21uB-4=.3ab80aff-56dd-4877-9cd8-d8b669645b9c@github.com> Message-ID: On Fri, 18 Oct 2024 07:30:54 GMT, Roland Westrelin wrote: >> That assert checks that during RC elimination, we have either: >> >> - not updated the limit of the main loop >> >> - or that the new limit is at the expected control >> >> The assert fires because the limit was updated but is not at the >> expected control. That happens because `new_limit_ctrl` is updated for >> a test that it attempts to eliminate before it actually proceeds with >> the elimination: if the test can't be eliminated, `new_limit_ctrl` >> gets updated anyway. 
>> >> While the assert could, maybe, be relaxed (it fires in this case but >> nothing is going wrong), it's better, I think, to simply not uselessly >> restrict the control of the limit. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopTransform.cpp > > Co-authored-by: Tobias Hartmann Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21564#pullrequestreview-2381095686 From epeter at openjdk.org Mon Oct 21 07:02:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 21 Oct 2024 07:02:12 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v4] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 21 Oct 2024 06:48:27 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. 
Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > updates for Vladimir src/hotspot/share/opto/memnode.cpp line 2916: > 2914: int opc = _store->Opcode(); > 2915: assert(opc == Op_StoreB || opc == Op_StoreC || opc == Op_StoreI, "precondition"); > 2916: // assert(_store->adr_type()->isa_aryptr() != nullptr, "must be array store"); remove! 
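As a rough illustration of the overflow-tracking idea in the quoted description, here is a Java analogue. It is a sketch under assumptions, not the C++ `NoOverflowInt` from the PR: the class name `NoOverflowIntSketch` and its methods are hypothetical, and it relies only on the standard `Math.addExact`, `Math.subtractExact` and `Math.multiplyExact`.

```java
// An int wrapper that remembers whether any operation in its history
// overflowed. Hypothetical Java analogue, not the class from the PR.
final class NoOverflowIntSketch {
    private static final NoOverflowIntSketch OVERFLOW = new NoOverflowIntSketch(0, true);

    private final int value;
    private final boolean overflowed;

    private NoOverflowIntSketch(int value, boolean overflowed) {
        this.value = value;
        this.overflowed = overflowed;
    }

    static NoOverflowIntSketch of(int v) {
        return new NoOverflowIntSketch(v, false);
    }

    boolean isOverflow() {
        return overflowed;
    }

    int value() {
        if (overflowed) throw new IllegalStateException("overflow occurred earlier");
        return value;
    }

    NoOverflowIntSketch add(NoOverflowIntSketch o) {
        if (overflowed || o.overflowed) return OVERFLOW;
        try {
            return of(Math.addExact(value, o.value));
        } catch (ArithmeticException e) {
            return OVERFLOW;
        }
    }

    NoOverflowIntSketch sub(NoOverflowIntSketch o) {
        if (overflowed || o.overflowed) return OVERFLOW;
        try {
            return of(Math.subtractExact(value, o.value));
        } catch (ArithmeticException e) {
            return OVERFLOW;
        }
    }

    NoOverflowIntSketch mul(NoOverflowIntSketch o) {
        if (overflowed || o.overflowed) return OVERFLOW;
        try {
            return of(Math.multiplyExact(value, o.value));
        } catch (ArithmeticException e) {
            return OVERFLOW;
        }
    }

    public static void main(String[] args) {
        NoOverflowIntSketch big = of(Integer.MAX_VALUE).add(of(1));
        System.out.println(big.isOverflow());
        System.out.println(big.sub(of(1)).isOverflow());
        System.out.println(of(6).mul(of(7)).value());
    }
}
```

Because the overflow state is sticky, a caller can chain several operations and check once at the end instead of guarding every step.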
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1808207358 From epeter at openjdk.org Mon Oct 21 07:16:49 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 21 Oct 2024 07:16:49 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v20] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 01:29:16 GMT, Shaojin Wen wrote: >> Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 26 additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 >> - fix build error >> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 >> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 >> - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 >> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 >> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 >> - revert test >> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 >> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 >> - ... and 16 more: https://git.openjdk.org/jdk/compare/25212971...457735c9 > > There is no array out-of-bounds check when using Unsafe. In the appendNull method, there is already a call to ensureCapacityInternal. It is also safe to use Unsafe in this scenario. @wenshao @cl4es I've investigated a little with the benchmark. RangeCheck smearing could be the issue here. 
![image](https://github.com/user-attachments/assets/776e8464-3e6d-4725-a6ee-3bcd772ee4e0) It turns out that RangeChecks are smeared (a kind of elimination in straight-line code) in the same phase as the MergeStores happens (`post_loop_opts_phase`). So it seems to depend on the order of processing in the worklist, if we first remove the RC or try to merge the stores. If a RC is somewhere still stuck between stores, then merging does not quite work as hoped. Of course if you do it all with `Unsafe`, then those RC are not there any more. Maybe I can somehow delay the MergeStores a little more. But I have no nice solution in mind - only hacks so far. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2425810572 From chagedorn at openjdk.org Mon Oct 21 07:17:50 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 21 Oct 2024 07:17:50 GMT Subject: RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi [v6] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 02:33:58 GMT, Daohan Qu wrote: >> # Description >> >> [JDK-6934604](https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194) introduces the flag `AggressiveUnboxing` in jdk8u, and [JDK-8217919](https://github.com/openjdk/jdk/commit/71759e3177fcd6926bb36a30a26f9f3976f2aae8) enables it by default in jdk13u. >> >> But it seems that JDK-6934604 forgets to check duplicate `PhiNode` generated in the function `LoadNode::split_through_phi(PhaseGVN *phase)` (in `memnode.cpp`) in the case that `base` is phi but `mem` is not phi. More exactly, `LoadNode::Identity(PhaseTransform *phase)` doesn't search for `PhiNode` in the correct region in that case. >> >> This might cause infinite split in the case of a loop, which is similar to the bugs fixed in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7). The infinite split results in "Out of nodes" and makes the method "not compilable".
>> >> Since JDK-8217919 (in jdk13u), all the later versions of jdks are affected by this bug when the expected optimization pattern appears in the code. For example, the following three micro-benchmarks running with >> >> >> make test \ >> TEST="micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef" \ >> TEST_OPTS="VM_OPTIONS=-XX:+UseParallelGC" >> >> >> show performance improvement after this PR is applied. (`-XX:+UseParallelGC` is only needed to reproduce this bug; all the benchmarks in the following table are run with this option.) >> >> |benchmark (throughput, unit: ops/s)| jdk-before-this-patch | jdk-after-this-patch | >> |---|---|---| >> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner | 26.678 ±(99.9%) 0.574 ops/s | 55.692 ±(99.9%) 4.419 ops/s | >> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda | 26.792 ±(99.9%) 1.924 ops/s | 64.882 ±(99.9%) 4.175 ops/s | >> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef | 27.023 ±(99.9%) 1.116 ops/s | 66.313 ±(99.9%) 0.802 ops/s | >> >> # Reproduction >> >> Compile and run the reduced test case `Test.java` in the appendix below using >> >> >> java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=comp.log -XX:+UseParallelGC Test >> >> >> and you could find that `Test$Obj.calc` is tagged with `make_not_compilable` and see some output like >> >> >> > Daohan Qu has updated the pull request incrementally with one additional commit since the last revision: > > Bug fix > It still works for the cases where `mem->in(0) == base->in(0)`.
> > It seems that the code that splits through base phi in [`LoadNode::split_through_phi()`](https://github.com/openjdk/jdk/blob/b4977e887a53c898b96a7d37a3bf94742c7cc194/hotspot/src/share/vm/opto/memnode.cpp#L1284) is moved from [`LoadNode::eliminate_autobox()`](https://github.com/openjdk/jdk/blob/7c367a6025f519bf12b5b57c807470555eb0a673/hotspot/src/share/vm/opto/memnode.cpp#L1187) in the commit [b4977e8](https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194) . And `mem->in(0) == base->in(0)` is what the original code requires: > > https://github.com/openjdk/jdk/blob/7c367a6025f519bf12b5b57c807470555eb0a673/hotspot/src/share/vm/opto/memnode.cpp#L1205-L1240 I see. So, the question is if it's okay to disable it for the other cases where `mem->in(0) != base->in(0)` completely to fix this edge case. What was the problem with your original fix where you wanted to search for dups? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21134#issuecomment-2425815320 From syan at openjdk.org Mon Oct 21 07:26:04 2024 From: syan at openjdk.org (SendaoYan) Date: Mon, 21 Oct 2024 07:26:04 GMT Subject: RFR: 8342612: Increase memory usage of compiler/c2/TestScalarReplacementMaxLiveNodes.java In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 15:55:24 GMT, SendaoYan wrote: > Hi all, > The test `test/hotspot/jtreg/compiler/c2/TestScalarReplacementMaxLiveNodes.java` fails on linux-x64/macos-x64/macos-aarch64/windows-x64. To make less CI noisy, we can simply increase the max memory usage before the failure root cause been fixed. > The change has been verified locally. Test-fix only, no risk. Thanks for the review. Test-fix only, no risk. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21586#issuecomment-2425825130 From syan at openjdk.org Mon Oct 21 07:26:04 2024 From: syan at openjdk.org (SendaoYan) Date: Mon, 21 Oct 2024 07:26:04 GMT Subject: Integrated: 8342612: Increase memory usage of compiler/c2/TestScalarReplacementMaxLiveNodes.java In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 15:55:24 GMT, SendaoYan wrote: > Hi all, > The test `test/hotspot/jtreg/compiler/c2/TestScalarReplacementMaxLiveNodes.java` fails on linux-x64/macos-x64/macos-aarch64/windows-x64. To make less CI noisy, we can simply increase the max memory usage before the failure root cause been fixed. > The change has been verified locally. Test-fix only, no risk. This pull request has now been integrated. Changeset: 21682bcd Author: SendaoYan URL: https://git.openjdk.org/jdk/commit/21682bcdccbb35286cbffc21517b3b52abcb2476 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8342612: Increase memory usage of compiler/c2/TestScalarReplacementMaxLiveNodes.java Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/21586 From chagedorn at openjdk.org Mon Oct 21 07:27:10 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 21 Oct 2024 07:27:10 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v26] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: <2DXFkwVapA7opI1aOcAE0pIlVTXUGAWK2gsmxycu-tk=.abeb0271-a047-43f2-bdca-0498dc817024@github.com> On Mon, 21 Oct 2024 06:58:38 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). 
Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18489#pullrequestreview-2381152195 From roland at openjdk.org Mon Oct 21 07:39:51 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 21 Oct 2024 07:39:51 GMT Subject: RFR: 8341407: C2: assert(main_limit == cl->limit() || get_ctrl(main_limit) == new_limit_ctrl) failed: wrong control for added limit [v2] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 07:24:54 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> fix & test > > Looks good to me. @TobiHartmann @chhagedorn thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/21564#issuecomment-2425859281 From roland at openjdk.org Mon Oct 21 07:39:51 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 21 Oct 2024 07:39:51 GMT Subject: Integrated: 8341407: C2: assert(main_limit == cl->limit() || get_ctrl(main_limit) == new_limit_ctrl) failed: wrong control for added limit In-Reply-To: References: Message-ID: <8oBpu_2SD7dKIlw3UxL0SU1_mbPCeY_YFqWBHhe8Ry8=.2a219321-915a-4ac0-b77a-a31202c4f96e@github.com> On Thu, 17 Oct 2024 14:03:44 GMT, Roland Westrelin wrote: > That assert checks that during RC elimination, we have either: > > - not updated the limit of the main loop > > - or that the new limit is at the expected control > > The assert fires because the limit was updated but is not at the > expected control. 
That happens because `new_limit_ctrl` is updated for > a test that it attempts to eliminate before it actually proceeds with > the elimination: if the test can't be eliminated, `new_limit_ctrl` > gets updated anyway. > > While the assert could, maybe, be relaxed (it fires in this case but > nothing is going wrong), it's better, I think, to simply not uselessly > restrict the control of the limit. This pull request has now been integrated. Changeset: 8f2b23bb Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/8f2b23bb53e81e3f9d8d84720719d129aea82a78 Stats: 72 lines in 2 files changed: 64 ins; 0 del; 8 mod 8341407: C2: assert(main_limit == cl->limit() || get_ctrl(main_limit) == new_limit_ctrl) failed: wrong control for added limit Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/21564 From roland at openjdk.org Mon Oct 21 07:41:50 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 21 Oct 2024 07:41:50 GMT Subject: RFR: 8342496: C2/Shenandoah: SEGV in compiled code when running jcstress In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 13:40:16 GMT, Aleksey Shipilev wrote: >> The reason for the crash is that compiled code reads from an object >> that's null. All field loads from an object are guarded by a null >> check. Where is the null check in that case? After the field load: >> >> >> 0x00007ffaac91261f: 44 8B 69 0C mov r13d, dword ptr [rcx + 0xc] <- field load >> 0x00007ffaac912623: 85 C9 test ecx, ecx <- null check (oops!) >> 0x00007ffaac912625: 74 5C je 0x7ffaac912683 >> >> >> When the IR graph is constructed for the test case, the field load is >> correctly made dependent on the null check (through a `CastPP` node) >> but then something happens that's shenandoah specific and that causes >> the field load to become dependent on another check so it can execute >> before the null check. >> >> There are several load barriers involved in the process. 
One of them >> is expanded at the null check projection. In the process, control for >> the nodes that are control dependent on the null check is updated to >> be the region at the end of the just expanded barrier. The `CastPP` >> node for the null check gets the `Region` as new control. >> >> Another barrier is expanded right after that one. The 2 are back to >> back. They are merged. The `Region` that the `CastPP` depends on goes >> away, the `CastPP` is cloned in both branches at the `Region` and one >> of them becomes control dependent on the heap stable test of the first >> expanded barrier. At this point, one of the `CastPP` is control >> dependent on a heap stable test that's after the null check. But then, >> the heap stable test is moved out of loop and 2 copies of the loop are >> made so one can run without any overhead from barriers. When that >> happens, the `CastPP` becomes dependent on a test that dominates the >> null check and so the field load that depends on the `CastPP` can be >> scheduled before the null check. >> >> The fix I propose is not update the control when the barrier is >> expanded for nodes that can float when the test they depend on >> moves. This way the `CastPP` remains dependent on the null check. > > Thanks! I am running tests with this patch. @shipilev @rkennke thanks for the reviews + testing ------------- PR Comment: https://git.openjdk.org/jdk/pull/21562#issuecomment-2425863060 From roland at openjdk.org Mon Oct 21 07:41:51 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 21 Oct 2024 07:41:51 GMT Subject: Integrated: 8342496: C2/Shenandoah: SEGV in compiled code when running jcstress In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 13:12:11 GMT, Roland Westrelin wrote: > The reason for the crash is that compiled code reads from an object > that's null. All field loads from an object are guarded by a null > check. Where is the null check in that case? 
After the field load: > > > 0x00007ffaac91261f: 44 8B 69 0C mov r13d, dword ptr [rcx + 0xc] <- field load > 0x00007ffaac912623: 85 C9 test ecx, ecx <- null check (oops!) > 0x00007ffaac912625: 74 5C je 0x7ffaac912683 > > > When the IR graph is constructed for the test case, the field load is > correctly made dependent on the null check (through a `CastPP` node) > but then something happens that's shenandoah specific and that causes > the field load to become dependent on another check so it can execute > before the null check. > > There are several load barriers involved in the process. One of them > is expanded at the null check projection. In the process, control for > the nodes that are control dependent on the null check is updated to > be the region at the end of the just expanded barrier. The `CastPP` > node for the null check gets the `Region` as new control. > > Another barrier is expanded right after that one. The 2 are back to > back. They are merged. The `Region` that the `CastPP` depends on goes > away, the `CastPP` is cloned in both branches at the `Region` and one > of them becomes control dependent on the heap stable test of the first > expanded barrier. At this point, one of the `CastPP` is control > dependent on a heap stable test that's after the null check. But then, > the heap stable test is moved out of loop and 2 copies of the loop are > made so one can run without any overhead from barriers. When that > happens, the `CastPP` becomes dependent on a test that dominates the > null check and so the field load that depends on the `CastPP` can be > scheduled before the null check. > > The fix I propose is not update the control when the barrier is > expanded for nodes that can float when the test they depend on > moves. This way the `CastPP` remains dependent on the null check. This pull request has now been integrated. 
Changeset: 680dc5d8 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/680dc5d896f4f7b01b3cf800d548e32bb2ef8c81 Stats: 70 lines in 2 files changed: 70 ins; 0 del; 0 mod 8342496: C2/Shenandoah: SEGV in compiled code when running jcstress Reviewed-by: shade, rkennke ------------- PR: https://git.openjdk.org/jdk/pull/21562 From lucy at openjdk.org Mon Oct 21 07:49:00 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 21 Oct 2024 07:49:00 GMT Subject: RFR: 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v2] In-Reply-To: <2PMRVsGJHC6sQsMumqKgg9eEoX30OnUYA8BlvTWYs2U=.9cbef728-014b-4eba-ba40-641680741059@github.com> References: <2PMRVsGJHC6sQsMumqKgg9eEoX30OnUYA8BlvTWYs2U=.9cbef728-014b-4eba-ba40-641680741059@github.com> Message-ID: On Mon, 21 Oct 2024 04:24:28 GMT, Amit Kumar wrote: >> src/hotspot/cpu/s390/s390.ad line 6263: >> >>> 6261: effect(TEMP r4_reven_tmp, KILL cr); >>> 6262: // TODO: size(4); >>> 6263: format %{ "UDIV $r5_rodd_dst, $r5_rodd_dst,$src2" %} >> >> Suggestion: no whitespace between instruction operands. > > Hi @RealLucy, > was that for consistency or there is issue with space in the format section? It was just for consistency. As long as you don't parse the OptoAssembly output, it is purely eye candy. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21559#discussion_r1808264322 From lucy at openjdk.org Mon Oct 21 07:49:00 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 21 Oct 2024 07:49:00 GMT Subject: RFR: 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v3] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 15:55:24 GMT, Amit Kumar wrote: >> Add match rules for UDivI, UModI, UDivL, UModL. And also adds `dlr` and `dlgr` instruction. 
>>
>> Tier1 tests are clean for the fastdebug VM;
>>
>> Before this patch, `compiler/c2/TestDivModNodes.java` was failing (see the JBS issue), but with this patch the test passes.
>>
>> Without Patch:
>>
>>
>> Benchmark                                  (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score     Error  Units
>> IntegerDivMod.testDivideRemainderUnsigned           1024          mixed  avgt   15  1935.176 ±   2.191  ns/op
>> IntegerDivMod.testDivideRemainderUnsigned           1024       positive  avgt   15  1934.915 ±   3.207  ns/op
>> IntegerDivMod.testDivideRemainderUnsigned           1024       negative  avgt   15  1934.325 ±   1.108  ns/op
>> IntegerDivMod.testDivideUnsigned                    1024          mixed  avgt   15  1809.782 ±  49.341  ns/op
>> IntegerDivMod.testDivideUnsigned                    1024       positive  avgt   15  1769.326 ±   2.607  ns/op
>> IntegerDivMod.testDivideUnsigned                    1024       negative  avgt   15  1784.053 ±  71.190  ns/op
>> IntegerDivMod.testRemainderUnsigned                 1024          mixed  avgt   15  2026.978 ±   1.534  ns/op
>> IntegerDivMod.testRemainderUnsigned                 1024       positive  avgt   15  2028.039 ±   3.812  ns/op
>> IntegerDivMod.testRemainderUnsigned                 1024       negative  avgt   15  2437.843 ± 636.808  ns/op
>> Finished running test 'micro:java.lang.IntegerDivMod'
>>
>>
>> Benchmark                                  (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score     Error  Units
>> LongDivMod.testDivideRemainderUnsigned              1024          mixed  avgt   15  4524.897 ±  16.566  ns/op
>> LongDivMod.testDivideRemainderUnsigned              1024       positive  avgt   15  4373.714 ±   9.514  ns/op
>> LongDivMod.testDivideRemainderUnsigned              1024       negative  avgt   15  2018.309 ±   1.788  ns/op
>> LongDivMod.testDivideUnsigned                       1024          mixed  avgt   15  4320.382 ±  19.055  ns/op
>> LongDivMod.testDivideUnsigned                       1024       positive  avgt   15  3988.953 ±   8.770  ns/op
>> LongDivMod.testDivideUnsigned                       1024       negative  avgt   15  1069.703 ±   1.525  ns/op
>> LongDivMod.testRemainderUnsigned                    1024          mixed  avgt   15  5589.319 ±   4.247  ns/op
>> LongDivMod.testRemainderUnsigned                    1024       positive  avgt   15  3904.555 ±   3.191  ns/op
>> LongDivMod.testRemainderUnsigned 10...
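For readers less familiar with the library methods that these intrinsics target, here is a minimal Java sketch of their semantics. The class `UnsignedDivModSketch` and its `divRemUnsigned` helpers are illustrative names only and are not part of the PR; the sketch relies solely on the standard `Integer`/`Long` `divideUnsigned` and `remainderUnsigned` methods.

```java
// Illustrative only: this class and its helper names are assumptions, not code from the PR.
final class UnsignedDivModSketch {

    // Returns {quotient, remainder}, treating both int operands as unsigned 32-bit values.
    static int[] divRemUnsigned(int dividend, int divisor) {
        return new int[] {
            Integer.divideUnsigned(dividend, divisor),
            Integer.remainderUnsigned(dividend, divisor)
        };
    }

    // Returns {quotient, remainder}, treating both long operands as unsigned 64-bit values.
    static long[] divRemUnsigned(long dividend, long divisor) {
        return new long[] {
            Long.divideUnsigned(dividend, divisor),
            Long.remainderUnsigned(dividend, divisor)
        };
    }

    public static void main(String[] args) {
        int allOnes = -1; // bit pattern 0xFFFFFFFF, i.e. 4294967295 when read as unsigned

        // Signed division sees -1 and truncates toward zero.
        System.out.println(allOnes / 3); // 0
        System.out.println(allOnes % 3); // -1

        // Unsigned division sees 4294967295.
        int[] qr = divRemUnsigned(allOnes, 3);
        System.out.println(qr[0]); // 1431655765
        System.out.println(qr[1]); // 0

        // Same idea for long: -1L is 2^64 - 1 when read as unsigned.
        long[] lqr = divRemUnsigned(-1L, 2L);
        System.out.println(lqr[0]); // 9223372036854775807
        System.out.println(lqr[1]); // 1
    }
}
```

The "negative" divisor rows in the benchmark correspond to operands like `allOnes` above, whose unsigned interpretation is a very large value.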
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
>   removes extra whitespaces

Still looks good.

-------------

Marked as reviewed by lucy (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21559#pullrequestreview-2381201082

From epeter at openjdk.org Mon Oct 21 08:01:26 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 21 Oct 2024 08:01:26 GMT
Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v20]
In-Reply-To: References: Message-ID:

On Fri, 18 Oct 2024 21:56:53 GMT, Shaojin Wen wrote:

>> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores.
>>
>> This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2.
>
> Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 26 additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406
> - fix build error
> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
> - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406
> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
> - revert test
> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
> - ...
and 16 more: https://git.openjdk.org/jdk/compare/a3ecf480...457735c9

Not sure why the RangeCheck smearing issue only appears here now. Maybe because the graph is bigger? Maybe because of other things in the graph that change the order of things? IDK yet.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2425905714

From redestad at openjdk.org Mon Oct 21 08:06:46 2024
From: redestad at openjdk.org (Claes Redestad)
Date: Mon, 21 Oct 2024 08:06:46 GMT
Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v20]
In-Reply-To: References: Message-ID:

On Mon, 21 Oct 2024 07:58:42 GMT, Emanuel Peter wrote:

>> Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 26 additional commits since the last revision:
>>
>> - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406
>> - fix build error
>> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
>> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
>> - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406
>> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
>> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
>> - revert test
>> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
>> - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
>> - ... and 16 more: https://git.openjdk.org/jdk/compare/a510a929...457735c9
>
> Not sure why the RangeCheck smearing issue only appears here now. Maybe because the graph is bigger? Maybe because of other things in the graph that change the order of things? IDK yet.
Thanks for checking @eme64 - up to you if and when you want to try and tackle such an improvement, if it can be done in a clean way. And as you say there might be other factors at play here. Perhaps things are confounded by how the code is structured before and after; this PR outlines the array stores to a separate method which might affect loop optimization passes if that method is first compiled separately then inlined. While I don't think the point fix in this PR is all that urgent this has been lingering for a while and I'm sure @wenshao wants to integrate and move on. We can always integrate then back out the `Unsafe` stuff once/if the phases have been disentangled. Or we keep experimenting to try and see if we can get it to behave ideally without `Unsafe` here and now. WDYT? @liach? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2425916432 From epeter at openjdk.org Mon Oct 21 08:07:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 21 Oct 2024 08:07:22 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v5] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. 
>
> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them.
>
> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing!
>
> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
>
> **What this change enables**
>
> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`).
>
> Now we can do:
> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`.
> - Merging `Unsafe` stores to native memory.
> - Merging `MemorySegment`: with array, native, ByteBuffer backing types.
> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better.
>
> **Dealing with Overflows**
>
> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks.
>
> **Bench...
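The overflow-tracking idea described above can be illustrated with a small Java analogue. This is only a sketch under stated assumptions: the real `NoOverflowInt` is C++ code inside HotSpot, and the class name `NoOverflowIntSketch`, its methods, and the `isAdjacent` helper below are hypothetical names chosen for illustration, not the PR's API. The sketch shows a value that becomes "poisoned" once any operation overflows, and an adjacency test on the constant parts of two pointers that share the same variable part of the linear form.

```java
// Hypothetical Java analogue of the overflow-tracking int described in the PR text.
// All names here are illustrative assumptions, not HotSpot code.
final class NoOverflowIntSketch {
    private static final NoOverflowIntSketch OVERFLOWED = new NoOverflowIntSketch(0, true);

    private final int value;
    private final boolean overflowed; // sticky: once set, every derived result is overflowed too

    private NoOverflowIntSketch(int value, boolean overflowed) {
        this.value = value;
        this.overflowed = overflowed;
    }

    static NoOverflowIntSketch of(int value) {
        return new NoOverflowIntSketch(value, false);
    }

    boolean hasOverflowed() {
        return overflowed;
    }

    int value() {
        if (overflowed) {
            throw new IllegalStateException("value is unknown after an overflow");
        }
        return value;
    }

    NoOverflowIntSketch add(NoOverflowIntSketch other) {
        if (overflowed || other.overflowed) return OVERFLOWED;
        return fromLong((long) value + other.value);
    }

    NoOverflowIntSketch sub(NoOverflowIntSketch other) {
        if (overflowed || other.overflowed) return OVERFLOWED;
        return fromLong((long) value - other.value);
    }

    NoOverflowIntSketch mul(NoOverflowIntSketch other) {
        if (overflowed || other.overflowed) return OVERFLOWED;
        return fromLong((long) value * other.value);
    }

    // The exact result was computed in 64 bits; it is valid only if it fits in 32 bits.
    private static NoOverflowIntSketch fromLong(long wide) {
        return (wide == (int) wide) ? of((int) wide) : OVERFLOWED;
    }

    // Two accesses with the same variable part are adjacent if the first access's
    // constant offset plus its size in bytes equals the second access's constant offset.
    static boolean isAdjacent(NoOverflowIntSketch con1, int sizeInBytes1, NoOverflowIntSketch con2) {
        NoOverflowIntSketch end1 = con1.add(of(sizeInBytes1));
        return !end1.hasOverflowed() && !con2.hasOverflowed() && end1.value() == con2.value();
    }
}
```

With plain `int` arithmetic, `Integer.MAX_VALUE + 4` wraps around to `Integer.MIN_VALUE + 3`, so a naive adjacency test would wrongly report two such offsets as adjacent; in the sketch, the sticky flag makes `isAdjacent` answer `false` instead.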
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: rm dead assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/a911b630..b8fc83ba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From shade at openjdk.org Mon Oct 21 08:09:53 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 21 Oct 2024 08:09:53 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory In-Reply-To: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> Message-ID: On Fri, 18 Oct 2024 18:35:02 GMT, Chad Rakoczy wrote: > [JDK-8342601](https://bugs.openjdk.org/browse/JDK-8342601) > > Fix minor inefficiency in `copy_memory` by adding check before doing bit shift to see if we are able to do a move instruction instead. 
Change is low risk because of the low complexity of the change > > Ran array copy and tier 1 on aarch64 machine > > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/arraycopy 49 49 0 0 > ============================== > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier1 2591 2591 0 0 > jtreg:test/jdk:tier1 2436 2436 0 0 > jtreg:test/langtools:tier1 4577 4577 0 0 > jtreg:test/jaxp:tier1 0 0 0 0 > jtreg:test/lib-test:tier1 34 34 0 0 > ============================== There is `if (shift) __ lsr(...)` above at line 1376, please fix it as `shift > 0` as well. ------------- PR Review: https://git.openjdk.org/jdk/pull/21589#pullrequestreview-2381250288 From epeter at openjdk.org Mon Oct 21 08:15:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 21 Oct 2024 08:15:09 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v3] In-Reply-To: <4k1WvfgPtwKa4RSDzjGnJYo2_O1dzDKdfHQrbLX5730=.040ea20c-7318-43e8-b39d-d0c2d44b3a27@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> <4k1WvfgPtwKa4RSDzjGnJYo2_O1dzDKdfHQrbLX5730=.040ea20c-7318-43e8-b39d-d0c2d44b3a27@github.com> Message-ID: <3Ysdt9UzNOVBIC_EGORy0IhZrcM3aN7uiOGfhCURIZs=.dff00cf4-36aa-476a-a53f-382706774a0a@github.com> On Fri, 11 Oct 2024 16:31:57 GMT, Quan Anh Mai wrote: >> Hi @merykitty , LGTM. >> >> Best Regards. > > @jatin-bhateja Thanks a lot for your reviews. @merykitty I was out on vacation and missed this patch. Can you tell me why the `long` type has the letter `J` and not `L` in the vector type? 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21414#issuecomment-2425934071

From epeter at openjdk.org Mon Oct 21 08:24:26 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 21 Oct 2024 08:24:26 GMT
Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v3]
In-Reply-To: <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com>
References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com>
Message-ID:

On Thu, 10 Oct 2024 01:09:06 GMT, Quan Anh Mai wrote:

>> Hi,
>>
>> This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner.
>>
>> Please take a look and leave your reviews,
>> Thanks a lot.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
>   more style changes

Ah, I guess that is generally the letter we use when passing longs, like in method signatures. And that is because `L` is already taken for objects. Oh well, makes sense now.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21414#issuecomment-2425955080

From shade at openjdk.org Mon Oct 21 08:39:13 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 21 Oct 2024 08:39:13 GMT
Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory
In-Reply-To: References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com>
Message-ID:

On Sat, 19 Oct 2024 00:57:42 GMT, John R Rose wrote:

> If there is a specific forwarding mechanism for some moves but not all move-like instructions, then a micro-optimization like this is worth considering. (We'd still want evidence from perf tests.)
I think it would belong inside the macro-assembler, though, so we don't play whack-a-mole finding all the places where we could reduce a quasi-move to a real forwardable move. If you look at linked issues in JBS, you'll see that I initially did [JDK-8341893](https://bugs.openjdk.org/browse/JDK-8341893) as the fix on compressed ptr decoding path, and [JDK-8341895](https://bugs.openjdk.org/browse/JDK-8341895) as the generic fix in `MacroAssembler`. Then I realized we _only_ reach that pattern from one place in `copy_memory`, which this PR tidies up. Not going for a generic `MacroAssembler` fix is saner here, because with only a single use we do not have a good test coverage for the generic translation. "Failed to pass the cost/benefit bar" is exactly why I backed off doing [JDK-8341895](https://bugs.openjdk.org/browse/JDK-8341895), and instead assigned Chad to touch up the only place where this conversion can at all matter. Looking at this differently: if I wrote the `copy_memory` stub from scratch today, would I do this optimization? Answering personally, I probably would. The original authors apparently did a similar `lsr` -> "nothing" conversion in one of the places already: https://github.com/openjdk/jdk/blob/aa060f22d302789c4f80dd1ebaa233a97b6b0073/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L1376-L1377 Philosophically, the performance optimizations usually fall into three broad categories: "so bad they show up in common tests", "can be measured in targeted tests without trying hard", and "death by a thousand (paper) cuts, you might probably show the impact if you really, really try". Only the first two could be reasonably measured in isolation. The effort required to make a performance-test-based decision for third category usually grossly outweigh their impact. I believe it is a waste of engineering time to even try. 
Note it does not mean third category can be summarily ignored: adding up hundreds of paper-cut inefficiency fixes is how you get incremental performance improvements as you go. For issues like these, if you can spare a (micro-)instruction on a fairly generic path, do so and move on. I advise all of us to do exactly this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21589#issuecomment-2425987345 From jbhateja at openjdk.org Mon Oct 21 08:45:59 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 21 Oct 2024 08:45:59 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 06:28:35 GMT, Emanuel Peter wrote: >>> > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? >>> > >>> > >>> > Nomenclature is suggested by Paul. >>> >>> @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? >>> >> >> It's whatever math functions are required to in support of vector operations (as the JavaDoc indicates) that are not provided by other classes such as the boxed primitives or `java.lang.Math`. >> >> >> /** >> * The class {@code VectorMath} contains methods for performing >> * scalar numeric operations in support of vector numeric operations. >> */ >> public final class VectorMath { >> >> >> These are referenced by the vector operators e.g., >> >> >> /** Produce saturating {@code a+b}. Integral only. >> * @see VectorMath#addSaturating(int, int) >> */ >> public static final Binary SADD = binary("SADD", "+", VectorSupport.VECTOR_OP_SADD, VO_NOFP); >> >> >> And in addition these methods would be used by any tail computation (and the fallback code). >> >> At the moment we are uncertain whether such operations should reside elsewhere and we did not want to block progress. 
I am not beholden to the name, but so far i cannot think of a concise alternative.`VectorOperatorMath` is arguably more precise but more verbose. > >> > > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? >> > > >> > > >> > > Nomenclature is suggested by Paul. >> > >> > >> > @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? >> >> It's whatever math functions are required to in support of vector operations (as the JavaDoc indicates) that are not provided by other classes such as the boxed primitives or `java.lang.Math`. > > Ok. I suppose these methods could eventually be moved to `java.lang.Math` or some other `java.lang` class, when the VectorAPI goes out of incubator mode? > > I feel like these saturating operations, and also the unsigned ops could find a more wider use, away from (explicit) vector usage. For example, the saturating operations are nice because they prevent overflows, and in some cases that would be very nice to have readily available. Hi @eme64 , Can you kindly review following changes, rest of the portions are already reviewed and approved. https://github.com/openjdk/jdk/pull/20507/commits/2b0fa01633875926595656d8dcfd539c334f23a3 https://github.com/openjdk/jdk/pull/20507/commits/c56508899b000b8b1eb6755c901798a2a3685ef5 Best Regards, Jatin ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2426005754 From roland at openjdk.org Mon Oct 21 08:45:59 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 21 Oct 2024 08:45:59 GMT Subject: RFR: 8342330: C2: "node pinned on loop exit test?" assert failure Message-ID: The assert fires because range check elimination processes a test of the shape: if (i * 4 != (x - objectField.intField) - 1)) { ... } and `(x - objectField.intField) - 1)` has control on the exit projection of the pre loop. 
This happens because:

- `objectField.intField` depends on the null check of `objectField` which is performed in the pre loop.

- `i * scale + (objectField.intField + 1) == x` is transformed into: `i * scale == x - (objectField.intField + 1)`

- `(x - objectField.intField) - 1` only has uses out of the pre loop and is sunk out of the loop. It ends up pinned on the exit projection of the pre loop.

There is already logic in `PhaseIdealLoop::ctrl_of_use_out_of_loop()` to handle similar cases but, here, the difference is that the use (`SubI` of 1) for what's being sunk doesn't have control in the main loop but between the pre and main loop, so that logic doesn't catch this case. There is also a possible bug in that logic:

n_loop->_next == get_loop(u_loop->_head->as_CountedLoop()->skip_strip_mined())

assumes the loop that follows the pre loop in the loop tree is the main loop, which is not guaranteed.

In this particular case, the assert is harmless: RCE can't eliminate the condition but it's hard to rule out a similar scenario with a condition that RCE could remove. I propose revisiting the condition in `PhaseIdealLoop::ctrl_of_use_out_of_loop()` so it skips all uses that are dominated by the loop exit of the pre loop.
-------------

Commit messages:
 - test
 - fix

Changes: https://git.openjdk.org/jdk/pull/21601/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21601&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8342330
  Stats: 87 lines in 2 files changed: 83 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/21601.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21601/head:pull/21601

PR: https://git.openjdk.org/jdk/pull/21601

From epeter at openjdk.org Mon Oct 21 08:52:20 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 21 Oct 2024 08:52:20 GMT
Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v20]
In-Reply-To: References: Message-ID:

On Fri, 18 Oct 2024 21:56:53 GMT, Shaojin Wen wrote:

>> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores.
>>
>> This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2.
>
> Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 26 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - fix build error > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - revert test > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - ... and 16 more: https://git.openjdk.org/jdk/compare/f6d711b7...457735c9 I'm not the library guy, so I'm not going to block this PR. But having good tests and benchmarks are crucial, and very time-consuming to come up with for us compiler engineers. So these examples here are very valuable, and I think the time is not wasted working more on it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2426020938 From epeter at openjdk.org Mon Oct 21 09:14:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 21 Oct 2024 09:14:34 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v30] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 14:56:29 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. 
>> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Replacing flag based checks with CPU feature checks in IR validation test. Wow this is really a very moving target - quite frustrating to review - it takes up way too much of the reviewers bandwidth. You really need to split up your PRs as much as possible so that review is easier and faster. I think these optimizations should be done in a separate PR. I see no reason why they need to be mixed in. 
https://github.com/openjdk/jdk/commit/c56508899b000b8b1eb6755c901798a2a3685ef5 The `UMinVNode::Ideal` etc changes with IR rules. I also cannot easily review just such a diff, it does not let me make comments - so I still have to go find the relevant code in the whole PR. Some comments on this section: Node* UMinVNode::Ideal(PhaseGVN* phase, bool can_reshape) { bool match1 = in(1)->Opcode() == Op_UMinV || in(1)->Opcode() == Op_UMaxV; bool match2 = in(2)->Opcode() == Op_UMinV || in(2)->Opcode() == Op_UMaxV; // UMin (UMin(a, b), UMax(a, b)) => UMin(a, b) // UMin (UMin(a, b), UMax(b, a)) => UMin(a, b) if (match1 && match2) { if ((in(1)->in(1) == in(2)->in(1) && in(1)->in(2) == in(2)->in(2)) || (in(1)->in(2) == in(2)->in(1) && in(1)->in(1) == in(2)->in(2))) { return new UMinVNode(in(1)->in(1), in(1)->in(2), vect_type()); } } return nullptr; } What about the reverse case `min(max, min)`? And are we sure we do not need to verify any types in all of these cases? Maybe not - but I'd rather be super sure - not that things get misinterpreted and then folded the wrong way. I mean if I now approve only that diff, then I still need to approve the whole PR, which means I need to spend a lot of time on everything again. Otherwise, in theory people could smuggle anything in. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2426080652 From shade at openjdk.org Mon Oct 21 09:46:39 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 21 Oct 2024 09:46:39 GMT Subject: RFR: 8342540: InterfaceCalls micro-benchmark gives misleading results [v2] In-Reply-To: References: <9FV9ocK5WUg_Oer8NsJ1nHAatz7omIYcHH4ltbAvv-I=.2a4d5b2e-e446-4824-a76d-8458013692c5@github.com> Message-ID: On Fri, 18 Oct 2024 16:21:39 GMT, Andrew Haley wrote: > I have no idea how often well-predicted megamorphic calls occur. I can speculate that the "typical" megamorphic case lies somewhere between these extremes, but that is all. 
> It may well be that normal behaviour is chaotic, but I strongly suspect that the cases are unlikely to be equally probable, as they are here. So, it is possible that an utterly unpredictable access pattern is just as unrealistic as a perfectly predictable one.

Yeah, all right. We can have a test that brackets the real-world performance between "best" and "worst" cases.

If you want to make "best" and "worst" only differ in the actual payload, maybe we should compute both scrambled and non-scrambled indexes, feed both to `Blackhole`, and only then select the index based on `randomized`. This way we would always compute both indexes.

Also stats question: Do you know if scrambling actually produces the full period in requested `range`, and whether the frequency for individual cases is roughly the same? We kinda assume this xor-step is well distributed and has good entropy in lower bits, but is it in practice?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21581#issuecomment-2426161090

From fgao at openjdk.org Mon Oct 21 10:10:14 2024
From: fgao at openjdk.org (Fei Gao)
Date: Mon, 21 Oct 2024 10:10:14 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3]
In-Reply-To:
References:
Message-ID: <-FYW9yWcn9euWjBA9qpWiqVm5NaaNo-ZmJuSKl3wWTo=.4e2515cd-18eb-4754-80ba-782f611ab429@github.com>

On Wed, 16 Oct 2024 14:00:37 GMT, Hamlin Li wrote:

>> Hi,
>> Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605.
>> This pr is based on https://github.com/openjdk/jdk/pull/20781.
>>
>> Thanks!
>> >> ## Test >> ### tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> ### options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Tests >> jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector. It's good to have these tests in jdk main stream, I will do it in a separate pr later. (These tests are auto-generated tests from some script&template, it's good to also have those scrip&template in jdk main stream, but those scrip&template generates more other tests than what we need here, so better to add these tests and script&template in another pr). >> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Performance data >> I have re-tested, there is no much difference from https://github.com/openjdk/jdk/pull/18605, so please check performance data in that pr. 
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add missing files src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8207: > 8205: for (int op = 0; op < VectorSupport::NUM_VECTOR_OP_MATH; op++) { > 8206: int vop = VectorSupport::VECTOR_OP_MATH_START + op; > 8207: if (vop == VectorSupport::VECTOR_OP_TANH) { Could you please add a comment that mentions the reason, for example `// Skip "tanh" because there is performance regression` src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8225: > 8223: for (int op = 0; op < VectorSupport::NUM_VECTOR_OP_MATH; op++) { > 8224: int vop = VectorSupport::VECTOR_OP_MATH_START + op; > 8225: if (vop == VectorSupport::VECTOR_OP_TANH) { Ditto ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21502#discussion_r1808455401 PR Review Comment: https://git.openjdk.org/jdk/pull/21502#discussion_r1808470048 From aph at openjdk.org Mon Oct 21 11:19:46 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 21 Oct 2024 11:19:46 GMT Subject: RFR: 8342540: InterfaceCalls micro-benchmark gives misleading results [v2] In-Reply-To: References: <9FV9ocK5WUg_Oer8NsJ1nHAatz7omIYcHH4ltbAvv-I=.2a4d5b2e-e446-4824-a76d-8458013692c5@github.com> Message-ID: On Mon, 21 Oct 2024 09:43:17 GMT, Aleksey Shipilev wrote: > > I have no idea how often well-predicted megamorphic calls occur. I can speculate that the "typical" megamorphic case lies somewhere between these extremes, but that is all. It may well be that normal behaviour is chaotic, but I strongly suspect that the cases are unlikely to be equally probable, as they are here. So, it is possible that an utterly unpredictable access pattern is just as unrealistic as a perfectly predictable one. > > Yeah, all right. We can have a test that brackets the real-world performance between "best" and "worst" cases. 
> > If you want to make "best" and "worst" only differ in the actual payload, maybe we should compute both scrambled and non-scrambled indexes, feed both to `Blackhole`, and only then select the index based on `randomized`. This way we would always compute both indexes. Sure, I can try that, but I doubt it'll affect much. The great virtue of the xorshift generator is to be so lightweight, just a few clocks, that it barely affects anything. > Also stats question: Do you know if scrambling actually produces the full period in requested `range`, and whether the frequency for individual cases is roughly the same? We kinda assume this xor-step is well distributed and has good enthropy in lower bits, but is it in practice? I don't think we assume it: the analysis is in Marsaglia's _Xorshift RNGs_ paper. The full period (except 0) is proven there. The generator I've used here is a compromise: it has to be just good enough to throw off a branch predictor, while still being of very low overhead. There are several elaborations of Xorshift with better statistical properties (see "xoroshiro*") but they are less successful at just getting out of the way. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21581#issuecomment-2426382114 From aph at openjdk.org Mon Oct 21 11:29:16 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 21 Oct 2024 11:29:16 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory In-Reply-To: References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> Message-ID: On Mon, 21 Oct 2024 08:34:38 GMT, Aleksey Shipilev wrote: > Looking at this differently: if I wrote the `copy_memory` stub from scratch today, would I do this optimization? Answering personally, I probably would. I wonder if that's too low a bar for whether we should change it now, though. If we adopt "would I have done it this way, first time around?" 
as an appropriate threshold for a change, it gives permission for endless churn with tiny issues. Churn _in itself_ is bad.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21589#issuecomment-2426401929

From liach at openjdk.org Mon Oct 21 12:22:04 2024
From: liach at openjdk.org (Chen Liang)
Date: Mon, 21 Oct 2024 12:22:04 GMT
Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v20]
In-Reply-To:
References:
Message-ID:

On Fri, 18 Oct 2024 21:56:53 GMT, Shaojin Wen wrote:

>> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores.
>>
>> This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2.
>
> Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 26 additional commits since the last revision:
>
>  - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406
>  - fix build error
>  - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
>  - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
>  - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406
>  - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
>  - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
>  - revert test
>  - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
>  - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406
>  - ... and 16 more: https://git.openjdk.org/jdk/compare/492ea9a6...457735c9

Thanks for the evaluations.
I think we can use this unsafe version for now, and use direct writes once JIT can reliably eliminate bound checks. ------------- Marked as reviewed by liach (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19626#pullrequestreview-2381888080 From jbhateja at openjdk.org Mon Oct 21 12:25:37 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 21 Oct 2024 12:25:37 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v31] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. 
> - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Factor out IR tests and Transforms to follow-up PRs. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/dacc9313..7506ac14 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=29-30 Stats: 595 lines in 4 files changed: 0 ins; 595 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From duke at openjdk.org Mon Oct 21 12:26:18 2024 From: duke at openjdk.org (duke) Date: Mon, 21 Oct 2024 12:26:18 GMT Subject: Withdrawn: 8032218: Emit single post-constructor barrier for chain of superclass constructors In-Reply-To: <9Gvg3HswFVpqRnVZQ4HLf6WJKcM1_St_iVTC0GKhMgk=.b1081e57-b73e-43ee-ab1b-d8d6b0bf9cc5@github.com> References: <9Gvg3HswFVpqRnVZQ4HLf6WJKcM1_St_iVTC0GKhMgk=.b1081e57-b73e-43ee-ab1b-d8d6b0bf9cc5@github.com> Message-ID: On Fri, 19 Apr 2024 22:31:10 GMT, Joshua Cao wrote: > [C2 emits a StoreStore barrier for each constructor call](https://github.com/openjdk/jdk/blob/72ca7bafcd49a98c1fe09da72e4e47683f052e9d/src/hotspot/share/opto/parse1.cpp#L1016) in a chain of superclass constructor calls. It is unnecessary. We only need to emit a single barrier for each object allocation / each pair of `Allocation/InitializeNode`. 
> > [Macro expansion emits a trailing StoreStore after an InitializeNode](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/macro.cpp#L1610-L1628). This `StoreStore` is sufficient as the post-constructor barrier. From the [InitializeNode definition](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/memnode.cpp#L3639-L3642): > >> // An InitializeNode collects and isolates object initialization after > // an AllocateNode and before the next possible safepoint. As a > // memory barrier (MemBarNode), it keeps critical stores from drifting > // down past any safepoint or any publication of the allocation. > > This PR modifies `Parse::do_exits()` such that it only emits a barrier for a constructor if we find that the constructed object does not have an `InitializeNode`. It is possible that we cannot find an `InitializeNode` i.e. if the outermost method of the compilation unit is the constructor. We still need to emit a barrier in these cases. > > Passes hotspot tier1 locally on x86 linux machine. New tests make sure that there is a single `StoreStore` for chained constructors. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/18870 From jbhateja at openjdk.org Mon Oct 21 12:29:03 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 21 Oct 2024 12:29:03 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v30] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 09:12:25 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Replacing flag based checks with CPU feature checks in IR validation test. > > Wow this is really a very moving target - quite frustrating to review - it takes up way too much of the reviewers bandwidth. 
You really need to split up your PRs as much as possible so that review is easier and faster. > > I think these optimizations should be done in a separate PR. I see no reason why they need to be mixed in. > > https://github.com/openjdk/jdk/commit/c56508899b000b8b1eb6755c901798a2a3685ef5 The `UMinVNode::Ideal` etc changes with IR rules. > > I also cannot easily review just such a diff, it does not let me make comments - so I still have to go find the relevant code in the whole PR. > > Some comments on this section: > > > Node* UMinVNode::Ideal(PhaseGVN* phase, bool can_reshape) { > bool match1 = in(1)->Opcode() == Op_UMinV || in(1)->Opcode() == Op_UMaxV; > bool match2 = in(2)->Opcode() == Op_UMinV || in(2)->Opcode() == Op_UMaxV; > // UMin (UMin(a, b), UMax(a, b)) => UMin(a, b) > // UMin (UMin(a, b), UMax(b, a)) => UMin(a, b) > if (match1 && match2) { > if ((in(1)->in(1) == in(2)->in(1) && in(1)->in(2) == in(2)->in(2)) || > (in(1)->in(2) == in(2)->in(1) && in(1)->in(1) == in(2)->in(2))) { > return new UMinVNode(in(1)->in(1), in(1)->in(2), vect_type()); > } > } > return nullptr; > } > > > Are we sure we do not need to verify any types in all of these cases? Maybe not - but I'd rather be super sure - not that things get misinterpreted and then folded the wrong way. > > I mean if I now approve only that diff, then I still need to approve the whole PR, which means I need to spend a lot of time on everything again. Otherwise, in theory people could smuggle anything in. Hey @eme64 , > Wow this is really a very moving target - quite frustrating to review - it takes up way too much of the reviewers bandwidth. You really need to split up your PRs as much as possible so that review is easier and faster. I understand reviewer's pain, which is why I mentioned about last two changes specifically. Vector API related PRs generally looks bulky due to script generated sources and tests. Barring that it may not demand much of your time. 
But, to keep you motivated :-) and following @PaulSandoz's and your suggestions, I have moved the IR validations and Min / Max transforms out into the following follow-up PRs.

- https://bugs.openjdk.org/browse/JDK-8342676 (https://github.com/openjdk/jdk/pull/21604)
- https://bugs.openjdk.org/browse/JDK-8342677 (https://github.com/openjdk/jdk/pull/21603)

Can you kindly run this through your test infrastructure and approve if it goes fine?

Best Regards,
Jatin

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2426527459

From swen at openjdk.org Mon Oct 21 12:51:04 2024
From: swen at openjdk.org (Shaojin Wen)
Date: Mon, 21 Oct 2024 12:51:04 GMT
Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v20]
In-Reply-To:
References:
Message-ID:

On Fri, 18 Oct 2024 21:56:53 GMT, Shaojin Wen wrote:

>> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores.
>>
>> This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2.
>
> Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 26 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - fix build error > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - revert test > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - ... and 16 more: https://git.openjdk.org/jdk/compare/02b43061...457735c9 This PR requires PR #19970 to support Unsafe.putByte MergeStore for better performance. I will add more test scenarios to TestMergeStores.java or MergeStoreBench later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2426575966 From ihse at openjdk.org Mon Oct 21 13:01:41 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 21 Oct 2024 13:01:41 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 04:11:03 GMT, Jasmine Karthikeyan wrote: > Hi all, > This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. 
The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. > > In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! Build changes look good (but would be slightly better without the extra blank line). I have not reviewed the actual hotspot changes. make/hotspot/gensrc/GensrcAdlc.gmk line 60: > 58: > 59: ADLC_CFLAGS += -D$(HOTSPOT_TARGET_CPU_DEFINE) > 60: Maybe skip this blank line? src/hotspot/cpu/aarch64/c2_lowering_aarch64.cpp line 31: > 29: Node* PhaseLowering::lower_node(Node* in) { > 30: return nullptr; > 31: } Note that this, and several other of the new files, are missing a trailing newline on the last line (marked by the red circle/dash icon). I thought this was checked by jcheck, but apparently not. It is still not recommended, though. ------------- Marked as reviewed by ihse (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21599#pullrequestreview-2381995734 PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1808752950 PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1808763526 From shade at openjdk.org Mon Oct 21 13:03:20 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 21 Oct 2024 13:03:20 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory In-Reply-To: References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> Message-ID: <7J1_9D2X-EMnq6JdM4brCmFKVwlU3JClJ8VjfZ8m6uk=.3fca14fb-7c6a-447f-bbc8-30edc33ca3cc@github.com> On Mon, 21 Oct 2024 11:25:50 GMT, Andrew Haley wrote: > > Looking at this differently: if I wrote the `copy_memory` stub from scratch today, would I do this optimization? Answering personally, I probably would. > > I wonder if that's too low a bar for whether we should change it now, though. If we adopt "would I have done it this way, first time around?" as an appropriate threshold for a change, it gives permission for endless churn with tiny issues. I don't think about this in slippery slope terms. There is always a cost/benefit calculation for every change. Contributors' costs are not zero (see the testing that goes into these), which is a part of back-pressure against doing this often. Reviewers costs are not zero either, but that cost is in control of reviewers -- _that's us!_ -- themselves. In that sense, tiny patches become a problem only when we collectively spend disproportionate amount of time on them, like in this PR. I advocate for taking low-benefit patches quickly, without accidentally inflating their costs. With regards for post-integration touchups, I think the reverse policy would be worse, if not straight-up chilling: we will have to be extra-stringent on getting every little detail absolutely, un-changeably right during every review, if the amendments would not pass the contribution bar. 
I don't believe this is the outcome we want either.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21589#issuecomment-2426608041

From shade at openjdk.org Mon Oct 21 13:18:33 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 21 Oct 2024 13:18:33 GMT
Subject: RFR: 8342540: InterfaceCalls micro-benchmark gives misleading results [v2]
In-Reply-To:
References:
Message-ID: <1r0BjzAPun6IchWY-06Y1vnxsZCBKekEgucBzRMFNJ4=.c53b1dc3-f91e-4dd2-9a1f-a6302af96875@github.com>

On Fri, 18 Oct 2024 16:18:02 GMT, Andrew Haley wrote:

>> `InterfaceCalls.java` makes highly predictable memory accesses, which leads to a gross time underestimate of the case where a megamorphic access is unpredictable.
>>
>> Here's one example, with and without randomization. The unpredictable megamorphic call takes more than 4* as long as the benchmark.
>>
>> Benchmark                        (randomized)  Mode  Cnt   Score   Error  Units
>> InterfaceCalls.test2ndInt3Types         false  avgt    4   5.013 ± 0.081  ns/op
>> InterfaceCalls.test2ndInt3Types          true  avgt    4  23.421 ± 0.102  ns/op
>>
>> This patch adds the "randomized" parameter, which allows the measurement of predictable and unpredictable megamorphic calls.
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update test/micro/org/openjdk/bench/vm/compiler/InterfaceCalls.java
>
>   Co-authored-by: Aleksey Shipilëv

Marked as reviewed by shade (Reviewer).
-------------

PR Review: https://git.openjdk.org/jdk/pull/21581#pullrequestreview-2382078512

From shade at openjdk.org Mon Oct 21 13:18:33 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 21 Oct 2024 13:18:33 GMT
Subject: RFR: 8342540: InterfaceCalls micro-benchmark gives misleading results [v2]
In-Reply-To:
References: <9FV9ocK5WUg_Oer8NsJ1nHAatz7omIYcHH4ltbAvv-I=.2a4d5b2e-e446-4824-a76d-8458013692c5@github.com>
Message-ID:

On Mon, 21 Oct 2024 11:16:51 GMT, Andrew Haley wrote:

> I don't think we assume it: the analysis is in Marsaglia's _Xorshift RNGs_ paper. The full period (except 0) is proven there.

Yeah, all right. Seeing is believing, and indeed the distributions look okay:

    import java.util.Arrays;

    public class XorShiftHisto {
        static int l;

        // Advance the generator state and map it into [0, range).
        static int step(int range) {
            l = scramble(l);
            return (l & Integer.MAX_VALUE) % range;
        }

        // One xorshift round (shifts 13, 17, 5); 0 is remapped to 1
        // because 0 is a fixed point of the xorshift step.
        static int scramble(int n) {
            int x = n;
            x ^= x << 13;
            x ^= x >>> 17;
            x ^= x << 5;
            return x == 0 ? 1 : x;
        }

        public static void main(String... args) {
            for (int range = 1; range <= 10; range++) {
                l = 0;
                int[] histo = new int[range];
                for (int c = 0; c < 1000000; c++) {
                    histo[step(range)]++;
                }
                System.out.println("Histo " + range + ": " + Arrays.toString(histo));
            }
        }
    }

    $ java XorShiftHisto.java
    Histo 1: [1000000]
    Histo 2: [500093, 499907]
    Histo 3: [333036, 333888, 333076]
    Histo 4: [250116, 249824, 249977, 250083]
    Histo 5: [200577, 199994, 199940, 199945, 199544]
    Histo 6: [166273, 166493, 166425, 166763, 167395, 166651]
    Histo 7: [142746, 142358, 142815, 143556, 143004, 142908, 142613]
    Histo 8: [124841, 124510, 125431, 124673, 125275, 125314, 124546, 125410]
    Histo 9: [110846, 111088, 110885, 110834, 111145, 111201, 111356, 111655, 110990]
    Histo 10: [100197, 100061, 99901, 99953, 100070, 100380, 99933, 100039, 99992, 99474]

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21581#issuecomment-2426651581

From chagedorn at openjdk.org Mon Oct 21 13:45:11 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 21 Oct 2024 13:45:11 GMT
Subject: RFR: 8342043: Split Opaque4Node into OpaqueTemplateAssertionPredicateNode and OpaqueNotNullNode
Message-ID:

### Two Uses of `Opaque4`
The `Opaque4` node is currently used for two things:
1. Non-null checks of intrinsics to accompany a cast to non-null where we implicitly know that an object is non-null but C2 does not. This is required to fold control away if the cast becomes top at some point.
2. Template Assertion Predicates

### How to Differentiate between Uses
The only reliable way to differentiate between the two uses is to check if there are `OpaqueLoop*Nodes` above the `Opaque4` node, which should only be found for a Template Assertion Predicate. This can be done with the method `assertion_predicate_has_loop_opaque_node()`, which follows the inputs of the `Opaque4` recursively.

### Problems by Sharing `Opaque4` Nodes for Two Concepts
This sharing of the `Opaque4` comes with some problems:
- One needs to be careful when checking for `Opaque4` nodes whether the code can be applied for non-null checks and/or Template Assertion Predicates. Ideally, we should add assertions to exclude the unexpected case (mostly done).
- Walking the graph with `assertion_predicate_has_loop_opaque_node()` has some overhead, especially when checking the negation: in the worst case we could walk quite a few nodes.
- There is no real benefit of (re)using `Opaque4` nodes for two concepts, and the two uses could easily be separated.

### Split `Opaque4` into Two Classes to Separate Uses
Therefore, this patch proposes to split `Opaque4` into a new `OpaqueNotNull` and an `OpaqueTemplateAssertionPredicate` class (the latter accompanies the already split off `OpaqueInitializedAssertionPredicate` class). I went through all the uses and updated them accordingly. As a consequence, I could turn `assertion_predicate_has_loop_opaque_node()` into a debug only method since it's now only used in assertion code.
I additionally removed the second input of the `Opaque4` nodes since these nodes have always eventually been replaced by true.

### Turning `OpaqueTemplateAssertionPredicate` into a Non-Macro Node
The `Opaque4` was a macro node. The `OpaqueNotNull` still is but the `OpaqueTemplateAssertionPredicate` is turned into a non-macro node and is removed after loop opts. At that point, loops can no longer be split and we do not need to create Initialized Assertion Predicates from the templates anymore. Thus, they can be cleaned up in the post loop opts IGVN.

### Pre-Requisite for Replacing Uncommon Traps with Halt Nodes for Template Assertion Predicates
Eventually, I want to get rid of UCTs for Template Assertion Predicates completely. They have been used out of convenience to reuse code in Loop Predication. But there is a problem when copying them to loops where we do not have traps (for example, for the remaining loop when peeling one iteration off or when copying them to the main loop). We cannot use UCTs anymore and need to fall back to `Halt` nodes. Supporting both the UCT and the `Halt` node format for Template Assertion Predicates is difficult and does not really give us a benefit. Therefore, I want to get rid of UCTs and only use `Halt` nodes with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047). This patch lays the foundation for that change by keeping Template Assertion Predicates easy to detect.
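
For readers skimming the archive, the effect of the split on detection can be sketched in a toy Java model. This is only an illustration under stated assumptions: every class and method name below (`Node`, `OpaqueLoopNode`, `isTemplateBeforeSplit`, `isTemplateAfterSplit`, and so on) is a hypothetical stand-in, not HotSpot's C++ code. With one shared class, telling the two uses apart requires walking the inputs; with two classes, a type check is enough.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: these classes are hypothetical stand-ins for C2 nodes,
// not the real HotSpot C++ classes.
public class OpaqueSplitSketch {
    static class Node {
        final List<Node> inputs;
        Node(Node... ins) { this.inputs = Arrays.asList(ins); }
    }

    // Stand-in for the OpaqueLoop*Nodes found above a Template Assertion Predicate.
    static class OpaqueLoopNode extends Node { }

    // Before the split: one class shared by both concepts.
    static class Opaque4 extends Node {
        Opaque4(Node in) { super(in); }
    }

    // After the split: one class per concept.
    static class OpaqueNotNull extends Node {
        OpaqueNotNull(Node in) { super(in); }
    }

    static class OpaqueTemplateAssertionPredicate extends Node {
        OpaqueTemplateAssertionPredicate(Node in) { super(in); }
    }

    // Recursive input walk, analogous in spirit to
    // assertion_predicate_has_loop_opaque_node().
    static boolean hasLoopOpaqueAbove(Node n) {
        if (n instanceof OpaqueLoopNode) {
            return true;
        }
        for (Node in : n.inputs) {
            if (hasLoopOpaqueAbove(in)) {
                return true;
            }
        }
        return false;
    }

    // Shared class: detection needs the graph walk.
    static boolean isTemplateBeforeSplit(Node n) {
        return n instanceof Opaque4 && hasLoopOpaqueAbove(n);
    }

    // Separate classes: detection is a plain type check.
    static boolean isTemplateAfterSplit(Node n) {
        return n instanceof OpaqueTemplateAssertionPredicate;
    }

    public static void main(String[] args) {
        Node loopCmp = new Node(new OpaqueLoopNode()); // condition built on a loop opaque node
        Node nullCmp = new Node();                     // condition without one
        System.out.println(isTemplateBeforeSplit(new Opaque4(loopCmp)));
        System.out.println(isTemplateBeforeSplit(new Opaque4(nullCmp)));
        System.out.println(isTemplateAfterSplit(new OpaqueTemplateAssertionPredicate(loopCmp)));
        System.out.println(isTemplateAfterSplit(new OpaqueNotNull(nullCmp)));
    }
}
```

In this model the walk survives only as a cross-check, which mirrors the patch turning the real method into debug-only code.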
Thanks,
Christian

------------- Commit messages:
- Fix whitespaces
- clean-up
- 8342043: Split Opaque4Node into OpaqueTemplateAssertionPredicateNode and OpaqueNotNullNode

Changes: https://git.openjdk.org/jdk/pull/21608/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21608&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8342043
Stats: 237 lines in 18 files changed: 66 ins; 19 del; 152 mod
Patch: https://git.openjdk.org/jdk/pull/21608.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21608/head:pull/21608
PR: https://git.openjdk.org/jdk/pull/21608

From chagedorn at openjdk.org Mon Oct 21 13:45:14 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 21 Oct 2024 13:45:14 GMT
Subject: RFR: 8342043: Split Opaque4Node into OpaqueTemplateAssertionPredicateNode and OpaqueNotNullNode
In-Reply-To:
References:
Message-ID:

On Mon, 21 Oct 2024 12:46:16 GMT, Christian Hagedorn wrote:

> ### Two Uses of `Opaque4`
> The `Opaque4` node is currently used for two things:
> 1. Non-null checks of intrinsics to accompany a cast to non-null where we implicitly know that an object is non-null but C2 does not. This is required to fold control away if the cast becomes top at some point.
> 2. Template Assertion Predicates
>
> ### How to Differentiate between Uses
> The only reliable way to differentiate between the two uses is to check if there are `OpaqueLoop*Nodes` above the `Opaque4` node which should only be found for a Template Assertion Predicate. This can be done with the method `assertion_predicate_has_loop_opaque_node()` which follows the inputs of the `Opaque4` recursively.
>
> ### Problems by Sharing `Opaque4` Nodes for Two Concepts
> This sharing of the `Opaque4` comes with some problems:
> - One needs to be careful when checking for `Opaque4` nodes whether the code can be applied to non-null checks and/or Template Assertion Predicates. Ideally, we should add assertions to exclude the unexpected case (mostly done).
> - Walking the graph with `assertion_predicate_has_loop_opaque_node()` has some overhead. Especially when checking the negation, we could walk quite a few nodes in the worst case.
> - There is no real benefit in (re)using `Opaque4` nodes for two concepts, and the two could easily be separated.
>
> ### Split `Opaque4` into Two Classes to Separate Uses
> Therefore, this patch proposes to split `Opaque4` into a new `OpaqueNotNull` and an `OpaqueTemplateAssertionPredicate` class (the latter accompanies the already split off `OpaqueInitializedAssertionPredicate` class). I went through all the uses and updated them accordingly.
>
> As a consequence, I could turn `assertion_predicate_has_loop_opaque_node()` into a debug-only method since it's now only used in assertion code. I additionally removed the second input of the `Opaque4` nodes since these nodes have always eventually been replaced by true.
>
> ### Turning `OpaqueTemplateAssertionPredicate` into a Non-Macro Node
> The `Opaque4` was a macro node. The `OpaqueNotNull` still is but the `OpaqueTemplateAssertionPredicate` is turned into a non-macro node and is removed after loop opts. At that point, loops can no longer be split and we do not need to create Initialized Assertion Predicates from the templates anymore. Thus, they can be cleaned up in the post loop opts IGVN.
>
> ### Pre-Requisite for Replacing Uncommon Traps with Halt Nodes for Template Assertion Predicates
> Eventually, I want to get rid of UCTs for Template Assertion Pre...

src/hotspot/share/opto/escape.cpp line 584:

> 582: can_reduce = (opc == Op_CmpP || opc == Op_CmpN) && can_reduce_cmp(n, iff_cmp);
> 583: } else {
> 584: assert(iff->in(1)->is_OpaqueNotNull(), "must be OpaqueNotNull");

I don't think we can have a Template Assertion Predicate here, so I added an assertion. If this ever fails, we can just remove the assertion and add a test for it.
src/hotspot/share/opto/loopTransform.cpp line 1195: > 1193: bol->is_OpaqueTemplateAssertionPredicate() || > 1194: bol->is_OpaqueInitializedAssertionPredicate(), > 1195: "Opaque node of a non-null-check or an Assertion Predicate"); Need case for both new opaque nodes (hit during testing). src/hotspot/share/opto/loopopts.cpp line 2205: > 2203: // split if to break. > 2204: assert(!use->is_OpaqueTemplateAssertionPredicate(), > 2205: "should not clone a Template Assertion Predicate which should be removed once it's useless"); `OpaqueTemplateAssertionPredicate` nodes should only have a single unique use. I therefore added this assert here. src/hotspot/share/opto/macro.cpp line 2431: > 2429: default: > 2430: assert(n->Opcode() == Op_LoopLimit || > 2431: n->is_OpaqueNotNull() || `OpaqueTemplateAssertionPredicate` is not a macro node. src/hotspot/share/opto/predicates.cpp line 146: > 144: } > 145: IfNode* if_node = node->in(0)->as_If(); > 146: return if_node->in(1)->is_OpaqueTemplateAssertionPredicate(); Detection is now easier. I plan to add more verification code that we also find the correct nodes on the failing path. But I'll wait with that until we have only Halt nodes for Template Assertion Predicates. src/hotspot/share/opto/split_if.cpp line 326: > 324: Node* use = bol->unique_out(); > 325: if (use->is_OpaqueNotNull() || use->is_OpaqueTemplateAssertionPredicate() || > 326: use->is_OpaqueInitializedAssertionPredicate()) { Code is executed with both new opaque nodes (hit during testing). Same below. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21608#discussion_r1808732859
PR Review Comment: https://git.openjdk.org/jdk/pull/21608#discussion_r1808753732
PR Review Comment: https://git.openjdk.org/jdk/pull/21608#discussion_r1808735284
PR Review Comment: https://git.openjdk.org/jdk/pull/21608#discussion_r1808736365
PR Review Comment: https://git.openjdk.org/jdk/pull/21608#discussion_r1808740425
PR Review Comment: https://git.openjdk.org/jdk/pull/21608#discussion_r1808745488

From duke at openjdk.org Mon Oct 21 13:53:26 2024
From: duke at openjdk.org (Tomáš Zezula)
Date: Mon, 21 Oct 2024 13:53:26 GMT
Subject: Integrated: 8342295: compiler/jvmci/TestJVMCISavedProperties.java fails due to garbage in output
In-Reply-To:
References:
Message-ID:

On Fri, 18 Oct 2024 13:54:14 GMT, Tomáš Zezula wrote:

> The `compiler/jvmci/TestJVMCISavedProperties` test fails due to overlapping output from the saved system properties. The initialization of `savedProperties` in `jdk.vm.ci.services.Services` is correctly synchronized, so the issue suggests that two separate libjvmci compiler isolates are each printing their own set of saved properties.
>
> In a successful test run, the `CompileBroker` thread aborts the VM before it completes initialization, displaying the error message `Cannot use JVMCI compiler: Value of jvmci.Compiler is "null"` (due to the `-Djvmci.Compiler=null` setting), and the message `DONE IN MAIN` is never printed. However, in the failed test output, the `DONE IN MAIN` message appears, indicating that the VM initialization completed and created the `JVMCIRuntime` instance. The `CompileBroker` thread might have concurrently initialized `JVMCIRuntime` in another isolate. Since each `JVMCIRuntime` initialization outputs system properties, this is likely the cause of the overlapping output.
>
> The proposed solution is to use the `-XX:+EnableJVMCI` flag instead of `-XX:+UseJVMCICompiler`, to avoid this issue.
This pull request has now been integrated. Changeset: 330f2b5a Author: Tomas Zezula Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/330f2b5a9cad02b8e6882fc6eee996d7792d3de1 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod 8342295: compiler/jvmci/TestJVMCISavedProperties.java fails due to garbage in output Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/21583 From aph at openjdk.org Mon Oct 21 14:05:16 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 21 Oct 2024 14:05:16 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory In-Reply-To: <7J1_9D2X-EMnq6JdM4brCmFKVwlU3JClJ8VjfZ8m6uk=.3fca14fb-7c6a-447f-bbc8-30edc33ca3cc@github.com> References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> <7J1_9D2X-EMnq6JdM4brCmFKVwlU3JClJ8VjfZ8m6uk=.3fca14fb-7c6a-447f-bbc8-30edc33ca3cc@github.com> Message-ID: On Mon, 21 Oct 2024 12:59:28 GMT, Aleksey Shipilev wrote: > I don't think about this in slippery slope terms. There is always a cost/benefit calculation for every change. Contributors' costs are not zero (see the testing that goes into these), which is a part of back-pressure against doing this often. Reviewers costs are not zero either, but that cost is in control of reviewers -- _that's us!_ -- themselves. > In that sense, tiny patches become a problem only when we collectively spend disproportionate amount of time on them, like in this PR. I'm not sure about that. Even if you were to discount the cost of the PR review process entirely, there's the long-term cost of churn in the change logs, etc. We've all done archaeology, trying to figure out why and when a change was made, and every patch costs something. I certainly would much prefer to read a cleanup patch with twenty small changes in it than twenty small patches scattered in between more significant contributions. 
> I advocate for taking low-benefit patches quickly, without accidentally inflating their costs.

I think that you are at one end of the range of opinions, judging by the other comments to this post. I held a workshop session about low-benefit patches at the most recent committers' workshop, and there was a good deal of opinion, with many people advocating some pushback against them.

> With regard to post-integration touchups, I think the reverse policy would be worse, if not straight-up chilling: we will have to be extra-stringent on getting every little detail absolutely, un-changeably right during every review, if the amendments would not pass the contribution bar. I don't believe this is the outcome we want either.

True enough, but I don't think anyone was proposing a change to the _de facto_ current policy. @dean-long and @rose00 were raising a skeptical eyebrow about this change.

------------- PR Comment: https://git.openjdk.org/jdk/pull/21589#issuecomment-2426778619

From dlunden at openjdk.org Mon Oct 21 14:13:05 2024
From: dlunden at openjdk.org (Daniel Lundén)
Date: Mon, 21 Oct 2024 14:13:05 GMT
Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032
Message-ID:

Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced the space available for incoming/outgoing method arguments in register masks, due to unfortunate rounding in the register mask size computation.

### Changeset
- Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4.
- Add a regression test.

### Testing
- [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131)
- `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
- C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
There is no observable difference in C2 compilation time. ------------- Commit messages: - Add one more word for incoming and outgoing registers Changes: https://git.openjdk.org/jdk/pull/21612/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21612&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342156 Stats: 50 lines in 2 files changed: 49 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21612.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21612/head:pull/21612 PR: https://git.openjdk.org/jdk/pull/21612 From mdoerr at openjdk.org Mon Oct 21 14:16:28 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 21 Oct 2024 14:16:28 GMT Subject: RFR: 8342701: [PPC64] TestOSRLotsOfLocals.java crashes Message-ID: Fix for "assert(nbits == 32 || (-(1 << (nbits-1)) <= x && x < (1 << (nbits-1)))) failed: value out of range" in new test "TestOSRLotsOfLocals" (see JBS). ------------- Commit messages: - 8342701: [PPC64] TestOSRLotsOfLocals.java crashes Changes: https://git.openjdk.org/jdk/pull/21613/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21613&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342701 Stats: 19 lines in 1 file changed: 16 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21613.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21613/head:pull/21613 PR: https://git.openjdk.org/jdk/pull/21613 From mdoerr at openjdk.org Mon Oct 21 14:37:06 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 21 Oct 2024 14:37:06 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms Message-ID: There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values, so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. 
(Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.)

Example output (linux):

Registers:
RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340
RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000
R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120
R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0
RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006
TRAPNO=0x000000000000000e

XMM[0]=0x0000000000000000 0x0000000000000000
XMM[1]=0x00007fea3c034200 0x0000000000000000
XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0
XMM[3]=0x00007fea7c3d6608 0x0000000000000000
XMM[4]=0x00007f0000000000 0x0000000000000000
XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff
XMM[6]=0x0000000000000000 0x00007fea897d0f98
XMM[7]=0x0202020202020202 0x0000000000000000
XMM[8]=0x0000000000000000 0x0202020202020202
XMM[9]=0x666e69206e6f6974 0x0000000000000000
XMM[10]=0x0000000000000000 0x6e6f6974616d726f
XMM[11]=0x0000000000000001 0x0000000000000000
XMM[12]=0x00007fea8b684400 0x0000000000000001
XMM[13]=0x0000000000000000 0x0000000000000000
XMM[14]=0x0000000000000000 0x0000000000000000
XMM[15]=0x0000000000000000 0x0000000000000000
MXCSR=0x0000037f

------------- Commit messages:
- 8342607: Enhance register printing on x86_64 platforms

Changes: https://git.openjdk.org/jdk/pull/21615/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21615&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8342607
Stats: 20 lines in 2 files changed: 20 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/21615.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21615/head:pull/21615
PR: https://git.openjdk.org/jdk/pull/21615

From roland at openjdk.org Mon Oct 21 14:53:30 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 21
Oct 2024 14:53:30 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v26] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Mon, 21 Oct 2024 06:58:38 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn Marked as reviewed by roland (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18489#pullrequestreview-2382366567 From kxu at openjdk.org Mon Oct 21 15:00:30 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 21 Oct 2024 15:00:30 GMT Subject: Integrated: 8328528: C2 should optimize long-typed parallel iv in an int counted loop In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Tue, 26 Mar 2024 14:43:42 GMT, Kangcheng Xu wrote: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. This pull request has now been integrated. 
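
As a rough illustration of the loop shape JDK-8328528 is about, the sketch below shows an int counted loop that carries a long-typed variable advancing by a fixed stride in lockstep with the counter, next to a version where that variable is derived from the counter instead. This is a conceptual picture only: the class and method names are made up for the example, and it does not claim to show the exact code C2 produces.

```java
public class LongParallelIvSketch {
    // Loop as a programmer might write it: `j` is a long-typed variable that
    // advances by `stride` on every iteration of the int counted loop.
    static long withParallelIv(int n, long init, long stride) {
        long sum = 0;
        long j = init;
        for (int i = 0; i < n; i++) {
            sum += j;
            j += stride;
        }
        return sum;
    }

    // Conceptually equivalent loop: `j` is no longer carried across
    // iterations but computed from the int counter `i`.
    static long withDerivedIv(int n, long init, long stride) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            long j = init + (long) i * stride;
            sum += j;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Both forms agree, including under long overflow, since long
        // arithmetic wraps modulo 2^64 in either formulation.
        System.out.println(withParallelIv(1000, 5L, 3L) == withDerivedIv(1000, 5L, 3L));
        System.out.println(withParallelIv(1000, Long.MAX_VALUE, 7L) == withDerivedIv(1000, Long.MAX_VALUE, 7L));
    }
}
```

The cast `(long) i` matters in the derived form: multiplying in int first would overflow at a different point than the long-typed accumulation.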
Changeset: 80ec5522 Author: Kangcheng Xu URL: https://git.openjdk.org/jdk/commit/80ec552248470dda2d0d003be9315e9e39eb5276 Stats: 492 lines in 3 files changed: 456 ins; 1 del; 35 mod 8328528: C2 should optimize long-typed parallel iv in an int counted loop Reviewed-by: roland, chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/18489 From sviswanathan at openjdk.org Mon Oct 21 15:01:27 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 21 Oct 2024 15:01:27 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v4] In-Reply-To: References: <9LrhezvsYwS32PEUN9wn6hKDJJn0wybl3YXSHuohUC8=.eded7969-c03c-4a75-a2c2-2d0e9682722d@github.com> Message-ID: On Fri, 18 Oct 2024 06:55:37 GMT, Tobias Hartmann wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolution > > Sorry for the delay. I re-submitted testing with the latest version and it all passed. Thanks a lot @TobiHartmann. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21480#issuecomment-2426933331 From sviswanathan at openjdk.org Mon Oct 21 15:01:27 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 21 Oct 2024 15:01:27 GMT Subject: Integrated: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 23:27:35 GMT, Sandhya Viswanathan wrote: > When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. > Also a regression test case is added accordingly. > > Best Regards, > Sandhya This pull request has now been integrated. 
Changeset: 153ad911 Author: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/153ad911f9fa3389ab92a1acab44526e3f4be4a2 Stats: 31 lines in 3 files changed: 24 ins; 3 del; 4 mod 8338126: C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 Reviewed-by: thartmann, jbhateja, epeter ------------- PR: https://git.openjdk.org/jdk/pull/21480 From duke at openjdk.org Mon Oct 21 15:37:24 2024 From: duke at openjdk.org (duke) Date: Mon, 21 Oct 2024 15:37:24 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v5] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 23:13:45 GMT, Smita Kamath wrote: >> Hi, I want to submit an optimization for SHA-512 algorithm using SHA instructions (sha512msg1, sha512msg2 and sha512rnds2) . Kindly review the code and provide feedback. Thank you. > > Smita Kamath has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge branch 'master' of https://git.openjdk.org/jdk into sha-512 > - Updated code as per review comments > - Addressed a review comment > - Updated code as per review comment & updated test case > - Updated AMD64.java > - Merge master > - SHA-512 implementation using SHA-NI instructions @smita-kamath Your change (at version af309deb89fe33a60c635f7a2269858dc1f757c2) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20633#issuecomment-2427030286 From svkamath at openjdk.org Mon Oct 21 15:40:19 2024 From: svkamath at openjdk.org (Smita Kamath) Date: Mon, 21 Oct 2024 15:40:19 GMT Subject: Integrated: 8341052: SHA-512 implementation using SHA-NI In-Reply-To: References: Message-ID: On Mon, 19 Aug 2024 21:34:05 GMT, Smita Kamath wrote: > Hi, I want to submit an optimization for SHA-512 algorithm using SHA instructions (sha512msg1, sha512msg2 and sha512rnds2) . Kindly review the code and provide feedback. Thank you. 
This pull request has now been integrated. Changeset: 18bcbf79 Author: Smita Kamath URL: https://git.openjdk.org/jdk/commit/18bcbf7941f7567449983b3f317401efb3e34d39 Stats: 271 lines in 10 files changed: 252 ins; 11 del; 8 mod 8341052: SHA-512 implementation using SHA-NI Reviewed-by: jbhateja, ascarpino, sviswanathan, sparasa ------------- PR: https://git.openjdk.org/jdk/pull/20633 From duke at openjdk.org Mon Oct 21 15:54:48 2024 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 21 Oct 2024 15:54:48 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory [v2] In-Reply-To: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> Message-ID: > [JDK-8342601](https://bugs.openjdk.org/browse/JDK-8342601) > > Fix minor inefficiency in `copy_memory` by adding check before doing bit shift to see if we are able to do a move instruction instead. 
Change is low risk because of the low complexity of the change > > Ran array copy and tier 1 on aarch64 machine > > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/arraycopy 49 49 0 0 > ============================== > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier1 2591 2591 0 0 > jtreg:test/jdk:tier1 2436 2436 0 0 > jtreg:test/langtools:tier1 4577 4577 0 0 > jtreg:test/jaxp:tier1 0 0 0 0 > jtreg:test/lib-test:tier1 34 34 0 0 > ============================== Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Fix shift check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21589/files - new: https://git.openjdk.org/jdk/pull/21589/files/0d59131d..efae62ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21589&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21589&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21589/head:pull/21589 PR: https://git.openjdk.org/jdk/pull/21589 From duke at openjdk.org Mon Oct 21 16:03:23 2024 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 21 Oct 2024 16:03:23 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory [v3] In-Reply-To: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> Message-ID: > [JDK-8342601](https://bugs.openjdk.org/browse/JDK-8342601) > > Fix minor inefficiency in `copy_memory` by adding check before doing bit shift to see if we are able to do a move instruction instead. 
Change is low risk because of the low complexity of the change > > Ran array copy and tier 1 on aarch64 machine > > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/arraycopy 49 49 0 0 > ============================== > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier1 2591 2591 0 0 > jtreg:test/jdk:tier1 2436 2436 0 0 > jtreg:test/langtools:tier1 4577 4577 0 0 > jtreg:test/jaxp:tier1 0 0 0 0 > jtreg:test/lib-test:tier1 34 34 0 0 > ============================== Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Update check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21589/files - new: https://git.openjdk.org/jdk/pull/21589/files/efae62ef..461e2a53 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21589&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21589&range=01-02 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21589/head:pull/21589 PR: https://git.openjdk.org/jdk/pull/21589 From shade at openjdk.org Mon Oct 21 16:03:24 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 21 Oct 2024 16:03:24 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory [v3] In-Reply-To: References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> Message-ID: On Mon, 21 Oct 2024 16:00:49 GMT, Chad Rakoczy wrote: >> [JDK-8342601](https://bugs.openjdk.org/browse/JDK-8342601) >> >> Fix minor inefficiency in `copy_memory` by adding check before doing bit shift to see if we are able to do a move instruction instead. 
Change is low risk because of the low complexity of the change >> >> Ran array copy and tier 1 on aarch64 machine >> >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/arraycopy 49 49 0 0 >> ============================== >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier1 2591 2591 0 0 >> jtreg:test/jdk:tier1 2436 2436 0 0 >> jtreg:test/langtools:tier1 4577 4577 0 0 >> jtreg:test/jaxp:tier1 0 0 0 0 >> jtreg:test/lib-test:tier1 34 34 0 0 >> ============================== > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Update check Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21589#pullrequestreview-2382552270 From shade at openjdk.org Mon Oct 21 16:03:24 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 21 Oct 2024 16:03:24 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory In-Reply-To: References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> <7J1_9D2X-EMnq6JdM4brCmFKVwlU3JClJ8VjfZ8m6uk=.3fca14fb-7c6a-447f-bbc8-30edc33ca3cc@github.com> Message-ID: <2e9SWLUxAMSPQFEvDzKscb2wO1uHTvc2TL3mKrCBVZQ=.ae7eb2ec-3e1a-4152-b6f4-1b9dd62e3f02@github.com> On Mon, 21 Oct 2024 14:02:40 GMT, Andrew Haley wrote: > I'm not sure about that. Even if you were to discount the cost of the PR review process entirely, there's the long-term cost of churn in the change logs, etc. We've all done archaeology, trying to figure out why and when a change was made, and every patch costs something. I certainly would much prefer to read a cleanup patch with twenty small changes in it than twenty small patches scattered in between more significant contributions. Noted. 
Speaking of archaeology, nearly all of my archaeological digs start either with `git bisect`, or with "History for selection" for a particular hunk. In both these steps having 20 atomic commits is palpably easier to deal with than with 20 small cleanups clobbered together. Anyway, I still believe that patch is good, that we are spending way too much time here, so I am approving and moving on. If nobody else agrees, I am OK with abandoning this patch as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21589#issuecomment-2427085780 From duke at openjdk.org Mon Oct 21 16:39:40 2024 From: duke at openjdk.org (duke) Date: Mon, 21 Oct 2024 16:39:40 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v7] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 23:42:04 GMT, hanklo6 wrote: >> Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
>> >> ### Generate test instructions >> With `binutils = 2.43` >> * `python3 x86-asmtest.py > asmtest.out.h` >> ### Run test >> * `make test TEST="gtest:AssemblerX86"` > > hanklo6 has updated the pull request incrementally with two additional commits since the last revision: > > - Refactor > - Add missing instructions @hanklo6 Your change (at version b0f60df7669d0cade6d9273cd6374a3950c6a160) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20857#issuecomment-2427174844 From duke at openjdk.org Mon Oct 21 16:49:27 2024 From: duke at openjdk.org (hanklo6) Date: Mon, 21 Oct 2024 16:49:27 GMT Subject: Integrated: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions In-Reply-To: References: Message-ID: <6VuqG4tUXNKjv7gsc_B8e8XpAan4vcJmY4tMrqb8pyo=.0390d37a-b626-424a-b4ff-24bc77201ba9@github.com> On Wed, 4 Sep 2024 16:44:57 GMT, hanklo6 wrote: > Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > ### Generate test instructions > With `binutils = 2.43` > * `python3 x86-asmtest.py > asmtest.out.h` > ### Run test > * `make test TEST="gtest:AssemblerX86"` This pull request has now been integrated. 
Changeset: 52d752c4 Author: hanklo6 URL: https://git.openjdk.org/jdk/commit/52d752c43b3a9935ea97051c39adf381084035cc Stats: 85016 lines in 3 files changed: 85016 ins; 0 del; 0 mod 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions Reviewed-by: jbhateja, sviswanathan, kvn ------------- PR: https://git.openjdk.org/jdk/pull/20857 From dhanalla at openjdk.org Mon Oct 21 17:04:25 2024 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Mon, 21 Oct 2024 17:04:25 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v2] In-Reply-To: <1wvg7YCL6ne-5LEBQ7Mi7fkrVn-d72W9_UcsrzvKho8=.d29bb5c9-9e83-42a7-a937-c304efb3b4dd@github.com> References: <1wvg7YCL6ne-5LEBQ7Mi7fkrVn-d72W9_UcsrzvKho8=.d29bb5c9-9e83-42a7-a937-c304efb3b4dd@github.com> Message-ID: On Wed, 18 Sep 2024 16:07:49 GMT, Dhamoder Nalla wrote: >> Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: >> >> add test case > > adding a comment to keep the PR active. > > > > > Hi @dhanalla, this is not the right way to handle this assertion failure. The assertion is here to catch real issues when creating too many nodes due to a bug in the code. For example, in [JDK-8256934](https://bugs.openjdk.org/browse/JDK-8256934), we hit this assert due to an inefficient cloning algorithm in Partial Peeling. We should not remove the assert. > > > > > For such bugs, you first need to investigate why we hit the node limit with your reproducer. Once you find the problem, it can usually be put into one of the following categories: > > > > > > > > > > 1. We have a real bug and by fixing it, we no longer create this many nodes. > > > > > 2. 
It is a false-positive and it is expected to create this many nodes (note that the node limit of 80000 is quite large, so it needs to be explained well why it is a false-positive - more often than not, there is still a bug somewhere that is first missed). > > > > > 3. We have a real bug but the fix is too hard, risky, or just not worth the complexity, especially for a real edge-case (also needs to be explained and justified well). > > > > > > > > > > Note that for category 2 and 3, when we cannot easily fix the problem of creating too many nodes, we should implement a bail out fix from the current optimization and not the entire compilation to reduce the performance impact. This was, for example, done in JDK-8256934, where a fix was too risky at that point during the release and a proper fix was delayed. The fix was to bail out from Partial Peeling when hitting a critically high amount of live nodes (an estimate to ensure we never hit the node limit). > > > > > You should then describe your analysis in the PR then explain your proposed solution. You should also add the reproducer as test case to your patch. > > > > > > > > > > > > Thanks @chhagedorn for reviewing this PR. This scenario corresponds to Case 2 mentioned above, where more than 80,000 nodes are expected to be created. As an alternative solution, could we consider limiting the JVM option `EliminateAllocationArraySizeLimit` (in `c2_globals.hpp`) to a range between 0 and 1024, instead of the current range of 0 to `max_jint`, as the upper limit of `max_jint` may not be practical? > > > > > > > > > Hi @dhanalla, can you elaborate more why it is expected and not an actual bug where we unnecessarily create too many nodes? > > > > > > The test case (ReductionPerf.java) involves multiple arrays, each with a size of 8k. Using the JVM option -XX:EliminateAllocationArraySizeLimit=10240 (which is larger than array size 8k) will enable scalar replacement for all array elements. 
This, in turn, may result in constructing a graph with over 80k live nodes. > > I see, thanks for explaining the test behavior. > > > As an alternative solution, could we consider limiting the JVM option EliminateAllocationArraySizeLimit (in c2_globals.hpp) to a range between 0 and 1024, instead of the current range of 0 to max_jint, as the upper limit of max_jint may not be practical? > > I think that is just a mitigation which makes it less likely. You could probably still just come up with a test with a lot more arrays of size 1024 and hit the node limit again. > > I suggest first extracting a simpler minimal test case which isolates the problem. Then you can also play around with different values for `EliminateAllocationArraySizeLimit`. I could imagine that you can also trigger this problem with just one huge array when you set the limit large enough. This could make it easier to understand and explain where exactly the nodes are created, what kind of nodes those are, etc. Once we know that, we can try to implement a bailout right there which is independent of how big `EliminateAllocationArraySizeLimit` is. Thanks @chhagedorn, the graph grows to more than 80K nodes because two nodes (a set of phi nodes) are added for each element in the array during the scalarization process. I minimized the test case to create a single array of size 48K. The proposed fix will check the live node count and bail out during compilation while building the graph for scalarization of the elements in the array when the live node count crosses the limit of 80K, instead of unnecessarily building the entire graph and bailing out in code_gen.
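The bailout described above can be sketched as a small self-contained toy model. This is not HotSpot code: `kMaxNodeLimit`, `kNodesPerElement`, and `scalarize_array` are hypothetical names chosen for illustration, and the figures (a limit of 80K, two nodes per element) are taken from the discussion. The actual patch would presumably consult C2's own counters (`C->live_nodes()` and `C->max_node_limit()`, as named in the assert) instead.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins, assumed for illustration only.
constexpr int64_t kMaxNodeLimit = 80000;  // node limit quoted in the thread
constexpr int64_t kNodesPerElement = 2;   // "two nodes ... for each element"

// Models scalarizing an array of `num_elements` elements.
// `live_nodes` is updated in place as nodes are added.
// Returns false as soon as adding the next element's nodes would exceed
// the limit (early bailout), true if the whole array was processed.
bool scalarize_array(int64_t& live_nodes, int64_t num_elements) {
  assert(num_elements >= 0);
  for (int64_t i = 0; i < num_elements; i++) {
    if (live_nodes + kNodesPerElement > kMaxNodeLimit) {
      return false;  // stop here instead of building the entire graph
    }
    live_nodes += kNodesPerElement;
  }
  return true;
}
```

With these assumed numbers, a single 48K-element array would need 98304 nodes, so the model bails out after 40000 elements, while an 8K-element array on its own completes with 16384 nodes.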
------------- PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2427259939 From duke at openjdk.org Mon Oct 21 17:58:34 2024 From: duke at openjdk.org (hanklo6) Date: Mon, 21 Oct 2024 17:58:34 GMT Subject: RFR: 8342715: x86 unused orw instruction encoding could be removed Message-ID: x86 orw(Register, Register) encoding is missing 0x66 prefix. This instruction is unused and can be removed. It was initially removed in #20901 but was re-added in #20698 ------------- Commit messages: - Remove x86 orw encoding Changes: https://git.openjdk.org/jdk/pull/21620/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21620&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342715 Stats: 6 lines in 2 files changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21620/head:pull/21620 PR: https://git.openjdk.org/jdk/pull/21620 From sviswanathan at openjdk.org Mon Oct 21 18:06:49 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 21 Oct 2024 18:06:49 GMT Subject: RFR: 8342715: x86 unused orw instruction encoding could be removed In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 17:49:36 GMT, hanklo6 wrote: > x86 orw(Register, Register) encoding is missing 0x66 prefix. This instruction is unused and can be removed. It was initially removed in #20901 but was re-added in #20698 Thanks for fixing this. Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21620#pullrequestreview-2382945173 From jbhateja at openjdk.org Mon Oct 21 18:15:37 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 21 Oct 2024 18:15:37 GMT Subject: RFR: 8342715: x86 unused orw instruction encoding could be removed In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 17:49:36 GMT, hanklo6 wrote: > x86 orw(Register, Register) encoding is missing 0x66 prefix. This instruction is unused and can be removed. 
It was initially removed in #20901 but was re-added in #20698 Marked as reviewed by jbhateja (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21620#pullrequestreview-2382981672 From jbhateja at openjdk.org Mon Oct 21 18:21:17 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 21 Oct 2024 18:21:17 GMT Subject: RFR: 8342715: x86 unused orw instruction encoding could be removed In-Reply-To: References: Message-ID: <9ymiVkYw1XAIAjqTc3Pbh652_ZZtSX6w8qpHND9DUhQ=.caf31566-3bf8-439a-8151-17315fc40ad9@github.com> On Mon, 21 Oct 2024 17:49:36 GMT, hanklo6 wrote: > x86 orw(Register, Register) encoding is missing 0x66 prefix. This instruction is unused and can be removed. It was initially removed in #20901 but was re-added in #20698 LGTM ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21620#pullrequestreview-2382998641 From kvn at openjdk.org Mon Oct 21 19:37:20 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 21 Oct 2024 19:37:20 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v5] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <2qlxVlgEtYbLMbCiFmkmxWSUdL7TYQ4SH4dYE-kqT6M=.affc8393-ea09-43a0-9447-09fdc71dc8a9@github.com> On Mon, 21 Oct 2024 08:07:22 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. 
>> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. 
This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > rm dead assert Looks good to me. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19970#pullrequestreview-2383148092 From kvn at openjdk.org Mon Oct 21 19:48:14 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 21 Oct 2024 19:48:14 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 14:05:54 GMT, Daniel Lundén wrote: > Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced, due to unfortunate rounding in the register mask size computation, the available space for incoming/outgoing method arguments in register masks. > > ### Changeset > > - Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4. > - Add a regression test. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. There is no observable difference in C2 compilation time. Looks good. Thank you for adding test. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21612#pullrequestreview-2383167782 From kvn at openjdk.org Mon Oct 21 19:51:38 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 21 Oct 2024 19:51:38 GMT Subject: RFR: 8342715: x86 unused orw instruction encoding could be removed In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 17:49:36 GMT, hanklo6 wrote: > x86 orw(Register, Register) encoding is missing 0x66 prefix. This instruction is unused and can be removed. It was initially removed in #20901 but was re-added in #20698 Trivial. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21620#pullrequestreview-2383173563 From dlong at openjdk.org Mon Oct 21 20:14:24 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 21 Oct 2024 20:14:24 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory [v3] In-Reply-To: References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> Message-ID: On Mon, 21 Oct 2024 16:03:23 GMT, Chad Rakoczy wrote: >> [JDK-8342601](https://bugs.openjdk.org/browse/JDK-8342601) >> >> Fix minor inefficiency in `copy_memory` by adding check before doing bit shift to see if we are able to do a move instruction instead. 
Change is low risk because of the low complexity of the change >> >> Ran array copy and tier 1 on aarch64 machine >> >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/arraycopy 49 49 0 0 >> ============================== >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier1 2591 2591 0 0 >> jtreg:test/jdk:tier1 2436 2436 0 0 >> jtreg:test/langtools:tier1 4577 4577 0 0 >> jtreg:test/jaxp:tier1 0 0 0 0 >> jtreg:test/lib-test:tier1 34 34 0 0 >> ============================== > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Update check If I had known about aarch64 optimizations like Neoverse 0 latency moves when I first looked at this, I would have approved it, so some code comments hinting at why we are doing this not-completely-obvious optimization would be appreciated. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21589#pullrequestreview-2383215877 From dlong at openjdk.org Mon Oct 21 20:17:27 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 21 Oct 2024 20:17:27 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory [v3] In-Reply-To: References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> Message-ID: On Mon, 21 Oct 2024 16:03:23 GMT, Chad Rakoczy wrote: >> [JDK-8342601](https://bugs.openjdk.org/browse/JDK-8342601) >> >> Fix minor inefficiency in `copy_memory` by adding check before doing bit shift to see if we are able to do a move instruction instead. 
Change is low risk because of the low complexity of the change >> >> Ran array copy and tier 1 on aarch64 machine >> >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/arraycopy 49 49 0 0 >> ============================== >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier1 2591 2591 0 0 >> jtreg:test/jdk:tier1 2436 2436 0 0 >> jtreg:test/langtools:tier1 4577 4577 0 0 >> jtreg:test/jaxp:tier1 0 0 0 0 >> jtreg:test/lib-test:tier1 34 34 0 0 >> ============================== > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Update check BTW, apparently Neoverse has 0 latency moves even for 32-bit registers, so they must do something clever with clearing the high bits. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21589#issuecomment-2427631042 From cslucas at openjdk.org Mon Oct 21 20:36:30 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 21 Oct 2024 20:36:30 GMT Subject: RFR: 8335977: Deoptimization fails with assert "object should be reallocated already" Message-ID: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> Please review this patch to fix an issue that may occur when serializing debug information related to reduce allocation merges (RAM). The problem happens when there is more than one JVMS in an `uncommon_trap` and a _younger_ JVMS doesn't have the RAM inputs as a local/expression/monitor but an older JVMS does. In that situation, the loop at line 1173 of output.cpp will set the `is_root` property of the ObjectValue to `false` when processing the younger JVMS, even though it may have been set to `true` when visiting the older JVMS. Tested on: - Win, Mac & Linux tier1-4 on x64 & Aarch64. - CTW with some thousands of jars.
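The overwrite problem described above can be illustrated with a toy model. This is only a sketch under assumptions, not the code in output.cpp: `ToyObjectValue`, `mark_root_overwrite`, and `mark_root_preserve` are hypothetical names, each vector entry stands for one JVMS in visiting order (older first), and OR-ing the flag is just one plausible way to "preserve `is_root` status across JVMS".

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for the root flag of an ObjectValue (illustrative only).
struct ToyObjectValue {
  bool is_root = false;
};

// Each entry says whether that JVMS references the object directly
// as a local, expression, or monitor. Entries are in visiting order.

// Problematic variant: every frame overwrites the flag, so the result
// depends only on the last frame visited.
void mark_root_overwrite(ToyObjectValue& ov, const std::vector<bool>& referenced_by_frame) {
  for (bool referenced : referenced_by_frame) {
    ov.is_root = referenced;
  }
}

// Preserving variant: once any frame marks the object as a root,
// later frames cannot clear the flag.
void mark_root_preserve(ToyObjectValue& ov, const std::vector<bool>& referenced_by_frame) {
  for (bool referenced : referenced_by_frame) {
    ov.is_root = ov.is_root || referenced;
  }
}
```

For an older frame that references the object followed by a younger frame that does not, the first variant ends with `is_root == false`, while the second keeps it `true`.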
------------- Commit messages: - Preserve is_root status across JVMS. Changes: https://git.openjdk.org/jdk/pull/21624/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21624&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335977 Stats: 102 lines in 2 files changed: 100 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21624.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21624/head:pull/21624 PR: https://git.openjdk.org/jdk/pull/21624 From duke at openjdk.org Mon Oct 21 20:42:25 2024 From: duke at openjdk.org (duke) Date: Mon, 21 Oct 2024 20:42:25 GMT Subject: RFR: 8342715: x86 unused orw instruction encoding could be removed In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 17:49:36 GMT, hanklo6 wrote: > x86 orw(Register, Register) encoding is missing 0x66 prefix. This instruction is unused and can be removed. It was initially removed in #20901 but was re-added in #20698 @hanklo6 Your change (at version be4d8b0ed3baa733b8251121a2d645d20816e09e) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21620#issuecomment-2427675588 From dlong at openjdk.org Mon Oct 21 20:42:27 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 21 Oct 2024 20:42:27 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 14:05:54 GMT, Daniel Lundén wrote: > Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced, due to unfortunate rounding in the register mask size computation, the available space for incoming/outgoing method arguments in register masks. > > ### Changeset > > - Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4. > - Add a regression test.
> > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. There is no observable difference in C2 compilation time. Is this a problem when UseAPX is false? Where exactly is the rounding problem? If it's because number_of_registers != available_gp_registers then it feels like we are not fixing the problem but only masking its side-effects. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21612#issuecomment-2427674557 From dlong at openjdk.org Mon Oct 21 21:00:19 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 21 Oct 2024 21:00:19 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v5] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <23PlA6HVVcBvytZRFHXey3DaMRodEesfU7pf-xp30ZA=.41466050-b52f-4a33-81db-0e2f6c527797@github.com> On Mon, 21 Oct 2024 08:07:22 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. 
>> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > rm dead assert src/hotspot/share/opto/noOverflowInt.hpp line 45: > 43: explicit NoOverflowInt(jlong value) : _is_NaN(true), _value(0) { > 44: jint trunc = (jint)value; > 45: if ((jlong)trunc == value) { Do you think we need these runtime checks even in product builds? If not, then consider using checked_cast<>() here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1809524345 From dlong at openjdk.org Mon Oct 21 21:04:16 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 21 Oct 2024 21:04:16 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v5] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 21 Oct 2024 08:07:22 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. 
Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > rm dead assert src/hotspot/share/opto/noOverflowInt.hpp line 66: > 64: if (a.is_NaN()) { return make_NaN(); } > 65: if (b.is_NaN()) { return make_NaN(); } > 66: return NoOverflowInt(java_subtract((jlong)a.value(), (jlong)b.value())); For add and subtract, I don't think two 32-bit inputs can cause a 64-bit result to wrap or overflow, but if they could, then using the java_ APIs, which wrap, would hide the overflow.
I think you can safely use + and - for these two. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1809529564 From dlong at openjdk.org Mon Oct 21 21:08:15 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 21 Oct 2024 21:08:15 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v5] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 21 Oct 2024 08:07:22 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`. 
>> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > rm dead assert src/hotspot/share/opto/noOverflowInt.hpp line 72: > 70: if (a.is_NaN()) { return make_NaN(); } > 71: if (b.is_NaN()) { return make_NaN(); } > 72: return NoOverflowInt(java_multiply((jlong)a.value(), (jlong)b.value())); Suggestion: return NoOverflowInt((jlong)a.value() * (jlong)b.value()); Can't overflow 64-bits, so no need for java_multiply ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1809534285 From dlong at openjdk.org Mon Oct 21 21:11:41 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 21 Oct 2024 21:11:41 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v5] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 21 Oct 2024 08:07:22 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. 
>> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > rm dead assert src/hotspot/share/opto/noOverflowInt.hpp line 80: > 78: jint shift = b.value(); > 79: if (shift < 0 || shift > 31) { return make_NaN(); } > 80: return NoOverflowInt(java_shift_left((jlong)a.value(), shift)); Suggestion: return NoOverflowInt((jlong)a.value() << shift); left shift for shift values between 0 and 31 should be well-defined. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1809539333 From dlong at openjdk.org Mon Oct 21 21:20:17 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 21 Oct 2024 21:20:17 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v5] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 21 Oct 2024 08:07:22 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. 
>> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > rm dead assert src/hotspot/share/opto/noOverflowInt.hpp line 51: > 49: } > 50: > 51: static NoOverflowInt make_NaN() { return NoOverflowInt(); } Suggestion: static constexpr NoOverflowInt make_NaN() { return NoOverflowInt(); } I think this can be constexpr, but you may need to add constexpr to the ctor as well. 
src/hotspot/share/opto/noOverflowInt.hpp line 77: > 75: friend NoOverflowInt operator<<(const NoOverflowInt a, const NoOverflowInt b) { > 76: if (a.is_NaN()) { return make_NaN(); } > 77: if (b.is_NaN()) { return make_NaN(); } Suggestion: if (a.is_NaN()) { return a; } if (b.is_NaN()) { return b; } This might be more efficient than creating a new one. src/hotspot/share/opto/noOverflowInt.hpp line 90: > 88: > 89: NoOverflowInt abs() const { > 90: if (is_NaN()) { return make_NaN(); } Suggestion: if (is_NaN() || value() == min_jint) { return make_NaN(); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1809550632 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1809548407 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1809546446 From duke at openjdk.org Mon Oct 21 21:20:17 2024 From: duke at openjdk.org (hanklo6) Date: Mon, 21 Oct 2024 21:20:17 GMT Subject: Integrated: 8342715: x86 unused orw instruction encoding could be removed In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 17:49:36 GMT, hanklo6 wrote: > x86 orw(Register, Register) encoding is missing 0x66 prefix. This instruction is unused and can be removed. It was initially removed in #20901 but was re-added in #20698 This pull request has now been integrated. 
Changeset: 8276a419 Author: hanklo6 Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/8276a419585b9f06c6e9b5fc5813aecc434e00bf Stats: 6 lines in 2 files changed: 0 ins; 6 del; 0 mod 8342715: x86 unused orw instruction encoding could be removed Reviewed-by: sviswanathan, jbhateja, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21620 From duke at openjdk.org Mon Oct 21 21:22:57 2024 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 21 Oct 2024 21:22:57 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory [v4] In-Reply-To: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> Message-ID: <50saqCLba9Bx2oSGvZdJNsfjc8ZaOtDSMWIDkqP8QSA=.7658d698-3a89-484f-b977-0694eae251ba@github.com> > [JDK-8342601](https://bugs.openjdk.org/browse/JDK-8342601) > > Fix minor inefficiency in `copy_memory` by adding check before doing bit shift to see if we are able to do a move instruction instead. 
Change is low risk because of the low complexity of the change > > Ran array copy and tier 1 on aarch64 machine > > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/arraycopy 49 49 0 0 > ============================== > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier1 2591 2591 0 0 > jtreg:test/jdk:tier1 2436 2436 0 0 > jtreg:test/langtools:tier1 4577 4577 0 0 > jtreg:test/jaxp:tier1 0 0 0 0 > jtreg:test/lib-test:tier1 34 34 0 0 > ============================== Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21589/files - new: https://git.openjdk.org/jdk/pull/21589/files/461e2a53..707fed36 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21589&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21589&range=02-03 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21589/head:pull/21589 PR: https://git.openjdk.org/jdk/pull/21589 From dlong at openjdk.org Mon Oct 21 21:40:39 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 21 Oct 2024 21:40:39 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory [v4] In-Reply-To: <50saqCLba9Bx2oSGvZdJNsfjc8ZaOtDSMWIDkqP8QSA=.7658d698-3a89-484f-b977-0694eae251ba@github.com> References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> <50saqCLba9Bx2oSGvZdJNsfjc8ZaOtDSMWIDkqP8QSA=.7658d698-3a89-484f-b977-0694eae251ba@github.com> Message-ID: On Mon, 21 Oct 2024 21:22:57 GMT, Chad Rakoczy wrote: >> [JDK-8342601](https://bugs.openjdk.org/browse/JDK-8342601) >> >> Fix minor inefficiency in `copy_memory` by adding check before doing bit shift to see if we are able 
to do a move instruction instead. Change is low risk because of the low complexity of the change >> >> Ran array copy and tier 1 on aarch64 machine >> >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/arraycopy 49 49 0 0 >> ============================== >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier1 2591 2591 0 0 >> jtreg:test/jdk:tier1 2436 2436 0 0 >> jtreg:test/langtools:tier1 4577 4577 0 0 >> jtreg:test/jaxp:tier1 0 0 0 0 >> jtreg:test/lib-test:tier1 34 34 0 0 >> ============================== > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add comment Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21589#pullrequestreview-2383356496 From dlong at openjdk.org Mon Oct 21 22:06:34 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 21 Oct 2024 22:06:34 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v12] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 15:03:51 GMT, Quan Anh Mai wrote: >> If you want to understand full details of how a (symmetrical) type lattice with duals supports a unified model for many different type flow analysis algorithms you can read up on it in Nielsen, Nielsen and Hankin's book Principles of Program Analysis. If it is new to you then a more simplified account of the use of (unqualified) TOP and BOTTOM types in type flow analysis can be found in Muchnick's book Advanced Compiler Design and Implementation. Note that Cliff Click goes against conventional mathematical terminology in making BOTTOM a universal type and TOP an empty (unrealizable) type. >> >> One detail that may not be obvious is that the sub-lattice for int and long sorts includes the hierarchy of single, continuous intervals. 
Individual integral values (on the lattice centre line) are modelled as singleton ranges i.e. [a,a]. Given the large cardinality of the set of continuous intervals this makes it necessary to place a bound on any fixed point iterations that widen interval ranges. The iteration is killed by widening to the maximum range (this is what Cliff refers to in the code as a 'death march'). > > @adinn Thanks a lot for your direction, it is really interesting and took me a while to read through. Although, I think that in practice, currently C2 only uses `dual` to compute the join of 2 types, which is rather confusing. I'm OK with this clever "lazy dual" optimization using _is_dual that gets reversed instead of doing an actual computation, but how can we make sure it is only used with join() and not accidentally used with other operations? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1809592664 From psandoz at openjdk.org Mon Oct 21 22:16:16 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 21 Oct 2024 22:16:16 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v3] In-Reply-To: References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> Message-ID: <3-cpxSaOIJwguG378TdsGpSkEh6LSnPW8C9pMBNRMJI=.532c659b-0949-4990-85f9-c714f084c9aa@github.com> On Mon, 21 Oct 2024 08:20:44 GMT, Emanuel Peter wrote: > Ah, I guess that is generally the letter we use when passing longs, like in method signatures. And that is because `L` is already taken for objects. Oh well, makes sense now.
Welcome to the in-between world of Java and the JVM :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21414#issuecomment-2427837878 From sparasa at openjdk.org Mon Oct 21 23:33:19 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 21 Oct 2024 23:33:19 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v7] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 23:42:04 GMT, hanklo6 wrote: >> Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. >> >> ### Generate test instructions >> With `binutils = 2.43` >> * `python3 x86-asmtest.py > asmtest.out.h` >> ### Run test >> * `make test TEST="gtest:AssemblerX86"` > > hanklo6 has updated the pull request incrementally with two additional commits since the last revision: > > - Refactor > - Add missing instructions test/hotspot/gtest/x86/test_assemblerx86.cpp line 53: > 51: stringStream ss; > 52: ss.print("%s\n", insn); > 53: ss.print("Ours: "); "Ours" could be replaced with "OpenJDK encoding". test/hotspot/gtest/x86/test_assemblerx86.cpp line 58: > 56: } > 57: ss.print_cr(""); > 58: ss.print("Theirs: "); Same as above. "Theirs" could be replaced with "gcc encoding" or something similar. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1809656234 PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1809656619 From jwaters at openjdk.org Tue Oct 22 00:59:25 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 22 Oct 2024 00:59:25 GMT Subject: RFR: 8342287: C2 fails with "assert(is_IfTrue()) failed: invalid node class: IfFalse" due to Template Assertion Predicate with two UCTs [v2] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 21:12:28 GMT, Christian Hagedorn wrote: >> ### Assertion Predicates Have the True Projection on the Success Path >> By design, Template and Initialized Assertion Predicates have the true projection on the success path and false projection on the failing path. >> >> ### Is a Node a Template Assertion Predicate? >> Template Assertion Predicates can have an uncommon trap on or a `Halt` node following the failing/false projection. When trying to find out if a node is a Template Assertion Predicate, we call `TemplateAssertionPredicate::is_predicate()` which checks if the provided node is the success projection of a Template Assertion Predicate. This involves checking if the other projection has a `Halt` node or an UCT (L141): >> https://github.com/openjdk/jdk/blob/7a64fbbb9292f4d65a6970206dec1a7d7645046b/src/hotspot/share/opto/predicates.cpp#L135-L144 >> >> ### New `PredicateIterator` Class >> >> [JDK-8340786](https://bugs.openjdk.org/browse/JDK-8340786) introduced a new `PredicateIterator` class to simplify iteration over predicates and replaced code that was doing some custom iteration. >> >> #### Usual Usage >> Normally, we always start from a loop entry and follow the predicates (i.e. reaching a predicate over its success path). 
>> >> #### Special Usage >> However, I also replaced the predicate skipping in `PhaseIdealLoop::build_loop_late_post_work()` with the new `PredicateIterator` class which could start at any node, including a failing path false projection. >> >> ### Problem: Two Uncommon Traps for a Template Assertion Predicate >> The fuzzer now found a case where a Template Assertion Predicate has a predicate UCT on the failing **and** the success path due to folding away some dead nodes: >> >> ![image](https://github.com/user-attachments/assets/c2a9395e-9eaf-46a0-b978-8847d8e21945) >> >> In the Special Usage mentioned above, we could be using the `PredicateIterator` with `505 IfFalse` as starting node. In the core method `RegularPredicateBlockIterator::for_each()` for the traversal, we call the following with `current` = `505 IfFalse`: >> https://github.com/openjdk/jdk/blob/1ea1f33f66326804ca2892fe0659a9acb7ee72ae/src/hotspot/share/opto/predicates.hpp#L628-L629 >> `TemplateAssertionPredicate::is_predicate()` succeeds because we have a valid Template Assertion Predicate If node and the other projection `504 IfTrue` has a predicate UCT. We then fail when trying to convert the `IfFalse` to an `IfTrue` with `current->as_IfTrue()` on L629. >> >> ### Solution >> The fix is straight forward: `TemplateAssertionPr... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Review Vladimir I might be missing something, but why is the new test inside src/hotspot? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21561#issuecomment-2428001222 From dholmes at openjdk.org Tue Oct 22 01:21:28 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 22 Oct 2024 01:21:28 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v7] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 23:42:04 GMT, hanklo6 wrote: >> Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
>> >> ### Generate test instructions >> With `binutils = 2.43` >> * `python3 x86-asmtest.py > asmtest.out.h` >> ### Run test >> * `make test TEST="gtest:AssemblerX86"` > > hanklo6 has updated the pull request incrementally with two additional commits since the last revision: > > - Refactor > - Add missing instructions FYI the new test fails on our old Mac systems - ref https://bugs.openjdk.org/browse/JDK-8342768 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20857#issuecomment-2428019409 From jwaters at openjdk.org Tue Oct 22 01:38:26 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 22 Oct 2024 01:38:26 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 14:29:20 GMT, Martin Doerr wrote: > There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values, so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. > I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
> > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 0x0000000000000000 > XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp line 444: > 442: st->cr(); > 443: for (int i = 0; i < 16; ++i) { > 444: const uint64_t *xmm = ((const uint64_t*)&(uc->Xmm0)) + 2 * i; Using a more specific cast is generally better, in this case since it's a pointer to pointer, reinterpret_cast might suffice. 
And if that is used, the const is not required (I think) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21615#discussion_r1809726442 From jwaters at openjdk.org Tue Oct 22 01:49:17 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 22 Oct 2024 01:49:17 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 01:35:20 GMT, Julian Waters wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values, so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) >> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 
>> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp line 444: > >> 442: st->cr(); >> 443: for (int i = 0; i < 16; ++i) { >> 444: const uint64_t *xmm = ((const uint64_t*)&(uc->Xmm0)) + 2 * i; > > Using a more specific cast is generally better, in this case since it's a pointer to pointer, reinterpret_cast might suffice. And if that is used, the const is not required (I think) Actually, scratch that. The const is probably needed even with reinterpret_cast. I got it mixed up with something else ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21615#discussion_r1809733463 From jbhateja at openjdk.org Tue Oct 22 03:24:15 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 22 Oct 2024 03:24:15 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering In-Reply-To: References: Message-ID: <7OotWFc8fJ7nKgj6J1CSCA5ybjkGkQeo7KduWecxXWE=.386310fc-a876-4f8b-bc08-39989e753461@github.com> On Mon, 21 Oct 2024 04:11:03 GMT, Jasmine Karthikeyan wrote: > Hi all, > This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. 
> > In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! @jaskarth, as the author of the MacroLogic optimization pass, I volunteer to move it to the lowering phase once this patch gets integrated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2428128710 From epeter at openjdk.org Tue Oct 22 06:54:30 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Oct 2024 06:54:30 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v5] In-Reply-To: <23PlA6HVVcBvytZRFHXey3DaMRodEesfU7pf-xp30ZA=.41466050-b52f-4a33-81db-0e2f6c527797@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> <23PlA6HVVcBvytZRFHXey3DaMRodEesfU7pf-xp30ZA=.41466050-b52f-4a33-81db-0e2f6c527797@github.com> Message-ID: On Mon, 21 Oct 2024 20:57:31 GMT, Dean Long wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> rm dead assert > > src/hotspot/share/opto/noOverflowInt.hpp line 45: > >> 43: explicit NoOverflowInt(jlong value) : _is_NaN(true), _value(0) { >> 44: jint trunc = (jint)value; >> 45: if ((jlong)trunc == value) { > > Do you think we need these runtime checks even in product builds? If not, then consider using checked_cast<>() here. I definitely need these checks in product, yes. I want longs outside the int-range to become NaN.
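The narrowing check quoted above can be illustrated with a minimal standalone sketch. This is not the HotSpot class: `NoOverflowIntSketch` is a hypothetical stand-in, and `int32_t`/`int64_t` are used in place of `jint`/`jlong`. The constructor truncates the 64-bit input and keeps it only if the round trip back to 64 bits is lossless; any other input stays NaN, and the check runs unconditionally rather than only in debug builds.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for NoOverflowInt (assumption: not the real class).
class NoOverflowIntSketch {
  bool    _is_NaN;
  int32_t _value;

public:
  // A default-constructed value represents NaN.
  NoOverflowIntSketch() : _is_NaN(true), _value(0) {}

  // Start out as NaN and become a real value only if the 64-bit input
  // survives truncation to 32 bits unchanged.
  explicit NoOverflowIntSketch(int64_t value) : _is_NaN(true), _value(0) {
    int32_t trunc = static_cast<int32_t>(value);
    if (static_cast<int64_t>(trunc) == value) {
      _is_NaN = false;
      _value  = trunc;
    }
  }

  bool    is_NaN() const { return _is_NaN; }
  int32_t value()  const { return _value; }
};
```

A debug-only cast check would not give this behaviour, since out-of-range inputs are expected at runtime here and must map to NaN rather than trip an assert.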
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1810052025 From epeter at openjdk.org Tue Oct 22 07:01:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Oct 2024 07:01:25 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v5] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 21 Oct 2024 21:01:15 GMT, Dean Long wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> rm dead assert > > src/hotspot/share/opto/noOverflowInt.hpp line 66: > >> 64: if (a.is_NaN()) { return make_NaN(); } >> 65: if (b.is_NaN()) { return make_NaN(); } >> 66: return NoOverflowInt(java_subtract((jlong)a.value(), (jlong)b.value())); > > For add and subtract, I don't think two 32-bit inputs can cause a 64-bit result to wrap or overflow; and if they could, the wrapping java_ APIs would hide that overflow. I think you can safely use + and - for these two. Ok, I'll change it to `jlong +/-` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1810061123 From dlunden at openjdk.org Tue Oct 22 07:04:10 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 22 Oct 2024 07:04:10 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 14:05:54 GMT, Daniel Lundén wrote: > Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced, due to unfortunate rounding in the register mask size computation, the available space for incoming/outgoing method arguments in register masks. > > ### Changeset > > - Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4. > - Add a regression test.
> > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. There is no observable difference in C2 compilation time. This is a general problem and neither `UseAPX`, `number_of_registers`, nor `available_gp_registers` influence the static register mask size computation. Instead, the register mask size computation uses `RegisterForm::_reg_ctr` which we increment as we encounter registers during ADL parsing. I have summarized the rounding issue in more detail in JBS, but pasting it below as well for convenience. > The addition of the new APX registers results in less available space in register masks for method arguments. The (static) register mask size computation in RegisterForm::RegMask_Size does take the number of available registers into account, but also rounds up to an even number of 32-bit words. Specifically, the register mask size computation is > > (words_for_regs + 3 + 1) & ~1; > > Adding the new APX registers correctly bumps `words_for_regs` from 18 to 19, but due to the rounding, the result is the same (22) as for a value of 18 for `words_for_regs`. 
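The rounding effect described above can be checked with a small worked example. The function and parameter names below are hypothetical (only the expression `(words_for_regs + 3 + 1) & ~1` comes from the message), and the "after" case assumes the fix simply replaces the constant 3 with 4 in the same formula.

```cpp
// Mirrors the quoted formula: (words_for_regs + arg_words + 1) & ~1.
// `arg_words` is a hypothetical name for the constant 3 (before) or 4 (after).
constexpr int regmask_size_words(int words_for_regs, int arg_words) {
  return (words_for_regs + arg_words + 1) & ~1;
}

// Words that remain for incoming/outgoing arguments once the register
// words are accounted for.
constexpr int words_left_for_args(int words_for_regs, int arg_words) {
  return regmask_size_words(words_for_regs, arg_words) - words_for_regs;
}

// Before APX: 18 register words give a mask of 22 words, 4 left for arguments.
static_assert(regmask_size_words(18, 3) == 22, "pre-APX mask size");
static_assert(words_left_for_args(18, 3) == 4, "pre-APX argument words");

// With APX: 19 register words still give 22 words, so only 3 are left.
static_assert(regmask_size_words(19, 3) == 22, "APX mask size, constant 3");
static_assert(words_left_for_args(19, 3) == 3, "APX argument words, constant 3");

// Assumed effect of bumping the constant to 4: the mask grows to 24 words.
static_assert(regmask_size_words(19, 4) == 24, "APX mask size, constant 4");
static_assert(words_left_for_args(19, 4) == 5, "APX argument words, constant 4");
```

Under these assumptions the mask size stays at 22 words when `words_for_regs` goes from 18 to 19, which is exactly the lost argument word the message describes.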
------------- PR Comment: https://git.openjdk.org/jdk/pull/21612#issuecomment-2428417102 From epeter at openjdk.org Tue Oct 22 07:09:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Oct 2024 07:09:24 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v5] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 21 Oct 2024 21:14:07 GMT, Dean Long wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> rm dead assert > > src/hotspot/share/opto/noOverflowInt.hpp line 90: > >> 88: >> 89: NoOverflowInt abs() const { >> 90: if (is_NaN()) { return make_NaN(); } > > Suggestion: > > if (is_NaN() || value() == min_jint) { return make_NaN(); } It is not necessary, and I actually test that in my gtest. The subtraction below handles the `min_jint` case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1810069838 From epeter at openjdk.org Tue Oct 22 07:09:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Oct 2024 07:09:25 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v5] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <5s8RC77-wK6VzWPn3rMzYbbx5u1rrz38MFYwmFcKlRk=.db068153-574a-4c92-9e16-62bdda59f8cd@github.com> On Tue, 22 Oct 2024 07:05:02 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/noOverflowInt.hpp line 90: >> >>> 88: >>> 89: NoOverflowInt abs() const { >>> 90: if (is_NaN()) { return make_NaN(); } >> >> Suggestion: >> >> if (is_NaN() || value() == min_jint) { return make_NaN(); } > > It is not necessary, and I actually test that in my gtest. The subtraction below handles the `min_jint` case. `ASSERT_TRUE(NoOverflowInt(min_jint).abs().is_NaN());` But if you insist on it I can add it in. 
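Why the extra `min_jint` test is redundant can be sketched as follows, assuming `abs()` is implemented as a subtraction from zero that goes through the same 64-bit narrowing check as the other operators. `MaybeInt`, `narrow` and `abs_no_overflow` are hypothetical names for illustration, not HotSpot code.

```cpp
#include <cstdint>

// Hypothetical result type: is_NaN marks "does not fit in 32 bits".
struct MaybeInt {
  bool    is_NaN;
  int32_t value;
};

// Keep a 64-bit result only if it survives the round trip through 32 bits.
inline MaybeInt narrow(int64_t wide) {
  int32_t trunc = static_cast<int32_t>(wide);
  if (static_cast<int64_t>(trunc) != wide) { return MaybeInt{true, 0}; }
  return MaybeInt{false, trunc};
}

// abs() expressed as "0 - x", computed in 64-bit arithmetic and then narrowed.
// For the minimum 32-bit value the 64-bit result is 2^31, which does not fit,
// so the narrowing step already yields NaN without a dedicated check.
inline MaybeInt abs_no_overflow(int32_t x) {
  if (x >= 0) { return MaybeInt{false, x}; }
  return narrow(int64_t{0} - static_cast<int64_t>(x));
}
```

An explicit `min_jint` test would produce the same result; it would only make the special case visible at the call site.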
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1810071100 From epeter at openjdk.org Tue Oct 22 07:19:54 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Oct 2024 07:19:54 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v6] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. > > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`).
> > Now we can do: > - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without the RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench...
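The linear form `pointer = con + sum_i(scale_i * variable_i)` from the description, and the adjacency question it answers, can be sketched as follows. This is a hypothetical Java model, not the code in `mempointer.hpp`: variables are represented by plain strings, the class and method names are invented for illustration, and overflow handling (which the description delegates to `NoOverflowInt`) is left out.

```java
import java.util.Map;
import java.util.OptionalLong;

final class LinearPointerSketch {
    final long con;                 // constant offset
    final Map<String, Long> scales; // variable name -> scale coefficient

    LinearPointerSketch(long con, Map<String, Long> scales) {
        this.con = con;
        this.scales = Map.copyOf(scales);
    }

    // If both pointers have identical variable/scale summands, they always
    // differ by the same constant; otherwise nothing is known in this sketch.
    static OptionalLong constantDistance(LinearPointerSketch a, LinearPointerSketch b) {
        if (!a.scales.equals(b.scales)) return OptionalLong.empty();
        return OptionalLong.of(b.con - a.con);
    }

    // 'second' is adjacent to 'first' if it starts exactly where 'first' ends.
    static boolean isAdjacent(LinearPointerSketch first, long firstSizeInBytes,
                              LinearPointerSketch second) {
        OptionalLong d = constantDistance(first, second);
        return d.isPresent() && d.getAsLong() == firstSizeInBytes;
    }

    public static void main(String[] args) {
        Map<String, Long> summands = Map.of("base", 1L, "i", 4L);
        LinearPointerSketch p0 = new LinearPointerSketch(16, summands); // base + 4*i + 16
        LinearPointerSketch p1 = new LinearPointerSketch(20, summands); // base + 4*i + 20
        LinearPointerSketch q = new LinearPointerSketch(20, Map.of("base", 1L, "j", 4L));

        System.out.println(isAdjacent(p0, 4, p1)); // true: 4-byte store followed by its neighbour
        System.out.println(isAdjacent(p0, 4, q));  // false: different variables, distance unknown
    }
}
```

Two stores whose pointers pass such a check are the candidates a store-merging pass can consider; anything with differing summands has to be treated as not provably adjacent.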
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: changes to NoOverflowInt for Dean ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/b8fc83ba..a35a7cfe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=04-05 Stats: 15 lines in 1 file changed: 0 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From chagedorn at openjdk.org Tue Oct 22 07:21:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 22 Oct 2024 07:21:30 GMT Subject: RFR: 8342287: C2 fails with "assert(is_IfTrue()) failed: invalid node class: IfFalse" due to Template Assertion Predicate with two UCTs [v2] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 00:56:39 GMT, Julian Waters wrote: > I might be missing something, but why is the new test inside src/hotspot? Huh, you're right! I have no idea how this happened. Well, let's fix it: https://github.com/openjdk/jdk/pull/21629 Thanks for spotting and reporting this! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21561#issuecomment-2428450388 From mbaesken at openjdk.org Tue Oct 22 07:22:34 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 22 Oct 2024 07:22:34 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 14:29:20 GMT, Martin Doerr wrote: > There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values, so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. 
> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) > > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 0x0000000000000000 > XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f Looks good to me, but I could live well without that indentation of the MXCSR value. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2428452656 From chagedorn at openjdk.org Tue Oct 22 07:23:14 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 22 Oct 2024 07:23:14 GMT Subject: RFR: 8342787: Move misplaced TestTemplateAssertionPredicateWithTwoUCTs.java from src to test directory Message-ID: For some unknown reason, `TestTemplateAssertionPredicateWithTwoUCTs.java` ended up in the src directory instead of the test directory with [JDK-8342287](https://bugs.openjdk.org/browse/JDK-8342787). Maybe a patch application have gone wrong when moving the test to another checked out JDK repository. Anyway, this patch moves it to the correct place. Thanks to @TheShermanTanker for spotting this! ------------- Commit messages: - 8342787: Move misplaced TestTemplateAssertionPredicateWithTwoUCTs.java from src to test directory Changes: https://git.openjdk.org/jdk/pull/21629/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21629&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342787 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21629.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21629/head:pull/21629 PR: https://git.openjdk.org/jdk/pull/21629 From thartmann at openjdk.org Tue Oct 22 07:32:14 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 22 Oct 2024 07:32:14 GMT Subject: RFR: 8342787: Move misplaced TestTemplateAssertionPredicateWithTwoUCTs.java from src to test directory In-Reply-To: References: Message-ID: <8R3p_uRxuN1tjklwkpoliJaqgFYaS-dHJfQsn34TXVg=.e4b0488e-ed1b-44b6-88d6-d03bb4a1949e@github.com> On Tue, 22 Oct 2024 07:17:32 GMT, Christian Hagedorn wrote: > For some unknown reason, `TestTemplateAssertionPredicateWithTwoUCTs.java` ended up in the src directory instead of the test directory with [JDK-8342287](https://bugs.openjdk.org/browse/JDK-8342787). 
Maybe a patch application has gone wrong when moving the test to another checked out JDK repository. Anyway, this patch moves it to the correct place. > > Thanks to @TheShermanTanker for spotting this! Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21629#pullrequestreview-2384192421 From thartmann at openjdk.org Tue Oct 22 07:42:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 22 Oct 2024 07:42:21 GMT Subject: RFR: 8342330: C2: "node pinned on loop exit test?" assert failure In-Reply-To: References: Message-ID: <_bUbJzH10zwdRO_35iZth59cm-IbV0RLGKTesX69eV8=.33389e9e-d6b1-4132-a574-a8cabb7f9fb4@github.com> On Mon, 21 Oct 2024 08:40:31 GMT, Roland Westrelin wrote: > The assert fires because range check elimination processes a test of > the shape: > > > if (i * 4 != (x - objectField.intField) - 1)) { > ... > } > > > and `(x - objectField.intField) - 1)` has control on the exit > projection of the pre loop. > > This happens because: > > - `objectField.intField` depends on the null check of `objectField` > which is performed in the pre loop. > > - `i * scale + (objectField.intField + 1) == x` is transformed into: > `i * scale == x - (objectField.intField + 1)` > > - `(x - objectField.intField) - 1)` only has uses out of the pre loop > and is sunk out of the loop. It ends up pinned on the exit > projection of the pre loop. > > > There is already logic in `PhaseIdealLoop::ctrl_of_use_out_of_loop()` > to handle similar cases but, here, the difference is that the use > (`SubI` of 1) for what's being sunk doesn't have control in the main > loop but between the pre and main loop so that logic doesn't catch > this case. > > There is also a possible bug in that logic: > > > n_loop->_next == get_loop(u_loop->_head->as_CountedLoop()->skip_strip_mined()) > > > assumes the loop that follows the pre loop in the loop tree is the > main loop which is not guaranteed.
> > In this particular case, the assert is harmless: RCE can't eliminate > the condition but it's hard to rule out a similar scenario with a > condition that RCE could remove. I propose revisiting the condition in > `PhaseIdealLoop::ctrl_of_use_out_of_loop()` so it skips all uses that > are dominated by the loop exit of the pre loop. Looks reasonable to me. src/hotspot/share/opto/loopopts.cpp line 1949: > 1947: // test of the pre loop above the point in the graph where it's pinned. > 1948: if (n_loop->_head->is_CountedLoop() && n_loop->_head->as_CountedLoop()->is_pre_loop()) { > 1949: bool res = false; Suggestion: ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21601#pullrequestreview-2384198539 PR Review Comment: https://git.openjdk.org/jdk/pull/21601#discussion_r1810120394 From shade at openjdk.org Tue Oct 22 07:47:41 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 22 Oct 2024 07:47:41 GMT Subject: RFR: 8342787: Move misplaced TestTemplateAssertionPredicateWithTwoUCTs.java from src to test directory In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 07:17:32 GMT, Christian Hagedorn wrote: > For some unknown reason, `TestTemplateAssertionPredicateWithTwoUCTs.java` ended up in the src directory instead of the test directory with [JDK-8342287](https://bugs.openjdk.org/browse/JDK-8342787). Maybe a patch application have gone wrong when moving the test to another checked out JDK repository. Anyway, this patch moves it to the correct place. > > Thanks to @TheShermanTanker for spotting this! Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21629#pullrequestreview-2384228132 From roland at openjdk.org Tue Oct 22 07:53:22 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 22 Oct 2024 07:53:22 GMT Subject: RFR: 8342692: C2: MemorySegment API slow with short running loops Message-ID: To optimize a long counted loop and long range checks in a long or int counted loop, the loop is turned into a loop nest. When the loop has few iterations, the overhead of having an outer loop whose backedge is never taken, has a measurable cost. Furthermore, creating the loop nest usually causes one iteration of the loop to be peeled so predicates can be set up. If the loop is short running, then it's an extra iteration that's run with range checks (compared to an int counted loop with int range checks). This change doesn't create a loop nest when: 1- it can be determined statically at loop nest creation time that the loop runs for a short enough number of iterations 2- profiling reports that the loop runs for no more than ShortLoopIter iterations (1000 by default). For 2-, a guard is added which is implemented as yet another predicate. While this change is in principle simple, I ran into a few implementation issues: - while c2 has a way to compute the number of iterations of an int counted loop, it doesn't have that for long counted loop. The existing logic for int counted loops promotes values to long to avoid overflows. I reworked it so it now works for both long and int counted loops. - I added a new deoptimization reason (Reason_short_running_loop) for the new predicate. Given the number of iterations is narrowed down by the predicate, the limit of the loop after transformation is a cast node that's control dependent on the short running loop predicate. 
Once the counted loop is transformed, range check predicates that depend on the limit are likely to be inserted, so the short running loop predicate has to be the one that's further away from the loop entry. It is also possible that the limit before transformation depends on a predicate (TestShortRunningLongCountedLoopPredicatesClone is an example). We can then have new predicates, inserted after the transformation, that depend on the cast limit, which itself depends on old predicates added before the transformation. To solve this circular dependency, parse and assert predicates are cloned between the old predicates and the loop head. The cloned short running loop parse predicate is the one that's used to insert the short running loop predicate. - In the case of a long counted loop, the loop is transformed into a regular loop with a new limit and transformed range checks, which is later turned into an int counted loop. The int counted loop doesn't need loop limit checks because of the way it's constructed. There's an assert that catches that we don't attempt to add one. I ran into test failures where, by the time the int counted loop is created, the fact that the number of iterations of the loop is small enough to not need a loop limit check gets lost. I added a cast to make sure the narrowed limit's type is not lost (I had to do something similar for loop nests). But then, I ran into the same issue again because the cast was pushed through a sub or add and the narrowed type was lost. I propose that pushing casts through sub/add be only done after loop opts are over (same as what's done for range check `CastII`). On Maurizio's benchmark that's mentioned in the bug, this gives a ~30% performance increase.
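The point about promoting values to long when computing the iteration count of an int counted loop can be illustrated with a small Java sketch. It is not C2's implementation: the class and method names are hypothetical, it only covers loops of the shape `for (int i = init; i < limit; i += stride)` with a positive stride, and the threshold parameter merely stands in for the `ShortLoopIter` value of 1000 mentioned in the description.

```java
final class TripCountSketch {
    // Iteration count of: for (int i = init; i < limit; i += stride), stride > 0.
    // The range is computed in 64 bits because limit - init can overflow an int.
    static long tripCount(int init, int limit, int stride) {
        if (stride <= 0) throw new IllegalArgumentException("sketch only covers stride > 0");
        if (init >= limit) return 0;
        long range = (long) limit - (long) init;
        return (range + stride - 1) / stride; // ceiling division
    }

    // Hypothetical stand-in for the "no more than ShortLoopIter iterations" condition.
    static boolean isShortRunning(long tripCount, long shortLoopIter) {
        return tripCount <= shortLoopIter;
    }

    public static void main(String[] args) {
        System.out.println(tripCount(0, 10, 3));                                 // 4
        System.out.println(Integer.MAX_VALUE - Integer.MIN_VALUE);               // -1: the int subtraction wraps
        System.out.println(tripCount(Integer.MIN_VALUE, Integer.MAX_VALUE, 1));  // 4294967295
        System.out.println(isShortRunning(tripCount(0, 1000, 1), 1000));         // true
    }
}
```

The same widening trick is not available for a long counted loop, where the range itself can exceed 64 bits, which is presumably why the description says the existing logic had to be reworked to cover both cases.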
------------- Commit messages: - more - more - more - more - more - fix & test Changes: https://git.openjdk.org/jdk/pull/21630/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342692 Stats: 1166 lines in 18 files changed: 1104 ins; 16 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From jwaters at openjdk.org Tue Oct 22 08:03:38 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 22 Oct 2024 08:03:38 GMT Subject: RFR: 8342787: Move misplaced TestTemplateAssertionPredicateWithTwoUCTs.java from src to test directory In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 07:17:32 GMT, Christian Hagedorn wrote: > For some unknown reason, `TestTemplateAssertionPredicateWithTwoUCTs.java` ended up in the src directory instead of the test directory with [JDK-8342287](https://bugs.openjdk.org/browse/JDK-8342787). Maybe a patch application have gone wrong when moving the test to another checked out JDK repository. Anyway, this patch moves it to the correct place. > > Thanks to @TheShermanTanker for spotting this! Marked as reviewed by jwaters (Committer). Not sure whether this ends with a newline or not (I think it does), but thanks for fixing this! 
------------- PR Review: https://git.openjdk.org/jdk/pull/21629#pullrequestreview-2384269331 PR Comment: https://git.openjdk.org/jdk/pull/21629#issuecomment-2428542029 From jwaters at openjdk.org Tue Oct 22 08:04:18 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 22 Oct 2024 08:04:18 GMT Subject: RFR: 8342287: C2 fails with "assert(is_IfTrue()) failed: invalid node class: IfFalse" due to Template Assertion Predicate with two UCTs [v2] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 21:12:28 GMT, Christian Hagedorn wrote: >> ### Assertion Predicates Have the True Projection on the Success Path >> By design, Template and Initialized Assertion Predicates have the true projection on the success path and false projection on the failing path. >> >> ### Is a Node a Template Assertion Predicate? >> Template Assertion Predicates can have an uncommon trap on or a `Halt` node following the failing/false projection. When trying to find out if a node is a Template Assertion Predicate, we call `TemplateAssertionPredicate::is_predicate()` which checks if the provided node is the success projection of a Template Assertion Predicate. This involves checking if the other projection has a `Halt` node or an UCT (L141): >> https://github.com/openjdk/jdk/blob/7a64fbbb9292f4d65a6970206dec1a7d7645046b/src/hotspot/share/opto/predicates.cpp#L135-L144 >> >> ### New `PredicateIterator` Class >> >> [JDK-8340786](https://bugs.openjdk.org/browse/JDK-8340786) introduced a new `PredicateIterator` class to simplify iteration over predicates and replaced code that was doing some custom iteration. >> >> #### Usual Usage >> Normally, we always start from a loop entry and follow the predicates (i.e. reaching a predicate over its success path). >> >> #### Special Usage >> However, I also replaced the predicate skipping in `PhaseIdealLoop::build_loop_late_post_work()` with the new `PredicateIterator` class which could start at any node, including a failing path false projection. 
>> >> ### Problem: Two Uncommon Traps for a Template Assertion Predicate >> The fuzzer now found a case where a Template Assertion Predicate has a predicate UCT on the failing **and** the success path due to folding away some dead nodes: >> >> ![image](https://github.com/user-attachments/assets/c2a9395e-9eaf-46a0-b978-8847d8e21945) >> >> In the Special Usage mentioned above, we could be using the `PredicateIterator` with `505 IfFalse` as starting node. In the core method `RegularPredicateBlockIterator::for_each()` for the traversal, we call the following with `current` = `505 IfFalse`: >> https://github.com/openjdk/jdk/blob/1ea1f33f66326804ca2892fe0659a9acb7ee72ae/src/hotspot/share/opto/predicates.hpp#L628-L629 >> `TemplateAssertionPredicate::is_predicate()` succeeds because we have a valid Template Assertion Predicate If node and the other projection `504 IfTrue` has a predicate UCT. We then fail when trying to convert the `IfFalse` to an `IfTrue` with `current->as_IfTrue()` on L629. >> >> ### Solution >> The fix is straight forward: `TemplateAssertionPr... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Review Vladimir > > I might be missing something, but why is the new test inside src/hotspot? > > Huh, you're right! I have no idea how this happened. Well, let's fix it: #21629 > > Thanks for spotting and reporting this! Haha, no worries! :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21561#issuecomment-2428542827 From roland at openjdk.org Tue Oct 22 08:27:10 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 22 Oct 2024 08:27:10 GMT Subject: RFR: 8342330: C2: "node pinned on loop exit test?" assert failure [v2] In-Reply-To: References: Message-ID: > The assert fires because range check elimination processes a test of > the shape: > > > if (i * 4 != (x - objectField.intField) - 1)) { > ... 
> } > > > and `(x - objectField.intField) - 1)` has control on the exit > projection of the pre loop. > > This happens because: > > - `objectField.intField` depends on the null check of `objectField` > which is performed in the pre loop. > > - `i * scale + (objectField.intField + 1) == x` is transformed into: > `i * scale == x - (objectField.intField + 1)` > > - `(x - objectField.intField) - 1)` only has uses out of the pre loop > and is sunk out of the loop. It ends up pinned on the exit > projection of the pre loop. > > > There is already logic in `PhaseIdealLoop::ctrl_of_use_out_of_loop()` > to handle similar cases but, here, the difference is that the use > (`SubI` of 1) for what's being sunk doesn't have control in the main > loop but between the pre and main loop so that logic doesn't catch > this case. > > There is also a possible bug in that logic: > > > n_loop->_next == get_loop(u_loop->_head->as_CountedLoop()->skip_strip_mined()) > > > assumes the loop that follows the pre loop in the loop tree is the > main loop which is not guaranteed. > > In this particular case, the assert is harmless: RCE can't eliminate > the condition but it's hard to rule out a similar scenario with a > condition that RCE could remove. I propose revisiting the condition in > `PhaseIdealLoop::ctrl_of_use_out_of_loop()` so it skips all uses that > are dominated by the loop exit of the pre loop.
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21601/files - new: https://git.openjdk.org/jdk/pull/21601/files/2fd46262..cbd2ed62 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21601&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21601&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21601.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21601/head:pull/21601 PR: https://git.openjdk.org/jdk/pull/21601 From roland at openjdk.org Tue Oct 22 08:33:30 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 22 Oct 2024 08:33:30 GMT Subject: RFR: 8342043: Split Opaque4Node into OpaqueTemplateAssertionPredicateNode and OpaqueNotNullNode In-Reply-To: References: Message-ID: <7v5I_lOLC9pFaISHTB9K2pgJEQcjMtfiCX8HBjmm-ug=.d050c9e9-b1d7-47e3-a132-da417504735e@github.com> On Mon, 21 Oct 2024 12:46:16 GMT, Christian Hagedorn wrote: > OpaqueTemplateAssertionPredicate is turned into a non-macro node and is removed after loop opts. At that point, loops can no longer be split and we do not need to create Initialized Assertion Predicates from the templates anymore. Thus, they can be cleaned up in the post loop opts IGVN. The role of an assertion predicate is to catch an out of range input to a range check Cast or Conv node. If the `OpaqueTemplateAssertionPredicate` is removed after loop opts then you don't expect out of range values to appear after loop opts are over. But is that really the case? Why couldn't a later IGVN cause an out of range value? Also what's the benefit of changing when the opaque node is removed?
------------- PR Comment: https://git.openjdk.org/jdk/pull/21608#issuecomment-2428612876 From chagedorn at openjdk.org Tue Oct 22 08:38:25 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 22 Oct 2024 08:38:25 GMT Subject: RFR: 8342787: Move misplaced TestTemplateAssertionPredicateWithTwoUCTs.java from src to test directory In-Reply-To: <8R3p_uRxuN1tjklwkpoliJaqgFYaS-dHJfQsn34TXVg=.e4b0488e-ed1b-44b6-88d6-d03bb4a1949e@github.com> References: <8R3p_uRxuN1tjklwkpoliJaqgFYaS-dHJfQsn34TXVg=.e4b0488e-ed1b-44b6-88d6-d03bb4a1949e@github.com> Message-ID: On Tue, 22 Oct 2024 07:29:22 GMT, Tobias Hartmann wrote: >> For some unknown reason, `TestTemplateAssertionPredicateWithTwoUCTs.java` ended up in the src directory instead of the test directory with [JDK-8342287](https://bugs.openjdk.org/browse/JDK-8342787). Maybe a patch application have gone wrong when moving the test to another checked out JDK repository. Anyway, this patch moves it to the correct place. >> >> Thanks to @TheShermanTanker for spotting this! > > Looks good and trivial. Thanks @TobiHartmann, @shipilev, and @TheShermanTanker for the quick reviews! > Not sure whether this ends with a newline or not (I think it does), but thanks for fixing this! It does, there were two new lines, so I removed one. 
Github will show a special "no-new-line" icon otherwise :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21629#issuecomment-2428620991 From chagedorn at openjdk.org Tue Oct 22 08:38:26 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 22 Oct 2024 08:38:26 GMT Subject: Integrated: 8342787: Move misplaced TestTemplateAssertionPredicateWithTwoUCTs.java from src to test directory In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 07:17:32 GMT, Christian Hagedorn wrote: > For some unknown reason, `TestTemplateAssertionPredicateWithTwoUCTs.java` ended up in the src directory instead of the test directory with [JDK-8342287](https://bugs.openjdk.org/browse/JDK-8342787). Maybe a patch application have gone wrong when moving the test to another checked out JDK repository. Anyway, this patch moves it to the correct place. > > Thanks to @TheShermanTanker for spotting this! This pull request has now been integrated. Changeset: 2da7f2bc Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/2da7f2bcb066184831207ee8c1317094c9891b8a Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod 8342787: Move misplaced TestTemplateAssertionPredicateWithTwoUCTs.java from src to test directory Reviewed-by: thartmann, shade, jwaters ------------- PR: https://git.openjdk.org/jdk/pull/21629 From chagedorn at openjdk.org Tue Oct 22 08:46:20 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 22 Oct 2024 08:46:20 GMT Subject: RFR: 8342043: Split Opaque4Node into OpaqueTemplateAssertionPredicateNode and OpaqueNotNullNode In-Reply-To: <7v5I_lOLC9pFaISHTB9K2pgJEQcjMtfiCX8HBjmm-ug=.d050c9e9-b1d7-47e3-a132-da417504735e@github.com> References: <7v5I_lOLC9pFaISHTB9K2pgJEQcjMtfiCX8HBjmm-ug=.d050c9e9-b1d7-47e3-a132-da417504735e@github.com> Message-ID: On Tue, 22 Oct 2024 08:30:32 GMT, Roland Westrelin wrote: > > OpaqueTemplateAssertionPredicate is turned into a non-macro node and is removed after loop opts. 
At that point, loops can no longer be split and we do not need to create Initialized Assertion Predicates from the templates anymore. Thus, they can be cleaned up in the post loop opts IGVN. > > The role of an assertion predicate is to catch an out of range input to a range check Cast or Conv node. If the `OpaqueTemplateAssertionPredicate` is removed after loop opts then you don't expect out of range values to appear after loop opts are over. But is that really the case? Why couldn't a later IGVN cause an out of range value? Also what's the benefit of changing when the opaque node is removed? The `OpaqueTemplateAssertionPredicate` is only used for a Template Assertion Predicate from which we create new Initialized Assertion Predicates when splitting a loop during loop opts, for example for Loop Peeling. When we no longer split loops, we do not need to create new Initialized Assertion Predicates anymore. So, we replace the `OpaqueTemplateAssertionPredicate` nodes with true to let the Template Assertion Predicates be folded away in the post loop opts IGVN round. However, the `OpaqueInitializedAssertionPredicate` nodes, which accompany the Initialized Assertion Predicates and make sure control is also folded away, will be kept and are only removed during macro node expansion (like the `Opaque4` nodes before).
------------- PR Comment: https://git.openjdk.org/jdk/pull/21608#issuecomment-2428643955 From rcastanedalo at openjdk.org Tue Oct 22 08:55:38 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 22 Oct 2024 08:55:38 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 In-Reply-To: References: Message-ID: <8UbXTjkMPw6xs9rvGtJukgbq8dOM4q-ogBLXlA7sdhA=.41db68e4-1f59-47e2-b0eb-6caa9a7a7d7b@github.com> On Mon, 21 Oct 2024 14:05:54 GMT, Daniel Lundén wrote: > Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced, due to unfortunate rounding in the register mask size computation, the available space for incoming/outgoing method arguments in register masks. > > ### Changeset > > - Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4. > - Add a regression test. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. There is no observable difference in C2 compilation time. src/hotspot/share/adlc/formsopt.cpp line 177: > 175: // Add one more word to avoid problematic rounding. Specifically, APX added > 176: // 16 more registers but did not result in a mask size increase. > 177: // Round up to the next doubleword size. I agree with Dean that the comment should state more clearly what is problematic with the current rounding.
From your analysis in the JBS issue, I take it that the problem is that 1) the number of bits left for stack locations depends on the value of `words_for_regs`, 2) before the addition of APX registers we were relying on the slack created by the up-rounding to accommodate enough stack locations, and 3) after adding APX registers this slack has been reduced significantly. Is this the case? If so, I suggest to update the comment to something like: Suggestion: // Round up to the next doubleword size. // Add one more word to accommodate a reasonable number of stack locations // in the register mask regardless of how much slack is created by rounding up. test/hotspot/jtreg/compiler/arguments/TestManyParameters.java line 26: > 24: /** > 25: * @test > 26: * @requires os.arch=="amd64" | os.arch=="x86_64" Suggestion: * @requires os.simpleArch == "x64" test/hotspot/jtreg/compiler/arguments/TestManyParameters.java line 28: > 26: * @requires os.arch=="amd64" | os.arch=="x86_64" > 27: * @bug 8342156 > 28: * @summary Check that C2 restriction on number of method arguments is not too Suggestion: * @summary Check that C2's restriction on number of method arguments is not too test/hotspot/jtreg/compiler/arguments/TestManyParameters.java line 43: > 41: > 42: public static void main(String[] args) { > 43: test(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54); Is this the maximum number of arguments that C2 could handle before the addition of APX? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21612#discussion_r1810269683 PR Review Comment: https://git.openjdk.org/jdk/pull/21612#discussion_r1810227860 PR Review Comment: https://git.openjdk.org/jdk/pull/21612#discussion_r1810229060 PR Review Comment: https://git.openjdk.org/jdk/pull/21612#discussion_r1810232586 From rrich at openjdk.org Tue Oct 22 09:07:42 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 22 Oct 2024 09:07:42 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: References: Message-ID: <0-FvHn-Ql9E_u2GhuCgxlkqqQkJaZvfYXf_mhrXiH5k=.9a328eb4-efb1-48f0-a54e-24a336da214a@github.com> On Mon, 21 Oct 2024 14:29:20 GMT, Martin Doerr wrote: > There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values, so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. > I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
> > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 0x0000000000000000 > XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f Hi Martin, is there documentation on which you base the implementation? My quick search wasn't very successful. How do we know that the printed values are correct? 
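One informal plausibility check on a dump like the one above (it does not prove that the register mapping is correct) is to look for recognizable data in the printed words. The sketch below, in which the class `XmmWordSketch` and its method are hypothetical helpers, renders a 64-bit word as the eight bytes it would occupy in little-endian memory and shows non-printable bytes as `.`. It assumes each printed word is a plain 64-bit half of the register; the thread does not say which half is printed first.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative helper, not part of the patch under review.
final class XmmWordSketch {
    // Render a 64-bit word as its 8 little-endian bytes, using '.' for
    // anything outside the printable ASCII range.
    static String asLittleEndianAscii(long word) {
        byte[] bytes = ByteBuffer.allocate(Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(word)
                .array();
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(b >= 0x20 && b < 0x7f ? (char) b : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Words taken from XMM[9] and XMM[10] in the example output.
        System.out.println(asLittleEndianAscii(0x666e69206e6f6974L)); // tion inf
        System.out.println(asLittleEndianAscii(0x6e6f6974616d726fL)); // ormation
    }
}
```

Both words decode to text fragments, which is consistent with XMM registers having been used for string or memory-copy work shortly before the dump was taken.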
------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2428702426 From mdoerr at openjdk.org Tue Oct 22 09:21:19 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 22 Oct 2024 09:21:19 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: <0-FvHn-Ql9E_u2GhuCgxlkqqQkJaZvfYXf_mhrXiH5k=.9a328eb4-efb1-48f0-a54e-24a336da214a@github.com> References: <0-FvHn-Ql9E_u2GhuCgxlkqqQkJaZvfYXf_mhrXiH5k=.9a328eb4-efb1-48f0-a54e-24a336da214a@github.com> Message-ID: On Tue, 22 Oct 2024 09:04:54 GMT, Richard Reingruber wrote: > Hi Martin, is there documentation on which you base the implementation? My quick search wasn't very successful. How do we know that the printed values are correct? You can find the data structures here: https://github.com/bminor/glibc/blob/dcad78507433a9a64b8b548b19e110933f8d939a/sysdeps/x86_64/sys/ucontext.h#L106 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2428735226 From mbaesken at openjdk.org Tue Oct 22 09:21:20 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 22 Oct 2024 09:21:20 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 14:29:20 GMT, Martin Doerr wrote: > There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values, so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. > I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
> > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 0x0000000000000000 > XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f MS documents also the CONTEXT structure with various register values, including XMM https://learn.microsoft.com/de-de/windows/win32/api/winnt/ns-winnt-context ------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2428740649 From mli at openjdk.org Tue Oct 22 09:23:40 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 22 Oct 2024 09:23:40 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: <-FYW9yWcn9euWjBA9qpWiqVm5NaaNo-ZmJuSKl3wWTo=.4e2515cd-18eb-4754-80ba-782f611ab429@github.com> References: 
<-FYW9yWcn9euWjBA9qpWiqVm5NaaNo-ZmJuSKl3wWTo=.4e2515cd-18eb-4754-80ba-782f611ab429@github.com> Message-ID: On Mon, 21 Oct 2024 09:53:16 GMT, Fei Gao wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> add missing files > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8207: > >> 8205: for (int op = 0; op < VectorSupport::NUM_VECTOR_OP_MATH; op++) { >> 8206: int vop = VectorSupport::VECTOR_OP_MATH_START + op; >> 8207: if (vop == VectorSupport::VECTOR_OP_TANH) { > > Could you please add a comment that mentions the reason, for example > `// Skip "tanh" because there is performance regression` Sure, will fix it. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21502#discussion_r1810337900 From mli at openjdk.org Tue Oct 22 09:28:36 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 22 Oct 2024 09:28:36 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: References: Message-ID: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> > Hi, > Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. > This pr is based on https://github.com/openjdk/jdk/pull/20781. > > Thanks! > > ## Test > ### tests: > * test/jdk/jdk/incubator/vector/ > * test/hotspot/jtreg/compiler/vectorapi/ > > ### options: > * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs > * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs > * -XX:+EnableVectorSupport -XX:-UseVectorStubs > > ## Performance > > ### Tests > jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector. It's good to have these tests in jdk main stream, I will do it in a separate pr later. 
(These tests are auto-generated tests from some script&template, it's good to also have those script&template in jdk main stream, but those script&template generates more other tests than what we need here, so better to add these tests and script&template in another pr). > > ### Options > * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' > * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' > > ### Performance data > I have re-tested, there is not much difference from https://github.com/openjdk/jdk/pull/18605, so please check performance data in that pr. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: add comment for tanh ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21502/files - new: https://git.openjdk.org/jdk/pull/21502/files/e4b98bfb..c19d1a6f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21502&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21502&range=02-03 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21502.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21502/head:pull/21502 PR: https://git.openjdk.org/jdk/pull/21502 From dlunden at openjdk.org Tue Oct 22 09:54:43 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 22 Oct 2024 09:54:43 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v2] In-Reply-To: References: Message-ID: > Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced, due to unfortunate rounding in the register mask size computation, the available space for incoming/outgoing method arguments in register masks.
> > ### Changeset > > - Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4. > - Add a regression test. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. There is no observable difference in C2 compilation time. Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21612/files - new: https://git.openjdk.org/jdk/pull/21612/files/b6ceb7a8..6f3ccc40 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21612&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21612&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21612.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21612/head:pull/21612 PR: https://git.openjdk.org/jdk/pull/21612 From dlunden at openjdk.org Tue Oct 22 09:54:43 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 22 Oct 2024 09:54:43 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v2] In-Reply-To: <8UbXTjkMPw6xs9rvGtJukgbq8dOM4q-ogBLXlA7sdhA=.41db68e4-1f59-47e2-b0eb-6caa9a7a7d7b@github.com> References: <8UbXTjkMPw6xs9rvGtJukgbq8dOM4q-ogBLXlA7sdhA=.41db68e4-1f59-47e2-b0eb-6caa9a7a7d7b@github.com> Message-ID: On Tue, 22 Oct 2024 08:51:31 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: Roberto Castañeda Lozano > > 
src/hotspot/share/adlc/formsopt.cpp line 177: > >> 175: // Add one more word to avoid problematic rounding. Specifically, APX added >> 176: // 16 more registers but did not result in a mask size increase. >> 177: // Round up to the next doubleword size. > > I agree with Dean that the comment should state more clearly what is problematic with the current rounding. From your analysis in the JBS issue, I take it that the problem is that > 1) the number of bits left for stack locations depends on the value of `words_for_regs`, > 2) before the addition of APX registers we were relying on the slack created by the up-rounding to accommodate enough stack locations, and > 3) after adding APX registers this slack has been reduced significantly. > > Is this the case? If so, I suggest to update the comment to something like: > > Suggestion: > > // Round up to the next doubleword size. > // Add one more word to accommodate a reasonable number of stack locations > // in the register mask regardless of how much slack is created by rounding up. I'm fine with this change, although I'd then argue that we could simplify it and just say we add 4 words (instead of 3) for incoming/outgoing arguments. @vnkozlov You suggested the current wording that specifically mentions APX, what do you think? > test/hotspot/jtreg/compiler/arguments/TestManyParameters.java line 43: > >> 41: >> 42: public static void main(String[] args) { >> 43: test(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54); > > Is this the maximum number of arguments that C2 could handle before the addition of APX? Yes, that's my understanding (more details are in the JBS issue). @chhagedorn: Can you confirm? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21612#discussion_r1810388578 PR Review Comment: https://git.openjdk.org/jdk/pull/21612#discussion_r1810391955 From aph at openjdk.org Tue Oct 22 10:16:22 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 22 Oct 2024 10:16:22 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory [v3] In-Reply-To: References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> Message-ID: On Mon, 21 Oct 2024 20:14:30 GMT, Dean Long wrote: > BTW, apparently Neoverse has 0 latency moves even for 32-bit registers, so they must do something clever with clearing the high bits. So they are. Some dark magic there. :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21589#issuecomment-2428871730 From aph at openjdk.org Tue Oct 22 10:20:12 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 22 Oct 2024 10:20:12 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory [v4] In-Reply-To: <50saqCLba9Bx2oSGvZdJNsfjc8ZaOtDSMWIDkqP8QSA=.7658d698-3a89-484f-b977-0694eae251ba@github.com> References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> <50saqCLba9Bx2oSGvZdJNsfjc8ZaOtDSMWIDkqP8QSA=.7658d698-3a89-484f-b977-0694eae251ba@github.com> Message-ID: On Mon, 21 Oct 2024 21:22:57 GMT, Chad Rakoczy wrote: >> [JDK-8342601](https://bugs.openjdk.org/browse/JDK-8342601) >> >> Fix minor inefficiency in `copy_memory` by adding check before doing bit shift to see if we are able to do a move instruction instead. 
Change is low risk because of the low complexity of the change >> >> Ran array copy and tier 1 on aarch64 machine >> >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/arraycopy 49 49 0 0 >> ============================== >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier1 2591 2591 0 0 >> jtreg:test/jdk:tier1 2436 2436 0 0 >> jtreg:test/langtools:tier1 4577 4577 0 0 >> jtreg:test/jaxp:tier1 0 0 0 0 >> jtreg:test/lib-test:tier1 34 34 0 0 >> ============================== > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add comment I agree, let's commit this. The advantage of Zero Latency MOVs on some implementations (e.g. Arm Neoverse V2 optimization guide, 4.12) is worth having. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21589#pullrequestreview-2384732639 From lucy at openjdk.org Tue Oct 22 10:25:16 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 22 Oct 2024 10:25:16 GMT Subject: RFR: 8342701: [PPC64] TestOSRLotsOfLocals.java crashes In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 14:10:02 GMT, Martin Doerr wrote: > Fix for "assert(nbits == 32 || (-(1 << (nbits-1)) <= x && x < (1 << (nbits-1)))) failed: value out of range" in new test "TestOSRLotsOfLocals" (see JBS). Looks good to me. ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21613#pullrequestreview-2384741843 From chagedorn at openjdk.org Tue Oct 22 10:59:14 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 22 Oct 2024 10:59:14 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v2] In-Reply-To: References: <8UbXTjkMPw6xs9rvGtJukgbq8dOM4q-ogBLXlA7sdhA=.41db68e4-1f59-47e2-b0eb-6caa9a7a7d7b@github.com> Message-ID: On Tue, 22 Oct 2024 09:50:48 GMT, Daniel Lundén wrote: >> test/hotspot/jtreg/compiler/arguments/TestManyParameters.java line 43: >> >>> 41: >>> 42: public static void main(String[] args) { >>> 43: test(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54); >> >> Is this the maximum number of arguments that C2 could handle before the addition of APX? > > Yes, that's my understanding (more details are in the JBS issue). @chhagedorn: Can you confirm? Yes, exactly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21612#discussion_r1810501922 From chagedorn at openjdk.org Tue Oct 22 11:01:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 22 Oct 2024 11:01:30 GMT Subject: RFR: 8342809: C2 hits "assert(is_If()) failed: invalid node class: Con" during IGVN due to unhandled top Message-ID: `RegularPredicate::may_be_predicate_if()` is now called from the `AssertionPredicateWithHalt` class which is invoked during IGVN. We therefore need to be able to handle top as a possible input for an Assertion Predicate success projection. This patch fixes this.
Thanks, Christian ------------- Commit messages: - 8342809: C2 hits "assert(is_If()) failed: invalid node class: Con" during IGVN due to unhandled top Changes: https://git.openjdk.org/jdk/pull/21634/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21634&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342809 Stats: 65 lines in 2 files changed: 64 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21634/head:pull/21634 PR: https://git.openjdk.org/jdk/pull/21634 From chagedorn at openjdk.org Tue Oct 22 11:11:17 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 22 Oct 2024 11:11:17 GMT Subject: RFR: 8342330: C2: "node pinned on loop exit test?" assert failure [v2] In-Reply-To: References: Message-ID: <3Hued5vLt3u-wLKbqal9fLlO2VSAhh69pEAST9e96jc=.2f6900a3-923b-4db5-8fed-00406be2960f@github.com> On Tue, 22 Oct 2024 08:27:10 GMT, Roland Westrelin wrote: >> The assert fires because range check elimination processes a test of >> the shape: >> >> >> if (i * 4 != (x - objectField.intField) - 1)) { >> ... >> } >> >> >> and `(x - objectField.intField) - 1)` has control on the exit >> projection of the pre loop. >> >> This happens because: >> >> - `objectField.intField` depends on the null check of `objectField` >> which is performed in the pre loop. >> >> - `i * scale + (objectField.intField + 1) == x` is transformed into: >> `i * scale == x - (objectField.intField + 1)` >> >> - `(x - objectField.intField) - 1)` only has uses out of the pre loop >> and is sunk out of the loop. It ends up pinned on the exit >> projection of the pre loop. >> >> >> There is already logic in `PhaseIdealLoop::ctrl_of_use_out_of_loop()` >> to handle similar cases but, here, the difference is that the use >> (`SubI` of 1) for what's being sunk doesn't have control in the main >> loop but between the pre and main loop so that logic doesn't catch >> this case.
>> >> There is also a possible bug in that logic: >> >> >> n_loop->_next == get_loop(u_loop->_head->as_CountedLoop()->skip_strip_mined()) >> >> >> assumes the loop that follows the pre loop in the loop tree is the >> main loop which is not guaranteed. >> >> In this particular case, the assert is harmless: RCE can't eliminate >> the condition but it's hard to rule out a similar scenario with a >> condition that RCE could remove. I propose revisiting the condition in >> `PhaseIdealLoop::ctrl_of_use_out_of_loop()` so it skips all uses that >> are dominated by the loop exit of the pre loop. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopopts.cpp > > Co-authored-by: Tobias Hartmann Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21601#pullrequestreview-2384858522 From roland at openjdk.org Tue Oct 22 11:19:40 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 22 Oct 2024 11:19:40 GMT Subject: RFR: 8342809: C2 hits "assert(is_If()) failed: invalid node class: Con" during IGVN due to unhandled top In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 10:55:24 GMT, Christian Hagedorn wrote: > `RegularPredicate::may_be_predicate_if()` is now called from the `AssertionPredicateWithHalt` class which is invoked during IGVN. We therefore need to be able to handle top as a possible input for an Assertion Predicate success projection. This patch fixes this. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21634#pullrequestreview-2384877561 From roland at openjdk.org Tue Oct 22 11:22:30 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 22 Oct 2024 11:22:30 GMT Subject: RFR: 8342330: C2: "node pinned on loop exit test?" 
assert failure [v2] In-Reply-To: <_bUbJzH10zwdRO_35iZth59cm-IbV0RLGKTesX69eV8=.33389e9e-d6b1-4132-a574-a8cabb7f9fb4@github.com> References: <_bUbJzH10zwdRO_35iZth59cm-IbV0RLGKTesX69eV8=.33389e9e-d6b1-4132-a574-a8cabb7f9fb4@github.com> Message-ID: On Tue, 22 Oct 2024 07:39:29 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/loopopts.cpp >> >> Co-authored-by: Tobias Hartmann > > Looks reasonable to me. @TobiHartmann @chhagedorn thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/21601#issuecomment-2429006946 From roland at openjdk.org Tue Oct 22 11:22:31 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 22 Oct 2024 11:22:31 GMT Subject: Integrated: 8342330: C2: "node pinned on loop exit test?" assert failure In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 08:40:31 GMT, Roland Westrelin wrote: > The assert fires because range check elimination processes a test of > the shape: > > > if (i * 4 != (x - objectField.intField) - 1)) { > ... > } > > > and `(x - objectField.intField) - 1)` has control on the exit > projection of the pre loop. > > This happens because: > > - `objectField.intField` depends on the null check of `objectField` > which is performed in the pre loop. > > - `i * scale + (objectField.intField + 1) == x` is transformed into: > `i * scale == x - (objectField.intField + 1)` > > - `(x - objectField.intField) - 1)` only has uses out of the pre loop > and is sunk out of the loop. It ends up pinned on the exit > projection of the pre loop. > > > There is already logic in `PhaseIdealLoop::ctrl_of_use_out_of_loop()` > to handle similar cases but, here, the difference is that the use > (`SubI` of 1) for what's being sunk doesn't have control in the main > loop but between the pre and main loop so that logic doesn't catch > this case.
> > There is also a possible bug in that logic: > > > n_loop->_next == get_loop(u_loop->_head->as_CountedLoop()->skip_strip_mined()) > > > assumes the loop that follows the pre loop in the loop tree is the > main loop which is not guaranteed. > > In this particular case, the assert is harmless: RCE can't eliminate > the condition but it's hard to rule out a similar scenario with a > condition that RCE could remove. I propose revisiting the condition in > `PhaseIdealLoop::ctrl_of_use_out_of_loop()` so it skips all uses that > are dominated by the loop exit of the pre loop. This pull request has now been integrated. Changeset: 004aaea7 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/004aaea76db091569aa88eeb6b08db3408f288cd Stats: 86 lines in 2 files changed: 82 ins; 0 del; 4 mod 8342330: C2: "node pinned on loop exit test?" assert failure Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/21601 From chagedorn at openjdk.org Tue Oct 22 11:24:21 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 22 Oct 2024 11:24:21 GMT Subject: RFR: 8342809: C2 hits "assert(is_If()) failed: invalid node class: Con" during IGVN due to unhandled top In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 10:55:24 GMT, Christian Hagedorn wrote: > `RegularPredicate::may_be_predicate_if()` is now called from the `AssertionPredicateWithHalt` class which is invoked during IGVN. We therefore need to be able to handle top as a possible input for an Assertion Predicate success projection. This patch fixes this. > > Thanks, > Christian Thanks Roland for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21634#issuecomment-2429009479 From rrich at openjdk.org Tue Oct 22 11:34:28 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 22 Oct 2024 11:34:28 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: References: Message-ID: <9IiKXnQanLQUv8xp_FSy7tCMA89yhiadFbpBU2lcm3M=.57e6b08b-88dd-414b-868d-fe8ac0b692d3@github.com> On Mon, 21 Oct 2024 14:29:20 GMT, Martin Doerr wrote: > There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values, so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. > I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) > > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 
0x0000000000000000 > XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f Well, with my quick-search I found information like that too. I don't see though, how it helps understand that the printed values are correct. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2429030736 From rrich at openjdk.org Tue Oct 22 11:52:17 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 22 Oct 2024 11:52:17 GMT Subject: RFR: 8342701: [PPC64] TestOSRLotsOfLocals.java crashes In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 14:10:02 GMT, Martin Doerr wrote: > Fix for "assert(nbits == 32 || (-(1 << (nbits-1)) <= x && x < (1 << (nbits-1)))) failed: value out of range" in new test "TestOSRLotsOfLocals" (see JBS). Looks good. ------------- Marked as reviewed by rrich (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21613#pullrequestreview-2384953294 From roland at openjdk.org Tue Oct 22 11:53:33 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 22 Oct 2024 11:53:33 GMT Subject: RFR: 8342692: C2: MemorySegment API slow with short running loops [v2] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). 
> > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate.
> > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - Merge branch 'master' into JDK-8342692 - more - more - more - more - more - fix & test ------------- Changes: https://git.openjdk.org/jdk/pull/21630/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=01 Stats: 1165 lines in 18 files changed: 1103 ins; 16 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From epeter at openjdk.org Tue Oct 22 11:58:59 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Oct 2024 11:58:59 GMT Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java Message-ID: I want to refactor `TestDependencyOffsets.java` using the `CompileFramework`. Reasons: - I soon need to modify the IR rules in this test soon anyway (https://github.com/openjdk/jdk/pull/21521), and so a refactor might be good to do first. - The generator script used to be a `generator.py`, stored not on `git` but `JBS`. Not great. Now we have it in Java code, and maintenance is easier. - Since I first wrote that test, I have now introduced the `IRNode.VECTOR_SIZE`. This allows: - Simplifying the logic for the IR rules (removed the `IRRange` and `IRBool`, and the `Platform`). - Strengthening the rules. - I was able to add `random` offsets. This means we have better coverage, and do not rely on just hand-crafted values. I extensively use `String.format` and `StringBuilder`... would be nicer to have string-templates but they don't exist yet. Recommendation for review: the old file was huge. Finding the new code in the diff can be hard. 
I would start by only reading the new file. Ah. And about runtime of the test. On my machine I get this (in ms): Generate: 27 Compile: 5845 Run: 23435 Test generation is negligible. 6sec on compilation, 20+sec on execution. I think that is an ok balance, at least we can say that generation and compilation only take about 1/6 of total time - an acceptable overhead I would say. ------------- Commit messages: - more comment - simplify further by removing explicit CPU/Platform vector width - remove 2 useless runs - whitespace - aliasing modes - further cosmetics and comments - add more cases - simplify code - cosmetics - cleanup - ... and 19 more: https://git.openjdk.org/jdk/compare/ebc17c7c...cd483d52 Changes: https://git.openjdk.org/jdk/pull/21541/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21541&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342387 Stats: 15511 lines in 1 file changed: 31 ins; 15085 del; 395 mod Patch: https://git.openjdk.org/jdk/pull/21541.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21541/head:pull/21541 PR: https://git.openjdk.org/jdk/pull/21541 From roland at openjdk.org Tue Oct 22 12:02:16 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 22 Oct 2024 12:02:16 GMT Subject: RFR: 8342043: Split Opaque4Node into OpaqueTemplateAssertionPredicateNode and OpaqueNotNullNode In-Reply-To: References: <7v5I_lOLC9pFaISHTB9K2pgJEQcjMtfiCX8HBjmm-ug=.d050c9e9-b1d7-47e3-a132-da417504735e@github.com> Message-ID: On Tue, 22 Oct 2024 08:42:12 GMT, Christian Hagedorn wrote: > The `OpaqueTemplateAssertionPredicate` is only used for a Template Assertion Predicate from which we create new Initialized Assertion Predicates from when splitting a loop during loop opts, for example for Loop Peeling. When we no longer split loops, we do not need to create new Initialized Assertion Predicates anymore. 
So, we replace the `OpaqueTemplateAssertionPredicate` nodes with true to let the Template Assertion Predicates be folded away in the post loop opts IGVN round. However, the `OpaqueInitializedAssertionPredicate` nodes, which accompany the Initialized Assertion Predicates and make sure control is also folded away, will be kept and are only removed during macro node expansion (like the `Opaque4` nodes before). That makes sense. Thanks for the explanation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21608#issuecomment-2429087421 From mdoerr at openjdk.org Tue Oct 22 12:34:26 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 22 Oct 2024 12:34:26 GMT Subject: RFR: 8342701: [PPC64] TestOSRLotsOfLocals.java crashes In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 14:10:02 GMT, Martin Doerr wrote: > Fix for "assert(nbits == 32 || (-(1 << (nbits-1)) <= x && x < (1 << (nbits-1)))) failed: value out of range" in new test "TestOSRLotsOfLocals" (see JBS). Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21613#issuecomment-2429154901 From dlunden at openjdk.org Tue Oct 22 12:48:14 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 22 Oct 2024 12:48:14 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v2] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 09:54:43 GMT, Daniel Lundén wrote: >> Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced, due to unfortunate rounding in the register mask size computation, the available space for incoming/outgoing method arguments in register masks. >> >> ### Changeset >> >> - Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4. >> - Add a regression test. 
>> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. There is no observable difference in C2 compilation time. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Roberto Castañeda Lozano Adding @chhagedorn as a contributor due to his work on finding the issue and creating a first version of the regression test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21612#issuecomment-2429187621 From epeter at openjdk.org Tue Oct 22 13:08:49 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Oct 2024 13:08:49 GMT Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java [v2] In-Reply-To: References: Message-ID: > I want to refactor `TestDependencyOffsets.java` using the `CompileFramework`. > > Reasons: > - I soon need to modify the IR rules in this test soon anyway (https://github.com/openjdk/jdk/pull/21521), and so a refactor might be good to do first. > - The generator script used to be a `generator.py`, stored not on `git` but `JBS`. Not great. Now we have it in Java code, and maintenance is easier. > - Since I first wrote that test, I have now introduced the `IRNode.VECTOR_SIZE`. This allows: > - Simplifying the logic for the IR rules (removed the `IRRange` and `IRBool`, and the `Platform`). > - Strengthening the rules. > - I was able to add `random` offsets. This means we have better coverage, and do not rely on just hand-crafted values. > > I extensively use `String.format` and `StringBuilder`... would be nicer to have string-templates but they don't exist yet. 
> > Recommendation for review: the old file was huge. Finding the new code in the diff can be hard. I would start by only reading the new file. > > Ah. And about runtime of the test. On my machine I get this (in ms): > > Generate: 27 > Compile: 5845 > Run: 23435 > > Test generation is negligible. 6sec on compilation, 20+sec on execution. I think that is an ok balance, at least we can say that generation and compilation only take about 1/6 of total time - an acceptable overhead I would say. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: add @compile for IR Framework ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21541/files - new: https://git.openjdk.org/jdk/pull/21541/files/cd483d52..460176c0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21541&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21541&range=00-01 Stats: 34 lines in 1 file changed: 34 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21541.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21541/head:pull/21541 PR: https://git.openjdk.org/jdk/pull/21541 From mdoerr at openjdk.org Tue Oct 22 13:19:22 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 22 Oct 2024 13:19:22 GMT Subject: Integrated: 8342701: [PPC64] TestOSRLotsOfLocals.java crashes In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 14:10:02 GMT, Martin Doerr wrote: > Fix for "assert(nbits == 32 || (-(1 << (nbits-1)) <= x && x < (1 << (nbits-1)))) failed: value out of range" in new test "TestOSRLotsOfLocals" (see JBS). This pull request has now been integrated. 
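Editorial note on the assert quoted in the message above: the condition checks that a value `x` fits into a signed field of `nbits` bits. A minimal Java sketch of the same condition follows. The class and method names are hypothetical and not taken from HotSpot, and the width 16 in `main` is only an illustrative choice, not a statement about which width the failing test hit.

```java
class SignedRangeCheckSketch {
    // Mirrors the quoted condition:
    //   nbits == 32 || (-(1 << (nbits-1)) <= x && x < (1 << (nbits-1)))
    // Assumes 1 <= nbits <= 32.
    static boolean fitsSigned(int x, int nbits) {
        if (nbits == 32) {
            return true; // every int fits into 32 bits
        }
        int bound = 1 << (nbits - 1);
        return -bound <= x && x < bound;
    }

    public static void main(String[] args) {
        System.out.println(fitsSigned(32767, 16));  // largest value of a signed 16-bit field
        System.out.println(fitsSigned(32768, 16));  // one past the upper bound
        System.out.println(fitsSigned(-32768, 16)); // smallest value of a signed 16-bit field
    }
}
```

A value just past the upper bound, such as 32768 for a 16-bit field, is the kind of input that would make such a check fail.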
Changeset: 3bba0f3d Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/3bba0f3dc8faf83a3aadcd704ae2ae4967e6daa4 Stats: 19 lines in 1 file changed: 16 ins; 0 del; 3 mod 8342701: [PPC64] TestOSRLotsOfLocals.java crashes Reviewed-by: lucy, rrich ------------- PR: https://git.openjdk.org/jdk/pull/21613 From mbaesken at openjdk.org Tue Oct 22 14:09:22 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 22 Oct 2024 14:09:22 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 14:29:20 GMT, Martin Doerr wrote: > There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values, so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. > I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
> > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 0x0000000000000000 > XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f Marked as reviewed by mbaesken (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21615#pullrequestreview-2385328829 From thartmann at openjdk.org Tue Oct 22 14:19:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 22 Oct 2024 14:19:21 GMT Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java [v2] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 13:08:49 GMT, Emanuel Peter wrote: >> I want to refactor `TestDependencyOffsets.java` using the `CompileFramework`. 
>> >> Reasons: >> - I soon need to modify the IR rules in this test soon anyway (https://github.com/openjdk/jdk/pull/21521), and so a refactor might be good to do first. >> - The generator script used to be a `generator.py`, stored not on `git` but `JBS`. Not great. Now we have it in Java code, and maintenance is easier. >> - Since I first wrote that test, I have now introduced the `IRNode.VECTOR_SIZE`. This allows: >> - Simplifying the logic for the IR rules (removed the `IRRange` and `IRBool`, and the `Platform`). >> - Strengthening the rules. >> - I was able to add `random` offsets. This means we have better coverage, and do not rely on just hand-crafted values. >> >> I extensively use `String.format` and `StringBuilder`... would be nicer to have string-templates but they don't exist yet. >> >> Recommendation for review: the old file was huge. Finding the new code in the diff can be hard. I would start by only reading the new file. >> >> Ah. And about runtime of the test. On my machine I get this (in ms): >> >> Generate: 27 >> Compile: 5845 >> Run: 23435 >> >> Test generation is negligible. 6sec on compilation, 20+sec on execution. I think that is an ok balance, at least we can say that generation and compilation only take about 1/6 of total time - an acceptable overhead I would say. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add @compile for IR Framework This is hard to review due to the scattered view in the diff. As you suggested, I had a look at the new file and it's nice how compact and simple the test now is. The changes look good to me. > I think that is an ok balance, at least we can say that generation and compilation only take about 1/6 of total time - an acceptable overhead I would say. Yes, I agree. The new version is much more maintainable and there is no risk anymore that the test generation script and the test start to diverge. 
------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21541#pullrequestreview-2385358021 From rcastanedalo at openjdk.org Tue Oct 22 14:28:18 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 22 Oct 2024 14:28:18 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v2] In-Reply-To: References: <8UbXTjkMPw6xs9rvGtJukgbq8dOM4q-ogBLXlA7sdhA=.41db68e4-1f59-47e2-b0eb-6caa9a7a7d7b@github.com> Message-ID: On Tue, 22 Oct 2024 10:56:49 GMT, Christian Hagedorn wrote: >> Yes, that's my understanding (more details are in the JBS issue). @chhagedorn: Can you confirm? > > Yes, exactly. OK, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21612#discussion_r1810833012 From rcastanedalo at openjdk.org Tue Oct 22 14:28:17 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 22 Oct 2024 14:28:17 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v2] In-Reply-To: References: <8UbXTjkMPw6xs9rvGtJukgbq8dOM4q-ogBLXlA7sdhA=.41db68e4-1f59-47e2-b0eb-6caa9a7a7d7b@github.com> Message-ID: On Tue, 22 Oct 2024 09:48:42 GMT, Daniel Lundén wrote: > we could simplify it and just say we add 4 words (instead of 3) for incoming/outgoing arguments That would work for me too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21612#discussion_r1810835945 From fbredberg at openjdk.org Tue Oct 22 14:41:28 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 22 Oct 2024 14:41:28 GMT Subject: RFR: 8342683: Use non-short forward jump when passing stop() Message-ID: Fixed a "short forward jump exceeds 8-bit offset at" error in `fast_unlock_lightweight()` that appears on x86 based platforms when using `-XX:+ShowMessageBoxOnError`. Tested ok in tier1-3 on all x86 based platforms. 
------------- Commit messages: - 8342683: Use non-short forward jump when passing stop() Changes: https://git.openjdk.org/jdk/pull/21635/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21635&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342683 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21635.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21635/head:pull/21635 PR: https://git.openjdk.org/jdk/pull/21635 From aboldtch at openjdk.org Tue Oct 22 14:46:30 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 22 Oct 2024 14:46:30 GMT Subject: RFR: 8342683: Use non-short forward jump when passing stop() In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 11:57:58 GMT, Fredrik Bredberg wrote: > Fixed a "short forward jump exceeds 8-bit offset at" error in `fast_unlock_lightweight()` that appears on x86 based platforms when using `-XX:+ShowMessageBoxOnError`. > > Tested ok in tier1-3 on all x86 based platforms. lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21635#pullrequestreview-2385440146 From shade at openjdk.org Tue Oct 22 15:05:14 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 22 Oct 2024 15:05:14 GMT Subject: RFR: 8342683: Use non-short forward jump when passing stop() In-Reply-To: References: Message-ID: <3i7Yksyj2hLn4R3QcTREJCtV2zdarOtw3-8uQJzMzNw=.4f6793af-871a-4b40-95cb-54a245365115@github.com> On Tue, 22 Oct 2024 11:57:58 GMT, Fredrik Bredberg wrote: > Fixed a "short forward jump exceeds 8-bit offset at" error in `fast_unlock_lightweight()` that appears on x86 based platforms when using `-XX:+ShowMessageBoxOnError`. > > Tested ok in tier1-3 on all x86 based platforms. Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21635#pullrequestreview-2385499761 From thartmann at openjdk.org Tue Oct 22 15:11:15 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 22 Oct 2024 15:11:15 GMT Subject: RFR: 8342809: C2 hits "assert(is_If()) failed: invalid node class: Con" during IGVN due to unhandled top In-Reply-To: References: Message-ID: <_ihUM8p6JXpyf0JGpfabVY4OgCCfUT-JYcLP3XTvv5Y=.693646d5-95d9-4b4e-9ae2-a7b7ee2b1572@github.com> On Tue, 22 Oct 2024 10:55:24 GMT, Christian Hagedorn wrote: > `RegularPredicate::may_be_predicate_if()` is now called from the `AssertionPredicateWithHalt` class which is invoked during IGVN. We therefore need to be able to handle top as a possible input for an Assertion Predicate success projection. This patch fixes this. > > Thanks, > Christian Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21634#pullrequestreview-2385515427 From duke at openjdk.org Tue Oct 22 15:48:15 2024 From: duke at openjdk.org (duke) Date: Tue, 22 Oct 2024 15:48:15 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory [v4] In-Reply-To: <50saqCLba9Bx2oSGvZdJNsfjc8ZaOtDSMWIDkqP8QSA=.7658d698-3a89-484f-b977-0694eae251ba@github.com> References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> <50saqCLba9Bx2oSGvZdJNsfjc8ZaOtDSMWIDkqP8QSA=.7658d698-3a89-484f-b977-0694eae251ba@github.com> Message-ID: <5ZausX6hpf_MIeHFFV_afrqOZsym-SHjIVR2RudQJXY=.668a20f8-a23a-4f6e-801f-c1dba6d67060@github.com> On Mon, 21 Oct 2024 21:22:57 GMT, Chad Rakoczy wrote: >> [JDK-8342601](https://bugs.openjdk.org/browse/JDK-8342601) >> >> Fix minor inefficiency in `copy_memory` by adding check before doing bit shift to see if we are able to do a move instruction instead. 
Change is low risk because of the low complexity of the change >> >> Ran array copy and tier 1 on aarch64 machine >> >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/arraycopy 49 49 0 0 >> ============================== >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier1 2591 2591 0 0 >> jtreg:test/jdk:tier1 2436 2436 0 0 >> jtreg:test/langtools:tier1 4577 4577 0 0 >> jtreg:test/jaxp:tier1 0 0 0 0 >> jtreg:test/lib-test:tier1 34 34 0 0 >> ============================== > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add comment @chadrako Your change (at version 707fed3680e611e910061ab77671ee75ef3ef594) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21589#issuecomment-2429642323 From psandoz at openjdk.org Tue Oct 22 15:59:38 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 22 Oct 2024 15:59:38 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v31] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 12:25:37 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. 
For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of which is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Factor out IR tests and Transforms to follow-up PRs. Marked as reviewed by psandoz (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2385653877 From psandoz at openjdk.org Tue Oct 22 15:59:38 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 22 Oct 2024 15:59:38 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v30] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 12:25:46 GMT, Jatin Bhateja wrote: > Can you kindly run this through your test infrastructure and approve if it goes fine? 
> Internal tier 1 to 3 testing passed (i needed to merge with master at 7133d1b983d, due to some updates to unrelated test configuration files the test infrastructure expects). ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2429666927 From duke at openjdk.org Tue Oct 22 16:14:19 2024 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 22 Oct 2024 16:14:19 GMT Subject: Integrated: 8342601: AArch64: Micro-optimize bit shift in copy_memory In-Reply-To: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> Message-ID: On Fri, 18 Oct 2024 18:35:02 GMT, Chad Rakoczy wrote: > [JDK-8342601](https://bugs.openjdk.org/browse/JDK-8342601) > > Fix minor inefficiency in `copy_memory` by adding check before doing bit shift to see if we are able to do a move instruction instead. Change is low risk because of the low complexity of the change > > Ran array copy and tier 1 on aarch64 machine > > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/arraycopy 49 49 0 0 > ============================== > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier1 2591 2591 0 0 > jtreg:test/jdk:tier1 2436 2436 0 0 > jtreg:test/langtools:tier1 4577 4577 0 0 > jtreg:test/jaxp:tier1 0 0 0 0 > jtreg:test/lib-test:tier1 34 34 0 0 > ============================== This pull request has now been integrated. 
Changeset: 893266c4 Author: Chad Rakoczy URL: https://git.openjdk.org/jdk/commit/893266c48f26e089d0449d2c161b04430741970c Stats: 12 lines in 1 file changed: 8 ins; 0 del; 4 mod 8342601: AArch64: Micro-optimize bit shift in copy_memory Reviewed-by: dlong, aph, shade ------------- PR: https://git.openjdk.org/jdk/pull/21589 From kvn at openjdk.org Tue Oct 22 16:52:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 22 Oct 2024 16:52:15 GMT Subject: RFR: 8342043: Split Opaque4Node into OpaqueTemplateAssertionPredicateNode and OpaqueNotNullNode In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 12:46:16 GMT, Christian Hagedorn wrote: > ### Two Uses of `Opaque4` > The `Opaque4` node is currently used for two things: > 1. Non-null checks of intrinsics to accompany a cast to non-null where we implicitly know that an object is non-null but C2 does not. This is required to fold control away if the cast becomes top at some point. > 2. Template Assertion Predicates > > ### How to Differentiate between Uses > The only reliable way to differentiate between the two uses is to check if there are `OpaqueLoop*Nodes` above the `Opaque4` node which should only be found for a Template Assertion Predicate. This can be done with the method `assertion_predicate_has_loop_opaque_node()` which follows the inputs of the `Opaque4` recursively. > > ### Problems by Sharing `Opaque4` Nodes for Two Concepts > This sharing of the `Opaque4` comes with some problems: > - One needs to be careful when checking for `Opaque4` nodes if the code can be applied for non-null checks and/or Template Assertion Predicates. Ideally, we should add assertions to exclude the unexpected case (mostly done). > - Walking the graph with `assertion_predicate_has_loop_opaque_node()` has some overhead, especially when checking the negation, we could walk quite some nodes in the worst case. > - There is no real benefit of (re)using `Opaque4` nodes for two concepts and they could easily be separated. 
> > ### Split `Opaque4` into Two Classes to Separate Uses > Therefore, this patch proposes to split `Opaque4` into a new `OpaqueNotNull` and an `OpaqueTemplateAssertionPredicate` class (the latter accompanies the already split off `OpaqueInitializedAssertionPredicate` class). I went through all the uses and updated them accordingly. > > As a consequence, I could turn `assertion_predicate_has_loop_opaque_node()` into a debug only method since it's now only used in assertion code. I additionally removed the second input of the `Opaque4` nodes since these nodes have always eventually been replaced by true. > > ### Turning `OpaqueTemplateAssertionPredicate` into a Non-Macro Node > The `Opaque4` was a macro node. The `OpaqueNotNull` still is but the `OpaqueTemplateAssertionPredicate` is turned into a non-macro node and is removed after loop opts. At that point, loops can no longer be split and we do not need to create Initialized Assertion Predicates from the templates anymore. Thus, they can be cleaned up in the post loop opts IGVN. > > ### Pre-Requisite for Replacing Uncommon Traps with Halt Nodes for Template Assertion Predicates > Eventually, I want to get rid of UCTs for Template Assertion Pre... Seems reasonable. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21608#pullrequestreview-2385740291 From kvn at openjdk.org Tue Oct 22 17:32:06 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 22 Oct 2024 17:32:06 GMT Subject: RFR: 8342683: Use non-short forward jump when passing stop() In-Reply-To: References: Message-ID: <2HllX8sCPu2j89D83YTn9lHabh9FaMX-dBTx1oDf7ek=.fa7e1fa7-986d-43f2-bc0d-c389585c3373@github.com> On Tue, 22 Oct 2024 11:57:58 GMT, Fredrik Bredberg wrote: > Fixed a "short forward jump exceeds 8-bit offset at" error in `fast_unlock_lightweight()` that appears on x86 based platforms when using `-XX:+ShowMessageBoxOnError`. > > Tested ok in tier1-3 on all x86 based platforms. 
Marked as reviewed by kvn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21635#pullrequestreview-2385867042 From duke at openjdk.org Tue Oct 22 18:32:14 2024 From: duke at openjdk.org (Sorna Sarathi N) Date: Tue, 22 Oct 2024 18:32:14 GMT Subject: RFR: 8340445: [PPC64] Wrong ConditionRegister used in ppc64.ad: flagsRegCR0 cr1 Message-ID: This PR changes array_equalsB and array_equalsC to use flagsRegCR1 instead of flagsRegCR0 for KILL effects. This change enhances clarity while maintaining current functionality. Build(release debug level) and tier1 testing are successful JBS Issue: [JDK-8340445](https://bugs.openjdk.org/browse/JDK-8340445) ------------- Commit messages: - JDK-8340445 : [PPC64] Wrong ConditionRegister used in ppc64.ad: flagsRegCR0 cr1 Changes: https://git.openjdk.org/jdk/pull/21353/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21353&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340445 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21353.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21353/head:pull/21353 PR: https://git.openjdk.org/jdk/pull/21353 From mdoerr at openjdk.org Tue Oct 22 18:32:14 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 22 Oct 2024 18:32:14 GMT Subject: RFR: 8340445: [PPC64] Wrong ConditionRegister used in ppc64.ad: flagsRegCR0 cr1 In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 10:26:07 GMT, Sorna Sarathi N wrote: > This PR changes array_equalsB and array_equalsC to use flagsRegCR1 instead of flagsRegCR0 for KILL effects. This change enhances clarity while maintaining current functionality. > > Build(release debug level) and tier1 testing are successful > > JBS Issue: [JDK-8340445](https://bugs.openjdk.org/browse/JDK-8340445) LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21353#pullrequestreview-2354530226 From dnsimon at openjdk.org Tue Oct 22 19:31:21 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 22 Oct 2024 19:31:21 GMT Subject: RFR: 8342854: [JVMCI] Block secondary thread reporting a JVMCI fatal error Message-ID: A fatal crash on a second thread causes the thread to [sleep infinitely](https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/share/utilities/vmError.cpp#L1709-L1718) while error reporting continues on the first crashing thread. The same should be done for reporting fatal crashes in libjvmci to avoid interleaving reports. This PR implements this change. ------------- Commit messages: - [JVMCI] Block secondary thread reporting a JVMCI fatal error Changes: https://git.openjdk.org/jdk/pull/21646/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21646&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342854 Stats: 12 lines in 2 files changed: 0 ins; 1 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/21646.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21646/head:pull/21646 PR: https://git.openjdk.org/jdk/pull/21646 From mdoerr at openjdk.org Tue Oct 22 19:46:04 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 22 Oct 2024 19:46:04 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: <9IiKXnQanLQUv8xp_FSy7tCMA89yhiadFbpBU2lcm3M=.57e6b08b-88dd-414b-868d-fe8ac0b692d3@github.com> References: <9IiKXnQanLQUv8xp_FSy7tCMA89yhiadFbpBU2lcm3M=.57e6b08b-88dd-414b-868d-fe8ac0b692d3@github.com> Message-ID: On Tue, 22 Oct 2024 11:31:15 GMT, Richard Reingruber wrote: > Well, with my quick-search I found information like that too. I don't see though, how it helps understand that the printed values are correct. I think the only way to verify that the printed values are correct is testing it. I'm currently experimenting with setting up XMM values and then crashing intentionally. 
I can see that my values get printed, but on linux, XMM[0] seems to be always 0 and the other registers off by one. Strange. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2430107093 From aph at openjdk.org Tue Oct 22 20:24:13 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 22 Oct 2024 20:24:13 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory [v3] In-Reply-To: References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> Message-ID: <5EVoUJrd9rDg-LEsStcCd5PPbnUJUpoY-26uEANiJzs=.53dee144-a737-4eaf-8d23-79a859af0de9@github.com> On Tue, 22 Oct 2024 10:12:39 GMT, Andrew Haley wrote: > > BTW, apparently Neoverse has 0 latency moves even for 32-bit registers, so they must do something clever with clearing the high bits. > > So they are. Some dark magic there. :-) Oh, and I just realized: this means that clearing the upper bits of a compressed Klass pointer can be done with a zero-latency MOV as well, rather than an AND. That's an optimization worth having. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21589#issuecomment-2430183263 From duke at openjdk.org Tue Oct 22 22:32:14 2024 From: duke at openjdk.org (duke) Date: Tue, 22 Oct 2024 22:32:14 GMT Subject: Withdrawn: 8321003: RISC-V: C2 MulReductionVI In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 09:48:11 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch to implement MulReductionVI/MulReductionVL/MulReductionVF/MulReductionVD? > On riscv, there is no straightforward instructions to do it, but we can do it with a reduction tree, which could reduce the time complexity to lg(N). > Thanks > > ## Performance > TBD This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/19015 From thartmann at openjdk.org Wed Oct 23 05:42:08 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 23 Oct 2024 05:42:08 GMT Subject: RFR: 8335977: Deoptimization fails with assert "object should be reallocated already" In-Reply-To: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> References: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> Message-ID: On Mon, 21 Oct 2024 20:27:10 GMT, Cesar Soares Lucas wrote: > Please, review this patch to fix an issue that may occur when serializing debug information related to reduce allocation merges. The problem happens when there are more than one JVMS in a `uncommon_trap` and a _younger_ JVMS doesn't have the RAM inputs as a local/expression/monitor but an older JVMS does. In that situation the loop at line 1173 of output.cpp will set the `is_root` property of the ObjectValue to `false` when processing the younger JVMS even though it may have been set to `true` when visiting the older JVMS. > > Tested on: > - Win, Mac & Linux tier1-4 on x64 & Aarch64. > - CTW with some thousands of jars. I executed some extended testing. All green. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21624#issuecomment-2430956271 From chagedorn at openjdk.org Wed Oct 23 06:00:15 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 23 Oct 2024 06:00:15 GMT Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java [v2] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 13:08:49 GMT, Emanuel Peter wrote: >> I want to refactor `TestDependencyOffsets.java` using the `CompileFramework`. >> >> Reasons: >> - I soon need to modify the IR rules in this test soon anyway (https://github.com/openjdk/jdk/pull/21521), and so a refactor might be good to do first. 
>> - The generator script used to be a `generator.py`, stored not on `git` but `JBS`. Not great. Now we have it in Java code, and maintenance is easier. >> - Since I first wrote that test, I have now introduced the `IRNode.VECTOR_SIZE`. This allows: >> - Simplifying the logic for the IR rules (removed the `IRRange` and `IRBool`, and the `Platform`). >> - Strengthening the rules. >> - I was able to add `random` offsets. This means we have better coverage, and do not rely on just hand-crafted values. >> >> I extensively use `String.format` and `StringBuilder`... would be nicer to have string-templates but they don't exist yet. >> >> Recommendation for review: the old file was huge. Finding the new code in the diff can be hard. I would start by only reading the new file. >> >> Ah. And about runtime of the test. On my machine I get this (in ms): >> >> Generate: 27 >> Compile: 5845 >> Run: 23435 >> >> Test generation is negligible. 6sec on compilation, 20+sec on execution. I think that is an ok balance, at least we can say that generation and compilation only take about 1/6 of total time - an acceptable overhead I would say. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add @compile for IR Framework That's a very nice application and proof of concept for the Compile Framework! It's quite concise, and it's easier to understand what's being tested. Only some minor, mostly style comments; otherwise, looks good to me, too! test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java line 29: > 27: * @summary Test SuperWord: vector size, offsets, dependencies, alignment. > 28: * @library /test/lib / > 29: * @compile ../../lib/ir_framework/TestFramework.java Since this is quite a significant test change, do you want to add this JBS number as well?
test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java line 477: > 475: Arrays.stream(types).map(type -> type.generateInit()).collect(Collectors.joining("\n")), > 476: Arrays.stream(types).map(type -> type.generateVerify()).collect(Collectors.joining("\n")), > 477: getTests().stream().map(test -> test.generate()).collect(Collectors.joining("\n"))); Since you don't need the lambda parameter, you can directly use a method reference: Suggestion: Arrays.stream(types).map(Type::generateInit).collect(Collectors.joining("\n")), Arrays.stream(types).map(Type::generateVerify).collect(Collectors.joining("\n")), getTests().stream().map(TestDefinition::generate).collect(Collectors.joining("\n"))); test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java line 480: > 478: } > 479: > 480: public static void main(String args[]) { While at it: Suggestion: public static void main(String[] args) { test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java line 575: > 573: } > 574: > 575: static Type[] types = new Type[] { Can be made final, maybe also put in capital letters. Suggestion: static final Type[] TYPES = new Type[] { test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java line 610: > 608: Set set = Arrays.stream(always).boxed().collect(Collectors.toSet()); > 609: > 610: // Sample some random values on an exponental scale Suggestion: // Sample some random values on an exponential scale test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java line 619: > 617: > 618: List offsets = new ArrayList(set); > 619: return offsets; Can be simplified to: Suggestion: return new ArrayList<>(set); test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java line 630: > 628: String generate() { > 629: int start = offset >= 0 ? 0 : -offset; > 630: String end = offset >=0 ? "SIZE - " + offset : "SIZE"; Suggestion: String end = offset >= 0 ? 
"SIZE - " + offset : "SIZE"; test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java line 630: > 628: String generate() { > 629: int start = offset >= 0 ? 0 : -offset; > 630: String end = offset >=0 ? "SIZE - " + offset : "SIZE"; Suggestion: String end = offset >= 0 ? "SIZE - " + offset : "SIZE"; test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java line 635: > 633: String secondArgument; > 634: String loadFrom; > 635: switch(RANDOM.nextInt(3)) { Suggestion: switch (RANDOM.nextInt(3)) { test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java line 736: > 734: IRRule r1 = new IRRule(type, type.irNode); > 735: r1.addApplyIf("\"AlignVector\", \"false\""); > 736: r1.addApplyIf("\"MaxVectorSize\", \">=" + minVectorWidth + "\""); Maybe at some point, the Compile Framework could offer such an IR rule builder class. But might be something for a future RFE. test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java line 790: > 788: > 789: static List getTests() { > 790: List tests = new ArrayList(); Suggestion: List tests = new ArrayList<>(); test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java line 847: > 845: builder.append(applyIf.size() > 1 ? "And" : ""); > 846: builder.append(" = {"); > 847: builder.append(applyIf.stream().collect(Collectors.joining(", "))); Can be replaced with: Suggestion: builder.append(String.join(", ", applyIf)); ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21541#pullrequestreview-2387165420 PR Review Comment: https://git.openjdk.org/jdk/pull/21541#discussion_r1811866151 PR Review Comment: https://git.openjdk.org/jdk/pull/21541#discussion_r1811870865 PR Review Comment: https://git.openjdk.org/jdk/pull/21541#discussion_r1811873196 PR Review Comment: https://git.openjdk.org/jdk/pull/21541#discussion_r1811883958 PR Review Comment: https://git.openjdk.org/jdk/pull/21541#discussion_r1811885149 PR Review Comment: https://git.openjdk.org/jdk/pull/21541#discussion_r1811888095 PR Review Comment: https://git.openjdk.org/jdk/pull/21541#discussion_r1811890440 PR Review Comment: https://git.openjdk.org/jdk/pull/21541#discussion_r1811890666 PR Review Comment: https://git.openjdk.org/jdk/pull/21541#discussion_r1811890836 PR Review Comment: https://git.openjdk.org/jdk/pull/21541#discussion_r1811919124 PR Review Comment: https://git.openjdk.org/jdk/pull/21541#discussion_r1811924861 PR Review Comment: https://git.openjdk.org/jdk/pull/21541#discussion_r1811921197 From epeter at openjdk.org Wed Oct 23 06:06:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Oct 2024 06:06:11 GMT Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java [v2] In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 05:50:21 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add @compile for IR Framework > > test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java line 736: > >> 734: IRRule r1 = new IRRule(type, type.irNode); >> 735: r1.addApplyIf("\"AlignVector\", \"false\""); >> 736: r1.addApplyIf("\"MaxVectorSize\", \">=" + minVectorWidth + "\""); > > Maybe at some point, the Compile Framework could offer such an IR rule builder class. But might be something for a future RFE. Nice idea. 
Probably that would rather be part of the (future) Template Framework? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21541#discussion_r1811931534 From epeter at openjdk.org Wed Oct 23 06:31:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Oct 2024 06:31:11 GMT Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java [v3] In-Reply-To: References: Message-ID: > I want to refactor `TestDependencyOffsets.java` using the `CompileFramework`. > > Reasons: > - I soon need to modify the IR rules in this test soon anyway (https://github.com/openjdk/jdk/pull/21521), and so a refactor might be good to do first. > - The generator script used to be a `generator.py`, stored not on `git` but `JBS`. Not great. Now we have it in Java code, and maintenance is easier. > - Since I first wrote that test, I have now introduced the `IRNode.VECTOR_SIZE`. This allows: > - Simplifying the logic for the IR rules (removed the `IRRange` and `IRBool`, and the `Platform`). > - Strengthening the rules. > - I was able to add `random` offsets. This means we have better coverage, and do not rely on just hand-crafted values. > > I extensively use `String.format` and `StringBuilder`... would be nicer to have string-templates but they don't exist yet. > > Recommendation for review: the old file was huge. Finding the new code in the diff can be hard. I would start by only reading the new file. > > Ah. And about runtime of the test. On my machine I get this (in ms): > > Generate: 27 > Compile: 5845 > Run: 23435 > > Test generation is negligible. 6sec on compilation, 20+sec on execution. I think that is an ok balance, at least we can say that generation and compilation only take about 1/6 of total time - an acceptable overhead I would say. 
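As a small illustration of the style suggestions from Christian's review of this PR (method references instead of trivial lambdas, `String.join` instead of a stream with `Collectors.joining`, the diamond operator, a `static final` upper-case constant, and `String[] args`), here is a minimal, self-contained sketch. The class `ElementType`, the class `ReviewSuggestionsSketch`, and all method names are hypothetical stand-ins chosen for illustration; this is not the actual code of `TestDependencyOffsets.java`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the test's type descriptor; not the real class.
final class ElementType {
    private final String name;

    ElementType(String name) {
        this.name = name;
    }

    // Pretend code generation: returns one line of "generated" source.
    String generateInit() {
        return "init_" + name;
    }
}

class ReviewSuggestionsSketch {
    // Constant array: static final and upper-case, as suggested in the review.
    static final ElementType[] TYPES = { new ElementType("int"), new ElementType("long") };

    // Method reference instead of the lambda `type -> type.generateInit()`.
    static String generateInits() {
        return Arrays.stream(TYPES)
                     .map(ElementType::generateInit)
                     .collect(Collectors.joining("\n"));
    }

    // String.join instead of applyIf.stream().collect(Collectors.joining(", ")).
    static String joinApplyIf(List<String> applyIf) {
        return String.join(", ", applyIf);
    }

    public static void main(String[] args) { // String[] args, not String args[]
        List<String> applyIf = new ArrayList<>(); // diamond operator
        applyIf.add("\"AlignVector\", \"false\"");
        applyIf.add("\"MaxVectorSize\", \">=16\"");
        System.out.println(generateInits());
        System.out.println(joinApplyIf(applyIf));
    }
}
```

None of these changes alter behavior; they only shorten the code, which is why they were raised as style comments rather than blocking issues.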
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Suggestions by Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21541/files - new: https://git.openjdk.org/jdk/pull/21541/files/460176c0..46b95030 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21541&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21541&range=01-02 Stats: 47 lines in 1 file changed: 0 ins; 1 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/21541.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21541/head:pull/21541 PR: https://git.openjdk.org/jdk/pull/21541 From chagedorn at openjdk.org Wed Oct 23 06:32:05 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 23 Oct 2024 06:32:05 GMT Subject: RFR: 8342043: Split Opaque4Node into OpaqueTemplateAssertionPredicateNode and OpaqueNotNullNode In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 12:46:16 GMT, Christian Hagedorn wrote: > ### Two Uses of `Opaque4` > The `Opaque4` node is currently used for two things: > 1. Non-null checks of intrinsics to accompany a cast to non-null where we implicitly know that an object is non-null but C2 does not. This is required to fold control away if the cast becomes top at some point. > 2. Template Assertion Predicates > > ### How to Differentiate between Uses > The only reliable way to differentiate between the two uses is to check if there are `OpaqueLoop*Nodes` above the `Opaque4` node which should only be found for a Template Assertion Predicate. This can be done with the method `assertion_predicate_has_loop_opaque_node()` which follows the inputs of the `Opaque4` recursively. > > ### Problems by Sharing `Opaque4` Nodes for Two Concepts > This sharing of the `Opaque4` comes with some problems: > - One need to be careful when checking for `Opaque4` nodes if the code can be applied for non-null checks and/or Template Assertion Predicates. 
Ideally, we should add assertions to exclude the unexpected case (mostly done). > - Walking the graph with `assertion_predicate_has_loop_opaque_node()` has some overhead, especially when checking the negation, where we could walk quite a few nodes in the worst case. > - There is no real benefit of (re)using `Opaque4` nodes for two concepts, and the two uses could easily be separated. > > ### Split `Opaque4` into Two Classes to Separate Uses > Therefore, this patch proposes to split `Opaque4` into a new `OpaqueNotNull` and an `OpaqueTemplateAssertionPredicate` class (the latter accompanies the already split off `OpaqueInitializedAssertionPredicate` class). I went through all the uses and updated them accordingly. > > As a consequence, I could turn `assertion_predicate_has_loop_opaque_node()` into a debug only method since it's now only used in assertion code. I additionally removed the second input of the `Opaque4` nodes since these nodes have always eventually been replaced by true. > > ### Turning `OpaqueTemplateAssertionPredicate` into a Non-Macro Node > The `Opaque4` was a macro node. The `OpaqueNotNull` still is but the `OpaqueTemplateAssertionPredicate` is turned into a non-macro node and is removed after loop opts. At that point, loops can no longer be split and we do not need to create Initialized Assertion Predicates from the templates anymore. Thus, they can be cleaned up in the post loop opts IGVN. > > ### Pre-Requisite for Replacing Uncommon Traps with Halt Nodes for Template Assertion Predicates > Eventually, I want to get rid of UCTs for Template Assertion Pre... Thanks Vladimir for your review!
------------- PR Comment: https://git.openjdk.org/jdk/pull/21608#issuecomment-2431032253 From chagedorn at openjdk.org Wed Oct 23 06:42:12 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 23 Oct 2024 06:42:12 GMT Subject: RFR: 8342809: C2 hits "assert(is_If()) failed: invalid node class: Con" during IGVN due to unhandled top In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 10:55:24 GMT, Christian Hagedorn wrote: > `RegularPredicate::may_be_predicate_if()` is now called from the `AssertionPredicateWithHalt` class which is invoked during IGVN. We therefore need to be able to handle top as a possible input for an Assertion Predicate success projection. This patch fixes this. > > Thanks, > Christian Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21634#issuecomment-2431047333 From chagedorn at openjdk.org Wed Oct 23 06:42:13 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 23 Oct 2024 06:42:13 GMT Subject: Integrated: 8342809: C2 hits "assert(is_If()) failed: invalid node class: Con" during IGVN due to unhandled top In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 10:55:24 GMT, Christian Hagedorn wrote: > `RegularPredicate::may_be_predicate_if()` is now called from the `AssertionPredicateWithHalt` class which is invoked during IGVN. We therefore need to be able to handle top as a possible input for an Assertion Predicate success projection. This patch fixes this. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 018db8c1 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/018db8c16a826b4b6b5eec76163616a07289b55a Stats: 65 lines in 2 files changed: 64 ins; 0 del; 1 mod 8342809: C2 hits "assert(is_If()) failed: invalid node class: Con" during IGVN due to unhandled top Reviewed-by: roland, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/21634 From chagedorn at openjdk.org Wed Oct 23 07:17:06 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 23 Oct 2024 07:17:06 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v4] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 02:39:00 GMT, Dhamoder Nalla wrote: >> In the debug build, the assert is triggered during the parsing (before Code_Gen). In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. >> >> When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both the debug and release builds exhibited the same behavior: the compilation bails out during code_gen after building the ideal graph with more than 80K nodes. >> >> The proposed fix will check the live node count and bail out during compilation while building the graph for scalarization of the elements in the array when the live node count crosses the limit of 80K, instead of unnecessarily building the entire graph and bailing out in code_gen. 
> > Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: > > change CRLF to LF src/hotspot/share/opto/macro.cpp line 821: > 819: // If scalarize operation is adding too many nodes, bail out > 820: if (C->check_node_count(300, "out of nodes while scalarizing object")) { > 821: return nullptr; Would a bailout from this scalarization be enough, or do we really need to record the method as non-compilable (which is done with `check_node_count()`)? In the latter case, we could also try something like "recompilation without EA" as done, for example, here (i.e. `retry_no_escape_analysis`): https://github.com/openjdk/jdk/blob/37cfaa8deb4cc15864bb6dc2c8a87fc97cff2f0d/src/hotspot/share/opto/escape.cpp#L3858-L3866 I also suggest using the `NodeLimitFudgeFactor` instead of `300` to make it controllable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1812029925 From rrich at openjdk.org Wed Oct 23 07:23:04 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 23 Oct 2024 07:23:04 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 14:29:20 GMT, Martin Doerr wrote: > There are some situations in which the XMM registers are relevant for understanding errors. E.g., the C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. > I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.)
> > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 0x0000000000000000 > XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f Thanks Martin for doing this. I actually wanted to attempt writing a test myself before I read your message. > I can see that my values get printed, but on linux, XMM[0] seems to be always 0 and the other registers off by one. Strange. Strange indeed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2431125556 From roland at openjdk.org Wed Oct 23 07:40:05 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 23 Oct 2024 07:40:05 GMT Subject: RFR: 8342043: Split Opaque4Node into OpaqueTemplateAssertionPredicateNode and OpaqueNotNullNode In-Reply-To: References: Message-ID: <1zhceQCC9hvWHWxX8MmEaB4JRddnhTvzLxlwtief4NU=.0fa25943-2272-4f43-b0a9-f601d5788926@github.com> On Mon, 21 Oct 2024 12:46:16 GMT, Christian Hagedorn wrote: > ### Two Uses of `Opaque4` > The `Opaque4` node is currently used for two things: > 1. Non-null checks of intrinsics to accompany a cast to non-null where we implicitly know that an object is non-null but C2 does not. This is required to fold control away if the cast becomes top at some point. > 2. Template Assertion Predicates > > ### How to Differentiate between Uses > The only reliable way to differentiate between the two uses is to check if there are `OpaqueLoop*Nodes` above the `Opaque4` node which should only be found for a Template Assertion Predicate. This can be done with the method `assertion_predicate_has_loop_opaque_node()` which follows the inputs of the `Opaque4` recursively. > > ### Problems by Sharing `Opaque4` Nodes for Two Concepts > This sharing of the `Opaque4` comes with some problems: > - One need to be careful when checking for `Opaque4` nodes if the code can be applied for non-null checks and/or Template Assertion Predicates. Ideally, we should add assertions to exclude the unexpected case (mostly done). > - Walking the graph with `assertion_predicate_has_loop_opaque_node()` has some overhead, especially when checking the negation, we could walk quite some nodes in the worst case. > - There is no real benefit of (re)using `Opaque4` nodes for two concepts and could easily be separated. 
> > ### Split `Opaque4` into Two Classes to Separate Uses > Therefore, this patch proposes to split `Opaque4` into a new `OpaqueNotNull` and an `OpaqueTemplateAssertionPredicate` class (the latter accompanies the already split off `OpaqueInitializedAssertionPredicate` class). I went through all the uses and updated them accordingly. > > As a consequence, I could turn `assertion_predicate_has_loop_opaque_node()` into a debug only method since it's now only used in assertion code. I additionally removed the second input of the `Opaque4` nodes since these nodes have always eventually been replaced by true. > > ### Turning `OpaqueTemplateAssertionPredicate` into a Non-Macro Node > The `Opaque4` was a macro node. The `OpaqueNotNull` still is but the `OpaqueTemplateAssertionPredicate` is turned into a non-macro node and is removed after loop opts. At that point, loops can no longer be split and we do not need to create Initialized Assertion Predicates from the templates anymore. Thus, they can be cleaned up in the post loop opts IGVN. > > ### Pre-Requisite for Replacing Uncommon Traps with Halt Nodes for Template Assertion Predicates > Eventually, I want to get rid of UCTs for Template Assertion Pre... src/hotspot/share/opto/loopTransform.cpp line 1932: > 1930: // - For the last access a[init+new_stride-orig_stride] (with the new unroll stride) > 1931: prev_proj = create_initialized_assertion_predicate(iff, init, max_value, prev_proj); > 1932: } else { Why is it safe to remove this?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21608#discussion_r1812067550 From chagedorn at openjdk.org Wed Oct 23 07:48:06 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 23 Oct 2024 07:48:06 GMT Subject: RFR: 8342043: Split Opaque4Node into OpaqueTemplateAssertionPredicateNode and OpaqueNotNullNode In-Reply-To: <1zhceQCC9hvWHWxX8MmEaB4JRddnhTvzLxlwtief4NU=.0fa25943-2272-4f43-b0a9-f601d5788926@github.com> References: <1zhceQCC9hvWHWxX8MmEaB4JRddnhTvzLxlwtief4NU=.0fa25943-2272-4f43-b0a9-f601d5788926@github.com> Message-ID: On Wed, 23 Oct 2024 07:34:59 GMT, Roland Westrelin wrote: >> ### Two Uses of `Opaque4` >> The `Opaque4` node is currently used for two things: >> 1. Non-null checks of intrinsics to accompany a cast to non-null where we implicitly know that an object is non-null but C2 does not. This is required to fold control away if the cast becomes top at some point. >> 2. Template Assertion Predicates >> >> ### How to Differentiate between Uses >> The only reliable way to differentiate between the two uses is to check if there are `OpaqueLoop*Nodes` above the `Opaque4` node which should only be found for a Template Assertion Predicate. This can be done with the method `assertion_predicate_has_loop_opaque_node()` which follows the inputs of the `Opaque4` recursively. >> >> ### Problems by Sharing `Opaque4` Nodes for Two Concepts >> This sharing of the `Opaque4` comes with some problems: >> - One need to be careful when checking for `Opaque4` nodes if the code can be applied for non-null checks and/or Template Assertion Predicates. Ideally, we should add assertions to exclude the unexpected case (mostly done). >> - Walking the graph with `assertion_predicate_has_loop_opaque_node()` has some overhead, especially when checking the negation, we could walk quite some nodes in the worst case. >> - There is no real benefit of (re)using `Opaque4` nodes for two concepts and could easily be separated. 
>> >> ### Split `Opaque4` into Two Classes to Separate Uses >> Therefore, this patch proposes to split `Opaque4` into a new `OpaqueNotNull` and an `OpaqueTemplateAssertionPredicate` class (the latter accompanies the already split off `OpaqueInitializedAssertionPredicate` class). I went through all the uses and updated them accordingly. >> >> As a consequence, I could turn `assertion_predicate_has_loop_opaque_node()` into a debug only method since it's now only used in assertion code. I additionally removed the second input of the `Opaque4` nodes since these nodes have always eventually been replaced by true. >> >> ### Turning `OpaqueTemplateAssertionPredicate` into a Non-Macro Node >> The `Opaque4` was a macro node. The `OpaqueNotNull` still is but the `OpaqueTemplateAssertionPredicate` is turned into a non-macro node and is removed after loop opts. At that point, loops can no longer be split and we do not need to create Initialized Assertion Predicates from the templates anymore. Thus, they can be cleaned up in the post loop opts IGVN. >> >> ### Pre-Requisite for Replacing Uncommon Traps with Halt Nodes for Template Assertion Predicates >> Eventually, I wa... > > src/hotspot/share/opto/loopTransform.cpp line 1932: > >> 1930: // - For the last access a[init+new_stride-orig_stride] (with the new unroll stride) >> 1931: prev_proj = create_initialized_assertion_predicate(iff, init, max_value, prev_proj); >> 1932: } else { > > Why is it safe to remove this? Before the patch, we could have an `Opaque4` for a Template Assertion Predicate or a non-null check. Therefore, we additionally called `assertion_predicate_has_loop_opaque_node(iff)` to distinguish them. Now, it is enough to check for `bol->is_OpaqueTemplateAssertionPredicate()`. We can still have an `OpaqueNotNull` node which I now assert on L1942 instead.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21608#discussion_r1812087201 From epeter at openjdk.org Wed Oct 23 07:51:35 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Oct 2024 07:51:35 GMT Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java [v4] In-Reply-To: References: Message-ID: > I want to refactor `TestDependencyOffsets.java` using the `CompileFramework`. > > Reasons: > - I soon need to modify the IR rules in this test soon anyway (https://github.com/openjdk/jdk/pull/21521), and so a refactor might be good to do first. > - The generator script used to be a `generator.py`, stored not on `git` but `JBS`. Not great. Now we have it in Java code, and maintenance is easier. > - Since I first wrote that test, I have now introduced the `IRNode.VECTOR_SIZE`. This allows: > - Simplifying the logic for the IR rules (removed the `IRRange` and `IRBool`, and the `Platform`). > - Strengthening the rules. > - I was able to add `random` offsets. This means we have better coverage, and do not rely on just hand-crafted values. > > I extensively use `String.format` and `StringBuilder`... would be nicer to have string-templates but they don't exist yet. > > Recommendation for review: the old file was huge. Finding the new code in the diff can be hard. I would start by only reading the new file. > > Ah. And about runtime of the test. On my machine I get this (in ms): > > Generate: 27 > Compile: 5845 > Run: 23435 > > Test generation is negligible. 6sec on compilation, 20+sec on execution. I think that is an ok balance, at least we can say that generation and compilation only take about 1/6 of total time - an acceptable overhead I would say. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 32 additional commits since the last revision: - Merge branch 'master' into JDK-8342387-refactor-TestDependencyOffsets - Suggestions by Christian - add @compile for IR Framework - more comment - simplify further by removing explicit CPU/Platform vector width - remove 2 useless runs - whitespace - aliasing modes - further cosmetics and comments - add more cases - ... and 22 more: https://git.openjdk.org/jdk/compare/f58ac10a...fac481de ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21541/files - new: https://git.openjdk.org/jdk/pull/21541/files/46b95030..fac481de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21541&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21541&range=02-03 Stats: 109839 lines in 735 files changed: 102520 ins; 5688 del; 1631 mod Patch: https://git.openjdk.org/jdk/pull/21541.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21541/head:pull/21541 PR: https://git.openjdk.org/jdk/pull/21541 From chagedorn at openjdk.org Wed Oct 23 07:55:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 23 Oct 2024 07:55:11 GMT Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java [v4] In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 07:51:35 GMT, Emanuel Peter wrote: >> I want to refactor `TestDependencyOffsets.java` using the `CompileFramework`. >> >> Reasons: >> - I soon need to modify the IR rules in this test soon anyway (https://github.com/openjdk/jdk/pull/21521), and so a refactor might be good to do first. >> - The generator script used to be a `generator.py`, stored not on `git` but `JBS`. Not great. Now we have it in Java code, and maintenance is easier. >> - Since I first wrote that test, I have now introduced the `IRNode.VECTOR_SIZE`. This allows: >> - Simplifying the logic for the IR rules (removed the `IRRange` and `IRBool`, and the `Platform`). >> - Strengthening the rules. 
>> - I was able to add `random` offsets. This means we have better coverage, and do not rely on just hand-crafted values. >> >> I extensively use `String.format` and `StringBuilder`... would be nicer to have string-templates but they don't exist yet. >> >> Recommendation for review: the old file was huge. Finding the new code in the diff can be hard. I would start by only reading the new file. >> >> Ah. And about runtime of the test. On my machine I get this (in ms): >> >> Generate: 27 >> Compile: 5845 >> Run: 23435 >> >> Test generation is negligible. 6sec on compilation, 20+sec on execution. I think that is an ok balance, at least we can say that generation and compilation only take about 1/6 of total time - an acceptable overhead I would say. > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 32 additional commits since the last revision: > > - Merge branch 'master' into JDK-8342387-refactor-TestDependencyOffsets > - Suggestions by Christian > - add @compile for IR Framework > - more comment > - simplify further by removing explicit CPU/Platform vector width > - remove 2 useless runs > - whitespace > - aliasing modes > - further cosmetics and comments > - add more cases > - ... and 22 more: https://git.openjdk.org/jdk/compare/51d48b54...fac481de Thanks for the update, looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21541#pullrequestreview-2387622948 From chagedorn at openjdk.org Wed Oct 23 07:55:12 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 23 Oct 2024 07:55:12 GMT Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java [v2] In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 06:03:47 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java line 736: >> >>> 734: IRRule r1 = new IRRule(type, type.irNode); >>> 735: r1.addApplyIf("\"AlignVector\", \"false\""); >>> 736: r1.addApplyIf("\"MaxVectorSize\", \">=" + minVectorWidth + "\""); >> >> Maybe at some point, the Compile Framework could offer such an IR rule builder class. But might be something for a future RFE. > > Nice idea. Probably that would rather be part of the (future) Template Framework? Yes, that's also a good option. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21541#discussion_r1812100729 From stuefe at openjdk.org Wed Oct 23 07:58:05 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 23 Oct 2024 07:58:05 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: References: <9IiKXnQanLQUv8xp_FSy7tCMA89yhiadFbpBU2lcm3M=.57e6b08b-88dd-414b-868d-fe8ac0b692d3@github.com> Message-ID: <19zPHOF4C3CkxqffEUNFyZikXTBpWKFogFeEMlvDUCw=.8375243c-b72c-4079-bf2d-973e8b012e1f@github.com> On Tue, 22 Oct 2024 19:42:47 GMT, Martin Doerr wrote: > > Well, with my quick-search I found information like that too. I don't see though, how it helps understand that the printed values are correct. > > I think the only way to verify that the printed values are correct is testing it. I'm currently experimenting with setting up XMM values and then crashing intentionally. I can see that my values get printed, but on linux, XMM[0] seems to be always 0 and the other registers off by one. Strange. 
I got curious, then confused, and looked. See https://stackoverflow.com/questions/43415882/reading-sse-registers-xmm-ymm-in-a-signal-handler - even though it refers to ymm registers. The answer seems to indicate that the glibc structure is not what you should be using, but going via uc_mcontext seems to be the right way. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2431201868 From chagedorn at openjdk.org Wed Oct 23 08:03:10 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 23 Oct 2024 08:03:10 GMT Subject: RFR: 8342043: Split Opaque4Node into OpaqueTemplateAssertionPredicateNode and OpaqueNotNullNode In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 12:46:16 GMT, Christian Hagedorn wrote: > ### Two Uses of `Opaque4` > The `Opaque4` node is currently used for two things: > 1. Non-null checks of intrinsics to accompany a cast to non-null where we implicitly know that an object is non-null but C2 does not. This is required to fold control away if the cast becomes top at some point. > 2. Template Assertion Predicates > > ### How to Differentiate between Uses > The only reliable way to differentiate between the two uses is to check if there are `OpaqueLoop*Nodes` above the `Opaque4` node which should only be found for a Template Assertion Predicate. This can be done with the method `assertion_predicate_has_loop_opaque_node()` which follows the inputs of the `Opaque4` recursively. > > ### Problems by Sharing `Opaque4` Nodes for Two Concepts > This sharing of the `Opaque4` comes with some problems: > - One need to be careful when checking for `Opaque4` nodes if the code can be applied for non-null checks and/or Template Assertion Predicates. Ideally, we should add assertions to exclude the unexpected case (mostly done). > - Walking the graph with `assertion_predicate_has_loop_opaque_node()` has some overhead, especially when checking the negation, we could walk quite some nodes in the worst case. 
> - There is no real benefit of (re)using `Opaque4` nodes for two concepts and could easily be separated. > > ### Split `Opaque4` into Two Classes to Separate Uses > Therefore, this patch proposes to split `Opaque4` into a new `OpaqueNonNull` and a `OpaqueTemplateAssertionPredicate` class (the latter accompanies the already split off `OpaqueInitializedAssertionPredicate` class). I went through all the uses and updated them accordingly. > > As a consequence, I could turn `assertion_predicate_has_loop_opaque_node()` into a debug only method since it's now only used in assertion code. I additionally removed the second input of the `Opaque4` nodes since these nodes have always eventually been replaced by true. > > ### Turning `OpaqueTemplateAssertionPredicate` into a Non-Macro Node > The `Opaque4` was a macro node. The `OpaqueNotNull` still is but the `OpaqueTemplateAssertionPredicate` is turned into a non-macro node and is removed after loop opts. At that point, loops can no longer be split and we do no need to create Initialized Assertion Predicates from the templates anymore. Thus, they can be cleaned up in the post loop opts IGVN. > > ### Pre-Requisite for Replacing Uncommon Traps with Halt Nodes for Template Assertion Predicates > Eventually, I want to get rid of UCTs for Template Assertion Pre... Thanks Roland for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21608#issuecomment-2431213406 From chagedorn at openjdk.org Wed Oct 23 08:03:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 23 Oct 2024 08:03:11 GMT Subject: Integrated: 8342043: Split Opaque4Node into OpaqueTemplateAssertionPredicateNode and OpaqueNotNullNode In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 12:46:16 GMT, Christian Hagedorn wrote: > ### Two Uses of `Opaque4` > The `Opaque4` node is currently used for two things: > 1. 
Non-null checks of intrinsics to accompany a cast to non-null where we implicitly know that an object is non-null but C2 does not. This is required to fold control away if the cast becomes top at some point. > 2. Template Assertion Predicates > > ### How to Differentiate between Uses > The only reliable way to differentiate between the two uses is to check if there are `OpaqueLoop*Nodes` above the `Opaque4` node which should only be found for a Template Assertion Predicate. This can be done with the method `assertion_predicate_has_loop_opaque_node()` which follows the inputs of the `Opaque4` recursively. > > ### Problems by Sharing `Opaque4` Nodes for Two Concepts > This sharing of the `Opaque4` comes with some problems: > - One need to be careful when checking for `Opaque4` nodes if the code can be applied for non-null checks and/or Template Assertion Predicates. Ideally, we should add assertions to exclude the unexpected case (mostly done). > - Walking the graph with `assertion_predicate_has_loop_opaque_node()` has some overhead, especially when checking the negation, we could walk quite some nodes in the worst case. > - There is no real benefit of (re)using `Opaque4` nodes for two concepts and could easily be separated. > > ### Split `Opaque4` into Two Classes to Separate Uses > Therefore, this patch proposes to split `Opaque4` into a new `OpaqueNonNull` and a `OpaqueTemplateAssertionPredicate` class (the latter accompanies the already split off `OpaqueInitializedAssertionPredicate` class). I went through all the uses and updated them accordingly. > > As a consequence, I could turn `assertion_predicate_has_loop_opaque_node()` into a debug only method since it's now only used in assertion code. I additionally removed the second input of the `Opaque4` nodes since these nodes have always eventually been replaced by true. > > ### Turning `OpaqueTemplateAssertionPredicate` into a Non-Macro Node > The `Opaque4` was a macro node. 
The `OpaqueNotNull` still is but the `OpaqueTemplateAssertionPredicate` is turned into a non-macro node and is removed after loop opts. At that point, loops can no longer be split and we do no need to create Initialized Assertion Predicates from the templates anymore. Thus, they can be cleaned up in the post loop opts IGVN. > > ### Pre-Requisite for Replacing Uncommon Traps with Halt Nodes for Template Assertion Predicates > Eventually, I want to get rid of UCTs for Template Assertion Pre... This pull request has now been integrated. Changeset: 7131f053 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e Stats: 237 lines in 18 files changed: 66 ins; 19 del; 152 mod 8342043: Split Opaque4Node into OpaqueTemplateAssertionPredicateNode and OpaqueNotNullNode Reviewed-by: kvn, roland ------------- PR: https://git.openjdk.org/jdk/pull/21608 From jbhateja at openjdk.org Wed Oct 23 08:14:12 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 23 Oct 2024 08:14:12 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering In-Reply-To: References: Message-ID: <-J1c3-4NdfyIyOpkRDJBtw1qzR375U56X7TCvvrb4u0=.c1fd68a5-b64f-4c46-a8ec-93530c16ba97@github.com> On Mon, 21 Oct 2024 04:11:03 GMT, Jasmine Karthikeyan wrote: > Hi all, > This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. 
>
> In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated!

src/hotspot/cpu/arm/c2_lowering_arm.cpp line 29:

> 27: #include "opto/phaseX.hpp"
> 28:
> 29: Node* PhaseLowering::lower_node(Node* in) {

Suggestion:

Node* PhaseLowering::lower_node(Node* n) {

src/hotspot/cpu/ppc/c2_lowering_ppc.cpp line 29:

> 27: #include "opto/phaseX.hpp"
> 28:
> 29: Node* PhaseLowering::lower_node(Node* in) {

Suggestion:

Node* PhaseLowering::lower_node(Node* n) {

src/hotspot/cpu/riscv/c2_lowering_riscv.cpp line 29:

> 27: #include "opto/phaseX.hpp"
> 28:
> 29: Node* PhaseLowering::lower_node(Node* in) {

Suggestion:

Node* PhaseLowering::lower_node(Node* n) {

src/hotspot/cpu/s390/c2_lowering_s390.cpp line 29:

> 27: #include "opto/phaseX.hpp"
> 28:
> 29: Node* PhaseLowering::lower_node(Node* in) {

Suggestion:

Node* PhaseLowering::lower_node(Node* n) {

src/hotspot/cpu/x86/c2_lowering_x86.cpp line 29:

> 27: #include "opto/phaseX.hpp"
> 28:
> 29: Node* PhaseLowering::lower_node(Node* in) {

Suggestion:

Node* PhaseLowering::lower_node(Node* n) {

src/hotspot/share/opto/compile.cpp line 2466:

> 2464: print_method(PHASE_BEFORE_LOWERING, 3);
> 2465:
> 2466: PhaseLowering lower(&igvn);

Any specific reason to have lowering after loop optimizations? Lowered nodes may change the loop body size, thereby impacting unrolling decisions.

src/hotspot/share/opto/phaseX.cpp line 2301:

> 2299: while(_igvn->_worklist.size() != 0) {
> 2300: Node* n = _igvn->_worklist.pop();
> 2301: Node* new_node = lower_node(n);

_PhaseLowering::lower_node_ may perform a complex transformation, replacing a graph palette rooted at the current node with another palette. For each newly created node in the new palette, it should either run _igvn.transform directly, thereby triggering the Ideal / Identity / Value sub-passes over it, or insert the node into _igvn.worklist for lazy processing. In the latter case you are consuming the entire worklist, running only Value transforms, before exiting the lowering phase.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1812113082
PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1812113953
PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1812114516
PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1812118894
PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1812119441
PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1812140992
PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1812110851

From roland at openjdk.org Wed Oct 23 08:34:41 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Wed, 23 Oct 2024 08:34:41 GMT
Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate
Message-ID:

Superword creates a `Replicate` node at a `ConvL2I` node and uses the type of the result of the `ConvL2I` to pick the type of the `Replicate` instead of the type of the input to the `ConvL2I`.
------------- Commit messages: - test - fix Changes: https://git.openjdk.org/jdk/pull/21660/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21660&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341834 Stats: 49 lines in 2 files changed: 48 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21660.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21660/head:pull/21660 PR: https://git.openjdk.org/jdk/pull/21660 From epeter at openjdk.org Wed Oct 23 09:07:35 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Oct 2024 09:07:35 GMT Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java [v5] In-Reply-To: References: Message-ID: > I want to refactor `TestDependencyOffsets.java` using the `CompileFramework`. > > Reasons: > - I soon need to modify the IR rules in this test soon anyway (https://github.com/openjdk/jdk/pull/21521), and so a refactor might be good to do first. > - The generator script used to be a `generator.py`, stored not on `git` but `JBS`. Not great. Now we have it in Java code, and maintenance is easier. > - Since I first wrote that test, I have now introduced the `IRNode.VECTOR_SIZE`. This allows: > - Simplifying the logic for the IR rules (removed the `IRRange` and `IRBool`, and the `Platform`). > - Strengthening the rules. > - I was able to add `random` offsets. This means we have better coverage, and do not rely on just hand-crafted values. > > I extensively use `String.format` and `StringBuilder`... would be nicer to have string-templates but they don't exist yet. > > Recommendation for review: the old file was huge. Finding the new code in the diff can be hard. I would start by only reading the new file. > > Ah. And about runtime of the test. On my machine I get this (in ms): > > Generate: 27 > Compile: 5845 > Run: 23435 > > Test generation is negligible. 6sec on compilation, 20+sec on execution. 
I think that is an ok balance, at least we can say that generation and compilation only take about 1/6 of total time - an acceptable overhead I would say. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix for aarch64 Matcher::min_vector_size ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21541/files - new: https://git.openjdk.org/jdk/pull/21541/files/fac481de..2960ebf6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21541&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21541&range=03-04 Stats: 104 lines in 1 file changed: 29 ins; 3 del; 72 mod Patch: https://git.openjdk.org/jdk/pull/21541.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21541/head:pull/21541 PR: https://git.openjdk.org/jdk/pull/21541 From dlong at openjdk.org Wed Oct 23 09:08:12 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 23 Oct 2024 09:08:12 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v6] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Tue, 22 Oct 2024 07:19:54 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. 
>>
>> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them.
>>
>> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing!
>>
>> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
>>
>> **What this change enables**
>>
>> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`).
>>
>> Now we can do:
>> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`.
>> - Merging `Unsafe` stores to native memory.
>> - Merging `MemorySegment`: with array, native, ByteBuffer backing types.
>> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better.
>>
>> **Dealing with Overflows**
>>
>> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check...
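
The overflow tracking described in the quoted text above can be illustrated with a small sketch. This is only a Java analogue written for illustration, assuming a "sticky NaN" design; the class and method names (`NoOverflowIntSketch`, `of`, `isNaN`, ...) are hypothetical and are not the API of the C++ `NoOverflowInt` in the patch.

```java
// Illustrative sketch only: a Java analogue of the overflow-tracking idea.
// All names here are hypothetical, not HotSpot's C++ NoOverflowInt API.
final class NoOverflowIntSketch {
    private static final NoOverflowIntSketch NAN = new NoOverflowIntSketch(0, true);

    private final int value;
    private final boolean nan; // true once any operation has overflowed

    private NoOverflowIntSketch(int value, boolean nan) {
        this.value = value;
        this.nan = nan;
    }

    static NoOverflowIntSketch of(int value) {
        return new NoOverflowIntSketch(value, false);
    }

    boolean isNaN() {
        return nan;
    }

    int value() {
        if (nan) {
            throw new IllegalStateException("value is NaN (an overflow occurred)");
        }
        return value;
    }

    NoOverflowIntSketch add(NoOverflowIntSketch other) {
        if (nan || other.nan) return NAN; // NaN is sticky
        try {
            return of(Math.addExact(value, other.value));
        } catch (ArithmeticException overflow) {
            return NAN;
        }
    }

    NoOverflowIntSketch subtract(NoOverflowIntSketch other) {
        if (nan || other.nan) return NAN;
        try {
            return of(Math.subtractExact(value, other.value));
        } catch (ArithmeticException overflow) {
            return NAN;
        }
    }

    NoOverflowIntSketch multiply(NoOverflowIntSketch other) {
        if (nan || other.nan) return NAN;
        try {
            return of(Math.multiplyExact(value, other.value));
        } catch (ArithmeticException overflow) {
            return NAN;
        }
    }

    NoOverflowIntSketch abs() {
        if (nan) return NAN;
        // 0 - Integer.MIN_VALUE overflows, so subtract() already yields NaN
        // for that input and no separate special case is needed.
        return value >= 0 ? this : of(0).subtract(this);
    }
}
```

With such a type, a term like `con + scale * variable` can be computed without explicit checks at every step: `of(16).add(of(4).multiply(of(3)))` holds 28, while `of(Integer.MIN_VALUE).abs()` is NaN, which is the same property the gtest assertion `ASSERT_TRUE(NoOverflowInt(min_jint).abs().is_NaN());` quoted later in this thread checks for the real class.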
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > changes to NoOverflowInt for Dean OverflowInt updates look good. (That's the only part I reviewed.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2431394335 From dlong at openjdk.org Wed Oct 23 09:08:13 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 23 Oct 2024 09:08:13 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v5] In-Reply-To: <5s8RC77-wK6VzWPn3rMzYbbx5u1rrz38MFYwmFcKlRk=.db068153-574a-4c92-9e16-62bdda59f8cd@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> <5s8RC77-wK6VzWPn3rMzYbbx5u1rrz38MFYwmFcKlRk=.db068153-574a-4c92-9e16-62bdda59f8cd@github.com> Message-ID: On Tue, 22 Oct 2024 07:06:11 GMT, Emanuel Peter wrote: >> It is not necessary, and I actually test that in my gtest. The subtraction below handles the `min_jint` case. > > `ASSERT_TRUE(NoOverflowInt(min_jint).abs().is_NaN());` > > But if you insist on it I can add it in. No, it's fine. I missed the fact that the subtract already handles it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1812270307 From thartmann at openjdk.org Wed Oct 23 10:57:03 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 23 Oct 2024 10:57:03 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 08:30:15 GMT, Roland Westrelin wrote: > Superword creates a `Replicate` node at a `ConvL2I` node and uses the > type of the result of the `ConvL2I` to pick the type of the > `Replicate` instead of the type of the input to the `ConvL2I`. Is this a regression from [JDK-8332163](https://bugs.openjdk.org/browse/JDK-8332163) or [JDK-8248830](https://bugs.openjdk.org/browse/JDK-8248830)? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21660#issuecomment-2431724475 From amitkumar at openjdk.org Wed Oct 23 11:06:07 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 23 Oct 2024 11:06:07 GMT Subject: RFR: 8340445: [PPC64] Wrong ConditionRegister used in ppc64.ad: flagsRegCR0 cr1 In-Reply-To: References: Message-ID: <52aOidPCKYwPbT8Bnh70Nnax2fCIKFDDl8djCCeJVnA=.9a647e25-2614-4ca0-9ed8-fd952190549e@github.com> On Fri, 4 Oct 2024 10:26:07 GMT, Sorna Sarathi wrote: > This PR changes array_equalsB and array_equalsC to use flagsRegCR1 instead of flagsRegCR0 for KILL effects. This change enhances clarity while maintaining current functionality. > > Build(release debug level) and tier1 testing are successful > > JBS Issue: [JDK-8340445](https://bugs.openjdk.org/browse/JDK-8340445) Looks good. Please update copyright headers. ------------- Marked as reviewed by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/21353#pullrequestreview-2388221898 From amitkumar at openjdk.org Wed Oct 23 11:06:07 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 23 Oct 2024 11:06:07 GMT Subject: RFR: 8340445: [PPC64] Wrong ConditionRegister used in ppc64.ad: flagsRegCR0 cr1 In-Reply-To: <52aOidPCKYwPbT8Bnh70Nnax2fCIKFDDl8djCCeJVnA=.9a647e25-2614-4ca0-9ed8-fd952190549e@github.com> References: <52aOidPCKYwPbT8Bnh70Nnax2fCIKFDDl8djCCeJVnA=.9a647e25-2614-4ca0-9ed8-fd952190549e@github.com> Message-ID: On Wed, 23 Oct 2024 11:01:03 GMT, Amit Kumar wrote: >Please update copyright headers. I guess we can skip. Oracle's copyright year is already 2024. 
:-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21353#issuecomment-2431744292 From thartmann at openjdk.org Wed Oct 23 11:50:18 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 23 Oct 2024 11:50:18 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v2] In-Reply-To: References: Message-ID: On Sun, 20 Oct 2024 16:40:32 GMT, Quan Anh Mai wrote: >> Hi, >> >> This small patch refactors array constants in C2 to use an array of `jbyte`s instead of an array of `jvalue`. The former is much easier to work with and we can do `memcpy` with them trivially. >> >> Since code buffers support alignment of the constant section, I have also allowed constant tables to be aligned more than 8 bytes and used it for constant vectors on machines not supporting `SSE3`. I also fixed an issue with code buffer relocation where the temporary buffer is not correctly aligned. >> >> This patch is extracted from https://github.com/openjdk/jdk/pull/21229. Tests passed with `UseSSE=2` where 16-byte constants would be generated, as well as normal testing routines. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into constanttable > - refactor array constant, fix codebuffer reallocation Looks good to me overall but reviewers who looked at [JDK-8278947](https://bugs.openjdk.org/browse/JDK-8278947) (@vnkozlov, @iwanowww) should have a look at this as well. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1715: > 1713: } else { > 1714: switch (vlen_in_bytes) { > 1715: case 4: movflt(dst, src); break; Indentation of the case statements needs to be fixed. Also above. 
src/hotspot/cpu/x86/x86.ad line 2743: > 2741: case T_BYTE: val->at(i) = con; break; > 2742: case T_SHORT: { > 2743: jshort c = con; Why are these casts needed? Isn't `T con` already of the appropriate j-type? ------------- PR Review: https://git.openjdk.org/jdk/pull/21596#pullrequestreview-2388280645 PR Review Comment: https://git.openjdk.org/jdk/pull/21596#discussion_r1812526891 PR Review Comment: https://git.openjdk.org/jdk/pull/21596#discussion_r1812556856 From epeter at openjdk.org Wed Oct 23 11:55:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Oct 2024 11:55:04 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 08:30:15 GMT, Roland Westrelin wrote: > Superword creates a `Replicate` node at a `ConvL2I` node and uses the > type of the result of the `ConvL2I` to pick the type of the > `Replicate` instead of the type of the input to the `ConvL2I`. Hi Roland, Isn't it a little strange that we have a Replicate before a ConvL2I pack? That means that the ConvL2I did not common, even though they have the same inputs. I guess that is due to the `type` not being identical - a rather rare case. 

Pack: 8
 0: 739 ConvL2I === _ 742 [[ 738 ]] #int:2..99:www !orig=612,467,407,[138],[156] !jvms: Test4::test @ bci:41 (line 11)
 1: 741 ConvL2I === _ 742 [[ 740 ]] #int:2..198:www !orig=613,468,440
Pack: 9
 0: 755 ConvL2I === _ 756 [[ 738 ]] #int:2..109:www !orig=594,458,[399],137 !jvms: Test4::test @ bci:37 (line 10)
 1: 754 ConvL2I === _ 756 [[ 740 ]] #int:2..208:www !orig=591,514

I played around with the test case as well, reducing it further:

`./java -XX:CompileCommand=quiet -XX:CompileCommand=compileonly,Test4::test -Xcomp -XX:+TraceSuperWord -XX:+TraceNewVectors -XX:UseAVX=2 Test4.java`

    public class Test4 {
        public static long val = 0;

        public static void test(int x) {
            x = Math.max(0, Math.min(10, x)); // type 0..10
            short a[] = new short[500];
            for (long l = 0; l < 100; l++) {
                val = l + x; // store seems required, hmm
                int y = (int)val;
                int z = (int)l; // this becomes multiple ConvL2I
                a[z] = (short)(z - y);
            }
        }

        public static void main(String[] args) {
            Math.min(0, 1);
            Math.max(0, 1);
            test(0);
        }
    }

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21660#issuecomment-2431873793

From thartmann at openjdk.org Wed Oct 23 12:01:10 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 23 Oct 2024 12:01:10 GMT
Subject: RFR: 8342692: C2: MemorySegment API slow with short running loops [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 22 Oct 2024 11:53:33 GMT, Roland Westrelin wrote:

>> To optimize a long counted loop and long range checks in a long or int
>> counted loop, the loop is turned into a loop nest. When the loop has
>> few iterations, the overhead of having an outer loop whose backedge is
>> never taken, has a measurable cost. Furthermore, creating the loop
>> nest usually causes one iteration of the loop to be peeled so
>> predicates can be set up. If the loop is short running, then it's an
>> extra iteration that's run with range checks (compared to an int
>> counted loop with int range checks).
>>
>> This change doesn't create a loop nest when:
>>
>> 1- it can be determined statically at loop nest creation time that the
>>    loop runs for a short enough number of iterations
>>
>> 2- profiling reports that the loop runs for no more than ShortLoopIter
>>    iterations (1000 by default).
>>
>> For 2-, a guard is added which is implemented as yet another predicate.
>>
>> While this change is in principle simple, I ran into a few
>> implementation issues:
>>
>> - while c2 has a way to compute the number of iterations of an int
>>   counted loop, it doesn't have that for long counted loops. The
>>   existing logic for int counted loops promotes values to long to
>>   avoid overflows. I reworked it so it now works for both long and int
>>   counted loops.
>>
>> - I added a new deoptimization reason (Reason_short_running_loop) for
>>   the new predicate. Given the number of iterations is narrowed down
>>   by the predicate, the limit of the loop after transformation is a
>>   cast node that's control dependent on the short running loop
>>   predicate. Because once the counted loop is transformed, it is
>>   likely that range check predicates will be inserted and they will
>>   depend on the limit, the short running loop predicate has to be the
>>   one that's further away from the loop entry. Now it is also possible
>>   that the limit before transformation depends on a predicate
>>   (TestShortRunningLongCountedLoopPredicatesClone is an example), we
>>   can have: new predicates inserted after the transformation that
>>   depend on the casted limit that itself depends on old predicates
>>   added before the transformation. To solve this circular dependency,
>>   parse and assert predicates are cloned between the old predicates
>>   and the loop head. The cloned short running loop parse predicate is
>>   the one that's used to insert the short running loop predicate.
>>
>> - In the case of a long counted loop, the loop is transformed into a
>>   regular loop with a ...
> > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge branch 'master' into JDK-8342692 > - more > - more > - more > - more > - more > - fix & test I didn't look at it yet but submitted some quick testing. The build on Mac AArch64 fails: [2024-10-23T11:56:28,256Z] /System/Volumes/Data/mesos/work_dir/slaves/7a20d425-e769-4142-b5c1-e3cc2d88e03e-S37429/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/6cb59bf8-52bf-4698-bc7c-7bac27fa71af/runs/66922795-8c84-4b35-8612-4d25564e6c23/workspace/open/src/hotspot/share/opto/loopTransform.cpp:2069:69: error: format specifies type 'long' but the argument has type 'julong' (aka 'unsigned long long') [-Werror,-Wformat] [2024-10-23T11:56:28,256Z] tty->print("Unroll %d(%2ld) ", loop_head->unrolled_count()*2, loop_head->trip_count()); [2024-10-23T11:56:28,256Z] ~~~~ ^~~~~~~~~~~~~~~~~~~~~~~ [2024-10-23T11:56:28,256Z] %2llu [2024-10-23T11:56:28,256Z] /System/Volumes/Data/mesos/work_dir/slaves/7a20d425-e769-4142-b5c1-e3cc2d88e03e-S37429/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/6cb59bf8-52bf-4698-bc7c-7bac27fa71af/runs/66922795-8c84-4b35-8612-4d25564e6c23/workspace/open/src/hotspot/share/opto/loopTransform.cpp:2322:35: error: format specifies type 'long' but the argument has type 'julong' (aka 'unsigned long long') [-Werror,-Wformat] [2024-10-23T11:56:28,256Z] tty->print("MaxUnroll %ld ", cl->trip_count()); [2024-10-23T11:56:28,256Z] ~~~ ^~~~~~~~~~~~~~~~ [2024-10-23T11:56:28,256Z] %llu ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2431891426 From epeter at openjdk.org Wed Oct 23 12:02:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Oct 2024 12:02:04 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 08:30:15 GMT, Roland Westrelin wrote: > Superword creates a 
`Replicate` node at a `ConvL2I` node and uses the > type of the result of the `ConvL2I` to pick the type of the > `Replicate` instead of the type of the input to the `ConvL2I`. The fix seems correct, but I fear that maybe older versions could be affected, it would just be very difficult to create that `ConvL2I` pack. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21660#issuecomment-2431893017 From epeter at openjdk.org Wed Oct 23 12:10:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Oct 2024 12:10:04 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 08:30:15 GMT, Roland Westrelin wrote: > Superword creates a `Replicate` node at a `ConvL2I` node and uses the > type of the result of the `ConvL2I` to pick the type of the > `Replicate` instead of the type of the input to the `ConvL2I`. I would add a comment in the VM code. The one-liner is a little dense to read. Basically the issue is that `velt_basic_type(p0)` gives us the output type of `p0`. But what we need is the input type of `p0` - this is what we are replicating for. Now I'm wondering if there are any other `p0` nodes that have diverging input/output type? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21660#issuecomment-2431911557 From epeter at openjdk.org Wed Oct 23 12:24:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Oct 2024 12:24:04 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 08:30:15 GMT, Roland Westrelin wrote: > Superword creates a `Replicate` node at a `ConvL2I` node and uses the > type of the result of the `ConvL2I` to pick the type of the > `Replicate` instead of the type of the input to the `ConvL2I`. I found these: // Java API for Long.bitCount/numberOfLeadingZeros/numberOfTrailingZeros // returns int type, but Vector API for them returns long type. 
To unify // the implementation in backend, AutoVectorization splits the vector // implementation for Java API into an execution node with long type plus // another node converting long to int. bool VectorNode::is_scalar_op_that_returns_int_but_vector_op_returns_long(int opc) { switch (opc) { case Op_PopCountL: case Op_CountLeadingZerosL: case Op_CountTrailingZerosL: return true; default: return false; } } But they are single-input ops, so if they have the same inputs, they would common, and not create a pack with an input replicate node. Hmm. I tried to play with `MulAddS2I`, but so far no success with getting an example that vectorizes with Replicate... Do you think there are any other cases than `Conv` where input and output do not match? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21660#issuecomment-2431946173 From epeter at openjdk.org Wed Oct 23 12:25:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Oct 2024 12:25:11 GMT Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java [v4] In-Reply-To: References: Message-ID: <6_DyHtag7ygvtbx9hM4Ctg4NX2ec8Br353WA_GmacZ8=.3e786fee-fd85-43fa-bd61-cb37960e38fd@github.com> On Wed, 23 Oct 2024 07:52:44 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 32 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8342387-refactor-TestDependencyOffsets >> - Suggestions by Christian >> - add @compile for IR Framework >> - more comment >> - simplify further by removing explicit CPU/Platform vector width >> - remove 2 useless runs >> - whitespace >> - aliasing modes >> - further cosmetics and comments >> - add more cases >> - ... 
and 22 more: https://git.openjdk.org/jdk/compare/02227cd9...fac481de > > Thanks for the update, looks good! @chhagedorn @TobiHartmann I had to fix something - different architectures have different `min_vector_size` for different types. Feel free to review again, the tests are now passing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21541#issuecomment-2431954828 From duke at openjdk.org Wed Oct 23 12:35:11 2024 From: duke at openjdk.org (duke) Date: Wed, 23 Oct 2024 12:35:11 GMT Subject: RFR: 8340445: [PPC64] Wrong ConditionRegister used in ppc64.ad: flagsRegCR0 cr1 In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 10:26:07 GMT, Sorna Sarathi wrote: > This PR changes array_equalsB and array_equalsC to use flagsRegCR1 instead of flagsRegCR0 for KILL effects. This change enhances clarity while maintaining current functionality. > > Build(release debug level) and tier1 testing are successful > > JBS Issue: [JDK-8340445](https://bugs.openjdk.org/browse/JDK-8340445) @Sorna-Sarathi Your change (at version 332168447d9f8c6eb364e7d68c0d2393c578f66d) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21353#issuecomment-2432002931 From thartmann at openjdk.org Wed Oct 23 12:37:06 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 23 Oct 2024 12:37:06 GMT Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java [v5] In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 09:07:35 GMT, Emanuel Peter wrote: >> I want to refactor `TestDependencyOffsets.java` using the `CompileFramework`. >> >> Reasons: >> - I soon need to modify the IR rules in this test soon anyway (https://github.com/openjdk/jdk/pull/21521), and so a refactor might be good to do first. >> - The generator script used to be a `generator.py`, stored not on `git` but `JBS`. Not great. Now we have it in Java code, and maintenance is easier. 
>> - Since I first wrote that test, I have now introduced the `IRNode.VECTOR_SIZE`. This allows: >> - Simplifying the logic for the IR rules (removed the `IRRange` and `IRBool`, and the `Platform`). >> - Strengthening the rules. >> - I was able to add `random` offsets. This means we have better coverage, and do not rely on just hand-crafted values. >> >> I extensively use `String.format` and `StringBuilder`... would be nicer to have string-templates but they don't exist yet. >> >> Recommendation for review: the old file was huge. Finding the new code in the diff can be hard. I would start by only reading the new file. >> >> Ah. And about runtime of the test. On my machine I get this (in ms): >> >> Generate: 27 >> Compile: 5845 >> Run: 23435 >> >> Test generation is negligible. 6sec on compilation, 20+sec on execution. I think that is an ok balance, at least we can say that generation and compilation only take about 1/6 of total time - an acceptable overhead I would say. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix for aarch64 Matcher::min_vector_size test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java line 26: > 24: /* > 25: * @test id=vanilla-A > 26: * @bug 8298935 8308606 8310308 8312570 8310190 8342387 I don't think the bug number should be added here. It refers to the bugs that this regression test will trigger. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21541#discussion_r1812657813 From duke at openjdk.org Wed Oct 23 12:43:16 2024 From: duke at openjdk.org (Sorna Sarathi) Date: Wed, 23 Oct 2024 12:43:16 GMT Subject: Integrated: 8340445: [PPC64] Wrong ConditionRegister used in ppc64.ad: flagsRegCR0 cr1 In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 10:26:07 GMT, Sorna Sarathi wrote: > This PR changes array_equalsB and array_equalsC to use flagsRegCR1 instead of flagsRegCR0 for KILL effects. 
This change enhances clarity while maintaining current functionality. > > Build(release debug level) and tier1 testing are successful > > JBS Issue: [JDK-8340445](https://bugs.openjdk.org/browse/JDK-8340445) This pull request has now been integrated. Changeset: 964d8d22 Author: Sorna Sarathi Committer: Amit Kumar URL: https://git.openjdk.org/jdk/commit/964d8d2234595afaf4dfe48ea5cacdbfd3792d03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8340445: [PPC64] Wrong ConditionRegister used in ppc64.ad: flagsRegCR0 cr1 Reviewed-by: mdoerr, amitkumar ------------- PR: https://git.openjdk.org/jdk/pull/21353 From thartmann at openjdk.org Wed Oct 23 12:43:16 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 23 Oct 2024 12:43:16 GMT Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java [v5] In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 09:07:35 GMT, Emanuel Peter wrote: >> I want to refactor `TestDependencyOffsets.java` using the `CompileFramework`. >> >> Reasons: >> - I soon need to modify the IR rules in this test soon anyway (https://github.com/openjdk/jdk/pull/21521), and so a refactor might be good to do first. >> - The generator script used to be a `generator.py`, stored not on `git` but `JBS`. Not great. Now we have it in Java code, and maintenance is easier. >> - Since I first wrote that test, I have now introduced the `IRNode.VECTOR_SIZE`. This allows: >> - Simplifying the logic for the IR rules (removed the `IRRange` and `IRBool`, and the `Platform`). >> - Strengthening the rules. >> - I was able to add `random` offsets. This means we have better coverage, and do not rely on just hand-crafted values. >> >> I extensively use `String.format` and `StringBuilder`... would be nicer to have string-templates but they don't exist yet. >> >> Recommendation for review: the old file was huge. Finding the new code in the diff can be hard. I would start by only reading the new file. 
>> >> Ah. And about runtime of the test. On my machine I get this (in ms): >> >> Generate: 27 >> Compile: 5845 >> Run: 23435 >> >> Test generation is negligible. 6sec on compilation, 20+sec on execution. I think that is an ok balance, at least we can say that generation and compilation only take about 1/6 of total time - an acceptable overhead I would say. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix for aarch64 Matcher::min_vector_size Marked as reviewed by thartmann (Reviewer). test/hotspot/jtreg/compiler/loopopts/superword/TestDependencyOffsets.java line 587: > 585: /* > 586: * Every CPU can define its own Matcher::min_vector_size. This happens to be different for > 587: * our targetted platforms: x86 / sse4.1 and aarch64 / asimd. Suggestion: * our targeted platforms: x86 / sse4.1 and aarch64 / asimd. ------------- PR Review: https://git.openjdk.org/jdk/pull/21541#pullrequestreview-2388528239 PR Review Comment: https://git.openjdk.org/jdk/pull/21541#discussion_r1812667762 From epeter at openjdk.org Wed Oct 23 13:23:42 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Oct 2024 13:23:42 GMT Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java [v6] In-Reply-To: References: Message-ID: > I want to refactor `TestDependencyOffsets.java` using the `CompileFramework`. > > Reasons: > - I soon need to modify the IR rules in this test soon anyway (https://github.com/openjdk/jdk/pull/21521), and so a refactor might be good to do first. > - The generator script used to be a `generator.py`, stored not on `git` but `JBS`. Not great. Now we have it in Java code, and maintenance is easier. > - Since I first wrote that test, I have now introduced the `IRNode.VECTOR_SIZE`. This allows: > - Simplifying the logic for the IR rules (removed the `IRRange` and `IRBool`, and the `Platform`). > - Strengthening the rules. 
> - I was able to add `random` offsets. This means we have better coverage, and do not rely on just hand-crafted values. > > I extensively use `String.format` and `StringBuilder`... would be nicer to have string-templates but they don't exist yet. > > Recommendation for review: the old file was huge. Finding the new code in the diff can be hard. I would start by only reading the new file. > > Ah. And about runtime of the test. On my machine I get this (in ms): > > Generate: 27 > Compile: 5845 > Run: 23435 > > Test generation is negligible. 6sec on compilation, 20+sec on execution. I think that is an ok balance, at least we can say that generation and compilation only take about 1/6 of total time - an acceptable overhead I would say. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: spelling and rm new bug number ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21541/files - new: https://git.openjdk.org/jdk/pull/21541/files/2960ebf6..5ed203ba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21541&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21541&range=04-05 Stats: 35 lines in 1 file changed: 0 ins; 0 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/21541.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21541/head:pull/21541 PR: https://git.openjdk.org/jdk/pull/21541 From epeter at openjdk.org Wed Oct 23 13:29:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Oct 2024 13:29:16 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v6] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <1Absnfnkvp8CVAsnsMV3r0n76YNnnKfpCIRpB7Cl9Lo=.173955e7-7799-468e-a04b-fef8abe1d9ad@github.com> On Wed, 23 Oct 2024 09:04:59 GMT, Dean Long wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> changes 
to NoOverflowInt for Dean > > OverflowInt updates look good. (That's the only part I reviewed.) @dean-long ok, thanks for the NoOverflowInt review. In that case, I think we need yet another reviewer @vnkozlov, right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2432170657 From thartmann at openjdk.org Wed Oct 23 13:29:17 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 23 Oct 2024 13:29:17 GMT Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java [v6] In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 13:23:42 GMT, Emanuel Peter wrote: >> I want to refactor `TestDependencyOffsets.java` using the `CompileFramework`. >> >> Reasons: >> - I soon need to modify the IR rules in this test soon anyway (https://github.com/openjdk/jdk/pull/21521), and so a refactor might be good to do first. >> - The generator script used to be a `generator.py`, stored not on `git` but `JBS`. Not great. Now we have it in Java code, and maintenance is easier. >> - Since I first wrote that test, I have now introduced the `IRNode.VECTOR_SIZE`. This allows: >> - Simplifying the logic for the IR rules (removed the `IRRange` and `IRBool`, and the `Platform`). >> - Strengthening the rules. >> - I was able to add `random` offsets. This means we have better coverage, and do not rely on just hand-crafted values. >> >> I extensively use `String.format` and `StringBuilder`... would be nicer to have string-templates but they don't exist yet. >> >> Recommendation for review: the old file was huge. Finding the new code in the diff can be hard. I would start by only reading the new file. >> >> Ah. And about runtime of the test. On my machine I get this (in ms): >> >> Generate: 27 >> Compile: 5845 >> Run: 23435 >> >> Test generation is negligible. 6sec on compilation, 20+sec on execution. 
I think that is an ok balance, at least we can say that generation and compilation only take about 1/6 of total time - an acceptable overhead I would say. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > spelling and rm new bug number Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21541#pullrequestreview-2388691376 From epeter at openjdk.org Wed Oct 23 13:29:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 23 Oct 2024 13:29:17 GMT Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java [v5] In-Reply-To: References: Message-ID: <3lFYKg73kirb8KISR6Jgwzg0Pgbz6i5b2ge_Dvch5gA=.a24089b1-9b7a-439d-b901-8bf01a0e60b0@github.com> On Wed, 23 Oct 2024 12:40:39 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix for aarch64 Matcher::min_vector_size > > Marked as reviewed by thartmann (Reviewer). @TobiHartmann thanks for the additional comments, all applied. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21541#issuecomment-2432167488 From mli at openjdk.org Wed Oct 23 13:31:15 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 23 Oct 2024 13:31:15 GMT Subject: RFR: 8342884: RISC-V: verify float <--> float16 conversion Message-ID: <3STkw_TrCn1ib_wPr4NlZxahYim-9hVafE_4g7gZ8WQ=.8ad2e319-440d-43f1-b091-1b603be28e09@github.com> Hi, Can you help to review this simple patch? it removes the `Experimental` of the `UseZvfh`. Thanks Currently, only float <--> float16 conversions use Zvfh extension, I've run the jmh tests on bananapi, the performance result shows it's good. 

Benchmark (-XX:+UseZfh -XX:+UnlockExperimentalVMOptions -XX:+/-UseZvfh) | (size) | Mode | Cnt | Score -intrinsic | Score +intrinsic | Error | Units | Improvement
-- | -- | -- | -- | -- | -- | -- | -- | --
Fp16ConversionBenchmark.float16ToFloat | 2048 | avgt | 10 | 8129.72 | 4729.125 | 71.937 | ns/op | 1.719
Fp16ConversionBenchmark.float16ToFloatMemory | 2048 | avgt | 10 | 16.9 | 16.894 | 0.002 | ns/op | 1
Fp16ConversionBenchmark.floatToFloat16 | 2048 | avgt | 10 | 12561.962 | 3767.944 | 12.652 | ns/op | 3.334
Fp16ConversionBenchmark.floatToFloat16Memory | 2048 | avgt | 10 | 18.146 | 18.147 | 0.003 | ns/op | 1

------------- Commit messages: - initial commit - Merge branch 'openjdk:master' into master - Revert "initial commit" - initial commit Changes: https://git.openjdk.org/jdk/pull/21664/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21664&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342884 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21664.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21664/head:pull/21664 PR: https://git.openjdk.org/jdk/pull/21664 From chagedorn at openjdk.org Wed Oct 23 14:03:08 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 23 Oct 2024 14:03:08 GMT Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java [v6] In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 13:23:42 GMT, Emanuel Peter wrote: >> I want to refactor `TestDependencyOffsets.java` using the `CompileFramework`. >> >> Reasons: >> - I soon need to modify the IR rules in this test soon anyway (https://github.com/openjdk/jdk/pull/21521), and so a refactor might be good to do first. >> - The generator script used to be a `generator.py`, stored not on `git` but `JBS`. Not great. Now we have it in Java code, and maintenance is easier. >> - Since I first wrote that test, I have now introduced the `IRNode.VECTOR_SIZE`. 
This allows: >> - Simplifying the logic for the IR rules (removed the `IRRange` and `IRBool`, and the `Platform`). >> - Strengthening the rules. >> - I was able to add `random` offsets. This means we have better coverage, and do not rely on just hand-crafted values. >> >> I extensively use `String.format` and `StringBuilder`... would be nicer to have string-templates but they don't exist yet. >> >> Recommendation for review: the old file was huge. Finding the new code in the diff can be hard. I would start by only reading the new file. >> >> Ah. And about runtime of the test. On my machine I get this (in ms): >> >> Generate: 27 >> Compile: 5845 >> Run: 23435 >> >> Test generation is negligible. 6sec on compilation, 20+sec on execution. I think that is an ok balance, at least we can say that generation and compilation only take about 1/6 of total time - an acceptable overhead I would say. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > spelling and rm new bug number Still good. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21541#pullrequestreview-2388848227 From roland at openjdk.org Wed Oct 23 14:32:51 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 23 Oct 2024 14:32:51 GMT Subject: RFR: 8342692: C2: MemorySegment API slow with short running loops [v3] In-Reply-To: References: Message-ID: <2FVz5G_WC9zJg2TGQ-MPmcWcSPgX5eVknTo58tm3TFI=.ca976a58-a49f-4030-9882-d52febeb82b1@github.com> > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. 
If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loops. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. 
> > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: build fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/d7cde041..2deb4dba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=01-02 Stats: 5 lines in 2 files changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From kvn at openjdk.org Wed Oct 23 15:22:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 23 Oct 2024 15:22:09 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v6] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Tue, 22 Oct 2024 07:19:54 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. 
>> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without the RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... 
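
The overflow-tracking idea described above can be sketched roughly as follows. This is only an illustration of the technique, not the `NoOverflowInt` from the PR: the class name `TrackedInt`, its methods, and the helper `scaled_offset` are hypothetical, and the real class in HotSpot may differ in every detail. The sketch widens each operation to 64 bits, checks whether the result still fits in 32 bits, and keeps a sticky flag so that one overflow poisons every later result.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical illustration only; this is NOT HotSpot's NoOverflowInt.
class TrackedInt {
 public:
  explicit TrackedInt(int32_t v) : _value(v), _overflowed(false) {}

  bool overflowed() const { return _overflowed; }

  // The value is only meaningful if no overflow was ever recorded.
  int32_t value() const {
    assert(!_overflowed);
    return _value;
  }

  TrackedInt operator+(const TrackedInt& o) const {
    return from_wide(*this, o, int64_t(_value) + int64_t(o._value));
  }

  TrackedInt operator*(const TrackedInt& o) const {
    // The product of two 32-bit values always fits in 64 bits.
    return from_wide(*this, o, int64_t(_value) * int64_t(o._value));
  }

 private:
  // Narrow a 64-bit result back to 32 bits, recording overflow.
  static TrackedInt from_wide(const TrackedInt& a, const TrackedInt& b, int64_t wide) {
    const bool fits = wide >= std::numeric_limits<int32_t>::min() &&
                      wide <= std::numeric_limits<int32_t>::max();
    TrackedInt r(0);
    // Sticky: an overflow in either operand carries over to the result.
    r._overflowed = a._overflowed || b._overflowed || !fits;
    r._value = r._overflowed ? 0 : int32_t(wide);
    return r;
  }

  int32_t _value;
  bool _overflowed;
};

// One term of a linear form: con + scale * variable.
inline TrackedInt scaled_offset(TrackedInt con, TrackedInt scale, TrackedInt variable) {
  return con + scale * variable;
}
```

With a helper like this, a caller can chain arithmetic freely and test `overflowed()` once at the end, which is the "implicit overflow check" style the description mentions.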
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > changes to NoOverflowInt for Dean Latest code is good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19970#pullrequestreview-2389171587 From fbredberg at openjdk.org Wed Oct 23 15:29:12 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 23 Oct 2024 15:29:12 GMT Subject: RFR: 8342683: Use non-short forward jump when passing stop() In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 11:57:58 GMT, Fredrik Bredberg wrote: > Fixed a "short forward jump exceeds 8-bit offset at" error in `fast_unlock_lightweight()` that appears on x86 based platforms when using `-XX:+ShowMessageBoxOnError`. > > Tested ok in tier1-3 on all x86 based platforms. Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21635#issuecomment-2432620139 From fbredberg at openjdk.org Wed Oct 23 15:29:12 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 23 Oct 2024 15:29:12 GMT Subject: Integrated: 8342683: Use non-short forward jump when passing stop() In-Reply-To: References: Message-ID: <-JosP3c3YfwDBGGt-VlWeh-shbVty_zoeF61FZWydMM=.fee187c9-8605-42b2-a1e0-e8014771070e@github.com> On Tue, 22 Oct 2024 11:57:58 GMT, Fredrik Bredberg wrote: > Fixed a "short forward jump exceeds 8-bit offset at" error in `fast_unlock_lightweight()` that appears on x86 based platforms when using `-XX:+ShowMessageBoxOnError`. > > Tested ok in tier1-3 on all x86 based platforms. This pull request has now been integrated. 
Changeset: afb62f73 Author: Fredrik Bredberg URL: https://git.openjdk.org/jdk/commit/afb62f73499c09f4a7bde6f522fcd3ef1278e526 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod 8342683: Use non-short forward jump when passing stop() Reviewed-by: aboldtch, shade, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21635 From mdoerr at openjdk.org Wed Oct 23 15:34:14 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 23 Oct 2024 15:34:14 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 07:20:52 GMT, Richard Reingruber wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
>> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > Thanks Martin for doing this. I actually wanted to attempt writing a test myself before I read your message. > >> I can see that my values get printed, but on linux, XMM[0] seems to be always 0 and the other registers off by one. Strange. > > Strange indeed. Thanks for your input @reinrich, @tstuefe. I've tried taking the values from `uc->uc_mcontext.fpregs->_xmm[i]`. This seems to work fine if we got a signal. However, "should_not_reach_here()" and friends don't trigger any signal. They call `MacroAssembler::debug64`. 
Surprisingly, my old code based on `uc->__fpregs_mem._xmm[i].element[0]` works for them (besides the off-by-one issue), but the new proposal doesn't. Any idea? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2432634946 From kvn at openjdk.org Wed Oct 23 15:35:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 23 Oct 2024 15:35:15 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 08:30:15 GMT, Roland Westrelin wrote: > Superword creates a `Replicate` node at a `ConvL2I` node and uses the > type of the result of the `ConvL2I` to pick the type of the > `Replicate` instead of the type of the input to the `ConvL2I`. src/hotspot/share/opto/superwordVTransformBuilder.cpp line 231: > 229: } else { > 230: // Replicate the scalar same_input to every vector element. > 231: BasicType element_type = p0->is_Convert() ? p0->in(1)->bottom_type()->basic_type() : _vloop_analyzer.types().velt_basic_type(p0); What vectors are generated (or not) with this change? The array in the test is `int[]` but the element_type will be Long now. Will it bail out of vectorization? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21660#discussion_r1813048372 From rrich at openjdk.org Wed Oct 23 15:41:10 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 23 Oct 2024 15:41:10 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 14:29:20 GMT, Martin Doerr wrote: > There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. 
> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) > > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 0x0000000000000000 > XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f For the record: the kernel sets the `fpregs` pointer [here](https://github.com/torvalds/linux/blob/eb26cbb1a754ccde5d4d74527dad5ba051808fad/arch/x86/kernel/signal_64.c#L128) :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2432667153 From qamai at openjdk.org Wed Oct 23 17:31:13 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 23 Oct 2024 17:31:13 GMT Subject: 
RFR: 8342662: C2: Add new phase for backend-specific lowering In-Reply-To: <-J1c3-4NdfyIyOpkRDJBtw1qzR375U56X7TCvvrb4u0=.c1fd68a5-b64f-4c46-a8ec-93530c16ba97@github.com> References: <-J1c3-4NdfyIyOpkRDJBtw1qzR375U56X7TCvvrb4u0=.c1fd68a5-b64f-4c46-a8ec-93530c16ba97@github.com> Message-ID: On Wed, 23 Oct 2024 08:10:38 GMT, Jatin Bhateja wrote: >> Hi all, >> This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. >> >> In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! > > src/hotspot/share/opto/compile.cpp line 2466: > >> 2464: print_method(PHASE_BEFORE_LOWERING, 3); >> 2465: >> 2466: PhaseLowering lower(&igvn); > > Any specific reason to have lowering after loop optimizations ? > Lowered nodes may change the loop body size thereby impacting unrolling decisions. Because lowering is a transformation that increases the complexity of the graph. - A `d = ExtractD(z, 4)` expanded into `x = VectorExtract(z, 2); d = ExtractD(x, 0)` increases the number of nodes by 1. - A logic cone transformed into a `MacroLogicV` introduces another kind of node that may not be recognized by other nodes. As a result, we should do this as the last step when other transformation has finished their jobs. 
For the concerns regarding loop body size, we still have a function in `Matcher` for that purpose. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1813233875 From qamai at openjdk.org Wed Oct 23 17:42:05 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 23 Oct 2024 17:42:05 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering In-Reply-To: References: <-J1c3-4NdfyIyOpkRDJBtw1qzR375U56X7TCvvrb4u0=.c1fd68a5-b64f-4c46-a8ec-93530c16ba97@github.com> Message-ID: On Wed, 23 Oct 2024 17:28:25 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/compile.cpp line 2466: >> >>> 2464: print_method(PHASE_BEFORE_LOWERING, 3); >>> 2465: >>> 2466: PhaseLowering lower(&igvn); >> >> Any specific reason to have lowering after loop optimizations ? >> Lowered nodes may change the loop body size thereby impacting unrolling decisions. > > Because lowering is a transformation that increases the complexity of the graph. > > - A `d = ExtractD(z, 4)` expanded into `x = VectorExtract(z, 2); d = ExtractD(x, 0)` increases the number of nodes by 1. > - A logic cone transformed into a `MacroLogicV` introduces another kind of node that may not be recognized by other nodes. > > As a result, we should do this as the last step when other transformation has finished their jobs. For the concerns regarding loop body size, we still have a function in `Matcher` for that purpose. Another reason is that lowering being done late allows us to have more freedom to break some invariants of the nodes, such as looking through `VectorReinterpret`. 
An example is this (really crafted) case: Int256Vector v; int a = v.lane(5); float b = v.reinterpretAsFloats().lane(7); This would be transformed into: vector v; vector v0 = VectorExtract(v, 1); int a = ExtractI(v0, 1); vector v1 = VectorReinterpret(v, ); vector v2 = VectorExtract(v1, 1); float b = ExtractF(v2, 3); By allowing lowering to look through `VectorReinterpret` and break the invariant of `Extract` nodes that the element types of their inputs and outputs must be the same, we can `gvn` `v1` and `v`, `v2` and `v0`. Simplify the graph: vector v; vector v0 = VectorExtract(v, 1); int a = ExtractI(v0, 1); float b = ExtractF(v0, 3); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1813252989 From jrose at openjdk.org Wed Oct 23 18:36:13 2024 From: jrose at openjdk.org (John R Rose) Date: Wed, 23 Oct 2024 18:36:13 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory [v4] In-Reply-To: <50saqCLba9Bx2oSGvZdJNsfjc8ZaOtDSMWIDkqP8QSA=.7658d698-3a89-484f-b977-0694eae251ba@github.com> References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> <50saqCLba9Bx2oSGvZdJNsfjc8ZaOtDSMWIDkqP8QSA=.7658d698-3a89-484f-b977-0694eae251ba@github.com> Message-ID: On Mon, 21 Oct 2024 21:22:57 GMT, Chad Rakoczy wrote: >> [JDK-8342601](https://bugs.openjdk.org/browse/JDK-8342601) >> >> Fix minor inefficiency in `copy_memory` by adding check before doing bit shift to see if we are able to do a move instruction instead. 
Change is low risk because of the low complexity of the change >> >> Ran array copy and tier 1 on aarch64 machine >> >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/arraycopy 49 49 0 0 >> ============================== >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier1 2591 2591 0 0 >> jtreg:test/jdk:tier1 2436 2436 0 0 >> jtreg:test/langtools:tier1 4577 4577 0 0 >> jtreg:test/jaxp:tier1 0 0 0 0 >> jtreg:test/lib-test:tier1 34 34 0 0 >> ============================== > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add comment The motivating comment helps, thanks. The fact that ORR is special-cased (on some chip we care about) is important. This is linked to the ARM convention that MOV is sugar for ORR (in these cases). A couple of possible principles come to mind about this little exercise. I will risk further PR churn here because they are fairly generally applicable. Possible Lesson 1: Document (briefly) the motivation for a micro-optimization that is not self-evident, so future maintainers know why it is there, and conversely when it might be time to simplify or reformulate it. Or even repeat it: There might be more places for this micro-optimization in the future, such as C2 or C1 backends, or additional assembly intrinsics. Some comments should hint what is the comment thread, if the trick is repeated. Possible Lesson 2: Consider factoring dependencies on special CPU knowledge (for micro-optimizations) to places where the ISA is made available (macro-assembler) instead of sprinkled where we are emitting instructions. In an extreme case, a micro-opt which is specific to one hardware implementation can use the VM-version bits to say, "am I doing this micro-opt today"? 
Clearly such a test is more closely relevant to the macro-assembler than it is to some bit-copy or oop-compress code. > [Dean] As LSR is an alias, I think we would expect it to generate the underlying ubfm encoding, so if we were going to optimize based on the shift value, we could introduce a new API with a name like shift_right(). As an example, FWIW, back in the days when I thought about the SPARC port full time, I might have reached (in a case like this) for a helper method called `maybe_lsr` and refactored the new if/then/else here as `__ maybe_lsr(r15, count, shift);`. For extra benefit, I might add a guard suppressing the `mov` if source and destination are identical, all because the `maybe_` is a clear enough signal that the emitted code is not 1-1 with the stated instruction. The body of `maybe_lsr` might have discussion of move-forwarding. It might also call another local API point `maybe_mov` which has the guard. I would have done this even if it was used only in one place, because of the possibility, as the system grew (it was early days) of having to perform the same micro-optimization more than once. But, there are also plenty of reasons not to do it that way, and I admit that (with the warning comment added) what we have now is completely workable. Aleksey, I looked at your proposed patch in [JDK-8341895](https://bugs.openjdk.org/browse/JDK-8341895). If I were doing it (but I'm not) I might pick a slightly more informative name (`maybe_ubfm`, `ubfm_or_mov`, or something like that), depending on how the actual uses would read. Also, the term `lsr` is a little higher level (more like a C op), and maybe such work with helper functions belongs at that upper level. ARM has lots of pseudo-instructions like `lsr` (for `ubfm`) and `mov` (for `orr`), and it would seem to be lots of unused effort to make magic versions of all the lowerings. 
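The `maybe_lsr` / `maybe_mov` idea above can be illustrated with a minimal sketch. This is a toy emitter written for this note, not the HotSpot `MacroAssembler` API: `ToyAssembler`, its textual instruction list, and the exact guard (a shift amount of zero) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an assembler; NOT the HotSpot MacroAssembler API.
// Registers are plain integers and "emitting" just records text.
struct ToyAssembler {
  std::vector<std::string> code;  // emitted instructions, in order

  void lsr(int rd, int rn, unsigned shift) {
    code.push_back("lsr x" + std::to_string(rd) + ", x" + std::to_string(rn) +
                   ", #" + std::to_string(shift));
  }

  void mov(int rd, int rn) {
    code.push_back("mov x" + std::to_string(rd) + ", x" + std::to_string(rn));
  }

  // Hypothetical helper: suppress the move when source and destination match.
  void maybe_mov(int rd, int rn) {
    if (rd != rn) {
      mov(rd, rn);
    }
  }

  // Hypothetical helper: a shift by zero (assumed guard) becomes a move,
  // which may itself be elided; any other shift is emitted as usual.
  void maybe_lsr(int rd, int rn, unsigned shift) {
    if (shift == 0) {
      maybe_mov(rd, rn);
    } else {
      lsr(rd, rn, shift);
    }
  }
};
```

With this shape, a call site would read as a single line such as `a.maybe_lsr(15, 2, shift);`, and the `maybe_` prefix signals that the emitted code is not one-to-one with the named instruction.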
HTH ------------- PR Comment: https://git.openjdk.org/jdk/pull/21589#issuecomment-2433131622 From never at openjdk.org Wed Oct 23 19:41:06 2024 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 23 Oct 2024 19:41:06 GMT Subject: RFR: 8342854: [JVMCI] Block secondary thread reporting a JVMCI fatal error In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 19:27:09 GMT, Doug Simon wrote: > A fatal crash on a second thread causes the thread to [sleep infinitely](https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/share/utilities/vmError.cpp#L1709-L1718) while error reporting continues on the first crashing thread. The same should be done for reporting fatal crashes in libjvmci to avoid interleaving reports. This PR implements this change. Looks good. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21646#pullrequestreview-2390006089 From dnsimon at openjdk.org Wed Oct 23 20:04:09 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 23 Oct 2024 20:04:09 GMT Subject: RFR: 8342854: [JVMCI] Block secondary thread reporting a JVMCI fatal error In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 19:27:09 GMT, Doug Simon wrote: > A fatal crash on a second thread causes the thread to [sleep infinitely](https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/share/utilities/vmError.cpp#L1709-L1718) while error reporting continues on the first crashing thread. The same should be done for reporting fatal crashes in libjvmci to avoid interleaving reports. This PR implements this change. Thanks for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21646#issuecomment-2433322230 From dnsimon at openjdk.org Wed Oct 23 20:04:09 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 23 Oct 2024 20:04:09 GMT Subject: Integrated: 8342854: [JVMCI] Block secondary thread reporting a JVMCI fatal error In-Reply-To: References: Message-ID: <9O1MlMK_p4VqtlfI7upeFUQNvckigiGsjshjbJsW-O4=.5525fec1-efde-4326-b523-d1075911e916@github.com> On Tue, 22 Oct 2024 19:27:09 GMT, Doug Simon wrote: > A fatal crash on a second thread causes the thread to [sleep infinitely](https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/share/utilities/vmError.cpp#L1709-L1718) while error reporting continues on the first crashing thread. The same should be done for reporting fatal crashes in libjvmci to avoid interleaving reports. This PR implements this change. This pull request has now been integrated. Changeset: 98403b75 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/98403b75df0a0737bdf082231f38c5c0019fe4c9 Stats: 12 lines in 2 files changed: 0 ins; 1 del; 11 mod 8342854: [JVMCI] Block secondary thread reporting a JVMCI fatal error Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/21646 From duke at openjdk.org Wed Oct 23 21:47:34 2024 From: duke at openjdk.org (hanklo6) Date: Wed, 23 Oct 2024 21:47:34 GMT Subject: RFR: 8339507: appears to be causing 8GB build machines to hang Message-ID: <0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> We generate each test randomly. The `asmtest.out.h` is currently around 100 KB. We have added the `--full` argument to allow users to create a complete test set. 
------------- Commit messages: - Random generate test Changes: https://git.openjdk.org/jdk/pull/21670/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21670&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339507 Stats: 85065 lines in 2 files changed: 926 ins; 83631 del; 508 mod Patch: https://git.openjdk.org/jdk/pull/21670.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21670/head:pull/21670 PR: https://git.openjdk.org/jdk/pull/21670 From dlong at openjdk.org Wed Oct 23 22:08:23 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 23 Oct 2024 22:08:23 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v7] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 01:44:42 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > Dean Long has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - remove blank line > - Merge master > - bail out on old methods > - redo VM state > - fix errors > - make sure to be in VM state when checking is_old > - simplification based on reviewer comments > - rename and restrict usage I'm still working on this. It's surprisingly tricky to add new bailout failure points. We also have to add new bailout check locations so we don't try to continue while in an inconsistent state (same issues as exception handling). 
I'm simulating redefined methods randomly (similar to C2 StressBailout and fail_randomly()) to stress test it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2420337538 From dlong at openjdk.org Wed Oct 23 22:08:21 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 23 Oct 2024 22:08:21 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v8] In-Reply-To: References: Message-ID: > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). > > An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. Dean Long has updated the pull request incrementally with two additional commits since the last revision: - add missing bailout checks - C1 fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21148/files - new: https://git.openjdk.org/jdk/pull/21148/files/80024872..fb3308c9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=06-07 Stats: 42 lines in 8 files changed: 32 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/21148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21148/head:pull/21148 PR: https://git.openjdk.org/jdk/pull/21148 From kvn at openjdk.org Wed Oct 23 22:54:07 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 23 Oct 2024 22:54:07 GMT Subject: RFR: 8342862: JDK-8339507 appears to be causing 8GB build machines to hang In-Reply-To: <0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> References: 
<0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> Message-ID: On Wed, 23 Oct 2024 21:35:45 GMT, hanklo6 wrote: > We generate each test randomly. The `asmtest.out.h` is currently around 100 KB. We have added the `--full` argument to allow users to create a complete test set. Please, update instructions comment in `test_assemblerx86.cpp`. What subset of instructions will be generated by default? Did you compare compilation time between files with default and "full" sets of instructions. Please, change title of the bug in JBS in this PR to: "Gtest added by 8339507 appears to be causing 8GB build machines to hang" It seems "JDK-8339507" at the beginning of title confuses our tools/bots to where add PR link and may cause issues later too. ------------- PR Review: https://git.openjdk.org/jdk/pull/21670#pullrequestreview-2390803885 PR Comment: https://git.openjdk.org/jdk/pull/21670#issuecomment-2433667803 From kvn at openjdk.org Wed Oct 23 22:57:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 23 Oct 2024 22:57:05 GMT Subject: RFR: 8342862: JDK-8339507 appears to be causing 8GB build machines to hang In-Reply-To: <0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> References: <0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> Message-ID: On Wed, 23 Oct 2024 21:35:45 GMT, hanklo6 wrote: > We generate each test randomly. The `asmtest.out.h` is currently around 100 KB. We have added the `--full` argument to allow users to create a complete test set. What if we split it into several files? Will combined compilation time be different? At least it will not consume a lot of memory and page swapping. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21670#issuecomment-2433679841 From duke at openjdk.org Wed Oct 23 23:43:04 2024 From: duke at openjdk.org (hanklo6) Date: Wed, 23 Oct 2024 23:43:04 GMT Subject: RFR: 8342862: JDK-8339507 appears to be causing 8GB build machines to hang In-Reply-To: References: <0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> Message-ID: <0h26nRNz_8tvJyMEMHg_8YxtSGlRt9klBHoQAhyWMtQ=.8cd8332e-b260-441a-8d93-a967e9225b17@github.com> On Wed, 23 Oct 2024 22:50:59 GMT, Vladimir Kozlov wrote: > Please, update instructions comment in `test_assemblerx86.cpp`. > > What subset of instructions will be generated by default? Did you compare compilation time between files with default and "full" sets of instructions. @vnkozlov The default test set includes the original map0/map1 instructions with random registers, address, and immediate. I tested on my machine (i9-13900k with 32cores), with the compilation time being 10s for the default test and 30s for the full test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21670#issuecomment-2433786958 From duke at openjdk.org Wed Oct 23 23:49:19 2024 From: duke at openjdk.org (hanklo6) Date: Wed, 23 Oct 2024 23:49:19 GMT Subject: RFR: 8342862: Gtest added by 8339507 appears to be causing 8GB build machines to hang [v2] In-Reply-To: <0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> References: <0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> Message-ID: > We generate each test randomly. The `asmtest.out.h` is currently around 100 KB. We have added the `--full` argument to allow users to create a complete test set. 
hanklo6 has updated the pull request incrementally with one additional commit since the last revision: add instruction ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21670/files - new: https://git.openjdk.org/jdk/pull/21670/files/dae652eb..e79a5a65 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21670&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21670&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21670.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21670/head:pull/21670 PR: https://git.openjdk.org/jdk/pull/21670 From sviswanathan at openjdk.org Wed Oct 23 23:54:05 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 23 Oct 2024 23:54:05 GMT Subject: RFR: 8342862: Gtest added by 8339507 appears to be causing 8GB build machines to hang [v2] In-Reply-To: References: <0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> Message-ID: <-FQtibOVa7FoWLm6d5SdOBp_8d2OkSgO8B9m3bBBNu8=.dff98610-09f1-402b-aff4-d5511d2440e0@github.com> On Wed, 23 Oct 2024 23:49:19 GMT, hanklo6 wrote: >> We generate each test randomly. The `asmtest.out.h` is currently around 100 KB. We have added the `--full` argument to allow users to create a complete test set. > > hanklo6 has updated the pull request incrementally with one additional commit since the last revision: > > add instruction Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21670#pullrequestreview-2390882379 From duke at openjdk.org Wed Oct 23 23:54:06 2024 From: duke at openjdk.org (hanklo6) Date: Wed, 23 Oct 2024 23:54:06 GMT Subject: RFR: 8342862: Gtest added by 8339507 appears to be causing 8GB build machines to hang In-Reply-To: References: <0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> Message-ID: On Wed, 23 Oct 2024 22:54:11 GMT, Vladimir Kozlov wrote: > What if we split it into several files? Will combined compilation time be different? At least it will not consume a lot of memory and page swapping. But splitting into several files still results in large code size. We are trying to reduce the code size as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21670#issuecomment-2433796285 From fyang at openjdk.org Thu Oct 24 00:14:04 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 24 Oct 2024 00:14:04 GMT Subject: RFR: 8342884: RISC-V: verify float <--> float16 conversion In-Reply-To: <3STkw_TrCn1ib_wPr4NlZxahYim-9hVafE_4g7gZ8WQ=.8ad2e319-440d-43f1-b091-1b603be28e09@github.com> References: <3STkw_TrCn1ib_wPr4NlZxahYim-9hVafE_4g7gZ8WQ=.8ad2e319-440d-43f1-b091-1b603be28e09@github.com> Message-ID: On Wed, 23 Oct 2024 13:22:32 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? it removes the `Experimental` of the `UseZvfh`. > Thanks > > Currently, only float <--> float16 conversions use Zvfh extension, I've run the jmh tests on bananapi, the performance result shows it's good. 
>
> Benchmark (-XX:+UseZfh -XX:+UnlockExperimentalVMOptions -XX:+/-UseZvfh) | (size) | Mode | Cnt | Score -intrinsic | Score +intrinsic | Error | Units | Improvement
> -- | -- | -- | -- | -- | -- | -- | -- | --
> Fp16ConversionBenchmark.float16ToFloat | 2048 | avgt | 10 | 8129.72 | 4729.125 | 71.937 | ns/op | 1.719
> Fp16ConversionBenchmark.float16ToFloatMemory | 2048 | avgt | 10 | 16.9 | 16.894 | 0.002 | ns/op | 1
> Fp16ConversionBenchmark.floatToFloat16 | 2048 | avgt | 10 | 12561.962 | 3767.944 | 12.652 | ns/op | 3.334
> Fp16ConversionBenchmark.floatToFloat16Memory | 2048 | avgt | 10 | 18.146 | 18.147 | 0.003 | ns/op | 1
> > Thanks for carrying the verification. src/hotspot/cpu/riscv/globals_riscv.hpp line 119: > 117: "Use Zihintpause instructions") \ > 118: product(bool, UseZvbb, false, EXPERIMENTAL, "Use Zvbb instructions") \ > 119: product(bool, UseZvfh, false, "Use Zvfh instructions") \ Nit: Better to move it out of the `EXPERIMENTAL` group. Maybe immediately after the line for `UseZfh`. ------------- Changes requested by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21664#pullrequestreview-2390903644 PR Review Comment: https://git.openjdk.org/jdk/pull/21664#discussion_r1814025526 From jkarthikeyan at openjdk.org Thu Oct 24 01:26:10 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 24 Oct 2024 01:26:10 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: Message-ID: > Hi all, > This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable.
The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. > > In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Address some changes from code review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21599/files - new: https://git.openjdk.org/jdk/pull/21599/files/0ce85525..7fbc4509 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21599&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21599&range=00-01 Stats: 27 lines in 9 files changed: 12 ins; 2 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/21599.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21599/head:pull/21599 PR: https://git.openjdk.org/jdk/pull/21599 From jkarthikeyan at openjdk.org Thu Oct 24 01:26:11 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 24 Oct 2024 01:26:11 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 06:12:01 GMT, Quan Anh Mai wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Address some changes from code review > > src/hotspot/share/opto/compile.cpp line 2464: > >> 2462: { >> 2463: TracePhase tp("lower", &timers[_t_lower]); >> 2464: print_method(PHASE_BEFORE_LOWERING, 3); > > Isn't `BEFORE_LOWERING` the same as `AFTER_BARRIER_EXPANSION` right above? 
This is a fair point, I added it originally since macro expansion also printed before and after but I see now that there is some extra logic that isn't covered before macro expansion. I think it makes sense to remove it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1814089501 From dholmes at openjdk.org Thu Oct 24 01:28:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 24 Oct 2024 01:28:05 GMT Subject: RFR: 8342862: Gtest added by 8339507 appears to be causing 8GB build machines to hang [v2] In-Reply-To: References: <0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> Message-ID: <12ewVvv9YdQdrofmAJ373PX579jDG9WzRLH64HKXPRo=.8226793f-cf41-4bc9-88be-1eac46ed263b@github.com> On Wed, 23 Oct 2024 23:49:19 GMT, hanklo6 wrote: >> We generate each test randomly. The `asmtest.out.h` is currently around 100 KB. We have added the `--full` argument to allow users to create a complete test set. > > hanklo6 has updated the pull request incrementally with one additional commit since the last revision: > > add instruction What if you split into separate test files (.cpp) as well so that rather than one gtest that checks everything you have several smaller ones. That will reduce compilation overhead as required and also allow full coverage, potentially running tests in parallel. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21670#issuecomment-2434013037 From jbhateja at openjdk.org Thu Oct 24 02:11:12 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 24 Oct 2024 02:11:12 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: <-J1c3-4NdfyIyOpkRDJBtw1qzR375U56X7TCvvrb4u0=.c1fd68a5-b64f-4c46-a8ec-93530c16ba97@github.com> Message-ID: On Wed, 23 Oct 2024 17:39:22 GMT, Quan Anh Mai wrote: >> Because lowering is a transformation that increases the complexity of the graph. 
>> >> - A `d = ExtractD(z, 4)` expanded into `x = VectorExtract(z, 2); d = ExtractD(x, 0)` increases the number of nodes by 1. >> - A logic cone transformed into a `MacroLogicV` introduces another kind of node that may not be recognized by other nodes. >> >> As a result, we should do this as the last step when other transformation has finished their jobs. For the concerns regarding loop body size, we still have a function in `Matcher` for that purpose. > > Another reason is that lowering being done late allows us to have more freedom to break some invariants of the nodes, such as looking through `VectorReinterpret`. An example is this (really crafted) case: > > Int256Vector v; > int a = v.lane(5); > float b = v.reinterpretAsFloats().lane(7); > > This would be transformed into: > > vector v; > vector v0 = VectorExtract(v, 1); > int a = ExtractI(v0, 1); > vector v1 = VectorReinterpret(v, ); > vector v2 = VectorExtract(v1, 1); > float b = ExtractF(v2, 3); > > By allowing lowering to look through `VectorReinterpret` and break the invariant of `Extract` nodes that the element types of their inputs and outputs must be the same, we can `gvn` `v1` and `v`, `v2` and `v0`. Simplify the graph: > > vector v; > vector v0 = VectorExtract(v, 1); > int a = ExtractI(v0, 1); > float b = ExtractF(v0, 3); > Because lowering is a transformation that increases the complexity of the graph. > > * A `d = ExtractD(z, 4)` expanded into `x = VectorExtract(z, 2); d = ExtractD(x, 0)` increases the number of nodes by 1. > * A logic cone transformed into a `MacroLogicV` introduces another kind of node that may not be recognized by other nodes. > > As a result, we should do this as the last step when other transformation has finished their jobs. For the concerns regarding loop body size, we still have a function in `Matcher` for that purpose. 
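The index arithmetic behind the quoted examples (`ExtractD(z, 4)` becoming `VectorExtract(z, 2)` plus `ExtractD(x, 0)`, and lanes 5 and 7 of a 256-bit vector becoming chunk 1 with indices 1 and 3) can be written out as a small sketch. It assumes the lowering splits a vector into 128-bit chunks, which matches the numbers in the examples; `LanePos` and `split_lane` are hypothetical names used only here and are not part of C2.

```cpp
#include <cassert>

// Position of one element after splitting a wide vector into 128-bit chunks.
struct LanePos {
  int chunk;  // which 128-bit chunk (second argument of VectorExtract)
  int index;  // element index inside that chunk (second argument of ExtractX)
};

// Hypothetical helper: elem_bytes is the element size in bytes
// (4 for int/float, 8 for long/double). Assumes 16-byte (128-bit) chunks.
inline LanePos split_lane(int lane, int elem_bytes) {
  const int lanes_per_chunk = 16 / elem_bytes;
  return LanePos{lane / lanes_per_chunk, lane % lanes_per_chunk};
}
```

Under that assumption, `split_lane(4, 8)` gives chunk 2, index 0, while `split_lane(5, 4)` and `split_lane(7, 4)` both land in chunk 1, which is why the two `VectorExtract` nodes in the example can be shared once the reinterpret is looked through.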
Yes, as you rightly pointed out, lowering may in some cases significantly change the graph shape, so loop optimizations should account for it. Unrolling decisions are based on loop body size and a rudimentary cost model; e.g., the macro logic optimization, which folds an entire logic tree into one x86-specific lowered IR node, should promote unrolling.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1814134951

From jkarthikeyan at openjdk.org Thu Oct 24 02:11:13 2024
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Thu, 24 Oct 2024 02:11:13 GMT
Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 21 Oct 2024 06:11:26 GMT, Quan Anh Mai wrote:

>> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Address some changes from code review
>
> src/hotspot/share/opto/phaseX.cpp line 2277:
>
>> 2275:
>> 2276: // Try to find an existing version of the same node
>> 2277: Node* existing = _igvn->hash_find_insert(n);
>
> I think it would be easier if you have a switch in `gvn` that says you passed the point of doing `Ideal`, moving forward you will probably want to have a `IdealLowering` to transform nodes during this phase. `Identity` I think is fine since it returns an existing node.

Ah, do you mean having a method in `Node` that holds the lowering code? I was originally planning on doing it this way, but I think it would pose some problems where certain nodes' `Lower()` methods would only be overridden on certain backends, which would become messy. One of my goals was to keep the lowering code separate from shared code, so new lowerings could be implemented by just updating the main `lower_node` function in the backend. About GVN, I think it makes sense to do it in a separate phase because GVN is used quite generally whereas lowering is only done once.
Since the `transform_old` function in IGVN is pretty complex as well, I think it's simpler to just implement `Value()` and GVN separately. Thinking on it more I think Identity is probably a good idea too, since as you mention it can't introduce new nodes into the graph. Mainly I wanted to avoid the case where `Ideal()` could fold a lowered graph back into the original form, causing an infinite loop. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1814136201 From jkarthikeyan at openjdk.org Thu Oct 24 02:17:12 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 24 Oct 2024 02:17:12 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 12:55:39 GMT, Magnus Ihse Bursie wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Address some changes from code review > > Build changes look good (but would be slightly better without the extra blank line). I have not reviewed the actual hotspot changes. Thanks for looking at the build changes, @magicus! I've pushed a commit that removes the extra newline in the makefiles and adds newlines to the ends of files that were missing them. Thanks for taking a look as well, @merykitty and @jatin-bhateja! I've pushed a commit that should address the code suggestions and left some comments as well. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2434090294

From jkarthikeyan at openjdk.org Thu Oct 24 02:17:13 2024
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Thu, 24 Oct 2024 02:17:13 GMT
Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2]
In-Reply-To: <-J1c3-4NdfyIyOpkRDJBtw1qzR375U56X7TCvvrb4u0=.c1fd68a5-b64f-4c46-a8ec-93530c16ba97@github.com>
References: <-J1c3-4NdfyIyOpkRDJBtw1qzR375U56X7TCvvrb4u0=.c1fd68a5-b64f-4c46-a8ec-93530c16ba97@github.com>
Message-ID:

On Wed, 23 Oct 2024 07:57:05 GMT, Jatin Bhateja wrote:

>> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Address some changes from code review
>
> src/hotspot/share/opto/phaseX.cpp line 2301:
>
>> 2299: while(_igvn->_worklist.size() != 0) {
>> 2300: Node* n = _igvn->_worklist.pop();
>> 2301: Node* new_node = lower_node(n);
>
> `PhaseLowering::lower_node` may do a complex transformation, replacing a graph palette rooted at the current node with another palette. For each newly created node in the new palette, it should make sure to either directly run _igvn.transform, thereby triggering the Ideal / Identity / Value sub-passes over it, or insert the node into _igvn.worklist for lazy processing. In the latter case, you are consuming the entire worklist after running only Value transforms before exiting the lowering phase.

I think we shouldn't run `Ideal` on the graph, because there is a chance that it could undo the lowering changes that we just did. This gives lowering more freedom to change the graph in different ways that would otherwise be undone by Ideal. We run `Value` on all of the transformed nodes mainly so the types table is accurate, so we can call upon the type of any node during lowering.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1814139289 From sviswanathan at openjdk.org Thu Oct 24 02:44:05 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 24 Oct 2024 02:44:05 GMT Subject: RFR: 8342862: Gtest added by 8339507 appears to be causing 8GB build machines to hang [v2] In-Reply-To: <12ewVvv9YdQdrofmAJ373PX579jDG9WzRLH64HKXPRo=.8226793f-cf41-4bc9-88be-1eac46ed263b@github.com> References: <0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> <12ewVvv9YdQdrofmAJ373PX579jDG9WzRLH64HKXPRo=.8226793f-cf41-4bc9-88be-1eac46ed263b@github.com> Message-ID: On Thu, 24 Oct 2024 01:25:43 GMT, David Holmes wrote: > What if you split into separate test files (.cpp) as well so that rather than one gtest that checks everything you have several smaller ones. That will reduce compilation overhead as required and also allow full coverage, potentially running tests in parallel. Do we really want that? Just to put things in perspective the original file size was 45963 lines and the reduced one is now 1293 lines so we are talking about 35 such files. What Hank has done in this PR is to test each instruction with random register, address, and immediate instead of all the combinations that he had before so we still have a decent coverage. The test generator tool continues to have the ability to generate the full combination optionally. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21670#issuecomment-2434122058 From jbhateja at openjdk.org Thu Oct 24 03:13:09 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 24 Oct 2024 03:13:09 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: <-J1c3-4NdfyIyOpkRDJBtw1qzR375U56X7TCvvrb4u0=.c1fd68a5-b64f-4c46-a8ec-93530c16ba97@github.com> Message-ID: On Thu, 24 Oct 2024 02:13:20 GMT, Jasmine Karthikeyan wrote: > I think we shouldn't run `Ideal` on the graph, because there is a chance that it could undo the lowering changes that we just did. This gives lowering more freedom to change the graph in different ways that would otherwise be undone by Ideal. We run `Value` on all of the transformed nodes mainly so the types table is accurate, so we can call upon the type of any node during lowering. We need to preserve lowering canonicalizations, but lowered graph is susceptible to further idealizations, else we could do lowering during final graph reshaping just before matching. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1814176800 From dhanalla at openjdk.org Thu Oct 24 04:07:46 2024 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Thu, 24 Oct 2024 04:07:46 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v5] In-Reply-To: References: Message-ID: <70xY08rC3WcSOn0RRE1aqe6fgnJ80RGMV42pDYaLFR8=.8dff8f92-3a93-4a72-bf7a-a07964f70f4b@github.com> > In the debug build, the assert is triggered during the parsing (before Code_Gen). In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. 
> > When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both the debug and release builds exhibited the same behavior: the compilation bails out during code_gen after building the ideal graph with more than 80K nodes. > > The proposed fix will check the live node count and bail out during compilation while building the graph for scalarization of the elements in the array when the live node count crosses the limit of 80K, instead of unnecessarily building the entire graph and bailing out in code_gen. Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: bailout EA instead of C2 compilation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20504/files - new: https://git.openjdk.org/jdk/pull/20504/files/8f9cd174..324c3ee9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=03-04 Stats: 12 lines in 2 files changed: 10 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20504/head:pull/20504 PR: https://git.openjdk.org/jdk/pull/20504 From dhanalla at openjdk.org Thu Oct 24 04:10:06 2024 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Thu, 24 Oct 2024 04:10:06 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v4] In-Reply-To: References: Message-ID: <4oWQ5tScx2i8xp1XO-q7R-SczbUZT_Klq757GyFkmlY=.2907afac-3872-4ff6-88e0-2a05144ff21b@github.com> On Wed, 23 Oct 2024 07:14:45 GMT, Christian Hagedorn wrote: >> Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: >> >> change CRLF to LF > > src/hotspot/share/opto/macro.cpp line 821: > >> 819: // If scalarize operation is adding too many nodes, bail out >> 820: if (C->check_node_count(300, "out of nodes while scalarizing object")) { >> 821: return nullptr; > > Would a 
bailout from this scalarization be enough or do we really require to record the method as non-compilable (which is done with `check_node_count()`? In the latter case, we could also try something like "recompilation without EA" as done, for example, here (i.e. `retry_no_escape_analysis`): > > https://github.com/openjdk/jdk/blob/37cfaa8deb4cc15864bb6dc2c8a87fc97cff2f0d/src/hotspot/share/opto/escape.cpp#L3858-L3866 > > I also suggest to use the `NodeLimitFudgeFactor` instead of `300` to have it controllable. Thank you for your suggestion @chhagedorn. I agree that 'recompilation without EA' makes more sense, and I have made the necessary changes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1814214574 From dhanalla at openjdk.org Thu Oct 24 04:14:47 2024 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Thu, 24 Oct 2024 04:14:47 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v6] In-Reply-To: References: Message-ID: > In the debug build, the assert is triggered during the parsing (before Code_Gen). In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. > > When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both the debug and release builds exhibited the same behavior: the compilation bails out during code_gen after building the ideal graph with more than 80K nodes. > > The proposed fix will check the live node count and bail out during compilation while building the graph for scalarization of the elements in the array when the live node count crosses the limit of 80K, instead of unnecessarily building the entire graph and bailing out in code_gen. 
Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision:

fix trailing whitespace

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/20504/files
- new: https://git.openjdk.org/jdk/pull/20504/files/324c3ee9..4c444f10

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=05
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=04-05

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/20504.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20504/head:pull/20504

PR: https://git.openjdk.org/jdk/pull/20504

From epeter at openjdk.org Thu Oct 24 05:48:12 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 24 Oct 2024 05:48:12 GMT
Subject: Integrated: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java
In-Reply-To:
References:
Message-ID: <5HR6OoMIHq95cUDBQDAqNjH8J5inzSTchm7ZQpwfFRA=.f5cec3bb-aae6-476a-a88e-c55de0a7060c@github.com>

On Wed, 16 Oct 2024 14:35:13 GMT, Emanuel Peter wrote:

> I want to refactor `TestDependencyOffsets.java` using the `CompileFramework`.
>
> Reasons:
> - I need to modify the IR rules in this test soon anyway (https://github.com/openjdk/jdk/pull/21521), and so a refactor might be good to do first.
> - The generator script used to be a `generator.py`, stored not on `git` but `JBS`. Not great. Now we have it in Java code, and maintenance is easier.
> - Since I first wrote that test, I have now introduced the `IRNode.VECTOR_SIZE`. This allows:
>   - Simplifying the logic for the IR rules (removed the `IRRange` and `IRBool`, and the `Platform`).
>   - Strengthening the rules.
> - I was able to add `random` offsets. This means we have better coverage, and do not rely on just hand-crafted values.
>
> I extensively use `String.format` and `StringBuilder`... would be nicer to have string-templates but they don't exist yet.
>
> Recommendation for review: the old file was huge. Finding the new code in the diff can be hard. I would start by only reading the new file.
>
> Ah. And about runtime of the test. On my machine I get this (in ms):
>
>     Generate: 27
>     Compile: 5845
>     Run: 23435
>
> Test generation is negligible. 6sec on compilation, 20+sec on execution. I think that is an ok balance, at least we can say that generation and compilation only take about 1/6 of total time - an acceptable overhead I would say.

This pull request has now been integrated.

Changeset: e96b4cf0
Author: Emanuel Peter
URL: https://git.openjdk.org/jdk/commit/e96b4cf0a81914c6a615bb4f62ea3f139a4737f3
Stats: 15659 lines in 1 file changed: 172 ins; 15167 del; 320 mod

8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java

Reviewed-by: thartmann, chagedorn

-------------

PR: https://git.openjdk.org/jdk/pull/21541

From epeter at openjdk.org Thu Oct 24 05:48:12 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 24 Oct 2024 05:48:12 GMT
Subject: RFR: 8342387: C2 SuperWord: refactor and improve compiler/loopopts/superword/TestDependencyOffsets.java [v6]
In-Reply-To:
References:
Message-ID:

On Wed, 23 Oct 2024 14:00:42 GMT, Christian Hagedorn wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> spelling and rm new bug number
>
> Still good.

@chhagedorn @TobiHartmann thanks for the reviews!
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21541#issuecomment-2434352287

From epeter at openjdk.org Thu Oct 24 06:05:10 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 24 Oct 2024 06:05:10 GMT
Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate
In-Reply-To:
References:
Message-ID:

On Wed, 23 Oct 2024 15:32:05 GMT, Vladimir Kozlov wrote:

>> Superword creates a `Replicate` node at a `ConvL2I` node and uses the
>> type of the result of the `ConvL2I` to pick the type of the
>> `Replicate` instead of the type of the input to the `ConvL2I`.
>
> src/hotspot/share/opto/superwordVTransformBuilder.cpp line 231:
>
>> 229: } else {
>> 230: // Replicate the scalar same_input to every vector element.
>> 231: BasicType element_type = p0->is_Convert() ? p0->in(1)->bottom_type()->basic_type() : _vloop_analyzer.types().velt_basic_type(p0);
>
> What vectors are generated (or not) with this change? The array in the test is `int[]` but the element_type will be Long now. Will it bail out of vectorization?

@vnkozlov You can very easily see how it goes with my `Test4` above, I split the things onto different lines so we can see what is from where easily.

The pack that `p0` belongs to is a `ConvL2I` pack. In my case, I have a `short[]`, just to make things even more interesting. Since the type is propagated from use -> def, the output of the `ConvL2I` is interpreted as a `short`, it is essentially a truncated `int`. `velt_basic_type(p0) == T_SHORT`. The vector node should be a `VectorCastL2X === _ 873 [[ ]] #vectors`, i.e. casting from long-vector to short-vector.

But now we see that the input to the pack of `p0` is all the same, and so we want to introduce a `Replicate`. We should of course replicate for `long`. But `velt_basic_type(p0) == T_SHORT` - so you get a `Replicate === _ 717 [[ ]] #vectorx`, and then eventually a `VectorCastS2X === _ 890 [[ ]] #vectors`...
but of course the AD file has no matching node for a VectorCast from short to short -> `bad AD file`. The issue is really that `velt_basic_type(p0)` gives us the output-type, but we actually would need the input-type. In almost all cases input-type == output-type. But of course that does not hold with Convert. With Roland's fix, we now ask for the output-type of the `ConvL2I`'s input. That is the same as asking for the `ConvL2I`'s input-type. That way, we know what type to Replicate for - the `element_type`. @rwestrel given that @vnkozlov also did not right away understand what is going on, I think you need to properly explain what happens in the comments ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21660#discussion_r1814315631 From epeter at openjdk.org Thu Oct 24 06:23:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 24 Oct 2024 06:23:17 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v31] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 12:25:37 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. 
the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios.
>>
>> Summary of changes:
>> - Java side implementation of new vector operators.
>> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it.
>> - C2 compiler IR and inline expander changes.
>> - Optimized x86 backend implementation for new vector operators and their predicated counterparts.
>> - Extends existing VectorAPI Jtreg test suite to cover new operations.
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch.
>>
>> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Factor out IR tests and Transforms to follow-up PRs.

src/hotspot/cpu/x86/x86.ad line 10725:

> 10723: %}
> 10724:
> 10725: instruct vector_saturating_subword_mem(vec dst, vec src1, memory src2)

Nit: above you always have `add` and `sub` in the name and the `format`. Here and in some cases below you do not.
Would be nice if it was consistent - would also make reading the OptoAssembly easier if one knows if it is an add or sub ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1814331929 From thartmann at openjdk.org Thu Oct 24 06:25:06 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 24 Oct 2024 06:25:06 GMT Subject: RFR: 8342862: Gtest added by 8339507 appears to be causing 8GB build machines to hang [v2] In-Reply-To: References: <0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> <12ewVvv9YdQdrofmAJ373PX579jDG9WzRLH64HKXPRo=.8226793f-cf41-4bc9-88be-1eac46ed263b@github.com> Message-ID: On Thu, 24 Oct 2024 02:41:54 GMT, Sandhya Viswanathan wrote: > What Hank has done in this PR is to test each instruction with random register, address, and immediate instead of all the combinations that he had before so we still have a decent coverage. The test generator tool continues to have the ability to generate the full combination optionally. I think that is good enough and preferable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21670#issuecomment-2434398111 From epeter at openjdk.org Thu Oct 24 06:34:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 24 Oct 2024 06:34:14 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v31] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 12:25:37 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. 
>> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Factor out IR tests and Transforms to follow-up PRs. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 32: > 30: * @since 24 > 31: */ > 32: public final class VectorMath { I think this class could have been split into a separate RFE, together with its tests. I would prefer that next time. 
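
To make the saturating and unsigned semantics described in the quoted PR description concrete, here is a minimal scalar sketch in Java for `int`. The class and method names are hypothetical and chosen for illustration only; they are not claimed to match the `VectorMath` API added by the PR, and only `Integer.compareUnsigned` from the standard library is used.

```java
// Illustrative scalar models of the operators listed in the PR description.
// Names are hypothetical and not taken from the patch.
final class SaturatingSketch {
    // SADD: signed addition, clamped to [Integer.MIN_VALUE, Integer.MAX_VALUE].
    static int addSaturating(int a, int b) {
        long r = (long) a + b; // widen so the exact sum is representable
        if (r > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (r < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) r;
    }

    // SSUB: signed subtraction with the same clamping.
    static int subSaturating(int a, int b) {
        long r = (long) a - b;
        if (r > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (r < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) r;
    }

    // SUADD: unsigned addition, clamped to 0xFFFFFFFF (-1 as an int bit pattern).
    static int addSaturatingUnsigned(int a, int b) {
        int r = a + b;
        // Unsigned overflow happened iff the wrapped sum is below an operand.
        return Integer.compareUnsigned(r, a) < 0 ? -1 : r;
    }

    // SUSUB: unsigned subtraction, clamped to 0.
    static int subSaturatingUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    // UMIN / UMAX: min and max under unsigned ordering.
    static int minUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    static int maxUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(addSaturating(Integer.MAX_VALUE, 1) == Integer.MAX_VALUE);
        System.out.println(Integer.toUnsignedString(addSaturatingUnsigned(-1, 1)));
        System.out.println(subSaturatingUnsigned(1, 2));
    }
}
```

The signed variants widen to `long` for clarity rather than speed; an optimized backend would presumably use dedicated instructions where the hardware provides them.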
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1814340685 From epeter at openjdk.org Thu Oct 24 06:34:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 24 Oct 2024 06:34:14 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v31] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 06:28:31 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Factor out IR tests and Transforms to follow-up PRs. > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 32: > >> 30: * @since 24 >> 31: */ >> 32: public final class VectorMath { > > I think this class could have been split into a separate RFE, together with its tests. I would prefer that next time. Also: why did we not add these `Long.minUnsigned` etc? I guess that was already discussed? Because we can easily also use this with the auto-vectorizer or more generally. Saturating and unsigned ops are generally useful I think... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1814343654 From thartmann at openjdk.org Thu Oct 24 06:52:04 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 24 Oct 2024 06:52:04 GMT Subject: RFR: 8342692: C2: MemorySegment API slow with short running loops [v3] In-Reply-To: <2FVz5G_WC9zJg2TGQ-MPmcWcSPgX5eVknTo58tm3TFI=.ca976a58-a49f-4030-9882-d52febeb82b1@github.com> References: <2FVz5G_WC9zJg2TGQ-MPmcWcSPgX5eVknTo58tm3TFI=.ca976a58-a49f-4030-9882-d52febeb82b1@github.com> Message-ID: On Wed, 23 Oct 2024 14:32:51 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. 
Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depend on old predicates >> added before the transformation. 
To solve this circular dependency,
>> parse and assert predicates are cloned between the old predicates
>> and the loop head. The cloned short running loop parse predicate is
>> the one that's used to insert the short running loop predicate.
>>
>> - In the case of a long counted loop, the loop is transformed into a
>> regular loop with a ...
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
> build fix

More failures:

compiler/loopopts/TestOverunrolling.java
-XX:-TieredCompilation -XX:+AlwaysIncrementalInline

# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/workspace/open/src/hotspot/share/opto/loopnode.cpp:6195), pid=2648018, tid=2648035
# assert(!had_error) failed: bad dominance
#
# JRE version: Java(TM) SE Runtime Environment (24.0) (fastdebug build 24-internal-2024-10-23-1151312.tobias.hartmann.jdk4)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 24-internal-2024-10-23-1151312.tobias.hartmann.jdk4, compiled mode, sharing, compressed oops, compressed class ptrs, parallel gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x12f07a7] PhaseIdealLoop::compute_lca_of_uses(Node*, Node*, bool)+0x927

Current CompileTask:
C2:19645 2268 b compiler.loopopts.TestOverunrolling::test3 (89 bytes)

Stack: [0x00007f59a4cee000,0x00007f59a4dee000], sp=0x00007f59a4de8ca0, free space=1003k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x12f07a7] PhaseIdealLoop::compute_lca_of_uses(Node*, Node*, bool)+0x927 (loopnode.cpp:6195)
V [libjvm.so+0x12f0b78] PhaseIdealLoop::build_loop_late_post_work(Node*, bool)+0x1d8 (loopnode.cpp:6610)
V [libjvm.so+0x12f1b20] PhaseIdealLoop::build_loop_late(VectorSet&, Node_List&, Node_Stack&)+0x190 (loopnode.cpp:6561)
V [libjvm.so+0x12f2938] PhaseIdealLoop::build_and_optimize()+0x6d8 (loopnode.cpp:4974)
V [libjvm.so+0xa356c5] PhaseIdealLoop::verify(PhaseIterGVN&)+0x3c5 (loopnode.hpp:1144)
V
[libjvm.so+0xa303b3] Compile::Optimize()+0x743 (compile.cpp:2397) V [libjvm.so+0xa34683] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1b23 (compile.cpp:852) V [libjvm.so+0x87ee45] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1d5 (c2compiler.cpp:142) V [libjvm.so+0xa40518] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x928 (compileBroker.cpp:2303) V [libjvm.so+0xa411a8] CompileBroker::compiler_thread_loop()+0x478 (compileBroker.cpp:1961) V [libjvm.so+0xef10fc] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:759) V [libjvm.so+0x181dad6] Thread::call_run()+0xb6 (thread.cpp:234) V [libjvm.so+0x14ff5b8] thread_native_entry(Thread*)+0x128 (os_linux.cpp:858) compiler/loopopts/superword/TestMemorySegment.java Failed IR Rules (6) of Methods (6) ---------------------------------- 1) Method "static java.lang.Object[] compiler.loopopts.superword.TestMemorySegmentImpl.testIntLoop_longIndex_intInvar_sameAdr_byte(java.lang.foreign.MemorySegment,int)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd", "true"}, counts={"_#V#LOAD_VECTOR_B#_", "> 0", "_#V#ADD_VB#_", "> 0", "_#STORE_VECTOR#_", "> 0"}, applyIfPlatformOr={}, applyIfPlatform={"64-bit", "true"}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(AddVB.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! * Constraint 3: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 
2) Method "static java.lang.Object[] compiler.loopopts.superword.TestMemorySegmentImpl.testIntLoop_longIndex_intInvar_sameAdr_int(java.lang.foreign.MemorySegment,int)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd", "true"}, counts={"_#V#LOAD_VECTOR_I#_", "> 0", "_#V#ADD_VI#_", "> 0", "_#STORE_VECTOR#_", "> 0"}, applyIfPlatformOr={}, applyIfPlatform={"64-bit", "true"}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"AlignVector", "false"}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(AddVI.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! * Constraint 3: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 3) Method "static java.lang.Object[] compiler.loopopts.superword.TestMemorySegmentImpl.testIntLoop_longIndex_longInvar_sameAdr_byte(java.lang.foreign.MemorySegment,long)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd", "true"}, counts={"_#V#LOAD_VECTOR_B#_", "> 0", "_#V#ADD_VB#_", "> 0", "_#STORE_VECTOR#_", "> 0"}, applyIfPlatformOr={}, applyIfPlatform={"64-bit", "true"}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 
* Constraint 2: "(\\d+(\\s){2}(AddVB.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! * Constraint 3: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 4) Method "static java.lang.Object[] compiler.loopopts.superword.TestMemorySegmentImpl.testIntLoop_longIndex_longInvar_sameAdr_int(java.lang.foreign.MemorySegment,long)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd", "true"}, counts={"_#V#LOAD_VECTOR_I#_", "> 0", "_#V#ADD_VI#_", "> 0", "_#STORE_VECTOR#_", "> 0"}, applyIfPlatformOr={}, applyIfPlatform={"64-bit", "true"}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"AlignVector", "false"}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(AddVI.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! * Constraint 3: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 
5) Method "static java.lang.Object[] compiler.loopopts.superword.TestMemorySegmentImpl.testLongLoop_longIndex_intInvar_sameAdr_byte(java.lang.foreign.MemorySegment,int)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd", "true"}, counts={"_#V#LOAD_VECTOR_B#_", "> 0", "_#V#ADD_VB#_", "> 0", "_#STORE_VECTOR#_", "> 0"}, applyIfPlatformOr={}, applyIfPlatform={"64-bit", "true"}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(AddVB.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! * Constraint 3: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 6) Method "static java.lang.Object[] compiler.loopopts.superword.TestMemorySegmentImpl.testLongLoop_longIndex_longInvar_sameAdr_byte(java.lang.foreign.MemorySegment,long)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sse4.1", "true", "asimd", "true"}, counts={"_#V#LOAD_VECTOR_B#_", "> 0", "_#V#ADD_VB#_", "> 0", "_#STORE_VECTOR#_", "> 0"}, applyIfPlatformOr={}, applyIfPlatform={"64-bit", "true"}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 
* Constraint 2: "(\\d+(\\s){2}(AddVB.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! * Constraint 3: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! compiler/predicates/TestAssertionPredicateDoesntConstantFold.java -XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMacroExpansion -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/workspace/open/src/hotspot/share/opto/loopopts.cpp:1739), pid=2300415, tid=2300431 # assert(!n->is_Store() && !n->is_LoadStore()) failed: no node with a side effect # # JRE version: Java(TM) SE Runtime Environment (24.0) (fastdebug build 24-internal-2024-10-23-1151312.tobias.hartmann.jdk4) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 24-internal-2024-10-23-1151312.tobias.hartmann.jdk4, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x1300f90] PhaseIdealLoop::try_sink_out_of_loop(Node*) [clone .part.0]+0xbd0 Current CompileTask: C2:157 14 b TestAssertionPredicateDoesntConstantFold::test (69 bytes) Stack: [0x00007f502acee000,0x00007f502adee000], sp=0x00007f502ade8cf0, free space=1003k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x1300f90] PhaseIdealLoop::try_sink_out_of_loop(Node*) [clone .part.0]+0xbd0 (loopopts.cpp:1739) V [libjvm.so+0x1301148] PhaseIdealLoop::split_if_with_blocks_post(Node*)+0x98 (loopopts.cpp:1706) V [libjvm.so+0x13019fa] PhaseIdealLoop::split_if_with_blocks(VectorSet&, Node_Stack&)+0x9a (loopopts.cpp:1986) V [libjvm.so+0x12f338b] PhaseIdealLoop::build_and_optimize()+0x112b (loopnode.cpp:5086) V [libjvm.so+0xa36958] PhaseIdealLoop::optimize(PhaseIterGVN&, 
LoopOptsMode)+0x3a8 (loopnode.hpp:1129) V [libjvm.so+0xa2f9a4] Compile::optimize_loops(PhaseIterGVN&, LoopOptsMode)+0x74 (compile.cpp:2179) V [libjvm.so+0xa3072c] Compile::Optimize()+0xabc (compile.cpp:2426) V [libjvm.so+0xa34683] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1b23 (compile.cpp:852) V [libjvm.so+0x87ee45] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1d5 (c2compiler.cpp:142) V [libjvm.so+0xa40518] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x928 (compileBroker.cpp:2303) V [libjvm.so+0xa411a8] CompileBroker::compiler_thread_loop()+0x478 (compileBroker.cpp:1961) V [libjvm.so+0xef10fc] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:759) V [libjvm.so+0x181dad6] Thread::call_run()+0xb6 (thread.cpp:234) V [libjvm.so+0x14ff5b8] thread_native_entry(Thread*)+0x128 (os_linux.cpp:858) serviceability/sa/ClhsdbCDSCore.java -Duse.JTREG_TEST_THREAD_FACTORY=Virtual -XX:+UseZGC -XX:-ZGenerational -XX:-VerifyContinuations # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x00007effa39fa616, pid=1722549, tid=1722585 # # JRE version: Java(TM) SE Runtime Environment (24.0) (fastdebug build 24-internal-2024-10-23-1151312.tobias.hartmann.jdk4) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 24-internal-2024-10-23-1151312.tobias.hartmann.jdk4, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x1874616] Unsafe_PutInt+0x106 Stack: [0x00007efd6e325000,0x00007efd6e425000], sp=0x00007efd6e422c60, free space=1015k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x1874616] Unsafe_PutInt+0x106 (unsafe.cpp:251) j jdk.internal.misc.Unsafe.putInt(Ljava/lang/Object;JI)V+0 java.base at 24-internal j jdk.internal.misc.Unsafe.putInt(JI)V+4 java.base at 24-internal j CrashApp.main([Ljava/lang/String;)V+5 j 
java.lang.invoke.LambdaForm$DMH+0x00007efd024123e0.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;)V+10 java.base at 24-internal j java.lang.invoke.LambdaForm$MH+0x00007efd02414c10.invoke(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+33 java.base at 24-internal j java.lang.invoke.Invokers$Holder.invokeExact_MT(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+20 java.base at 24-internal j jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+55 java.base at 24-internal j jdk.internal.reflect.DirectMethodHandleAccessor.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+23 java.base at 24-internal j java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+102 java.base at 24-internal j jdk.test.lib.process.ProcessTools.lambda$main$0(Ljava/lang/reflect/Method;[Ljava/lang/String;Ljdk/test/lib/process/ProcessTools$MainThreadGroup;)V+10 j jdk.test.lib.process.ProcessTools$$Lambda+0x00007efd070016e0.run()V+12 j java.lang.Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V+5 java.base at 24-internal j java.lang.VirtualThread.run(Ljava/lang/Runnable;)V+66 java.base at 24-internal j java.lang.VirtualThread$VThreadContinuation$1.run()V+8 java.base at 24-internal j jdk.internal.vm.Continuation.enter0()V+4 java.base at 24-internal j jdk.internal.vm.Continuation.enter(Ljdk/internal/vm/Continuation;Z)V+1 java.base at 24-internal J 96 jdk.internal.vm.Continuation.enterSpecial(Ljdk/internal/vm/Continuation;ZZ)V java.base at 24-internal (0 bytes) @ 0x00007eff8c2d9ea4 [0x00007eff8c2d9d40+0x0000000000000164] j jdk.internal.vm.Continuation.run()V+122 java.base at 24-internal j java.lang.VirtualThread.runContinuation()V+72 java.base at 24-internal j java.lang.VirtualThread$$Lambda+0x00007efd07047610.run()V+4 java.base at 24-internal j 
java.util.concurrent.ForkJoinTask$RunnableExecuteAction.compute()Ljava/lang/Void;+4 java.base at 24-internal [...] compiler/c2/irTests/TestLongRangeChecks.java -XX:UseAVX=0 -XX:UseSSE=2 Various IR verification failures compiler/escapeAnalysis/TestMissingAntiDependency.java -XX:StressLongCountedLoop=200000000 # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/workspace/open/src/hotspot/share/opto/gcm.cpp:904), pid=2322548, tid=2322560 # assert(use_mem_state != load->find_exact_control(load->in(0))) failed: dependence cycle found # # JRE version: Java(TM) SE Runtime Environment (24.0) (fastdebug build 24-internal-2024-10-23-1149160.tobias.hartmann.jdk4) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 24-internal-2024-10-23-1149160.tobias.hartmann.jdk4, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, parallel gc, linux-amd64) # Problematic frame: # V [libjvm.so+0xdb3ffe] PhaseCFG::insert_anti_dependences(Block*, Node*, bool)+0x232e # Current CompileTask: C2:285 92 b 4 TestMissingAntiDependency::test (89 bytes) Stack: [0x00007f34579fb000,0x00007f3457afb000], sp=0x00007f3457af6690, free space=1005k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0xdb3ffe] PhaseCFG::insert_anti_dependences(Block*, Node*, bool)+0x232e (gcm.cpp:904) V [libjvm.so+0xdba0b6] PhaseCFG::schedule_late(VectorSet&, Node_Stack&)+0xa16 (gcm.cpp:1521) V [libjvm.so+0xdbaa0f] PhaseCFG::global_code_motion()+0x3ef (gcm.cpp:1632) V [libjvm.so+0xdbd826] PhaseCFG::do_global_code_motion()+0x66 (gcm.cpp:1755) V [libjvm.so+0xa318d3] Compile::Code_Gen()+0x3c3 (compile.cpp:2960) V [libjvm.so+0xa347a0] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1c40 (compile.cpp:885) V [libjvm.so+0x87ee45] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1d5 (c2compiler.cpp:142) V [libjvm.so+0xa40518] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x928 
(compileBroker.cpp:2303) V [libjvm.so+0xa411a8] CompileBroker::compiler_thread_loop()+0x478 (compileBroker.cpp:1961) V [libjvm.so+0xef10fc] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:759) V [libjvm.so+0x181dad6] Thread::call_run()+0xb6 (thread.cpp:234) V [libjvm.so+0x14ff5b8] thread_native_entry(Thread*)+0x128 (os_linux.cpp:858) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2434438932 From epeter at openjdk.org Thu Oct 24 06:58:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 24 Oct 2024 06:58:18 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v31] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 12:25:37 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], this patch adds support for the following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators.
>> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Factor out IR tests and Transforms to follow-up PRs. Ok, now it looks better. I would still have preferred the `VectorMath` class with its test to be in a separate PR. Please do that next time. I have a few more questions / comments, but we are really close now. I'm especially worried about only testing a few constant values - we should try to go for better coverage - of all values if possible. src/hotspot/cpu/x86/x86.ad line 10790: > 10788: predicate(is_subword_type(Matcher::vector_element_basic_type(n)) && > 10789: n->is_SaturatingVector() && !n->as_SaturatingVector()->is_unsigned()); > 10790: match(Set dst (SaturatingAddV (Binary dst (LoadVector src)) mask)); Do equivalent store operations exist that we could also match?
test/jdk/jdk/incubator/vector/VectorMathTest.java line 70: > 68: public static short[] INPUT_SS = {Short.MIN_VALUE, (short)(Short.MIN_VALUE + TEN_S), ZERO_S, (short)(Short.MAX_VALUE - TEN_S), Short.MAX_VALUE}; > 69: public static int[] INPUT_SI = {Integer.MIN_VALUE, (Integer.MIN_VALUE + TEN_I), ZERO_I, Integer.MAX_VALUE - TEN_I, Integer.MAX_VALUE}; > 70: public static long[] INPUT_SL = {Long.MIN_VALUE, Long.MIN_VALUE + TEN_L, ZERO_L, Long.MAX_VALUE - TEN_L, Long.MAX_VALUE}; Ok, now we have 4 or 5 hand-crafted examples. Is that sufficient? Some random values would be nice, then we know that at least eventually we have full coverage. test/jdk/jdk/incubator/vector/templates/Kernel-SaturatingBinary-Masked-op.template line 8: > 6: > 7: for (int ic = 0; ic < INVOC_COUNT; ic++) { > 8: for (int i = 0; i < a.length; i += SPECIES.length()) { I think this does not check if the generated vectors are too long. We had bugs in the past where we should have created say 2-element vectors, but the backend wrongly created 4-element vectors. This is especially an issue with vectors that do direct memory access. With a simple "counting-up" test, you will probably not catch this. It could be good to have a "counting-down" example as well. What do you think? test/jdk/jdk/incubator/vector/templates/Unit-header.template line 1244: > 1242: return fill(s * BUFFER_REPS, > 1243: i -> ($type$)($Boxtype$.MIN_VALUE + 100)); > 1244: }) Not sure I see this right. But are we only providing these 4 constants as inputs, and all values in the input arrays will be identical? If that is true: we should have some random inputs, or at least dependent on `i`. 
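To make the request for random inputs concrete, here is a minimal sketch of how saturating scalar helpers could be checked against a widened reference on randomly generated values. The class and method names (`SaturatingSketch`, `saturatingAdd`, `saturatingUnsignedAdd`, `countMismatches`) are hypothetical and are not the `VectorMath` API from the PR; only `int` addition is covered, as an illustration of the SADD and SUADD semantics described above.

```java
import java.util.Random;

public class SaturatingSketch {
    // Hypothetical helper (not the PR's API): signed saturating int addition.
    public static int saturatingAdd(int a, int b) {
        int r = a + b;
        // Overflow happened iff both operands share a sign and the result's sign differs.
        if (((a ^ r) & (b ^ r)) < 0) {
            return a < 0 ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        }
        return r;
    }

    // Hypothetical helper: unsigned saturating int addition (all-ones is the unsigned max).
    public static int saturatingUnsignedAdd(int a, int b) {
        int r = a + b;
        // The unsigned sum wrapped around iff it is smaller than one of its operands.
        return Integer.compareUnsigned(r, a) < 0 ? -1 : r;
    }

    // Reference for the signed case: add in long, then clamp to the int range.
    public static int referenceAdd(int a, int b) {
        long wide = (long) a + (long) b;
        return (int) Math.max((long) Integer.MIN_VALUE, Math.min((long) Integer.MAX_VALUE, wide));
    }

    // Reference for the unsigned case: add zero-extended values, clamp to 2^32 - 1.
    public static int referenceUnsignedAdd(int a, int b) {
        long wide = Integer.toUnsignedLong(a) + Integer.toUnsignedLong(b);
        return (int) Math.min(0xFFFFFFFFL, wide);
    }

    // Compares helpers and references on n random pairs; returns the number of mismatches.
    public static int countMismatches(long seed, int n) {
        Random rnd = new Random(seed);
        int mismatches = 0;
        for (int i = 0; i < n; i++) {
            int a = rnd.nextInt();
            int b = rnd.nextInt();
            if (saturatingAdd(a, b) != referenceAdd(a, b)) mismatches++;
            if (saturatingUnsignedAdd(a, b) != referenceUnsignedAdd(a, b)) mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(42L, 1_000_000));
    }
}
```

A fixed seed keeps a failure reproducible; a real test would more likely log a random seed and keep the hand-crafted boundary values alongside the random ones.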
------------- PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2391445047 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1814366518 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1814355931 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1814363960 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1814371488 From epeter at openjdk.org Thu Oct 24 06:58:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 24 Oct 2024 06:58:18 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v31] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 06:44:36 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Factor out IR tests and Transforms to follow-up PRs. > > test/jdk/jdk/incubator/vector/templates/Kernel-SaturatingBinary-Masked-op.template line 8: > >> 6: >> 7: for (int ic = 0; ic < INVOC_COUNT; ic++) { >> 8: for (int i = 0; i < a.length; i += SPECIES.length()) { > > I think this does not check if the generated vectors are too long. We had bugs in the past where we should have created say 2-element vectors, but the backend wrongly created 4-element vectors. This is especially an issue with vectors that do direct memory access. > > With a simple "counting-up" test, you will probably not catch this. It could be good to have a "counting-down" example as well. What do you think? Also: all of these cases load, and directly store again. Does that not mean all tests will probably pick the "..._mem" backend operations? Or do we actually end up testing all backend operations with the tests we have here? 
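One plausible reading of the "counting-down" suggestion can be sketched with a scalar simulation. Everything here is an assumption made for illustration: `kernel` is a hypothetical stand-in for a faulty backend operation that writes more lanes than intended, the extra lanes are modelled as junk, and `LANES` stands in for `SPECIES.length()`. Under those assumptions, a counting-up loop hides the over-long store because the next iteration overwrites the junk, while a counting-down loop leaves it in place.

```java
public class CountingDownSketch {
    static final int LANES = 2; // intended vector length, stand-in for SPECIES.length()

    // Hypothetical faulty kernel: should write LANES results starting at index i,
    // but writes `written` lanes; lanes beyond LANES receive a junk value (-1).
    // The bounds clamp is a simulation detail; a real over-long store at the end
    // of the array would be an out-of-bounds memory access instead.
    static void kernel(int[] a, int[] r, int i, int written) {
        for (int l = 0; l < written && i + l < r.length; l++) {
            r[i + l] = (l < LANES) ? a[i + l] + 1 : -1;
        }
    }

    // Inputs are 0, 1, 2, ... so the expected result a[i] + 1 never equals the junk value.
    static int[] inputs(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        return a;
    }

    static boolean resultCorrect(int[] a, int[] r) {
        for (int i = 0; i < a.length; i++) {
            if (r[i] != a[i] + 1) return false;
        }
        return true;
    }

    // n must be a multiple of LANES.
    public static boolean countingUpPasses(int n, int written) {
        int[] a = inputs(n);
        int[] r = new int[n];
        for (int i = 0; i < n; i += LANES) kernel(a, r, i, written);
        return resultCorrect(a, r);
    }

    // n must be a multiple of LANES.
    public static boolean countingDownPasses(int n, int written) {
        int[] a = inputs(n);
        int[] r = new int[n];
        for (int i = n - LANES; i >= 0; i -= LANES) kernel(a, r, i, written);
        return resultCorrect(a, r);
    }

    public static void main(String[] args) {
        System.out.println("up,   over-long store: " + countingUpPasses(8, 2 * LANES));
        System.out.println("down, over-long store: " + countingDownPasses(8, 2 * LANES));
        System.out.println("down, correct length:  " + countingDownPasses(8, LANES));
    }
}
```

This only models the store side; whether the actual backend bugs referred to above behaved this way is not stated in the thread.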
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1814368049 From roland at openjdk.org Thu Oct 24 07:46:04 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 24 Oct 2024 07:46:04 GMT Subject: RFR: 8342692: C2: MemorySegment API slow with short running loops [v3] In-Reply-To: References: <2FVz5G_WC9zJg2TGQ-MPmcWcSPgX5eVknTo58tm3TFI=.ca976a58-a49f-4030-9882-d52febeb82b1@github.com> Message-ID: On Thu, 24 Oct 2024 06:49:25 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> build fix > > More failures: > > > compiler/loopopts/TestOverunrolling.java > -XX:-TieredCompilation -XX:+AlwaysIncrementalInline > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/workspace/open/src/hotspot/share/opto/loopnode.cpp:6195), pid=2648018, tid=2648035 > # assert(!had_error) failed: bad dominance > # > # JRE version: Java(TM) SE Runtime Environment (24.0) (fastdebug build 24-internal-2024-10-23-1151312.tobias.hartmann.jdk4) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 24-internal-2024-10-23-1151312.tobias.hartmann.jdk4, compiled mode, sharing, compressed oops, compressed class ptrs, parallel gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x12f07a7] PhaseIdealLoop::compute_lca_of_uses(Node*, Node*, bool)+0x927 > > Current CompileTask: > C2:19645 2268 b compiler.loopopts.TestOverunrolling::test3 (89 bytes) > > Stack: [0x00007f59a4cee000,0x00007f59a4dee000], sp=0x00007f59a4de8ca0, free space=1003k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x12f07a7] PhaseIdealLoop::compute_lca_of_uses(Node*, Node*, bool)+0x927 (loopnode.cpp:6195) > V [libjvm.so+0x12f0b78] PhaseIdealLoop::build_loop_late_post_work(Node*, bool)+0x1d8 (loopnode.cpp:6610) > V [libjvm.so+0x12f1b20] PhaseIdealLoop::build_loop_late(VectorSet&, Node_List&, 
Node_Stack&)+0x190 (loopnode.cpp:6561) > V [libjvm.so+0x12f2938] PhaseIdealLoop::build_and_optimize()+0x6d8 (loopnode.cpp:4974) > V [libjvm.so+0xa356c5] PhaseIdealLoop::verify(PhaseIterGVN&)+0x3c5 (loopnode.hpp:1144) > V [libjvm.so+0xa303b3] Compile::Optimize()+0x743 (compile.cpp:2397) > V [libjvm.so+0xa34683] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1b23 (compile.cpp:852) > V [libjvm.so+0x87ee45] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1d5 (c2compiler.cpp:142) > V [libjvm.so+0xa40518] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x928 (compileBroker.cpp:2303) > V [libjvm.so+0xa411a8] CompileBroker::compiler_thread_loop()+0x478 (compileBroker.cpp:1961) > V [libjvm.so+0xef10fc] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:759) > V [libjvm.so+0x181dad6] Thread::call_run()+0xb6 (thread.cpp:234) > V [libjvm.so+0x14ff5b8] thread_native_entry(Thread*)+0x128 (os_linux.cpp:858) > > > > compiler/loopopts/superword/TestMemorySegment.java > > Failed IR Rules (6) of Methods (6) > ---------------------------------- > 1) Method "static java.lang.Object[] compiler.loopopts.superword.Tes... Thanks @TobiHartmann for the test results. For the failure in compiler/escapeAnalysis/TestMissingAntiDependency.java I already filed JDK-8341976. I will work on the other ones. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2434534573 From aph at openjdk.org Thu Oct 24 08:22:17 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 24 Oct 2024 08:22:17 GMT Subject: RFR: 8342601: AArch64: Micro-optimize bit shift in copy_memory [v4] In-Reply-To: <50saqCLba9Bx2oSGvZdJNsfjc8ZaOtDSMWIDkqP8QSA=.7658d698-3a89-484f-b977-0694eae251ba@github.com> References: <_xnBkuPuEpz1gPJrDt27YWw_eb2DVv9gsOaj2leLcHg=.78c538cb-b0eb-4fda-a9ef-c64159c994be@github.com> <50saqCLba9Bx2oSGvZdJNsfjc8ZaOtDSMWIDkqP8QSA=.7658d698-3a89-484f-b977-0694eae251ba@github.com> Message-ID: On Mon, 21 Oct 2024 21:22:57 GMT, Chad Rakoczy wrote: >> [JDK-8342601](https://bugs.openjdk.org/browse/JDK-8342601) >> >> Fix minor inefficiency in `copy_memory` by adding check before doing bit shift to see if we are able to do a move instruction instead. Change is low risk because of the low complexity of the change >> >> Ran array copy and tier 1 on aarch64 machine >> >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/arraycopy 49 49 0 0 >> ============================== >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier1 2591 2591 0 0 >> jtreg:test/jdk:tier1 2436 2436 0 0 >> jtreg:test/langtools:tier1 4577 4577 0 0 >> jtreg:test/jaxp:tier1 0 0 0 0 >> jtreg:test/lib-test:tier1 34 34 0 0 >> ============================== > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add comment Thought about your thoughts: Just to be clear: `ORR` isn't special cased by the hardware, `MOV` is. The front end has logic to recognize just the bit pattern that corresponds to a register-register `MOV`. Re Possible Lesson 1, I guess it would be sufficient to say "`// Take advantage of zero-latency MOVs if we can`". Re Possible Lesson 2. 
I've been eager to push back against special-case tweaks for individual microarchitectures, on the grounds that it'll mess up the AArch64 port, leading to complexity that is hard to justify. Having said that, there are not many companies designing AArch64 cores, and the optimizations they do are fairly similar, some more advanced than others but all going in the same general direction. So we can usually simply do the optimization for all, and no one is hurt by that. Re optimizations in MacroAssembler. We already have quite a few, and they are very useful. The most successful ones have been load/store instruction fusion to `LDP`/`STP` and memory fence fusion. The latter is a significant performance gain in real-world benchmarks. Because register-register `MOV` is already a macro rather than an instruction, we've generated nothing for `MOV Rx, Rx` since the beginning. Where we really need to generate a certain instruction, we'll use an explicit call to `Assembler::`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21589#issuecomment-2434607576 From mli at openjdk.org Thu Oct 24 09:12:52 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 24 Oct 2024 09:12:52 GMT Subject: RFR: 8342884: RISC-V: verify float <--> float16 conversion [v2] In-Reply-To: <3STkw_TrCn1ib_wPr4NlZxahYim-9hVafE_4g7gZ8WQ=.8ad2e319-440d-43f1-b091-1b603be28e09@github.com> References: <3STkw_TrCn1ib_wPr4NlZxahYim-9hVafE_4g7gZ8WQ=.8ad2e319-440d-43f1-b091-1b603be28e09@github.com> Message-ID: <1m79HSCmGtubgtikAcVZxe0R301KXln-tX9asygp1rg=.55455efd-e061-45d4-9734-990725250d4c@github.com> > Hi, > Can you help to review this simple patch? it removes the `Experimental` of the `UseZvfh`. > Thanks > > Currently, only float <--> float16 conversions use Zvfh extension, I've run the jmh tests on bananapi, the performance result shows it's good. 
> > Benchmark-XX:+UseZfh -XX:+UnlockExperimentalVMOptions -XX:+/-UseZvfh | (size) | Mode | Cnt | Score -intrinsic | Score +intrinsic | Error | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- > Fp16ConversionBenchmark.float16ToFloat | 2048 | avgt | 10 | 8129.72 | 4729.125 | 71.937 | ns/op | 1.719 > Fp16ConversionBenchmark.float16ToFloatMemory | 2048 | avgt | 10 | 16.9 | 16.894 | 0.002 | ns/op | 1 > Fp16ConversionBenchmark.floatToFloat16 | 2048 | avgt | 10 | 12561.962 | 3767.944 | 12.652 | ns/op | 3.334 > Fp16ConversionBenchmark.floatToFloat16Memory | 2048 | avgt | 10 | 18.146 | 18.147 | 0.003 | ns/op | 1 > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: hw probe ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21664/files - new: https://git.openjdk.org/jdk/pull/21664/files/580d3879..944dacdd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21664&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21664&range=00-01 Stats: 5 lines in 2 files changed: 4 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21664.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21664/head:pull/21664 PR: https://git.openjdk.org/jdk/pull/21664 From mli at openjdk.org Thu Oct 24 09:12:52 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 24 Oct 2024 09:12:52 GMT Subject: RFR: 8342884: RISC-V: verify float <--> float16 conversion [v2] In-Reply-To: References: <3STkw_TrCn1ib_wPr4NlZxahYim-9hVafE_4g7gZ8WQ=.8ad2e319-440d-43f1-b091-1b603be28e09@github.com> Message-ID: On Thu, 24 Oct 2024 00:11:21 GMT, Fei Yang wrote: > Nit: Better to move it out of the `EXPERIMENTAL` group. Maybe immediately after the line for `UseZfh`. > Currently the pattern is ordering by name, so I'd prefer to keep it. :) > (Seems we should also auto-enable this feature through hwprobe like other non-experimental flags. 
I see macros `RISCV_HWPROBE_EXT_ZVFH` and `RISCV_HWPROBE_EXT_ZVFHMIN` in file os_cpu/linux_riscv/riscv_hwprobe.cpp, but neither of them is checked in function `RiscvHwprobe::add_features_from_query_result`) Thanks, I added a check for RISCV_HWPROBE_EXT_ZVFH, but not for RISCV_HWPROBE_EXT_ZVFHMIN, as we don't have a corresponding definition or use of ext_Zvfhmin. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21664#discussion_r1814592635 From fyang at openjdk.org Thu Oct 24 09:24:05 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 24 Oct 2024 09:24:05 GMT Subject: RFR: 8342884: RISC-V: verify float <--> float16 conversion [v2] In-Reply-To: <1m79HSCmGtubgtikAcVZxe0R301KXln-tX9asygp1rg=.55455efd-e061-45d4-9734-990725250d4c@github.com> References: <3STkw_TrCn1ib_wPr4NlZxahYim-9hVafE_4g7gZ8WQ=.8ad2e319-440d-43f1-b091-1b603be28e09@github.com> <1m79HSCmGtubgtikAcVZxe0R301KXln-tX9asygp1rg=.55455efd-e061-45d4-9734-990725250d4c@github.com> Message-ID: On Thu, 24 Oct 2024 09:12:52 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this simple patch? it removes the `Experimental` of the `UseZvfh`. >> Thanks >> >> Currently, only float <--> float16 conversions use Zvfh extension, I've run the jmh tests on bananapi, the performance result shows it's good.
>> >> Benchmark-XX:+UseZfh -XX:+UnlockExperimentalVMOptions -XX:+/-UseZvfh | (size) | Mode | Cnt | Score -intrinsic | Score +intrinsic | Error | Units | Improvement >> -- | -- | -- | -- | -- | -- | -- | -- | -- >> Fp16ConversionBenchmark.float16ToFloat | 2048 | avgt | 10 | 8129.72 | 4729.125 | 71.937 | ns/op | 1.719 >> Fp16ConversionBenchmark.float16ToFloatMemory | 2048 | avgt | 10 | 16.9 | 16.894 | 0.002 | ns/op | 1 >> Fp16ConversionBenchmark.floatToFloat16 | 2048 | avgt | 10 | 12561.962 | 3767.944 | 12.652 | ns/op | 3.334 >> Fp16ConversionBenchmark.floatToFloat16Memory | 2048 | avgt | 10 | 18.146 | 18.147 | 0.003 | ns/op | 1 >> >> > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > hw probe Latest version looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21664#pullrequestreview-2391857247 From stuefe at openjdk.org Thu Oct 24 09:35:05 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 24 Oct 2024 09:35:05 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 15:38:45 GMT, Richard Reingruber wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
>> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > For the record: the kernel sets the `fpregs` pointer [here](https://github.com/torvalds/linux/blob/eb26cbb1a754ccde5d4d74527dad5ba051808fad/arch/x86/kernel/signal_64.c#L128) :) > Thanks for your input @reinrich, @tstuefe. I've tried taking the values from `uc->uc_mcontext.fpregs->_xmm[i]`. This seems to work fine if we got a signal. However, "should_not_reach_here()" and friends don't trigger any signal. They call `MacroAssembler::debug64`. odd. debug64 ends up calling fatal, which should use the assertion poison page to get a valid context. see debug.hpp. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2434771381 From rrich at openjdk.org Thu Oct 24 09:41:05 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 24 Oct 2024 09:41:05 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 15:38:45 GMT, Richard Reingruber wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
>> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > For the record: the kernel sets the `fpregs` pointer [here](https://github.com/torvalds/linux/blob/eb26cbb1a754ccde5d4d74527dad5ba051808fad/arch/x86/kernel/signal_64.c#L128) :) > > Thanks for your input @reinrich, @tstuefe. I've tried taking the values from `uc->uc_mcontext.fpregs->_xmm[i]`. This seems to work fine if we got a signal. However, "should_not_reach_here()" and friends don't trigger any signal. They call `MacroAssembler::debug64`. > > odd. debug64 ends up calling fatal, which should use the assertion poison page to get a valid context. see debug.hpp. 
The [`ucontext_t` is memcpyed](https://github.com/openjdk/jdk/blob/f0b130e54f33d3190640ce33c991e35f27e9f812/src/hotspot/share/utilities/debug.cpp#L727). The `fpregs` pointer in the copy is probably invalid when used. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2434784694 From stuefe at openjdk.org Thu Oct 24 09:48:05 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 24 Oct 2024 09:48:05 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 09:37:43 GMT, Richard Reingruber wrote: > > Thanks for your input @reinrich, @tstuefe. I've tried taking the values from `uc->uc_mcontext.fpregs->_xmm[i]`. This seems to work fine if we got a signal. However, "should_not_reach_here()" and friends don't trigger any signal. They call `MacroAssembler::debug64`. > > odd. debug64 ends up calling fatal, which should use the assertion poison page to get a valid context. see debug.hpp. to extend: we trigger an artificial segfault in fatal with the poison page, and grab that context. Generally that works. Question is why not here. > > > Thanks for your input @reinrich, @tstuefe. I've tried taking the values from `uc->uc_mcontext.fpregs->_xmm[i]`. This seems to work fine if we got a signal. However, "should_not_reach_here()" and friends don't trigger any signal. They call `MacroAssembler::debug64`. > > > > > > odd. debug64 ends up calling fatal, which should use the assertion poison page to get a valid context. see debug.hpp. > > The [`ucontext_t` is memcpyed](https://github.com/openjdk/jdk/blob/f0b130e54f33d3190640ce33c991e35f27e9f812/src/hotspot/share/utilities/debug.cpp#L727). The `fpregs` pointer in the copy is probably invalid when used. oh good point! I remember dealing with a similar problem on PPC years ago and implementing a deep copy somehow. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2434792978 PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2434796142 From rrich at openjdk.org Thu Oct 24 09:48:05 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 24 Oct 2024 09:48:05 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: References: Message-ID: <_wfr0HaDAqb1_uvoG7mdl2R9pvxNJW1UKmETejTds3o=.8f2ae83c-2451-487e-9d14-7f586e7c4a07@github.com> On Mon, 21 Oct 2024 14:29:20 GMT, Martin Doerr wrote: > There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. > I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
>
> Example output (linux):
>
> Registers:
> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340
> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000
> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120
> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0
> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006
> TRAPNO=0x000000000000000e
>
> XMM[0]=0x0000000000000000 0x0000000000000000
> XMM[1]=0x00007fea3c034200 0x0000000000000000
> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0
> XMM[3]=0x00007fea7c3d6608 0x0000000000000000
> XMM[4]=0x00007f0000000000 0x0000000000000000
> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff
> XMM[6]=0x0000000000000000 0x00007fea897d0f98
> XMM[7]=0x0202020202020202 0x0000000000000000
> XMM[8]=0x0000000000000000 0x0202020202020202
> XMM[9]=0x666e69206e6f6974 0x0000000000000000
> XMM[10]=0x0000000000000000 0x6e6f6974616d726f
> XMM[11]=0x0000000000000001 0x0000000000000000
> XMM[12]=0x00007fea8b684400 0x0000000000000001
> XMM[13]=0x0000000000000000 0x0000000000000000
> XMM[14]=0x0000000000000000 0x0000000000000000
> XMM[15]=0x0000000000000000 0x0000000000000000
> MXCSR=0x0000037f

The context is copied in the signal handler and used [here](https://github.com/openjdk/jdk/blob/f0b130e54f33d3190640ce33c991e35f27e9f812/src/hotspot/share/utilities/debug.cpp#L208) after the signal handler has returned. The copied `fpregs` pointer still refers to the original, which is/was located in the signal handler caller frame. You need to set it to the `__fpregs_mem` in the copy.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2434800734

From rrich at openjdk.org Thu Oct 24 09:55:10 2024
From: rrich at openjdk.org (Richard Reingruber)
Date: Thu, 24 Oct 2024 09:55:10 GMT
Subject: RFR: 8342607: Enhance register printing on x86_64 platforms
In-Reply-To: <_wfr0HaDAqb1_uvoG7mdl2R9pvxNJW1UKmETejTds3o=.8f2ae83c-2451-487e-9d14-7f586e7c4a07@github.com>
References: <_wfr0HaDAqb1_uvoG7mdl2R9pvxNJW1UKmETejTds3o=.8f2ae83c-2451-487e-9d14-7f586e7c4a07@github.com>
Message-ID:

On Thu, 24 Oct 2024 09:45:20 GMT, Richard Reingruber wrote:

> The context is copied in the signal handler and used [here](https://github.com/openjdk/jdk/blob/f0b130e54f33d3190640ce33c991e35f27e9f812/src/hotspot/share/utilities/debug.cpp#L208) after the signal handler has returned. The copied `fpregs` pointer still refers to the original, which is/was located in the signal handler caller frame. You need to set it to the `__fpregs_mem` in the copy.

Correction: the kernel aligns `fpregs`, so it doesn't point precisely to `__fpregs_mem` (that's why Martin's version isn't working). The copied `fpregs` must point to the same offset in the `ucontext_t` copy as the original `fpregs` pointer does in the original `ucontext_t`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2434816007

From mdoerr at openjdk.org Thu Oct 24 10:39:12 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Thu, 24 Oct 2024 10:39:12 GMT
Subject: RFR: 8342607: Enhance register printing on x86_64 platforms
In-Reply-To:
References:
Message-ID: <7Ypj1NQkLlRs4XJCCfthootkY0bHmApAfX_qaGkJVuQ=.2bcedd00-9c12-4f9c-bde2-3364db0f9f86@github.com>

On Mon, 21 Oct 2024 14:29:20 GMT, Martin Doerr wrote:

> There are some situations in which the XMM registers are relevant to understand errors. E.g.
C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. > I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) > > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 0x0000000000000000 > XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f Thanks for figuring this out! So, there's a bit more to repair :-) I'll take a look. 
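
The pointer fix-up discussed above (after copying, `fpregs` must refer to the same offset inside the copy rather than to the original) can be sketched roughly as follows. This is only an illustration under stated assumptions: `FakeContext`, `copy_context` and the buffer size are hypothetical stand-ins, not the real `ucontext_t` layout and not the code in the PR.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for ucontext_t: 'fpregs' is an interior pointer that
// refers to some (possibly aligned) location inside the struct's own storage.
struct FakeContext {
  unsigned char* fpregs;        // interior pointer, not necessarily &storage[0]
  unsigned char  storage[128];  // stands in for __fpregs_mem plus padding
};

// Copy src into dst, then re-point dst.fpregs at the same offset inside dst.
// A plain memcpy alone would leave dst.fpregs referring to src's storage,
// which may no longer exist once the signal handler has returned.
inline void copy_context(FakeContext& dst, const FakeContext& src) {
  std::memcpy(&dst, &src, sizeof(FakeContext));
  if (src.fpregs != nullptr) {
    const std::uintptr_t offset = reinterpret_cast<std::uintptr_t>(src.fpregs) -
                                  reinterpret_cast<std::uintptr_t>(&src);
    dst.fpregs = reinterpret_cast<unsigned char*>(&dst) + offset;
  }
}
```

Computing the offset from the original pointer, instead of hard-coding the address of the embedded buffer, keeps whatever alignment adjustment was applied to the original pointer.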
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2434913751

From chagedorn at openjdk.org Thu Oct 24 11:00:36 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 24 Oct 2024 11:00:36 GMT
Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor
Message-ID: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com>

#### Replacing the Remaining Predicate Walking and Cloning Code
In the next series of patches (this, [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943), [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945), and [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946)) I want to replace the predicate walking and cloning code used for Loop Peeling, Pre/Main/Post Loops, Loop Unswitching, removing useless Assertion Predicates and Loop Unrolling with new `PredicateVisitors` which can be used in combination with the new `PredicateIterator`.

#### Single Template Assertion Predicate Check
This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047), which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)).

#### Common Refactorings for all the Patches in this Series
In each of the patches, I will apply similar refactorings:
- Replace the existing code in the corresponding `PhaseIdealLoop` method with a call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface.
- The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates.
- The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate.
- The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head (see **P1** PR comment).
- Keep the existing semantics, which include applying the Template Assertion Predicate processing only if there are Parse Predicates (see **P2** PR comment). This limitation should eventually be removed, but I want to do that separately at a later point.

#### Refactorings of this Patch
This first patch replaces the predicate walking and cloning code for Loop Peeling and lays the foundation for the replacement for main/post loops ([JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943)) which is quite similar. This patch includes:
- `AssertionPredicatesForLoop` as a new predicate visitor (this class is reused for [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943)).
- To correctly rewire data dependencies, we need to check if a node is part of the original loop body or the cloned loop body. To do that, I've introduced an interface `NodeInLoopBody` which is implemented to check if a node is either in the original loop body (`NodeInOriginalLoopBody`) or in the cloned loop body (`NodeInClonedLoopBody`, added with the next patch).
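
The visitor/iterator split described in the list above can be sketched as below. This is a heavily simplified model, not HotSpot code: every type and method name ending in `Sketch` is hypothetical, and "cloning" is reduced to recording ids so the control flow stays visible.

```cpp
#include <vector>

// Hypothetical, simplified model of predicate kinds found at a loop entry.
enum class PredicateKind { Parse, Runtime, TemplateAssertion };

struct PredicateSketch {
  PredicateKind kind;
  int id;
};

// Visitor interface: one hook per predicate kind; the default does nothing.
class PredicateVisitorSketch {
 public:
  virtual ~PredicateVisitorSketch() = default;
  virtual void visit_parse(const PredicateSketch&) {}
  virtual void visit_runtime(const PredicateSketch&) {}
  virtual void visit_template_assertion(const PredicateSketch&) {}
};

// Iterator: walks all predicates of a loop and applies the visitor to each.
class PredicateIteratorSketch {
 public:
  explicit PredicateIteratorSketch(const std::vector<PredicateSketch>& preds)
      : _preds(preds) {}

  void for_each(PredicateVisitorSketch& visitor) const {
    for (const PredicateSketch& p : _preds) {
      switch (p.kind) {
        case PredicateKind::Parse:             visitor.visit_parse(p); break;
        case PredicateKind::Runtime:           visitor.visit_runtime(p); break;
        case PredicateKind::TemplateAssertion: visitor.visit_template_assertion(p); break;
      }
    }
  }

 private:
  const std::vector<PredicateSketch>& _preds;
};

// A visitor that only reacts to Template Assertion Predicates and records
// which ones it would clone to the target loop.
class CloneTemplatesSketch : public PredicateVisitorSketch {
 public:
  void visit_template_assertion(const PredicateSketch& p) override {
    cloned_ids.push_back(p.id);
  }
  std::vector<int> cloned_ids;
};
```

With this shape, the walking logic lives in one place and each loop optimization only supplies the visitor that says what to do per predicate.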
Thanks, Christian ------------- Commit messages: - improve comments - 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor Changes: https://git.openjdk.org/jdk/pull/21679/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21679&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341977 Stats: 199 lines in 4 files changed: 133 ins; 53 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/21679.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21679/head:pull/21679 PR: https://git.openjdk.org/jdk/pull/21679 From chagedorn at openjdk.org Thu Oct 24 11:00:37 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 24 Oct 2024 11:00:37 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor In-Reply-To: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> Message-ID: On Thu, 24 Oct 2024 10:45:12 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > In the next series of patches (this, [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943), [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945), and [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946)) I want to replace the predicate walking and cloning code used for Loop Peeling, Pre/Main/Post Loops, Loop Unswitching, removing useless Assertion Predicates and Loop Unrolling with new `PredicateVisitors` which can be used in combination with the new `PredicateIterator`. > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. 
This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head (see **P1** PR comment). > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates (see **P2** PR comment). This limitation should eventually be removed. But I want to do that separately at a later point. > > #### Refactorings of this Patch > This first patch replaces the predicate walking and cloning code for Loop Peeling and lays the foundation for the replacement for main/post loops ([JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943)) which is quite similar. Th... 
src/hotspot/share/opto/loopTransform.cpp line 819: > 817: if (counted_loop && UseLoopPredicate) { > 818: initialize_assertion_predicates_for_peeled_loop(new_head->as_CountedLoop(), head->as_CountedLoop(), > 819: first_node_index_in_cloned_loop_body, old_new); We can infer many values from other variables. src/hotspot/share/opto/loopTransform.cpp line 1451: > 1449: } > 1450: } > 1451: #endif // ASSERT Noticed that after JDK-8342043, `count_opaque_loop_nodes()` can also be made debug-build only. src/hotspot/share/opto/loopTransform.cpp line 1975: > 1973: const Node_List& old_new) { > 1974: const NodeInOriginalLoopBody node_in_original_loop_body(first_node_index_in_cloned_loop_body, old_new); > 1975: create_assertion_predicates_at_loop(peeled_loop_head, remaining_loop_head, node_in_original_loop_body); We can do something similar for the main and post loop which will be proposed in the next PR. src/hotspot/share/opto/loopTransform.cpp line 1994: > 1992: _igvn.replace_input_of(target_outer_loop_head, LoopNode::EntryControl, last_created_node); > 1993: set_idom(target_outer_loop_head, last_created_node, dom_depth(target_outer_loop_head)); > 1994: } **P1** (see PR description) src/hotspot/share/opto/loopnode.hpp line 955: > 953: Node* clone_template_assertion_predicate(IfNode* iff, Node* new_init, Node* predicate, Node* uncommon_proj, Node* control, > 954: IdealLoopTree* outer_loop, Node* new_control); > 955: public: Now needs to be called from `AssertionPredicatesForLoop`. src/hotspot/share/opto/predicates.cpp line 744: > 742: // Only process if we are in the correct Predicate Block. 
> 743: return; > 744: } **P2** (see PR description) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1814740963 PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1814742470 PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1814743208 PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1814743609 PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1814744478 PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1814745959 From qamai at openjdk.org Thu Oct 24 11:54:08 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 24 Oct 2024 11:54:08 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 02:08:14 GMT, Jasmine Karthikeyan wrote: >> src/hotspot/share/opto/phaseX.cpp line 2277: >> >>> 2275: >>> 2276: // Try to find an existing version of the same node >>> 2277: Node* existing = _igvn->hash_find_insert(n); >> >> I think it would be easier if you have a switch in `gvn` that says you passed the point of doing `Ideal`, moving forward you will probably want to have a `IdealLowering` to transform nodes during this phase. `Identity` I think is fine since it returns an existing node. > > Ah, do you mean having a method in `Node` that holds the lowering code? I was originally planning on doing it this way, but I think it would pose some problems where certain nodes' `Lower()` methods would only be overridden on certain backends, which would become messy. One of my goals was to keep the lowering code separate from shared code, so new lowerings could be implemented by just updating the main `lower_node` function in the backend. > About GVN, I think it makes sense to do it in a separate phase because GVN is used quite generally whereas lowering is only done once. 
Since the `transform_old` function in IGVN is pretty complex as well, I think it's simpler to just implement `Value()` and GVN separately. Thinking on it more I think Identity is probably a good idea too, since as you mention it can't introduce new nodes into the graph. Mainly I wanted to avoid the case where `Ideal()` could fold a lowered graph back into the original form, causing an infinite loop.

I mean we might want to run another kind of `Ideal` that will replace the normal `Ideal` on a node after its lowering. For example, consider this:

    vector v;
    u = v.withLane(0, a).withLane(1, b);

This will be parsed into:

    vector v;
    v0 = InsertI(v, 4, a);
    u = InsertI(v0, 5, b);

And can be lowered to:

    vector v;
    vector v1 = VectorExtract(v, 1);
    v2 = InsertI(v1, 0, a);
    v0 = VectorInsert(v, 1, v2);
    vector v3 = VectorExtract(v0, 1);
    v4 = InsertI(v3, 1, b);
    u = VectorInsert(v0, 1, v4);

Which represents this sequence:

    ymm0;
    vextracti128 xmm1, ymm0, 1;
    vpinsrd xmm1, xmm1, a, 0;
    vinserti128 ymm0, ymm0, xmm1, 1;
    vextracti128 xmm1, ymm0, 1;
    vpinsrd xmm1, xmm1, b, 1;
    vinserti128 ymm0, ymm0, xmm1, 1;

As you can imagine, this sequence is pretty inefficient; what we really want is:

    ymm0;
    vextracti128 xmm1, ymm0, 1;
    vpinsrd xmm1, xmm1, a, 0;
    vpinsrd xmm1, xmm1, b, 1;
    vinserti128 ymm0, ymm0, xmm1, 1;

Looking back at the graph, we can `Identity` `v3` into `v2` since it is pretty obvious that we just do an insert and extract from the same place. However, to transform `u = VectorInsert(v0, 1, v4)` into `u = VectorInsert(v, 1, v4)`, we would need an `Ideal`-like transformation to see that we just insert into a location twice and remove the intermediate `VectorInsert`.

As a result, in addition to ease of implementation, I think you may extend `PhaseIterGVN` and override its `PhaseGVN::apply_ideal` to return `nullptr` for now, and take advantage of `PhaseIterGVN::optimize` to do the iterative transformation for you.
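
To make the equivalence of the two sequences above concrete, here is a small scalar model. It is only a sketch under assumptions: the vector is taken to have eight 32-bit lanes split into two 128-bit halves, and `vector_extract`, `vector_insert` and `insert_i` are hypothetical helpers that mimic the node names used in the mail, not C2 APIs.

```cpp
#include <array>

using Vec256 = std::array<int, 8>;  // eight 32-bit lanes (assumed shape)
using Vec128 = std::array<int, 4>;  // one 128-bit half

// Models VectorExtract(v, half): returns lanes [4*half, 4*half + 3].
inline Vec128 vector_extract(const Vec256& v, int half) {
  Vec128 r{};
  for (int i = 0; i < 4; ++i) r[i] = v[4 * half + i];
  return r;
}

// Models VectorInsert(v, half, x): replaces one 128-bit half of v with x.
inline Vec256 vector_insert(Vec256 v, int half, const Vec128& x) {
  for (int i = 0; i < 4; ++i) v[4 * half + i] = x[i];
  return v;
}

// Models InsertI on a 128-bit vector: replaces a single lane.
inline Vec128 insert_i(Vec128 v, int lane, int value) {
  v[lane] = value;
  return v;
}

// The straightforward lowering: extract/insert the upper half twice.
inline Vec256 lowered_naive(const Vec256& v, int a, int b) {
  Vec128 v1 = vector_extract(v, 1);
  Vec128 v2 = insert_i(v1, 0, a);
  Vec256 v0 = vector_insert(v, 1, v2);
  Vec128 v3 = vector_extract(v0, 1);  // same value as v2 (Identity-style fold)
  Vec128 v4 = insert_i(v3, 1, b);
  return vector_insert(v0, 1, v4);
}

// The folded form: one extract, two lane inserts, one insert back.
inline Vec256 lowered_folded(const Vec256& v, int a, int b) {
  Vec128 v1 = vector_extract(v, 1);
  Vec128 v2 = insert_i(v1, 0, a);
  Vec128 v4 = insert_i(v2, 1, b);
  return vector_insert(v, 1, v4);
}
```

Both functions write `a` to lane 4 and `b` to lane 5 and leave all other lanes untouched, which is the property the proposed post-lowering transformations rely on.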
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1814828100 From chagedorn at openjdk.org Thu Oct 24 11:57:39 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 24 Oct 2024 11:57:39 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v2] In-Reply-To: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> Message-ID: > #### Replacing the Remaining Predicate Walking and Cloning Code > In the next series of patches (this, [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943), [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945), and [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946)) I want to replace the predicate walking and cloning code used for Loop Peeling, Pre/Main/Post Loops, Loop Unswitching, removing useless Assertion Predicates and Loop Unrolling with new `PredicateVisitors` which can be used in combination with the new `PredicateIterator`. > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. 
> - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head (see **P1** PR comment). > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates (see **P2** PR comment). This limitation should eventually be removed. But I want to do that separately at a later point. > > #### Refactorings of this Patch > This first patch replaces the predicate walking and cloning code for Loop Peeling and lays the foundation for the replacement for main/post loops ([JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943)) which is quite similar. Th... 
Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: small update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21679/files - new: https://git.openjdk.org/jdk/pull/21679/files/7fe4ea00..eb22d38e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21679&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21679&range=00-01 Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21679.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21679/head:pull/21679 PR: https://git.openjdk.org/jdk/pull/21679 From dnsimon at openjdk.org Thu Oct 24 12:50:15 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 24 Oct 2024 12:50:15 GMT Subject: Integrated: 8337968: Problem list compiler/vectorapi/VectorRebracket128Test.java In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 05:28:06 GMT, Tobias Hartmann wrote: > Problem list until [JDK-8330538](https://bugs.openjdk.org/browse/JDK-8330538) is fixed to reduce the noise in testing. > > Thanks, > Tobias Since this fails intermittently in 23u, I'm backporting it: ------------- PR Comment: https://git.openjdk.org/jdk/pull/20485#issuecomment-2435201981 From mdoerr at openjdk.org Thu Oct 24 12:57:23 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 24 Oct 2024 12:57:23 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 14:29:20 GMT, Martin Doerr wrote: > There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. 
> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) > > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 0x0000000000000000 > XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f It works when I repair the copied `uc_mcontext` (see 2nd commit). Please take a look. Note that I'm using little endian style: print the high order half first which is at offset 8, then the low order half (offset 0). We should also backport this fix. 
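
The print order described above (high-order 64 bits from offset 8 first, then the low-order 64 bits from offset 0) can be illustrated with the following sketch. `format_xmm` and `load_le64` are hypothetical helpers written for this example; the actual patch prints through HotSpot's own output stream and may differ in detail.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Reads 8 bytes stored in little-endian order, independent of host endianness.
inline std::uint64_t load_le64(const unsigned char* p) {
  std::uint64_t v = 0;
  for (int i = 7; i >= 0; --i) {
    v = (v << 8) | p[i];
  }
  return v;
}

// Formats one 128-bit register image as "XMM[i]=0x<high> 0x<low>".
inline std::string format_xmm(int index, const unsigned char bytes[16]) {
  const std::uint64_t low  = load_le64(bytes);      // offset 0
  const std::uint64_t high = load_le64(bytes + 8);  // offset 8
  char buf[64];
  std::snprintf(buf, sizeof(buf), "XMM[%d]=0x%016" PRIx64 " 0x%016" PRIx64,
                index, high, low);
  return std::string(buf);
}
```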
------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2435220242 From mdoerr at openjdk.org Thu Oct 24 12:57:23 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 24 Oct 2024 12:57:23 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v2] In-Reply-To: References: Message-ID: > There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. > I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) > > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 0x0000000000000000 > 
XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Use uc_mcontext.fpregs instead of __fpregs_mem._xmm and fix copied uc_mcontext in store_context. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21615/files - new: https://git.openjdk.org/jdk/pull/21615/files/b3ec6143..a68d167b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21615&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21615&range=00-01 Stats: 9 lines in 2 files changed: 4 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21615.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21615/head:pull/21615 PR: https://git.openjdk.org/jdk/pull/21615 From jbhateja at openjdk.org Thu Oct 24 13:36:50 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 24 Oct 2024 13:36:50 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v32] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. 
Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: - Review resolutions. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 - Factor out IR tests and Transforms to follow-up PRs. - Replacing flag based checks with CPU feature checks in IR validation test. - Remove Saturating IRNode patterns. - Restrict IR validation to newly added UMin/UMax transforms. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 - Prod build fix - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 - New IR tests + additional IR transformations - ... 
and 27 more: https://git.openjdk.org/jdk/compare/158b93d1...0e10139c ------------- Changes: https://git.openjdk.org/jdk/pull/20507/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=31 Stats: 9395 lines in 52 files changed: 8959 ins; 29 del; 407 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Thu Oct 24 13:36:51 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 24 Oct 2024 13:36:51 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v31] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 06:46:32 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Factor out IR tests and Transforms to follow-up PRs. > > src/hotspot/cpu/x86/x86.ad line 10790: > >> 10788: predicate(is_subword_type(Matcher::vector_element_basic_type(n)) && >> 10789: n->is_SaturatingVector() && !n->as_SaturatingVector()->is_unsigned()); >> 10790: match(Set dst (SaturatingAddV (Binary dst (LoadVector src)) mask)); > > Do equivalent store operations exist we could also match for? ISA only supports memory operands as second source operand. > test/jdk/jdk/incubator/vector/VectorMathTest.java line 70: > >> 68: public static short[] INPUT_SS = {Short.MIN_VALUE, (short)(Short.MIN_VALUE + TEN_S), ZERO_S, (short)(Short.MAX_VALUE - TEN_S), Short.MAX_VALUE}; >> 69: public static int[] INPUT_SI = {Integer.MIN_VALUE, (Integer.MIN_VALUE + TEN_I), ZERO_I, Integer.MAX_VALUE - TEN_I, Integer.MAX_VALUE}; >> 70: public static long[] INPUT_SL = {Long.MIN_VALUE, Long.MIN_VALUE + TEN_L, ZERO_L, Long.MAX_VALUE - TEN_L, Long.MAX_VALUE}; > > Ok, now we have 4 or 5 hand-crafted examples. Is that sufficient? Some random values would be nice, then we know that at least eventually we have full coverage. 
The hand-crafted cases contain boundary and general cases; in short, they sufficiently cover the entire value range. > test/jdk/jdk/incubator/vector/templates/Unit-header.template line 1244: > >> 1242: return fill(s * BUFFER_REPS, >> 1243: i -> ($type$)($Boxtype$.MIN_VALUE + 100)); >> 1244: }) > > Not sure I see this right. But are we only providing these 4 constants as inputs, and all values in the input arrays will be identical? If that is true: we should have some random inputs, or at least dependent on `i`. The most important test points for saturating operations are the edge conditions, where overflow semantics differ and the operation saturates a value rather than wrapping it around. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1814998684 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1814999013 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1814998545 From jbhateja at openjdk.org Thu Oct 24 13:36:51 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 24 Oct 2024 13:36:51 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v31] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 06:31:00 GMT, Emanuel Peter wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 32: >> >>> 30: * @since 24 >>> 31: */ >>> 32: public final class VectorMath { >> >> I think this class could have been split into a separate RFE, together with its tests. I would prefer that next time. > > Also: why did we not add these `Long.minUnsigned` etc? I guess that was already discussed? > Because we can easily also use this with the auto-vectorizer or more generally. Saturating and unsigned ops are generally useful I think... This PR specifically targets the explicit vectorization flow; we plan to address scalar intrinsification and auto-vectorization later, once the type system has exposure to unsigned types. 
>> test/jdk/jdk/incubator/vector/templates/Kernel-SaturatingBinary-Masked-op.template line 8: >> >>> 6: >>> 7: for (int ic = 0; ic < INVOC_COUNT; ic++) { >>> 8: for (int i = 0; i < a.length; i += SPECIES.length()) { >> >> I think this does not check if the generated vectors are too long. We had bugs in the past where we should have created say 2-element vectors, but the backend wrongly created 4-element vectors. This is especially an issue with vectors that do direct memory access. >> >> With a simple "counting-up" test, you will probably not catch this. It could be good to have a "counting-down" example as well. What do you think? > > Also: all of these cases load, and directly store again. Does that not mean all tests will probably pick the "..._mem" backend operations? Or do we actually end up testing all backend operations with the tests we have here? To exercise the non-memory operand patterns we need a vector operation padding layer after the vector load; this ensures that the instruction selector always picks the all-register-operands flavor of the instruction. Since it's a generic limitation, do you think we should float it as a separate PR? I have created an RFE https://bugs.openjdk.org/browse/JDK-8342959 for reference. Given that we have moved the IR tests out of this PR on the grounds of review complexity, let's not add more code here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1815000046 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1814998821 From epeter at openjdk.org Thu Oct 24 13:44:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 24 Oct 2024 13:44:28 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v32] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:36:50 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . 
SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: > > - Review resolutions. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - Factor out IR tests and Transforms to follow-up PRs. 
> - Replacing flag based checks with CPU feature checks in IR validation test. > - Remove Saturating IRNode patterns. > - Restrict IR validation to newly added UMin/UMax transforms. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - Prod build fix > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - New IR tests + additional IR transformations > - ... and 27 more: https://git.openjdk.org/jdk/compare/158b93d1...0e10139c src/hotspot/cpu/x86/x86.ad line 10593: > 10591: match(Set dst (SaturatingAddV src1 src2)); > 10592: match(Set dst (SaturatingSubV src1 src2)); > 10593: format %{ "vector_addsub_saturating_subword $dst, $src1, $src2" %} Could the `Opcode` be put into the `format` string? Not strictly necessary, but would be neat. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1815010088 From epeter at openjdk.org Thu Oct 24 13:44:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 24 Oct 2024 13:44:28 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v31] In-Reply-To: References: Message-ID: <_Fu-_LTHnSKDrkQR1er4Tl9jIlyh2wIWvTstUaxJVFU=.df8418f6-95c9-4c52-8704-76d1f4046023@github.com> On Thu, 24 Oct 2024 13:30:25 GMT, Jatin Bhateja wrote: >> test/jdk/jdk/incubator/vector/VectorMathTest.java line 70: >> >>> 68: public static short[] INPUT_SS = {Short.MIN_VALUE, (short)(Short.MIN_VALUE + TEN_S), ZERO_S, (short)(Short.MAX_VALUE - TEN_S), Short.MAX_VALUE}; >>> 69: public static int[] INPUT_SI = {Integer.MIN_VALUE, (Integer.MIN_VALUE + TEN_I), ZERO_I, Integer.MAX_VALUE - TEN_I, Integer.MAX_VALUE}; >>> 70: public static long[] INPUT_SL = {Long.MIN_VALUE, Long.MIN_VALUE + TEN_L, ZERO_L, Long.MAX_VALUE - TEN_L, Long.MAX_VALUE}; >> >> Ok, now we have 4 or 5 hand-crafted examples. Is that sufficient? Some random values would be nice, then we know that at least eventually we have full coverage. 
> > The hand-crafted cases contain boundary and general cases; in short, they sufficiently cover the entire value range. @PaulSandoz do you think this is sufficient coverage? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1815012386 From epeter at openjdk.org Thu Oct 24 13:44:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 24 Oct 2024 13:44:28 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v31] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:30:20 GMT, Jatin Bhateja wrote: >> Also: all of these cases load, and directly store again. Does that not mean all tests will probably pick the "..._mem" backend operations? Or do we actually end up testing all backend operations with the tests we have here? > > To exercise the non-memory operand patterns we need a vector operation padding layer after the vector load; this ensures that the instruction selector always picks the all-register-operands flavor of the instruction. Since it's a generic limitation, do you think we should float it as a separate PR? > > I have created an RFE https://bugs.openjdk.org/browse/JDK-8342959 for reference. Given that we have moved the IR tests out of this PR on the grounds of review complexity, let's not add more code here. Ok, we can file a separate RFE. Though I have really voiced two concerns: - Making sure we always test `_mem` and `_reg` variants in the backend. See your https://bugs.openjdk.org/browse/JDK-8342959 - Making sure we have tests that would detect vectors that are too long. This would require some padding between the vectors, so that we have some untouched space - and if it does get touched we know that a vector was too long. Does that make sense? This is, I guess, also a general concern - and would have to be applied to all vector instructions. 
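The second concern above (padding that must stay untouched) can be illustrated with a small, self-contained sketch. This is only an assumption about how such a check might look, not code from the PR or from the existing Vector API test templates: the class name `GuardRegionSketch`, the constants `PAD` and `SENTINEL`, and the scalar loop standing in for the vector kernel are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch, not taken from the PR: shows the "untouched padding" idea only.
public class GuardRegionSketch {
    static final int PAD = 8;                // guard elements on each side of the payload
    static final int SENTINEL = 0x5A5A5A5A;  // value the kernel must never overwrite

    // Stand-in for the kernel under test. It writes `written` results starting
    // right after the leading guard region. A correct kernel has written == len;
    // a kernel that uses too-long vectors would behave like written > len.
    // Assumes written <= len + PAD so the scalar stand-in stays in bounds.
    static int[] run(int len, int written) {
        int total = len + 2 * PAD;
        int[] a = new int[total];
        int[] b = new int[total];
        int[] r = new int[total];
        for (int i = 0; i < total; i++) {
            a[i] = i;
            b[i] = 1;
        }
        Arrays.fill(r, SENTINEL);
        for (int i = 0; i < written; i++) {
            r[PAD + i] = a[PAD + i] + b[PAD + i];
        }
        return r;
    }

    // True if every guard element before and after the payload still holds SENTINEL.
    static boolean guardsIntact(int[] r, int len) {
        for (int i = 0; i < PAD; i++) {
            if (r[i] != SENTINEL) return false;
        }
        for (int i = PAD + len; i < r.length; i++) {
            if (r[i] != SENTINEL) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(guardsIntact(run(4, 4), 4)); // exact-length write: prints true
        System.out.println(guardsIntact(run(4, 6), 4)); // two extra elements: prints false
    }
}
```

In a real test the scalar loop would be replaced by the vector operation under test, and the sentinel would have to be chosen so that it cannot collide with a legitimate result.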
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1815017678 From mdoerr at openjdk.org Thu Oct 24 13:50:14 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 24 Oct 2024 13:50:14 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v2] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 12:57:23 GMT, Martin Doerr wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
>> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Use uc_mcontext.fpregs instead of __fpregs_mem._xmm and fix copied uc_mcontext in store_context. Also note that the copy fix above is kind of a hack. Should we use something like the following? 
diff --git a/src/hotspot/share/utilities/debug.cpp b/src/hotspot/share/utilities/debug.cpp index 988e5dddd90..1389654b4b6 100644 --- a/src/hotspot/share/utilities/debug.cpp +++ b/src/hotspot/share/utilities/debug.cpp @@ -731,7 +731,9 @@ static void store_context(const void* context) { #if defined(PPC64) *((void**) &g_stored_assertion_context.uc_mcontext.regs) = &(g_stored_assertion_context.uc_mcontext.gp_regs); #elif defined(AMD64) - *((void**) &g_stored_assertion_context.uc_mcontext.fpregs) = &(g_stored_assertion_context.uc_mcontext.fpregs); + // In the copied version, fpregs should point to the copied contents. Preserve the offset. + intptr_t offset = (address)(void*)(g_stored_assertion_context.uc_mcontext.fpregs) - (address)context; + *((void**) &g_stored_assertion_context.uc_mcontext.fpregs) = (void*)((address)(void*)&g_stored_assertion_context + offset); #endif #endif } ------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2435348706 From rrich at openjdk.org Thu Oct 24 13:53:09 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 24 Oct 2024 13:53:09 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v2] In-Reply-To: References: Message-ID: <40HPxQVaUhLP7lwcLHOBcqmRIV95ijx2CrafpaPcJec=.97a6c09f-b4f7-4357-a790-d4a56f573712@github.com> On Thu, 24 Oct 2024 12:57:23 GMT, Martin Doerr wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
>> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Use uc_mcontext.fpregs instead of __fpregs_mem._xmm and fix copied uc_mcontext in store_context. > Also note that the copy fix above is kind of a hack. Should we use something like the following? 
> > ```diff > diff --git a/src/hotspot/share/utilities/debug.cpp b/src/hotspot/share/utilities/debug.cpp > index 988e5dddd90..1389654b4b6 100644 > --- a/src/hotspot/share/utilities/debug.cpp > +++ b/src/hotspot/share/utilities/debug.cpp > @@ -731,7 +731,9 @@ static void store_context(const void* context) { > #if defined(PPC64) > *((void**) &g_stored_assertion_context.uc_mcontext.regs) = &(g_stored_assertion_context.uc_mcontext.gp_regs); > #elif defined(AMD64) > - *((void**) &g_stored_assertion_context.uc_mcontext.fpregs) = &(g_stored_assertion_context.uc_mcontext.fpregs); > + // In the copied version, fpregs should point to the copied contents. Preserve the offset. > + intptr_t offset = (address)(void*)(g_stored_assertion_context.uc_mcontext.fpregs) - (address)context; > + *((void**) &g_stored_assertion_context.uc_mcontext.fpregs) = (void*)((address)(void*)&g_stored_assertion_context + offset); > #endif > #endif > } > ``` Looks like what I meant in [my comment above](https://github.com/openjdk/jdk/pull/21615#issuecomment-2434816007). I was wondering why you ignored it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2435356708 From redestad at openjdk.org Thu Oct 24 13:57:20 2024 From: redestad at openjdk.org (Claes Redestad) Date: Thu, 24 Oct 2024 13:57:20 GMT Subject: RFR: 8342958: Use jvmArgs consistently in microbenchmarks Message-ID: Many OpenJDK micros use `@Fork(jvmArgs/-Append/-Prepend)` to add reasonable or necessary JVM flags, but when deploying and running micros we often want to add or replace flags to tune to the machine, test different GCs, etc. The inconsistent use of the different `jvmArgs` options makes it error-prone, and we've had a few recent cases where we've not been testing with the expected set of flags. This PR suggests using `jvmArgs` consistently. 
I think this aligns with the intuition that when you use `jvmArgsAppend/-Prepend` intent is to add to a set of existing flags, while if you supply `jvmArgs` intent is "run with these and nothing else". Perhaps there are other opinions/preferences, and I don't feel strongly about which to consolidate to as long as we do so consistently. One argument could be made to consolidate on `jvmArgsAppend` since that one is (likely accidentally) the current most popular (143 compared to 59 `jvmArgsPrepend` and 18 `jvmArgs`). ------------- Commit messages: - Change all micros to consistently use jvmArgs, leaving both jvmArgsAppend/-Prepend free for ops Changes: https://git.openjdk.org/jdk/pull/21683/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21683&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342958 Stats: 202 lines in 142 files changed: 0 ins; 0 del; 202 mod Patch: https://git.openjdk.org/jdk/pull/21683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21683/head:pull/21683 PR: https://git.openjdk.org/jdk/pull/21683 From jbhateja at openjdk.org Thu Oct 24 14:03:26 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 24 Oct 2024 14:03:26 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v32] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:36:53 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: >> >> - Review resolutions. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 >> - Factor out IR tests and Transforms to follow-up PRs. >> - Replacing flag based checks with CPU feature checks in IR validation test. >> - Remove Saturating IRNode patterns. >> - Restrict IR validation to newly added UMin/UMax transforms. 
>> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 >> - Prod build fix >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 >> - New IR tests + additional IR transformations >> - ... and 27 more: https://git.openjdk.org/jdk/compare/158b93d1...0e10139c > > src/hotspot/cpu/x86/x86.ad line 10593: > >> 10591: match(Set dst (SaturatingAddV src1 src2)); >> 10592: match(Set dst (SaturatingSubV src1 src2)); >> 10593: format %{ "vector_addsub_saturating_subword $dst, $src1, $src2" %} > > Could the `Opcode` be put into the `format` string? Not strictly necessary, but would be neat. Desirable future extension, but it's not related to this specific PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1815050398 From mbaesken at openjdk.org Thu Oct 24 14:07:21 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 24 Oct 2024 14:07:21 GMT Subject: RFR: 8342823: Ubsan: ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' Message-ID: <765oLXxFYPUnQq8xyATal2oPWTFef26xrbI0-wR7jpU=.1a01b12f-3cde-4349-b21d-d9fec388f459@github.com> When running with ubsanized binaries on Linux x86_64, the HotSpot jtreg test compiler/startup/StartupOutput.java showed this issue: jdk/src/hotspot/share/ci/ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' #0 0x7fcea0810117 in ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1614 #1 0x7fcea3123577 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) src/hotspot/share/utilities/vmError.cpp:1872 #2 0x7fcea0c01499 in report_fatal(VMErrorType, char const*, int, char const*, ...) 
src/hotspot/share/utilities/debug.cpp:214 #3 0x7fcea09e9d85 in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413 #4 0x7fcea066da1d in Runtime1::generate_blob(BufferBlob*, C1StubId, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:233 #5 0x7fcea066dfb0 in Runtime1::generate_blob_for(BufferBlob*, C1StubId) src/hotspot/share/c1/c1_Runtime1.cpp:262 #6 0x7fcea066dfb0 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:272 #7 0x7fcea03d2be1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53 #8 0x7fcea03d2be1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74 #9 0x7fcea0acc0c2 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771 #10 0x7fcea0ad9a3f in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913 #11 0x7fcea161264a in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:759 #12 0x7fcea2ec739a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:234 #13 0x7fcea251e1d2 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 #14 0x7fcea7c6c6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 1b515766201d47a183932ba0c8c8bd0d9ee8755b) #15 0x7fcea730f58e in clone (/lib64/libc.so.6+0x11858e) (BuildId: 448a3ddd22596e1adb8fb3dec8921ed5b9d54dc2) So a nullptr check should be added. 
------------- Commit messages: - JDK-8342823 Changes: https://git.openjdk.org/jdk/pull/21684/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21684&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342823 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21684.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21684/head:pull/21684 PR: https://git.openjdk.org/jdk/pull/21684 From epeter at openjdk.org Thu Oct 24 14:07:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 24 Oct 2024 14:07:24 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v32] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:59:58 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/x86.ad line 10593: >> >>> 10591: match(Set dst (SaturatingAddV src1 src2)); >>> 10592: match(Set dst (SaturatingSubV src1 src2)); >>> 10593: format %{ "vector_addsub_saturating_subword $dst, $src1, $src2" %} >> >> Could the `Opcode` be put into the `format` string? Not strictly necessary, but would be neat. > > Desirable future extension, but it's not related to this specific PR. Well, here it would be especially interesting, because it would tell us if we have a `sub` or an `add`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1815058565 From jbhateja at openjdk.org Thu Oct 24 14:10:15 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 24 Oct 2024 14:10:15 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v31] In-Reply-To: <_Fu-_LTHnSKDrkQR1er4Tl9jIlyh2wIWvTstUaxJVFU=.df8418f6-95c9-4c52-8704-76d1f4046023@github.com> References: <_Fu-_LTHnSKDrkQR1er4Tl9jIlyh2wIWvTstUaxJVFU=.df8418f6-95c9-4c52-8704-76d1f4046023@github.com> Message-ID: On Thu, 24 Oct 2024 13:38:12 GMT, Emanuel Peter wrote: >> The hand-crafted cases contain boundary and general cases; in short, they sufficiently cover the entire value range. 
> > @PaulSandoz do you think this is sufficient coverage? Please note that this test was added just to cover scalar operation validation in VectorMath; the automated tests exercise these APIs in the fallback implementation anyway. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1815064343 From ecaspole at openjdk.org Thu Oct 24 14:46:08 2024 From: ecaspole at openjdk.org (Eric Caspole) Date: Thu, 24 Oct 2024 14:46:08 GMT Subject: RFR: 8342958: Use jvmArgs consistently in microbenchmarks In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:52:57 GMT, Claes Redestad wrote: > Many OpenJDK micros use `@Fork(jvmArgs/-Append/-Prepend)` to add reasonable or necessary JVM flags, but when deploying and running micros we often want to add or replace flags to tune to the machine, test different GCs, etc. The inconsistent use of the different `jvmArgs` options makes it error-prone, and we've had a few recent cases where we've not been testing with the expected set of flags. > > This PR suggests using `jvmArgs` consistently. I think this aligns with the intuition that when you use `jvmArgsAppend/-Prepend` intent is to add to a set of existing flags, while if you supply `jvmArgs` intent is "run with these and nothing else". Perhaps there are other opinions/preferences, and I don't feel strongly about which to consolidate to as long as we do so consistently. One argument could be made to consolidate on `jvmArgsAppend` since that one is (likely accidentally) the current most popular (143 compared to 59 `jvmArgsPrepend` and 18 `jvmArgs`). Looks good. ------------- Marked as reviewed by ecaspole (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/21683#pullrequestreview-2392886714 From duke at openjdk.org Thu Oct 24 14:58:09 2024 From: duke at openjdk.org (Abdelhak Zaaim) Date: Thu, 24 Oct 2024 14:58:09 GMT Subject: RFR: 8342862: Gtest added by 8339507 appears to be causing 8GB build machines to hang [v2] In-Reply-To: References: <0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> Message-ID: On Wed, 23 Oct 2024 23:49:19 GMT, hanklo6 wrote: >> We generate each test randomly. The `asmtest.out.h` is currently around 100 KB. We have added the `--full` argument to allow users to create a complete test set. > > hanklo6 has updated the pull request incrementally with one additional commit since the last revision: > > add instruction Marked as reviewed by abdelhak-zaaim at github.com (no known OpenJDK username). ------------- PR Review: https://git.openjdk.org/jdk/pull/21670#pullrequestreview-2392962162 From psandoz at openjdk.org Thu Oct 24 15:04:14 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 24 Oct 2024 15:04:14 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v31] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:31:00 GMT, Jatin Bhateja wrote: >> Also: why did we not add these `Long.minUnsigned` etc? I guess that was already discussed? >> Because we can easily also use this with the auto-vectorizer or more generally. Saturating and unsigned ops are generally useful I think... > > PR is specially targeting explicit vectorization flow, we plan to address scalar intrinsification and auto-vectorization later, once type system has exposure to unsigned types. We are uncertain about their locations in `java.lang` at the moment. For now it's better to place them under incubation and then revisit later when we are more certain. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1815180240 From psandoz at openjdk.org Thu Oct 24 15:10:15 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 24 Oct 2024 15:10:15 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v31] In-Reply-To: References: <_Fu-_LTHnSKDrkQR1er4Tl9jIlyh2wIWvTstUaxJVFU=.df8418f6-95c9-4c52-8704-76d1f4046023@github.com> Message-ID: On Thu, 24 Oct 2024 14:07:34 GMT, Jatin Bhateja wrote: >> @PaulSandoz do you think this is sufficient coverage? > > Please note this test was added just to cover scalar operation validation in VectorMath, automated tests exercise these APIs in fallback implementation anyways. I think the coverage is sufficient for now and we can expand later. The test is written so that it should be possible to more easily refactor and add further dynamically generated test cases. (Note it is deliberately not a test designed to specifically exercise C2 - if/when we add auto vectorization IR tests would be required). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1815191672 From psandoz at openjdk.org Thu Oct 24 15:19:20 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 24 Oct 2024 15:19:20 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v31] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:41:13 GMT, Emanuel Peter wrote: >> To exercise non memory operand pattern we need a vector operation padding layer after load vector, this will always ensure that selector pick all register operands flavor of instruction. Since its a generic limitation, do you think we should float it as a separate PR? >> >> I have create an RFE https://bugs.openjdk.org/browse/JDK-8342959 for reference. Given that we have moved IR tests out this PR on the grounds of review complexity, lets not add more code here. > > Ok, we can file a separate RFE. 
Though I really have voiced 2 concerns: > - Making sure we always test `_mem` and `_reg` variants in the backend. See your https://bugs.openjdk.org/browse/JDK-8342959 > - Making sure we have tests that would detect vectors that are too long. This would require some padding between the vectors, so that we have some untouched space - and if it does get touched we know that a vector was too long. Does that make sense? This is I guess also a general concern - and would have to be applied to all vector instructions. Good point on vector operations overrunning bounds. I worry about the computational increase of doing this generally for all operations (explicit or for auto vectorization i suppose). Perhaps we can focus on areas where we know this may be problematic? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1815211324 From kvn at openjdk.org Thu Oct 24 15:25:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 24 Oct 2024 15:25:13 GMT Subject: RFR: 8342862: Gtest added by 8339507 appears to be causing 8GB build machines to hang [v2] In-Reply-To: References: <0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> Message-ID: On Wed, 23 Oct 2024 23:49:19 GMT, hanklo6 wrote: >> We generate each test randomly. The `asmtest.out.h` is currently around 100 KB. We have added the `--full` argument to allow users to create a complete test set. > > hanklo6 has updated the pull request incrementally with one additional commit since the last revision: > > add instruction Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21670#pullrequestreview-2393033560 From duke at openjdk.org Thu Oct 24 16:02:07 2024 From: duke at openjdk.org (duke) Date: Thu, 24 Oct 2024 16:02:07 GMT Subject: RFR: 8342862: Gtest added by 8339507 appears to be causing 8GB build machines to hang [v2] In-Reply-To: References: <0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> Message-ID: On Wed, 23 Oct 2024 23:49:19 GMT, hanklo6 wrote: >> We generate each test randomly. The `asmtest.out.h` is currently around 100 KB. We have added the `--full` argument to allow users to create a complete test set. > > hanklo6 has updated the pull request incrementally with one additional commit since the last revision: > > add instruction @hanklo6 Your change (at version e79a5a65fe2517dd79efb4ac38db4a34b9db9a6c) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21670#issuecomment-2435679843 From duke at openjdk.org Thu Oct 24 16:05:13 2024 From: duke at openjdk.org (hanklo6) Date: Thu, 24 Oct 2024 16:05:13 GMT Subject: Integrated: 8342862: Gtest added by 8339507 appears to be causing 8GB build machines to hang In-Reply-To: <0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> References: <0t-KQxrOgrL7u5M9LHNbV8ZUyQjnsGzdsZXAqieN-Gs=.f073005a-ae6a-43aa-82ff-b0974b48a39f@github.com> Message-ID: On Wed, 23 Oct 2024 21:35:45 GMT, hanklo6 wrote: > We generate each test randomly. The `asmtest.out.h` is currently around 100 KB. We have added the `--full` argument to allow users to create a complete test set. This pull request has now been integrated. 
Changeset: 7d5eefa5 Author: hanklo6 Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/7d5eefa50673d6f7c5bd916f63271cf7898d6dee Stats: 85067 lines in 3 files changed: 927 ins; 83631 del; 509 mod 8342862: Gtest added by 8339507 appears to be causing 8GB build machines to hang Reviewed-by: kvn, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/21670 From kvn at openjdk.org Thu Oct 24 16:22:06 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 24 Oct 2024 16:22:06 GMT Subject: RFR: 8342823: Ubsan: ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' In-Reply-To: <765oLXxFYPUnQq8xyATal2oPWTFef26xrbI0-wR7jpU=.1a01b12f-3cde-4349-b21d-d9fec388f459@github.com> References: <765oLXxFYPUnQq8xyATal2oPWTFef26xrbI0-wR7jpU=.1a01b12f-3cde-4349-b21d-d9fec388f459@github.com> Message-ID: On Thu, 24 Oct 2024 14:02:26 GMT, Matthias Baesken wrote: > When running with ubsanized binaries on Linux x86_64, > hs jtreg test compiler/startup/StartupOutput.java > showed this issue > > jdk/src/hotspot/share/ci/ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' > #0 0x7fcea0810117 in ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1614 > #1 0x7fcea3123577 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) src/hotspot/share/utilities/vmError.cpp:1872 > #2 0x7fcea0c01499 in report_fatal(VMErrorType, char const*, int, char const*, ...) 
src/hotspot/share/utilities/debug.cpp:214 > #3 0x7fcea09e9d85 in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413 > #4 0x7fcea066da1d in Runtime1::generate_blob(BufferBlob*, C1StubId, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:233 > #5 0x7fcea066dfb0 in Runtime1::generate_blob_for(BufferBlob*, C1StubId) src/hotspot/share/c1/c1_Runtime1.cpp:262 > #6 0x7fcea066dfb0 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:272 > #7 0x7fcea03d2be1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53 > #8 0x7fcea03d2be1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74 > #9 0x7fcea0acc0c2 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771 > #10 0x7fcea0ad9a3f in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913 > #11 0x7fcea161264a in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:759 > #12 0x7fcea2ec739a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:234 > #13 0x7fcea251e1d2 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > #14 0x7fcea7c6c6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 1b515766201d47a183932ba0c8c8bd0d9ee8755b) > #15 0x7fcea730f58e in clone (/lib64/libc.so.6+0x11858e) (BuildId: 448a3ddd22596e1adb8fb3dec8921ed5b9d54dc2) > > So a nullptr check should be better added . Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21684#pullrequestreview-2393173029 From kvn at openjdk.org Thu Oct 24 16:36:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 24 Oct 2024 16:36:05 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 06:02:22 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superwordVTransformBuilder.cpp line 231: >> >>> 229: } else { >>> 230: // Replicate the scalar same_input to every vector element. >>> 231: BasicType element_type = p0->is_Convert() ? p0->in(1)->bottom_type()->basic_type() : _vloop_analyzer.types().velt_basic_type(p0); >> >> What vectors are generated (or not) with this change? The array in the test ins `int[]` but the element_type will be Long now. Will it bailout vectorization? > > @vnkozlov You can very easily see how it goes with my `Test4` above, I split the things onto different lines so we can see what is from where easily. > > The pack that `p0` belongs to is a `ConvL2I` pack. In my case, I have an `short[]`, just to make things even more interesting. Since the type is propagated from use -> def, the output of the `ConvL2I` is interpreted as a `short`, it is essentially a truncated `int`. `velt_basic_type(p0) == T_SHORT`. The vector node should be a `VectorCastL2X === _ 873 [[ ]] #vectors`, i.e. casting from long-vector to short-vector. > > But now we see that the input to the pack of `p0` is all the same, and so we want to introduce a `Replicate`. We should of course replicate for `long`. But `velt_basic_type(p0) == T_SHORT` - so you get a `Replicate === _ 717 [[ ]] #vectorx`, and then eventually a `VectorCastS2X === _ 890 [[ ]] #vectors`... but of course the AD file has no matching node for a VectorCast from short to short -> `bad AD file`. > > The issue is really that `velt_basic_type(p0)` gives us the output-type, but we actually would need the input-type. In almost all cases input-type == output-type. 
But of course that does not hold with Convert. > > With Roland's fix, we now ask for the output-type of the `ConvL2I`'s input. That is the same as asking for the `ConvL2I`'s input-type. That way, we know what type to Replicate for - the `element_type`. > > @rwestrel given that @vnkozlov also did not right away understand what is going on, I think you need to properly explain what happens in the comments ;) Thank you for the explanation. Yes, a comment would be nice. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21660#discussion_r1815359076 From mdoerr at openjdk.org Thu Oct 24 16:54:38 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 24 Oct 2024 16:54:38 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v3] In-Reply-To: References: Message-ID: > There are some situations in which the XMM registers are relevant to understanding errors. E.g. the C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. > I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that Linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
> > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 0x0000000000000000 > XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Preserve offset in copied uc_mcontext in store_context. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21615/files - new: https://git.openjdk.org/jdk/pull/21615/files/a68d167b..6f9ed359 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21615&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21615&range=01-02 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21615.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21615/head:pull/21615 PR: https://git.openjdk.org/jdk/pull/21615 From mdoerr at openjdk.org Thu Oct 24 16:54:38 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 24 Oct 2024 16:54:38 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v2] In-Reply-To: <40HPxQVaUhLP7lwcLHOBcqmRIV95ijx2CrafpaPcJec=.97a6c09f-b4f7-4357-a790-d4a56f573712@github.com> References: <40HPxQVaUhLP7lwcLHOBcqmRIV95ijx2CrafpaPcJec=.97a6c09f-b4f7-4357-a790-d4a56f573712@github.com> Message-ID: On Thu, 24 Oct 2024 13:50:57 GMT, Richard Reingruber wrote: > Looks like what I ment in [my comment above](https://github.com/openjdk/jdk/pull/21615#issuecomment-2434816007). I was wondering why you ignored it. Sorry. I had forgotten it after looking into another issue. Committed slightly improved version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2435777576 From jvernee at openjdk.org Thu Oct 24 17:15:06 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 24 Oct 2024 17:15:06 GMT Subject: RFR: 8342958: Use jvmArgs consistently in microbenchmarks In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:52:57 GMT, Claes Redestad wrote: > Many OpenJDK micros use `@Fork(jvmArgs/-Append/-Prepend)` to add JVM reasonable or necessary flags, but when deploying and running micros we often want to add or replace flags to tune to the machine, test different GCs, etc. 
The inconsistent use of the different `jvmArgs` options make it error prone, and we've had a few recent cases where we've not been testing with the expected set of flags. > > This PR suggests using `jvmArgs` consistently. I think this aligns with the intuition that when you use `jvmArgsAppend/-Prepend` intent is to add to a set of existing flags, while if you supply `jvmArgs` intent is "run with these and nothing else". Perhaps there are other opinions/preferences, and I don't feel strongly about which to consolidate to as long as we do so consistently. One argument could be made to consolidate on `jvmArgsAppend` since that one is (likely accidentally) the current most popular (143 compared to 59 `jvmArgsPrepend` and 18 `jvmArgs`). I've always used `-jvmArgsAppend ` to add more flags on the command line when running the benchmarks jar directly. Does that still work if we change the annotations to use `jvmArgs`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21683#issuecomment-2435881457 From redestad at openjdk.org Thu Oct 24 17:23:06 2024 From: redestad at openjdk.org (Claes Redestad) Date: Thu, 24 Oct 2024 17:23:06 GMT Subject: RFR: 8342958: Use jvmArgs consistently in microbenchmarks In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:52:57 GMT, Claes Redestad wrote: > Many OpenJDK micros use `@Fork(jvmArgs/-Append/-Prepend)` to add JVM reasonable or necessary flags, but when deploying and running micros we often want to add or replace flags to tune to the machine, test different GCs, etc. The inconsistent use of the different `jvmArgs` options make it error prone, and we've had a few recent cases where we've not been testing with the expected set of flags. > > This PR suggests using `jvmArgs` consistently. I think this aligns with the intuition that when you use `jvmArgsAppend/-Prepend` intent is to add to a set of existing flags, while if you supply `jvmArgs` intent is "run with these and nothing else". 
Perhaps there are other opinions/preferences, and I don't feel strongly about which to consolidate to as long as we do so consistently. One argument could be made to consolidate on `jvmArgsAppend` since that one is (likely accidentally) the current most popular (143 compared to 59 `jvmArgsPrepend` and 18 `jvmArgs`). Yes, that's typically what we do in testing, the subtle problem solved by this is that when you do so you'll silently overwrite pre-existing jvmArgsAppend annotation values. I think the suggested scheme is more intuitive in that regard. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21683#issuecomment-2435895954 From jvernee at openjdk.org Thu Oct 24 17:41:06 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 24 Oct 2024 17:41:06 GMT Subject: RFR: 8342958: Use jvmArgs consistently in microbenchmarks In-Reply-To: References: Message-ID: <5X1PC1riWwOSgqBXF8tJRwqiXgZKZTOltDj_DEUISyU=.ffb929d0-9820-43a6-8e2e-20c3a5fb399c@github.com> On Thu, 24 Oct 2024 13:52:57 GMT, Claes Redestad wrote: > Many OpenJDK micros use `@Fork(jvmArgs/-Append/-Prepend)` to add JVM reasonable or necessary flags, but when deploying and running micros we often want to add or replace flags to tune to the machine, test different GCs, etc. The inconsistent use of the different `jvmArgs` options make it error prone, and we've had a few recent cases where we've not been testing with the expected set of flags. > > This PR suggests using `jvmArgs` consistently. I think this aligns with the intuition that when you use `jvmArgsAppend/-Prepend` intent is to add to a set of existing flags, while if you supply `jvmArgs` intent is "run with these and nothing else". Perhaps there are other opinions/preferences, and I don't feel strongly about which to consolidate to as long as we do so consistently. One argument could be made to consolidate on `jvmArgsAppend` since that one is (likely accidentally) the current most popular (143 compared to 59 `jvmArgsPrepend` and 18 `jvmArgs`). 
Overall I like the idea. I took a look at the `foreign` benchmarks in particular, and don't see any issues with them. ------------- Marked as reviewed by jvernee (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21683#pullrequestreview-2393339018 From jbhateja at openjdk.org Thu Oct 24 17:41:16 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 24 Oct 2024 17:41:16 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v32] In-Reply-To: References: Message-ID: <9FAHfjE1Kq9dlE7RRWdqnXZYGQ0SXVevZi4EWuYinv0=.28778d36-1a8e-408c-a447-adc963cdd1b8@github.com> On Thu, 24 Oct 2024 14:04:32 GMT, Emanuel Peter wrote: >> Desirable future extension, but its not related to this specific PR. > > Well, here it would be especially interesting, because it would tell us if we have a `sub` or an `add`. Lets address it in a follow up PR ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1815473182 From shade at openjdk.org Thu Oct 24 18:12:16 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 24 Oct 2024 18:12:16 GMT Subject: RFR: 8342975: C2: Micro-optimize PhaseIdealLoop::Dominators() Message-ID: Noticed this while looking at Leyden profiles. C2 seems to spend considerable time doing in this loop. The disassembly shows this loop is fairly hot. Replacing the initialization with memset, while touching more memory, is apparently faster. memset is also what we normally do around C2 for arena-allocated data. We seem to touch a lot of these structs later on, so pulling them to cache with memset is likely "free". It also looks like current initialization misses initializing the last element (at `C->unique()+1`). I'll put performance data in separate comment. 
-------------

Commit messages:
 - Fix

Changes: https://git.openjdk.org/jdk/pull/21690/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21690&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8342975
  Stats: 5 lines in 1 file changed: 1 ins; 1 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/21690.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21690/head:pull/21690

PR: https://git.openjdk.org/jdk/pull/21690

From shade at openjdk.org Thu Oct 24 18:12:16 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 24 Oct 2024 18:12:16 GMT
Subject: RFR: 8342975: C2: Micro-optimize PhaseIdealLoop::Dominators()
In-Reply-To:
References:
Message-ID:

On Thu, 24 Oct 2024 17:10:42 GMT, Aleksey Shipilev wrote:

> Noticed this while looking at Leyden profiles. C2 seems to spend considerable time doing in this loop. The disassembly shows this loop is fairly hot. Replacing the initialization with memset, while touching more memory, is apparently faster. memset is also what we normally do around C2 for arena-allocated data. We seem to touch a lot of these structs later on, so pulling them to cache with memset is likely "free".
>
> It also looks like current initialization misses initializing the last element (at `C->unique()+1`).
>
> I'll put performance data in separate comment.

On various tests on x86_64, this gives me +1% faster runs in `-Xcomp` scenarios. Not very visible with "normal" amount of C2 compilations.

## HelloWorld, -Xcomp -XX:-TieredCompilation

# Before
  Time (mean ± σ):     617.7 ms ±   2.5 ms    [User: 584.6 ms, System: 31.5 ms]
  Range (min … max):   614.2 ms … 624.5 ms    20 runs

# After
  Time (mean ± σ):     611.3 ms ±   1.9 ms    [User: 578.0 ms, System: 31.8 ms]
  Range (min … max):   608.0 ms … 614.4 ms    20 runs

## JavacBenchApp 50, -XX:-TieredCompilation

# Before
  Time (mean ± σ):      1.733 s ±  0.011 s    [User: 3.074 s, System: 0.139 s]
  Range (min … max):    1.719 s …  1.753 s    20 runs

# After
  Time (mean ± σ):      1.727 s ±  0.011 s    [User: 3.023 s, System: 0.144 s]
  Range (min … max):    1.704 s …  1.751 s    20 runs

## JavacBenchApp 50, -Xcomp -XX:-TieredCompilation

# Before
  Time (mean ± σ):     15.223 s ±  0.061 s    [User: 15.048 s, System: 0.239 s]
  Range (min … max):   15.152 s … 15.330 s    10 runs

# After
  Time (mean ± σ):     15.080 s ±  0.021 s    [User: 14.896 s, System: 0.239 s]
  Range (min … max):   15.048 s … 15.115 s    10 runs

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21690#issuecomment-2435882250

From kvn at openjdk.org Thu Oct 24 19:12:05 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 24 Oct 2024 19:12:05 GMT
Subject: RFR: 8342975: C2: Micro-optimize PhaseIdealLoop::Dominators()
In-Reply-To:
References:
Message-ID:

On Thu, 24 Oct 2024 17:10:42 GMT, Aleksey Shipilev wrote:

> Noticed this while looking at Leyden profiles. C2 seems to spend considerable time doing in this loop. The disassembly shows this loop is fairly hot. Replacing the initialization with memset, while touching more memory, is apparently faster. memset is also what we normally do around C2 for arena-allocated data. We seem to touch a lot of these structs later on, so pulling them to cache with memset is likely "free".
>
> It also looks like current initialization misses initializing the last element (at `C->unique()+1`).
>
> I'll put performance data in separate comment.

Nice find.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21690#pullrequestreview-2393510365

From luhenry at openjdk.org Thu Oct 24 23:08:07 2024
From: luhenry at openjdk.org (Ludovic Henry)
Date: Thu, 24 Oct 2024 23:08:07 GMT
Subject: RFR: 8342884: RISC-V: verify float <--> float16 conversion [v2]
In-Reply-To: <1m79HSCmGtubgtikAcVZxe0R301KXln-tX9asygp1rg=.55455efd-e061-45d4-9734-990725250d4c@github.com>
References: <3STkw_TrCn1ib_wPr4NlZxahYim-9hVafE_4g7gZ8WQ=.8ad2e319-440d-43f1-b091-1b603be28e09@github.com> <1m79HSCmGtubgtikAcVZxe0R301KXln-tX9asygp1rg=.55455efd-e061-45d4-9734-990725250d4c@github.com>
Message-ID: <1Tq-vOEhOgg-LNviX8abQ_6zkyQx8mvYRH9KX6XMCks=.76124782-789d-492a-96d4-42bb48081c06@github.com>

On Thu, 24 Oct 2024 09:12:52 GMT, Hamlin Li wrote:

>> Hi,
>> Can you help to review this simple patch? it removes the `Experimental` of the `UseZvfh`.
>> Thanks
>>
>> Currently, only float <--> float16 conversions use Zvfh extension, I've run the jmh tests on bananapi, the performance result shows it's good.
>>
>> Benchmark -XX:+UseZfh -XX:+UnlockExperimentalVMOptions -XX:+/-UseZvfh | (size) | Mode | Cnt | Score -intrinsic | Score +intrinsic | Error | Units | Improvement
>> -- | -- | -- | -- | -- | -- | -- | -- | --
>> Fp16ConversionBenchmark.float16ToFloat | 2048 | avgt | 10 | 8129.72 | 4729.125 | 71.937 | ns/op | 1.719
>> Fp16ConversionBenchmark.float16ToFloatMemory | 2048 | avgt | 10 | 16.9 | 16.894 | 0.002 | ns/op | 1
>> Fp16ConversionBenchmark.floatToFloat16 | 2048 | avgt | 10 | 12561.962 | 3767.944 | 12.652 | ns/op | 3.334
>> Fp16ConversionBenchmark.floatToFloat16Memory | 2048 | avgt | 10 | 18.146 | 18.147 | 0.003 | ns/op | 1
>>
>>
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   hw probe

Marked as reviewed by luhenry (Committer).

------------- PR Review: https://git.openjdk.org/jdk/pull/21664#pullrequestreview-2393853909 From dlong at openjdk.org Thu Oct 24 23:10:08 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 24 Oct 2024 23:10:08 GMT Subject: RFR: 8342823: Ubsan: ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' In-Reply-To: <765oLXxFYPUnQq8xyATal2oPWTFef26xrbI0-wR7jpU=.1a01b12f-3cde-4349-b21d-d9fec388f459@github.com> References: <765oLXxFYPUnQq8xyATal2oPWTFef26xrbI0-wR7jpU=.1a01b12f-3cde-4349-b21d-d9fec388f459@github.com> Message-ID: On Thu, 24 Oct 2024 14:02:26 GMT, Matthias Baesken wrote: > When running with ubsanized binaries on Linux x86_64, > hs jtreg test compiler/startup/StartupOutput.java > showed this issue > > jdk/src/hotspot/share/ci/ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' > #0 0x7fcea0810117 in ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1614 > #1 0x7fcea3123577 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) src/hotspot/share/utilities/vmError.cpp:1872 > #2 0x7fcea0c01499 in report_fatal(VMErrorType, char const*, int, char const*, ...) 
src/hotspot/share/utilities/debug.cpp:214 > #3 0x7fcea09e9d85 in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413 > #4 0x7fcea066da1d in Runtime1::generate_blob(BufferBlob*, C1StubId, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:233 > #5 0x7fcea066dfb0 in Runtime1::generate_blob_for(BufferBlob*, C1StubId) src/hotspot/share/c1/c1_Runtime1.cpp:262 > #6 0x7fcea066dfb0 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:272 > #7 0x7fcea03d2be1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53 > #8 0x7fcea03d2be1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74 > #9 0x7fcea0acc0c2 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771 > #10 0x7fcea0ad9a3f in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913 > #11 0x7fcea161264a in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:759 > #12 0x7fcea2ec739a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:234 > #13 0x7fcea251e1d2 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > #14 0x7fcea7c6c6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 1b515766201d47a183932ba0c8c8bd0d9ee8755b) > #15 0x7fcea730f58e in clone (/lib64/libc.so.6+0x11858e) (BuildId: 448a3ddd22596e1adb8fb3dec8921ed5b9d54dc2) > > So a nullptr check should be better added . There's not much point in generating an incomplete replay file. How about moving the checks for this->task() to the top of the function? 
And we could avoid the call altogether by changing VMError::report_and_die() from:

    1864   ciEnv* env = ciEnv::current();
    1865   if (env != nullptr) {

to

    1864   ciEnv* env = ciEnv::current();
    1865   if (env != nullptr && env->task() != nullptr) {

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21684#issuecomment-2436491825

From dlong at openjdk.org Thu Oct 24 23:24:04 2024
From: dlong at openjdk.org (Dean Long)
Date: Thu, 24 Oct 2024 23:24:04 GMT
Subject: RFR: 8342975: C2: Micro-optimize PhaseIdealLoop::Dominators()
In-Reply-To:
References:
Message-ID:

On Thu, 24 Oct 2024 17:10:42 GMT, Aleksey Shipilev wrote:

> Noticed this while looking at Leyden profiles. C2 seems to spend considerable time doing in this loop. The disassembly shows this loop is fairly hot. Replacing the initialization with memset, while touching more memory, is apparently faster. memset is also what we normally do around C2 for arena-allocated data. We seem to touch a lot of these structs later on, so pulling them to cache with memset is likely "free".
>
> It also looks like current initialization misses initializing the last element (at `C->unique()+1`).
>
> I'll put performance data in separate comment.

src/hotspot/share/opto/domgraph.cpp line 413:

> 411:   // Note: Tarjan uses 1-based arrays
> 412:   NTarjan *ntarjan = NEW_RESOURCE_ARRAY(NTarjan,C->unique()+1);
> 413:   // Initialize all fields for safety.

This is also a performance optimization so the comment should probably say so, so we remember not to change it back to a loop later on. It might be nice to have NEW_RESOURCE_ARRAY do the initialization for us, like calloc.

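
The memset-based initialization and the calloc-like helper discussed in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions: `Slot`, `init_with_memset` and `new_zeroed_slots` are hypothetical names, `Slot` is a simplified stand-in and not HotSpot's `NTarjan`, and plain `calloc` is used in place of HotSpot's resource-area allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a per-node record; NOT HotSpot's NTarjan.
struct Slot {
  void*    control;
  unsigned semi;
  unsigned size;
  Slot*    parent;
};

// memset is only appropriate for trivially copyable types.
static_assert(std::is_trivially_copyable<Slot>::value,
              "Slot must be trivially copyable to be zeroed with memset");

// Zero every byte of a 1-based array holding unique + 1 slots, so the
// last slot (index == unique) is covered as well.
// Note: all-bits-zero is a null pointer on mainstream platforms.
void init_with_memset(Slot* slots, std::size_t unique) {
  assert(slots != nullptr);
  std::memset(slots, 0, (unique + 1) * sizeof(Slot));
}

// calloc-like helper (assumption: ordinary heap allocation instead of an
// arena). Allocates and zeroes in one step; release with std::free.
Slot* new_zeroed_slots(std::size_t unique) {
  return static_cast<Slot*>(std::calloc(unique + 1, sizeof(Slot)));
}
```

Zeroing the whole array in one call also removes any need to keep a per-field loop in sync with the struct layout, which may be one reason to note the performance intent in the source comment.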
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21690#discussion_r1815802767 From dlong at openjdk.org Fri Oct 25 00:49:16 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 25 Oct 2024 00:49:16 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v6] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Tue, 22 Oct 2024 07:19:54 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). 
>> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > changes to NoOverflowInt for Dean BTW, another place where a user-defined class like NoOverflowInt might be useful is compiler invocation counters. We store them as integers in various places, increment and scale them, check for saturation, and convert them to floating point for comparison. The last time I tried to fix a bug in that area, I was wishing for a better way to propagate integer saturation through multiple operations (like how NoOverflowInt uses NaN), and was tempted to change everything to floating point so I could use infinity. 
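
The idea of letting an overflow "stick" through a chain of operations can be sketched as follows. This is a hedged illustration only: `CheckedInt` is a hypothetical class written for this example and is not the `NoOverflowInt` from the PR. It assumes 32-bit operands widened to 64 bits for the overflow check.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sketch, NOT the HotSpot NoOverflowInt class.
// Wraps an int32_t and records whether any earlier operation overflowed,
// so a whole chain of operations needs only one check at the end.
class CheckedInt {
 public:
  CheckedInt(int32_t v) : _value(v), _is_nan(false) {}

  static CheckedInt make_nan() {
    CheckedInt r(0);
    r._is_nan = true;
    return r;
  }

  bool is_nan() const { return _is_nan; }

  int32_t value() const {
    assert(!_is_nan);  // reading a poisoned value is a caller bug
    return _value;
  }

  friend CheckedInt operator+(const CheckedInt& a, const CheckedInt& b) {
    if (a._is_nan || b._is_nan) return CheckedInt::make_nan();
    return CheckedInt::from_wide((int64_t)a._value + (int64_t)b._value);
  }

  friend CheckedInt operator*(const CheckedInt& a, const CheckedInt& b) {
    if (a._is_nan || b._is_nan) return CheckedInt::make_nan();
    // The product of two int32_t values always fits in int64_t.
    return CheckedInt::from_wide((int64_t)a._value * (int64_t)b._value);
  }

 private:
  // Narrows a 64-bit result back to 32 bits, or poisons it if out of range.
  static CheckedInt from_wide(int64_t wide) {
    if (wide < std::numeric_limits<int32_t>::min() ||
        wide > std::numeric_limits<int32_t>::max()) {
      return make_nan();
    }
    return CheckedInt((int32_t)wide);
  }

  int32_t _value;
  bool _is_nan;
};
```

For a saturating counter one would presumably clamp to the maximum instead of poisoning the value; the propagation pattern stays the same.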
------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2436584905 From jbhateja at openjdk.org Fri Oct 25 02:06:17 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 25 Oct 2024 02:06:17 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v31] In-Reply-To: References: Message-ID: <8QrQacNaEZcyAYAcGLLzIm4xN-0G1-5BRxmSi__vMBU=.2ffa14a0-9fde-4587-9bc3-6b5bfaef33a0@github.com> On Thu, 24 Oct 2024 13:41:13 GMT, Emanuel Peter wrote: >> To exercise non memory operand pattern we need a vector operation padding layer after load vector, this will always ensure that selector pick all register operands flavor of instruction. Since its a generic limitation, do you think we should float it as a separate PR? >> >> I have created a new RFE https://bugs.openjdk.org/browse/JDK-8342959 for reference. Given that we have moved IR tests out this PR on the grounds of review complexity, lets not add more code here. > > Ok, we can file a separate RFE. Though I really have voiced 2 concerns: > - Making sure we always test `_mem` and `_reg` variants in the backend. See your https://bugs.openjdk.org/browse/JDK-8342959 > - Making sure we have tests that would detect vectors that are too long. This would require some padding between the vectors, so that we have some untouched space - and if it does get touched we know that a vector was too long. Does that make sense? This is I guess also a general concern - and would have to be applied to all vector instructions. Hi @eme64 , @PaulSandoz , in general bounds overrun problem is only pertinent to a very small portion of vector ISA which supports memory destination flavor e.g. [VCVTPS2PH](https://www.felixcloutier.com/x86/vcvtps2ph) , in all other cases we only load exact memory size into vector i.e. 4 , 8 and 16 bytes of memory will be loaded into 128 bit vector. 
Similarly, store vector writes exact memory size and intermediate vector operations with all register operands can operate at equally sized or lowest upper vector size for a given vector species. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1815901472 From jbhateja at openjdk.org Fri Oct 25 02:11:13 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 25 Oct 2024 02:11:13 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v30] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 09:12:25 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Replacing flag based checks with CPU feature checks in IR validation test. > > Wow this is really a very moving target - quite frustrating to review - it takes up way too much of the reviewers bandwidth. You really need to split up your PRs as much as possible so that review is easier and faster. > > I think these optimizations should be done in a separate PR. I see no reason why they need to be mixed in. > > https://github.com/openjdk/jdk/commit/c56508899b000b8b1eb6755c901798a2a3685ef5 The `UMinVNode::Ideal` etc changes with IR rules. > > I also cannot easily review just such a diff, it does not let me make comments - so I still have to go find the relevant code in the whole PR. 
> > Some comments on this section: > > > Node* UMinVNode::Ideal(PhaseGVN* phase, bool can_reshape) { > bool match1 = in(1)->Opcode() == Op_UMinV || in(1)->Opcode() == Op_UMaxV; > bool match2 = in(2)->Opcode() == Op_UMinV || in(2)->Opcode() == Op_UMaxV; > // UMin (UMin(a, b), UMax(a, b)) => UMin(a, b) > // UMin (UMin(a, b), UMax(b, a)) => UMin(a, b) > if (match1 && match2) { > if ((in(1)->in(1) == in(2)->in(1) && in(1)->in(2) == in(2)->in(2)) || > (in(1)->in(2) == in(2)->in(1) && in(1)->in(1) == in(2)->in(2))) { > return new UMinVNode(in(1)->in(1), in(1)->in(2), vect_type()); > } > } > return nullptr; > } > > > Are we sure we do not need to verify any types in all of these cases? Maybe not - but I'd rather be super sure - not that things get misinterpreted and then folded the wrong way. > > I mean if I now approve only that diff, then I still need to approve the whole PR, which means I need to spend a lot of time on everything again. Otherwise, in theory people could smuggle anything in. Hi @eme64 , Let me know if there are other comments, else looking forward to you approval :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2436663568 From shade at openjdk.org Fri Oct 25 05:52:39 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 25 Oct 2024 05:52:39 GMT Subject: RFR: 8342975: C2: Micro-optimize PhaseIdealLoop::Dominators() [v2] In-Reply-To: References: Message-ID: > Noticed this while looking at Leyden profiles. C2 seems to spend considerable time doing in this loop. The disassembly shows this loop is fairly hot. Replacing the initialization with memset, while touching more memory, is apparently faster. memset is also what we normally do around C2 for arena-allocated data. We seem to touch a lot of these structs later on, so pulling them to cache with memset is likely "free". > > It also looks like current initialization misses initializing the last element (at `C->unique()+1`). 
> > I'll put performance data in separate comment. Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Better comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21690/files - new: https://git.openjdk.org/jdk/pull/21690/files/9a5953d4..981c9649 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21690&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21690&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21690.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21690/head:pull/21690 PR: https://git.openjdk.org/jdk/pull/21690 From shade at openjdk.org Fri Oct 25 05:52:39 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 25 Oct 2024 05:52:39 GMT Subject: RFR: 8342975: C2: Micro-optimize PhaseIdealLoop::Dominators() [v2] In-Reply-To: References: Message-ID: <-AAgmKeLv9vPffeG_ah69pb1oQhke0Sse-sgsBAOqRA=.b22c9c8b-f7b9-4f2a-9a35-e9e21b3b86d6@github.com> On Thu, 24 Oct 2024 23:21:17 GMT, Dean Long wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Better comment > > src/hotspot/share/opto/domgraph.cpp line 413: > >> 411: // Note: Tarjan uses 1-based arrays >> 412: NTarjan *ntarjan = NEW_RESOURCE_ARRAY(NTarjan,C->unique()+1); >> 413: // Initialize all fields for safety. > > This is also a performance optimization so the comment should probably say so, so we remember not to change it back to a loop later on. It might be nice to have NEW_RESOURCE_ARRAY do the initialization for us, like calloc. I mean, this is a common style in C2 to initialize things with `memset`, so I treated it more as "do the same thing as everywhere else", and having perf bump as a nice bonus. I blurped about performance in new comment, see new commit. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21690#discussion_r1816033246 From qamai at openjdk.org Fri Oct 25 07:01:05 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 25 Oct 2024 07:01:05 GMT Subject: RFR: 8342692: C2: MemorySegment API slow with short running loops [v3] In-Reply-To: <2FVz5G_WC9zJg2TGQ-MPmcWcSPgX5eVknTo58tm3TFI=.ca976a58-a49f-4030-9882-d52febeb82b1@github.com> References: <2FVz5G_WC9zJg2TGQ-MPmcWcSPgX5eVknTo58tm3TFI=.ca976a58-a49f-4030-9882-d52febeb82b1@github.com> Message-ID: <_NKTALxqtQeX8TeTDUbLiQE09N96oZc_U4o3ZvcdX00=.57e02d4e-4ab9-418b-9b2e-879591a34e85@github.com> On Wed, 23 Oct 2024 14:32:51 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. 
>> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depend on old predicates >> added before the transformation. To solve this cicular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > build fix src/hotspot/share/opto/c2_globals.hpp line 810: > 808: "of constructors") \ > 809: \ > 810: product(uintx, ShortLoopIter, 1000, \ May I ask why this is such a small value, why is it not a value that is closer to the limit of an `int` loop, such as 10^6? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r1816109012 From epeter at openjdk.org Fri Oct 25 07:11:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 25 Oct 2024 07:11:20 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v32] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:36:50 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: > > - Review resolutions. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - Factor out IR tests and Transforms to follow-up PRs. > - Replacing flag based checks with CPU feature checks in IR validation test. > - Remove Saturating IRNode patterns. > - Restrict IR validation to newly added UMin/UMax transforms. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - Prod build fix > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - New IR tests + additional IR transformations > - ... and 27 more: https://git.openjdk.org/jdk/compare/158b93d1...0e10139c Ok, I'll approve it now, since @PaulSandoz is ok with the test coverage. I did not review all the x86 backend instructions, I don't have the time or understanding for that - I rely on testing for correctness here. Thanks for the work @jatin-bhateja. The VectorAPI is a lot of work, and we are grateful for your contributions! I am still worried about general test coverage. One can always do it later - but will that actually be done? I suppose this is a trade-off between spending time on quality vs quantity of features. Bugs in the VectorAPI will probably mostly show in miscompilations. Those are hard to catch without solid testing and rigorous result verification. The VectorAPI is also not widely used yet, so miscompilations will only slowly be discovered. Miscompilations in Auto-Vectorization are discovered more quickly because there is more code out there that is sensitive to it. Maybe I'm coming across as annoying with all my RFE-splitting suggestions.
But I do think that it is the job of the PR-author to make the code as reviewable and high-quality before it is sent for review. The author knows the code best, and should not burden the reviewers unnecessarily with combined features. A PR that is 2x as long is not just 2x as much work to review. It takes the reviewer more time - maybe 4x because it is harder to understand what belongs together, if everything is sufficiently tested. Also, the number of review-rounds increases. Every time the reviewer is then required to read all code again. This is really a waste of time and very frustrating. But I think you understand that now, and I am looking forward to nicely portioned PRs in the future ;) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2394395794 From amitkumar at openjdk.org Fri Oct 25 07:21:17 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 25 Oct 2024 07:21:17 GMT Subject: RFR: 8342962: [s390x] TestOSRLotsOfLocals.java crashes Message-ID: We are on thin ice with the `TestOSRLotsOfLocals` test. Before it breaks, I'd like to provide the fix. :-) Testing : Tier1 test with fastdebug vm. 
------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/21703/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21703&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342962 Stats: 21 lines in 1 file changed: 18 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21703.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21703/head:pull/21703 PR: https://git.openjdk.org/jdk/pull/21703 From mbaesken at openjdk.org Fri Oct 25 07:30:08 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 25 Oct 2024 07:30:08 GMT Subject: RFR: 8342823: Ubsan: ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' In-Reply-To: References: <765oLXxFYPUnQq8xyATal2oPWTFef26xrbI0-wR7jpU=.1a01b12f-3cde-4349-b21d-d9fec388f459@github.com> Message-ID: <_nvYOjXg-F_EgSJNfQP64D0olO0DCxURXr6nhDaREUI=.19d0b54a-6e89-4b41-bafb-9e0b81091d1b@github.com> On Thu, 24 Oct 2024 23:07:05 GMT, Dean Long wrote: > How about moving the checks for this->task() to the top of the function? Makes sense, better check just once . 
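
The "check just once" shape can be sketched like this. It is a minimal illustration under stated assumptions: `Task`, `Env` and `dump_replay` are hypothetical names invented for the example and are not the `ciEnv` or `CompileTask` API. The only point is that a single guard at the top of the function protects every later dereference.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-ins; NOT the HotSpot ciEnv / CompileTask classes.
struct Task {
  int compile_id;
  int comp_level;
};

struct Env {
  Task* task;

  // Writes one replay line if a task is attached and returns true.
  // Without a task nothing is written, which avoids emitting an
  // incomplete dump.
  bool dump_replay(std::ostream& out) const {
    if (task == nullptr) {
      return false;  // single guard; everything below may use task freely
    }
    out << "compile " << task->compile_id << ' ' << task->comp_level << '\n';
    return true;
  }
};
```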
------------- PR Comment: https://git.openjdk.org/jdk/pull/21684#issuecomment-2437087235 From mli at openjdk.org Fri Oct 25 07:52:10 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 25 Oct 2024 07:52:10 GMT Subject: RFR: 8342884: RISC-V: verify float <--> float16 conversion [v2] In-Reply-To: <1Tq-vOEhOgg-LNviX8abQ_6zkyQx8mvYRH9KX6XMCks=.76124782-789d-492a-96d4-42bb48081c06@github.com> References: <3STkw_TrCn1ib_wPr4NlZxahYim-9hVafE_4g7gZ8WQ=.8ad2e319-440d-43f1-b091-1b603be28e09@github.com> <1m79HSCmGtubgtikAcVZxe0R301KXln-tX9asygp1rg=.55455efd-e061-45d4-9734-990725250d4c@github.com> <1Tq-vOEhOgg-LNviX8abQ_6zkyQx8mvYRH9KX6XMCks=.76124782-789d-492a-96d4-42bb48081c06@github.com> Message-ID: On Thu, 24 Oct 2024 23:05:44 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> hw probe > > Marked as reviewed by luhenry (Committer). Thanks @luhenry @RealFYang for your reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21664#issuecomment-2437122476 From mli at openjdk.org Fri Oct 25 07:52:11 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 25 Oct 2024 07:52:11 GMT Subject: Integrated: 8342884: RISC-V: verify float <--> float16 conversion In-Reply-To: <3STkw_TrCn1ib_wPr4NlZxahYim-9hVafE_4g7gZ8WQ=.8ad2e319-440d-43f1-b091-1b603be28e09@github.com> References: <3STkw_TrCn1ib_wPr4NlZxahYim-9hVafE_4g7gZ8WQ=.8ad2e319-440d-43f1-b091-1b603be28e09@github.com> Message-ID: <5K_7OIIXShf_PDNVwOAiWed_qDkdG13mdSYuktCUXTc=.7011c6b0-5a2a-4472-ae8a-bd31de8bbb93@github.com> On Wed, 23 Oct 2024 13:22:32 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? it removes the `Experimental` of the `UseZvfh`. > Thanks > > Currently, only float <--> float16 conversions use Zvfh extension, I've run the jmh tests on bananapi, the performance result shows it's good. 
>
> Benchmark (-XX:+UseZfh -XX:+UnlockExperimentalVMOptions -XX:+/-UseZvfh) | (size) | Mode | Cnt | Score -intrinsic | Score +intrinsic | Error | Units | Improvement
> -- | -- | -- | -- | -- | -- | -- | -- | --
> Fp16ConversionBenchmark.float16ToFloat | 2048 | avgt | 10 | 8129.72 | 4729.125 | 71.937 | ns/op | 1.719
> Fp16ConversionBenchmark.float16ToFloatMemory | 2048 | avgt | 10 | 16.9 | 16.894 | 0.002 | ns/op | 1
> Fp16ConversionBenchmark.floatToFloat16 | 2048 | avgt | 10 | 12561.962 | 3767.944 | 12.652 | ns/op | 3.334
> Fp16ConversionBenchmark.floatToFloat16Memory | 2048 | avgt | 10 | 18.146 | 18.147 | 0.003 | ns/op | 1
> > This pull request has now been integrated. Changeset: 94317dbc Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/94317dbcf26a54428c649ad0286e127bd6dab570 Stats: 6 lines in 2 files changed: 4 ins; 1 del; 1 mod 8342884: RISC-V: verify float <--> float16 conversion Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/21664 From mbaesken at openjdk.org Fri Oct 25 08:16:42 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 25 Oct 2024 08:16:42 GMT Subject: RFR: 8342823: Ubsan: ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' [v2] In-Reply-To: <765oLXxFYPUnQq8xyATal2oPWTFef26xrbI0-wR7jpU=.1a01b12f-3cde-4349-b21d-d9fec388f459@github.com> References: <765oLXxFYPUnQq8xyATal2oPWTFef26xrbI0-wR7jpU=.1a01b12f-3cde-4349-b21d-d9fec388f459@github.com> Message-ID: > When running with ubsanized binaries on Linux x86_64, > hs jtreg test compiler/startup/StartupOutput.java > showed this issue > > jdk/src/hotspot/share/ci/ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' > #0 0x7fcea0810117 in ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1614 > #1 0x7fcea3123577 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)
src/hotspot/share/utilities/vmError.cpp:1872 > #2 0x7fcea0c01499 in report_fatal(VMErrorType, char const*, int, char const*, ...) src/hotspot/share/utilities/debug.cpp:214 > #3 0x7fcea09e9d85 in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413 > #4 0x7fcea066da1d in Runtime1::generate_blob(BufferBlob*, C1StubId, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:233 > #5 0x7fcea066dfb0 in Runtime1::generate_blob_for(BufferBlob*, C1StubId) src/hotspot/share/c1/c1_Runtime1.cpp:262 > #6 0x7fcea066dfb0 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:272 > #7 0x7fcea03d2be1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53 > #8 0x7fcea03d2be1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74 > #9 0x7fcea0acc0c2 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771 > #10 0x7fcea0ad9a3f in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913 > #11 0x7fcea161264a in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:759 > #12 0x7fcea2ec739a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:234 > #13 0x7fcea251e1d2 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > #14 0x7fcea7c6c6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 1b515766201d47a183932ba0c8c8bd0d9ee8755b) > #15 0x7fcea730f58e in clone (/lib64/libc.so.6+0x11858e) (BuildId: 448a3ddd22596e1adb8fb3dec8921ed5b9d54dc2) > > So a nullptr check should be better added . 
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: move check up in method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21684/files - new: https://git.openjdk.org/jdk/pull/21684/files/d28c366f..f019b47f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21684&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21684&range=00-01 Stats: 10 lines in 1 file changed: 4 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21684.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21684/head:pull/21684 PR: https://git.openjdk.org/jdk/pull/21684 From lucy at openjdk.org Fri Oct 25 08:21:05 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 25 Oct 2024 08:21:05 GMT Subject: RFR: 8342962: [s390x] TestOSRLotsOfLocals.java crashes In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 06:28:02 GMT, Amit Kumar wrote: > We are on thin ice with the `TestOSRLotsOfLocals` test. Before it breaks, I'd like to provide the fix. :-) > > Testing : Tier1 test with fastdebug vm. Just some academic improvements... src/hotspot/cpu/s390/c1_LIRAssembler_s390.cpp line 144: > 142: // z_lg can only handle displacement upto 20bit signed binary integer > 143: __ load_const(Z_R0_scratch, locals_space); > 144: __ z_algr(OSR_buf, Z_R0_scratch); How about using `__ z_algfi(OSR_buf, locals_space);` instead? src/hotspot/cpu/s390/c1_LIRAssembler_s390.cpp line 166: > 164: // Z_R0 is killed by asm_assert_mem8_isnot_zero > 165: __ load_const(Z_R0_scratch, locals_space); > 166: __ z_slgr(OSR_buf, Z_R0_scratch); How about using `__ z_slgfi(OSR_buf, locals_space);` instead? ------------- Changes requested by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21703#pullrequestreview-2394545616 PR Review Comment: https://git.openjdk.org/jdk/pull/21703#discussion_r1816214796 PR Review Comment: https://git.openjdk.org/jdk/pull/21703#discussion_r1816215668 From amitkumar at openjdk.org Fri Oct 25 08:31:43 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 25 Oct 2024 08:31:43 GMT Subject: RFR: 8342962: [s390x] TestOSRLotsOfLocals.java crashes [v2] In-Reply-To: References: Message-ID: > We are on thin ice with the `TestOSRLotsOfLocals` test. Before it breaks, I'd like to provide the fix. :-) > > Testing : Tier1 test with fastdebug vm. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: comments from Lutz ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21703/files - new: https://git.openjdk.org/jdk/pull/21703/files/f230ffa3..084fac86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21703&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21703&range=00-01 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21703.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21703/head:pull/21703 PR: https://git.openjdk.org/jdk/pull/21703 From amitkumar at openjdk.org Fri Oct 25 08:31:43 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 25 Oct 2024 08:31:43 GMT Subject: RFR: 8342962: [s390x] TestOSRLotsOfLocals.java crashes [v2] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 08:16:19 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> comments from Lutz > > src/hotspot/cpu/s390/c1_LIRAssembler_s390.cpp line 166: > >> 164: // Z_R0 is killed by asm_assert_mem8_isnot_zero >> 165: __ load_const(Z_R0_scratch, locals_space); >> 166: __ z_slgr(OSR_buf, Z_R0_scratch); > > How about using > `__ z_slgfi(OSR_buf, locals_space);` > instead? 
updated :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21703#discussion_r1816240663 From jbhateja at openjdk.org Fri Oct 25 08:59:13 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 25 Oct 2024 08:59:13 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v30] In-Reply-To: References: Message-ID: <5b8t9BfyScOG01ZBVh6pwUr6nssDucvq2idxtNd_7sc=.0f08654f-1e75-4c39-bcd0-0ad7e8adca44@github.com> On Tue, 22 Oct 2024 15:56:18 GMT, Paul Sandoz wrote: >> Hey @eme64 , >> >>> Wow this is really a very moving target - quite frustrating to review - it takes up way too much of the reviewers bandwidth. You really need to split up your PRs as much as possible so that review is easier and faster. >> >> I understand reviewer's pain, which is why I mentioned about last two changes specifically. Vector API related PRs generally looks bulky due to script generated sources and tests. Barring that it may not demand much of your time. >> >> But, to keep you motivated :-) and following @PaulSandoz and yours suggestions, I have moved out IR validations and Min / Max transforms to following follow up PRs. >> >> - https://bugs.openjdk.org/browse/JDK-8342676 (https://github.com/openjdk/jdk/pull/21604) >> - https://bugs.openjdk.org/browse/JDK-8342677 (https://github.com/openjdk/jdk/pull/21603) >> >> Can you kindly run this though your test infrastructure and approve if it goes fine ? >> >> Best Regards, >> Jatin > >> Can you kindly run this though your test infrastructure and approve if it goes fine ? >> > > Internal tier 1 to 3 testing passed (i needed to merge with master at 7133d1b983d, due to some updates to unrelated test configuration files the test infrastructure expects). > Ok, I'll approve it now, since @PaulSandoz is ok with the test coverage. I did not review all the x86 backend instructions, I don't have the time or understanding for that - I rely on testing for correctness here. 
> > Thanks for the work @jatin-bhateja. The VectorAPI is a lot of work, and we are grateful for your contributions! > > I am still worried about general test coverage. One can always do it later - but will that actually be done? I suppose this is a trade-off between spending time on quality vs quantity of features. Bugs in the VectorAPI will probably mostly show in miscompilations. Those are hard to catch without solid testing and rigorous result verification. The VectorAPI is also not widely used yet, so miscompilations will only slowly be discovered. Miscompilations in Auto-Vectorization are discovered more quickly because there is more code out there that is sensitive to it. > > Maybe I'm coming across as annoying with all my RFE-splitting suggestions. But I do think that it is the job of the PR-author to make the code as reviewable and high-quality before it is sent for review. The author knows the code best, and should not burden the reviewers unnecessarily with combined features. A PR that is 2x as long is not just 2x as much work to review. It takes the reviewer more time - maybe 4x because it is harder to understand what belongs together, if everything is sufficiently tested. Also, the number of review-rounds increases. Every time the reviewer is then required to read all code again. This is really a waste of time and very frustrating. But I think you understand that now, and I am looking forward to nicely portioned PRs in the future ;) We are in the same boat, @eme64. I understand the reviewer's pain, and there is no debate on coverage and validations. @sviswa7, I guess we need a cursory re-approval from you for integration.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2437252005 From aph at openjdk.org Fri Oct 25 10:18:14 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 25 Oct 2024 10:18:14 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v2] In-Reply-To: <4oTnnVeBbxCTfBDoQnldpIyHh8GlPcjXwVlmaPQPrrw=.5243b504-e336-4ff2-bb59-525766d78a34@github.com> References: <4oTnnVeBbxCTfBDoQnldpIyHh8GlPcjXwVlmaPQPrrw=.5243b504-e336-4ff2-bb59-525766d78a34@github.com> Message-ID: On Fri, 7 Jun 2024 14:38:55 GMT, Amit Kumar wrote: > But if I revert the changes I had done, then it passes. Same situation I'm facing on s390x. Is this expected? > > failure log: [type_profile_failure.log](https://github.com/user-attachments/files/15741205/type_profile_failure.log) Sorry for necro-posting, but I saw that there had never been a reply to this one. The IR tests that are failing count the number of CMP nodes in a type check. When we disable the use of secondary_super_cache in C2, we reduce the number of CMP nodes, because we are no longer checking the secondary_super_cache. This failure is OK for now, because it never triggers without diagnostic VM options, but when we remove the secondary_super_cache altogether this test will have to be revised. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2437413279 From rrich at openjdk.org Fri Oct 25 10:22:12 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 25 Oct 2024 10:22:12 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v3] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 16:54:38 GMT, Martin Doerr wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc.
We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) >> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Preserve offset in copied uc_mcontext in store_context. I've done a little bit of adhoc testing. Seems to work now. 
We could add a new test to test/hotspot/jtreg/runtime/ErrorHandling but on the other hand I don't think it is really necessary. Cheers, Richard. src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp line 551: > 549: st->cr(); > 550: st->cr(); > 551: for (int i = 0; i < 16; ++i) { Better check if `fpregs` points into `*uc` before dereferencing it (asserting that it actually does). src/hotspot/share/utilities/debug.cpp line 735: > 733: #elif defined(AMD64) > 734: // In the copied version, fpregs should point to the copied contents. Preserve the offset. > 735: intptr_t fpregs_offset = (address)(void*)(((const ucontext_t*)context)->uc_mcontext.fpregs) - (address)context; Reads better and is safer: Suggestion: size_t fpregs_offset = pointer_delta(((const ucontext_t*)context)->uc_mcontext.fpregs, context, 1); ------------- PR Review: https://git.openjdk.org/jdk/pull/21615#pullrequestreview-2394826913 PR Review Comment: https://git.openjdk.org/jdk/pull/21615#discussion_r1816418234 PR Review Comment: https://git.openjdk.org/jdk/pull/21615#discussion_r1816399838 From mdoerr at openjdk.org Fri Oct 25 11:01:42 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 25 Oct 2024 11:01:42 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v4] In-Reply-To: References: Message-ID: > There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. > I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
> > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 0x0000000000000000 > XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Check uc->uc_mcontext.fpregs sanity. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21615/files - new: https://git.openjdk.org/jdk/pull/21615/files/6f9ed359..76c45d6a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21615&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21615&range=02-03 Stats: 12 lines in 2 files changed: 7 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21615.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21615/head:pull/21615 PR: https://git.openjdk.org/jdk/pull/21615 From mdoerr at openjdk.org Fri Oct 25 11:01:43 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 25 Oct 2024 11:01:43 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v3] In-Reply-To: References: Message-ID: <_SNqyqLMR9Dn5NOvJIKxUw2Ulw62NAcwfEfAJ-kAd60=.834111c4-6738-4bce-8ea1-2a377fa424d7@github.com> On Fri, 25 Oct 2024 10:16:45 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Preserve offset in copied uc_mcontext in store_context. > > src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp line 551: > >> 549: st->cr(); >> 550: st->cr(); >> 551: for (int i = 0; i < 16; ++i) { > > Better check if `fpregs` points into `*uc` before dereferencing it (asserting that it actually does). An assertion is not ideal. We don't want to run into it while printing an hs_err file. I've added a sanity check and print another message instead if it's bad. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21615#discussion_r1816479415 From mdoerr at openjdk.org Fri Oct 25 11:10:46 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 25 Oct 2024 11:10:46 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v5] In-Reply-To: References: Message-ID: > There are some situations in which the XMM registers are relevant to understand errors. E.g. 
C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. > I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) > > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 0x0000000000000000 > XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Check should use >=. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21615/files - new: https://git.openjdk.org/jdk/pull/21615/files/76c45d6a..01d367f6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21615&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21615&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21615.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21615/head:pull/21615 PR: https://git.openjdk.org/jdk/pull/21615 From mdoerr at openjdk.org Fri Oct 25 11:10:46 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 25 Oct 2024 11:10:46 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v4] In-Reply-To: References: Message-ID: <4GrGcg5y3Ip-O1VBgFzRagw4ckr-tQAvKAzW-ydaRqI=.694dde39-ec71-44a1-808b-2b1bd0905074@github.com> On Fri, 25 Oct 2024 11:01:42 GMT, Martin Doerr wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
>> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Check uc->uc_mcontext.fpregs sanity. I only don't like that pointer_delta contains an assertion. 
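[Editorial sketch, not part of the original thread.] The exchange above is about two things: checking that `fpregs` really points into the context before it is dereferenced (reporting a bad pointer rather than asserting while an hs_err file is being written), and preserving the `fpregs` offset when the context is copied. A minimal C++ illustration of that idea follows. `FakeContext`, `points_into` and `copy_context` are hypothetical names invented for this sketch; they are not the HotSpot code or the real `ucontext_t` layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a saved CPU context: 'fpregs' is expected to
// point at 'fp_storage' inside the same object (an assumption of this sketch).
struct FakeContext {
  std::uint64_t gregs[4];
  void*         fpregs;
  std::uint64_t fp_storage[8];
};

// True when p lies in [base, base + size). No assertion is used, so the
// check is safe to run from an error-reporting path.
inline bool points_into(const void* p, const void* base, std::size_t size) {
  const auto a = reinterpret_cast<std::uintptr_t>(p);
  const auto b = reinterpret_cast<std::uintptr_t>(base);
  return a >= b && a - b < size;
}

// Copies src into dst and re-points dst.fpregs at the copied contents by
// preserving the byte offset. Returns false (and clears dst.fpregs) when
// src.fpregs does not point into src.
inline bool copy_context(const FakeContext& src, FakeContext& dst) {
  dst = src;
  if (!points_into(src.fpregs, &src, sizeof(src))) {
    dst.fpregs = nullptr;
    return false;
  }
  const std::size_t offset = reinterpret_cast<std::uintptr_t>(src.fpregs) -
                             reinterpret_cast<std::uintptr_t>(&src);
  dst.fpregs = reinterpret_cast<char*>(&dst) + offset;
  return true;
}
```

A caller would print the floating-point registers only when `copy_context` returns true, and print a short diagnostic message otherwise.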
------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2437506397 From mli at openjdk.org Fri Oct 25 11:33:34 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 25 Oct 2024 11:33:34 GMT Subject: RFR: 8343060: RISC-V: enable TestFloat16VectorConvChain for riscv Message-ID: <_iybfojWLLjELNamXwZGHkZ6AjmyouEqa4HyHjL4tdA=.658f1422-8fdc-4d70-aed3-e115945f1181@github.com> Hi, Can you help to review this simple patch? Both JDK-8336827 and JDK-8335860 modified the test in the wrong way for riscv, so it needs to be fixed for riscv. Thanks! ------------- Commit messages: - initial commit - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Revert "initial commit" - initial commit Changes: https://git.openjdk.org/jdk/pull/21709/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21709&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343060 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21709.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21709/head:pull/21709 PR: https://git.openjdk.org/jdk/pull/21709 From mli at openjdk.org Fri Oct 25 11:48:34 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 25 Oct 2024 11:48:34 GMT Subject: RFR: 8343063: RISC-V: remove redundant reg copy in generate_resolve_blob Message-ID: Hi, Can you help to review this simple patch? Seems to me, copy from t0 to t1 is not necessary, but I could be wrong. Test running in progress.
Thanks ------------- Commit messages: - initial commit - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Revert "initial commit" - initial commit Changes: https://git.openjdk.org/jdk/pull/21710/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21710&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343063 Stats: 5 lines in 1 file changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21710.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21710/head:pull/21710 PR: https://git.openjdk.org/jdk/pull/21710 From fyang at openjdk.org Fri Oct 25 12:11:04 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 25 Oct 2024 12:11:04 GMT Subject: RFR: 8343063: RISC-V: remove redundant reg copy in generate_resolve_blob In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 11:43:51 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Seems to me, copy from t0 to t1 is not necessary, but I could be wrong. > > Test running in progress. > > Thanks I see the register copy is there after JDK-8340241. Seems fine to me. @robehn might want to take a look. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21710#pullrequestreview-2395090331 From fyang at openjdk.org Fri Oct 25 12:41:06 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 25 Oct 2024 12:41:06 GMT Subject: RFR: 8343060: RISC-V: enable TestFloat16VectorConvChain for riscv In-Reply-To: <_iybfojWLLjELNamXwZGHkZ6AjmyouEqa4HyHjL4tdA=.658f1422-8fdc-4d70-aed3-e115945f1181@github.com> References: <_iybfojWLLjELNamXwZGHkZ6AjmyouEqa4HyHjL4tdA=.658f1422-8fdc-4d70-aed3-e115945f1181@github.com> Message-ID: <_f1ijmOvW1965Sjp25dBeWrTpUlr_2tt9fSe8SL9lOM=.6f9471fe-4676-41d0-8b14-d27d6009a63e@github.com> On Fri, 25 Oct 2024 11:28:33 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch?
> Both JDK-8336827 and JDK-8335860 modified the test in the wrong way for riscv, so it needs to be fixed for riscv. > Thanks! LGTM assuming this test will be selected and pass with `zvfh` extension after this change. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21709#pullrequestreview-2395168358 From rehn at openjdk.org Fri Oct 25 12:42:04 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 25 Oct 2024 12:42:04 GMT Subject: RFR: 8343063: RISC-V: remove redundant reg copy in generate_resolve_blob In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 11:43:51 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Seems to me, copy from t0 to t1 is not necessary, but I could be wrong. > > Test running in progress. > > Thanks Yes, seems fine! Thanks! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21710#pullrequestreview-2395170230 From rehn at openjdk.org Fri Oct 25 12:45:05 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 25 Oct 2024 12:45:05 GMT Subject: RFR: 8343063: RISC-V: remove redundant reg copy in generate_resolve_blob In-Reply-To: References: Message-ID: <9BTmEXNP4xICBqH7lo2AyBpaRuU2kt3VrpmdHklFkp0=.d48e154d-dfb7-4906-bfc4-950d854f5a3b@github.com> On Fri, 25 Oct 2024 12:39:24 GMT, Robbin Ehn wrote: >> Hi, >> Can you help to review this simple patch? >> Seems to me, copy from t0 to t1 is not necessary, but I could be wrong. >> >> Test running in progress. >> >> Thanks > > Yes, seems fine! Thanks! > I see the register copy is there after JDK-8340241. Seems fine to me. @robehn might want to take a look. I'm not so worried about register copies when the registers are not used in parallel. It should be a plain register rename from t0->t1 for the CPU.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21710#issuecomment-2437678061 From mli at openjdk.org Fri Oct 25 12:57:16 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 25 Oct 2024 12:57:16 GMT Subject: RFR: 8343063: RISC-V: remove redundant reg copy in generate_resolve_blob In-Reply-To: <9BTmEXNP4xICBqH7lo2AyBpaRuU2kt3VrpmdHklFkp0=.d48e154d-dfb7-4906-bfc4-950d854f5a3b@github.com> References: <9BTmEXNP4xICBqH7lo2AyBpaRuU2kt3VrpmdHklFkp0=.d48e154d-dfb7-4906-bfc4-950d854f5a3b@github.com> Message-ID: On Fri, 25 Oct 2024 12:42:20 GMT, Robbin Ehn wrote: > > I see the register copy is there after JDK-8340241. Seems fine to me. @robehn might want to take a look. > > I'm not so worried about register copies when the registers are not used in parallel. It should be a plain register rename from t0->t1 for the CPU. In that sense you have a point. Removing the unnecessary copy makes the code clearer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21710#issuecomment-2437703076 From mli at openjdk.org Fri Oct 25 13:02:04 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 25 Oct 2024 13:02:04 GMT Subject: RFR: 8343060: RISC-V: enable TestFloat16VectorConvChain for riscv In-Reply-To: <_f1ijmOvW1965Sjp25dBeWrTpUlr_2tt9fSe8SL9lOM=.6f9471fe-4676-41d0-8b14-d27d6009a63e@github.com> References: <_iybfojWLLjELNamXwZGHkZ6AjmyouEqa4HyHjL4tdA=.658f1422-8fdc-4d70-aed3-e115945f1181@github.com> <_f1ijmOvW1965Sjp25dBeWrTpUlr_2tt9fSe8SL9lOM=.6f9471fe-4676-41d0-8b14-d27d6009a63e@github.com> Message-ID: On Fri, 25 Oct 2024 12:38:48 GMT, Fei Yang wrote: > LGTM assuming this test will be selected and pass with `zvfh` extension after this change. Thanks! Yes, it's selected and passed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21709#issuecomment-2437714657 From lucy at openjdk.org Fri Oct 25 13:56:06 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 25 Oct 2024 13:56:06 GMT Subject: RFR: 8342962: [s390x] TestOSRLotsOfLocals.java crashes [v2] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 08:31:43 GMT, Amit Kumar wrote: >> We are on thin ice with the `TestOSRLotsOfLocals` test. Before it breaks, I'd like to provide the fix. :-) >> >> Testing : Tier1 test with fastdebug vm. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > comments from Lutz Looks good now. Yet another idea, though. src/hotspot/cpu/s390/c1_LIRAssembler_s390.cpp line 139: > 137: const int locals_space = BytesPerWord * method() -> max_locals(); > 138: int monitor_offset = locals_space + (2 * BytesPerWord) * (number_of_locks - 1); > 139: bool handled_manually = false; Sorry, another idea (no need to accept it): bool large_offset = ! Immediate::is_simm20(monitor_offset + BytesPerWord) && number_of_locks > 0; if (large_offset) { . . . ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21703#pullrequestreview-2395364719 PR Review Comment: https://git.openjdk.org/jdk/pull/21703#discussion_r1816728916 From roland at openjdk.org Fri Oct 25 14:14:40 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 25 Oct 2024 14:14:40 GMT Subject: RFR: 8343068: C2: CastX2P Ideal transformation not always applied Message-ID: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> The transformation: (CastX2P (AddL base i)) -> (AddP (CastX2P base) i) when i fits in an int is not always applied: when the type of `i` is narrowed so it fits in an int, the `CastX2P` is not enqueued for igvn. This can get in the way of vectorization as shown by test case `test2`. 
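[Editorial sketch, not part of the original thread.] The rewrite `(CastX2P (AddL base i)) -> (AddP (CastX2P base) i)` relies on the two forms yielding the same address. The C++ fragment below only illustrates that equivalence at the source level, assuming a flat address space where integer-to-pointer casts round-trip. The function names are hypothetical, and nothing here models C2's node graph or the IGVN worklist.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// The precondition mentioned in the mail: the offset must fit in an int.
inline bool fits_in_int(std::int64_t i) {
  return i >= std::numeric_limits<std::int32_t>::min() &&
         i <= std::numeric_limits<std::int32_t>::max();
}

// Shape 1: add in the integer domain, then cast to a pointer,
// i.e. (CastX2P (AddL base i)).
inline char* cast_after_add(std::uintptr_t base, std::int64_t i) {
  return reinterpret_cast<char*>(base + static_cast<std::uintptr_t>(i));
}

// Shape 2: cast to a pointer first, then do pointer arithmetic,
// i.e. (AddP (CastX2P base) i).
inline char* add_after_cast(std::uintptr_t base, std::int64_t i) {
  return reinterpret_cast<char*>(base) + i;
}
```

For any offset that stays inside the underlying buffer, both functions should return the same pointer. The second shape exposes an explicit base-plus-offset address, which is presumably what lets the vectorizer recognize the access.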
------------- Commit messages: - test & fix Changes: https://git.openjdk.org/jdk/pull/21714/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21714&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343068 Stats: 91 lines in 3 files changed: 91 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21714.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21714/head:pull/21714 PR: https://git.openjdk.org/jdk/pull/21714 From rcastanedalo at openjdk.org Fri Oct 25 14:24:04 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 25 Oct 2024 14:24:04 GMT Subject: RFR: 8343068: C2: CastX2P Ideal transformation not always applied In-Reply-To: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> References: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> Message-ID: On Fri, 25 Oct 2024 14:09:48 GMT, Roland Westrelin wrote: > The transformation: > > > (CastX2P (AddL base i)) -> (AddP (CastX2P base) i) > > > when i fits in an int is not always applied: when the type of `i` is > narrowed so it fits in an int, the `CastX2P` is not enqueued for > igvn. This can get in the way of vectorization as shown by test case > `test2`. test/hotspot/jtreg/compiler/c2/TestCastX2NotProcessedIGVN.java line 1: > 1: package compiler.c2;/* Please start the copyright header on a new line.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21714#discussion_r1816780802 From mli at openjdk.org Fri Oct 25 14:31:11 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 25 Oct 2024 14:31:11 GMT Subject: Integrated: 8343060: RISC-V: enable TestFloat16VectorConvChain for riscv In-Reply-To: <_iybfojWLLjELNamXwZGHkZ6AjmyouEqa4HyHjL4tdA=.658f1422-8fdc-4d70-aed3-e115945f1181@github.com> References: <_iybfojWLLjELNamXwZGHkZ6AjmyouEqa4HyHjL4tdA=.658f1422-8fdc-4d70-aed3-e115945f1181@github.com> Message-ID: On Fri, 25 Oct 2024 11:28:33 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Both JDK-8336827 and JDK-8335860 modified the test in the wrong way for riscv, so it needs to be fixed for riscv. > Thanks! This pull request has now been integrated. Changeset: 4f8f395e Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/4f8f395e2bb692148e2b891198f28a579749dd6d Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8343060: RISC-V: enable TestFloat16VectorConvChain for riscv Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/21709 From mli at openjdk.org Fri Oct 25 14:32:09 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 25 Oct 2024 14:32:09 GMT Subject: RFR: 8343063: RISC-V: remove redundant reg copy in generate_resolve_blob In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 12:08:49 GMT, Fei Yang wrote: >> Hi, >> Can you help to review this simple patch? >> Seems to me, copy from t0 to t1 is not necessary, but I could be wrong. >> >> Test running in progress. >> >> Thanks > > I see the register copy is there after JDK-8340241. Seems fine to me. > @robehn might want to take a look. Thanks @RealFYang @robehn for your reviews.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21710#issuecomment-2437971182 From mli at openjdk.org Fri Oct 25 14:32:10 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 25 Oct 2024 14:32:10 GMT Subject: Integrated: 8343063: RISC-V: remove redundant reg copy in generate_resolve_blob In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 11:43:51 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Seems to me, copy from t0 to t1 is not necessary, but I could be wrong. > > Test running in progress. > > Thanks This pull request has now been integrated. Changeset: 1e35da8d Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/1e35da8d3341ed1af266e5b59aa90bfcfae6576a Stats: 5 lines in 1 file changed: 0 ins; 1 del; 4 mod 8343063: RISC-V: remove redundant reg copy in generate_resolve_blob Reviewed-by: fyang, rehn ------------- PR: https://git.openjdk.org/jdk/pull/21710 From mli at openjdk.org Fri Oct 25 14:50:20 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 25 Oct 2024 14:50:20 GMT Subject: RFR: 8343070: Enable is_trace_align_vector when TraceSuperWord is set Message-ID: Hi, Can you help to review this simple patch? Currently, in SuperWord::filter_packs_for_alignment(), some logging is not turned on when TraceSuperWord is set, but I think it should be, as that makes debugging more convenient for users. Thanks!
------------- Commit messages: - initial commit - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Revert "initial commit" - initial commit Changes: https://git.openjdk.org/jdk/pull/21715/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21715&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343070 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21715.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21715/head:pull/21715 PR: https://git.openjdk.org/jdk/pull/21715 From roland at openjdk.org Fri Oct 25 15:09:50 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 25 Oct 2024 15:09:50 GMT Subject: RFR: 8343068: C2: CastX2P Ideal transformation not always applied [v2] In-Reply-To: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> References: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> Message-ID: > The transformation: > > > (CastX2P (AddL base i)) -> (AddP (CastX2P base) i) > > > when i fits in an int is not always applied: when the type of `i` is > narrowed so it fits in an int, the `CastX2P` is not enqueued for > igvn. This can get in the way of vectorization as shown by test case > `test2`. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: fix test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21714/files - new: https://git.openjdk.org/jdk/pull/21714/files/19fd4179..12a471f0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21714&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21714&range=00-01 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21714.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21714/head:pull/21714 PR: https://git.openjdk.org/jdk/pull/21714 From roland at openjdk.org Fri Oct 25 15:09:51 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 25 Oct 2024 15:09:51 GMT Subject: RFR: 8343068: C2: CastX2P Ideal transformation not always applied [v2] In-Reply-To: References: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> Message-ID: On Fri, 25 Oct 2024 14:21:28 GMT, Roberto Castañeda Lozano wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> fix test > > test/hotspot/jtreg/compiler/c2/TestCastX2NotProcessedIGVN.java line 1: > >> 1: package compiler.c2;/* > > Please start the copyright header in a new line. Right! Fixed in the new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21714#discussion_r1816865340 From amitkumar at openjdk.org Fri Oct 25 15:10:20 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 25 Oct 2024 15:10:20 GMT Subject: RFR: 8342962: [s390x] TestOSRLotsOfLocals.java crashes [v3] In-Reply-To: References: Message-ID: > We are on thin ice with the `TestOSRLotsOfLocals` test. Before it breaks, I'd like to provide the fix. :-) > > Testing : Tier1 test with fastdebug vm.
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: more comments from lutz ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21703/files - new: https://git.openjdk.org/jdk/pull/21703/files/084fac86..cba88536 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21703&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21703&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21703.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21703/head:pull/21703 PR: https://git.openjdk.org/jdk/pull/21703 From amitkumar at openjdk.org Fri Oct 25 15:14:07 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 25 Oct 2024 15:14:07 GMT Subject: RFR: 8342962: [s390x] TestOSRLotsOfLocals.java crashes [v2] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 13:52:12 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> comments from Lutz > > src/hotspot/cpu/s390/c1_LIRAssembler_s390.cpp line 139: > >> 137: const int locals_space = BytesPerWord * method() -> max_locals(); >> 138: int monitor_offset = locals_space + (2 * BytesPerWord) * (number_of_locks - 1); >> 139: bool handled_manually = false; > > Sorry, another idea (no need to accept it): > > bool large_offset = ! Immediate::is_simm20(monitor_offset + BytesPerWord) && number_of_locks > 0; > > if (large_offset) { > . . . done
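[Editorial sketch, not part of the original thread.] The suggestion quoted above folds the "does the monitor offset still fit the displacement field" test into a single flag. The stand-alone C++ model below shows the arithmetic, assuming that `is_simm20` tests for a signed 20-bit immediate (as the name suggests) and that a word is 8 bytes. `fits_simm20`, `monitor_offset`, `needs_large_offset_path` and `kBytesPerWord` are hypothetical names used only here, not the HotSpot functions.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kBytesPerWord = 8;  // assumption: 64-bit words

// Signed 20-bit range check: [-2^19, 2^19 - 1].
inline bool fits_simm20(std::int64_t v) {
  return v >= -(std::int64_t{1} << 19) && v < (std::int64_t{1} << 19);
}

// Same formula as in the quoted source lines 137-138.
inline std::int64_t monitor_offset(int max_locals, int number_of_locks) {
  const std::int64_t locals_space = std::int64_t{kBytesPerWord} * max_locals;
  return locals_space +
         (2 * kBytesPerWord) * (std::int64_t{number_of_locks} - 1);
}

// Mirrors the suggested 'large_offset' flag: only relevant when there is
// at least one lock and the highest displacement used does not fit.
inline bool needs_large_offset_path(int max_locals, int number_of_locks) {
  return number_of_locks > 0 &&
         !fits_simm20(monitor_offset(max_locals, number_of_locks) + kBytesPerWord);
}
```

With these assumptions, a method with a handful of locals stays on the short-displacement path, while one with tens of thousands of locals (as the name `TestOSRLotsOfLocals` suggests the test exercises) takes the large-offset path.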
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21703#discussion_r1816891970 From roland at openjdk.org Fri Oct 25 15:19:25 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 25 Oct 2024 15:19:25 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate [v2] In-Reply-To: References: Message-ID: > Superword creates a `Replicate` node at a `ConvL2I` node and uses the > type of the result of the `ConvL2I` to pick the type of the > `Replicate` instead of the type of the input to the `ConvL2I`. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21660/files - new: https://git.openjdk.org/jdk/pull/21660/files/4678caf2..1070696f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21660&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21660&range=00-01 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21660.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21660/head:pull/21660 PR: https://git.openjdk.org/jdk/pull/21660 From roland at openjdk.org Fri Oct 25 15:19:25 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 25 Oct 2024 15:19:25 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate [v2] In-Reply-To: References: Message-ID: <0ZNMBY251BeinURF92b9a1Q6sE9Lz3bzGm6T51nTIBI=.d17f0450-dddc-4133-b3f9-b6192a3983e7@github.com> On Thu, 24 Oct 2024 16:33:25 GMT, Vladimir Kozlov wrote: >> @vnkozlov You can very easily see how it goes with my `Test4` above, I split the things onto different lines so we can see what is from where easily. >> >> The pack that `p0` belongs to is a `ConvL2I` pack. In my case, I have an `short[]`, just to make things even more interesting. 
Since the type is propagated from use -> def, the output of the `ConvL2I` is interpreted as a `short`, it is essentially a truncated `int`. `velt_basic_type(p0) == T_SHORT`. The vector node should be a `VectorCastL2X === _ 873 [[ ]] #vectors`, i.e. casting from long-vector to short-vector. >> >> But now we see that the input to the pack of `p0` is all the same, and so we want to introduce a `Replicate`. We should of course replicate for `long`. But `velt_basic_type(p0) == T_SHORT` - so you get a `Replicate === _ 717 [[ ]] #vectorx`, and then eventually a `VectorCastS2X === _ 890 [[ ]] #vectors`... but of course the AD file has no matching node for a VectorCast from short to short -> `bad AD file`. >> >> The issue is really that `velt_basic_type(p0)` gives us the output-type, but we actually would need the input-type. In almost all cases input-type == output-type. But of course that does not hold with Convert. >> >> With Roland's fix, we now ask for the output-type of the `ConvL2I`'s input. That is the same as asking for the `ConvL2I`'s input-type. That way, we know what type to Replicate for - the `element_type`. >> >> @rwestrel given that @vnkozlov also did not right away understand what is going on, I think you need to properly explain what happens in the comments ;) > > Thank you for explanation. Yes, to have comment would be nice. I added a comment. Does it look ok to you? 
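For readers following along, here is a minimal Java sketch of the loop shape discussed above. It is not the `Test4` from this thread: the class name, method name, and constants are made up for illustration, and whether C2 vectorizes such a loop depends on the build and flags. It only shows the scalar semantics involved: a loop-invariant `long` is narrowed through `int` to `short` on every store, so the value that would have to be replicated is a `long` while the stored elements are `short`.

```java
final class ReplicateShapeSketch {
    // Hypothetical loop: every iteration stores the same narrowed long value.
    static void fill(short[] dst, long invariant) {
        for (int i = 0; i < dst.length; i++) {
            // (int) corresponds to the long-to-int conversion step;
            // (short) is the further truncation implied by the short[] store.
            dst[i] = (short) (int) invariant;
        }
    }

    public static void main(String[] args) {
        short[] dst = new short[8];
        fill(dst, 0x1_0001_8001L);
        // Only the low 16 bits (0x8001) of the long survive the narrowing.
        System.out.println(dst[0]);
    }
}
```

The input type (`long`) and the output element type (`short`) differ here, which is the situation where asking for the pack's own element type gives the wrong answer for the replicated input.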
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21660#discussion_r1816904205 From rrich at openjdk.org Fri Oct 25 15:20:12 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 25 Oct 2024 15:20:12 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v4] In-Reply-To: <4GrGcg5y3Ip-O1VBgFzRagw4ckr-tQAvKAzW-ydaRqI=.694dde39-ec71-44a1-808b-2b1bd0905074@github.com> References: <4GrGcg5y3Ip-O1VBgFzRagw4ckr-tQAvKAzW-ydaRqI=.694dde39-ec71-44a1-808b-2b1bd0905074@github.com> Message-ID: On Fri, 25 Oct 2024 11:07:31 GMT, Martin Doerr wrote: > I only don't like that pointer_delta contains an assertion. Hm, the code gets bloated if you try to avoid it. Maybe it's acceptable(?). Avoiding it could look like this: https://github.com/reinrich/jdk/commit/4097a7fdccd0e393d5e9e2bdb6b3725e5bed4ae6 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2438098965 From roland at openjdk.org Fri Oct 25 15:23:05 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 25 Oct 2024 15:23:05 GMT Subject: RFR: 8342692: C2: MemorySegment API slow with short running loops [v3] In-Reply-To: <_NKTALxqtQeX8TeTDUbLiQE09N96oZc_U4o3ZvcdX00=.57e02d4e-4ab9-418b-9b2e-879591a34e85@github.com> References: <2FVz5G_WC9zJg2TGQ-MPmcWcSPgX5eVknTo58tm3TFI=.ca976a58-a49f-4030-9882-d52febeb82b1@github.com> <_NKTALxqtQeX8TeTDUbLiQE09N96oZc_U4o3ZvcdX00=.57e02d4e-4ab9-418b-9b2e-879591a34e85@github.com> Message-ID: On Fri, 25 Oct 2024 06:58:36 GMT, Quan Anh Mai wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> build fix > > src/hotspot/share/opto/c2_globals.hpp line 810: > >> 808: "of constructors") \ >> 809: \ >> 810: product(uintx, ShortLoopIter, 1000, \ > > May I ask why this is such a small value, why is it not a value that is closer to the limit of an `int` loop, such as 10^6? 1000 happens to be the loop strip mining limit too. 
So by restricting the number of iterations to 1000, we also get rid of the outer loop for loop strip mining. It could make sense to have a similar mechanism for loop strip mining and then have long counted loops and loop strip mined loops have their own tests for what's a small number of iterations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r1816910154 From mdoerr at openjdk.org Fri Oct 25 15:41:08 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 25 Oct 2024 15:41:08 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v5] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 11:10:46 GMT, Martin Doerr wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
>> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Check should use >=. Maybe you're right and the assertion is acceptable. We should never get a broken context. And if it ever happens we may live with the assertion. Product builds should be fine. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2438151502 From rrich at openjdk.org Fri Oct 25 16:03:09 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 25 Oct 2024 16:03:09 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v5] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 11:10:46 GMT, Martin Doerr wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
>> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Check should use >=. src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp line 547: > 545: st->cr(); > 546: st->print(" TRAPNO=" INTPTR_FORMAT, (intptr_t)uc->uc_mcontext.gregs[REG_TRAPNO]); > 547: #ifndef MUSL_LIBC Isn't it compiling on MUSL? `ucontext_t` and `mcontext_t` seem to be equal looking at https://git.musl-libc.org/cgit/musl/tree/arch/x86_64/bits/signal.h Also the context comes from the kernel and should have the same layout I thought. 
If it isn't working then I guess the changes in debug.cpp won't work for MUSL either ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21615#discussion_r1816971440 From lucy at openjdk.org Fri Oct 25 16:19:12 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 25 Oct 2024 16:19:12 GMT Subject: RFR: 8342962: [s390x] TestOSRLotsOfLocals.java crashes [v3] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 15:10:20 GMT, Amit Kumar wrote: >> We are on thin ice with the `TestOSRLotsOfLocals` test. Before it breaks, I'd like to provide the fix. :-) >> >> Testing : Tier1 test with fastdebug vm. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > more comments from lutz Wonderful! ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21703#pullrequestreview-2395783168 From mdoerr at openjdk.org Fri Oct 25 17:21:21 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 25 Oct 2024 17:21:21 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v6] In-Reply-To: References: Message-ID: > There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. > I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
> > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 0x0000000000000000 > XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Enable on MUSL. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21615/files - new: https://git.openjdk.org/jdk/pull/21615/files/01d367f6..b949136b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21615&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21615&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21615.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21615/head:pull/21615 PR: https://git.openjdk.org/jdk/pull/21615 From mdoerr at openjdk.org Fri Oct 25 17:21:21 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 25 Oct 2024 17:21:21 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v5] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 16:00:06 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Check should use >=. > > src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp line 547: > >> 545: st->cr(); >> 546: st->print(" TRAPNO=" INTPTR_FORMAT, (intptr_t)uc->uc_mcontext.gregs[REG_TRAPNO]); >> 547: #ifndef MUSL_LIBC > > Isn't it compiling on MUSL? `ucontext_t` and `mcontext_t` seem to be equal looking at https://git.musl-libc.org/cgit/musl/tree/arch/x86_64/bits/signal.h > Also the context comes from the kernel and should have the same layout I thought. > If it isn't working then I guess the changes in debug.cpp won't work for MUSL either ;) Let's give it a try. See Commit nr. 6. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21615#discussion_r1817079374 From kvn at openjdk.org Fri Oct 25 18:29:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 25 Oct 2024 18:29:05 GMT Subject: RFR: 8342823: Ubsan: ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' [v2] In-Reply-To: References: <765oLXxFYPUnQq8xyATal2oPWTFef26xrbI0-wR7jpU=.1a01b12f-3cde-4349-b21d-d9fec388f459@github.com> Message-ID: On Fri, 25 Oct 2024 08:16:42 GMT, Matthias Baesken wrote: >> When running with ubsanized binaries on Linux x86_64, >> hs jtreg test compiler/startup/StartupOutput.java >> showed this issue >> >> jdk/src/hotspot/share/ci/ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' >> #0 0x7fcea0810117 in ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1614 >> #1 0x7fcea3123577 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) src/hotspot/share/utilities/vmError.cpp:1872 >> #2 0x7fcea0c01499 in report_fatal(VMErrorType, char const*, int, char const*, ...) 
src/hotspot/share/utilities/debug.cpp:214 >> #3 0x7fcea09e9d85 in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413 >> #4 0x7fcea066da1d in Runtime1::generate_blob(BufferBlob*, C1StubId, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:233 >> #5 0x7fcea066dfb0 in Runtime1::generate_blob_for(BufferBlob*, C1StubId) src/hotspot/share/c1/c1_Runtime1.cpp:262 >> #6 0x7fcea066dfb0 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:272 >> #7 0x7fcea03d2be1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53 >> #8 0x7fcea03d2be1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74 >> #9 0x7fcea0acc0c2 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771 >> #10 0x7fcea0ad9a3f in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913 >> #11 0x7fcea161264a in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:759 >> #12 0x7fcea2ec739a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:234 >> #13 0x7fcea251e1d2 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 >> #14 0x7fcea7c6c6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 1b515766201d47a183932ba0c8c8bd0d9ee8755b) >> #15 0x7fcea730f58e in clone (/lib64/libc.so.6+0x11858e) (BuildId: 448a3ddd22596e1adb8fb3dec8921ed5b9d54dc2) >> >> So a nullptr check should be better added . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > move check up in method I think we should do the check in `VMError::report_and_die()` to avoid creating empty replay file. Note, `dump_replay_data_unsafe()` is called only in that one place. An other path through `dump_replay_data()` call required Compilation ID which is set only when we have task. 
We can use assert instead of check in `ciEnv::dump_replay_data_helper()`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21684#issuecomment-2438548139 From kvn at openjdk.org Fri Oct 25 19:28:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 25 Oct 2024 19:28:11 GMT Subject: RFR: 8343068: C2: CastX2P Ideal transformation not always applied [v2] In-Reply-To: References: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> Message-ID: On Fri, 25 Oct 2024 15:09:50 GMT, Roland Westrelin wrote: >> The transformation: >> >> >> (CastX2P (AddL base i)) -> (AddP (CastX2P base) i) >> >> >> when i fits in an int is not always applied: when the type of `i` is >> narrowed so it fits in an int, the `CastX2P` is not enqueued for >> igvn. This can get in the way of vectorization as shown by test case >> `test2`. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > fix test Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21714#pullrequestreview-2396232429 From kvn at openjdk.org Fri Oct 25 19:31:04 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 25 Oct 2024 19:31:04 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate [v2] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 15:19:25 GMT, Roland Westrelin wrote: >> Superword creates a `Replicate` node at a `ConvL2I` node and uses the >> type of the result of the `ConvL2I` to pick the type of the >> `Replicate` instead of the type of the input to the `ConvL2I`. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21660#pullrequestreview-2396236888 From cslucas at openjdk.org Fri Oct 25 19:36:08 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 25 Oct 2024 19:36:08 GMT Subject: RFR: 8335977: Deoptimization fails with assert "object should be reallocated already" In-Reply-To: <9wsyRJSf120rDZVr39RLRateAT7_JNByxLIdVUx9sgo=.a8f4f772-802b-4ced-878a-3d6c7ea5026d@github.com> References: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> <9wsyRJSf120rDZVr39RLRateAT7_JNByxLIdVUx9sgo=.a8f4f772-802b-4ced-878a-3d6c7ea5026d@github.com> Message-ID: On Tue, 22 Oct 2024 07:18:19 GMT, Tobias Hartmann wrote: > Does your test also reproduce the NPE example that @chhagedorn triggered with his [Test3.java](https://bugs.openjdk.org/secure/attachment/111067/Test3.java)? If not, I think that one should be added as well. The test I added in this PR is based on @chhagedorn's Test3.java. I was able to reproduce the issue on my end fairly easily. > Now that the -XX:+StressUnstableIfTraps option was added with [JDK-8335334](https://bugs.openjdk.org/browse/JDK-8335334), could you please also add [Test.java](https://bugs.openjdk.org/secure/attachment/111040/Test.java) that I used originally to reproduce the issue? I wasn't able to reproduce the issue using Test.java. How often does it reproduce for you with the flags that you listed at the top of Test.java? Thanks for testing, @TobiHartmann! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21624#issuecomment-2438651634 From dlong at openjdk.org Fri Oct 25 21:07:11 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 25 Oct 2024 21:07:11 GMT Subject: RFR: 8342975: C2: Micro-optimize PhaseIdealLoop::Dominators() [v2] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 05:52:39 GMT, Aleksey Shipilev wrote: >> Noticed this while looking at Leyden profiles. C2 seems to spend considerable time in this loop.
The disassembly shows this loop is fairly hot. Replacing the initialization with memset, while touching more memory, is apparently faster. memset is also what we normally do around C2 for arena-allocated data. We seem to touch a lot of these structs later on, so pulling them to cache with memset is likely "free". >> >> It also looks like the current initialization misses initializing the last element (at `C->unique()`). >> >> I'll put performance data in a separate comment. > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Better comment Looks good. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21690#pullrequestreview-2396486266 From sviswanathan at openjdk.org Fri Oct 25 21:57:27 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 25 Oct 2024 21:57:27 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v32] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:36:50 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on the panama-dev mailing list [1], this patch adds support for the following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max. >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an infinite value. >> >> As the name suggests, these are saturating operations, i.e.
the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: > > - Review resolutions. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - Factor out IR tests and Transforms to follow-up PRs. > - Replacing flag based checks with CPU feature checks in IR validation test. > - Remove Saturating IRNode patterns. > - Restrict IR validation to newly added UMin/UMax transforms. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - Prod build fix > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - New IR tests + additional IR transformations > - ... and 27 more: https://git.openjdk.org/jdk/compare/158b93d1...0e10139c Changes look good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). 
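A minimal scalar sketch of the saturating and unsigned semantics described in the quoted summary, assuming 32-bit `int` lanes. The helper names are hypothetical and are not the APIs added by the patch; they only contrast clamping with the usual wrap-around.

```java
final class SaturatingSketch {
    // Signed saturating add: clamp to [Integer.MIN_VALUE, Integer.MAX_VALUE].
    static int saddSketch(int a, int b) {
        long wide = (long) a + (long) b; // cannot overflow in 64 bits
        if (wide > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (wide < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) wide;
    }

    // Unsigned saturating add: clamp to 0xFFFFFFFF (which is -1 as an int).
    static int suaddSketch(int a, int b) {
        int sum = a + b;
        // Unsigned overflow happened iff the wrapped sum is below an operand.
        return Integer.compareUnsigned(sum, a) < 0 ? -1 : sum;
    }

    // Unsigned saturating subtract: clamp to 0 instead of wrapping.
    static int susubSketch(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    // Unsigned max: compare the bit patterns as unsigned values.
    static int umaxSketch(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(saddSketch(Integer.MAX_VALUE, 1)); // stays at MAX_VALUE
        System.out.println(Integer.MAX_VALUE + 1);            // plain add wraps
    }
}
```

Widening to `long` keeps the signed case simple; the unsigned cases rely on `Integer.compareUnsigned` so no wider type is needed.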
PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2396570637 From sviswanathan at openjdk.org Fri Oct 25 22:01:12 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 25 Oct 2024 22:01:12 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v30] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 15:56:18 GMT, Paul Sandoz wrote: >> Hey @eme64, >> >>> Wow this is really a very moving target - quite frustrating to review - it takes up way too much of the reviewer's bandwidth. You really need to split up your PRs as much as possible so that review is easier and faster. >> >> I understand the reviewer's pain, which is why I mentioned the last two changes specifically. Vector API related PRs generally look bulky due to script-generated sources and tests. Barring that, it may not demand much of your time. >> >> But, to keep you motivated :-) and following @PaulSandoz's and your suggestions, I have moved out IR validations and Min / Max transforms to the following follow-up PRs. >> >> - https://bugs.openjdk.org/browse/JDK-8342676 (https://github.com/openjdk/jdk/pull/21604) >> - https://bugs.openjdk.org/browse/JDK-8342677 (https://github.com/openjdk/jdk/pull/21603) >> >> Can you kindly run this through your test infrastructure and approve if it goes fine? >> >> Best Regards, >> Jatin > >> Can you kindly run this through your test infrastructure and approve if it goes fine? >> > > Internal tier 1 to 3 testing passed (I needed to merge with master at 7133d1b983d, due to some updates to unrelated test configuration files the test infrastructure expects). @PaulSandoz Looks like you also need to re-approve.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2438938409 From vlivanov at openjdk.org Fri Oct 25 22:23:11 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 25 Oct 2024 22:23:11 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: <-J1c3-4NdfyIyOpkRDJBtw1qzR375U56X7TCvvrb4u0=.c1fd68a5-b64f-4c46-a8ec-93530c16ba97@github.com> Message-ID: On Thu, 24 Oct 2024 02:05:58 GMT, Jatin Bhateja wrote: >> Another reason is that lowering being done late allows us to have more freedom to break some invariants of the nodes, such as looking through `VectorReinterpret`. An example is this (really crafted) case: >> >> Int256Vector v; >> int a = v.lane(5); >> float b = v.reinterpretAsFloats().lane(7); >> >> This would be transformed into: >> >> vector v; >> vector v0 = VectorExtract(v, 1); >> int a = ExtractI(v0, 1); >> vector v1 = VectorReinterpret(v, ); >> vector v2 = VectorExtract(v1, 1); >> float b = ExtractF(v2, 3); >> >> By allowing lowering to look through `VectorReinterpret` and break the invariant of `Extract` nodes that the element types of their inputs and outputs must be the same, we can `gvn` `v1` and `v`, `v2` and `v0`. Simplify the graph: >> >> vector v; >> vector v0 = VectorExtract(v, 1); >> int a = ExtractI(v0, 1); >> float b = ExtractF(v0, 3); > >> Because lowering is a transformation that increases the complexity of the graph. >> >> * A `d = ExtractD(z, 4)` expanded into `x = VectorExtract(z, 2); d = ExtractD(x, 0)` increases the number of nodes by 1. >> * A logic cone transformed into a `MacroLogicV` introduces another kind of node that may not be recognized by other nodes. >> >> As a result, we should do this as the last step when other transformation has finished their jobs. For the concerns regarding loop body size, we still have a function in `Matcher` for that purpose. 
> > Yes, you rightly pointed out, given the fact that lowering in some cases may significantly impact the graph shape it should be accounted by loop optimizations. > > Unrolling decisions are based on loop body size and a rudimentary cost model e.g. macro logic optimization which folds entire logic tree into one x86 specific lowered IR should promote unrolling. > By allowing lowering to look through VectorReinterpret and break the invariant of Extract nodes that the element types of their inputs and outputs must be the same, we can gvn v1 and v, v2 and v0. I'd warn against breaking existing IR invariants. As an example, precise type information is important to properly match generic ideal vector nodes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1817418966 From vlivanov at openjdk.org Fri Oct 25 23:33:05 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 25 Oct 2024 23:33:05 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 02:14:46 GMT, Jasmine Karthikeyan wrote: >> Build changes look good (but would be slightly better without the extra blank line). I have not reviewed the actual hotspot changes. > > Thanks for looking at the build changes, @magicus! I've pushed a commit that removes the extra newline in the makefiles and adds newlines to the ends of files that were missing them. > > Thanks for taking a look as well, @merykitty and @jatin-bhateja! I've pushed a commit that should address the code suggestions and left some comments as well. @jaskarth thanks for exploring platform-specific lowering! I briefly looked through the changes, but I didn't get a good understanding of its design goals. It's hard to see what use cases it is targeted for when only skeleton code is present. It would really help if there are some cases ported on top of it for illustration purposes and to solidify the design. 
Currently, there are multiple places in the code where IR lowering happens. In particular: * IGVN during post loop opts phase (guarded by `Compile::post_loop_opts_phase()`) (Ideal -> Ideal); * macro expansion (Ideal -> Ideal); * ad-hoc passes (GC barrier expansion, `MacroLogicV`) (Ideal -> Ideal); * final graph reshaping (Ideal -> Ideal); * matcher (Ideal -> Mach). I'd like to understand how the new pass is intended to interact with existing cases. Only the last one is truly platform-specific, but there are some platform-specific cases exposes in other places (e.g., MacroLogicV pass, DivMod combining) guarded by some predicates on `Matcher`. As the `PhaseLowering` is implemented now, it looks like a platform-specific macro expansion pass (1->many rewriting). But `MacroLogicV` case doesn't fit such model well. I see changes to enable platform-specific node classes. As of now, only Matcher introduces platform-specific nodes and all of them are Mach nodes. Platform-specific Ideal nodes are declared in shared code, but then their usages are guarded by `Matcher::has_match_rule()` thus ensuring there's enough support on back-end side. 
Some random observations: * the pass is performed unconditionally and it iterates over all live nodes; in contrast, macro nodes and nodes for post-loop opts IGVN are explicitly listed on the side (MacroLogicV pass is also guarded, but by a coarse-grained check); ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2439040451 From vlivanov at openjdk.org Fri Oct 25 23:42:10 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 25 Oct 2024 23:42:10 GMT Subject: RFR: 8335977: Deoptimization fails with assert "object should be reallocated already" In-Reply-To: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> References: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> Message-ID: <-eGvtC-tGwThPGTfRVKEySOMZZRm-E0IJKOBT78Icu4=.b03bb08b-9b37-4127-b9a7-3a8a1bf8c9fc@github.com> On Mon, 21 Oct 2024 20:27:10 GMT, Cesar Soares Lucas wrote: > Please, review this patch to fix an issue that may occur when serializing debug information related to reduce allocation merges. The problem happens when there are more than one JVMS in a `uncommon_trap` and a _younger_ JVMS doesn't have the RAM inputs as a local/expression/monitor but an older JVMS does. In that situation the loop at line 1173 of output.cpp will set the `is_root` property of the ObjectValue to `false` when processing the younger JVMS even though it may have been set to `true` when visiting the older JVMS. > > Tested on: > - Win, Mac & Linux tier1-4 on x64 & Aarch64. > - CTW with some thousands of jars. Looks good. src/hotspot/share/opto/output.cpp line 1179: > 1177: // the younger JVMS. > 1178: if (ov->is_root()) { > 1179: continue; You can either fuse `ov->is_root()` check into `is_root` computation (`bool is_root = ov->is_root() || ...`) or turn it into an `if-then-else` (`if (ov->is_root()) { /* comment */ } else { bool is_root = ...; ov->set_root(is_root); }`). 
I find both cases easier to read. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21624#pullrequestreview-2396740081 PR Review Comment: https://git.openjdk.org/jdk/pull/21624#discussion_r1817451037 From qamai at openjdk.org Sat Oct 26 02:37:05 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 26 Oct 2024 02:37:05 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: <-J1c3-4NdfyIyOpkRDJBtw1qzR375U56X7TCvvrb4u0=.c1fd68a5-b64f-4c46-a8ec-93530c16ba97@github.com> Message-ID: On Fri, 25 Oct 2024 22:20:00 GMT, Vladimir Ivanov wrote: >>> Because lowering is a transformation that increases the complexity of the graph. >>> >>> * A `d = ExtractD(z, 4)` expanded into `x = VectorExtract(z, 2); d = ExtractD(x, 0)` increases the number of nodes by 1. >>> * A logic cone transformed into a `MacroLogicV` introduces another kind of node that may not be recognized by other nodes. >>> >>> As a result, we should do this as the last step when other transformation has finished their jobs. For the concerns regarding loop body size, we still have a function in `Matcher` for that purpose. >> >> Yes, you rightly pointed out, given the fact that lowering in some cases may significantly impact the graph shape it should be accounted by loop optimizations. >> >> Unrolling decisions are based on loop body size and a rudimentary cost model e.g. macro logic optimization which folds entire logic tree into one x86 specific lowered IR should promote unrolling. > >> By allowing lowering to look through VectorReinterpret and break the invariant of Extract nodes that the element types of their inputs and outputs must be the same, we can gvn v1 and v, v2 and v0. > > I'd warn against breaking existing IR invariants. As an example, precise type information is important to properly match generic ideal vector nodes. I believe the matcher only needs the exact type of the node but not its inputs. E.g. 
it should not be an issue if we `AddVB` a `vector` and a `vector` into a `vector`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1817595300 From qamai at openjdk.org Sat Oct 26 02:41:07 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 26 Oct 2024 02:41:07 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 23:30:52 GMT, Vladimir Ivanov wrote: >> Thanks for looking at the build changes, @magicus! I've pushed a commit that removes the extra newline in the makefiles and adds newlines to the ends of files that were missing them. >> >> Thanks for taking a look as well, @merykitty and @jatin-bhateja! I've pushed a commit that should address the code suggestions and left some comments as well. > > @jaskarth thanks for exploring platform-specific lowering! > > I briefly looked through the changes, but I didn't get a good understanding of its design goals. It's hard to see what use cases it is targeted for when only skeleton code is present. It would really help if there are some cases ported on top of it for illustration purposes and to solidify the design. > > Currently, there are multiple places in the code where IR lowering happens. In particular: > * IGVN during post loop opts phase (guarded by `Compile::post_loop_opts_phase()`) (Ideal -> Ideal); > * macro expansion (Ideal -> Ideal); > * ad-hoc passes (GC barrier expansion, `MacroLogicV`) (Ideal -> Ideal); > * final graph reshaping (Ideal -> Ideal); > * matcher (Ideal -> Mach). > > I'd like to understand how the new pass is intended to interact with existing cases. > > Only the last one is truly platform-specific, but there are some platform-specific cases exposes in other places (e.g., MacroLogicV pass, DivMod combining) guarded by some predicates on `Matcher`. > > As the `PhaseLowering` is implemented now, it looks like a platform-specific macro expansion pass (1->many rewriting). 
But `MacroLogicV` case doesn't fit such model well. > > I see changes to enable platform-specific node classes. As of now, only Matcher introduces platform-specific nodes and all of them are Mach nodes. Platform-specific Ideal nodes are declared in shared code, but then their usages are guarded by `Matcher::has_match_rule()` thus ensuring there's enough support on back-end side. > > Some random observations: > * the pass is performed unconditionally and it iterates over all live nodes; in contrast, macro nodes and nodes for post-loop opts IGVN are explicitly listed on the side (MacroLogicV pass is also guarded, but by a coarse-grained check); @iwanowww The pass is supposed to be a generalisation of the `MacroLogicV` pass, it should be able to perform arbitrary transformations and its immediate usages are for `MacroLogicV` patterns and `vpmuludq` patterns. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2439189524 From jbhateja at openjdk.org Sun Oct 27 01:25:27 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 27 Oct 2024 01:25:27 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: <-J1c3-4NdfyIyOpkRDJBtw1qzR375U56X7TCvvrb4u0=.c1fd68a5-b64f-4c46-a8ec-93530c16ba97@github.com> Message-ID: On Sat, 26 Oct 2024 02:34:44 GMT, Quan Anh Mai wrote: >>> By allowing lowering to look through VectorReinterpret and break the invariant of Extract nodes that the element types of their inputs and outputs must be the same, we can gvn v1 and v, v2 and v0. >> >> I'd warn against breaking existing IR invariants. As an example, precise type information is important to properly match generic ideal vector nodes. > > I believe the matcher only needs the exact type of the node but not its inputs. E.g. it should not be an issue if we `AddVB` a `vector` and a `vector` into a `vector`. 
Generic vector operand resolution concretizes generic operands based on the type-agnostic node size. It is a post-matcher pass, and its job is to replace generic MachOper operand nodes with concrete ones (vec[SDXYZ]), which hold the precise register mask needed by the register allocator. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1817960219 From jbhateja at openjdk.org Sun Oct 27 02:00:16 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 27 Oct 2024 02:00:16 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 02:14:46 GMT, Jasmine Karthikeyan wrote: >> Build changes look good (but would be slightly better without the extra blank line). I have not reviewed the actual hotspot changes. > > Thanks for looking at the build changes, @magicus! I've pushed a commit that removes the extra newline in the makefiles and adds newlines to the ends of files that were missing them. > > Thanks for taking a look as well, @merykitty and @jatin-bhateja! I've pushed a commit that should address the code suggestions and left some comments as well. > @jaskarth thanks for exploring platform-specific lowering! > > I briefly looked through the changes, but I didn't get a good understanding of its design goals. It's hard to see what use cases it is targeted for when only skeleton code is present. It would really help if there are some cases ported on top of it for illustration purposes and to solidify the design. > > Currently, there are multiple places in the code where IR lowering happens. In particular: > > * IGVN during post loop opts phase (guarded by `Compile::post_loop_opts_phase()`) (Ideal -> Ideal); > * macro expansion (Ideal -> Ideal); > * ad-hoc passes (GC barrier expansion, `MacroLogicV`) (Ideal -> Ideal); > * final graph reshaping (Ideal -> Ideal); > * matcher (Ideal -> Mach).
> > I'd like to understand how the new pass is intended to interact with existing cases. > > Only the last one is truly platform-specific, but there are some platform-specific cases exposes in other places (e.g., MacroLogicV pass, DivMod combining) guarded by some predicates on `Matcher`. > > As the `PhaseLowering` is implemented now, it looks like a platform-specific macro expansion pass (1->many rewriting). But `MacroLogicV` case doesn't fit such model well. > > I see changes to enable platform-specific node classes. As of now, only Matcher introduces platform-specific nodes and all of them are Mach nodes. Platform-specific Ideal nodes are declared in shared code, but then their usages are guarded by `Matcher::has_match_rule()` thus ensuring there's enough support on back-end side. > > Some random observations: > > * the pass is performed unconditionally and it iterates over all live nodes; in contrast, macro nodes and nodes for post-loop opts IGVN are explicitly listed on the side (MacroLogicV pass is also guarded, but by a coarse-grained check); MacroLogicV is currently a standalone pass and was not implemented as discrete Idealizations split across various logic IR Idealizations; our intention was to create a standalone pass, limit the exposure of such complex logic packing to the rest of the code, and guard it with a target-specific match_rule_supported check. To fit this into the new lowering interface, we may need to split root detection across the various logic IR nodes accepted by PhaseLowering::lower_to(Node*). @jaskarth, @merykitty, can you share more use cases, apart from VPMUL[U]DQ and MacroLogicV detection, in support of the new lowering pass? Please be aware that even after introducing a lowering pass we still want to prevent replicating IR transforms, e.g., transformations/Idealizations applicable only to x86 and AARCH64 should still be shared for maintainability reasons. In general, compilers that have a lowering pass also introduce target-specific pre-matching IR.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2439800311 From qamai at openjdk.org Sun Oct 27 13:22:15 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 27 Oct 2024 13:22:15 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: Message-ID: On Sun, 27 Oct 2024 01:55:41 GMT, Jatin Bhateja wrote: >> Thanks for looking at the build changes, @magicus! I've pushed a commit that removes the extra newline in the makefiles and adds newlines to the ends of files that were missing them. >> >> Thanks for taking a look as well, @merykitty and @jatin-bhateja! I've pushed a commit that should address the code suggestions and left some comments as well. > >> @jaskarth thanks for exploring platform-specific lowering! >> >> I briefly looked through the changes, but I didn't get a good understanding of its design goals. It's hard to see what use cases it is targeted for when only skeleton code is present. It would really help if there are some cases ported on top of it for illustration purposes and to solidify the design. >> >> Currently, there are multiple places in the code where IR lowering happens. In particular: >> >> * IGVN during post loop opts phase (guarded by `Compile::post_loop_opts_phase()`) (Ideal -> Ideal); >> * macro expansion (Ideal -> Ideal); >> * ad-hoc passes (GC barrier expansion, `MacroLogicV`) (Ideal -> Ideal); >> * final graph reshaping (Ideal -> Ideal); >> * matcher (Ideal -> Mach). >> >> I'd like to understand how the new pass is intended to interact with existing cases. >> >> Only the last one is truly platform-specific, but there are some platform-specific cases exposes in other places (e.g., MacroLogicV pass, DivMod combining) guarded by some predicates on `Matcher`. >> >> As the `PhaseLowering` is implemented now, it looks like a platform-specific macro expansion pass (1->many rewriting). But `MacroLogicV` case doesn't fit such model well. 
>> >> I see changes to enable platform-specific node classes. As of now, only Matcher introduces platform-specific nodes and all of them are Mach nodes. Platform-specific Ideal nodes are declared in shared code, but then their usages are guarded by `Matcher::has_match_rule()` thus ensuring there's enough support on back-end side. >> >> Some random observations: >> >> * the pass is performed unconditionally and it iterates over all live nodes; in contrast, macro nodes and nodes for post-loop opts IGVN are explicitly listed on the side (MacroLogicV pass is also guarded, but by a coarse-grained check); > > MacroLogicV currently is a standalone pass and was not implemented as discrete Idealizations split across various logic IR Idealizations, our intention was to create a standalone pass and limit the exposure of such complex logic packing to rest of the code and guard it by target specific match_rule_supported check. > To fit this into new lowering interface, we may need to split root detection across various logic IR node accepted by PhaseLowering::lower_to(Node*). > > @jaskarth, @merykitty , can you share more use cases apart from VPMUL[U]DQ and MacroLogicV detection in support of new lowring pass. Please be aware that even after introducing lowering pass we stil... @jatin-bhateja @iwanowww The application of lowering is very broad, as it can help us perform arbitrary transformations as well as take advantage of GVN in the ideal world: 1, Any expansion that can benefit from GVN can be done in this pass. The first example is `ExtractXNode`s. Currently, they are expanded during code emission. An `int` extraction at index 5 is currently expanded to:

    vextracti128 xmm1, ymm0, 1
    vpextrd eax, xmm1, 1

If we try to extract multiple elements then `vextracti128` would be needlessly emitted multiple times. By moving the expansion from code emission to lowering, we can do GVN and eliminate the redundant operations.
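The index arithmetic behind the `vextracti128` / `vpextrd` pair above can be sketched as follows. This is only an illustration, not code from the patch: `LaneSplit` and `split_lane` are hypothetical names, and the sketch assumes the vector is addressed in 128-bit chunks as in the AVX2 sequence shown.

```cpp
// Hypothetical helper, NOT HotSpot code: splits a lane index of a wide vector
// into the 128-bit chunk that holds the lane and the element inside that chunk.
struct LaneSplit {
  int chunk;    // immediate for the 128-bit extract (the vextracti128 step)
  int element;  // immediate for the element extract (the vpextrd step)
};

// Assumption: chunks are 16 bytes wide, so a chunk holds 16 / element_bytes lanes.
inline LaneSplit split_lane(int lane, int element_bytes) {
  const int lanes_per_chunk = 16 / element_bytes;
  return LaneSplit{lane / lanes_per_chunk, lane % lanes_per_chunk};
}
```

With 4-byte `int` lanes, lane 5 splits into chunk 1, element 1, which matches the two immediates in the sequence above. Two extractions from the same chunk produce the same chunk extract, which is the node GVN could then share.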
For vector insertions, the situation is even worse, as it would be expanded into multiple instructions. For example, to construct a vector from 4 long values, we would have to:

    vpxor xmm0, xmm0, xmm0
    vmovdqu xmm1, xmm0
    vpinsrq xmm1, xmm1, rax, 0
    vinserti128 ymm0, ymm0, xmm1, 0
    vmovdqu xmm1, xmm0
    vpinsrq xmm1, xmm1, rcx, 1
    vinserti128 ymm0, ymm0, xmm1, 0
    vextracti128 xmm1, ymm0, 1
    vpinsrq xmm1, xmm1, rdx, 0
    vinserti128 ymm0, ymm0, xmm1, 1
    vextracti128 xmm1, ymm0, 1
    vpinsrq xmm1, xmm1, rbx, 1
    vinserti128 ymm0, ymm0, xmm1, 1

By moving the expansion to lowering we can have a much more efficient sequence:

    vmovq xmm0, rax
    vpinsrq xmm0, xmm0, rcx, 1
    vmovq xmm1, rdx
    vpinsrq xmm1, xmm1, rbx, 1
    vinserti128 ymm0, ymm0, xmm1, 1

Some other examples are vector mask operations, which can be expanded into a `VectorMaskToLong` operation followed by a bit-counting one, and unsigned comparisons on AVX2, which are expanded into 2 `xor`s with the min signed value and a signed comparison.

2, @jaskarth mentioned their experiments with the `bextr` instruction. It allows a quick bit extraction but the downside is that it does not accept an immediate operand. As a result, an immediate operand needs to be materialized each time. By doing it here, we can allow that to be moved outside of loops. Furthermore, this can be done for variable `bextr`, too. If we see that the start and len parameters are loop invariant, then the operand for `bextr` can be computed outside the loop.

3, x86 has several atomic instructions; the downside is that they often do not return the old value, which is a property only `xadd` has. As a result, a `VarHandle::getAndBitwiseAnd` is implemented as a compare-exchange loop. This phase may recognise the pattern and transform it into a `lock and` if the old value is not used.
4, etc ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2440021718 From amitkumar at openjdk.org Mon Oct 28 03:05:02 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 28 Oct 2024 03:05:02 GMT Subject: RFR: 8342962: [s390x] TestOSRLotsOfLocals.java crashes [v3] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 15:10:20 GMT, Amit Kumar wrote: >> We are on thin ice with the `TestOSRLotsOfLocals` test. Before it breaks, I'd like to provide the fix. :-) >> >> Testing : Tier1 test with fastdebug vm. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > more comments from lutz @TheRealMDoerr can I get a review for this fix :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21703#issuecomment-2440443776 From jkarthikeyan at openjdk.org Mon Oct 28 03:58:16 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 28 Oct 2024 03:58:16 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 23:30:52 GMT, Vladimir Ivanov wrote: >> Thanks for looking at the build changes, @magicus! I've pushed a commit that removes the extra newline in the makefiles and adds newlines to the ends of files that were missing them. >> >> Thanks for taking a look as well, @merykitty and @jatin-bhateja! I've pushed a commit that should address the code suggestions and left some comments as well. > > @jaskarth thanks for exploring platform-specific lowering! > > I briefly looked through the changes, but I didn't get a good understanding of its design goals. It's hard to see what use cases it is targeted for when only skeleton code is present. It would really help if there are some cases ported on top of it for illustration purposes and to solidify the design. > > Currently, there are multiple places in the code where IR lowering happens. 
In particular: > * IGVN during post loop opts phase (guarded by `Compile::post_loop_opts_phase()`) (Ideal -> Ideal); > * macro expansion (Ideal -> Ideal); > * ad-hoc passes (GC barrier expansion, `MacroLogicV`) (Ideal -> Ideal); > * final graph reshaping (Ideal -> Ideal); > * matcher (Ideal -> Mach). > > I'd like to understand how the new pass is intended to interact with existing cases. > > Only the last one is truly platform-specific, but there are some platform-specific cases exposes in other places (e.g., MacroLogicV pass, DivMod combining) guarded by some predicates on `Matcher`. > > As the `PhaseLowering` is implemented now, it looks like a platform-specific macro expansion pass (1->many rewriting). But `MacroLogicV` case doesn't fit such model well. > > I see changes to enable platform-specific node classes. As of now, only Matcher introduces platform-specific nodes and all of them are Mach nodes. Platform-specific Ideal nodes are declared in shared code, but then their usages are guarded by `Matcher::has_match_rule()` thus ensuring there's enough support on back-end side. > > Some random observations: > * the pass is performed unconditionally and it iterates over all live nodes; in contrast, macro nodes and nodes for post-loop opts IGVN are explicitly listed on the side (MacroLogicV pass is also guarded, but by a coarse-grained check); Thanks a lot for your analysis of the patch, @iwanowww! I hope to answer some of your questions here. > It's hard to see what use cases it is targeted for when only skeleton code is present. It would really help if there are some cases ported on top of it for illustration purposes and to solidify the design. I think this is a very fair point. I was testing some cases before I made the PR, but I wanted to submit just the system in isolation to make it easier to review. I can make some example use cases separately to show what could be possible with the new system. 
> I'd like to understand how the new pass is intended to interact with existing cases. The overarching goal is to support new kinds of transforms on ideal nodes that are only relevant to a single hardware platform, which would otherwise be too narrow in scope to put in shared code but would be difficult to do in purely AD code. It can be helpful having GVN while transforming the IR into a more backend-specific form. @merykitty added some nice examples above that illustrate possible use-cases. > As the `PhaseLowering` is implemented now, it looks like a platform-specific macro expansion pass (1->many rewriting). But MacroLogicV case doesn't fit such model well. The lowering implementation works similarly to how an `Ideal()` call works, so it's possible to do many->1 (like `MacroLogicV`) and many->many transformations as well. > I see changes to enable platform-specific node classes. As of now, only Matcher introduces platform-specific nodes and all of them are Mach nodes. I was thinking if we're introducing nodes that only have functionality on specific platforms it might be nice to make those nodes only exist on those platforms as well, to reduce the size of shared code on platforms where the nodes aren't relevant. Since the lowering phase introduces new nodes that are specially known to the backend they should be supported by the backend too. However, it's not a necessary component of the lowering phase, just something that I thought could help with the implementation of lowered nodes. > the pass is performed unconditionally and it iterates over all live nodes; in contrast, macro nodes and nodes for post-loop opts IGVN are explicitly listed on the side (MacroLogicV pass is also guarded, but by a coarse-grained check); This is true, my thought was since MacroLogicV currently also iterates across all live nodes doing it here as well would be alright. 
I think a way to collect lowering-specific nodes would be difficult since the nodes that actually get lowered could change between backends. I did some testing on compile time with `-XX:+CITime`, and it seems like the impact is negligible (at least with the skeleton code). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2440496414 From jkarthikeyan at openjdk.org Mon Oct 28 03:58:16 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 28 Oct 2024 03:58:16 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 11:51:23 GMT, Quan Anh Mai wrote: >> Ah, do you mean having a method in `Node` that holds the lowering code? I was originally planning on doing it this way, but I think it would pose some problems where certain nodes' `Lower()` methods would only be overridden on certain backends, which would become messy. One of my goals was to keep the lowering code separate from shared code, so new lowerings could be implemented by just updating the main `lower_node` function in the backend. >> About GVN, I think it makes sense to do it in a separate phase because GVN is used quite generally whereas lowering is only done once. Since the `transform_old` function in IGVN is pretty complex as well, I think it's simpler to just implement `Value()` and GVN separately. Thinking on it more I think Identity is probably a good idea too, since as you mention it can't introduce new nodes into the graph. Mainly I wanted to avoid the case where `Ideal()` could fold a lowered graph back into the original form, causing an infinite loop. > > I mean we might want to run another kind of `Ideal` that will replace the normal `Ideal` on a node after its lowering. 
For example, consider this: > > vector v; > u = v.withLane(0, a).withLane(1, b); > > This will be parsed into: > > vector v; > v0 = InsertI(v, 4, a); > u = InsertI(v0, 5, b); > > And can be lowered to: > > vector v; > vector v1 = VectorExtract(v, 1); > v2 = InsertI(v1, 0, a); > v0 = VectorInsert(v, 1, v2); > vector v3 = VectorExtract(v0, 1); > v4 = InsertI(v3, 1, b); > u = VectorInsert(v0, 1, v4); > > Which represents this sequence: > > ymm0; > vextracti128 xmm1, ymm0, 1; > vpinsrd xmm1, xmm1, a, 0; > vinserti128 ymm0, ymm0, xmm1, 1; > vextracti128 xmm1, ymm0, 1; > vpinsrd xmm1, xmm1, b, 1; > vinserti128 ymm0, ymm0, xmm1, 1; > > As you can imagine this sequence is pretty inefficient, what we really want is: > > ymm0; > vextracti128 xmm1, ymm0, 1; > vpinsrd xmm1, xmm1, a, 0; > vpinsrd xmm1, xmm1, b, 1; > vinserti128 ymm0, ymm0, xmm1, 1; > > Looking back at the graph, we can `Identity` `v3` into `v2` since it is pretty obvious that we just do an insert and extract from the same place. However, to transform `u = VectorInsert(v0, 1, v4)` into `u = VectorInsert(v, 1, v4)`, we would need an `Ideal`-like transformation to see that we just insert into a location twice and remove the intermediate `VectorInsert`. > > As a result, in addition to ease of implementation, I think you may extend `PhaseIterGVN` and override its `PhaseGVN::apply_ideal` to return `nullptr` for now, and take advantages of `PhaseIterGVN::optimize` to do the iterative transformation for you. Ah, I see what you mean now. I think this makes extending IGVN more appealing because we could continue to do Ideal on lowered nodes, as you mentioned. We could override `PhaseGVN::apply_ideal` to return `nullptr` when processing regular nodes, but run the other `Ideal` type when encountering lowered nodes. Do you think it would be better to add another method to `Node` or should we re-use the existing Ideal call, but lowering specific nodes are guarded with a new node flag? 
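To make the two rewrites discussed for `v3` and `u` concrete, here is a toy model of an `Identity`-like and an `Ideal`-like rule on lowered vector nodes. It is a sketch under loud assumptions: `Op`, `Node`, `identity` and `ideal` below are hypothetical stand-ins and do not use or mirror HotSpot's real `Node`, `PhaseGVN` or `PhaseIterGVN` interfaces, and the scalar `InsertI` nodes are reduced to opaque `Param` leaves.

```cpp
// Toy IR, NOT HotSpot's node classes; every name here is hypothetical.
enum class Op { Param, VectorExtract, VectorInsert };

struct Node {
  Op op;
  int index;        // 128-bit chunk index (unused for Param)
  const Node* vec;  // vector input (nullptr for Param)
  const Node* val;  // inserted chunk (VectorInsert only)
};

// Identity-like rule: extracting chunk k from a vector that just had chunk k
// inserted yields the inserted value. No new node is created.
inline const Node* identity(const Node* n) {
  if (n->op == Op::VectorExtract && n->vec->op == Op::VectorInsert &&
      n->vec->index == n->index) {
    return n->vec->val;
  }
  return n;
}

// Ideal-like rule: an insert into chunk k on top of another insert into the
// same chunk overwrites it, so the outer insert can bypass the inner one.
// Returns true if the node was changed.
inline bool ideal(Node* n) {
  if (n->op == Op::VectorInsert && n->vec->op == Op::VectorInsert &&
      n->vec->index == n->index) {
    n->vec = n->vec->vec;
    return true;
  }
  return false;
}
```

Applied to the graph above, `identity` maps `v3 = VectorExtract(v0, 1)` to `v2`, and `ideal` rewrites `u = VectorInsert(v0, 1, v4)` into `VectorInsert(v, 1, v4)`. The first rule only returns an existing node, while the second mutates the graph, which is why the discussion treats it as needing an `Ideal`-style hook rather than `Identity`.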
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1818344051 From fyang at openjdk.org Mon Oct 28 04:21:54 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 28 Oct 2024 04:21:54 GMT Subject: RFR: 8343121: RISC-V: More accurate max size for C2SafepointPollStub and C2EntryBarrierStub Message-ID: Hi, please review this small change. The current max size of these two stubs is a bit overestimated and thus is more than needed. Since the `la`, `far_call` and `far_jump` assembler routines used by these two stubs will always emit 2 instructions for an address inside the code cache, we can make the max size more accurate. Testing on linux-riscv64 platform: - [x] tier1-tier3 (release) - [x] hotspot:tier1 (fastdebug) ------------- Commit messages: - 8343121: RISC-V: More accurate max size for C2SafepointPollStub and C2EntryBarrierStub Changes: https://git.openjdk.org/jdk/pull/21732/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21732&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343121 Stats: 7 lines in 1 file changed: 3 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21732.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21732/head:pull/21732 PR: https://git.openjdk.org/jdk/pull/21732 From fyang at openjdk.org Mon Oct 28 04:46:26 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 28 Oct 2024 04:46:26 GMT Subject: RFR: 8343122: RISC-V: C2: Small improvement for real runtime callouts Message-ID: <3AbaT2SwHVxQcQRu82L8CWzKBhhAxukYOMT5Bjgjt_c=.197639f3-dc6b-46ec-9ecd-82569e7eb074@github.com> Hi, please review this small improvement. Currently, we emit 11 instructions for real C2 runtime callouts (see riscv_enc_java_to_runtime). It seems we can materialize the pointer faster with `movptr2`, which will save 2 instructions. But we will need to reorder the original calling sequence a bit to make `t0` available for `movptr2`.
Testing on linux-riscv64 platform: - [x] tier1-tier3 (release) - [x] hotspot:tier1 (fastdebug) ------------- Commit messages: - 8343122: RISC-V: C2: Small improvement for real runtime callouts Changes: https://git.openjdk.org/jdk/pull/21733/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21733&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343122 Stats: 11 lines in 1 file changed: 4 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21733.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21733/head:pull/21733 PR: https://git.openjdk.org/jdk/pull/21733 From qamai at openjdk.org Mon Oct 28 04:59:23 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 28 Oct 2024 04:59:23 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: Message-ID: <_bIKE5NejQ9yFVAXMMUwDxREhXLJGQw-1U-V1KqC1xY=.1e3f43a1-3773-4151-862a-5404a1a084ff@github.com> On Mon, 28 Oct 2024 03:55:57 GMT, Jasmine Karthikeyan wrote: >> I mean we might want to run another kind of `Ideal` that will replace the normal `Ideal` on a node after its lowering. 
For example, consider this: >> >> vector v; >> u = v.withLane(0, a).withLane(1, b); >> >> This will be parsed into: >> >> vector v; >> v0 = InsertI(v, 4, a); >> u = InsertI(v0, 5, b); >> >> And can be lowered to: >> >> vector v; >> vector v1 = VectorExtract(v, 1); >> v2 = InsertI(v1, 0, a); >> v0 = VectorInsert(v, 1, v2); >> vector v3 = VectorExtract(v0, 1); >> v4 = InsertI(v3, 1, b); >> u = VectorInsert(v0, 1, v4); >> >> Which represents this sequence: >> >> ymm0; >> vextracti128 xmm1, ymm0, 1; >> vpinsrd xmm1, xmm1, a, 0; >> vinserti128 ymm0, ymm0, xmm1, 1; >> vextracti128 xmm1, ymm0, 1; >> vpinsrd xmm1, xmm1, b, 1; >> vinserti128 ymm0, ymm0, xmm1, 1; >> >> As you can imagine this sequence is pretty inefficient, what we really want is: >> >> ymm0; >> vextracti128 xmm1, ymm0, 1; >> vpinsrd xmm1, xmm1, a, 0; >> vpinsrd xmm1, xmm1, b, 1; >> vinserti128 ymm0, ymm0, xmm1, 1; >> >> Looking back at the graph, we can `Identity` `v3` into `v2` since it is pretty obvious that we just do an insert and extract from the same place. However, to transform `u = VectorInsert(v0, 1, v4)` into `u = VectorInsert(v, 1, v4)`, we would need an `Ideal`-like transformation to see that we just insert into a location twice and remove the intermediate `VectorInsert`. >> >> As a result, in addition to ease of implementation, I think you may extend `PhaseIterGVN` and override its `PhaseGVN::apply_ideal` to return `nullptr` for now, and take advantages of `PhaseIterGVN::optimize` to do the iterative transformation for you. > > Ah, I see what you mean now. I think this makes extending IGVN more appealing because we could continue to do Ideal on lowered nodes, as you mentioned. We could override `PhaseGVN::apply_ideal` to return `nullptr` when processing regular nodes, but run the other `Ideal` type when encountering lowered nodes. 
Do you think it would be better to add another method to `Node` or should we re-use the existing Ideal call, but lowering specific nodes are guarded with a new node flag? I think having a new method in `Node` would be more manageable, I can imagine it allows us to reuse pre-lowered nodes for lowering. The example I gave above we reuse `ExtractI` since the semantics is still the same, the only difference is that from here `ExtractI` can only appear with the index parameter being smaller than 4. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1818374892 From thartmann at openjdk.org Mon Oct 28 06:06:24 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 28 Oct 2024 06:06:24 GMT Subject: RFR: 8343070: Enable is_trace_align_vector when TraceSuperWord is set In-Reply-To: References: Message-ID: <1HVBc5Ltvtp7gQD7Z8_188r8LGxYzLOIJgYBrR4vcD0=.85240297-0f6f-41f2-a0d5-6317b2752e88@github.com> On Fri, 25 Oct 2024 14:45:33 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Currently, in SuperWord::filter_packs_for_alignment(), there is some log not turned on when TraceSuperWord is set, but I think it should, as it's more convenient for users to debug. > Thanks! Looks reasonable to me but @eme64 should have a look at this as well. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21715#pullrequestreview-2397996440 From thartmann at openjdk.org Mon Oct 28 06:24:15 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 28 Oct 2024 06:24:15 GMT Subject: RFR: 8343068: C2: CastX2P Ideal transformation not always applied [v2] In-Reply-To: References: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> Message-ID: On Fri, 25 Oct 2024 15:09:50 GMT, Roland Westrelin wrote: >> The transformation: >> >> >> (CastX2P (AddL base i)) -> (AddP (CastX2P base) i) >> >> >> when i fits in an int is not always applied: when the type of `i` is >> narrowed so it fits in an int, the `CastX2P` is not enqueued for >> igvn. This can get in the way of vectorization as shown by test case >> `test2`. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > fix test Shouldn't this be caught by `VerifyIterativeGVN` after [JDK-8298952 ](https://bugs.openjdk.org/browse/JDK-8298952)? ------------- PR Review: https://git.openjdk.org/jdk/pull/21714#pullrequestreview-2398042097 From epeter at openjdk.org Mon Oct 28 06:48:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Oct 2024 06:48:26 GMT Subject: RFR: 8343070: Enable is_trace_align_vector when TraceSuperWord is set In-Reply-To: References: Message-ID: <49U1Q5veOv_rsL4EfpG52Qnz2bOeNWUuULJMgqZKZOw=.20f25eab-428d-4d87-bdf7-073ab3b04237@github.com> On Fri, 25 Oct 2024 14:45:33 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Currently, in SuperWord::filter_packs_for_alignment(), there is some log not turned on when TraceSuperWord is set, but I think it should, as it's more convenient for users to debug. > Thanks! @Hamlin-Li There is a lot of tracing in SuperWord. You have very fine-grained control with `TraceAutoVectorization`, and in my opinion `TraceSuperWord` should just be a "summary", and quickly readable to get a hint. 
You can use `-XX:CompileCommand=TraceAutoVectorization,*::*,ALIGN_VECTOR`. If you do think we should add the align vector tracing into `TraceSuperWord`, then I would like to hear from you what other tracing you would add, and what not. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21715#issuecomment-2440685811 From epeter at openjdk.org Mon Oct 28 06:58:03 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Oct 2024 06:58:03 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate [v2] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 15:19:25 GMT, Roland Westrelin wrote: >> Superword creates a `Replicate` node at a `ConvL2I` node and uses the >> type of the result of the `ConvL2I` to pick the type of the >> `Replicate` instead of the type of the input to the `ConvL2I`. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Nice, thanks for the added comments! Do you know what JDK versions are affected? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21660#pullrequestreview-2398092168 From epeter at openjdk.org Mon Oct 28 06:59:10 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Oct 2024 06:59:10 GMT Subject: RFR: 8343070: Enable is_trace_align_vector when TraceSuperWord is set In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 14:45:33 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Currently, in SuperWord::filter_packs_for_alignment(), there is some log not turned on when TraceSuperWord is set, but I think it should, as it's more convenient for users to debug. > Thanks! BTW I'm not saying that tracing is handled optimally. I currently just have it working. And maybe eventually we can remove `TraceSuperWord` and only use `TraceAutoVectorization`. The grouping of tracing flags/tags can surely be adjusted. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21715#issuecomment-2440699559 From epeter at openjdk.org Mon Oct 28 07:12:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Oct 2024 07:12:21 GMT Subject: RFR: 8343070: Enable is_trace_align_vector when TraceSuperWord is set In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 14:45:33 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Currently, in SuperWord::filter_packs_for_alignment(), there is some log not turned on when TraceSuperWord is set, but I think it should, as it's more convenient for users to debug. > Thanks! Ah, one more thing. I try to keep `SW_INFO` and `TraceSuperWord` in sync. So if we do decide to add `ALIGN_VECTOR` to `TraceSuperWord`, we should also add it to `SW_INFO`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21715#issuecomment-2440716962 From jbhateja at openjdk.org Mon Oct 28 07:18:05 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 28 Oct 2024 07:18:05 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: Message-ID: On Sun, 27 Oct 2024 01:55:41 GMT, Jatin Bhateja wrote: >> Thanks for looking at the build changes, @magicus! I've pushed a commit that removes the extra newline in the makefiles and adds newlines to the ends of files that were missing them. >> >> Thanks for taking a look as well, @merykitty and @jatin-bhateja! I've pushed a commit that should address the code suggestions and left some comments as well. > >> @jaskarth thanks for exploring platform-specific lowering! >> >> I briefly looked through the changes, but I didn't get a good understanding of its design goals. It's hard to see what use cases it is targeted for when only skeleton code is present. It would really help if there are some cases ported on top of it for illustration purposes and to solidify the design. 
>> >> Currently, there are multiple places in the code where IR lowering happens. In particular: >> >> * IGVN during post loop opts phase (guarded by `Compile::post_loop_opts_phase()`) (Ideal -> Ideal); >> * macro expansion (Ideal -> Ideal); >> * ad-hoc passes (GC barrier expansion, `MacroLogicV`) (Ideal -> Ideal); >> * final graph reshaping (Ideal -> Ideal); >> * matcher (Ideal -> Mach). >> >> I'd like to understand how the new pass is intended to interact with existing cases. >> >> Only the last one is truly platform-specific, but there are some platform-specific cases exposes in other places (e.g., MacroLogicV pass, DivMod combining) guarded by some predicates on `Matcher`. >> >> As the `PhaseLowering` is implemented now, it looks like a platform-specific macro expansion pass (1->many rewriting). But `MacroLogicV` case doesn't fit such model well. >> > > > > >> I see changes to enable platform-specific node classes. As of now, only Matcher introduces platform-specific nodes and all of them are Mach nodes. Platform-specific Ideal nodes are declared in shared code, but then their usages are guarded by `Matcher::has_match_rule()` thus ensuring there's enough support on back-end side. >> >> Some random observations: >> >> * the pass is performed unconditionally and it iterates over all live nodes; in contrast, macro nodes and nodes for post-loop opts IGVN are explicitly listed on the side (MacroLogicV pass is also guarded, but by a coarse-grained check); > > MacroLogicV currently is a standalone pass and was not implemented as discrete Idealizations split across various logic IR Idealizations, our intention was to create a standalone pass and limit the exposure of such complex logic packing to rest of the code and guard it by target specific match_rule_supported check. > To fit this into new lowering interface, we may need to split root detection across various logic IR node accepted by PhaseLowering::lower_to(Node*). 
> > @jaskarth, @merykitty , can you share more use cases apart from VPMUL[U]DQ and MacroLogicV detection in support of new lowering pass. Please be aware that even after introducing lowering pass we stil... > @jatin-bhateja @iwanowww The application of lowering is very broad as it can help us perform arbitrary transformation as well as take advantage of GVN in the ideal world: > > 1, Any expansion that can benefit from GVN can be done in this pass. The first example is `ExtractXNode`s. Currently, it is expanded during code emission. An `int` extraction at the index 5 is currently expanded to: > > ``` > vextracti128 xmm1, ymm0, 1 > vpextrd eax, xmm1, 1 > ``` > > If we try to extract multiple elements then `vextracti128` would be needlessly emitted multiple times. By moving the expansion from code emission to lowering, we can do GVN and eliminate the redundant operations. For vector insertions, the situation is even worse, as it would be expanded into multiple instructions. For example, to construct a vector from 4 long values, we would have to: > > ``` > vpxor xmm0, xmm0, xmm0 > > vmovdqu xmm1, xmm0 > vpinsrq xmm1, xmm1, rax, 0 > vinserti128 ymm0, ymm0, xmm1, 0 > > vmovdqu xmm1, xmm0 > vpinsrq xmm1, xmm1, rcx, 1 > vinserti128 ymm0, ymm0, xmm1, 0 > > vextracti128 xmm1, ymm0, 1 > vpinsrq xmm1, xmm1, rdx, 0 > vinserti128 ymm0, ymm0, xmm1, 1 > > vextracti128 xmm1, ymm0, 1 > vpinsrq xmm1, xmm1, rbx, 1 > vinserti128 ymm0, ymm0, xmm1, 1 > ``` > > By moving the expansion to lowering we can have a much more efficient sequence: > > ``` > vmovq xmm0, rax > vpinsrq xmm0, xmm0, rcx, 1 > vmovq xmm1, rdx > vpinsrq xmm1, xmm1, rbx, 1 > vinserti128 ymm0, ymm0, xmm1, 1 > ``` > Hi @jaskarth Target specific IR complements the lowering pass, the example above very appropriately showcases the usefulness of lowering pass. For completeness we should extend this patch and add target specific extensions to "opto/classes.hpp" and a new Node.hpp' to record new target specific IR definitions.
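The sharing argument quoted above (two lane extractions from the same 128-bit half need only one `vextracti128`) can be illustrated with a minimal value-numbering sketch. This is only a toy model under stated assumptions: `Node`, `ValueNumbering`, and `lower_extract_int` are hypothetical names invented for illustration and are not HotSpot's `Node`, `PhaseGVN`, or the proposed `PhaseLowering` API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical miniature IR node; NOT HotSpot's Node class.
struct Node {
    std::string op;  // e.g. "VectorExtract", "ExtractI"
    int input;       // index of the producing node, -1 for a parameter
    int imm;         // immediate operand (half index or lane index)
};

// Toy hash-consing table: structurally identical nodes get the same index.
class ValueNumbering {
public:
    // Returns the index of an existing identical node, or appends a new one.
    int make(const std::string& op, int input, int imm) {
        auto key = std::make_tuple(op, input, imm);
        auto it = table_.find(key);
        if (it != table_.end()) {
            return it->second;  // reuse: this is the "GVN hit"
        }
        nodes_.push_back({op, input, imm});
        int id = static_cast<int>(nodes_.size()) - 1;
        table_.emplace(key, id);
        return id;
    }
    std::size_t size() const { return nodes_.size(); }
    const Node& at(int id) const { return nodes_[id]; }

private:
    std::vector<Node> nodes_;
    std::map<std::tuple<std::string, int, int>, int> table_;
};

// Lower "extract int lane `lane` of a 256-bit vector" into a 128-bit half
// extract (modelling vextracti128) plus a lane extract within that half
// (modelling vpextrd). Lane 5 becomes half 1, lane 1, as in the quoted example.
int lower_extract_int(ValueNumbering& vn, int vec, int lane) {
    int half = vn.make("VectorExtract", vec, lane / 4);
    return vn.make("ExtractI", half, lane % 4);
}
```

Lowering lanes 5 and 6 of the same vector through this table creates a single `VectorExtract` node that both `ExtractI` nodes share, which is the effect the quoted message expects from running GVN after lowering rather than expanding at code emission.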
Hi @merykitty , Lowering will also reduce register pressure since we may be able to save additional temporary machine operands by splitting monolithic instruction encoding blocks across multiple lowered IR nodes, this together with GVN promoted sharing should be very powerful. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2440720267 From chagedorn at openjdk.org Mon Oct 28 08:09:51 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Oct 2024 08:09:51 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v2] In-Reply-To: References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> Message-ID: On Thu, 24 Oct 2024 11:57:39 GMT, Christian Hagedorn wrote: >> #### Replacing the Remaining Predicate Walking and Cloning Code >> In the next series of patches (this, [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943), [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945), and [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946)) I want to replace the predicate walking and cloning code used for Loop Peeling, Pre/Main/Post Loops, Loop Unswitching, removing useless Assertion Predicates and Loop Unrolling with new `PredicateVisitors` which can be used in combination with the new `PredicateIterator`. >> >> #### Single Template Assertion Predicate Check >> This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). 
>> >> #### Common Refactorings for all the Patches in this Series >> In each of the patch, I will do similar refactoring ideas: >> - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. >> - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. >> - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. >> - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head (see **P1** PR comment). >> - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates (see **P2** PR comment). This limitation should eventually be removed. But I want to do that separately at a later point. >> >> #### Refactorings of this Patch >> This first patch replaces the predicate walking and cloning code for Loop Peeling and lays the foundation for the replacement for main/post loops ([JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342... 
> > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > small update src/hotspot/share/opto/loopTransform.cpp line 1994: > 1992: _igvn.replace_input_of(target_outer_loop_head, LoopNode::EntryControl, last_created_predicate_success_proj); > 1993: set_idom(target_outer_loop_head, last_created_predicate_success_proj, dom_depth(target_outer_loop_head)); > 1994: } **P1** (see PR description) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1818537592 From rrich at openjdk.org Mon Oct 28 08:24:44 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 28 Oct 2024 08:24:44 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v6] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 17:21:21 GMT, Martin Doerr wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
>> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Enable on MUSL. Looks good, Richard. ------------- Marked as reviewed by rrich (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21615#pullrequestreview-2398239564 From thartmann at openjdk.org Mon Oct 28 08:35:53 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 28 Oct 2024 08:35:53 GMT Subject: RFR: 8335977: Deoptimization fails with assert "object should be reallocated already" In-Reply-To: References: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> <9wsyRJSf120rDZVr39RLRateAT7_JNByxLIdVUx9sgo=.a8f4f772-802b-4ced-878a-3d6c7ea5026d@github.com> Message-ID: On Fri, 25 Oct 2024 19:33:28 GMT, Cesar Soares Lucas wrote: > The test I added in this PR is based on @chhagedorn Test3.java. I was able to reproduce the issue on my end fairly easily. Ah right, all good then. Please add @chhagedorn as co-contributor since he extracted that test. > I wasn't able to reproduce the issue using Test.java, how often does it reproduce for you with the flags that you listed at the top of Test.java ? Right, I just tried and it does not seem to reproduce anymore. Too bad but let's leave it out then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21624#issuecomment-2440865472 From mli at openjdk.org Mon Oct 28 10:05:32 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 28 Oct 2024 10:05:32 GMT Subject: RFR: 8343070: Enable is_trace_align_vector when TraceSuperWord is set In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 07:08:39 GMT, Emanuel Peter wrote: >> Hi, >> Can you help to review this simple patch? >> Currently, in SuperWord::filter_packs_for_alignment(), there is some log not turned on when TraceSuperWord is set, but I think it should, as it's more convenient for users to debug. >> Thanks! > > Ah, one more thing. I try to keep `SW_INFO` and `TraceSuperWord` in sync. So if we do decide to add `ALIGN_VECTOR` to `TraceSuperWord`, we should also add it to `SW_INFO`. > Looks reasonable to me but @eme64 should have a look at this as well. > > /reviewers 2 Thanks @TobiHartmann ! 
Thanks @eme64 for discussion! :) I'm on vacation, will get back to you when I'm back later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21715#issuecomment-2441121074 PR Comment: https://git.openjdk.org/jdk/pull/21715#issuecomment-2441123459 From dlunden at openjdk.org Mon Oct 28 10:19:50 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 28 Oct 2024 10:19:50 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v6] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 14:06:18 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning.
A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - Remove leftover debug var > - Update > - Merge tag 'jdk-24+16' into HEAD > > Added tag jdk-24+16 for changeset c58fbef0 > - Formatting updates > - Update > - Update after Roberto's comments and suggestions > - Add can_represent asserts > - Remove leftover CHUNK_SIZE reference > - Support methods with many arguments in C2 New comment to avoid automatic closing.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2441152487 From epeter at openjdk.org Mon Oct 28 10:28:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Oct 2024 10:28:24 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v2] In-Reply-To: References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> Message-ID: On Thu, 24 Oct 2024 11:57:39 GMT, Christian Hagedorn wrote: >> #### Replacing the Remaining Predicate Walking and Cloning Code >> In the next series of patches (this, [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943), [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945), and [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946)) I want to replace the predicate walking and cloning code used for Loop Peeling, Pre/Main/Post Loops, Loop Unswitching, removing useless Assertion Predicates and Loop Unrolling with new `PredicateVisitors` which can be used in combination with the new `PredicateIterator`. >> >> #### Single Template Assertion Predicate Check >> This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). >> >> #### Common Refactorings for all the Patches in this Series >> In each of the patch, I will do similar refactoring ideas: >> - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. 
>> - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. >> - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. >> - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head (see **P1** PR comment). >> - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates (see **P2** PR comment). This limitation should eventually be removed. But I want to do that separately at a later point. >> >> #### Refactorings of this Patch >> This first patch replaces the predicate walking and cloning code for Loop Peeling and lays the foundation for the replacement for main/post loops ([JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > small update Generally looks reasonable. A little hard to review without digging much deeper into the code. I have some questions/suggestions already. src/hotspot/share/opto/loopTransform.cpp line 819: > 817: if (counted_loop && UseLoopPredicate) { > 818: initialize_assertion_predicates_for_peeled_loop(new_head->as_CountedLoop(), head->as_CountedLoop(), > 819: first_node_index_in_cloned_loop_body, old_new); Suggestion: initialize_assertion_predicates_for_peeled_loop(new_head->as_CountedLoop(), head->as_CountedLoop(), first_node_index_in_cloned_loop_body, old_new); Indentation seems off. 
src/hotspot/share/opto/loopTransform.cpp line 1989: > 1987: Node* source_loop_entry = source_loop_head->skip_strip_mined()->in(LoopNode::EntryControl); > 1988: PredicateIterator predicate_iterator(source_loop_entry); > 1989: predicate_iterator.for_each(assertion_predicates_for_loop); The name `assertion_predicates_for_loop` does not tell me what this would do, when applied with the `for_each` src/hotspot/share/opto/predicates.cpp line 736: > 734: if (deopt_reason == Deoptimization::Reason_predicate || > 735: deopt_reason == Deoptimization::Reason_profile_predicate) { > 736: _current_parse_predicate = parse_predicate.tail(); We set it here. But do we need to unset it again for later predicate blocks? src/hotspot/share/opto/predicates.hpp line 946: > 944: // Visitor to create Initialized Assertion Predicates at a target loop from Template Assertion Predicates from a source > 945: // loop. This visitor can be used in combination with a PredicateIterator. > 946: class AssertionPredicatesForLoop : public PredicateVisitor { I think this could have a more expressive name. It is a Visitor... hmm Maybe `InitializedAssertionPredicatesFromTemplatesCreator`? Hmm not sure. The current name suggests that it is just a collection of `AssertionPredicates`. src/hotspot/share/opto/predicates.hpp line 952: > 950: Node* _new_control; > 951: PhaseIdealLoop* const _phase; > 952: ParsePredicateSuccessProj* _current_parse_predicate; It looks to me like this could be a boolean, correct? src/hotspot/share/opto/predicates.hpp line 967: > 965: NONCOPYABLE(AssertionPredicatesForLoop); > 966: > 967: using PredicateVisitor::visit; What does this do? 
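For context on the `using PredicateVisitor::visit;` question: in C++ generally, declaring any `visit` overload in a derived class hides all base-class overloads of that name, and a using-declaration brings them back into scope. Whether that is the motivation in `AssertionPredicatesForLoop` is an assumption, since the thread does not confirm it. The sketch below uses hypothetical stand-in types (`ParsePredicate`, `TemplateAssertionPredicate`, `Visitor`, `WithUsing`), not the real classes from `predicates.hpp`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for predicate kinds; NOT the real HotSpot classes.
struct ParsePredicate {};
struct TemplateAssertionPredicate {};

// Base visitor with one default visit() overload per predicate kind.
struct Visitor {
    virtual ~Visitor() = default;
    virtual std::string visit(const ParsePredicate&) { return "base:parse"; }
    virtual std::string visit(const TemplateAssertionPredicate&) { return "base:template"; }
};

// Derived visitor that overrides only one overload.
struct WithUsing : Visitor {
    // Without this line, the visit() declared below would hide every base
    // overload, and a call with a ParsePredicate on a WithUsing object
    // would fail to compile.
    using Visitor::visit;

    std::string visit(const TemplateAssertionPredicate&) override {
        return "derived:template";
    }
};
```

With the using-declaration in place, a `WithUsing` object can be called with either predicate kind: the overridden overload runs the derived code and the other falls through to the base default.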
------------- PR Review: https://git.openjdk.org/jdk/pull/21679#pullrequestreview-2398539538 PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1818725177 PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1818781439 PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1818772147 PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1818760936 PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1818767972 PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1818745986 From epeter at openjdk.org Mon Oct 28 10:28:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Oct 2024 10:28:25 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v2] In-Reply-To: References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> Message-ID: On Mon, 28 Oct 2024 10:11:04 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> small update > > src/hotspot/share/opto/predicates.hpp line 952: > >> 950: Node* _new_control; >> 951: PhaseIdealLoop* const _phase; >> 952: ParsePredicateSuccessProj* _current_parse_predicate; > > It looks to me like this could be a boolean, correct? 
Then the name could also be more descriptive ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1818769211 From chagedorn at openjdk.org Mon Oct 28 10:34:28 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Oct 2024 10:34:28 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v2] In-Reply-To: References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> Message-ID: On Mon, 28 Oct 2024 10:06:33 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> small update > > src/hotspot/share/opto/predicates.hpp line 946: > >> 944: // Visitor to create Initialized Assertion Predicates at a target loop from Template Assertion Predicates from a source >> 945: // loop. This visitor can be used in combination with a PredicateIterator. >> 946: class AssertionPredicatesForLoop : public PredicateVisitor { > > I think this could have a more expressive name. It is a Visitor... hmm > Maybe `InitializedAssertionPredicatesFromTemplatesCreator`? Hmm not sure. > > The current name suggests that it is just a collection of `AssertionPredicates`. As discussed offline, maybe `CreateAssertionPredicatesVisitor`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1818802276 From mdoerr at openjdk.org Mon Oct 28 10:39:03 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 28 Oct 2024 10:39:03 GMT Subject: RFR: 8342962: [s390x] TestOSRLotsOfLocals.java crashes [v3] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 15:10:20 GMT, Amit Kumar wrote: >> We are on thin ice with the `TestOSRLotsOfLocals` test. Before it breaks, I'd like to provide the fix. :-) >> >> Testing : Tier1 test with fastdebug vm. 
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > more comments from lutz Looks correct to me. I suggest testing it by setting `large_offset` to true. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21703#pullrequestreview-2398683280 From amitkumar at openjdk.org Mon Oct 28 11:23:02 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 28 Oct 2024 11:23:02 GMT Subject: RFR: 8342489: compiler/c2/irTests/TestVectorizationMismatchedAccess.java fails on big-endian platforms Message-ID: Adjust TestVectorizationMismatchedAccess.java for Big Endian Platforms ------------- Commit messages: - minimize the diff - Adjustment for BE platforms Changes: https://git.openjdk.org/jdk/pull/21736/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21736&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342489 Stats: 31 lines in 1 file changed: 8 ins; 3 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/21736.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21736/head:pull/21736 PR: https://git.openjdk.org/jdk/pull/21736 From amitkumar at openjdk.org Mon Oct 28 11:23:02 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 28 Oct 2024 11:23:02 GMT Subject: RFR: 8342489: compiler/c2/irTests/TestVectorizationMismatchedAccess.java fails on big-endian platforms In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 10:09:13 GMT, Amit Kumar wrote: > Adjust TestVectorizationMismatchedAccess.java for Big Endian Platforms @MBaesken can you test it once ? It passes on s390x now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21736#issuecomment-2441143767 From mbaesken at openjdk.org Mon Oct 28 11:29:46 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 28 Oct 2024 11:29:46 GMT Subject: RFR: 8342489: compiler/c2/irTests/TestVectorizationMismatchedAccess.java fails on big-endian platforms In-Reply-To: References: Message-ID: <5RGkOmbIE_u6JbIKndKlJmpl7xUd5q59JYkmWRiL5fg=.b6041cc6-e206-4658-b550-a4b57d40674b@github.com> On Mon, 28 Oct 2024 10:09:13 GMT, Amit Kumar wrote: > Adjust TestVectorizationMismatchedAccess.java for Big Endian Platforms please remove the exclusion too (from https://bugs.openjdk.org/browse/JDK-8343055) ; I put this then into our nightly build/test queue and tell you afterwards if all is fine . ------------- PR Comment: https://git.openjdk.org/jdk/pull/21736#issuecomment-2441321473 From dnsimon at openjdk.org Mon Oct 28 12:25:04 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 28 Oct 2024 12:25:04 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses [v3] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 15:02:27 GMT, Yudi Zheng wrote: >> https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class metaspace. JVMCI has to respect this and avoids compressing abstract and interface Klasses > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Fix JIT error. Still looks good. Marked as reviewed by dnsimon (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20949#pullrequestreview-2398921613 PR Review: https://git.openjdk.org/jdk/pull/20949#pullrequestreview-2398921934 From yzheng at openjdk.org Mon Oct 28 12:40:23 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 28 Oct 2024 12:40:23 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses [v3] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 15:02:27 GMT, Yudi Zheng wrote: >> https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class metaspace. JVMCI has to respect this and avoids compressing abstract and interface Klasses > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Fix JIT error. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20949#issuecomment-2441475031 From yzheng at openjdk.org Mon Oct 28 12:42:52 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 28 Oct 2024 12:42:52 GMT Subject: Integrated: 8339939: [JVMCI] Don't compress abstract and interface Klasses In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:09:07 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class metaspace. JVMCI has to respect this and avoids compressing abstract and interface Klasses This pull request has now been integrated. 
Changeset: d5fb6b4a Author: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/d5fb6b4a3cf4926acb333e7ee55f96fc76225631 Stats: 72 lines in 7 files changed: 64 ins; 0 del; 8 mod 8339939: [JVMCI] Don't compress abstract and interface Klasses Co-authored-by: Doug Simon Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/20949 From chagedorn at openjdk.org Mon Oct 28 12:48:07 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Oct 2024 12:48:07 GMT Subject: RFR: 8343137: C2: VerifyLoopOptimizations fails with "Was reachable in only one" Message-ID: [JDK-8342043](https://bugs.openjdk.org/browse/JDK-8342043) introduced a new usage of `igvn.intcon()` but missed to call `set_ctrl()`. This simple patch fixes this. While working on this, I've noticed that a few Assertion Predicates tests ended up in the `predicates` directory while they would have better been placed into `predicates/assertion`. It's probably not worth to file a separate issue just for that, so I've squeezed this trivial move into this fix. Note that `AssertionPredicateDoesntConstantFold` does not define a package and thus does not need an update to the command line. 
Thanks, Christian ------------- Commit messages: - 8343137: C2: VerifyLoopOptimizations fails with "Was reachable in only one" Changes: https://git.openjdk.org/jdk/pull/21739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21739&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343137 Stats: 234 lines in 6 files changed: 145 ins; 85 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21739/head:pull/21739 PR: https://git.openjdk.org/jdk/pull/21739 From chagedorn at openjdk.org Mon Oct 28 12:48:08 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Oct 2024 12:48:08 GMT Subject: RFR: 8343137: C2: VerifyLoopOptimizations fails with "Was reachable in only one" In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 12:41:02 GMT, Christian Hagedorn wrote: > [JDK-8342043](https://bugs.openjdk.org/browse/JDK-8342043) introduced a new usage of `igvn.intcon()` but missed to call `set_ctrl()`. This simple patch fixes this. > > While working on this, I've noticed that a few Assertion Predicates tests ended up in the `predicates` directory while they would have better been placed into `predicates/assertion`. It's probably not worth to file a separate issue just for that, so I've squeezed this trivial move into this fix. Note that `AssertionPredicateDoesntConstantFold` does not define a package and thus does not need an update to the command line. > > Thanks, > Christian src/hotspot/share/opto/loopnode.cpp line 4492: > 4490: if (!useful_predicates.member(opaque_node)) { // not in the useful list > 4491: ConINode* one = _igvn.intcon(1); > 4492: set_ctrl(one, C->root()); Noticed that we find this pattern quite often in our code. Would be nice to have a `PhaseIdealLoop::intcon()` which calls `igvn.intcon()` and takes care of setting ctrl. I filed [JDK-8343148](https://bugs.openjdk.org/browse/JDK-8343148) to keep track of that. 
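A toy model of the helper idea above, to illustrate why a combined call is harder to misuse. This is not HotSpot code: `Node`, `IgvnSketch`, `LoopPhaseSketch`, and `has_ctrl()` are hypothetical stand-ins, and the real `PhaseIdealLoop::intcon()` tracked in JDK-8343148 may look different.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not the HotSpot classes.
struct Node {
  int value;
};

// Plays the role of the IGVN phase: it creates constants but knows nothing about control.
struct IgvnSketch {
  std::vector<std::unique_ptr<Node>> nodes;

  Node* intcon(int v) {
    nodes.push_back(std::make_unique<Node>(Node{v}));
    return nodes.back().get();
  }
};

// Plays the role of the loop phase, which additionally records a control node per data node.
struct LoopPhaseSketch {
  IgvnSketch igvn;
  Node root{0};
  std::unordered_map<const Node*, const Node*> ctrl;

  void set_ctrl(const Node* n, const Node* c) { ctrl[n] = c; }
  bool has_ctrl(const Node* n) const { return ctrl.count(n) != 0; }

  // The convenience wrapper: create the constant and pin it to root in one step,
  // so a caller cannot forget the second half.
  Node* intcon(int v) {
    Node* con = igvn.intcon(v);
    set_ctrl(con, &root);
    return con;
  }
};
```

In this model, a constant obtained directly from `igvn.intcon()` has no control entry until `set_ctrl()` is called, which mirrors the omission fixed by this patch; a constant obtained from the wrapper always has one.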
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21739#discussion_r1818992863 From thartmann at openjdk.org Mon Oct 28 13:08:28 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 28 Oct 2024 13:08:28 GMT Subject: RFR: 8343137: C2: VerifyLoopOptimizations fails with "Was reachable in only one" In-Reply-To: References: Message-ID: <4CjB3vmy6acU9gaylkEvuNrk5D1D7X6U-M38Pu5NCwQ=.ea0ea2fb-a01f-4a20-876b-62acb1cb073a@github.com> On Mon, 28 Oct 2024 12:41:02 GMT, Christian Hagedorn wrote: > [JDK-8342043](https://bugs.openjdk.org/browse/JDK-8342043) introduced a new usage of `igvn.intcon()` but missed to call `set_ctrl()`. This simple patch fixes this. > > While working on this, I've noticed that a few Assertion Predicates tests ended up in the `predicates` directory while they would have better been placed into `predicates/assertion`. It's probably not worth to file a separate issue just for that, so I've squeezed this trivial move into this fix. Note that `AssertionPredicateDoesntConstantFold` does not define a package and thus does not need an update to the command line. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21739#pullrequestreview-2399023435 From stuefe at openjdk.org Mon Oct 28 13:09:01 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 28 Oct 2024 13:09:01 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v6] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 17:21:21 GMT, Martin Doerr wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. 
>> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) >> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Enable on MUSL. Small nit. But thank you for fixing the assertion poison mechanism too. I'll approve now and leave it up to you if you take my suggestion. ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21615#pullrequestreview-2399017612 From stuefe at openjdk.org Mon Oct 28 13:09:04 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 28 Oct 2024 13:09:04 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v4] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 11:01:42 GMT, Martin Doerr wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) >> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 0x0202020202020202 
>> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Check uc->uc_mcontext.fpregs sanity. src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp line 551: > 549: st->cr(); > 550: st->cr(); > 551: size_t fpregs_offset = pointer_delta(uc->uc_mcontext.fpregs, uc, 1); Could the register substructure live outside uc on x64? If so, it may be safer to Suggestion: size_t fpregs_offset = (uc->uc_mcontext.fpregs >= uc) ? pointer_delta(uc->uc_mcontext.fpregs, uc, 1) : 0; or similar, since the register substructure may precede uc which would make pointer_delta assert. src/hotspot/share/utilities/debug.cpp line 735: > 733: #elif defined(AMD64) > 734: // In the copied version, fpregs should point to the copied contents. Preserve the offset. 
> 735: size_t fpregs_offset = pointer_delta(((const ucontext_t*)context)->uc_mcontext.fpregs, context, 1); Interesting note, above I see we do similar adjustments for ppc (https://github.com/openjdk/jdk/commit/3e603a776ea9c5642cb0ec6b9105c6ff34b8f2b1) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21615#discussion_r1819022246 PR Review Comment: https://git.openjdk.org/jdk/pull/21615#discussion_r1819024835 From chagedorn at openjdk.org Mon Oct 28 13:10:38 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Oct 2024 13:10:38 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v3] In-Reply-To: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> Message-ID: <0oDHLii2_SYYCp_AdTQep5QxIImEBY4URCcWGA7Db_o=.fbd870e0-6612-42be-9244-96872554c3a6@github.com> > #### Replacing the Remaining Predicate Walking and Cloning Code > In the next series of patches (this, [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943), [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945), and [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946)) I want to replace the predicate walking and cloning code used for Loop Peeling, Pre/Main/Post Loops, Loop Unswitching, removing useless Assertion Predicates and Loop Unrolling with new `PredicateVisitors` which can be used in combination with the new `PredicateIterator`. > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. 
This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head (see **P1** PR comment). > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates (see **P2** PR comment). This limitation should eventually be removed. But I want to do that separately at a later point. > > #### Refactorings of this Patch > This first patch replaces the predicate walking and cloning code for Loop Peeling and lays the foundation for the replacement for main/post loops ([JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943)) which is quite similar. Th... 
Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Review Emanuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21679/files - new: https://git.openjdk.org/jdk/pull/21679/files/eb22d38e..79a59130 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21679&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21679&range=01-02 Stats: 19 lines in 3 files changed: 1 ins; 0 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/21679.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21679/head:pull/21679 PR: https://git.openjdk.org/jdk/pull/21679 From chagedorn at openjdk.org Mon Oct 28 13:10:42 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Oct 2024 13:10:42 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v2] In-Reply-To: References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> Message-ID: On Thu, 24 Oct 2024 11:57:39 GMT, Christian Hagedorn wrote: >> #### Replacing the Remaining Predicate Walking and Cloning Code >> In the next series of patches (this, [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943), [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945), and [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946)) I want to replace the predicate walking and cloning code used for Loop Peeling, Pre/Main/Post Loops, Loop Unswitching, removing useless Assertion Predicates and Loop Unrolling with new `PredicateVisitors` which can be used in combination with the new `PredicateIterator`. >> >> #### Single Template Assertion Predicate Check >> This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. 
This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). >> >> #### Common Refactorings for all the Patches in this Series >> In each of the patch, I will do similar refactoring ideas: >> - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. >> - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. >> - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. >> - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head (see **P1** PR comment). >> - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates (see **P2** PR comment). This limitation should eventually be removed. But I want to do that separately at a later point. >> >> #### Refactorings of this Patch >> This first patch replaces the predicate walking and cloning code for Loop Peeling and lays the foundation for the replacement for main/post loops ([JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > small update Thanks Emanuel for your review! I've addressed you comments in a new commit. 
------------- PR Review: https://git.openjdk.org/jdk/pull/21679#pullrequestreview-2399017668 From chagedorn at openjdk.org Mon Oct 28 13:10:43 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Oct 2024 13:10:43 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v2] In-Reply-To: References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> Message-ID: <9FW4puDFAlxIZo1SjQQ0nVMTeoMfvMK5dnulBU8hnzU=.dbd4d818-bd90-4d78-b4a2-6db380d49b2f@github.com> On Mon, 28 Oct 2024 10:11:21 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/predicates.hpp line 952: >> >>> 950: Node* _new_control; >>> 951: PhaseIdealLoop* const _phase; >>> 952: ParsePredicateSuccessProj* _current_parse_predicate; >> >> It looks to me like this could be a boolean, correct? > > Then the name could also be more descriptive I changed it into a `bool`. The idea is to simulate the old behavior that we only create the Assertion Predicates if there are Parse Predicates available. I don't think this is generally correct - we could still split loops later when Parse Predicates have already been removed. But I want to fix this at other places as well and thus defer this to a separate change later. 
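The behavior described above can be modelled with a very small visitor. This is a simplified sketch with hypothetical names (`PredicateKind`, `VisitorSketch`, `count_created`), not the interface in `predicates.hpp`, and it assumes that a Parse Predicate is visited before the Template Assertion Predicates it enables.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model; the real PredicateVisitor interface differs.
enum class PredicateKind { Parse, TemplateAssertion };

struct VisitorSketch {
  // A flag is enough when the visitor only needs to know that a Parse Predicate was seen.
  bool has_parse_predicates = false;
  // Counts how many Assertion Predicates this model would have created.
  int created = 0;

  void visit(PredicateKind kind) {
    if (kind == PredicateKind::Parse) {
      has_parse_predicates = true;
    } else if (has_parse_predicates) {
      ++created;  // templates are only processed once a Parse Predicate was seen
    }
  }
};

// Applies the visitor to every predicate in order and reports how many were "created".
inline int count_created(const std::vector<PredicateKind>& predicates) {
  VisitorSketch visitor;
  for (PredicateKind kind : predicates) {
    visitor.visit(kind);
  }
  return visitor.created;
}
```

With no Parse Predicate in the sequence the model creates nothing, which is the old behavior the flag is meant to preserve for now.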
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1819024953 From chagedorn at openjdk.org Mon Oct 28 13:10:43 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Oct 2024 13:10:43 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v2] In-Reply-To: References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> Message-ID: On Mon, 28 Oct 2024 09:57:20 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> small update > > src/hotspot/share/opto/predicates.hpp line 967: > >> 965: NONCOPYABLE(AssertionPredicatesForLoop); >> 966: >> 967: using PredicateVisitor::visit; > > What does this do? See explanation here: https://github.com/openjdk/jdk/pull/21161#discussion_r1780565499 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1819022262 From chagedorn at openjdk.org Mon Oct 28 13:19:27 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Oct 2024 13:19:27 GMT Subject: RFR: 8343137: C2: VerifyLoopOptimizations fails with "Was reachable in only one" In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 12:41:02 GMT, Christian Hagedorn wrote: > [JDK-8342043](https://bugs.openjdk.org/browse/JDK-8342043) introduced a new usage of `igvn.intcon()` but missed to call `set_ctrl()`. This simple patch fixes this. > > While working on this, I've noticed that a few Assertion Predicates tests ended up in the `predicates` directory while they would have better been placed into `predicates/assertion`. It's probably not worth to file a separate issue just for that, so I've squeezed this trivial move into this fix. Note that `AssertionPredicateDoesntConstantFold` does not define a package and thus does not need an update to the command line. 
> > Thanks, > Christian Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21739#issuecomment-2441559749 From epeter at openjdk.org Mon Oct 28 13:19:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Oct 2024 13:19:29 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v3] In-Reply-To: <0oDHLii2_SYYCp_AdTQep5QxIImEBY4URCcWGA7Db_o=.fbd870e0-6612-42be-9244-96872554c3a6@github.com> References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> <0oDHLii2_SYYCp_AdTQep5QxIImEBY4URCcWGA7Db_o=.fbd870e0-6612-42be-9244-96872554c3a6@github.com> Message-ID: On Mon, 28 Oct 2024 13:10:38 GMT, Christian Hagedorn wrote: >> #### Replacing the Remaining Predicate Walking and Cloning Code >> In the next series of patches (this, [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943), [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945), and [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946)) I want to replace the predicate walking and cloning code used for Loop Peeling, Pre/Main/Post Loops, Loop Unswitching, removing useless Assertion Predicates and Loop Unrolling with new `PredicateVisitors` which can be used in combination with the new `PredicateIterator`. >> >> #### Single Template Assertion Predicate Check >> This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). 
>> >> #### Common Refactorings for all the Patches in this Series >> In each of the patch, I will do similar refactoring ideas: >> - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. >> - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. >> - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. >> - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head (see **P1** PR comment). >> - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates (see **P2** PR comment). This limitation should eventually be removed. But I want to do that separately at a later point. >> >> #### Refactorings of this Patch >> This first patch replaces the predicate walking and cloning code for Loop Peeling and lays the foundation for the replacement for main/post loops ([JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342... 
> > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Review Emanuel src/hotspot/share/opto/loopTransform.cpp line 1991: > 1989: predicate_iterator.for_each(create_assertion_predicates_for_loop); > 1990: if (create_assertion_predicates_for_loop.has_created_predicates()) { > 1991: IfTrueNode* last_created_predicate_success_proj = create_assertion_predicates_for_loop.last_created_success_proj(); Suggestion: CreateAssertionPredicatesVisitor create_assertion_predicates_visitor(init, stride, target_loop_entry, this, _node_in_loop_body); Node* source_loop_entry = source_loop_head->skip_strip_mined()->in(LoopNode::EntryControl); PredicateIterator predicate_iterator(source_loop_entry); predicate_iterator.for_each(create_assertion_predicates_visitor); if (create_assertion_predicates_visitor.has_created_predicates()) { IfTrueNode* last_created_predicate_success_proj = create_assertion_predicates_for_loop.last_created_success_proj(); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1819037886 From epeter at openjdk.org Mon Oct 28 13:19:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Oct 2024 13:19:29 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v3] In-Reply-To: References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> <0oDHLii2_SYYCp_AdTQep5QxIImEBY4URCcWGA7Db_o=.fbd870e0-6612-42be-9244-96872554c3a6@github.com> Message-ID: On Mon, 28 Oct 2024 13:14:29 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Review Emanuel > > src/hotspot/share/opto/loopTransform.cpp line 1991: > >> 1989: predicate_iterator.for_each(create_assertion_predicates_for_loop); >> 1990: if (create_assertion_predicates_for_loop.has_created_predicates()) { >> 1991: IfTrueNode* 
last_created_predicate_success_proj = create_assertion_predicates_for_loop.last_created_success_proj(); > > Suggestion: > > CreateAssertionPredicatesVisitor create_assertion_predicates_visitor(init, stride, target_loop_entry, this, > _node_in_loop_body); > Node* source_loop_entry = source_loop_head->skip_strip_mined()->in(LoopNode::EntryControl); > PredicateIterator predicate_iterator(source_loop_entry); > predicate_iterator.for_each(create_assertion_predicates_visitor); > if (create_assertion_predicates_visitor.has_created_predicates()) { > IfTrueNode* last_created_predicate_success_proj = create_assertion_predicates_for_loop.last_created_success_proj(); I think this would complete the renaming ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1819038342 From mdoerr at openjdk.org Mon Oct 28 13:45:26 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 28 Oct 2024 13:45:26 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v6] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 17:21:21 GMT, Martin Doerr wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
>> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Enable on MUSL. Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2441625303 From mdoerr at openjdk.org Mon Oct 28 13:45:26 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 28 Oct 2024 13:45:26 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v4] In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 13:03:42 GMT, Thomas Stuefe wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Check uc->uc_mcontext.fpregs sanity. > > src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp line 551: > >> 549: st->cr(); >> 550: st->cr(); >> 551: size_t fpregs_offset = pointer_delta(uc->uc_mcontext.fpregs, uc, 1); > > Could the register substructure live outside uc on x64? If so, it may be safer to > Suggestion: > > size_t fpregs_offset = (uc->uc_mcontext.fpregs >= uc) ? pointer_delta(uc->uc_mcontext.fpregs, uc, 1) : 0; > > or similar, since the register substructure may precede uc which would make pointer_delta assert. I think using 0 would require more changes to avoid accessing uc+0, which would be wrong. Richard and I already discussed this above. We think it's acceptable. Some other projects claim that the FP register substructure is inside the uc: https://github.com/mono/mono/blob/0f53e9e151d92944cacab3e24ac359410c606df6/mono/utils/mono-sigcontext.h#L263 We could also check the kernel code which writes it.
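For illustration only, the guard under discussion can be sketched outside HotSpot as below. `byte_delta` and `offset_within` are hypothetical helpers invented for this note, not names from the PR; `byte_delta` stands in for `pointer_delta(left, right, 1)` and, like it, asserts that `left` does not precede `right`. The sketch assumes the caller only wants an offset when the inner pointer starts inside the outer structure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for pointer_delta(left, right, 1): the byte distance
// from right to left. Only valid when left does not precede right.
static size_t byte_delta(const void* left, const void* right) {
  const uintptr_t l = reinterpret_cast<uintptr_t>(left);
  const uintptr_t r = reinterpret_cast<uintptr_t>(right);
  assert(l >= r && "left must not precede right");
  return static_cast<size_t>(l - r);
}

// Returns true and stores the byte offset only if inner starts inside
// [outer, outer + outer_size). A null or preceding inner pointer is rejected
// before byte_delta is called, so its assert can never fire here.
static bool offset_within(const void* outer, size_t outer_size,
                          const void* inner, size_t* offset) {
  const uintptr_t o = reinterpret_cast<uintptr_t>(outer);
  const uintptr_t i = reinterpret_cast<uintptr_t>(inner);
  if (inner == nullptr || i < o) {
    return false;
  }
  const size_t delta = byte_delta(inner, outer);
  if (delta >= outer_size) {
    return false;
  }
  *offset = delta;
  return true;
}
```

Returning a separate success flag, rather than a sentinel offset such as 0, keeps a legitimate offset of 0 distinguishable from the "outside the structure" case.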
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21615#discussion_r1819079823 From chagedorn at openjdk.org Mon Oct 28 13:46:42 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Oct 2024 13:46:42 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v4] In-Reply-To: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> Message-ID: > #### Replacing the Remaining Predicate Walking and Cloning Code > In the next series of patches (this, [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943), [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945), and [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946)) I want to replace the predicate walking and cloning code used for Loop Peeling, Pre/Main/Post Loops, Loop Unswitching, removing useless Assertion Predicates and Loop Unrolling with new `PredicateVisitors` which can be used in combination with the new `PredicateIterator`. > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. 
> - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head (see **P1** PR comment). > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates (see **P2** PR comment). This limitation should eventually be removed. But I want to do that separately at a later point. > > #### Refactorings of this Patch > This first patch replaces the predicate walking and cloning code for Loop Peeling and lays the foundation for the replacement for main/post loops ([JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943)) which is quite similar. Th... 
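As a rough illustration of the iterator-plus-visitor shape described in the list above, here is a minimal, self-contained sketch. It is not HotSpot code: the `Predicate` struct, the `PredicateKind` enum, `CloneTemplatesVisitor`, and the string "clones" are all assumptions made up for this example, and only the names `PredicateVisitor`, `PredicateIterator`, `for_each`, and `has_created_predicates` are borrowed from the discussion.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the predicates found at a loop entry.
enum class PredicateKind { Parse, TemplateAssertion };

struct Predicate {
  PredicateKind kind;
  std::string name;
};

// Visitor interface: one hook per predicate kind, no-ops by default.
class PredicateVisitor {
 public:
  virtual ~PredicateVisitor() = default;
  virtual void visit_parse(const Predicate&) {}
  virtual void visit_template_assertion(const Predicate&) {}
};

// Walks all predicates once and dispatches each to the matching hook.
class PredicateIterator {
 public:
  explicit PredicateIterator(std::vector<Predicate> predicates)
      : _predicates(std::move(predicates)) {}

  void for_each(PredicateVisitor& visitor) const {
    for (const Predicate& p : _predicates) {
      switch (p.kind) {
        case PredicateKind::Parse:
          visitor.visit_parse(p);
          break;
        case PredicateKind::TemplateAssertion:
          visitor.visit_template_assertion(p);
          break;
      }
    }
  }

 private:
  std::vector<Predicate> _predicates;
};

// Example visitor: records a "clone" for every template predicate it sees.
class CloneTemplatesVisitor : public PredicateVisitor {
 public:
  void visit_template_assertion(const Predicate& p) override {
    _created.push_back(p.name + "_clone");
  }
  bool has_created_predicates() const { return !_created.empty(); }
  const std::vector<std::string>& created() const { return _created; }

 private:
  std::vector<std::string> _created;
};
```

The point of the split is that the walking logic lives in one place, while each transformation only supplies the hooks it cares about.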
Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21679/files - new: https://git.openjdk.org/jdk/pull/21679/files/79a59130..d8ddf918 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21679&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21679&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21679.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21679/head:pull/21679 PR: https://git.openjdk.org/jdk/pull/21679 From chagedorn at openjdk.org Mon Oct 28 13:46:42 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Oct 2024 13:46:42 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v3] In-Reply-To: References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> <0oDHLii2_SYYCp_AdTQep5QxIImEBY4URCcWGA7Db_o=.fbd870e0-6612-42be-9244-96872554c3a6@github.com> Message-ID: <1DRtV06NhUOIOWMs-T0PXe1hqZMtMwAu_3R-2mYhqnw=.932ca4a8-b480-4b03-ba36-04cd4d87361e@github.com> On Mon, 28 Oct 2024 13:14:46 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopTransform.cpp line 1991: >> >>> 1989: predicate_iterator.for_each(create_assertion_predicates_for_loop); >>> 1990: if (create_assertion_predicates_for_loop.has_created_predicates()) { >>> 1991: IfTrueNode* last_created_predicate_success_proj = create_assertion_predicates_for_loop.last_created_success_proj(); >> >> Suggestion: >> >> CreateAssertionPredicatesVisitor create_assertion_predicates_visitor(init, stride, target_loop_entry, this, >> _node_in_loop_body); >> Node* source_loop_entry = source_loop_head->skip_strip_mined()->in(LoopNode::EntryControl); >> PredicateIterator predicate_iterator(source_loop_entry); >> 
predicate_iterator.for_each(create_assertion_predicates_visitor); >> if (create_assertion_predicates_visitor.has_created_predicates()) { >> IfTrueNode* last_created_predicate_success_proj = create_assertion_predicates_for_loop.last_created_success_proj(); > > I think this would complete the renaming Good catch, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1819081000 From mdoerr at openjdk.org Mon Oct 28 13:53:55 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 28 Oct 2024 13:53:55 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v7] In-Reply-To: References: Message-ID: > There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. > I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
> > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 0x0000000000000000 > XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Add check for uc->uc_mcontext.fpregs >= uc. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21615/files - new: https://git.openjdk.org/jdk/pull/21615/files/b949136b..24278108 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21615&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21615&range=05-06 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21615.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21615/head:pull/21615 PR: https://git.openjdk.org/jdk/pull/21615 From mdoerr at openjdk.org Mon Oct 28 13:53:56 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 28 Oct 2024 13:53:56 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v4] In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 13:41:24 GMT, Martin Doerr wrote: >> src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp line 551: >> >>> 549: st->cr(); >>> 550: st->cr(); >>> 551: size_t fpregs_offset = pointer_delta(uc->uc_mcontext.fpregs, uc, 1); >> >> Could the register substructure live outside uc on x64? If so, it may be safer to >> Suggestion: >> >> size_t fpregs_offset = (uc->uc_mcontext.fpregs >= uc) ? pointer_delta(uc->uc_mcontext.fpregs, uc, 1) : 0; >> >> or similar, since the register substructure may precede uc which would make pointer_delta assert. > > I think using 0 would require more changes to avoid accessing uc+0 which would be wrong. Richard and I already discussed about this above. We think it's acceptable. Some other projects claim that FP register substructure is inside the uc: https://github.com/mono/mono/blob/0f53e9e151d92944cacab3e24ac359410c606df6/mono/utils/mono-sigcontext.h#L263 > We could also check the kernel code which writes it. After having read this, I have made the change. See commit nr. 7. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21615#discussion_r1819095211 From stuefe at openjdk.org Mon Oct 28 13:58:18 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 28 Oct 2024 13:58:18 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v7] In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 13:53:55 GMT, Martin Doerr wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
>> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add check for uc->uc_mcontext.fpregs >= uc. still good ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21615#pullrequestreview-2399150773 From rrich at openjdk.org Mon Oct 28 14:21:44 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 28 Oct 2024 14:21:44 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v7] In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 13:53:55 GMT, Martin Doerr wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) >> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 
0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add check for uc->uc_mcontext.fpregs >= uc. src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp line 552: > 550: size_t fpregs_offset = ((address)uc->uc_mcontext.fpregs >= (address)uc) ? > 551: pointer_delta(uc->uc_mcontext.fpregs, uc, 1) : 0; > 552: if (fpregs_offset >= sizeof(ucontext_t) || fpregs_offset == 0) { Why protect against the assertion in `pointer_delta` here and not also in `store_context`? You could do the following in both methods: Suggestion: size_t fpregs_offset = ((address)uc->uc_mcontext.fpregs >= (address)uc) ? 
pointer_delta(uc->uc_mcontext.fpregs, uc, 1) : sizeof(ucontext_t); if (fpregs_offset >= sizeof(ucontext_t)) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21615#discussion_r1819157870 From epeter at openjdk.org Mon Oct 28 14:36:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Oct 2024 14:36:09 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v4] In-Reply-To: References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> Message-ID: On Mon, 28 Oct 2024 13:46:42 GMT, Christian Hagedorn wrote: >> #### Replacing the Remaining Predicate Walking and Cloning Code >> In the next series of patches (this, [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943), [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945), and [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946)) I want to replace the predicate walking and cloning code used for Loop Peeling, Pre/Main/Post Loops, Loop Unswitching, removing useless Assertion Predicates and Loop Unrolling with new `PredicateVisitors` which can be used in combination with the new `PredicateIterator`. >> >> #### Single Template Assertion Predicate Check >> This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). >> >> #### Common Refactorings for all the Patches in this Series >> In each of the patch, I will do similar refactoring ideas: >> - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. 
>> - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. >> - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. >> - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head (see **P1** PR comment). >> - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates (see **P2** PR comment). This limitation should eventually be removed. But I want to do that separately at a later point. >> >> #### Refactorings of this Patch >> This first patch replaces the predicate walking and cloning code for Loop Peeling and lays the foundation for the replacement for main/post loops ([JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopTransform.cpp > > Co-authored-by: Emanuel Peter Now it looks good to me :) Oh, I think you actually have a build failure: === Output from failing command(s) repeated here === * For target hotspot_variant-server_libjvm_objs_loopTransform.o: /home/runner/work/jdk/jdk/src/hotspot/share/opto/loopTransform.cpp: In member function 'void PhaseIdealLoop::create_assertion_predicates_at_loop(CountedLoopNode*, CountedLoopNode*, const NodeInLoopBody&)': /home/runner/work/jdk/jdk/src/hotspot/share/opto/loopTransform.cpp:1991:55: error: 'create_assertion_predicates_for_loop' was not declared in this scope; did you mean 'create_assertion_predicates_at_loop'?
1991 | IfTrueNode* last_created_predicate_success_proj = create_assertion_predicates_for_loop.last_created_success_proj(); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | create_assertion_predicates_at_loop * All command lines available in /home/runner/work/jdk/jdk/build/linux-x64/make-support/failure-logs. === End of repeated output === src/hotspot/share/opto/loopTransform.cpp line 1986: > 1984: Node* target_loop_entry = target_outer_loop_head->in(LoopNode::EntryControl); > 1985: CreateAssertionPredicatesVisitor create_assertion_predicates_visitor(init, stride, target_loop_entry, this, > 1986: _node_in_loop_body); optional: fix indentation ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21679#pullrequestreview-2399278122 PR Comment: https://git.openjdk.org/jdk/pull/21679#issuecomment-2441764541 PR Review Comment: https://git.openjdk.org/jdk/pull/21679#discussion_r1819179462 From chagedorn at openjdk.org Mon Oct 28 14:42:34 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Oct 2024 14:42:34 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v5] In-Reply-To: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> Message-ID: > #### Replacing the Remaining Predicate Walking and Cloning Code > In the next series of patches (this, [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943), [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945), and [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946)) I want to replace the predicate walking and cloning code used for Loop Peeling, Pre/Main/Post Loops, Loop Unswitching, removing useless Assertion Predicates and Loop Unrolling with new `PredicateVisitors` which can be used in combination with the new `PredicateIterator`. 
> > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head (see **P1** PR comment). > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates (see **P2** PR comment). This limitation should eventually be removed. But I want to do that separately at a later point. > > #### Refactorings of this Patch > This first patch replaces the predicate walking and cloning code for Loop Peeling and lays the foundation for the replacement for main/post loops ([JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943)) which is quite similar. Th... 
Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: - Fix indentation - Fix build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21679/files - new: https://git.openjdk.org/jdk/pull/21679/files/d8ddf918..6b41a534 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21679&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21679&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21679.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21679/head:pull/21679 PR: https://git.openjdk.org/jdk/pull/21679 From chagedorn at openjdk.org Mon Oct 28 14:42:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Oct 2024 14:42:35 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v4] In-Reply-To: References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> Message-ID: On Mon, 28 Oct 2024 13:46:42 GMT, Christian Hagedorn wrote: >> #### Replacing the Remaining Predicate Walking and Cloning Code >> In the next series of patches (this, [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943), [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945), and [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946)) I want to replace the predicate walking and cloning code used for Loop Peeling, Pre/Main/Post Loops, Loop Unswitching, removing useless Assertion Predicates and Loop Unrolling with new `PredicateVisitors` which can be used in combination with the new `PredicateIterator`. >> >> #### Single Template Assertion Predicate Check >> This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. 
This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). >> >> #### Common Refactorings for all the Patches in this Series >> In each of the patch, I will do similar refactoring ideas: >> - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. >> - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. >> - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. >> - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head (see **P1** PR comment). >> - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates (see **P2** PR comment). This limitation should eventually be removed. But I want to do that separately at a later point. >> >> #### Refactorings of this Patch >> This first patch replaces the predicate walking and cloning code for Loop Peeling and lays the foundation for the replacement for main/post loops ([JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopTransform.cpp > > Co-authored-by: Emanuel Peter Oh, thanks for pointing that out! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21679#issuecomment-2441779836 From epeter at openjdk.org Mon Oct 28 14:50:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Oct 2024 14:50:37 GMT Subject: RFR: 8342489: compiler/c2/irTests/TestVectorizationMismatchedAccess.java fails on big-endian platforms In-Reply-To: References: Message-ID: <_NrQsE0AEcizkZlXbN3Dgjfc2iHx6bYBcGzS_22abp8=.92b49b23-9b51-4ce3-85a2-6ea99b4e2d3a@github.com> On Mon, 28 Oct 2024 10:09:13 GMT, Amit Kumar wrote: > Adjust TestVectorizationMismatchedAccess.java for Big Endian Platforms @offamitkumar Thanks for the fix, this looks good to me. @rwestrel You originally wrote this test, I then added more cases. Do you also agree with this fix? Ah yes, you need to undo this [PR's](https://github.com/openjdk/jdk/pull/21708) change, as asked for above. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21736#pullrequestreview-2399314640 Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21736#pullrequestreview-2399318881 From amitkumar at openjdk.org Mon Oct 28 14:50:37 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 28 Oct 2024 14:50:37 GMT Subject: RFR: 8342489: compiler/c2/irTests/TestVectorizationMismatchedAccess.java fails on big-endian platforms In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 10:09:13 GMT, Amit Kumar wrote: > Adjust TestVectorizationMismatchedAccess.java for Big Endian Platforms >Ah yes, you need to undo this https://github.com/openjdk/jdk/pull/21708 change, as asked for above. No, that PR isn't integrated, so I suggested that Matthias close it.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21736#issuecomment-2441799999 From mbaesken at openjdk.org Mon Oct 28 15:02:20 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 28 Oct 2024 15:02:20 GMT Subject: RFR: 8342823: Ubsan: ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' [v3] In-Reply-To: <765oLXxFYPUnQq8xyATal2oPWTFef26xrbI0-wR7jpU=.1a01b12f-3cde-4349-b21d-d9fec388f459@github.com> References: <765oLXxFYPUnQq8xyATal2oPWTFef26xrbI0-wR7jpU=.1a01b12f-3cde-4349-b21d-d9fec388f459@github.com> Message-ID: <-ndagm4ibk40x3EzTHkxJ1Pm2mtbLTu9oa-fQTwDCd8=.ccf7187d-c757-41e0-b0d6-73509d8ffac0@github.com> > When running with ubsanized binaries on Linux x86_64, > hs jtreg test compiler/startup/StartupOutput.java > showed this issue > > jdk/src/hotspot/share/ci/ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' > #0 0x7fcea0810117 in ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1614 > #1 0x7fcea3123577 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) src/hotspot/share/utilities/vmError.cpp:1872 > #2 0x7fcea0c01499 in report_fatal(VMErrorType, char const*, int, char const*, ...) 
src/hotspot/share/utilities/debug.cpp:214 > #3 0x7fcea09e9d85 in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413 > #4 0x7fcea066da1d in Runtime1::generate_blob(BufferBlob*, C1StubId, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:233 > #5 0x7fcea066dfb0 in Runtime1::generate_blob_for(BufferBlob*, C1StubId) src/hotspot/share/c1/c1_Runtime1.cpp:262 > #6 0x7fcea066dfb0 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:272 > #7 0x7fcea03d2be1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53 > #8 0x7fcea03d2be1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74 > #9 0x7fcea0acc0c2 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771 > #10 0x7fcea0ad9a3f in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913 > #11 0x7fcea161264a in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:759 > #12 0x7fcea2ec739a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:234 > #13 0x7fcea251e1d2 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > #14 0x7fcea7c6c6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 1b515766201d47a183932ba0c8c8bd0d9ee8755b) > #15 0x7fcea730f58e in clone (/lib64/libc.so.6+0x11858e) (BuildId: 448a3ddd22596e1adb8fb3dec8921ed5b9d54dc2) > > So a nullptr check should be added.
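The kind of guard asked for here can be sketched in isolation as follows. This is a generic illustration, not the change made in the PR: `Task` and `describe_task` are hypothetical names, and the real code deals with `CompileTask` inside `ciEnv::dump_replay_data_helper`. The assumption is simply that the reporting path may run before any task has been assigned, so the pointer must be tested before any member call.

```cpp
#include <string>

// Hypothetical stand-in for a compile task; not HotSpot's CompileTask.
struct Task {
  int compile_id;
  int comp_level;
};

// Formats a header line for a dump. The task is dereferenced only after the
// null test, so calling this before any task exists is well defined.
std::string describe_task(const Task* task) {
  if (task == nullptr) {
    return "no compile task";
  }
  return "compile_id=" + std::to_string(task->compile_id) +
         " comp_level=" + std::to_string(task->comp_level);
}
```

Calling a member function through a null pointer is undefined behavior even when the program appears to work, which is why UBSan flags it.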
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: move check, add assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21684/files - new: https://git.openjdk.org/jdk/pull/21684/files/f019b47f..1d240ea1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21684&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21684&range=01-02 Stats: 4 lines in 2 files changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21684.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21684/head:pull/21684 PR: https://git.openjdk.org/jdk/pull/21684 From mbaesken at openjdk.org Mon Oct 28 15:03:04 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 28 Oct 2024 15:03:04 GMT Subject: RFR: 8342489: compiler/c2/irTests/TestVectorizationMismatchedAccess.java fails on big-endian platforms In-Reply-To: <_NrQsE0AEcizkZlXbN3Dgjfc2iHx6bYBcGzS_22abp8=.92b49b23-9b51-4ce3-85a2-6ea99b4e2d3a@github.com> References: <_NrQsE0AEcizkZlXbN3Dgjfc2iHx6bYBcGzS_22abp8=.92b49b23-9b51-4ce3-85a2-6ea99b4e2d3a@github.com> Message-ID: <0VVkPsPTTxumoxfyBgr89UpU67APG4-Uv4GyDdrVNPg=.7390bdc2-d8cd-402b-84bb-302c59683ad3@github.com> On Mon, 28 Oct 2024 14:45:44 GMT, Emanuel Peter wrote: > Ah yes, you need to undo this [PR's](https://github.com/openjdk/jdk/pull/21708) change, as asked for above. I closed that one, most likely not needed any more. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21736#issuecomment-2441839201 From psandoz at openjdk.org Mon Oct 28 15:37:02 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 28 Oct 2024 15:37:02 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v32] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:36:50 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . 
SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: > > - Review resolutions. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - Factor out IR tests and Transforms to follow-up PRs. 
> - Replacing flag based checks with CPU feature checks in IR validation test. > - Remove Saturating IRNode patterns. > - Restrict IR validation to newly added UMin/UMax transforms. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - Prod build fix > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - New IR tests + additional IR transformations > - ... and 27 more: https://git.openjdk.org/jdk/compare/158b93d1...0e10139c Marked as reviewed by psandoz (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2399452258 From mbaesken at openjdk.org Mon Oct 28 15:36:51 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 28 Oct 2024 15:36:51 GMT Subject: RFR: 8342823: Ubsan: ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' [v2] In-Reply-To: References: <765oLXxFYPUnQq8xyATal2oPWTFef26xrbI0-wR7jpU=.1a01b12f-3cde-4349-b21d-d9fec388f459@github.com> Message-ID: On Fri, 25 Oct 2024 18:26:06 GMT, Vladimir Kozlov wrote: > I think we should do the check in `VMError::report_and_die()` to avoid creating empty replay file. Note, `dump_replay_data_unsafe()` is called only in that one place. An other path through `dump_replay_data()` call required Compilation ID which is set only when we have task. We can use assert instead of check in `ciEnv::dump_replay_data_helper()`. I moved the check and added an assert . ------------- PR Comment: https://git.openjdk.org/jdk/pull/21684#issuecomment-2441923400 From shade at openjdk.org Mon Oct 28 15:40:33 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Oct 2024 15:40:33 GMT Subject: RFR: 8342975: C2: Micro-optimize PhaseIdealLoop::Dominators() [v2] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 05:52:39 GMT, Aleksey Shipilev wrote: >> Noticed this while looking at Leyden profiles. C2 seems to spend considerable time doing in this loop. 
The disassembly shows this loop is fairly hot. Replacing the initialization with memset, while touching more memory, is apparently faster. memset is also what we normally do around C2 for arena-allocated data. We seem to touch a lot of these structs later on, so pulling them to cache with memset is likely "free". >> >> It also looks like current initialization misses initializing the last element (at `C->unique()`). >> >> I'll put performance data in separate comment. > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Better comment Thanks! Testing passes on our side here, so I am integrating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21690#issuecomment-2441934447 From shade at openjdk.org Mon Oct 28 15:40:34 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Oct 2024 15:40:34 GMT Subject: Integrated: 8342975: C2: Micro-optimize PhaseIdealLoop::Dominators() In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 17:10:42 GMT, Aleksey Shipilev wrote: > Noticed this while looking at Leyden profiles. C2 seems to spend considerable time doing in this loop. The disassembly shows this loop is fairly hot. Replacing the initialization with memset, while touching more memory, is apparently faster. memset is also what we normally do around C2 for arena-allocated data. We seem to touch a lot of these structs later on, so pulling them to cache with memset is likely "free". > > It also looks like current initialization misses initializing the last element (at `C->unique()`). > > I'll put performance data in separate comment. This pull request has now been integrated. 
Changeset: e659d9da Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/e659d9da5d6198ad9c85efd6472e138a6a3961c2 Stats: 5 lines in 1 file changed: 1 ins; 1 del; 3 mod 8342975: C2: Micro-optimize PhaseIdealLoop::Dominators() Reviewed-by: dlong, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21690 From jbhateja at openjdk.org Mon Oct 28 16:34:40 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 28 Oct 2024 16:34:40 GMT Subject: Integrated: 8338021: Support new unsigned and saturating vector operators in VectorAPI In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 06:50:59 GMT, Jatin Bhateja wrote: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. 
> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html This pull request has now been integrated. Changeset: 52382e28 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/52382e285fdf853c01605f8e0d7f3f5d34965802 Stats: 9395 lines in 52 files changed: 8959 ins; 29 del; 407 mod 8338021: Support new unsigned and saturating vector operators in VectorAPI Reviewed-by: psandoz, epeter, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/20507 From varadam at openjdk.org Mon Oct 28 16:40:18 2024 From: varadam at openjdk.org (Varada M) Date: Mon, 28 Oct 2024 16:40:18 GMT Subject: RFR: 8331861: [PPC64] Implement load / store assembler functions which take an Address object Message-ID: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com> Load and store assembly instructions which takes Address object as argument. 
Tier 1 testing successful on linux-ppc64le and aix-ppc (fastdebug level) JBS : [JDK-8331861](https://bugs.openjdk.org/browse/JDK-8331861) ------------- Commit messages: - 8331861: [PPC64] Implement load / store assembler functions which take an Address object - 8331861: [PPC64] Implement load / store assembler functions which take an Address object - 8331861: [PPC64] Implement load / store assembler functions which take an Address object - 8331861: [PPC64] Implement load / store assembler functions which take an Address object - 8331861: [PPC64] Implement load / store assembler functions which take an Address object - JDK-8331861: [PPC64] Implement load / store assembler functions which take an Address object - JDK-8331861 Changes: https://git.openjdk.org/jdk/pull/21492/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21492&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331861 Stats: 70 lines in 4 files changed: 41 ins; 0 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/21492.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21492/head:pull/21492 PR: https://git.openjdk.org/jdk/pull/21492 From amitkumar at openjdk.org Mon Oct 28 16:40:19 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 28 Oct 2024 16:40:19 GMT Subject: RFR: 8331861: [PPC64] Implement load / store assembler functions which take an Address object In-Reply-To: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com> References: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com> Message-ID: On Mon, 14 Oct 2024 12:20:45 GMT, Varada M wrote: > Load and store assembly instructions which takes Address object as argument. > > Tier 1 testing successful on linux-ppc64le and aix-ppc (fastdebug level) > > JBS : [JDK-8331861](https://bugs.openjdk.org/browse/JDK-8331861) Can you apply this patch, I did the change with regex, so Please verify before pushing it. 
diff --git a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp index 684c06614a9..f9207d89615 100644 --- a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp +++ b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp @@ -157,9 +157,9 @@ void LIR_Assembler::osr_entry() { mo = frame_map()->address_for_monitor_object(i); assert(ml.index() == noreg && mo.index() == noreg, "sanity"); __ ld(R0, slot_offset + 0, OSR_buf); - __ std(R0, ml.disp(), ml.base()); + __ std(R0, ml); __ ld(R0, slot_offset + 1*BytesPerWord, OSR_buf); - __ std(R0, mo.disp(), mo.base()); + __ std(R0, mo); } } } @@ -581,14 +581,14 @@ void LIR_Assembler::emit_opConvert(LIR_OpConvert* op) { __ fcmpu(CCR0, rsrc, rsrc); if (dst_in_memory) { __ li(R0, 0); // 0 in case of NAN - __ std(R0, addr.disp(), addr.base()); + __ std(R0, addr); } else { __ li(dst->as_register(), 0); } __ bso(CCR0, L); __ fctiwz(rsrc, rsrc); // USE_KILL if (dst_in_memory) { - __ stfd(rsrc, addr.disp(), addr.base()); + __ stfd(rsrc, addr); } else { __ mffprd(dst->as_register(), rsrc); } @@ -605,14 +605,14 @@ void LIR_Assembler::emit_opConvert(LIR_OpConvert* op) { __ fcmpu(CCR0, rsrc, rsrc); if (dst_in_memory) { __ li(R0, 0); // 0 in case of NAN - __ std(R0, addr.disp(), addr.base()); + __ std(R0, addr); } else { __ li(dst->as_register_lo(), 0); } __ bso(CCR0, L); __ fctidz(rsrc, rsrc); // USE_KILL if (dst_in_memory) { - __ stfd(rsrc, addr.disp(), addr.base()); + __ stfd(rsrc, addr); } else { __ mffprd(dst->as_register_lo(), rsrc); } @@ -873,20 +873,20 @@ void LIR_Assembler::const2stack(LIR_Opr src, LIR_Opr dest) { int value = c->as_jint_bits(); __ load_const_optimized(src_reg, value); Address addr = frame_map()->address_for_slot(dest->single_stack_ix()); - __ stw(src_reg, addr.disp(), addr.base()); + __ stw(src_reg, addr); break; } case T_ADDRESS: { int value = c->as_jint_bits(); __ load_const_optimized(src_reg, value); Address addr = frame_map()->address_for_slot(dest->single_stack_ix()); - __ std(src_reg, 
addr.disp(), addr.base()); + __ std(src_reg, addr); break; } case T_OBJECT: { jobject2reg(c->as_jobject(), src_reg); Address addr = frame_map()->address_for_slot(dest->single_stack_ix()); - __ std(src_reg, addr.disp(), addr.base()); + __ std(src_reg, addr); break; } case T_LONG: @@ -894,7 +894,7 @@ void LIR_Assembler::const2stack(LIR_Opr src, LIR_Opr dest) { int value = c->as_jlong_bits(); __ load_const_optimized(src_reg, value); Address addr = frame_map()->address_for_double_slot(dest->double_stack_ix()); - __ std(src_reg, addr.disp(), addr.base()); + __ std(src_reg, addr); break; } default: @@ -1070,24 +1070,24 @@ void LIR_Assembler::stack2stack(LIR_Opr src, LIR_Opr dest, BasicType type) { case T_FLOAT: { Address from = frame_map()->address_for_slot(src->single_stack_ix()); Address to = frame_map()->address_for_slot(dest->single_stack_ix()); - __ lwz(tmp, from.disp(), from.base()); - __ stw(tmp, to.disp(), to.base()); + __ lwz(tmp, from); + __ stw(tmp, to); break; } case T_ADDRESS: case T_OBJECT: { Address from = frame_map()->address_for_slot(src->single_stack_ix()); Address to = frame_map()->address_for_slot(dest->single_stack_ix()); - __ ld(tmp, from.disp(), from.base()); - __ std(tmp, to.disp(), to.base()); + __ ld(tmp, from); + __ std(tmp, to); break; } case T_LONG: case T_DOUBLE: { Address from = frame_map()->address_for_double_slot(src->double_stack_ix()); Address to = frame_map()->address_for_double_slot(dest->double_stack_ix()); - __ ld(tmp, from.disp(), from.base()); - __ std(tmp, to.disp(), to.base()); + __ ld(tmp, from); + __ std(tmp, to); break; } diff --git a/src/hotspot/cpu/ppc/gc/z/zBarrierSetAssembler_ppc.cpp b/src/hotspot/cpu/ppc/gc/z/zBarrierSetAssembler_ppc.cpp index 89ab1b1edee..0307c60087f 100644 --- a/src/hotspot/cpu/ppc/gc/z/zBarrierSetAssembler_ppc.cpp +++ b/src/hotspot/cpu/ppc/gc/z/zBarrierSetAssembler_ppc.cpp @@ -610,14 +610,14 @@ void ZBarrierSetAssembler::try_resolve_jobject_in_native(MacroAssembler* masm, R // Resolve global handle __ 
ld(dst, 0, dst); - __ ld(tmp, load_bad_mask.disp(), load_bad_mask.base()); + __ ld(tmp, load_bad_mask); __ b(check_color); __ bind(weak_tagged); // Resolve weak handle __ ld(dst, 0, dst); - __ ld(tmp, mark_bad_mask.disp(), mark_bad_mask.base()); + __ ld(tmp, mark_bad_mask); __ bind(check_color); __ and_(tmp, tmp, dst); Looks Good src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp line 160: > 158: assert(ml.index() == noreg && mo.index() == noreg, "sanity"); > 159: __ ld(R0, slot_offset + 0, OSR_buf); > 160: __ std(R0, ml, noreg); No, `tmp=noreg` is default case. So Suggestion: __ std(R0, ml); should work. src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp line 1090: > 1088: Address to = frame_map()->address_for_double_slot(dest->double_stack_ix()); > 1089: __ ld(tmp, from); > 1090: __ std(tmp, to.disp(), to.base()); why these store instruction left behind ? Can't we just pass address and give it scratch register to play with ? ------------- Changes requested by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/21492#pullrequestreview-2366620862 Marked as reviewed by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/21492#pullrequestreview-2394406675 PR Review Comment: https://git.openjdk.org/jdk/pull/21492#discussion_r1816086379 PR Review Comment: https://git.openjdk.org/jdk/pull/21492#discussion_r1814744899 From mdoerr at openjdk.org Mon Oct 28 16:40:20 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 28 Oct 2024 16:40:20 GMT Subject: RFR: 8331861: [PPC64] Implement load / store assembler functions which take an Address object In-Reply-To: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com> References: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com> Message-ID: On Mon, 14 Oct 2024 12:20:45 GMT, Varada M wrote: > Load and store assembly instructions which takes Address object as argument. 
> > Tier 1 testing successful on linux-ppc64le and aix-ppc (fastdebug level) > > JBS : [JDK-8331861](https://bugs.openjdk.org/browse/JDK-8331861) Thanks! This looks better. Please don't forget that an `Address` either has `_index` or `_disp` (but not both: https://github.com/openjdk/jdk/blob/f56a154132f7e66b1b65adfa2aa937119999b14a/src/hotspot/cpu/ppc/assembler_ppc.hpp#L44). Idea: Use the `RegisterOrConstant` version. This covers all cases including large offset and index. E.g. `inline void stw( Register d, Address &a, Register tmp = noreg);` inline void Assembler::stw( Register d, Address &a, Register tmp) { stw(d, a.index() != noreg ? RegisterOrConstant(a.index()) : RegisterOrConstant(a.disp()), a.base(), tmp); } I'd move them into a separate section. src/hotspot/cpu/ppc/assembler_ppc.hpp line 2534: > 2532: void stw( Register d, Address &a, Register tmp = noreg); > 2533: void sth( Register d, Address &a, Register tmp = noreg); > 2534: void stb( Register d, Address &a, Register tmp = noreg); Spaces before "Register d" are uncommon. Please remove them (you can keep the one for ld to align it with the other ones). src/hotspot/cpu/ppc/assembler_ppc.inline.hpp line 341: > 339: // PPC 1, section 3.3.2 Fixed-Point Load Instructions > 340: inline void Assembler::lwzx( Register d, Register s1, Register s2) { emit_int32(LWZX_OPCODE | rt(d) | ra0mem(s1) | rb(s2));} > 341: inline void Assembler::lwz( Register d, Address &a) { lwz(d, a.disp(), a.base()); } The load instructions need the same adaptation as the store instructions (just without the tmp Register). inline void Assembler::lwz( Register d, Address &a) { lwz(d, a.index() != noreg ? 
RegisterOrConstant(a.index()) : RegisterOrConstant(a.disp()), a.base()); } src/hotspot/cpu/ppc/assembler_ppc.inline.hpp line 380: > 378: // PPC 1, section 3.3.3 Fixed-Point Store Instructions > 379: inline void Assembler::stwx( Register d, Register s1, Register s2) { emit_int32(STWX_OPCODE | rs(d) | ra0mem(s1) | rb(s2));} > 380: inline void Assembler::stw( Register d, Address &a, Register s1) { stw(d, a.index() != noreg ? RegisterOrConstant(a.index()) : RegisterOrConstant(a.disp()), a.base(), s1); } I think the name "s1" is confusing, here. It's "tmp". src/hotspot/cpu/ppc/assembler_ppc.inline.hpp line 380: > 378: // PPC 1, section 3.3.3 Fixed-Point Store Instructions > 379: inline void Assembler::stwx( Register d, Register s1, Register s2) { emit_int32(STWX_OPCODE | rs(d) | ra0mem(s1) | rb(s2));} > 380: inline void Assembler::stw( Register d, Address &a, Register tmp) { stw(d, a.index() != noreg ? RegisterOrConstant(a.index()) : RegisterOrConstant(a.disp()), a.base(), tmp); } The line is very long. Better break it to: inline void Assembler::stw( Register d, Address &a, Register tmp) { stw(d, a.index() != noreg ? 
RegisterOrConstant(a.index()) : RegisterOrConstant(a.disp()), a.base(), tmp); } ------------- PR Review: https://git.openjdk.org/jdk/pull/21492#pullrequestreview-2392082045 PR Comment: https://git.openjdk.org/jdk/pull/21492#issuecomment-2411332924 PR Comment: https://git.openjdk.org/jdk/pull/21492#issuecomment-2414709306 PR Review Comment: https://git.openjdk.org/jdk/pull/21492#discussion_r1814734835 PR Review Comment: https://git.openjdk.org/jdk/pull/21492#discussion_r1814738595 PR Review Comment: https://git.openjdk.org/jdk/pull/21492#discussion_r1814714539 PR Review Comment: https://git.openjdk.org/jdk/pull/21492#discussion_r1814736276 From varadam at openjdk.org Mon Oct 28 16:40:20 2024 From: varadam at openjdk.org (Varada M) Date: Mon, 28 Oct 2024 16:40:20 GMT Subject: RFR: 8331861: [PPC64] Implement load / store assembler functions which take an Address object In-Reply-To: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com> References: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com> Message-ID: On Mon, 14 Oct 2024 12:20:45 GMT, Varada M wrote: > Load and store assembly instructions which takes Address object as argument. > > Tier 1 testing successful on linux-ppc64le and aix-ppc (fastdebug level) > > JBS : [JDK-8331861](https://bugs.openjdk.org/browse/JDK-8331861) Tier1 testing on linux-ppc64le has completed successfully. I am now running tests for aix-ppc Tier1 testing on aix-ppc has completed successfully. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21492#issuecomment-2434908095
PR Comment: https://git.openjdk.org/jdk/pull/21492#issuecomment-2442079841

From mdoerr at openjdk.org Mon Oct 28 16:40:21 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 28 Oct 2024 16:40:21 GMT
Subject: RFR: 8331861: [PPC64] Implement load / store assembler functions which take an Address object
In-Reply-To:
References: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com>
Message-ID:

On Tue, 15 Oct 2024 18:18:57 GMT, Martin Doerr wrote:

>> Load and store assembly instructions which takes Address object as argument.
>>
>> Tier 1 testing successful on linux-ppc64le and aix-ppc (fastdebug level)
>>
>> JBS : [JDK-8331861](https://bugs.openjdk.org/browse/JDK-8331861)
>
> Idea: Use the `RegisterOrConstant` version. This covers all cases including large offset and index. E.g.
> `inline void stw( Register d, Address &a, Register tmp = noreg);`
>
> inline void Assembler::stw( Register d, Address &a, Register tmp) {
> stw(d, a.index() != noreg ? RegisterOrConstant(a.index()) : RegisterOrConstant(a.disp()), a.base(), tmp);
> }
>
> I'd move them into a separate section.

> @TheRealMDoerr are floating-point load/store instructions out of scope for this PR?
>
> I see couple of use cases:
>
> ```c++
> ./c1_LIRAssembler_ppc.cpp:591: __ stfd(rsrc, addr.disp(), addr.base());
> ./c1_LIRAssembler_ppc.cpp:615: __ stfd(rsrc, addr.disp(), addr.base());
> ```

That could be done, too, but floating point instructions are so rarely used, that we could skip them.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21492#issuecomment-2437400879 From varadam at openjdk.org Mon Oct 28 16:40:21 2024 From: varadam at openjdk.org (Varada M) Date: Mon, 28 Oct 2024 16:40:21 GMT Subject: RFR: 8331861: [PPC64] Implement load / store assembler functions which take an Address object In-Reply-To: References: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com> Message-ID: On Thu, 24 Oct 2024 10:41:27 GMT, Martin Doerr wrote: >> Load and store assembly instructions which takes Address object as argument. >> >> Tier 1 testing successful on linux-ppc64le and aix-ppc (fastdebug level) >> >> JBS : [JDK-8331861](https://bugs.openjdk.org/browse/JDK-8331861) > > src/hotspot/cpu/ppc/assembler_ppc.hpp line 2534: > >> 2532: void stw( Register d, Address &a, Register tmp = noreg); >> 2533: void sth( Register d, Address &a, Register tmp = noreg); >> 2534: void stb( Register d, Address &a, Register tmp = noreg); > > Spaces before "Register d" are uncommon. Please remove them (you can keep the one for ld to align it with the other ones). Thank you Martin. I have fixed the alignment for both the cases (RegisterOrConstant and Address obj). Also I have adapted the same changes to load instructions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21492#discussion_r1816047123 From varadam at openjdk.org Mon Oct 28 16:40:21 2024 From: varadam at openjdk.org (Varada M) Date: Mon, 28 Oct 2024 16:40:21 GMT Subject: RFR: 8331861: [PPC64] Implement load / store assembler functions which take an Address object In-Reply-To: References: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com> Message-ID: On Fri, 25 Oct 2024 06:38:38 GMT, Amit Kumar wrote: >> Load and store assembly instructions which takes Address object as argument. 
>>
>> Tier 1 testing successful on linux-ppc64le and aix-ppc (fastdebug level)
>>
>> JBS : [JDK-8331861](https://bugs.openjdk.org/browse/JDK-8331861)
>
> src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp line 160:
>
>> 158: assert(ml.index() == noreg && mo.index() == noreg, "sanity");
>> 159: __ ld(R0, slot_offset + 0, OSR_buf);
>> 160: __ std(R0, ml, noreg);
>
> No, `tmp=noreg` is default case.
>
> So
> Suggestion:
>
> __ std(R0, ml);
>
> should work.

Okay got it. Thank you

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21492#discussion_r1816093552

From amitkumar at openjdk.org Mon Oct 28 16:40:21 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 28 Oct 2024 16:40:21 GMT
Subject: RFR: 8331861: [PPC64] Implement load / store assembler functions which take an Address object
In-Reply-To:
References: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com>
Message-ID:

On Thu, 24 Oct 2024 10:49:17 GMT, Amit Kumar wrote:

>> Load and store assembly instructions which takes Address object as argument.
>>
>> Tier 1 testing successful on linux-ppc64le and aix-ppc (fastdebug level)
>>
>> JBS : [JDK-8331861](https://bugs.openjdk.org/browse/JDK-8331861)
>
> src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp line 1090:
>
>> 1088: Address to = frame_map()->address_for_double_slot(dest->double_stack_ix());
>> 1089: __ ld(tmp, from);
>> 1090: __ std(tmp, to.disp(), to.base());
>
> why these store instruction left behind ?
>
> Can't we just pass address and give it scratch register to play with ?

Oh, we don't even need `tmp` and it defaults to `noreg`. Even better.
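The shape of the `Address`-taking overload discussed in this thread (an address carries either an index register or a displacement, and `tmp` defaults to `noreg`) can be sketched roughly as follows. This is a simplified model under stated assumptions, not the HotSpot assembler: `MiniAssembler` and its string output are hypothetical, `Register` is reduced to an `int`, the handling of large displacements through `tmp` is left out, and a `const` reference is used so that temporaries can be passed.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature types; the real ones live in the PPC assembler.
using Register = int;
constexpr Register noreg = -1;

struct Address {
  Register base;
  Register index;  // noreg when the address uses a displacement
  long disp;       // only meaningful when index == noreg
};

// Records a textual form of the chosen instruction instead of encoding it.
struct MiniAssembler {
  std::string emitted;

  // Displacement form: store d at disp(base).
  void stw(Register d, long disp, Register base) {
    emitted = "stw r" + std::to_string(d) + ", " + std::to_string(disp) +
              "(r" + std::to_string(base) + ")";
  }

  // Indexed form: store d at base + index.
  void stwx(Register d, Register base, Register index) {
    emitted = "stwx r" + std::to_string(d) + ", r" + std::to_string(base) +
              ", r" + std::to_string(index);
  }

  // Address form: pick the indexed or displacement form. tmp defaults to
  // noreg, so callers can write stw(d, addr). This sketch never needs tmp.
  void stw(Register d, const Address& a, Register tmp = noreg) {
    (void)tmp;
    if (a.index != noreg) {
      stwx(d, a.base, a.index);
    } else {
      stw(d, a.disp, a.base);
    }
  }
};
```

A call site then shrinks from passing `addr.disp(), addr.base()` separately to passing the address object alone, which is the simplification the review comments above ask for.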
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21492#discussion_r1814747197 From varadam at openjdk.org Mon Oct 28 16:40:21 2024 From: varadam at openjdk.org (Varada M) Date: Mon, 28 Oct 2024 16:40:21 GMT Subject: RFR: 8331861: [PPC64] Implement load / store assembler functions which take an Address object In-Reply-To: References: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com> Message-ID: On Thu, 24 Oct 2024 10:51:06 GMT, Amit Kumar wrote: >> src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp line 1090: >> >>> 1088: Address to = frame_map()->address_for_double_slot(dest->double_stack_ix()); >>> 1089: __ ld(tmp, from); >>> 1090: __ std(tmp, to.disp(), to.base()); >> >> why these store instruction left behind ? >> >> Can't we just pass address and give it scratch register to play with ? > > Oh, we don't even need `tmp` and it defaults to `noreg`. Even better. Thanks for suggestion! I have adapted the changes for store instructions. Testing (tier1) on linux-ppc64le is successfully completed. Testing to be done for aix-ppc. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21492#discussion_r1816048992 From amitkumar at openjdk.org Mon Oct 28 16:40:21 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 28 Oct 2024 16:40:21 GMT Subject: RFR: 8331861: [PPC64] Implement load / store assembler functions which take an Address object In-Reply-To: References: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com> Message-ID: On Tue, 15 Oct 2024 18:18:57 GMT, Martin Doerr wrote: >> Load and store assembly instructions which takes Address object as argument. >> >> Tier 1 testing successful on linux-ppc64le and aix-ppc (fastdebug level) >> >> JBS : [JDK-8331861](https://bugs.openjdk.org/browse/JDK-8331861) > > Idea: Use the `RegisterOrConstant` version. This covers all cases including large offset and index. E.g. 
> `inline void stw( Register d, Address &a, Register tmp = noreg);`
>
> inline void Assembler::stw( Register d, Address &a, Register tmp) {
> stw(d, a.index() != noreg ? RegisterOrConstant(a.index()) : RegisterOrConstant(a.disp()), a.base(), tmp);
> }
>
> I'd move them into a separate section.

@TheRealMDoerr are floating-point load/store instructions out of scope for this PR?

I see a couple of use cases:

./c1_LIRAssembler_ppc.cpp:591: __ stfd(rsrc, addr.disp(), addr.base());
./c1_LIRAssembler_ppc.cpp:615: __ stfd(rsrc, addr.disp(), addr.base());

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21492#issuecomment-2437053446

From kvn at openjdk.org Mon Oct 28 16:59:11 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 28 Oct 2024 16:59:11 GMT
Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v2]
In-Reply-To:
References: <8UbXTjkMPw6xs9rvGtJukgbq8dOM4q-ogBLXlA7sdhA=.41db68e4-1f59-47e2-b0eb-6caa9a7a7d7b@github.com>
Message-ID:

On Tue, 22 Oct 2024 14:25:04 GMT, Roberto Castañeda Lozano wrote:

>> I'm fine with this change, although I'd then argue that we could simplify it and just say we add 4 words (instead of 3) for incoming/outgoing arguments. @vnkozlov You suggested the current wording that specifically mentions APX, what do you think?
>
>> we could simplify it and just say we add 4 words (instead of 3) for incoming/outgoing arguments
>
> That would work for me too.

I am fine with the comment suggested by Roberto. But I would still mention APX in an additional statement.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21612#discussion_r1819422226

From mdoerr at openjdk.org Mon Oct 28 17:02:38 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 28 Oct 2024 17:02:38 GMT
Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v8]
In-Reply-To:
References:
Message-ID:

> There are some situations in which the XMM registers are relevant to understand errors. E.g.
C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. > I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) > > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 0x0000000000000000 > XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Add sanity check: fpregs 
should point into the context. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21615/files - new: https://git.openjdk.org/jdk/pull/21615/files/24278108..f7bfe87c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21615&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21615&range=06-07 Stats: 12 lines in 2 files changed: 6 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21615.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21615/head:pull/21615 PR: https://git.openjdk.org/jdk/pull/21615 From mdoerr at openjdk.org Mon Oct 28 17:02:39 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 28 Oct 2024 17:02:39 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v7] In-Reply-To: References: Message-ID: <31124tjhP7jUL6tsIxCvekG5jxCz-kTwS5daNbYwumk=.2fa6ddac-03ce-4d30-9fa5-3a33462e37e5@github.com> On Mon, 28 Oct 2024 14:19:29 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Add check for uc->uc_mcontext.fpregs >= uc. > > src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp line 552: > >> 550: size_t fpregs_offset = ((address)uc->uc_mcontext.fpregs >= (address)uc) ? >> 551: pointer_delta(uc->uc_mcontext.fpregs, uc, 1) : 0; >> 552: if (fpregs_offset >= sizeof(ucontext_t) || fpregs_offset == 0) { > > Why protect against the assertion in `pointer_delta` here and not also in `store_context`? > You could do the following in both methods: > Suggestion: > > size_t fpregs_offset = ((address)uc->uc_mcontext.fpregs >= (address)uc) ? > pointer_delta(uc->uc_mcontext.fpregs, uc, 1) : sizeof(ucontext_t); > if (fpregs_offset >= sizeof(ucontext_t)) { I've changed it such that we don't touch it in this case.
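Editorial sketch (not part of the original thread): the sanity check discussed above can be illustrated in isolation. The helper below is hypothetical and is not HotSpot code; the name `points_strictly_into` and its parameters are assumptions made for illustration. It only shows the general idea of testing that a pointer lies inside an enclosing object before computing an offset from it, so that the subtraction never underflows.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (an assumption for illustration, not a HotSpot API).
// Returns true if `inner` points strictly after `base` and before the end of
// the `size`-byte object that starts at `base`. An offset of 0 is rejected,
// mirroring the quoted check that treats a zero offset as invalid.
inline bool points_strictly_into(const void* base, std::size_t size,
                                 const void* inner) {
  const auto b = reinterpret_cast<std::uintptr_t>(base);
  const auto p = reinterpret_cast<std::uintptr_t>(inner);
  if (p <= b) {
    // Computing p - b for p < b would wrap around; this is the situation
    // the review comment refers to as triggering an assertion.
    return false;
  }
  return (p - b) < size;
}
```

A caller would pass the context pointer as `base`, `sizeof` of the context type as `size`, and the embedded register-area pointer as `inner`, and leave the pointer untouched when the function returns false.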
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21615#discussion_r1819426998 From kvn at openjdk.org Mon Oct 28 17:58:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 28 Oct 2024 17:58:15 GMT Subject: RFR: 8343137: C2: VerifyLoopOptimizations fails with "Was reachable in only one" In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 12:41:02 GMT, Christian Hagedorn wrote: > [JDK-8342043](https://bugs.openjdk.org/browse/JDK-8342043) introduced a new usage of `igvn.intcon()` but missed to call `set_ctrl()`. This simple patch fixes this. > > While working on this, I've noticed that a few Assertion Predicates tests ended up in the `predicates` directory while they would have better been placed into `predicates/assertion`. It's probably not worth to file a separate issue just for that, so I've squeezed this trivial move into this fix. Note that `AssertionPredicateDoesntConstantFold` does not define a package and thus does not need an update to the command line. > > Thanks, > Christian Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21739#pullrequestreview-2399807897 From kvn at openjdk.org Mon Oct 28 17:58:16 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 28 Oct 2024 17:58:16 GMT Subject: RFR: 8343137: C2: VerifyLoopOptimizations fails with "Was reachable in only one" In-Reply-To: References: Message-ID: <7M9KQnOfQ0QyoaFnz0bzt0nSFxN9s8MNIpWQafli1sw=.b95d98f5-3cf0-4310-b358-39ae5c1706d8@github.com> On Mon, 28 Oct 2024 12:42:40 GMT, Christian Hagedorn wrote: >> [JDK-8342043](https://bugs.openjdk.org/browse/JDK-8342043) introduced a new usage of `igvn.intcon()` but missed to call `set_ctrl()`. This simple patch fixes this. >> >> While working on this, I've noticed that a few Assertion Predicates tests ended up in the `predicates` directory while they would have better been placed into `predicates/assertion`. 
It's probably not worth to file a separate issue just for that, so I've squeezed this trivial move into this fix. Note that `AssertionPredicateDoesntConstantFold` does not define a package and thus does not need an update to the command line. >> >> Thanks, >> Christian > > src/hotspot/share/opto/loopnode.cpp line 4492: > >> 4490: if (!useful_predicates.member(opaque_node)) { // not in the useful list >> 4491: ConINode* one = _igvn.intcon(1); >> 4492: set_ctrl(one, C->root()); > > Noticed that we find this pattern quite often in our code. Would be nice to have a `PhaseIdealLoop::intcon()` which calls `igvn.intcon()` and takes care of setting ctrl. I filed [JDK-8343148](https://bugs.openjdk.org/browse/JDK-8343148) to keep track of that. Agree. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21739#discussion_r1819505737 From kvn at openjdk.org Mon Oct 28 18:07:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 28 Oct 2024 18:07:05 GMT Subject: RFR: 8342823: Ubsan: ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' [v3] In-Reply-To: <-ndagm4ibk40x3EzTHkxJ1Pm2mtbLTu9oa-fQTwDCd8=.ccf7187d-c757-41e0-b0d6-73509d8ffac0@github.com> References: <765oLXxFYPUnQq8xyATal2oPWTFef26xrbI0-wR7jpU=.1a01b12f-3cde-4349-b21d-d9fec388f459@github.com> <-ndagm4ibk40x3EzTHkxJ1Pm2mtbLTu9oa-fQTwDCd8=.ccf7187d-c757-41e0-b0d6-73509d8ffac0@github.com> Message-ID: On Mon, 28 Oct 2024 15:02:20 GMT, Matthias Baesken wrote: >> When running with ubsanized binaries on Linux x86_64, >> hs jtreg test compiler/startup/StartupOutput.java >> showed this issue >> >> jdk/src/hotspot/share/ci/ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' >> #0 0x7fcea0810117 in ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1614 >> #1 0x7fcea3123577 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, 
unsigned long) src/hotspot/share/utilities/vmError.cpp:1872 >> #2 0x7fcea0c01499 in report_fatal(VMErrorType, char const*, int, char const*, ...) src/hotspot/share/utilities/debug.cpp:214 >> #3 0x7fcea09e9d85 in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413 >> #4 0x7fcea066da1d in Runtime1::generate_blob(BufferBlob*, C1StubId, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:233 >> #5 0x7fcea066dfb0 in Runtime1::generate_blob_for(BufferBlob*, C1StubId) src/hotspot/share/c1/c1_Runtime1.cpp:262 >> #6 0x7fcea066dfb0 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:272 >> #7 0x7fcea03d2be1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53 >> #8 0x7fcea03d2be1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74 >> #9 0x7fcea0acc0c2 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771 >> #10 0x7fcea0ad9a3f in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913 >> #11 0x7fcea161264a in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:759 >> #12 0x7fcea2ec739a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:234 >> #13 0x7fcea251e1d2 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 >> #14 0x7fcea7c6c6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 1b515766201d47a183932ba0c8c8bd0d9ee8755b) >> #15 0x7fcea730f58e in clone (/lib64/libc.so.6+0x11858e) (BuildId: 448a3ddd22596e1adb8fb3dec8921ed5b9d54dc2) >> >> So a nullptr check should be better added . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > move check, add assert Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21684#pullrequestreview-2399827187 From mdoerr at openjdk.org Mon Oct 28 18:34:45 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 28 Oct 2024 18:34:45 GMT Subject: RFR: 8342823: Ubsan: ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' [v3] In-Reply-To: <-ndagm4ibk40x3EzTHkxJ1Pm2mtbLTu9oa-fQTwDCd8=.ccf7187d-c757-41e0-b0d6-73509d8ffac0@github.com> References: <765oLXxFYPUnQq8xyATal2oPWTFef26xrbI0-wR7jpU=.1a01b12f-3cde-4349-b21d-d9fec388f459@github.com> <-ndagm4ibk40x3EzTHkxJ1Pm2mtbLTu9oa-fQTwDCd8=.ccf7187d-c757-41e0-b0d6-73509d8ffac0@github.com> Message-ID: <_ZT-eD-xqv9awZIZERv2g5R3ibZMmnFvo9c9YPlg8ag=.e59f2b27-f742-445a-82e9-0016c7094896@github.com> On Mon, 28 Oct 2024 15:02:20 GMT, Matthias Baesken wrote: >> When running with ubsanized binaries on Linux x86_64, >> hs jtreg test compiler/startup/StartupOutput.java >> showed this issue >> >> jdk/src/hotspot/share/ci/ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' >> #0 0x7fcea0810117 in ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1614 >> #1 0x7fcea3123577 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) src/hotspot/share/utilities/vmError.cpp:1872 >> #2 0x7fcea0c01499 in report_fatal(VMErrorType, char const*, int, char const*, ...) 
src/hotspot/share/utilities/debug.cpp:214 >> #3 0x7fcea09e9d85 in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413 >> #4 0x7fcea066da1d in Runtime1::generate_blob(BufferBlob*, C1StubId, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:233 >> #5 0x7fcea066dfb0 in Runtime1::generate_blob_for(BufferBlob*, C1StubId) src/hotspot/share/c1/c1_Runtime1.cpp:262 >> #6 0x7fcea066dfb0 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:272 >> #7 0x7fcea03d2be1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53 >> #8 0x7fcea03d2be1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74 >> #9 0x7fcea0acc0c2 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771 >> #10 0x7fcea0ad9a3f in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913 >> #11 0x7fcea161264a in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:759 >> #12 0x7fcea2ec739a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:234 >> #13 0x7fcea251e1d2 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 >> #14 0x7fcea7c6c6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 1b515766201d47a183932ba0c8c8bd0d9ee8755b) >> #15 0x7fcea730f58e in clone (/lib64/libc.so.6+0x11858e) (BuildId: 448a3ddd22596e1adb8fb3dec8921ed5b9d54dc2) >> >> So a nullptr check should be better added . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > move check, add assert LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21684#pullrequestreview-2399890440 From rrich at openjdk.org Mon Oct 28 19:36:23 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 28 Oct 2024 19:36:23 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v8] In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 17:02:38 GMT, Martin Doerr wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) >> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 
0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add sanity check: fpregs should point into the context. Marked as reviewed by rrich (Reviewer). src/hotspot/share/utilities/debug.cpp line 738: > 736: if ((address)((const ucontext_t*)context)->uc_mcontext.fpregs > (address)context) { > 737: size_t fpregs_offset = pointer_delta(((const ucontext_t*)context)->uc_mcontext.fpregs, context, 1); > 738: if (fpregs_offset < sizeof(ucontext_t)) { I think you should set `fpregs` to null if the original `fpregs` is invalid. The message in `print_context` will be confusing otherwise. But I'll leave it to you assuming it'll never happen. Personally I think we should not try to avoid the assertion in `pointer_delta` and keep the code concise. ------------- PR Review: https://git.openjdk.org/jdk/pull/21615#pullrequestreview-2400036025 PR Review Comment: https://git.openjdk.org/jdk/pull/21615#discussion_r1819644667 From mdoerr at openjdk.org Mon Oct 28 19:46:21 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 28 Oct 2024 19:46:21 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v8] In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 19:34:07 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Add sanity check: fpregs should point into the context. 
> > src/hotspot/share/utilities/debug.cpp line 738: > >> 736: if ((address)((const ucontext_t*)context)->uc_mcontext.fpregs > (address)context) { >> 737: size_t fpregs_offset = pointer_delta(((const ucontext_t*)context)->uc_mcontext.fpregs, context, 1); >> 738: if (fpregs_offset < sizeof(ucontext_t)) { > > I think you should set `fpregs` to null if the original `fpregs` is invalid. The message in `print_context` will be confusing otherwise. > But I'll leave it to you assuming it'll never happen. Personally I think we should not try to avoid the assertion in `pointer_delta` and keep the code concise. I think null would be invalid, too. I prefer not touching it if it's already broken. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21615#discussion_r1819654332 From aturbanov at openjdk.org Mon Oct 28 20:53:27 2024 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Mon, 28 Oct 2024 20:53:27 GMT Subject: RFR: 8342958: Use jvmArgs consistently in microbenchmarks In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:52:57 GMT, Claes Redestad wrote: > Many OpenJDK micros use `@Fork(jvmArgs/-Append/-Prepend)` to add JVM reasonable or necessary flags, but when deploying and running micros we often want to add or replace flags to tune to the machine, test different GCs, etc. The inconsistent use of the different `jvmArgs` options make it error prone, and we've had a few recent cases where we've not been testing with the expected set of flags. > > This PR suggests using `jvmArgs` consistently. I think this aligns with the intuition that when you use `jvmArgsAppend/-Prepend` intent is to add to a set of existing flags, while if you supply `jvmArgs` intent is "run with these and nothing else". Perhaps there are other opinions/preferences, and I don't feel strongly about which to consolidate to as long as we do so consistently. 
One argument could be made to consolidate on `jvmArgsAppend` since that one is (likely accidentally) the current most popular (143 compared to 59 `jvmArgsPrepend` and 18 `jvmArgs`). test/micro/org/openjdk/bench/vm/compiler/overhead/SimpleRepeatCompilation.java line 98: > 96: > 97: @Benchmark > 98: @Fork(jvmArgs={"-Xbatch",TRIVIAL_MATH_METHOD}) Suggestion: @Fork(jvmArgs={"-Xbatch", TRIVIAL_MATH_METHOD}) test/micro/org/openjdk/bench/vm/compiler/overhead/SimpleRepeatCompilation.java line 138: > 136: > 137: @Benchmark > 138: @Fork(jvmArgs={"-Xbatch",LARGE_METHOD}) Suggestion: @Fork(jvmArgs={"-Xbatch", LARGE_METHOD}) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21683#discussion_r1819745626 PR Review Comment: https://git.openjdk.org/jdk/pull/21683#discussion_r1819745805 From vlivanov at openjdk.org Mon Oct 28 21:44:10 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 28 Oct 2024 21:44:10 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 03:54:06 GMT, Jasmine Karthikeyan wrote: >> @jaskarth thanks for exploring platform-specific lowering! >> >> I briefly looked through the changes, but I didn't get a good understanding of its design goals. It's hard to see what use cases it is targeted for when only skeleton code is present. It would really help if there are some cases ported on top of it for illustration purposes and to solidify the design. >> >> Currently, there are multiple places in the code where IR lowering happens. In particular: >> * IGVN during post loop opts phase (guarded by `Compile::post_loop_opts_phase()`) (Ideal -> Ideal); >> * macro expansion (Ideal -> Ideal); >> * ad-hoc passes (GC barrier expansion, `MacroLogicV`) (Ideal -> Ideal); >> * final graph reshaping (Ideal -> Ideal); >> * matcher (Ideal -> Mach). >> >> I'd like to understand how the new pass is intended to interact with existing cases. 
>> >> Only the last one is truly platform-specific, but there are some platform-specific cases exposes in other places (e.g., MacroLogicV pass, DivMod combining) guarded by some predicates on `Matcher`. >> >> As the `PhaseLowering` is implemented now, it looks like a platform-specific macro expansion pass (1->many rewriting). But `MacroLogicV` case doesn't fit such model well. >> >> I see changes to enable platform-specific node classes. As of now, only Matcher introduces platform-specific nodes and all of them are Mach nodes. Platform-specific Ideal nodes are declared in shared code, but then their usages are guarded by `Matcher::has_match_rule()` thus ensuring there's enough support on back-end side. >> >> Some random observations: >> * the pass is performed unconditionally and it iterates over all live nodes; in contrast, macro nodes and nodes for post-loop opts IGVN are explicitly listed on the side (MacroLogicV pass is also guarded, but by a coarse-grained check); > > Thanks a lot for your analysis of the patch, @iwanowww! I hope to answer some of your questions here. > >> It's hard to see what use cases it is targeted for when only skeleton code is present. It would really help if there are some cases ported on top of it for illustration purposes and to solidify the design. > > I think this is a very fair point. I was testing some cases before I made the PR, but I wanted to submit just the system in isolation to make it easier to review. I can make some example use cases separately to show what could be possible with the new system. > >> I'd like to understand how the new pass is intended to interact with existing cases. > > The overarching goal is to support new kinds of transforms on ideal nodes that are only relevant to a single hardware platform, which would otherwise be too narrow in scope to put in shared code but would be difficult to do in purely AD code. It can be helpful having GVN while transforming the IR into a more backend-specific form. 
@merykitty added some nice examples above that illustrate possible use-cases. > >> As the `PhaseLowering` is implemented now, it looks like a platform-specific macro expansion pass (1->many rewriting). But MacroLogicV case doesn't fit such model well. > > The lowering implementation works similarly to how an `Ideal()` call works, so it's possible to do many->1 (like `MacroLogicV`) and many->many transformations as well. > >> I see changes to enable platform-specific node classes. As of now, only Matcher introduces platform-specific nodes and all of them are Mach nodes. > > I was thinking if we're introducing nodes that only have functionality on specific platforms it might be nice to make those nodes only exist on those platforms as well, to reduce the size of shared code on platforms where the nodes aren't relevant. Since the lowering phase introduces new nodes that are specially known to the backend they should be supported by the backend too. However, it's not a necessary component of the lowering phase, just something that I thought could help with the implementation of lowered nodes. > >> the pass is performed unconditionally and it iterates over all live nodes; in contrast, macro nodes and nodes for post-loop opts IGVN are explicitly listed on the side (MacroLogicV pass is also guarded, but by a coarse-grained check); > > This is true, my thought was since MacroLogicV currently also iterates across all live nodes doing it here as well would be alright. I think a way to collect lowering-spec... @jaskarth > I was testing some cases before I made the PR, but I wanted to submit just the system in isolation to make it easier to review. I can make some example use cases separately to show what could be possible with the new system. Thanks. Primarily, I'm interested in how it fits existing use cases. For example, if somebody can port MacroLogicV and DivMod on top and publish them as dependent PRs, that would be very helpful to guide the discussion. 
> I was thinking if we're introducing nodes that only have functionality on specific platforms it might be nice to make those nodes only exist on those platforms as well, to reduce the size of shared code on platforms where the nodes aren't relevant. It looks attractive at first, but the downside is subsequent passes may start to require platform-specific code as well (e.g., think of final graph reshaping which operates on Node opcodes). Also, total number of platform-specific Ideal nodes was low (especially, when compared to Mach nodes generated from AD files). So, keeping relevant code shared and guarding its usages with `Matcher::match_rule_supported()` seems appropriate. > ... my thought was since MacroLogicV currently also iterates across all live nodes doing it here as well would be alright. MacroLogicV pass is guarded by `C->max_vector_size() > 0` and `Matcher::match_rule_supported(Op_MacroLogicV)` which (1) limits it to AVX512-capable hardware; and (2) ensures that some vector nodes were produced during compilation. It is a coarser-grained check than strictly required, but very effective at detecting when there are no optimization opportunities present. > I think a way to collect lowering-specific nodes would be difficult since the nodes that actually get lowered could change between backends. It would definitely require a way to signal what Ideal nodes/IR patterns are interesting for a particular backend. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2442687335 From vlivanov at openjdk.org Mon Oct 28 22:56:57 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 28 Oct 2024 22:56:57 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 07:10:59 GMT, Jatin Bhateja wrote: > The application of lowering is very broad as it can help us perform arbitrary transformation as well as take advantages of GVN @merykitty thanks for the examples. 
The idea of gradual IR lowering is not new in C2. There are precedents in the code base, so I'd like to better understand how the new pass improves the overall situation. Introducing a way to perform arbitrary platform-specific transformations on Ideal does sound very powerful, but it also turns Ideal IR into platform-specific dialects which don't have to work with existing transformations (IGVN, in particular). Do the use cases mentioned so far justify a platform-specific lowering pass on Ideal IR which is intended to produce platform-specific Ideal IR shapes? I don't know yet. Also, there are alternative places where platform-specific transformations can take place (macro expansion, final graph reshaping, custom matching logic). Worth considering them as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2442814693 From redestad at openjdk.org Mon Oct 28 22:57:11 2024 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 28 Oct 2024 22:57:11 GMT Subject: RFR: 8342958: Use jvmArgs consistently in microbenchmarks In-Reply-To: References: Message-ID: <_nEO15xuXAky1fmDQKzsX0gBZ8n62tlTInXFddHPqWU=.fb5508ea-75e5-4992-b982-255f5df886f7@github.com> On Thu, 24 Oct 2024 13:52:57 GMT, Claes Redestad wrote: > Many OpenJDK micros use `@Fork(jvmArgs/-Append/-Prepend)` to add JVM reasonable or necessary flags, but when deploying and running micros we often want to add or replace flags to tune to the machine, test different GCs, etc. The inconsistent use of the different `jvmArgs` options make it error prone, and we've had a few recent cases where we've not been testing with the expected set of flags. > > This PR suggests using `jvmArgs` consistently. I think this aligns with the intuition that when you use `jvmArgsAppend/-Prepend` intent is to add to a set of existing flags, while if you supply `jvmArgs` intent is "run with these and nothing else". 
Perhaps there are other opinions/preferences, and I don't feel strongly about which to consolidate to as long as we do so consistently. One argument could be made to consolidate on `jvmArgsAppend` since that one is (likely accidentally) the current most popular (143 compared to 59 `jvmArgsPrepend` and 18 `jvmArgs`). Thanks for reviewing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21683#issuecomment-2442806508 From redestad at openjdk.org Mon Oct 28 22:57:12 2024 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 28 Oct 2024 22:57:12 GMT Subject: RFR: 8342958: Use jvmArgs consistently in microbenchmarks In-Reply-To: References: Message-ID: <2yJtOeJKZ4E6P2xWCXH4EY-p2tif8-Vp279sdH4agmg=.5222d468-ef1f-4884-8647-9c387f39d31f@github.com> On Mon, 28 Oct 2024 20:51:22 GMT, Andrey Turbanov wrote: >> Many OpenJDK micros use `@Fork(jvmArgs/-Append/-Prepend)` to add JVM reasonable or necessary flags, but when deploying and running micros we often want to add or replace flags to tune to the machine, test different GCs, etc. The inconsistent use of the different `jvmArgs` options make it error prone, and we've had a few recent cases where we've not been testing with the expected set of flags. >> >> This PR suggests using `jvmArgs` consistently. I think this aligns with the intuition that when you use `jvmArgsAppend/-Prepend` intent is to add to a set of existing flags, while if you supply `jvmArgs` intent is "run with these and nothing else". Perhaps there are other opinions/preferences, and I don't feel strongly about which to consolidate to as long as we do so consistently. One argument could be made to consolidate on `jvmArgsAppend` since that one is (likely accidentally) the current most popular (143 compared to 59 `jvmArgsPrepend` and 18 `jvmArgs`). 
> > test/micro/org/openjdk/bench/vm/compiler/overhead/SimpleRepeatCompilation.java line 138: > >> 136: >> 137: @Benchmark >> 138: @Fork(jvmArgs={"-Xbatch",LARGE_METHOD}) > > Suggestion: > > @Fork(jvmArgs={"-Xbatch", LARGE_METHOD}) I don't think this PR is the place to address pre-existing and non-consequential style issues. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21683#discussion_r1819860308 From redestad at openjdk.org Mon Oct 28 22:57:13 2024 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 28 Oct 2024 22:57:13 GMT Subject: Integrated: 8342958: Use jvmArgs consistently in microbenchmarks In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:52:57 GMT, Claes Redestad wrote: > Many OpenJDK micros use `@Fork(jvmArgs/-Append/-Prepend)` to add JVM reasonable or necessary flags, but when deploying and running micros we often want to add or replace flags to tune to the machine, test different GCs, etc. The inconsistent use of the different `jvmArgs` options make it error prone, and we've had a few recent cases where we've not been testing with the expected set of flags. > > This PR suggests using `jvmArgs` consistently. I think this aligns with the intuition that when you use `jvmArgsAppend/-Prepend` intent is to add to a set of existing flags, while if you supply `jvmArgs` intent is "run with these and nothing else". Perhaps there are other opinions/preferences, and I don't feel strongly about which to consolidate to as long as we do so consistently. One argument could be made to consolidate on `jvmArgsAppend` since that one is (likely accidentally) the current most popular (143 compared to 59 `jvmArgsPrepend` and 18 `jvmArgs`). This pull request has now been integrated. 
Changeset: 90bd5445 Author: Claes Redestad URL: https://git.openjdk.org/jdk/commit/90bd544512de541cd98889bec58f419bc69a723d Stats: 202 lines in 142 files changed: 0 ins; 0 del; 202 mod 8342958: Use jvmArgs consistently in microbenchmarks Reviewed-by: ecaspole, jvernee ------------- PR: https://git.openjdk.org/jdk/pull/21683 From qamai at openjdk.org Mon Oct 28 23:44:08 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 28 Oct 2024 23:44:08 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 22:46:55 GMT, Vladimir Ivanov wrote: >>> @jatin-bhateja @iwanowww The application of lowering is very broad as it can help us perform arbitrary transformation as well as take advantages of GVN in the ideal world: >>> >>> 1, Any expansion that can benefit from GVN can be done in this pass. The first example is `ExtractXNode`s. Currently, it is expanded during code emission. An `int` extraction at the index 5 is currently expanded to: >>> >>> ``` >>> vextracti128 xmm1, ymm0, 1 >>> vpextrd eax, xmm1, 1 >>> ``` >>> >>> If we try to extract multiple elements then `vextracti128` would be needlessly emitted multiple times. By moving the expansion from code emission to lowering, we can do GVN and eliminate the redundant operations. For vector insertions, the situation is even worse, as it would be expanded into multiple instructions. 
For example, to construct a vector from 4 long values, we would have to: >>> >>> ``` >>> vpxor xmm0, xmm0, xmm0 >>> >>> vmovdqu xmm1, xmm0 >>> vpinsrq xmm1, xmm1, rax, 0 >>> vinserti128 ymm0, ymm0, xmm1, 0 >>> >>> vmovdqu xmm1, xmm0 >>> vpinsrq xmm1, xmm1, rcx, 1 >>> vinserti128 ymm0, ymm0, xmm1, 0 >>> >>> vextracti128 xmm1, ymm0, 1 >>> vpinsrq xmm1, xmm1, rdx, 0 >>> vinserti128 ymm0, ymm0, xmm1, 1 >>> >>> vextracti128 xmm1, ymm0, 1 >>> vpinsrq xmm1, xmm1, rbx, 1 >>> vinserti128 ymm0, ymm0, xmm1, 1 >>> ``` >>> >>> By moving the expansion to lowering we can have a much more efficient sequence: >>> >>> ``` >>> vmovq xmm0, rax >>> vpinsrq xmm0, xmm0, rcx, 1 >>> vmovq xmm1, rdx >>> vpinsrq xmm1, xmm1, rbx, 1 >>> vinserti128 ymm0, ymm0, xmm1, 1 >>> ``` >>> >> >> Hi @jaskarth >> Target specific IR compliments lowering pass, the example above very appropriately showcases the usefulness of lowering pass. For completeness we should extend this patch and add target specific extensions to "opto/classes.hpp" and a new Node.hpp' to record new target specific IR definitions. >> >> Hi @merykitty , >> Lowering will also reduce register pressure since we may be able to save additional temporary machine operands by splitting monolithic instruction encoding blocks across multiple lowered IR nodes, this together with GVN promoted sharing should be very powerful. > >> The application of lowering is very broad as it can help us perform arbitrary transformation as well as take advantages of GVN > > @merykitty thanks for the examples. The idea of gradual IR lowering is not new in C2. There are precedents in the code base, so I'd like to better understand how the new pass improves the overall situation. Introducing a way to perform arbitrary platform-specific transformations on Ideal does sound very powerful, but it also turns Ideal IR into platform-specific dialects which don't have to work with existing transformations (IGVN, in particular). 
> > Do the use cases mentioned so far justify a platform-specific lowering pass on Ideal IR which is intended to produce platform-specific Ideal IR shapes? I don't know yet. > > Also, there are alternative places where platform-specific transformations can take place (macro expansion, final graph reshaping, custom matching logic). Worth considering them as well. @iwanowww I hope to address some of your concerns: > It looks attractive at first, but the downside is subsequent passes may start to require platform-specific code as well (e.g., think of final graph reshaping which operates on Node opcodes). Also, total number of platform-specific Ideal nodes was low (especially, when compared to Mach nodes generated from AD files). So, keeping relevant code shared and guarding its usages with `Matcher::match_rule_supported()` seems appropriate. It would not be possible without a stretch, consider my example regarding `ExtractINode` above, since `Matcher::match_rule_support(ExtractINode)` will surely return `true`, we would need another `Matcher` method to decide when and how to expand such a node, as it is a really peculiar circumstance that x86 element extraction/insertion operations is only available with 128-bit vectors, and to do so with higher elements, we need to extract the corresponding 128-bit lane first. What do you think about keeping the node declaration in shared code but putting the lowering transformations in the backend-specific source files? We can then use prefixes to denote a node being available on a specific backend only. > `MacroLogicV` pass is guarded by `C->max_vector_size() > 0` and `Matcher::match_rule_supported(Op_MacroLogicV)` which (1) limits it to AVX512-capable hardware; and (2) ensures that some vector nodes were produced during compilation. It is a coarser-grained check than strictly required, but very effective at detecting when there are no optimization opportunities present. 
I don't think this is a concern, enumerating all live nodes once without doing anything is not expensive. > The idea of gradual IR lowering is not new in C2. There are precedents in the code base, so I'd like to better understand how the new pass improves the overall situation. Introducing a way to perform arbitrary platform-specific transformations on Ideal does sound very powerful, but it also turns Ideal IR into platform-specific dialects which don't have to work with existing transformations (IGVN, in particular). That's why it is intended to be executed only after general `igvn`. > Do the use cases mentioned so far justify a platform-specific lowering pass on Ideal IR which is intended to produce platform-specific Ideal IR shapes? I don't know yet. As you have mentioned, we do have platform-specific transformations already, the issue is that they are fragmented in shared code. Introducing lowering allows us to consolidate those into 1 place with platform-specific transformations living nicely in platform-specific code. And in addition to that, it allows us to perform more platform-specific transformations in a scalable manner, such as #21244 . > Also, there are alternative places where platform-specific transformations can take place (macro expansion, final graph reshaping, custom matching logic). Worth considering them as well. Macro expansion would be too early, as we still do platform-independent `igvn` there, while final graph reshaping and custom matching logic would be too late, as we have destroyed the node hash table already.
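As a rough illustration of the GVN argument made for vector element extraction, here is a minimal sketch in Java. It is a toy model of value numbering, not C2 code: `ToyGvn`, `make`, `lowerExtractInt` and the opcode strings `Extract128#n` and `ExtractI32#n` are hypothetical names, and it assumes 4 `int` elements per 128-bit lane. Nodes with the same opcode and inputs are created only once, so two extractions from the same upper lane share one lane-extraction node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of value numbering; not HotSpot code. All names are hypothetical.
final class ToyGvn {
    // Maps a structural key "op,input,input,..." to the id of the unique node with that shape.
    private final Map<String, Integer> table = new HashMap<>();
    private final List<String> nodes = new ArrayList<>();

    // Returns the existing node with the same opcode and inputs, or creates a new one.
    int make(String op, int... inputs) {
        StringBuilder key = new StringBuilder(op);
        for (int in : inputs) {
            key.append(',').append(in);
        }
        return table.computeIfAbsent(key.toString(), k -> {
            nodes.add(k);
            return nodes.size() - 1;
        });
    }

    int nodeCount() {
        return nodes.size();
    }

    // Lowers "extract int element `index` of a 256-bit vector" into a
    // 128-bit lane extraction followed by an in-lane element extraction.
    int lowerExtractInt(int vector, int index) {
        int lane = make("Extract128#" + (index / 4), vector); // assumed: 4 ints per lane
        return make("ExtractI32#" + (index % 4), lane);
    }
}
```

With a vector node `v`, calling `lowerExtractInt(v, 5)` and `lowerExtractInt(v, 6)` on the same `ToyGvn` creates a single `Extract128#1` node that both element extractions consume, which is the kind of sharing described for moving the expansion from code emission to lowering.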
------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2442875876 From qamai at openjdk.org Mon Oct 28 23:51:07 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 28 Oct 2024 23:51:07 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: <-J1c3-4NdfyIyOpkRDJBtw1qzR375U56X7TCvvrb4u0=.c1fd68a5-b64f-4c46-a8ec-93530c16ba97@github.com> Message-ID: On Sun, 27 Oct 2024 01:22:13 GMT, Jatin Bhateja wrote: >> I believe the matcher only needs the exact type of the node but not its inputs. E.g. it should not be an issue if we `AddVB` a `vector` and a `vector` into a `vector`. > > Generic vector operand resolution concretizes generic operands based on the type-agnostic node size. It is a post-matcher pass, and its job is to replace generic MachOper operand nodes with concrete ones (vec[SDXYZ]), which hold the precise register mask needed by the register allocator. @jaskarth I notice we have `process_late_inline_calls_no_inline` below, please put lowering after it because `process_late_inline_calls_no_inline` does do igvn. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1819902884 From amitkumar at openjdk.org Tue Oct 29 04:25:09 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 29 Oct 2024 04:25:09 GMT Subject: RFR: 8342962: [s390x] TestOSRLotsOfLocals.java crashes [v3] In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 10:36:17 GMT, Martin Doerr wrote: > I suggest testing it by setting large_offset to true. I ran the tier1 tests again with the suggested settings and didn't see any regression.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21703#issuecomment-2443172807 From amitkumar at openjdk.org Tue Oct 29 04:25:10 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 29 Oct 2024 04:25:10 GMT Subject: RFR: 8342962: [s390x] TestOSRLotsOfLocals.java crashes [v3] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 16:16:42 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> more comments from lutz > > Wonderful! Thanks @RealLucy, @TheRealMDoerr for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21703#issuecomment-2443173214 From amitkumar at openjdk.org Tue Oct 29 04:25:11 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 29 Oct 2024 04:25:11 GMT Subject: Integrated: 8342962: [s390x] TestOSRLotsOfLocals.java crashes In-Reply-To: References: Message-ID: <9EdTaNqhkGCsEi1lZ5J5YH39TF9Lo-f2zZL43FMfza0=.97000078-6061-4d77-acc1-48fab1b77751@github.com> On Fri, 25 Oct 2024 06:28:02 GMT, Amit Kumar wrote: > We are on thin ice with the `TestOSRLotsOfLocals` test. Before it breaks, I'd like to provide the fix. :-) > > Testing : Tier1 test with fastdebug vm. This pull request has now been integrated. 
Changeset: 54327bc4 Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/54327bc4e38773b7461977ce17f2185c068bce9b Stats: 17 lines in 1 file changed: 14 ins; 0 del; 3 mod 8342962: [s390x] TestOSRLotsOfLocals.java crashes Reviewed-by: lucy, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/21703 From chagedorn at openjdk.org Tue Oct 29 07:38:05 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 29 Oct 2024 07:38:05 GMT Subject: RFR: 8343137: C2: VerifyLoopOptimizations fails with "Was reachable in only one" In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 12:41:02 GMT, Christian Hagedorn wrote: > [JDK-8342043](https://bugs.openjdk.org/browse/JDK-8342043) introduced a new usage of `igvn.intcon()` but did not call `set_ctrl()`. This simple patch fixes this. > > While working on this, I've noticed that a few Assertion Predicates tests ended up in the `predicates` directory while they would have better been placed into `predicates/assertion`. It's probably not worth filing a separate issue just for that, so I've squeezed this trivial move into this fix. Note that `AssertionPredicateDoesntConstantFold` does not define a package and thus does not need an update to the command line. > > Thanks, > Christian Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21739#issuecomment-2443459675 From stuefe at openjdk.org Tue Oct 29 07:55:08 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 29 Oct 2024 07:55:08 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v8] In-Reply-To: References: Message-ID: <3mV4LH3bntqMZlMJyu6RQM5e5S1Ev4TT8nnzvidNJxQ=.4785612e-6cef-4a3a-b33e-5b044b7b90e2@github.com> On Mon, 28 Oct 2024 17:02:38 GMT, Martin Doerr wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g.
C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) >> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > Martin Doerr has updated the pull request incrementally with one additional commit since the last 
revision: > > Add sanity check: fpregs should point into the context. Okay! ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21615#pullrequestreview-2401025928 From chagedorn at openjdk.org Tue Oct 29 08:10:17 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 29 Oct 2024 08:10:17 GMT Subject: Integrated: 8343137: C2: VerifyLoopOptimizations fails with "Was reachable in only one" In-Reply-To: References: Message-ID: <7Nml7DMUWhqQw7fwDDN45Yn5eUlXkrlsiCaceI_TsXI=.526df7e9-8398-4e9b-8a0c-4b10531e3992@github.com> On Mon, 28 Oct 2024 12:41:02 GMT, Christian Hagedorn wrote: > [JDK-8342043](https://bugs.openjdk.org/browse/JDK-8342043) introduced a new usage of `igvn.intcon()` but did not call `set_ctrl()`. This simple patch fixes this. > > While working on this, I've noticed that a few Assertion Predicates tests ended up in the `predicates` directory while they would have better been placed into `predicates/assertion`. It's probably not worth filing a separate issue just for that, so I've squeezed this trivial move into this fix. Note that `AssertionPredicateDoesntConstantFold` does not define a package and thus does not need an update to the command line. > > Thanks, > Christian This pull request has now been integrated.
Changeset: e389f82b Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/e389f82b1b2365a43fef744936b222328d71494b Stats: 234 lines in 6 files changed: 145 ins; 85 del; 4 mod 8343137: C2: VerifyLoopOptimizations fails with "Was reachable in only one" Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21739 From thartmann at openjdk.org Tue Oct 29 08:14:07 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 29 Oct 2024 08:14:07 GMT Subject: RFR: 8343068: C2: CastX2P Ideal transformation not always applied [v2] In-Reply-To: References: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> Message-ID: On Fri, 25 Oct 2024 15:09:50 GMT, Roland Westrelin wrote: >> The transformation: >> >> >> (CastX2P (AddL base i)) -> (AddP (CastX2P base) i) >> >> >> when i fits in an int is not always applied: when the type of `i` is >> narrowed so it fits in an int, the `CastX2P` is not enqueued for >> igvn. This can get in the way of vectorization as shown by test case >> `test2`. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > fix test Marked as reviewed by thartmann (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21714#pullrequestreview-2401062941 From mbaesken at openjdk.org Tue Oct 29 08:22:09 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 29 Oct 2024 08:22:09 GMT Subject: RFR: 8342823: Ubsan: ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' [v3] In-Reply-To: <-ndagm4ibk40x3EzTHkxJ1Pm2mtbLTu9oa-fQTwDCd8=.ccf7187d-c757-41e0-b0d6-73509d8ffac0@github.com> References: <765oLXxFYPUnQq8xyATal2oPWTFef26xrbI0-wR7jpU=.1a01b12f-3cde-4349-b21d-d9fec388f459@github.com> <-ndagm4ibk40x3EzTHkxJ1Pm2mtbLTu9oa-fQTwDCd8=.ccf7187d-c757-41e0-b0d6-73509d8ffac0@github.com> Message-ID: On Mon, 28 Oct 2024 15:02:20 GMT, Matthias Baesken wrote: >> When running with ubsanized binaries on Linux x86_64, >> hs jtreg test compiler/startup/StartupOutput.java >> showed this issue >> >> jdk/src/hotspot/share/ci/ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' >> #0 0x7fcea0810117 in ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1614 >> #1 0x7fcea3123577 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) src/hotspot/share/utilities/vmError.cpp:1872 >> #2 0x7fcea0c01499 in report_fatal(VMErrorType, char const*, int, char const*, ...) 
src/hotspot/share/utilities/debug.cpp:214 >> #3 0x7fcea09e9d85 in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413 >> #4 0x7fcea066da1d in Runtime1::generate_blob(BufferBlob*, C1StubId, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:233 >> #5 0x7fcea066dfb0 in Runtime1::generate_blob_for(BufferBlob*, C1StubId) src/hotspot/share/c1/c1_Runtime1.cpp:262 >> #6 0x7fcea066dfb0 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:272 >> #7 0x7fcea03d2be1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53 >> #8 0x7fcea03d2be1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74 >> #9 0x7fcea0acc0c2 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771 >> #10 0x7fcea0ad9a3f in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913 >> #11 0x7fcea161264a in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:759 >> #12 0x7fcea2ec739a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:234 >> #13 0x7fcea251e1d2 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 >> #14 0x7fcea7c6c6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 1b515766201d47a183932ba0c8c8bd0d9ee8755b) >> #15 0x7fcea730f58e in clone (/lib64/libc.so.6+0x11858e) (BuildId: 448a3ddd22596e1adb8fb3dec8921ed5b9d54dc2) >> >> So a nullptr check should be better added . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > move check, add assert Hi Vladimir and Martin, thanks for the reviews ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21684#issuecomment-2443533776 From mbaesken at openjdk.org Tue Oct 29 08:22:10 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 29 Oct 2024 08:22:10 GMT Subject: Integrated: 8342823: Ubsan: ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' In-Reply-To: <765oLXxFYPUnQq8xyATal2oPWTFef26xrbI0-wR7jpU=.1a01b12f-3cde-4349-b21d-d9fec388f459@github.com> References: <765oLXxFYPUnQq8xyATal2oPWTFef26xrbI0-wR7jpU=.1a01b12f-3cde-4349-b21d-d9fec388f459@github.com> Message-ID: On Thu, 24 Oct 2024 14:02:26 GMT, Matthias Baesken wrote: > When running with ubsanized binaries on Linux x86_64, > hs jtreg test compiler/startup/StartupOutput.java > showed this issue > > jdk/src/hotspot/share/ci/ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' > #0 0x7fcea0810117 in ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1614 > #1 0x7fcea3123577 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) src/hotspot/share/utilities/vmError.cpp:1872 > #2 0x7fcea0c01499 in report_fatal(VMErrorType, char const*, int, char const*, ...) 
src/hotspot/share/utilities/debug.cpp:214 > #3 0x7fcea09e9d85 in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413 > #4 0x7fcea066da1d in Runtime1::generate_blob(BufferBlob*, C1StubId, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:233 > #5 0x7fcea066dfb0 in Runtime1::generate_blob_for(BufferBlob*, C1StubId) src/hotspot/share/c1/c1_Runtime1.cpp:262 > #6 0x7fcea066dfb0 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:272 > #7 0x7fcea03d2be1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53 > #8 0x7fcea03d2be1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74 > #9 0x7fcea0acc0c2 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771 > #10 0x7fcea0ad9a3f in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913 > #11 0x7fcea161264a in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:759 > #12 0x7fcea2ec739a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:234 > #13 0x7fcea251e1d2 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > #14 0x7fcea7c6c6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 1b515766201d47a183932ba0c8c8bd0d9ee8755b) > #15 0x7fcea730f58e in clone (/lib64/libc.so.6+0x11858e) (BuildId: 448a3ddd22596e1adb8fb3dec8921ed5b9d54dc2) > > So a nullptr check should be better added . This pull request has now been integrated. 
Changeset: beff8bfe Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/beff8bfe2a5334823b67cb748bc8652dc6a3f3d4 Stats: 6 lines in 2 files changed: 2 ins; 2 del; 2 mod 8342823: Ubsan: ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask' Reviewed-by: kvn, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/21684 From duke at openjdk.org Tue Oct 29 09:37:10 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Tue, 29 Oct 2024 09:37:10 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v2] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 14:47:47 GMT, Yuri Gaevsky wrote: >> The patch adds the possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request incrementally with two additional commits since the last revision: > > - num_8b_elems_in_vec --> nof_vec_elems > - Removed checks for (MaxVectorSize >= 16) per @RealFYang suggestion. . ------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2443701334 From dlunden at openjdk.org Tue Oct 29 10:42:26 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 29 Oct 2024 10:42:26 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v3] In-Reply-To: References: Message-ID: > Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced, due to unfortunate rounding in the register mask size computation, the available space for incoming/outgoing method arguments in register masks. > > ### Changeset > > - Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4. > - Add a regression test.
> > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. There is no observable difference in C2 compilation time. Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Update mask size comment after suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21612/files - new: https://git.openjdk.org/jdk/pull/21612/files/6f3ccc40..873a8ffe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21612&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21612&range=01-02 Stats: 7 lines in 1 file changed: 3 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21612.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21612/head:pull/21612 PR: https://git.openjdk.org/jdk/pull/21612 From dlunden at openjdk.org Tue Oct 29 10:42:26 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 29 Oct 2024 10:42:26 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v3] In-Reply-To: References: <8UbXTjkMPw6xs9rvGtJukgbq8dOM4q-ogBLXlA7sdhA=.41db68e4-1f59-47e2-b0eb-6caa9a7a7d7b@github.com> Message-ID: On Tue, 22 Oct 2024 14:25:04 GMT, Roberto Castañeda Lozano wrote: >> I'm fine with this change, although I'd then argue that we could simplify it and just say we add 4 words (instead of 3) for incoming/outgoing arguments. @vnkozlov You suggested the current wording that specifically mentions APX, what do you think? > >> we could simplify it and just say we add 4 words (instead of 3) for incoming/outgoing arguments > > That would work for me too. Thanks @robcasloz and @vnkozlov, now updated.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21612#discussion_r1820550156 From jbhateja at openjdk.org Tue Oct 29 11:34:09 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 29 Oct 2024 11:34:09 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: <-J1c3-4NdfyIyOpkRDJBtw1qzR375U56X7TCvvrb4u0=.c1fd68a5-b64f-4c46-a8ec-93530c16ba97@github.com> Message-ID: On Mon, 28 Oct 2024 23:48:48 GMT, Quan Anh Mai wrote: >> Generic vector operand resolution concretizes generic operands based on the type-agnostic node size. It is a post-matcher pass, and its job is to replace generic MachOper operand nodes with concrete ones (vec[SDXYZ]), which hold the precise register mask needed by the register allocator. > > @jaskarth I notice we have `process_late_inline_calls_no_inline` below, please put lowering after it because `process_late_inline_calls_no_inline` does do igvn. Invariants in the lowered graph will get scheduled out of the loop during GCM, but lowering may still impact the unrolling policy, which is based on the [loop body size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L1024). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1820624424 From mbaesken at openjdk.org Tue Oct 29 13:05:09 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 29 Oct 2024 13:05:09 GMT Subject: RFR: 8342489: compiler/c2/irTests/TestVectorizationMismatchedAccess.java fails on big-endian platforms In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 10:09:13 GMT, Amit Kumar wrote: > Adjust TestVectorizationMismatchedAccess.java for Big Endian Platforms Marked as reviewed by mbaesken (Reviewer). Fixes the error on AIX in our central tests.
------------- PR Review: https://git.openjdk.org/jdk/pull/21736#pullrequestreview-2401802061 PR Comment: https://git.openjdk.org/jdk/pull/21736#issuecomment-2444159521 From amitkumar at openjdk.org Tue Oct 29 13:27:10 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 29 Oct 2024 13:27:10 GMT Subject: RFR: 8342489: compiler/c2/irTests/TestVectorizationMismatchedAccess.java fails on big-endian platforms In-Reply-To: <_NrQsE0AEcizkZlXbN3Dgjfc2iHx6bYBcGzS_22abp8=.92b49b23-9b51-4ce3-85a2-6ea99b4e2d3a@github.com> References: <_NrQsE0AEcizkZlXbN3Dgjfc2iHx6bYBcGzS_22abp8=.92b49b23-9b51-4ce3-85a2-6ea99b4e2d3a@github.com> Message-ID: On Mon, 28 Oct 2024 14:45:44 GMT, Emanuel Peter wrote: >> Adjust TestVectorizationMismatchedAccess.java for Big Endian Platforms > > Ah yes, you need to undo this [PR's](https://github.com/openjdk/jdk/pull/21708) change, as asked for above. @eme64, @rwestrel can I get one more approval ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21736#issuecomment-2444222487 From epeter at openjdk.org Tue Oct 29 13:31:13 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Oct 2024 13:31:13 GMT Subject: RFR: 8342489: compiler/c2/irTests/TestVectorizationMismatchedAccess.java fails on big-endian platforms In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 10:09:13 GMT, Amit Kumar wrote: > Adjust TestVectorizationMismatchedAccess.java for Big Endian Platforms Marked as reviewed by epeter (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21736#pullrequestreview-2401892496 From mdoerr at openjdk.org Tue Oct 29 13:34:12 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 29 Oct 2024 13:34:12 GMT Subject: RFR: 8342607: Enhance register printing on x86_64 platforms [v8] In-Reply-To: References: Message-ID: <0yGhCkU4tn6Kl-o7yY3Cwst7sl4kEeXd-4cl3hcT8hE=.b1b8a09e-aeb7-4a06-a8eb-c9a48d7f8b67@github.com> On Mon, 28 Oct 2024 17:02:38 GMT, Martin Doerr wrote: >> There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. >> I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.) 
>> >> Example output (linux): >> >> Registers: >> RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 >> RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 >> R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 >> R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 >> RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 >> TRAPNO=0x000000000000000e >> >> XMM[0]=0x0000000000000000 0x0000000000000000 >> XMM[1]=0x00007fea3c034200 0x0000000000000000 >> XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 >> XMM[3]=0x00007fea7c3d6608 0x0000000000000000 >> XMM[4]=0x00007f0000000000 0x0000000000000000 >> XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff >> XMM[6]=0x0000000000000000 0x00007fea897d0f98 >> XMM[7]=0x0202020202020202 0x0000000000000000 >> XMM[8]=0x0000000000000000 0x0202020202020202 >> XMM[9]=0x666e69206e6f6974 0x0000000000000000 >> XMM[10]=0x0000000000000000 0x6e6f6974616d726f >> XMM[11]=0x0000000000000001 0x0000000000000000 >> XMM[12]=0x00007fea8b684400 0x0000000000000001 >> XMM[13]=0x0000000000000000 0x0000000000000000 >> XMM[14]=0x0000000000000000 0x0000000000000000 >> XMM[15]=0x0000000000000000 0x0000000000000000 >> MXCSR=0x0000037f > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add sanity check: fpregs should point into the context. Thanks again for the reviews! 
I didn't expect that printing a couple of registers could be so difficult :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21615#issuecomment-2444237906 From mdoerr at openjdk.org Tue Oct 29 13:34:14 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 29 Oct 2024 13:34:14 GMT Subject: Integrated: 8342607: Enhance register printing on x86_64 platforms In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 14:29:20 GMT, Martin Doerr wrote: > There are some situations in which the XMM registers are relevant to understand errors. E.g. C2 compiler uses them to spill GPR values (see https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/cpu/x86/x86_64.ad#L1312), so they may contain Oops etc. We may consider searching and printing the content for Oops in a future RFE. > I've implemented [JDK-8342607](https://bugs.openjdk.org/browse/JDK-8342607) such that linux and Windows show the same output format. (Skipped Intel-Mac because Apple has stopped shipping that platform. I don't have it and I'm not familiar with it.)
> > Example output (linux): > > Registers: > RAX=0x00007fea8bdb3000, RBX=0x00007fea8b48d5d4, RCX=0x00007fea8b4d2255, RDX=0x0000000000000340 > RSP=0x00007fea897d0b60, RBP=0x00007fea897d0b90, RSI=0x00007fea8b5f1448, RDI=0x00000000e0000000 > R8 =0x00007fea8b48d5d4, R9 =0x0000000000000006, R10=0x00007fea8bb4b500, R11=0x00007fea7cc2f120 > R12=0x0000000000000000, R13=0x00007fea897d0bc0, R14=0x00007fea897d0c50, R15=0x00007fea8402c9c0 > RIP=0x00007fea8ac008e5, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006 > TRAPNO=0x000000000000000e > > XMM[0]=0x0000000000000000 0x0000000000000000 > XMM[1]=0x00007fea3c034200 0x0000000000000000 > XMM[2]=0x00000000fffffffe 0x00007fea8402c9c0 > XMM[3]=0x00007fea7c3d6608 0x0000000000000000 > XMM[4]=0x00007f0000000000 0x0000000000000000 > XMM[5]=0x00007fea897d0fe8 0x00007feaffffffff > XMM[6]=0x0000000000000000 0x00007fea897d0f98 > XMM[7]=0x0202020202020202 0x0000000000000000 > XMM[8]=0x0000000000000000 0x0202020202020202 > XMM[9]=0x666e69206e6f6974 0x0000000000000000 > XMM[10]=0x0000000000000000 0x6e6f6974616d726f > XMM[11]=0x0000000000000001 0x0000000000000000 > XMM[12]=0x00007fea8b684400 0x0000000000000001 > XMM[13]=0x0000000000000000 0x0000000000000000 > XMM[14]=0x0000000000000000 0x0000000000000000 > XMM[15]=0x0000000000000000 0x0000000000000000 > MXCSR=0x0000037f This pull request has now been integrated. 
Changeset: d8b3685d Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/d8b3685d36873904248e9701f66459e074a4a8ab Stats: 37 lines in 3 files changed: 36 ins; 0 del; 1 mod 8342607: Enhance register printing on x86_64 platforms Co-authored-by: Richard Reingruber Reviewed-by: rrich, stuefe, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/21615 From epeter at openjdk.org Tue Oct 29 13:48:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Oct 2024 13:48:27 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v7] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! 
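To make the adjacency argument above concrete, here is a minimal standalone sketch of the decomposed form `pointer = con + SUM(scale_i * variable_i)` and the constant-distance check it enables. This is not the code from `mempointer.hpp`: the names `Summand`, `DecomposedPointer`, `constant_distance` and `is_adjacent` are hypothetical, variables are plain integer ids rather than C2 nodes, each variable is assumed to appear at most once per pointer, and overflow is ignored (the PR tracks it with `NoOverflowInt`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a C2 node: a variable is just an integer id here.
using VariableId = int;

// One summand of the linear form: scale * variable (scale is a constant).
struct Summand {
  int64_t scale;
  VariableId variable;
  bool operator==(const Summand& o) const {
    return scale == o.scale && variable == o.variable;
  }
  bool operator<(const Summand& o) const {
    return std::tie(variable, scale) < std::tie(o.variable, o.scale);
  }
};

// pointer = con + SUM(scale_i * variable_i)
struct DecomposedPointer {
  int64_t con;
  std::vector<Summand> summands;
};

// If both pointers have identical summands, their distance is the constant
// con2 - con1; it is written to *distance and true is returned.
// Otherwise nothing is known about the distance and false is returned.
// Parameters are taken by value so the summands can be sorted for comparison.
bool constant_distance(DecomposedPointer p1, DecomposedPointer p2, int64_t* distance) {
  std::sort(p1.summands.begin(), p1.summands.end());
  std::sort(p2.summands.begin(), p2.summands.end());
  if (!(p1.summands == p2.summands)) {
    return false;
  }
  *distance = p2.con - p1.con;  // sketch only: no overflow tracking
  return true;
}

// p2 is adjacent to p1 if it starts exactly where a size1-byte access at p1 ends.
bool is_adjacent(const DecomposedPointer& p1, int64_t size1, const DecomposedPointer& p2) {
  assert(size1 > 0);
  int64_t distance = 0;
  return constant_distance(p1, p2, &distance) && distance == size1;
}
```

For example, with variable 0 standing for `array_base` and variable 1 for `i`, the pointers `16 + 1 * array_base + 4 * i` and `20 + 1 * array_base + 4 * i` share all summands and differ by 4 bytes, so two 4-byte stores at these addresses would be reported as adjacent. Pointers whose summands differ are simply reported as "unknown", never as "not aliasing".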
> > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. > > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). > > Now we can do: > - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 82 commits: - manual merge with master - changes to NoOverflowInt for Dean - rm dead assert - updates for Vladimir - some unsafe and native benchmarks added - more examples and comments for Vladimir - Merge branch 'master' into JDK-8335392-MemPointer - Merge branch 'master' into JDK-8335392-MemPointer - fix build and test - add precompiled.hpp to gtest - ... 
and 72 more: https://git.openjdk.org/jdk/compare/d8b3685d...8f58e889 ------------- Changes: https://git.openjdk.org/jdk/pull/19970/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=06 Stats: 2592 lines in 16 files changed: 2327 ins; 213 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From chagedorn at openjdk.org Tue Oct 29 13:59:21 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 29 Oct 2024 13:59:21 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v7] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Tue, 29 Oct 2024 13:48:27 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! 
>> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 82 commits: > > - manual merge with master > - changes to NoOverflowInt for Dean > - rm dead assert > - updates for Vladimir > - some unsafe and native benchmarks added > - more examples and comments for Vladimir > - Merge branch 'master' into JDK-8335392-MemPointer > - Merge branch 'master' into JDK-8335392-MemPointer > - fix build and test > - add precompiled.hpp to gtest > - ... 
and 72 more: https://git.openjdk.org/jdk/compare/d8b3685d...8f58e889 Nice work! I have a first round of comments - mostly minor things. So far, it looks good. Will pick this up again tomorrow. ------------- PR Review: https://git.openjdk.org/jdk/pull/19970#pullrequestreview-2399057865 From chagedorn at openjdk.org Tue Oct 29 13:59:31 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 29 Oct 2024 13:59:31 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v6] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Tue, 22 Oct 2024 07:19:54 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`. 
>> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > changes to NoOverflowInt for Dean src/hotspot/share/opto/memnode.cpp line 2781: > 2779: private: > 2780: PhaseGVN* _phase; > 2781: StoreNode* _store; Was like that before but maybe you can make them `const` with this change: Suggestion: PhaseGVN* const _phase; StoreNode* const _store; src/hotspot/share/opto/memnode.cpp line 2881: > 2879: } > 2880: > 2881: NOT_PRODUCT( if(is_trace_basic()) { tty->print("[TraceMergeStores] MergePrimitiveStores::run: "); _store->dump(); }) Suggestion: NOT_PRODUCT( if (is_trace_basic()) { tty->print("[TraceMergeStores] MergePrimitiveStores::run: "); _store->dump(); }) src/hotspot/share/opto/memnode.cpp line 2886: > 2884: // then that use or a store further down is the "last" store. > 2885: Status status_use = find_adjacent_use_store(_store); > 2886: NOT_PRODUCT( if(is_trace_basic()) { tty->print("[TraceMergeStores] expect no use: "); status_use.print_on(tty); }) Suggestion: NOT_PRODUCT( if (is_trace_basic()) { tty->print("[TraceMergeStores] expect no use: "); status_use.print_on(tty); }) src/hotspot/share/opto/memnode.cpp line 2893: > 2891: // Check if we can merge with at least one def, so that we have at least 2 stores to merge. 
> 2892: Status status_def = find_adjacent_def_store(_store); > 2893: NOT_PRODUCT( if(is_trace_basic()) { tty->print("[TraceMergeStores] expect def: "); status_def.print_on(tty); }) Suggestion: NOT_PRODUCT( if (is_trace_basic()) { tty->print("[TraceMergeStores] expect def: "); status_def.print_on(tty); }) src/hotspot/share/opto/memnode.cpp line 2907: > 2905: StoreNode* merged_store = make_merged_store(merge_list, merged_input_value); > 2906: > 2907: NOT_PRODUCT( if(is_trace_success()) { trace(merge_list, merged_input_value, merged_store); } ) Suggestion: NOT_PRODUCT( if (is_trace_success()) { trace(merge_list, merged_input_value, merged_store); } ) src/hotspot/share/opto/memnode.cpp line 3143: > 3141: while (current != nullptr && merge_list.size() < merge_list_max_size) { > 3142: Status status = find_adjacent_def_store(current); > 3143: NOT_PRODUCT( if(is_trace_basic()) { tty->print("[TraceMergeStores] find def: "); status.print_on(tty); }) Suggestion: NOT_PRODUCT( if (is_trace_basic()) { tty->print("[TraceMergeStores] find def: "); status.print_on(tty); }) src/hotspot/share/opto/memnode.cpp line 3151: > 3149: // We can have at most one RangeCheck. 
> 3150: if (status.found_range_check()) { > 3151: NOT_PRODUCT( if(is_trace_basic()) { tty->print_cr("[TraceMergeStores] found RangeCheck, stop traversal."); }) Suggestion: NOT_PRODUCT( if (is_trace_basic()) { tty->print_cr("[TraceMergeStores] found RangeCheck, stop traversal."); }) src/hotspot/share/opto/memnode.cpp line 3157: > 3155: } > 3156: > 3157: NOT_PRODUCT( if(is_trace_basic()) { tty->print_cr("[TraceMergeStores] found:"); merge_list.dump(); }) Suggestion: NOT_PRODUCT( if (is_trace_basic()) { tty->print_cr("[TraceMergeStores] found:"); merge_list.dump(); }) src/hotspot/share/opto/memnode.cpp line 3164: > 3162: while (merge_list.size() > pow2size) { merge_list.pop(); } > 3163: > 3164: NOT_PRODUCT( if(is_trace_basic()) { tty->print_cr("[TraceMergeStores] truncated:"); merge_list.dump(); }) Suggestion: NOT_PRODUCT( if (is_trace_basic()) { tty->print_cr("[TraceMergeStores] truncated:"); merge_list.dump(); }) src/hotspot/share/opto/mempointer.hpp line 49: > 47: // > 48: // > 49: // Example1: byte array access: I suggest to add spaces for the example numbers, same below. Suggestion: // Example 1: byte array access: src/hotspot/share/opto/mempointer.hpp line 64: > 62: // > 63: // pointer = array_base + ARRAY_INT_BASE_OFFSET + 4 * 5 + 4 * j + 4 * 3 * j > 64: // = 1 * array_base + ARRAY_INT_BASE_OFFSET + 20 + 4 * j + 12 * j Suggestion: // pointer = array_base + ARRAY_INT_BASE_OFFSET + 4 * 5 + 4 * i + 4 * 3 * j // = 1 * array_base + ARRAY_INT_BASE_OFFSET + 20 + 4 * i + 12 * j src/hotspot/share/opto/mempointer.hpp line 75: > 73: // pointer = array_base + ARRAY_INT_BASE_OFFSET + 4 * i > 74: // = 1 * array_base + ARRAY_INT_BASE_OFFSET + 4 * i > 75: // = scale_0 * variable_0 + con + scale_1 * variable_1 Not sure if it was intentionally or just forgotten to add but I found the dashes useful. 
Maybe you want to add them here as well as for the other examples: Suggestion: // = 1 * array_base + ARRAY_INT_BASE_OFFSET + 4 * i // -------------------- --------------------- -------------------- // = scale_0 * variable_0 + con + scale_1 * variable_1 src/hotspot/share/opto/mempointer.hpp line 85: > 83: // pointer = address + 4 * i > 84: // = 1 * address + 0 + 4 * i > 85: // = scale_0 * variable_0 + con + scale_1 * variable_1 Same here: Suggestion: // = 1 * address + 0 + 4 * i // -------------------- --- -------------------- // = scale_0 * variable_0 + con + scale_1 * variable_1 src/hotspot/share/opto/mempointer.hpp line 110: > 108: // > 109: // pointer = ms.heapBase() + ms.address() + i > 110: // = 0 + 1 * ms.address() + 1 * i For variation, do you want to change this to `short` to also have an example with a `scale` other than 1 for `MemorySegment`? src/hotspot/share/opto/mempointer.hpp line 120: > 118: // > 119: // pointer = array_base + ARRAY_INT_BASE_OFFSET + 4 * 5 + 4 * j + 4 * j * k > 120: // = 1 * array_base + ARRAY_INT_BASE_OFFSET + 20 + 4 * j + 4 * j * k Suggestion: // pointer = array_base + ARRAY_INT_BASE_OFFSET + 4 * 5 + 4 * i + 4 * j * k // = 1 * array_base + ARRAY_INT_BASE_OFFSET + 20 + 4 * i + 4 * j * k src/hotspot/share/opto/mempointer.hpp line 143: > 141: // > 142: // MemPointerDecomposedForm: > 143: // When the pointer is parsed, it is decomposed into sum of summands plus a constant: Suggestion: // When the pointer is parsed, it is decomposed into a sum of summands plus a constant: src/hotspot/share/opto/mempointer.hpp line 153: > 151: // Hence, the full decomposed form is: > 152: // > 153: // pointer = sum_i(scale_i * variable_i) + con `sum_i` is a little bit confusing as it suggests to be another variable. What about the following? 
Suggestion: // When the pointer is parsed, it is decomposed into a SUM of summands plus a constant: // // pointer = SUM(summands) + con // // Where each summand_i in summands has the form: // // summand_i = scale_i * variable_i // // Hence, the full decomposed form is: // // pointer = SUM(scale_i * variable_i) + con src/hotspot/share/opto/mempointer.hpp line 157: > 155: // Note: the scale_i are compile-time constants (NoOverflowInt), and the variable_i are > 156: // compile-time variables (C2 nodes). > 157: // On 64bit systems, this decomposed form is computed with long-add/mul, on 32bit systems Suggestion: // On 64-bit systems, this decomposed form is computed with long-add/mul, on 32-bit systems src/hotspot/share/opto/mempointer.hpp line 165: > 163: // > 164: // pointer1 = sum(summands) + con1 > 165: // pointer2 = sum(summands) + con2 To match my comment above: Suggestion: // pointer1 = SUM(summands) + con1 // pointer2 = SUM(summands) + con2 src/hotspot/share/opto/mempointer.hpp line 185: > 183: // > 184: // At first, computing aliasing is difficult because the distance is hidden inside the > 185: // ConvI2L. we can convert this (with array_int_base_offset = 16) into these decomposed forms: As discussed offline: Suggestion: // At first, computing the aliasing is not immediately straight-forward in the general case because // the distance is hidden inside the ConvI2L. 
We can convert this (with array_int_base_offset = 16) // into these decomposed forms: src/hotspot/share/opto/mempointer.hpp line 200: > 198: // ----------------------------------------------------------------------------------------- > 199: // > 200: // We have to be careful on 64bit systems with ConvI2L: decomposing its input is not Suggestion: // We have to be careful on 64-bit systems with ConvI2L: decomposing its input is not src/hotspot/share/opto/mempointer.hpp line 225: > 223: // Resulting in: +-------------------------+ > 224: // mp_{i+1} = con + dec_con + sum(dec_summands) + sum(other_summands) > 225: // = new_con + sum(new_summands) Suggestion: // mp_i = con + summand + SUM(other_summands) // Resulting in: +-------------------------+ // mp_{i+1} = con + dec_con + SUM(dec_summands) + SUM(other_summands) // = new_con + SUM(new_summands) src/hotspot/share/opto/mempointer.hpp line 250: > 248: // S3) All summands of mp1 and mp2 are identical. > 249: // > 250: // Then the ponter difference between p1 and p2 is identical to the difference between Suggestion: // Then the pointer difference between p1 and p2 is identical to the difference between src/hotspot/share/opto/mempointer.hpp line 267: > 265: // Case 1: only decompositions of type (SAFE1) were used: > 266: // We make an induction proof over the decompositions from p1 to mp1, starting with > 267: // the trivial decompoisition: Suggestion: // the trivial decompoisition: Suggestion: // the trivial decomposition: src/hotspot/share/opto/mempointer.hpp line 302: > 300: // > 301: // And hence, there must be an x, such that: > 302: // p1 - p2 = mp1 - mp2 + x * array_element_size_in_bytes * 2^32 Maybe for completeness: Suggestion: // And hence, there must be an x, such that: // p1 - p2 = mp1 - mp2 + x * array_element_size_in_bytes * 2^32 // where // x = x1 - x2 src/hotspot/share/opto/mempointer.hpp line 314: > 312: // -- apply S2 and S3 -- > 313: // > array_element_size_in_bytes * 2^32 - 2^31 > 314: // >= 
array_element_size_in_bytes * 2^31 Should be obvious, but for completeness, maybe add: Suggestion: // > array_element_size_in_bytes * 2^32 - 2^31 // -- apply array_element_size_in_bytes > 0 -- // >= array_element_size_in_bytes * 2^31 src/hotspot/share/opto/mempointer.hpp line 365: > 363: { > 364: const jint max_distance = 1 << 30; > 365: assert(_distance < max_distance && _distance > -max_distance, "safe distance"); The variable name "max_distance" suggests that the assert should use `>=` and `<=`. Would that still be correct? Maybe you should add a comment about the max distance and why it has this value. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819274089 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820457663 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820457902 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820458057 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820458329 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820459729 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820459884 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820460140 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820460278 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819047762 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819046083 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819052480 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819052958 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819061173 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819062236 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819063570 PR Review Comment: 
https://git.openjdk.org/jdk/pull/19970#discussion_r1819088073 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819089446 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819091390 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819125396 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819109444 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819131087 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819142055 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819152480 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819206340 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819218044 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1819224524 From epeter at openjdk.org Tue Oct 29 14:17:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Oct 2024 14:17:37 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v6] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 28 Oct 2024 13:31:24 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> changes to NoOverflowInt for Dean > > src/hotspot/share/opto/mempointer.hpp line 143: > >> 141: // >> 142: // MemPointerDecomposedForm: >> 143: // When the pointer is parsed, it is decomposed into sum of summands plus a constant: > > Suggestion: > > // When the pointer is parsed, it is decomposed into a sum of summands plus a constant: obsolete after your other suggestion ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820891420 From epeter at openjdk.org Tue Oct 29 14:17:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Oct 2024 14:17:37 GMT Subject: RFR: 
8335392: C2 MergeStores: enhanced pointer parsing [v8] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`. > > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). > > Now we can do: > - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. 
> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench... Emanuel Peter has updated the pull request incrementally with four additional commits since the last revision: - Apply suggestions from code review Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/8f58e889..c52c5b60 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=06-07 Stats: 35 lines in 2 files changed: 6 ins; 0 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Tue Oct 29 14:31:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Oct 2024 14:31:37 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v9] In-Reply-To: 
<8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`. > > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). > > Now we can do: > - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. 
> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more updates for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/c52c5b60..46bcc48a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=07-08 Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Tue Oct 29 14:31:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Oct 2024 14:31:37 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v6] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 28 Oct 2024 13:29:51 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last 
revision: >> >> changes to NoOverflowInt for Dean > > src/hotspot/share/opto/mempointer.hpp line 110: > >> 108: // >> 109: // pointer = ms.heapBase() + ms.address() + i >> 110: // = 0 + 1 * ms.address() + 1 * i > > For variation, do you want to change this to `short` to also have an example with a `scale` other than 1 for `MemorySegment`? good idea! > src/hotspot/share/opto/mempointer.hpp line 365: > >> 363: { >> 364: const jint max_distance = 1 << 30; >> 365: assert(_distance < max_distance && _distance > -max_distance, "safe distance"); > > The variable name "max_distance" suggests that the assert should use `>=` and `<=`. Would that still be correct? Maybe you should add a comment about the max distance and why it has this value. Good point. I'll try to remember the reason and add better comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820915837 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1820918265 From mdoerr at openjdk.org Tue Oct 29 15:44:06 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 29 Oct 2024 15:44:06 GMT Subject: RFR: 8331861: [PPC64] Implement load / store assembler functions which take an Address object In-Reply-To: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com> References: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com> Message-ID: On Mon, 14 Oct 2024 12:20:45 GMT, Varada M wrote: > Load and store assembly instructions which takes Address object as argument. > > Tier 1 testing successful on linux-ppc64le and aix-ppc (fastdebug level) > > JBS : [JDK-8331861](https://bugs.openjdk.org/browse/JDK-8331861) LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21492#pullrequestreview-2402357836 From epeter at openjdk.org Tue Oct 29 16:09:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Oct 2024 16:09:37 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v10] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. > > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`).
> > Now we can do: > - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without the RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench...
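The `NoOverflowInt` idea in the description above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `NoOverflowIntSketch`, its methods, and the choice of a single sticky "overflowed" state are hypothetical and do not mirror the actual C++ class in HotSpot. It shows how "normal" int operations can carry an overflow flag along, so callers check once at the end instead of after every step.

```java
// Hypothetical illustration only: this is NOT the HotSpot NoOverflowInt class.
final class NoOverflowIntSketch {
    // Shared sticky state: once any operation overflows, every later result is this.
    private static final NoOverflowIntSketch OVERFLOWED = new NoOverflowIntSketch(0, true);

    private final int value;
    private final boolean overflowed;

    private NoOverflowIntSketch(int value, boolean overflowed) {
        this.value = value;
        this.overflowed = overflowed;
    }

    static NoOverflowIntSketch of(int value) {
        return new NoOverflowIntSketch(value, false);
    }

    boolean isOverflowed() {
        return overflowed;
    }

    // Reading the value of an overflowed result is treated as a programming error.
    int value() {
        if (overflowed) {
            throw new IllegalStateException("result overflowed");
        }
        return value;
    }

    NoOverflowIntSketch add(NoOverflowIntSketch other) {
        if (overflowed || other.overflowed) {
            return OVERFLOWED;
        }
        try {
            return of(Math.addExact(value, other.value));
        } catch (ArithmeticException e) {
            return OVERFLOWED;
        }
    }

    NoOverflowIntSketch mul(NoOverflowIntSketch other) {
        if (overflowed || other.overflowed) {
            return OVERFLOWED;
        }
        try {
            return of(Math.multiplyExact(value, other.value));
        } catch (ArithmeticException e) {
            return OVERFLOWED;
        }
    }
}
```

With this sketch, `NoOverflowIntSketch.of(1 << 30).mul(NoOverflowIntSketch.of(4)).add(NoOverflowIntSketch.of(1)).isOverflowed()` is `true`: the multiplication overflows, and the later addition cannot hide that.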
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/46bcc48a..51381eb3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=08-09 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From kvn at openjdk.org Tue Oct 29 16:13:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Oct 2024 16:13:15 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v5] In-Reply-To: References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> Message-ID: On Mon, 28 Oct 2024 14:42:34 GMT, Christian Hagedorn wrote: >> #### Replacing the Remaining Predicate Walking and Cloning Code >> In the next series of patches (this, [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943), [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945), and [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946)) I want to replace the predicate walking and cloning code used for Loop Peeling, Pre/Main/Post Loops, Loop Unswitching, removing useless Assertion Predicates and Loop Unrolling with new `PredicateVisitors` which can be used in combination with the new `PredicateIterator`. >> >> #### Single Template Assertion Predicate Check >> This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. 
This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). >> >> #### Common Refactorings for all the Patches in this Series >> In each of the patch, I will do similar refactoring ideas: >> - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. >> - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. >> - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. >> - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head (see **P1** PR comment). >> - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates (see **P2** PR comment). This limitation should eventually be removed. But I want to do that separately at a later point. >> >> #### Refactorings of this Patch >> This first patch replaces the predicate walking and cloning code for Loop Peeling and lays the foundation for the replacement for main/post loops ([JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342... > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - Fix indentation > - Fix build Seems reasonable. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21679#pullrequestreview-2402442160 From kvn at openjdk.org Tue Oct 29 16:14:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Oct 2024 16:14:10 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v3] In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 10:42:26 GMT, Daniel Lundén wrote: >> Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced, due to unfortunate rounding in the register mask size computation, the available space for incoming/outgoing method arguments in register masks. >> >> ### Changeset >> >> - Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4. >> - Add a regression test. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. There is no observable difference in C2 compilation time. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Update mask size comment after suggestions Good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21612#pullrequestreview-2402445104 From kvn at openjdk.org Tue Oct 29 16:15:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Oct 2024 16:15:11 GMT Subject: RFR: 8342540: InterfaceCalls micro-benchmark gives misleading results [v2] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 16:18:02 GMT, Andrew Haley wrote: >> `InterfaceCalls.java` makes highly predictable memory accesses, which leads to a gross time underestimate of the case where a megamorphic access is unpredictable. >> >> Here's one example, with and without randomization. The unpredictable megamorphic call takes more than 4x as long as the benchmark. >>
>> ```
>> Benchmark                        (randomized)  Mode  Cnt   Score   Error  Units
>> InterfaceCalls.test2ndInt3Types         false  avgt    4   5.013 ± 0.081  ns/op
>> InterfaceCalls.test2ndInt3Types          true  avgt    4  23.421 ± 0.102  ns/op
>> ```
>> >> This patch adds the "randomized" parameter, which allows the measurement of predictable and unpredictable megamorphic calls. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Update test/micro/org/openjdk/bench/vm/compiler/InterfaceCalls.java > > Co-authored-by: Aleksey Shipilëv Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21581#pullrequestreview-2402447482 From syan at openjdk.org Tue Oct 29 17:04:37 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 29 Oct 2024 17:04:37 GMT Subject: RFR: 8343211: Compile error: redefinition of 'Assembler::evmovdquw(XMMRegister,KRegister,XMMRegister,bool,int)' Message-ID: Hi all, On linux-x64, gcc generates a compile error: `src/hotspot/cpu/x86/assembler_x86.cpp:3646:6: error: redefinition of 'void Assembler::evmovdquw(XMMRegister, KRegister, XMMRegister, bool, int)'` after [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) and [JDK-8338021](https://bugs.openjdk.org/browse/JDK-8338021).
Both PRs add the same C++ function `void Assembler::evmovdquw(XMMRegister, KRegister, XMMRegister, bool, int)`. I think the newly added functions can be merged into one. I choose the one added by [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527). I verified the related tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527); after this PR, the related tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) pass. Additional testing: - [x] linux x64 build with release/fastdebug/slowdebug configure - [ ] jtreg tests (including tier1/2/3 etc.) on linux x64 with release build ------------- Commit messages: - 8343211: Compile error: redefinition of 'Assembler::evmovdquw(XMMRegister,KRegister,XMMRegister,bool,int)' Changes: https://git.openjdk.org/jdk/pull/21768/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21768&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343211 Stats: 13 lines in 2 files changed: 0 ins; 12 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21768.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21768/head:pull/21768 PR: https://git.openjdk.org/jdk/pull/21768 From duke at openjdk.org Tue Oct 29 17:23:06 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 29 Oct 2024 17:23:06 GMT Subject: RFR: 8343211: Compile error: redefinition of 'Assembler::evmovdquw(XMMRegister,KRegister,XMMRegister,bool,int)' In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 16:59:12 GMT, SendaoYan wrote: > Hi all, > On linux-x64, gcc generates a compile error: `src/hotspot/cpu/x86/assembler_x86.cpp:3646:6: error: redefinition of 'void Assembler::evmovdquw(XMMRegister, KRegister, XMMRegister, bool, int)'` after [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) and [JDK-8338021](https://bugs.openjdk.org/browse/JDK-8338021). > Both PRs add the same C++ function implementation `void Assembler::evmovdquw(XMMRegister, KRegister, XMMRegister, bool, int)`.
> I think the newly added functions can be merged into one. > I choose to delete the implementation added by [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527). I verified the related tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527); after this PR, the related tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) pass. > > Additional testing: > > - [x] linux x64 build with release/fastdebug/slowdebug configure > - [ ] jtreg tests (including tier1/2/3 etc.) on linux x64 with release build > > The related tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) are shown below: > > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA256MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA512MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA3Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA3MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA1MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java LGTM Thanks ------------- Marked as reviewed by vpaprotsk at github.com (no known OpenJDK username).
PR Review: https://git.openjdk.org/jdk/pull/21768#pullrequestreview-2402645352 From sparasa at openjdk.org Tue Oct 29 17:24:46 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 29 Oct 2024 17:24:46 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support Message-ID: The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) ------------- Commit messages: - JDK-8343214: Fix encoding errors in APX New Data Destination Instructions Support Changes: https://git.openjdk.org/jdk/pull/21770/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343214 Stats: 36 lines in 1 file changed: 2 ins; 1 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/21770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21770/head:pull/21770 PR: https://git.openjdk.org/jdk/pull/21770 From sviswanathan at openjdk.org Tue Oct 29 17:29:11 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 29 Oct 2024 17:29:11 GMT Subject: RFR: 8343211: Compile error: redefinition of 'Assembler::evmovdquw(XMMRegister,KRegister,XMMRegister,bool,int)' In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 16:59:12 GMT, SendaoYan wrote: > [...] Looks good to me as well. ------------- Marked as reviewed by sviswanathan (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21768#pullrequestreview-2402658374 From thartmann at openjdk.org Tue Oct 29 17:32:03 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 29 Oct 2024 17:32:03 GMT Subject: RFR: 8343211: Compile error: redefinition of 'Assembler::evmovdquw(XMMRegister,KRegister,XMMRegister,bool,int)' In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 16:59:12 GMT, SendaoYan wrote: > [...] Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21768#pullrequestreview-2402666273 From kvn at openjdk.org Tue Oct 29 17:40:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Oct 2024 17:40:12 GMT Subject: RFR: 8343211: Compile error: redefinition of 'Assembler::evmovdquw(XMMRegister,KRegister,XMMRegister,bool,int)' In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 16:59:12 GMT, SendaoYan wrote: > [...] So we had declaration in .hpp file for long time but not implementation? @jatin-bhateja which version should be chosen? test/hotspot/jtreg/compiler/testlibrary/sha/predicate/IntrinsicPredicates.java line 2: > 1: /* > 2: * Copyright (c) 2014, 2024, Oracle and/or its affiliates. All rights reserved. Did we miss this year update in previous PRs?
------------- PR Review: https://git.openjdk.org/jdk/pull/21768#pullrequestreview-2402675206 PR Comment: https://git.openjdk.org/jdk/pull/21768#issuecomment-2444936070 PR Review Comment: https://git.openjdk.org/jdk/pull/21768#discussion_r1821282163 From syan at openjdk.org Tue Oct 29 17:40:12 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 29 Oct 2024 17:40:12 GMT Subject: RFR: 8343211: Compile error: redefinition of 'Assembler::evmovdquw(XMMRegister,KRegister,XMMRegister,bool,int)' In-Reply-To: References: Message-ID: <7B7MMHwfwAfOtB7M4UMRx31YG_WU2S5BbLsKS9KMhh0=.b0bae427-aefd-413d-a189-b2ce147ae76e@github.com> On Tue, 29 Oct 2024 16:59:12 GMT, SendaoYan wrote: > [...] Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21768#issuecomment-2444927690 From kvn at openjdk.org Tue Oct 29 17:40:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Oct 2024 17:40:13 GMT Subject: RFR: 8343211: Compile error: redefinition of 'Assembler::evmovdquw(XMMRegister,KRegister,XMMRegister,bool,int)' In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 17:36:13 GMT, Vladimir Kozlov wrote: > @jatin-bhateja which version should be chosen? My concern is about different assert's condition.
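One way to settle a concern like this is to compare the two conditions exhaustively. The Java sketch below models the two `assert` conditions that Sandhya Viswanathan quotes in her reply later in this thread (retained versus removed). It rests on labeled assumptions: the vector length constants are taken to be 0, 1 and 2, `supports_avx512vlbw()` is modeled as exactly `vl && bw`, and the class and method names are hypothetical. It is a truth-table check, not HotSpot code.

```java
// Hypothetical model of the two assert conditions; not HotSpot code.
final class AssertEquivalenceSketch {
    // Assumption: the three valid vector length encodings are ordered like this.
    static final int AVX_128bit = 0;
    static final int AVX_256bit = 1;
    static final int AVX_512bit = 2;

    // Condition of the retained assert.
    static boolean retained(int vectorLen, boolean bw, boolean vl) {
        return bw && (vectorLen == AVX_512bit || vl);
    }

    // Condition of the removed assert.
    static boolean removed(int vectorLen, boolean bw, boolean vl) {
        boolean vlbw = vl && bw; // assumption: vlbw means both features are present
        return vectorLen <= AVX_256bit ? vlbw : bw;
    }

    // Compares both conditions for every valid vector length and feature combination.
    static boolean equivalentForAllInputs() {
        boolean[] flags = {false, true};
        for (int len = AVX_128bit; len <= AVX_512bit; len++) {
            for (boolean bw : flags) {
                for (boolean vl : flags) {
                    if (retained(len, bw, vl) != removed(len, bw, vl)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }
}
```

Under these assumptions `AssertEquivalenceSketch.equivalentForAllInputs()` returns `true`: both conditions require `bw`, and both additionally require `vl` for the 128-bit and 256-bit lengths.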
------------- PR Comment: https://git.openjdk.org/jdk/pull/21768#issuecomment-2444939000 From syan at openjdk.org Tue Oct 29 17:40:14 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 29 Oct 2024 17:40:14 GMT Subject: RFR: 8343211: Compile error: redefinition of 'Assembler::evmovdquw(XMMRegister,KRegister,XMMRegister,bool,int)' In-Reply-To: References: Message-ID: <-zaUsqi5XijF6n-YowNGDR_pk22SlMSblzDAZ8fIWV4=.6dc2da76-1080-4041-97df-bec8db4d6532@github.com> On Tue, 29 Oct 2024 17:34:09 GMT, Vladimir Kozlov wrote: >> [...] > > test/hotspot/jtreg/compiler/testlibrary/sha/predicate/IntrinsicPredicates.java line 2: > >> 1: /* >> 2: * Copyright (c) 2014, 2024, Oracle and/or its affiliates. All rights reserved. > > Did we miss this year update in previous PRs? Yes.
Missed by [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21768#discussion_r1821286770 From epeter at openjdk.org Tue Oct 29 17:45:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Oct 2024 17:45:27 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v6] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Tue, 29 Oct 2024 14:28:49 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/mempointer.hpp line 365: >> >>> 363: { >>> 364: const jint max_distance = 1 << 30; >>> 365: assert(_distance < max_distance && _distance > -max_distance, "safe distance"); >> >> The variable name "max_distance" suggests that the assert should use `>=` and `<=`. Would that still be correct? Maybe you should add a comment about the max distance and why it has this value. > > Good point. I'll try to remember the reason and add better comments. Woopsies. I think this was a left-over from something earlier. I was able to trigger this assert with this:

    static long[] arr201 = new long[1 << 28];

    public static void test201() {
        UNSAFE.putByte(arr201, Unsafe.ARRAY_LONG_BASE_OFFSET + (1L << 12), (byte)64);
        UNSAFE.putByte(arr201, Unsafe.ARRAY_LONG_BASE_OFFSET + (1L << 12) + (1L << 30), (byte)64);
    }

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1821292124 From epeter at openjdk.org Tue Oct 29 17:45:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Oct 2024 17:45:27 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v6] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Tue, 29 Oct 2024 17:41:26 GMT, Emanuel Peter wrote: >> Good point. I'll try to remember the reason and add better comments. > > Woopsies. I think this was a left-over from something earlier.
I was able to trigger this assert with this:
>
>     static long[] arr201 = new long[1 << 28];
>
>     public static void test201() {
>         UNSAFE.putByte(arr201, Unsafe.ARRAY_LONG_BASE_OFFSET + (1L << 12), (byte)64);
>         UNSAFE.putByte(arr201, Unsafe.ARRAY_LONG_BASE_OFFSET + (1L << 12) + (1L << 30), (byte)64);
>     }

Good catch! Thanks for having a close look! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1821292914 From sviswanathan at openjdk.org Tue Oct 29 18:10:11 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 29 Oct 2024 18:10:11 GMT Subject: RFR: 8343211: Compile error: redefinition of 'Assembler::evmovdquw(XMMRegister,KRegister,XMMRegister,bool,int)' In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 17:37:43 GMT, Vladimir Kozlov wrote: > > @jatin-bhateja which version should be chosen? > > My concern is about different assert's condition. Both the asserts are equivalent:

Retained: assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), "");
Removed: assert(vector_len <= AVX_256bit ? VM_Version::supports_avx512vlbw() : VM_Version::supports_avx512bw(), "");

Please note that supports_avx512vlbw() implies both supports_avx512vl() and supports_avx512bw(). So both require supports_avx512bw(). And for vector_len < AVX_512bit both in addition require supports_avx512vl(). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21768#issuecomment-2444998092 From varadam at openjdk.org Tue Oct 29 18:22:17 2024 From: varadam at openjdk.org (Varada M) Date: Tue, 29 Oct 2024 18:22:17 GMT Subject: RFR: 8331861: [PPC64] Implement load / store assembler functions which take an Address object In-Reply-To: References: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com> Message-ID: On Fri, 25 Oct 2024 10:09:01 GMT, Martin Doerr wrote: >> Idea: Use the `RegisterOrConstant` version.
This covers all cases including large offset and index. E.g. >> `inline void stw( Register d, Address &a, Register tmp = noreg);` >> >> inline void Assembler::stw( Register d, Address &a, Register tmp) { >> stw(d, a.index() != noreg ? RegisterOrConstant(a.index()) : RegisterOrConstant(a.disp()), a.base(), tmp); >> } >> >> I'd move them into a separate section. > >> @TheRealMDoerr are floating-point load/store instructions out of scope for this PR? >> >> I see couple of use cases: >> >> ```c++ >> ./c1_LIRAssembler_ppc.cpp:591: __ stfd(rsrc, addr.disp(), addr.base()); >> ./c1_LIRAssembler_ppc.cpp:615: __ stfd(rsrc, addr.disp(), addr.base()); >> ``` > > That could be done, too, but floating point instructions are so rarely used, that we could skip them. Thanks @TheRealMDoerr @offamitkumar ------------- PR Comment: https://git.openjdk.org/jdk/pull/21492#issuecomment-2445022457 From varadam at openjdk.org Tue Oct 29 18:22:18 2024 From: varadam at openjdk.org (Varada M) Date: Tue, 29 Oct 2024 18:22:18 GMT Subject: Integrated: 8331861: [PPC64] Implement load / store assembler functions which take an Address object In-Reply-To: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com> References: <8VK00MNPa6iGIkwBH61GNrcpW4dltVsL8kGUU6b1Zr0=.4a44ad0e-7645-45b8-8f95-b17424fdd403@github.com> Message-ID: On Mon, 14 Oct 2024 12:20:45 GMT, Varada M wrote: > Load and store assembly instructions which takes Address object as argument. > > Tier 1 testing successful on linux-ppc64le and aix-ppc (fastdebug level) > > JBS : [JDK-8331861](https://bugs.openjdk.org/browse/JDK-8331861) This pull request has now been integrated. 
Changeset: 520ddac9 Author: Varada M URL: https://git.openjdk.org/jdk/commit/520ddac97053be669d9678375266ccfd6724e3e1 Stats: 70 lines in 4 files changed: 41 ins; 0 del; 29 mod 8331861: [PPC64] Implement load / store assembler functions which take an Address object Reviewed-by: amitkumar, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/21492 From epeter at openjdk.org Tue Oct 29 18:29:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Oct 2024 18:29:04 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v11] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <6wnkmrDfLXJaccB6xsKXP2sgAG7tXVdEKrqE4vlTsdI=.4af0716c-ad1c-4148-8119-41722c5996d7@github.com> > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! 
> > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. > > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). > > Now we can do: > - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without the RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench...
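[Editorial sketch] The "Dealing with Overflows" paragraph above describes an integer wrapper that remembers whether any operation ever overflowed. The real `NoOverflowInt` is a HotSpot C++ class whose exact interface is not shown in this thread, so the following is only a minimal Java analogue of the idea. The class name `NoOverflowIntSketch` and all of its methods are hypothetical and chosen for illustration.

```java
/** Hypothetical Java analogue of the "NoOverflowInt" idea; NOT the HotSpot class. */
final class NoOverflowIntSketch {
    private final int value;
    private final boolean overflowed; // sticky: once set, every later result is also "overflowed"

    private static final NoOverflowIntSketch NAN = new NoOverflowIntSketch(0, true);

    private NoOverflowIntSketch(int value, boolean overflowed) {
        this.value = value;
        this.overflowed = overflowed;
    }

    static NoOverflowIntSketch of(int v) {
        return new NoOverflowIntSketch(v, false);
    }

    /** Addition that records overflow instead of silently wrapping. */
    NoOverflowIntSketch add(NoOverflowIntSketch other) {
        if (overflowed || other.overflowed) {
            return NAN;
        }
        try {
            return of(Math.addExact(value, other.value));
        } catch (ArithmeticException e) {
            return NAN;
        }
    }

    /** Multiplication with the same sticky overflow tracking. */
    NoOverflowIntSketch mul(NoOverflowIntSketch other) {
        if (overflowed || other.overflowed) {
            return NAN;
        }
        try {
            return of(Math.multiplyExact(value, other.value));
        } catch (ArithmeticException e) {
            return NAN;
        }
    }

    boolean isNaN() {
        return overflowed;
    }

    /** Only meaningful when no overflow has happened. */
    int value() {
        if (overflowed) {
            throw new IllegalStateException("value is not available after an overflow");
        }
        return value;
    }
}
```

With a wrapper of this kind, a caller can chain several operations (for example `con + scale * index`) and check `isNaN()` once at the end, rather than guarding every intermediate step.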
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix distance assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/51381eb3..9f442d27 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=09-10 Stats: 66 lines in 2 files changed: 64 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Tue Oct 29 18:29:07 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Oct 2024 18:29:07 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v7] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Tue, 29 Oct 2024 13:57:00 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 82 commits: >> >> - manual merge with master >> - changes to NoOverflowInt for Dean >> - rm dead assert >> - updates for Vladimir >> - some unsafe and native benchmarks added >> - more examples and comments for Vladimir >> - Merge branch 'master' into JDK-8335392-MemPointer >> - Merge branch 'master' into JDK-8335392-MemPointer >> - fix build and test >> - add precompiled.hpp to gtest >> - ... and 72 more: https://git.openjdk.org/jdk/compare/d8b3685d...8f58e889 > > Nice work! I have a first round of comments - mostly minor things. So far, it looks good. Will pick this up again tomorrow. @chhagedorn thanks for all your comments. I have addressed them all now. I had to fix that one assert in the `MemPointerAliasing` constructor - it was wrong, and I have now added regression tests for it. Looking forward to the next part of your review!
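[Editorial sketch] The assert discussed in this thread was quoted as `_distance < max_distance && _distance > -max_distance` with `max_distance = 1 << 30`, and the `test201` reproducer writes two bytes whose offsets differ by exactly `1L << 30`. The arithmetic below only illustrates why that distance sits on the boundary of the strict bound; it says nothing about how the assert was ultimately changed. The class and method names are hypothetical, and the base offset passed in is a placeholder, since it cancels out of the difference.

```java
/** Worked arithmetic for the test201 reproducer; all names here are illustrative. */
final class DistanceAssertSketch {
    // Mirrors `max_distance = 1 << 30` from the quoted mempointer.hpp snippet.
    static final long MAX_DISTANCE = 1L << 30;

    /** The strict bound as quoted: -max_distance < distance < max_distance. */
    static boolean withinStrictBound(long distance) {
        return distance < MAX_DISTANCE && distance > -MAX_DISTANCE;
    }

    /** Byte distance between the two putByte offsets in test201. */
    static long test201Distance(long baseOffset) {
        long first = baseOffset + (1L << 12);
        long second = baseOffset + (1L << 12) + (1L << 30);
        return second - first; // the base offset cancels out
    }

    /** Payload size in bytes of `new long[1 << 28]`, ignoring the array header. */
    static long test201ArrayPayloadBytes() {
        return (1L << 28) * Long.BYTES;
    }
}
```

The distance is `2^30`, which equals `max_distance` and therefore fails the strict `<` comparison, while both offsets remain well inside the `2^31`-byte payload of the array.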
------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2445034722 From epeter at openjdk.org Tue Oct 29 18:29:08 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 29 Oct 2024 18:29:08 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v6] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <7qaz1yLm84eY4tB4Fi_4QrC_GoWnr8BuD4j4W-I3EyQ=.7981e9fc-f1e2-4de0-a3a9-c3a8a0a7521e@github.com> On Tue, 29 Oct 2024 17:42:01 GMT, Emanuel Peter wrote: >> Woopsies. I think this was a left-over from something earlier. I was able to trigger this assert with this: >> >> >> static long[] arr201 = new long[1 << 28]; >> >> public static void test201() { >> UNSAFE.putByte(arr201, Unsafe.ARRAY_LONG_BASE_OFFSET + (1L << 12), (byte)64); >> UNSAFE.putByte(arr201, Unsafe.ARRAY_LONG_BASE_OFFSET + (1L << 12) + (1L << 30), (byte)64); >> } > > Good catch! Thanks for having a close look! I added some regression tests for that case, and fixed the assert. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1821345558 From shade at openjdk.org Tue Oct 29 18:39:07 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 29 Oct 2024 18:39:07 GMT Subject: RFR: 8343211: Compile error: redefinition of 'Assembler::evmovdquw(XMMRegister,KRegister,XMMRegister,bool,int)' In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 16:59:12 GMT, SendaoYan wrote: > Hi all, > On linux-x64 gcc generate compile error: `src/hotspot/cpu/x86/assembler_x86.cpp:3646:6: error: redefinition of 'void Assembler::evmovdquw(XMMRegister, KRegister, XMMRegister, bool, int)'` after [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) and [JDK-8338021](https://bugs.openjdk.org/browse/JDK-8338021). > The both PRs add the same C++ function implementation `void Assembler::evmovdquw(XMMRegister, KRegister, XMMRegister, bool, int)`. 
> I think the newly added functions can be merged to one. > I choose to delete the implementation added by [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527). And I verify the releated tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527), after this PR the releated tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) run passed. > > Additonal testing: > > - [x] linux x64 build with release/fastdebug/slowdebug configure > - [ ] jtreg tests(include tier1/2/3 etc.) on linux x64 with release build > > The releated tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) shows below: > > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA256MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA512MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA3Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA3MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA1MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java Let's do it. Current trunk is broken without this fix. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21768#pullrequestreview-2402799641 From psandoz at openjdk.org Tue Oct 29 20:48:27 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 29 Oct 2024 20:48:27 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: On Sun, 6 Oct 2024 08:32:20 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. >> >> Regarding the related issues: >> >> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. >> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` >> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. >> >> Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. >> >> Please take a look and leave reviews. Thanks a lot. >> >> The description of the original PR: >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. 
>> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byt... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > [vectorapi] Refactor VectorShuffle implementation This is a nice change. It's as if `VectorShuffle<E>` has a payload of `Vector<F>` of the same shape as the shuffle, where `F` is the bit-size-equivalent integral type of `E`, and where the lane elements of the vector are constrained to be within `[-VLENGTH, VLENGTH-1]`. (I do wonder if we might be able to refactor towards that more explicit representation later on with Valhalla.) That simplifies things, opens up more optimizations, and complements the modifications we recently did to `rearrange`/`selectFrom`. (Recommend you do a merge with master to get the latest Vector API changes, just in case there is some impact.) src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java line 228: > 226: } > 227: > 228: AbstractVector iota = vspecies().asIntegral().iota(); I suspect the non-power-of-two code is more efficient.
(Even better if the MUL could be transformed to a shift for power of two values.) Separately, it makes me wonder if we should revisit the shuffle factories if it is now much more efficient to construct a shuffle from a vector. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Int256Vector.java line 870: > 868: @Override > 869: public final Int256Shuffle rearrange(VectorShuffle shuffle) { > 870: return (Int256Shuffle) toBitsVector().rearrange(((Int256Shuffle) shuffle) I think the cast is redundant for all vector kinds. Similarly the explicit cast is redundant for the integral vectors, perhaps in the template separate out the expressions to avoid it where not needed? We could also refer to `VSPECIES` directly rather than calling `vspecies()`, same applies in other methods in the concrete vector classes. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Int256Vector.java line 908: > 906: } > 907: > 908: private static boolean indicesInRange(int[] indices) { Since this method is only called from an assert statement in the constructor we could avoid the clever checking that assertions are enabled and the explicit throwing on an AssertionError by using a second expression that produces an error message when the assertion fails : e.g., assert indicesInRange(indices) : outOfBoundsAssertMessage(indices); src/jdk.incubator.vector/share/classes/jdk/incubator/vector/IntVector.java line 2392: > 2390: this, shuffle, null, > 2391: (v1, s_, m_) -> v1.uOp((i, a) -> { > 2392: int ei = Integer.remainderUnsigned(s_.laneSource(i), v1.length()); Note to self - the intrinsic performs the wrapping of shuffle values using bitwise AND. Nice use of method (equiv to `Math.floorMod` for the range on input arguments). 
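[Editorial sketch] The note above says the intrinsic wraps shuffle indices with a bitwise AND, and that `Integer.remainderUnsigned` behaves like `Math.floorMod` "for the range on input arguments". The sketch below checks that claim for indices in `[-VLENGTH, VLENGTH-1]` under the assumption that `VLENGTH` is a power of two; the three methods should not be expected to agree for other lengths. `ShuffleWrapSketch` and its methods are hypothetical helpers, not Vector API code.

```java
/** Compares three ways of wrapping a shuffle index into [0, vlength); illustrative only. */
final class ShuffleWrapSketch {
    /** Wrapping as in the quoted Java fallback: unsigned remainder. */
    static int wrapUnsignedRemainder(int index, int vlength) {
        return Integer.remainderUnsigned(index, vlength);
    }

    /** Wrapping with a bitwise AND; only valid when vlength is a power of two. */
    static int wrapAnd(int index, int vlength) {
        return index & (vlength - 1);
    }

    /** Mathematical (floor) modulus, always non-negative for positive vlength. */
    static int wrapFloorMod(int index, int vlength) {
        return Math.floorMod(index, vlength);
    }

    /** True if all three agree for every index in [-vlength, vlength - 1]. */
    static boolean allAgree(int vlength) {
        for (int i = -vlength; i <= vlength - 1; i++) {
            int r = wrapUnsignedRemainder(i, vlength);
            if (r != wrapAnd(i, vlength) || r != wrapFloorMod(i, vlength)) {
                return false;
            }
        }
        return true;
    }
}
```

The agreement for power-of-two lengths follows from `2^32` being divisible by `vlength`, so reinterpreting a negative index as unsigned does not change its residue. For a length such as 3, `Integer.remainderUnsigned(-1, 3)` and `Math.floorMod(-1, 3)` give different results.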
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/IntVector.java line 2473: > 2471: final > 2472: VectorShuffle toShuffle(AbstractSpecies dsp, boolean wrap) { > 2473: assert(dsp.elementSize() == vspecies().elementSize()); Even though we force inline I cannot quite decide if it is better to forego the assert since it unduly increases method size. Regardless it may be useful to place the partial wrapping logic in a separate method, given it is less likely to be used. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-VectorBits.java.template line 1150: > 1148: @Override > 1149: @ForceInline > 1150: public void intoArray(int[] a, int offset) { Separately, we might consider optimizing `shuffleFromArray`. ------------- PR Review: https://git.openjdk.org/jdk/pull/21042#pullrequestreview-2402948659 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1821489034 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1821471669 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1821478372 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1821450485 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1821456333 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1821501842 From dholmes at openjdk.org Tue Oct 29 21:57:10 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 29 Oct 2024 21:57:10 GMT Subject: RFR: 8343211: Compile error: redefinition of 'Assembler::evmovdquw(XMMRegister,KRegister,XMMRegister,bool,int)' In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 16:59:12 GMT, SendaoYan wrote: > Hi all, > On linux-x64 gcc generate compile error: `src/hotspot/cpu/x86/assembler_x86.cpp:3646:6: error: redefinition of 'void Assembler::evmovdquw(XMMRegister, KRegister, XMMRegister, bool, int)'` after [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) and [JDK-8338021](https://bugs.openjdk.org/browse/JDK-8338021). 
> The both PRs add the same C++ function implementation `void Assembler::evmovdquw(XMMRegister, KRegister, XMMRegister, bool, int)`. > I think the newly added functions can be merged to one. > I choose to delete the implementation added by [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527). And I verify the releated tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527), after this PR the releated tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) run passed. > > Additonal testing: > > - [x] linux x64 build with release/fastdebug/slowdebug configure > - [ ] jtreg tests(include tier1/2/3 etc.) on linux x64 with release build > > The releated tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) shows below: > > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA256MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA512MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA3Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA3MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA1MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java Why has this not been integrated? This is causing carnage in our CI. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21768#issuecomment-2445399911 From kvn at openjdk.org Tue Oct 29 22:45:04 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Oct 2024 22:45:04 GMT Subject: RFR: 8343211: Compile error: redefinition of 'Assembler::evmovdquw(XMMRegister,KRegister,XMMRegister,bool,int)' In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 21:53:58 GMT, David Holmes wrote: > Why has this not been integrated? This is causing carnage in our CI. Night in his timezone? Should we create duplicated PR with him as co-author? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21768#issuecomment-2445458214 From cslucas at openjdk.org Tue Oct 29 22:50:30 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 29 Oct 2024 22:50:30 GMT Subject: RFR: 8335977: Deoptimization fails with assert "object should be reallocated already" [v2] In-Reply-To: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> References: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> Message-ID: > Please, review this patch to fix an issue that may occur when serializing debug information related to reduce allocation merges. The problem happens when there are more than one JVMS in a `uncommon_trap` and a _younger_ JVMS doesn't have the RAM inputs as a local/expression/monitor but an older JVMS does. In that situation the loop at line 1173 of output.cpp will set the `is_root` property of the ObjectValue to `false` when processing the younger JVMS even though it may have been set to `true` when visiting the older JVMS. > > Tested on: > - Win, Mac & Linux tier1-4 on x64 & Aarch64. > - CTW with some thousands of jars. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Address PR feedback: typo on test & refactor in output.cpp. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21624/files - new: https://git.openjdk.org/jdk/pull/21624/files/8e345b59..26b0b869 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21624&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21624&range=00-01 Stats: 18 lines in 2 files changed: 8 ins; 9 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21624.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21624/head:pull/21624 PR: https://git.openjdk.org/jdk/pull/21624 From cslucas at openjdk.org Tue Oct 29 22:50:30 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 29 Oct 2024 22:50:30 GMT Subject: RFR: 8335977: Deoptimization fails with assert "object should be reallocated already" [v2] In-Reply-To: <-eGvtC-tGwThPGTfRVKEySOMZZRm-E0IJKOBT78Icu4=.b03bb08b-9b37-4127-b9a7-3a8a1bf8c9fc@github.com> References: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> <-eGvtC-tGwThPGTfRVKEySOMZZRm-E0IJKOBT78Icu4=.b03bb08b-9b37-4127-b9a7-3a8a1bf8c9fc@github.com> Message-ID: On Fri, 25 Oct 2024 23:39:39 GMT, Vladimir Ivanov wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Address PR feedback: typo on test & refactor in output.cpp. > > src/hotspot/share/opto/output.cpp line 1179: > >> 1177: // the younger JVMS. >> 1178: if (ov->is_root()) { >> 1179: continue; > > You can either fuse `ov->is_root()` check into `is_root` computation (`bool is_root = ov->is_root() || ...`) or turn it into an `if-then-else` (`if (ov->is_root()) { /* comment */ } else { bool is_root = ...; ov->set_root(is_root); }`). I find both cases easier to read. Done. Thanks Vladimir. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21624#discussion_r1821618289 From syan at openjdk.org Tue Oct 29 23:21:16 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 29 Oct 2024 23:21:16 GMT Subject: Integrated: 8343211: Compile error: redefinition of 'Assembler::evmovdquw(XMMRegister,KRegister,XMMRegister,bool,int)' In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 16:59:12 GMT, SendaoYan wrote: > Hi all, > On linux-x64 gcc generate compile error: `src/hotspot/cpu/x86/assembler_x86.cpp:3646:6: error: redefinition of 'void Assembler::evmovdquw(XMMRegister, KRegister, XMMRegister, bool, int)'` after [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) and [JDK-8338021](https://bugs.openjdk.org/browse/JDK-8338021). > The both PRs add the same C++ function implementation `void Assembler::evmovdquw(XMMRegister, KRegister, XMMRegister, bool, int)`. > I think the newly added functions can be merged to one. > I choose to delete the implementation added by [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527). And I verify the releated tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527), after this PR the releated tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) run passed. > > Additonal testing: > > - [x] linux x64 build with release/fastdebug/slowdebug configure > - [ ] jtreg tests(include tier1/2/3 etc.) 
on linux x64 with release build > > The releated tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) shows below: > > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA256MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA512MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA3Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA3MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA1MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java This pull request has now been integrated. Changeset: 40f3d50b Author: SendaoYan URL: https://git.openjdk.org/jdk/commit/40f3d50badc20db5fbfcd485447e634778d03248 Stats: 13 lines in 2 files changed: 0 ins; 12 del; 1 mod 8343211: Compile error: redefinition of 'Assembler::evmovdquw(XMMRegister,KRegister,XMMRegister,bool,int)' Reviewed-by: vpaprotski, sviswanathan, thartmann, shade ------------- PR: https://git.openjdk.org/jdk/pull/21768 From vlivanov at openjdk.org Tue Oct 29 23:24:11 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 29 Oct 2024 23:24:11 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v2] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 01:26:10 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. 
Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. >> >> In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Address some changes from code review I'd like to reiterate that the primary place in C2 designated for platform-specific IR transformations is during matching phase. That's where the vast majority of cases is covered. Everything else are special cases which are handled in different places (mostly due to historic reasons and differing requirements). > It would not be possible without a stretch, consider my example regarding ExtractINode above, ... Matcher exposes a number of constants and queries which are used in shared code to sense platform capabilities (akin to `VM_Version`). > What do you think about keeping the node declaration in shared code but putting the lowering transformations in the backend-specific source files? I definitely prefer shared over platform-specific Ideal node declarations. > We can then use prefixes to denote a node being available on a specific backend only. Ideal-to-Mach IR lowering is already partial as `Matcher::match_rule_support` usages illustrate. > That's why it is intended to be executed only after general igvn. Keep in mind that final graph reshaping also resides in shared code. 
> Macro expansion would be too early, as we still do platform-independent igvn there, while final graph reshaping and custom matching logic would be too late, as we have destroyed the node hash table already. The nice thing about macro expansion is that it is performed consistently. Every macro node is unconditionally expanded and subsequent passes don't have to care about them. (The same applies to matching: Ideal nodes are consistently turned into platform-specific Mach nodes, except for a very limited number of well-known cases.) I'd say that if we want the lowering pass being discussed to be truly scalable, it's better to follow the same pattern. I have some doubts that platform-specific ad-hoc IR tweaks will scale well. BTW it's not clear to me now what particular benefits IGVN brings. The `DivMod` transformation doesn't use IGVN and, after examining the `MacroLogicV` code, it can be rewritten to avoid IGVN as well. I believe that the `ExtractI` case you mentioned can be implemented without relying on IGVN. In that case, lowering transformations can be moved to a later phase. > I don't think this is a concern, enumerating all live nodes once without doing anything is not expensive. I do have some concerns about unconditionally performing something which is useless most of the time (or even completely redundant on some platforms).
------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2445501092 From dholmes at openjdk.org Tue Oct 29 23:24:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 29 Oct 2024 23:24:13 GMT Subject: RFR: 8343211: Compile error: redefinition of 'Assembler::evmovdquw(XMMRegister,KRegister,XMMRegister,bool,int)' In-Reply-To: <7B7MMHwfwAfOtB7M4UMRx31YG_WU2S5BbLsKS9KMhh0=.b0bae427-aefd-413d-a189-b2ce147ae76e@github.com> References: <7B7MMHwfwAfOtB7M4UMRx31YG_WU2S5BbLsKS9KMhh0=.b0bae427-aefd-413d-a189-b2ce147ae76e@github.com> Message-ID: On Tue, 29 Oct 2024 17:32:06 GMT, SendaoYan wrote: >> Hi all, >> On linux-x64 gcc generate compile error: `src/hotspot/cpu/x86/assembler_x86.cpp:3646:6: error: redefinition of 'void Assembler::evmovdquw(XMMRegister, KRegister, XMMRegister, bool, int)'` after [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) and [JDK-8338021](https://bugs.openjdk.org/browse/JDK-8338021). >> The both PRs add the same C++ function implementation `void Assembler::evmovdquw(XMMRegister, KRegister, XMMRegister, bool, int)`. >> I think the newly added functions can be merged to one. >> I choose to delete the implementation added by [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527). And I verify the releated tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527), after this PR the releated tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) run passed. >> >> Additonal testing: >> >> - [x] linux x64 build with release/fastdebug/slowdebug configure >> - [ ] jtreg tests(include tier1/2/3 etc.) 
on linux x64 with release build >> >> The releated tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) shows below: >> >> test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java >> test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA256MultiBlockIntrinsics.java >> test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java >> test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA512MultiBlockIntrinsics.java >> test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java >> test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA3Intrinsics.java >> test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA3MultiBlockIntrinsics.java >> test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA1MultiBlockIntrinsics.java >> test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java >> test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java > > Thanks all for the review. Thanks @sendaoYan ------------- PR Comment: https://git.openjdk.org/jdk/pull/21768#issuecomment-2445501296 From syan at openjdk.org Tue Oct 29 23:24:13 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 29 Oct 2024 23:24:13 GMT Subject: RFR: 8343211: Compile error: redefinition of 'Assembler::evmovdquw(XMMRegister,KRegister,XMMRegister,bool,int)' In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 16:59:12 GMT, SendaoYan wrote: > Hi all, > On linux-x64 gcc generate compile error: `src/hotspot/cpu/x86/assembler_x86.cpp:3646:6: error: redefinition of 'void Assembler::evmovdquw(XMMRegister, KRegister, XMMRegister, bool, int)'` after [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) and [JDK-8338021](https://bugs.openjdk.org/browse/JDK-8338021). > The both PRs add the same C++ function implementation `void Assembler::evmovdquw(XMMRegister, KRegister, XMMRegister, bool, int)`. > I think the newly added functions can be merged to one. 
> I chose to delete the implementation added by [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527). I verified the related tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527); after this PR, the related tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) pass. > > Additional testing: > > - [x] linux x64 build with release/fastdebug/slowdebug configure > - [ ] jtreg tests (including tier1/2/3 etc.) on linux x64 with release build > > The related tests of [JDK-8341527](https://bugs.openjdk.org/browse/JDK-8341527) are shown below: > > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA256MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA512MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA3Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA3MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA1MultiBlockIntrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java Sorry for the delayed integration... ------------- PR Comment: https://git.openjdk.org/jdk/pull/21768#issuecomment-2445501304 From amitkumar at openjdk.org Wed Oct 30 03:12:11 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 30 Oct 2024 03:12:11 GMT Subject: RFR: 8342489: compiler/c2/irTests/TestVectorizationMismatchedAccess.java fails on big-endian platforms In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 13:02:31 GMT, Matthias Baesken wrote: >> Adjust TestVectorizationMismatchedAccess.java for Big Endian Platforms > > Fixes the error on AIX in our central tests. 
Thanks @MBaesken @eme64 for testing & approval. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21736#issuecomment-2445736067 From amitkumar at openjdk.org Wed Oct 30 03:12:12 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 30 Oct 2024 03:12:12 GMT Subject: Integrated: 8342489: compiler/c2/irTests/TestVectorizationMismatchedAccess.java fails on big-endian platforms In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 10:09:13 GMT, Amit Kumar wrote: > Adjust TestVectorizationMismatchedAccess.java for Big Endian Platforms This pull request has now been integrated. Changeset: b6f745df Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/b6f745df5795341dab1fc049a188a9e70d563a1a Stats: 31 lines in 1 file changed: 8 ins; 3 del; 20 mod 8342489: compiler/c2/irTests/TestVectorizationMismatchedAccess.java fails on big-endian platforms Reviewed-by: epeter, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/21736 From jkarthikeyan at openjdk.org Wed Oct 30 04:43:55 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 30 Oct 2024 04:43:55 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v3] In-Reply-To: References: Message-ID: > Hi all, > This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. 
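
For readers following the thread, the de-duplication that GVN provides can be pictured as hash-consing: a request for a node with the same opcode and the same inputs returns the node that already exists. The following is only a minimal Java sketch of that idea, not HotSpot code. `ValueNumberingSketch`, `Node`, and `make` are hypothetical names, the opcode strings are placeholders, and nothing here models `Value()`, types, or the IGVN worklist.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of value numbering; not HotSpot's PhaseGVN.
final class ValueNumberingSketch {
    // A node is identified by its opcode and its (already canonical) inputs.
    // Records compare structurally, which is what the lookup below relies on.
    record Node(String opcode, List<Node> inputs) {}

    private final Map<Node, Node> table = new HashMap<>();

    // Returns the canonical node for the given opcode and inputs, creating a
    // new entry only if no structurally equal node has been seen before.
    Node make(String opcode, Node... inputs) {
        Node candidate = new Node(opcode, List.of(inputs));
        Node existing = table.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    // Number of distinct nodes currently in the table.
    int size() {
        return table.size();
    }

    public static void main(String[] args) {
        ValueNumberingSketch gvn = new ValueNumberingSketch();
        Node a = gvn.make("Parm0");
        Node b = gvn.make("Parm1");
        Node x = gvn.make("AndV", a, b);
        Node y = gvn.make("AndV", a, b); // same opcode and inputs as x
        System.out.println(x == y);      // true: the second request is de-duplicated
    }
}
```

In this picture, a lowering step that creates a node equal to one already in the graph would get the existing node back instead of a duplicate.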
> > In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into phase-lowering - Remove platform-dependent node definitions, rework PhaseLowering implementation - Address some changes from code review - Implement PhaseLowering ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21599/files - new: https://git.openjdk.org/jdk/pull/21599/files/7fbc4509..c7ceec71 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21599&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21599&range=01-02 Stats: 292301 lines in 2840 files changed: 243768 ins; 34360 del; 14173 mod Patch: https://git.openjdk.org/jdk/pull/21599.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21599/head:pull/21599 PR: https://git.openjdk.org/jdk/pull/21599 From jkarthikeyan at openjdk.org Wed Oct 30 04:43:55 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 30 Oct 2024 04:43:55 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v3] In-Reply-To: <_bIKE5NejQ9yFVAXMMUwDxREhXLJGQw-1U-V1KqC1xY=.1e3f43a1-3773-4151-862a-5404a1a084ff@github.com> References: <_bIKE5NejQ9yFVAXMMUwDxREhXLJGQw-1U-V1KqC1xY=.1e3f43a1-3773-4151-862a-5404a1a084ff@github.com> Message-ID: On Mon, 28 Oct 2024 04:55:37 GMT, Quan Anh Mai wrote: >> Ah, I see what you mean now. I think this makes extending IGVN more appealing because we could continue to do Ideal on lowered nodes, as you mentioned. 
We could override `PhaseGVN::apply_ideal` to return `nullptr` when processing regular nodes, but run the other `Ideal` type when encountering lowered nodes. Do you think it would be better to add another method to `Node` or should we re-use the existing Ideal call, but lowering specific nodes are guarded with a new node flag? > > I think having a new method in `Node` would be more manageable, I can imagine it allows us to reuse pre-lowered nodes for lowering. The example I gave above we reuse `ExtractI` since the semantics is still the same, the only difference is that from here `ExtractI` can only appear with the index parameter being smaller than 4. I think in general we'll want different nodes to reduce sources of ambiguity, but I've made this change in the latest patch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1821863386 From jkarthikeyan at openjdk.org Wed Oct 30 04:51:09 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 30 Oct 2024 04:51:09 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v3] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 04:43:55 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. >> >> In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. 
Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into phase-lowering > - Remove platform-dependent node definitions, rework PhaseLowering implementation > - Address some changes from code review > - Implement PhaseLowering Thanks everyone for the discussion. I've pushed a commit that restructures the pass, removing the backend-specific node definition and making the pass extend `PhaseIterGVN` so that nodes can do further idealizations during lowering without complicating the main lowering switch. I also added a shared component to lowering, to facilitate moving transforms that impact multiple backends like `DivMod` to it. Lowering is also now the final phase before final graph reshaping now, since late inlines could also use IGVN. Some more comments: > It looks attractive at first, but the downside is subsequent passes may start to require platform-specific code as well (e.g., think of final graph reshaping which operates on Node opcodes). This makes sense to me. I agree that the extra complexity required to deal with this change in other parts of the code isn't worth it. The new commit removes this part of the changeset. > BTW it's not clear to me now what particular benefits IGVN brings. `DivMod` transformation doesn't use IGVN and after examining `MacroLogicV` code it can be rewritten to avoid IGVN as well. The main benefits are being able to reuse node hashing to de-duplicate redundant nodes and being able to use the existing IGVN types that were calculated (which #21244 uses). 
Some examples where GVN could be useful in final graph reshaping is when reshaping shift nodes and `Op_CmpUL`, where new nodes are created to approximate existing nodes on platforms without support. While I think it is unlikely that any of the created nodes would common with existing nodes except the `ConNode`s, I think it would be nice to reduce the possibility of redundant nodes in the graph before matching. This would include `DivMod` in the cases where the backend doesn't support the `DivMod` node, as multiplication and subtraction is emitted instead. I'm working on refactoring these cases in my example patch. I think it would be nice to make lowering where these platform specific optimizations occur while final graph reshaping focuses on preparing the graph for matching. > I'd say that if we want the lowering pass being discussed to be truly scalable, it's better to follow the same pattern. I have some doubts that platform-specific ad-hoc IR tweaks scale will scale well. My main concern with the macro-expansion style is that with the proposed transforms unconditional expansion/lowering of nodes isn't always possible. For example, In final graph reshaping for `DivMod` it can be the case that we don't find a relevant `ModNode` to combine into a `DivMod`. So with that limitation we wouldn't be able to move it to this pass. With regard to scalability, since lowering is the last step before final graph reshaping now, I think the only places where lowered nodes could interact with existing code is in final graph reshaping and final cleanup before matching. With final graph reshaping I think the impact is minimal, as any changes that would need to be done there could be done during lowering itself. During final cleanup before matching, we convert nodes with more than 2 inputs into a binary tree form for the platform matcher to consume. 
In this place it would be possible to process lowered nodes, but with lowered nodes being defined in shared code now this can be done purely in shared code still. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2445839113 From chagedorn at openjdk.org Wed Oct 30 06:14:09 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Oct 2024 06:14:09 GMT Subject: RFR: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor [v5] In-Reply-To: References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> Message-ID: On Mon, 28 Oct 2024 14:42:34 GMT, Christian Hagedorn wrote: >> #### Replacing the Remaining Predicate Walking and Cloning Code >> In the next series of patches (this, [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943), [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945), and [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946)) I want to replace the predicate walking and cloning code used for Loop Peeling, Pre/Main/Post Loops, Loop Unswitching, removing useless Assertion Predicates and Loop Unrolling with new `PredicateVisitors` which can be used in combination with the new `PredicateIterator`. >> >> #### Single Template Assertion Predicate Check >> This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). 
>> >> #### Common Refactorings for all the Patches in this Series >> In each of the patch, I will do similar refactoring ideas: >> - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. >> - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. >> - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. >> - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head (see **P1** PR comment). >> - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates (see **P2** PR comment). This limitation should eventually be removed. But I want to do that separately at a later point. >> >> #### Refactorings of this Patch >> This first patch replaces the predicate walking and cloning code for Loop Peeling and lays the foundation for the replacement for main/post loops ([JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342... > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - Fix indentation > - Fix build Thanks Vladimir for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21679#issuecomment-2445955092 From chagedorn at openjdk.org Wed Oct 30 06:14:10 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Oct 2024 06:14:10 GMT Subject: Integrated: 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor In-Reply-To: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> References: <4500YskED7RIOnx9fctSBmQhZfT7Gh3pv7Zxhl83ztE=.1c1a1fbf-e3e4-43da-983f-8ea7daed1687@github.com> Message-ID: On Thu, 24 Oct 2024 10:45:12 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > In the next series of patches (this, [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943), [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945), and [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946)) I want to replace the predicate walking and cloning code used for Loop Peeling, Pre/Main/Post Loops, Loop Unswitching, removing useless Assertion Predicates and Loop Unrolling with new `PredicateVisitors` which can be used in combination with the new `PredicateIterator`. > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. 
> - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head (see **P1** PR comment). > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates (see **P2** PR comment). This limitation should eventually be removed. But I want to do that separately at a later point. > > #### Refactorings of this Patch > This first patch replaces the predicate walking and cloning code for Loop Peeling and lays the foundation for the replacement for main/post loops ([JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943)) which is quite similar. Th... This pull request has now been integrated. 
Changeset: 63c19d3d Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/63c19d3db586920108808866c7a094a5ae41bc22 Stats: 200 lines in 4 files changed: 134 ins; 53 del; 13 mod 8341977: Replace predicate walking and cloning code for Loop Peeling with a predicate visitor Reviewed-by: kvn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/21679 From rcastanedalo at openjdk.org Wed Oct 30 08:30:09 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 30 Oct 2024 08:30:09 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v3] In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 10:42:26 GMT, Daniel Lundén wrote: >> Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced, due to unfortunate rounding in the register mask size computation, the available space for incoming/outgoing method arguments in register masks. >> >> ### Changeset >> >> - Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4. >> - Add a regression test. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. There is no observable difference in C2 compilation time. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Update mask size comment after suggestions Looks better, thanks. I suggest though a slight rephrase to avoid misleading the reader into thinking there's some other problem with rounding besides what is already described in the comment. 
src/hotspot/share/adlc/formsopt.cpp line 180: > 178: // in the register mask regardless of how much slack is created by rounding > 179: // up. Problematic rounding occurred when we added 16 new registers for > 180: // APX. Suggestion: // This was found necessary after adding 16 new registers for APX. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21612#pullrequestreview-2403974424 PR Review Comment: https://git.openjdk.org/jdk/pull/21612#discussion_r1822106032 From epeter at openjdk.org Wed Oct 30 08:59:08 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 30 Oct 2024 08:59:08 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v2] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 15:15:54 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>>
>>                                                            Baseline                      Patch
>> Benchmark                                 Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
>> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
>> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
>> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
>> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
>> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
>> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)
>>
>> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Suggestions from review Drive-by comments. Generally looks reasonable, nice work. I have 2 comments below. src/hotspot/share/opto/addnode.cpp line 1268: > 1266: Node* MaxINode::Identity(PhaseGVN* phase) { > 1267: const TypeInt* t1 = phase->type(in(1))->is_int(); > 1268: const TypeInt* t2 = phase->type(in(2))->is_int(); Could any input be `TOP`? test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdentities.java line 35: > 33: * @summary Test identities of MinNodes and MaxNodes. > 34: * @key randomness > 35: * @requires (os.simpleArch == "x64" & vm.cpu.features ~= ".*avx.*") | os.arch == "aarch64" | os.arch == "riscv64" Is there a chance we can add these `requires` to the `@IR` rules instead? That way we can still do the result verification on all other platforms, which could be valuable on its own. 
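
The two patterns named in the quoted description, `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`, can be checked at the Java level with `Math.max` and `Math.min`. The sketch below is only an illustration of why the rewrites preserve results. `MinMaxIdentitiesSketch` and its method names are hypothetical, it is not the IR test from the PR, and the dual `Min` forms are included on the assumption that the same lattice rules apply to them.

```java
// Hypothetical illustration of the lattice identities; not part of the PR.
final class MinMaxIdentitiesSketch {
    // Max(A, Max(A, B)) == Max(A, B), and the dual Min(A, Min(A, B)) == Min(A, B).
    static boolean nestedSameOpCollapses(long a, long b) {
        return Math.max(a, Math.max(a, b)) == Math.max(a, b)
            && Math.min(a, Math.min(a, b)) == Math.min(a, b);
    }

    // Absorption: Max(A, Min(A, B)) == A, and the dual Min(A, Max(A, B)) == A.
    static boolean absorptionHolds(long a, long b) {
        return Math.max(a, Math.min(a, b)) == a
            && Math.min(a, Math.max(a, b)) == a;
    }

    // Checks both identities for every ordered pair drawn from the samples.
    static boolean holdsForAll(long[] samples) {
        for (long a : samples) {
            for (long b : samples) {
                if (!nestedSameOpCollapses(a, b) || !absorptionHolds(a, b)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -1L, 0L, 1L, 42L, Long.MAX_VALUE};
        System.out.println(holdsForAll(samples)); // true
    }
}
```

Because `long` values are totally ordered, both identities hold for every pair, including the extreme values used here.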
------------- PR Review: https://git.openjdk.org/jdk/pull/21439#pullrequestreview-2404033717 PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1822143491 PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1822149392 From jbhateja at openjdk.org Wed Oct 30 09:50:07 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 30 Oct 2024 09:50:07 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support In-Reply-To: References: Message-ID: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com> On Tue, 29 Oct 2024 17:19:20 GMT, Srinivas Vamsi Parasa wrote: > The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) I think we should first check-in extended gtest asm validation script detecting these issues either before or along with this patch. src/hotspot/cpu/x86/assembler_x86.cpp line 1483: > 1481: void Assembler::eaddl(Register dst, Register src1, Register src2, bool no_flags) { > 1482: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 1483: (void) evex_prefix_and_encode_ndd(src2->encoding(), dst->encoding(), src1->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); Hi @vamsi-parasa, NDD is very flexible in terms of argument selection, i.e. ADDL NDD, SRC1 (ModRM.R/M), SRC2 (ModRM.REG) has opcode 0x01 Whereas, ADDL NDD, SRC1 (ModRM.REG), SRC2 (ModRM.R/M) has opcode 0x03 In this case, we are trying to match GCC encoding scheme. Can you please add the following comment here since the argument nomenclature does not match with parameter nomenclature? NDD shares its encoding bits with NDS bits for regular EVEX instruction. 
Therefore we are passing DST as the second argument to minimize changes in leaf level routine. src/hotspot/cpu/x86/assembler_x86.cpp line 2632: > 2630: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 2631: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_32bit); > 2632: evex_prefix_nf(src, 0, dst->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); Could you also replace VEX_OPCODE_0F_3C with the standard naming convention of VEX_OPCODE_MAP4? I added /*MAP4*/ in the comments after the prefix for the setzuCC instruction, but it's better to make this change consistently in all places. ------------- PR Review: https://git.openjdk.org/jdk/pull/21770#pullrequestreview-2403880657 PR Review Comment: https://git.openjdk.org/jdk/pull/21770#discussion_r1822168026 PR Review Comment: https://git.openjdk.org/jdk/pull/21770#discussion_r1822048023 From dlunden at openjdk.org Wed Oct 30 09:53:50 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 30 Oct 2024 09:53:50 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v4] In-Reply-To: References: Message-ID: > Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced, due to unfortunate rounding in the register mask size computation, the available space for incoming/outgoing method arguments in register masks. > > ### Changeset > > - Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4. > - Add a regression test. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. 
> - C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. There is no observable difference in C2 compilation time. Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/adlc/formsopt.cpp Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21612/files - new: https://git.openjdk.org/jdk/pull/21612/files/873a8ffe..33809e9b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21612&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21612&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21612.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21612/head:pull/21612 PR: https://git.openjdk.org/jdk/pull/21612 From dlunden at openjdk.org Wed Oct 30 09:53:51 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 30 Oct 2024 09:53:51 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v3] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 08:23:19 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Update mask size comment after suggestions > > src/hotspot/share/adlc/formsopt.cpp line 180: > >> 178: // in the register mask regardless of how much slack is created by rounding >> 179: // up. Problematic rounding occurred when we added 16 new registers for >> 180: // APX. > > Suggestion: > > // This was found necessary after adding 16 new registers for APX. 
Thanks, I've committed the suggestion ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21612#discussion_r1822244351 From dlunden at openjdk.org Wed Oct 30 09:56:09 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 30 Oct 2024 09:56:09 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v3] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 08:27:42 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Update mask size comment after suggestions > > Looks better, thanks. I suggest though a slight rephrase to avoid misleading the reader into thinking there's some other problem with rounding besides what is already described in the comment. Thanks @robcasloz and @vnkozlov for the reviews. I need one final re-review now after committing Roberto's last suggestion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21612#issuecomment-2446384475 From rcastanedalo at openjdk.org Wed Oct 30 10:01:07 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 30 Oct 2024 10:01:07 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v4] In-Reply-To: References: Message-ID: <1Cjdx6s9mMzgEiHpN1l1dMkkFbM6_gZyfpGxq-bXLcA=.fe4f15c0-7b20-4da0-be7a-d93f25534a8f@github.com> On Wed, 30 Oct 2024 09:53:50 GMT, Daniel Lundén wrote: >> Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced, due to unfortunate rounding in the register mask size computation, the available space for incoming/outgoing method arguments in register masks. >> >> ### Changeset >> >> - Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4. >> - Add a regression test. 
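
The "unfortunate rounding" mentioned in the description can be illustrated with a small calculation. The sketch below assumes, without having the `formsopt.cpp` source in this thread to confirm it, that the mask size is the number of 32-bit words needed for the register slots, plus a fixed number of words for arguments, rounded up to an even word count. `RegMaskSizeSketch`, its methods, and the slot counts 512 and 544 are hypothetical and chosen only to show the parity effect.

```java
// Hypothetical model of the register mask size computation; the formula and
// the slot counts are assumptions for illustration, not taken from HotSpot.
final class RegMaskSizeSketch {
    // Words needed to give every register slot one bit, at 32 bits per word.
    static int wordsForRegs(int regSlots) {
        return (regSlots + 31) >>> 5;
    }

    // Add argWords for stack argument slots, then round up to an even word count.
    static int maskSize(int regSlots, int argWords) {
        return (wordsForRegs(regSlots) + argWords + 1) & ~1;
    }

    // Words actually left over for arguments once rounding is taken into account.
    static int argSlack(int regSlots, int argWords) {
        return maskSize(regSlots, argWords) - wordsForRegs(regSlots);
    }

    public static void main(String[] args) {
        // Even register word count: rounding adds a word, so 3 becomes 4.
        System.out.println(argSlack(512, 3)); // 4
        // One more register word flips the parity and the extra word disappears.
        System.out.println(argSlack(544, 3)); // 3
        // With 4 argument words the slack never drops below 4.
        System.out.println(argSlack(512, 4)); // 4
        System.out.println(argSlack(544, 4)); // 5
    }
}
```

Under these assumptions, growing the register file by one mask word silently shrinks the argument space from four words to three, and raising the fixed amount from 3 to 4 restores at least four words regardless of parity.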
>> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. There is no observable difference in C2 compilation time. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/adlc/formsopt.cpp > > Co-authored-by: Roberto Castañeda Lozano Thanks, looks good! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21612#pullrequestreview-2404231652 From thartmann at openjdk.org Wed Oct 30 10:13:06 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 30 Oct 2024 10:13:06 GMT Subject: RFR: 8335977: Deoptimization fails with assert "object should be reallocated already" [v2] In-Reply-To: References: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> Message-ID: On Tue, 29 Oct 2024 22:50:30 GMT, Cesar Soares Lucas wrote: >> Please review this patch to fix an issue that may occur when serializing debug information related to reduce allocation merges. The problem happens when there is more than one JVMS in an `uncommon_trap` and a _younger_ JVMS doesn't have the RAM inputs as a local/expression/monitor but an older JVMS does. In that situation the loop at line 1173 of output.cpp will set the `is_root` property of the ObjectValue to `false` when processing the younger JVMS even though it may have been set to `true` when visiting the older JVMS. >> >> Tested on: >> - Win, Mac & Linux tier1-4 on x64 & Aarch64. >> - CTW with some thousands of jars. 
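
The `is_root` problem described above is an instance of a flag being overwritten on each loop iteration instead of accumulated. The following Java sketch models only that pattern. `IsRootSketch`, `Frame`, and the integer object ids are hypothetical stand-ins for JVMS entries and `ObjectValue`s, and the accumulating variant is one plausible shape of a fix, not a copy of the change made in output.cpp.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the flag handling; not the code from output.cpp.
final class IsRootSketch {
    // A frame and the ids of the objects it references directly
    // (standing in for locals, expressions, and monitor owners).
    record Frame(Set<Integer> directRefs) {}

    // Problematic pattern: each frame overwrites the flag, so only the
    // last (youngest) frame visited decides the result.
    static boolean isRootOverwriting(List<Frame> oldestToYoungest, int objectId) {
        boolean isRoot = false;
        for (Frame frame : oldestToYoungest) {
            isRoot = frame.directRefs().contains(objectId);
        }
        return isRoot;
    }

    // Accumulating pattern: once any frame references the object directly,
    // the flag stays true for the remaining frames.
    static boolean isRootAccumulating(List<Frame> oldestToYoungest, int objectId) {
        boolean isRoot = false;
        for (Frame frame : oldestToYoungest) {
            isRoot = isRoot || frame.directRefs().contains(objectId);
        }
        return isRoot;
    }

    public static void main(String[] args) {
        Frame older = new Frame(Set.of(7));   // references object 7 directly
        Frame younger = new Frame(Set.of());  // does not reference object 7
        List<Frame> frames = List.of(older, younger);
        System.out.println(isRootOverwriting(frames, 7));  // false
        System.out.println(isRootAccumulating(frames, 7)); // true
    }
}
```

With the older frame referencing the object and the younger one not, the overwriting loop loses the information that the accumulating loop keeps.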
> > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR feedback: typo on test & refactor in output.cpp. Looks good to me too. src/hotspot/share/opto/output.cpp line 1182: > 1180: bool is_root = locarray->contains(ov) || > 1181: exparray->contains(ov) || > 1182: contains_as_owner(monarray, ov) || Indentation is slightly off here. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21624#pullrequestreview-2404275120 PR Review Comment: https://git.openjdk.org/jdk/pull/21624#discussion_r1822282197 From chagedorn at openjdk.org Wed Oct 30 11:19:10 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Oct 2024 11:19:10 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v4] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 09:53:50 GMT, Daniel Lundén wrote: >> Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced, due to unfortunate rounding in the register mask size computation, the available space for incoming/outgoing method arguments in register masks. >> >> ### Changeset >> >> - Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4. >> - Add a regression test. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. There is no observable difference in C2 compilation time.
> > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/adlc/formsopt.cpp > > Co-authored-by: Roberto Castañeda Lozano Looks good to me, too. Thanks for fixing it and adding me as contributor! src/hotspot/share/adlc/formsopt.cpp line 178: > 176: // - Round up to the next doubleword size. > 177: // - Add one more word to accommodate a reasonable number of stack locations > 178: // in the register mask regardless of how much slack is created by rounding Suggestion: // in the register mask regardless of how much slack is created by rounding. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21612#pullrequestreview-2404485699 PR Review Comment: https://git.openjdk.org/jdk/pull/21612#discussion_r1822403745 From thartmann at openjdk.org Wed Oct 30 11:43:34 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 30 Oct 2024 11:43:34 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers Message-ID: @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. I think it's not an issue in current code though.
The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. Thanks, Tobias ------------- Commit messages: - 8343206: Final graph reshaping should not compress abstract or interface class pointers Changes: https://git.openjdk.org/jdk/pull/21784/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21784&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343206 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21784.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21784/head:pull/21784 PR: https://git.openjdk.org/jdk/pull/21784 From thartmann at openjdk.org Wed Oct 30 11:47:50 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 30 Oct 2024 11:47:50 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v2] In-Reply-To: References: Message-ID: > @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. > > I think it's not an issue in current code though. 
The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: > https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 > > Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: > https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 > > And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. > > I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21784/files - new: https://git.openjdk.org/jdk/pull/21784/files/a2e4c17d..54f08ead Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21784&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21784&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21784.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21784/head:pull/21784 PR: https://git.openjdk.org/jdk/pull/21784 From thartmann at openjdk.org Wed Oct 30 11:53:40 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 30 Oct 2024 11:53:40 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v3] In-Reply-To: References: Message-ID: > @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. 
> > I think it's not an issue in current code though. The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: > https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 > > Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: > https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 > > And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. > > I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Typo2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21784/files - new: https://git.openjdk.org/jdk/pull/21784/files/54f08ead..c69022b3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21784&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21784&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21784.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21784/head:pull/21784 PR: https://git.openjdk.org/jdk/pull/21784 From coleenp at openjdk.org Wed Oct 30 12:20:06 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 30 Oct 2024 12:20:06 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v3] In-Reply-To: References: Message-ID: <4_84pZqk5-pV1iTUdpf5wmVczTdHq-9-Re1qjbGU7Eo=.0fb46e18-883f-45f8-827d-567602373431@github.com> On Wed, 30 Oct 2024 11:53:40 GMT, Tobias Hartmann wrote: >> @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class 
pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. >> >> I think it's not an issue in current code though. The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 >> >> Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 >> >> And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. >> >> I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Typo2 src/hotspot/share/opto/compile.cpp line 3498: > 3496: assert(false, "Interface or abstract class pointers should not be compressed"); > 3497: } else { > 3498: new_in2 = ConNode::make(t->make_narrowklass()); When I was looking through this code, I was hoping there'd be some sort of assert in the make_narrowklass function so any caller would assert but maybe you don't have that info? 
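
The guard discussed in this thread (assert in debug builds, bail out in product builds) can be sketched in isolation. This is only a minimal illustration under stated assumptions: `KlassInfo`, `try_make_narrow_klass`, and the shift-based encoding are hypothetical names and choices made up for this sketch; they are not HotSpot's `Type`, `ConNode`, or `make_narrowklass()` API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the class metadata a compiler would consult.
struct KlassInfo {
  std::uint64_t address;  // full-width class pointer
  bool is_interface;
  bool is_abstract;
};

// Illustrative encoding only (assumption): drop the low alignment bits.
const unsigned kNarrowShift = 3;

// Writes the narrow encoding to *narrow_out and returns true, or returns
// false without touching *narrow_out when the class must not be compressed.
// A caller would keep the full-width comparison in the latter case.
bool try_make_narrow_klass(const KlassInfo& k, std::uint32_t* narrow_out) {
  if (k.is_interface || k.is_abstract) {
    return false;  // bail out instead of producing an invalid narrow pointer
  }
  *narrow_out = static_cast<std::uint32_t>(k.address >> kNarrowShift);
  return true;
}
```

Returning a status instead of asserting inside the helper keeps the decision with the caller, which is roughly the trade-off raised in the review comment above; whether an assert inside the real helper is feasible is not something this sketch can answer.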
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21784#discussion_r1822503328 From dlunden at openjdk.org Wed Oct 30 12:25:51 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 30 Oct 2024 12:25:51 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v5] In-Reply-To: References: Message-ID: > Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced, due to unfortunate rounding in the register mask size computation, the available space for incoming/outgoing method arguments in register masks. > > ### Changeset > > - Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4. > - Add a regression test. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. There is no observable difference in C2 compilation time.
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/adlc/formsopt.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21612/files - new: https://git.openjdk.org/jdk/pull/21612/files/33809e9b..87745d69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21612&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21612&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21612.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21612/head:pull/21612 PR: https://git.openjdk.org/jdk/pull/21612 From rcastanedalo at openjdk.org Wed Oct 30 12:36:07 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 30 Oct 2024 12:36:07 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v5] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 12:25:51 GMT, Daniel Lundén wrote: >> Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced, due to unfortunate rounding in the register mask size computation, the available space for incoming/outgoing method arguments in register masks. >> >> ### Changeset >> >> - Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4. >> - Add a regression test. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. There is no observable difference in C2 compilation time.
> > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/adlc/formsopt.cpp > > Co-authored-by: Christian Hagedorn Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21612#pullrequestreview-2404717739 From chagedorn at openjdk.org Wed Oct 30 13:41:15 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Oct 2024 13:41:15 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v5] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 12:25:51 GMT, Daniel Lundén wrote: >> Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced, due to unfortunate rounding in the register mask size computation, the available space for incoming/outgoing method arguments in register masks. >> >> ### Changeset >> >> - Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4. >> - Add a regression test. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. There is no observable difference in C2 compilation time. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/adlc/formsopt.cpp > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer).
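
The rounding effect described in the JDK-8342156 thread can be illustrated with a small calculation. This is a sketch under assumptions, not the code in `formsopt.cpp`: the function names, the exact formula, and the register counts (512 and 544) are made up for illustration. The only facts taken from the thread are that the mask is sized in 32-bit words, that some words are reserved for incoming/outgoing arguments, and that the total is rounded up to a doubleword.

```cpp
#include <cassert>

// Number of 32-bit words needed to hold one bit per machine register.
int words_for_regs(int reg_count) {
  return (reg_count + 31) / 32;
}

// Hypothetical mask size: register words plus arg_words reserved for
// stack-passed arguments, rounded up to an even number of 32-bit words.
int mask_words(int reg_count, int arg_words) {
  return (words_for_regs(reg_count) + arg_words + 1) & ~1;
}

// Words actually left over for stack locations after the register bits.
int stack_words(int reg_count, int arg_words) {
  return mask_words(reg_count, arg_words) - words_for_regs(reg_count);
}
```

With these made-up numbers, 512 registers and 3 argument words give 4 stack words because rounding adds one word of slack, while 544 registers and 3 argument words give only 3, since the extra register word absorbs the slack. Reserving 4 words restores the headroom (5 stack words for 544 registers), which is the kind of effect the changeset describes.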
------------- PR Review: https://git.openjdk.org/jdk/pull/21612#pullrequestreview-2404953685 From chagedorn at openjdk.org Wed Oct 30 13:42:26 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Oct 2024 13:42:26 GMT Subject: RFR: 8343296: IGV: Show pre/main/post at CountedLoopNodes Message-ID: It's sometimes difficult to keep track of which counted loops are pre, main or post loops. This patch adds this property to `CountedLoops`, similarly to the other node info that we print, for example, a method name for a `CallStaticJava`. This is achieved by emitting a new property to `CountedLoopNodes` and then extending the node info filter in IGV. This will show the pre/main/post info like the other node info below the node name: ![image](https://github.com/user-attachments/assets/69abe3f4-3e97-4dfe-983a-0153057b1a8d) Thanks to @robcasloz for helping me with the IGV filter changes! Thanks, Christian ------------- Commit messages: - 8343296: IGV: Show pre/main/post at CountedLoopNodes Changes: https://git.openjdk.org/jdk/pull/21788/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21788&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343296 Stats: 30 lines in 4 files changed: 30 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21788.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21788/head:pull/21788 PR: https://git.openjdk.org/jdk/pull/21788 From cslucas at openjdk.org Wed Oct 30 15:11:18 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 30 Oct 2024 15:11:18 GMT Subject: RFR: 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" Message-ID: Please, consider this patch to fix an issue that happens when a Phi previously considered reducible later becomes irreducible.
The overall situation that causes the problem is like so: - Consider that there are at least 2 scalar replaceable objects (Obj1 and Obj2; Obj2 is stored in a field of Obj1) when we start iterating the loop at escape.cpp:301 - In the first iteration of the loop the call chain starting with `adjust_scalar_replaceable_state` ends up calling `can_reduce_phi` and considering Phi1 as reducible. This Phi has only Obj1 as *SR* input. - In another iteration of the loop Obj2 is flagged as NSR. For instance, because we are storing Obj2 in an unknown position of an array. This will cause `found_nsr_alloc` to be set to `true`. After the loop finishes, the execution will go to `find_scalar_replaceable_allocs`. The code will process Obj1, because it's still scalar replaceable, but will find that this object is stored in a field of a - **now** - NSR object. Therefore, correctly, Obj1 will also be marked as NSR. When Obj1 is marked as NSR, Phi1 becomes irreducible because it no longer has any scalar replaceable input. The solution I'm proposing is simply to revisit the "reducibility" of the Phis when an object is marked as NSR. --------- ### Tests Win, Mac & Linux tier1-4 on x64 & Aarch64. ------------- Commit messages: - fix spaces - Reevaluate Phi reducible status after one of its input become NSR.
Changes: https://git.openjdk.org/jdk/pull/21778/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21778&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340454 Stats: 114 lines in 3 files changed: 111 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21778/head:pull/21778 PR: https://git.openjdk.org/jdk/pull/21778 From rcastanedalo at openjdk.org Wed Oct 30 15:14:09 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 30 Oct 2024 15:14:09 GMT Subject: RFR: 8343296: IGV: Show pre/main/post at CountedLoopNodes In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 13:36:53 GMT, Christian Hagedorn wrote: > It's sometimes difficult to keep track of which counted loops are pre, main or post loops. This patch adds this property to `CountedLoops`, similarly to the other node info that we print, for example, like a method name for a `CallStaticJava`. > > This is achieved by emitting a new property to `CountedLoopNodes` and then extending the node info filter in IGV. This will show the pre/main/post info like the other node info below the node name: > > ![image](https://github.com/user-attachments/assets/69abe3f4-3e97-4dfe-983a-0153057b1a8d) > > Thanks to @robcasloz for helping me with the IGV filter changes! > > Thanks, > Christian Looks good otherwise, thanks for doing this Christian! src/utils/IdealGraphVisualizer/ServerCompiler/src/main/resources/com/sun/hotspot/igv/servercompiler/filters/customNodeInfo.filter line 37: > 35: editProperty(hasProperty("loop_kind"), ["loop_kind"], "extra_label", > 36: function(loop_kind) { return loop_kind[0]; }); > 37: Can you remove this extra line at the end? ------------- Marked as reviewed by rcastanedalo (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21788#pullrequestreview-2405230412 PR Review Comment: https://git.openjdk.org/jdk/pull/21788#discussion_r1822817709 From chagedorn at openjdk.org Wed Oct 30 15:37:22 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Oct 2024 15:37:22 GMT Subject: RFR: 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor Message-ID: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> #### Replacing the Remaining Predicate Walking and Cloning Code The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679) - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (this PR) - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): Loop Unswitching and removing useless Assertion Predicate (upcoming) - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Main and Post Loop (upcoming) --- (Sections taken over from https://github.com/openjdk/jdk/pull/21679) #### Single Template Assertion Predicate Check This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). #### Common Refactorings for all the Patches in this Series In each of the patches, I will apply similar refactoring ideas: - Replace the existing code in the corresponding `PhaseIdealLoop` method with a call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface.
- The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. - Keep the current semantics, which includes applying the Template Assertion Predicate processing only if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. --- #### Refactorings of this Patch This patch replaces the predicate walking and cloning code for **main and post loops**. The code can reuse the code established with [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977). This patch includes: - Replacing Assertion Predicate code for the main loop. - Replacing Assertion Predicate code for the post loop: - Moved code inside `insert_post_loop()` to solve two problems: - Before this patch, Assertion Predicates for the main and post loop have only been created after creating the pre and post loop. This made the rewiring logic for data nodes more complicated because we could have nodes being part of the pre, main or post loop. By moving the Assertion Predicate creation to `insert_post_loop()`, we do not need to worry about the pre loop data nodes (not created yet) when processing the post loop. Similarly, we do not need to worry about the post loop data nodes (already processed) when creating Assertion Predicates for the main loop. - A post loop is also inserted as vector post loop with `insert_vector_post_loop()`.
I'm not sure if there was a problem with omitting the Assertion Predicates there. Either way, it's now fixed for any kind of post loop. - To correctly rewire data dependencies, we need to check if a node is part of the original loop body or the cloned loop body. To do that, I've introduced an interface `NodeInLoopBody` with [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977) with an implementation for the original loop body (i.e. `NodeInOriginalLoopBody`). We now also need the other case and thus I've added the class `NodeInClonedLoopBody`. Thanks, Christian ------------- Commit messages: - swap parameters - 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor Changes: https://git.openjdk.org/jdk/pull/21790/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21790&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342943 Stats: 275 lines in 5 files changed: 94 ins; 164 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/21790.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21790/head:pull/21790 PR: https://git.openjdk.org/jdk/pull/21790 From chagedorn at openjdk.org Wed Oct 30 15:37:27 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Oct 2024 15:37:27 GMT Subject: RFR: 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor In-Reply-To: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> References: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> Message-ID: On Wed, 30 Oct 2024 15:18:56 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - 
[JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (this PR) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): Loop Unswitching and removing useless Assertion Predicate (upcoming) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Main and Post Loop (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. 
But I want to do that separately at a later point. > --- > > #### Refactorings of this Patch > This patch replaces the predicate walking and cloning code for **main and post loops**. The code can reuse the code establishe... src/hotspot/share/opto/loopTransform.cpp line 1320: > 1318: // Assertion Predicates ensures that the main-loop is removed if some type ranges of Cast or Convert nodes become > 1319: // impossible and are replaced by top (i.e. a sign that the main-loop is dead). > 1320: void PhaseIdealLoop::copy_assertion_predicates_to_main_loop_helper(const PredicateBlock* predicate_block, Node* init, Replaced with `PhaseIdealLoop::initialize_assertion_predicates_for_main_loop()` src/hotspot/share/opto/loopTransform.cpp line 1472: > 1470: // removed after loop opts, these are never executed. We therefore insert a Halt node instead of an uncommon trap. > 1471: Node* PhaseIdealLoop::clone_template_assertion_predicate(IfNode* iff, Node* new_init, Node* predicate, Node* uncommon_proj, > 1472: Node* control, IdealLoopTree* outer_loop, Node* new_control) { Moved to `TemplateAssertionPredicate::clone_and_replace_init()`. src/hotspot/share/opto/loopTransform.cpp line 1487: > 1485: } > 1486: > 1487: void PhaseIdealLoop::copy_assertion_predicates_to_main_loop(CountedLoopNode* pre_head, Node* init, Node* stride, Replaced with `PhaseIdealLoop::initialize_assertion_predicates_for_main_loop()`. src/hotspot/share/opto/loopTransform.cpp line 1852: > 1850: const uint first_node_index_in_cloned_loop_body) { > 1851: const NodeInClonedLoopBody node_in_cloned_loop_body(first_node_index_in_cloned_loop_body); > 1852: create_assertion_predicates_at_loop(main_loop_head, post_loop_head, node_in_cloned_loop_body, false); With the already refactored code for Loop Peeling, we can simply reuse `create_assertion_predicates_at_loop()` which simplifies the code a lot. 
src/hotspot/share/opto/loopTransform.cpp line 1864: > 1862: Node* target_loop_entry = target_outer_loop_head->in(LoopNode::EntryControl); > 1863: CreateAssertionPredicatesVisitor create_assertion_predicates_visitor(init, stride, target_loop_entry, this, > 1864: _node_in_loop_body, clone_template); Not a very elegant solution to introduce `clone_template` but I'm planning to update this code again at a later point. This was the least invasive way to handle this. src/hotspot/share/opto/loopTransform.cpp line 1942: > 1940: // Go over the Assertion Predicates of the main loop and make a copy for the post loop with its initial iv value and > 1941: // stride as inputs. > 1942: void PhaseIdealLoop::copy_assertion_predicates_to_post_loop(LoopNode* main_loop_head, CountedLoopNode* post_loop_head, Replaced with `PhaseIdealLoop::initialize_assertion_predicates_for_post_loop()`. src/hotspot/share/opto/loopnode.hpp line 947: > 945: IfTrueNode* create_initialized_assertion_predicate(IfNode* template_assertion_predicate, Node* new_init, > 946: Node* new_stride, Node* control); > 947: DEBUG_ONLY(static bool assertion_predicate_has_loop_opaque_node(IfNode* iff);) Needs to be public since I'm calling it from the `TemplateAssertionPredicate` class. Planning to refactor/move this method at a later point in time. src/hotspot/share/opto/predicates.cpp line 764: > 762: } > 763: if (_clone_template) { > 764: _new_control = clone_template_and_replace_init_input(template_assertion_predicate); For the main loop, we need to initialize templates **and** clone them, so that further Initialized Assertion Predicates can be created from them later when unrolling. Note that in the full fix, we should **always** clone templates since we don't know if the loop is going to split further. However, I don't want to change semantics here as this is a simple refactoring patch. I'm addressing these problems in a later PR.
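
The visitor/iterator arrangement described in this thread can be sketched outside of HotSpot. The following is a minimal illustration under stated assumptions: the names `PredicateVisitor` and `PredicateIterator` are borrowed from the PR description, but every type, method, and string below is a hypothetical simplification and not the actual C2 code. Predicates are modeled as plain records instead of graph nodes, and "creating" a predicate just appends a label.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical predicate kinds; real predicates are IR nodes, not records.
enum class PredicateKind { Parse, TemplateAssertion, InitializedAssertion };

struct Predicate {
  PredicateKind kind;
  std::string name;
};

// Visitor interface: by default every predicate kind is ignored.
class PredicateVisitor {
 public:
  virtual ~PredicateVisitor() {}
  virtual void visit_parse(const Predicate&) {}
  virtual void visit_template(const Predicate&) {}
  virtual void visit_initialized(const Predicate&) {}
};

// Walks all predicates found at a loop and dispatches to the visitor.
class PredicateIterator {
 public:
  explicit PredicateIterator(const std::vector<Predicate>& preds) : _preds(preds) {}

  void for_each(PredicateVisitor& visitor) const {
    for (std::size_t i = 0; i < _preds.size(); i++) {
      const Predicate& p = _preds[i];
      switch (p.kind) {
        case PredicateKind::Parse:                visitor.visit_parse(p); break;
        case PredicateKind::TemplateAssertion:    visitor.visit_template(p); break;
        case PredicateKind::InitializedAssertion: visitor.visit_initialized(p); break;
      }
    }
  }

 private:
  const std::vector<Predicate>& _preds;
};

// Records what would be created at the target loop for each template.
// With clone_template set, the template itself is also copied (assumed here
// to model the main-loop case); otherwise only an initialized copy is made.
class CloningVisitor : public PredicateVisitor {
 public:
  explicit CloningVisitor(bool clone_template) : _clone_template(clone_template) {}

  void visit_template(const Predicate& p) override {
    created.push_back("init(" + p.name + ")");
    if (_clone_template) {
      created.push_back("clone(" + p.name + ")");
    }
  }

  std::vector<std::string> created;

 private:
  bool _clone_template;
};
```

The point of the arrangement is that the walking logic lives in one place, so each loop transformation only supplies a visitor with the behavior it needs.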
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21790#discussion_r1822858323 PR Review Comment: https://git.openjdk.org/jdk/pull/21790#discussion_r1822859741 PR Review Comment: https://git.openjdk.org/jdk/pull/21790#discussion_r1822861130 PR Review Comment: https://git.openjdk.org/jdk/pull/21790#discussion_r1822883179 PR Review Comment: https://git.openjdk.org/jdk/pull/21790#discussion_r1822866079 PR Review Comment: https://git.openjdk.org/jdk/pull/21790#discussion_r1822861954 PR Review Comment: https://git.openjdk.org/jdk/pull/21790#discussion_r1822874539 PR Review Comment: https://git.openjdk.org/jdk/pull/21790#discussion_r1822870022 From chagedorn at openjdk.org Wed Oct 30 15:38:57 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Oct 2024 15:38:57 GMT Subject: RFR: 8343296: IGV: Show pre/main/post at CountedLoopNodes [v2] In-Reply-To: References: Message-ID: <1XQshU5s-sTqVK-9uKf_TNBJ8l3YKepyKj0YJZXmLF8=.fee84a26-68fc-4ed8-80e5-aad2a3c5dbef@github.com> > It's sometimes difficult to keep track of which counted loops are pre, main or post loops. This patch adds this property to `CountedLoops`, similarly to the other node info that we print, for example, like a method name for a `CallStaticJava`. > > This is achieved by emitting a new property to `CountedLoopNodes` and then extending the node info filter in IGV. This will show the pre/main/post info like the other node info below the node name: > > ![image](https://github.com/user-attachments/assets/69abe3f4-3e97-4dfe-983a-0153057b1a8d) > > Thanks to @robcasloz for helping me with the IGV filter changes! 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: - add new line - remove empty line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21788/files - new: https://git.openjdk.org/jdk/pull/21788/files/d29b67ff..13cef55a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21788&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21788&range=00-01 Stats: 2 lines in 2 files changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21788.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21788/head:pull/21788 PR: https://git.openjdk.org/jdk/pull/21788 From chagedorn at openjdk.org Wed Oct 30 15:38:58 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Oct 2024 15:38:58 GMT Subject: RFR: 8343296: IGV: Show pre/main/post at CountedLoopNodes In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 13:36:53 GMT, Christian Hagedorn wrote: > It's sometimes difficult to keep track of which counted loops are pre, main or post loops. This patch adds this property to `CountedLoops`, similarly to the other node info that we print, for example, like a method name for a `CallStaticJava`. > > This is achieved by emitting a new property to `CountedLoopNodes` and then extending the node info filter in IGV. This will show the pre/main/post info like the other node info below the node name: > > ![image](https://github.com/user-attachments/assets/69abe3f4-3e97-4dfe-983a-0153057b1a8d) > > Thanks to @robcasloz for helping me with the IGV filter changes! > > Thanks, > Christian Thanks Roberto for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21788#issuecomment-2447568304 From chagedorn at openjdk.org Wed Oct 30 15:39:00 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Oct 2024 15:39:00 GMT Subject: RFR: 8343296: IGV: Show pre/main/post at CountedLoopNodes [v2] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 14:58:44 GMT, Roberto Castañeda Lozano wrote: >> Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: >> >> - add new line >> - remove empty line > > src/utils/IdealGraphVisualizer/ServerCompiler/src/main/resources/com/sun/hotspot/igv/servercompiler/filters/customNodeInfo.filter line 37: > >> 35: editProperty(hasProperty("loop_kind"), ["loop_kind"], "extra_label", >> 36: function(loop_kind) { return loop_kind[0]; }); >> 37: > > Can you remove this extra line at the end? Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21788#discussion_r1822886261 From dfenacci at openjdk.org Wed Oct 30 16:29:36 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 30 Oct 2024 16:29:36 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 Message-ID: # Issue The third test of `compiler/codecache/CheckLargePages.java` checks that a non-segmented 1GB code cache can be allocated with 1GB large pages. On Linux (the only supported platform), in order to allocate them, 1GB huge pages have to be enabled (checkable from `/proc/meminfo` and `/sys/kernel/mm/hugepages/hugepages-xxxx`) and their number has to be set to >0 (checkable from `/sys/kernel/mm/hugepages/hugepages-xxxx/nr_hugepages`). If 1GB huge pages are enabled but their number is 0, the test fails because it looks for a string that matches `CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]` but the actual output is `CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M`. 
This happens because the VM tries to allocate 1GB huge pages but fails because the number of allocatable ones is 0, so the VM falls back to a smaller large page size (2MB). # Solution The problem might be attributed to the VM only checking whether 1GB huge pages are supported, not how many are currently available. Nevertheless, this seems to be the correct behaviour, not least because their number can be changed dynamically. So, the correct thing to do seems to be to "relax" the check made by the test to include both cases: * when 1GB huge pages are supported and can be allocated correctly * when 1GB huge pages are supported but cannot be allocated correctly (because there are none available) and the VM reverts to 2MB huge pages (if there are no 2MB pages available the test doesn't run at all). ------------- Commit messages: - JDK-8343153: add check for 2MB huge pages match - JDK-8343153: split long string - JDK-8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 Changes: https://git.openjdk.org/jdk/pull/21757/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21757&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343153 Stats: 15 lines in 2 files changed: 12 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21757.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21757/head:pull/21757 PR: https://git.openjdk.org/jdk/pull/21757 From dfenacci at openjdk.org Wed Oct 30 16:33:04 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 30 Oct 2024 16:33:04 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 In-Reply-To: References: Message-ID: <7wRu3Qr9iuBhrmawGoJxirBdEppXFD3wPXKopgOeNGw=.6f292838-dbc8-4c1c-8a78-f5fc81955f53@github.com> On Tue, 29 Oct 2024 10:54:31 GMT, Damon Fenacci wrote: > # Issue > > The third test of `compiler/codecache/CheckLargePages.java` checks that a non-segmented 1GB code cache can 
be allocated with 1GB large pages. > > On Linux (the only supported platform), in order to allocate them, 1GB huge pages have to be enabled (checkable from `/proc/meminfo` and `/sys/kernel/mm/hugepages/hugepages-xxxx`) and their number has to be set to >0 (checkable from `/sys/kernel/mm/hugepages/hugepages-xxxx/nr_hugepages`). > > If 1GB huge pages are enabled but their number is 0, the test fails because it looks for a string that matches `CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]` but the actual output is `CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M`. This happens because the VM tries to allocate 1GB huge pages but fails because the number of allocatable ones is 0, so the VM falls back to a smaller large page size (2MB). > > # Solution > > The problem might be attributed to the VM only checking whether 1GB huge pages are supported, not how many are currently available. Nevertheless, this seems to be the correct behaviour, not least because their number can be changed dynamically. > So, the correct thing to do seems to be to "relax" the check made by the test to include both cases: > * when 1GB huge pages are supported and can be allocated correctly > * when 1GB huge pages are supported but cannot be allocated correctly (because there are none available) and the VM reverts to 2MB huge pages (if there are no 2MB pages available the test doesn't run at all). @eastig I noticed that you are the author of the original `testNonSegmented1GbCodeCacheWith1GbLargePages` test. Could I ask you to have a look at this change? Thanks a lot! 
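For illustration, the relaxed check described above could look roughly like the sketch below. This is not the actual change in the PR: the class and method names are hypothetical, the first pattern is the one quoted in the issue description, and the second pattern is assumed by analogy for the 2MB fallback.

```java
import java.util.regex.Pattern;

class CodeCachePageSizeCheck {
    // Pattern quoted in the issue description: 1GB code cache backed by 1GB pages.
    static final Pattern ONE_GB_PAGES = Pattern.compile(
        "CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]");

    // Assumed counterpart for the fallback case: same code cache, 2MB pages.
    static final Pattern TWO_MB_PAGES = Pattern.compile(
        "CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=2[mM]");

    // Accept the VM output if either the 1GB or the 2MB variant is present.
    static boolean acceptable(String vmOutput) {
        return ONE_GB_PAGES.matcher(vmOutput).find()
            || TWO_MB_PAGES.matcher(vmOutput).find();
    }

    public static void main(String[] args) {
        // Output line reported in the issue when no 1GB pages are allocatable.
        String fallback =
            "CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M";
        System.out.println(acceptable(fallback)); // prints "true"
    }
}
```

Keeping the two patterns separate, rather than merging them into one alternation, makes it easy to report which of the two cases was actually hit.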
------------- PR Comment: https://git.openjdk.org/jdk/pull/21757#issuecomment-2447737434 From qamai at openjdk.org Wed Oct 30 17:27:14 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 30 Oct 2024 17:27:14 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v3] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 04:43:55 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. >> >> In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into phase-lowering > - Remove platform-dependent node definitions, rework PhaseLowering implementation > - Address some changes from code review > - Implement PhaseLowering src/hotspot/share/opto/phaseX.cpp line 2273: > 2271: > 2272: Node* PhaseLowering::apply_ideal(Node* k, bool can_reshape) { > 2273: // Run the lowered Ideal method to continue doing transformations on the node, while avoiding existing transforms Can you call `lower(k)` here and as a result, you can simply do `lower.optimize()` as the main entry? src/hotspot/share/opto/phaseX.cpp line 2289: > 2287: _worklist.ensure_empty(); > 2288: > 2289: C->identify_useful_nodes(_worklist); To address @iwanowww 's concern, you can have a backend-specific method `PhaseLowering::do_lowering()` that will decide whether we should perform lowering on a particular graph. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1823087121 PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1823088532 From cslucas at openjdk.org Wed Oct 30 17:42:30 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 30 Oct 2024 17:42:30 GMT Subject: RFR: 8335977: Deoptimization fails with assert "object should be reallocated already" [v3] In-Reply-To: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> References: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> Message-ID: > Please, review this patch to fix an issue that may occur when serializing debug information related to reduce allocation merges. The problem happens when there are more than one JVMS in a `uncommon_trap` and a _younger_ JVMS doesn't have the RAM inputs as a local/expression/monitor but an older JVMS does. 
In that situation the loop at line 1173 of output.cpp will set the `is_root` property of the ObjectValue to `false` when processing the younger JVMS even though it may have been set to `true` when visiting the older JVMS. > > Tested on: > - Win, Mac & Linux tier1-4 on x64 & Aarch64. > - CTW with some thousands of jars. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: fix indentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21624/files - new: https://git.openjdk.org/jdk/pull/21624/files/26b0b869..818b09e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21624&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21624&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21624.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21624/head:pull/21624 PR: https://git.openjdk.org/jdk/pull/21624 From rcastanedalo at openjdk.org Wed Oct 30 17:56:05 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 30 Oct 2024 17:56:05 GMT Subject: RFR: 8343296: IGV: Show pre/main/post at CountedLoopNodes [v2] In-Reply-To: <1XQshU5s-sTqVK-9uKf_TNBJ8l3YKepyKj0YJZXmLF8=.fee84a26-68fc-4ed8-80e5-aad2a3c5dbef@github.com> References: <1XQshU5s-sTqVK-9uKf_TNBJ8l3YKepyKj0YJZXmLF8=.fee84a26-68fc-4ed8-80e5-aad2a3c5dbef@github.com> Message-ID: <4Ks2uO9fi3ecReMlA99FMGo2jfgIWiczc_Xhosf2Itg=.7eff3b84-29d0-4707-98b8-45865038e3d4@github.com> On Wed, 30 Oct 2024 15:38:57 GMT, Christian Hagedorn wrote: >> It's sometimes difficult to keep track of which counted loops are pre, main or post loops. This patch adds this property to `CountedLoops`, similarly to the other node info that we print, for example, like a method name for a `CallStaticJava`. >> >> This is achieved by emitting a new property to `CountedLoopNodes` and then extending the node info filter in IGV. 
This will show the pre/main/post info like the other node info below the node name: >> >> ![image](https://github.com/user-attachments/assets/69abe3f4-3e97-4dfe-983a-0153057b1a8d) >> >> Thanks to @robcasloz for helping me with the IGV filter changes! >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - add new line > - remove empty line Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21788#pullrequestreview-2405807752 From kvn at openjdk.org Wed Oct 30 18:46:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Oct 2024 18:46:12 GMT Subject: RFR: 8343296: IGV: Show pre/main/post at CountedLoopNodes [v2] In-Reply-To: <1XQshU5s-sTqVK-9uKf_TNBJ8l3YKepyKj0YJZXmLF8=.fee84a26-68fc-4ed8-80e5-aad2a3c5dbef@github.com> References: <1XQshU5s-sTqVK-9uKf_TNBJ8l3YKepyKj0YJZXmLF8=.fee84a26-68fc-4ed8-80e5-aad2a3c5dbef@github.com> Message-ID: On Wed, 30 Oct 2024 15:38:57 GMT, Christian Hagedorn wrote: >> It's sometimes difficult to keep track of which counted loops are pre, main or post loops. This patch adds this property to `CountedLoops`, similarly to the other node info that we print, for example, like a method name for a `CallStaticJava`. >> >> This is achieved by emitting a new property to `CountedLoopNodes` and then extending the node info filter in IGV. This will show the pre/main/post info like the other node info below the node name: >> >> ![image](https://github.com/user-attachments/assets/69abe3f4-3e97-4dfe-983a-0153057b1a8d) >> >> Thanks to @robcasloz for helping me with the IGV filter changes! >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - add new line > - remove empty line Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21788#pullrequestreview-2405965883 From kvn at openjdk.org Wed Oct 30 21:10:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Oct 2024 21:10:52 GMT Subject: RFR: 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 00:40:22 GMT, Cesar Soares Lucas wrote: > Please, consider this patch to fix an issue that happens when a Phi previously considered reducible become later irreducible. The overall situation that causes the problem is like so: > > - Consider that there are at least 2 scalar replaceable objects (Obj1 and Obj2; Obj2 is stored in a field of Obj1) when we start iterating the loop at escape.cpp:301 > > - In the first iteration of the loop the call chain starting with `adjust_scalar_replaceable_state` ends up calling `can_reduce_phi` and considering Phi1 as reducible. This Phi has only Obj1 as *SR* input. > > - In another iteration of the loop Obj2 is flagged as NSR. For instance, because we are storing Obj2 in an unknown position of an array. This will cause `found_nsr_alloc` to be set to `true`. > > After the loop finishes, the execution will go to `find_scalar_replaceable_allocs`. The code will process Obj1, because it's still scalar replaceable, but will find that this object is stored in a field of a - **now** - NSR object. Therefore, correctly, Obj1 will also be marked as NSR. When Obj1 is marked as NSR Phi1 becomes irreducible because it doesn't have any more scalar replaceable input. > > The solution I'm proposing is simply revisit the "reducibility" of the Phis when an object is marked as NSR. > > --------- > > ### Tests > > Win, Mac & Linux tier1-4 on x64 & Aarch64. Seems reasonable. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21778#pullrequestreview-2406320378 From sparasa at openjdk.org Wed Oct 30 21:58:34 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 30 Oct 2024 21:58:34 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support In-Reply-To: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com> References: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com> Message-ID: On Wed, 30 Oct 2024 09:47:05 GMT, Jatin Bhateja wrote: > I think we should first check-in extended gtest asm validation script detecting these issues either before or along with this patch. Please see the latest PR submitted for updated test generation tool to handle APX NDD/NF instructions: https://github.com/openjdk/jdk/pull/21795 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21770#issuecomment-2448497688 From sparasa at openjdk.org Wed Oct 30 22:14:32 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 30 Oct 2024 22:14:32 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support In-Reply-To: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com> References: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com> Message-ID: On Wed, 30 Oct 2024 07:46:26 GMT, Jatin Bhateja wrote: >> The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) > > src/hotspot/cpu/x86/assembler_x86.cpp line 2632: > >> 2630: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 2631: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* 
input_size_in_bits */ EVEX_32bit); >> 2632: evex_prefix_nf(src, 0, dst->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); > > Could you also replace VEX_OPCODE_0F_3C with the standard naming convention of VEX_OPCODE_MAP4? > I added /*MAP4*/ in the comments after the prefix for the setzuCC instruction, but it's better to make this change consistently in all places. Hi Jatin, If I understand correctly, are you suggesting that I add a comment in front like `/* MAP4 */VEX_OPCODE_0F_3C` for all occurrences of VEX_OPCODE_0F_3C in this PR? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21770#discussion_r1823509217 From jkarthikeyan at openjdk.org Thu Oct 31 02:22:41 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 31 Oct 2024 02:22:41 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v3] In-Reply-To: References: Message-ID: <6ABTGpRWisFfAgR9R6gCqxJMasj8pEYnMRsXCIes9Tc=.b3495a73-aacc-4b7e-9f3a-1e0428cc539a@github.com> On Wed, 30 Oct 2024 17:22:42 GMT, Quan Anh Mai wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into phase-lowering >> - Remove platform-dependent node definitions, rework PhaseLowering implementation >> - Address some changes from code review >> - Implement PhaseLowering > > src/hotspot/share/opto/phaseX.cpp line 2273: > >> 2271: >> 2272: Node* PhaseLowering::apply_ideal(Node* k, bool can_reshape) { >> 2273: // Run the lowered Ideal method to continue doing transformations on the node, while avoiding existing transforms > > Can you call `lower(k)` here and as a result, you can simply do `lower.optimize()` as the main entry? 
I would prefer to keep it as-is because `PhaseIterGVN::optimize` does a lot of logic that may not be relevant here (such as IGVN verification and IGV printing). This way we can avoid changes to IGVN in the future accidentally impacting lowering in unexpected ways. > src/hotspot/share/opto/phaseX.cpp line 2289: > >> 2287: _worklist.ensure_empty(); >> 2288: >> 2289: C->identify_useful_nodes(_worklist); > > To address @iwanowww 's concern, you can have a backend-specific method `PhaseLowering::do_lowering()` that will decide whether we should perform lowering on a particular graph. I think if we're planning on moving optimizations like `DivMod` to the platform-independent part of this phase we may end up processing nodes more generally, since the graph patterns we expect would be pretty wide and shared between multiple backends. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1823694036 PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1823694120 From qamai at openjdk.org Thu Oct 31 02:45:34 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 31 Oct 2024 02:45:34 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v3] In-Reply-To: <6ABTGpRWisFfAgR9R6gCqxJMasj8pEYnMRsXCIes9Tc=.b3495a73-aacc-4b7e-9f3a-1e0428cc539a@github.com> References: <6ABTGpRWisFfAgR9R6gCqxJMasj8pEYnMRsXCIes9Tc=.b3495a73-aacc-4b7e-9f3a-1e0428cc539a@github.com> Message-ID: On Thu, 31 Oct 2024 02:19:50 GMT, Jasmine Karthikeyan wrote: >> src/hotspot/share/opto/phaseX.cpp line 2289: >> >>> 2287: _worklist.ensure_empty(); >>> 2288: >>> 2289: C->identify_useful_nodes(_worklist); >> >> To address @iwanowww 's concern, you can have a backend-specific method `PhaseLowering::do_lowering()` that will decide whether we should perform lowering on a particular graph. 
> > I think if we're planning on moving optimizations like `DivMod` to the platform-independent part of this phase we may end up processing nodes more generally, since the graph patterns we expect would be pretty wide and shared between multiple backends. What do you think? But there are still backends where we don't do such optimization, so it is still worth it to have a predicate for it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1823707056 From qamai at openjdk.org Thu Oct 31 02:50:31 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 31 Oct 2024 02:50:31 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v3] In-Reply-To: <6ABTGpRWisFfAgR9R6gCqxJMasj8pEYnMRsXCIes9Tc=.b3495a73-aacc-4b7e-9f3a-1e0428cc539a@github.com> References: <6ABTGpRWisFfAgR9R6gCqxJMasj8pEYnMRsXCIes9Tc=.b3495a73-aacc-4b7e-9f3a-1e0428cc539a@github.com> Message-ID: <3kQ-4gSCJWVed41_y2EvHcqxX1tDLYSTGeBL_QTfPn8=.55f7ce6e-e209-465a-97af-257770e13a65@github.com> On Thu, 31 Oct 2024 02:19:44 GMT, Jasmine Karthikeyan wrote: >> src/hotspot/share/opto/phaseX.cpp line 2273: >> >>> 2271: >>> 2272: Node* PhaseLowering::apply_ideal(Node* k, bool can_reshape) { >>> 2273: // Run the lowered Ideal method to continue doing transformations on the node, while avoiding existing transforms >> >> Can you call `lower(k)` here and as a result, you can simply do `lower.optimize()` as the main entry? > > I would prefer to keep it as-is because `PhaseIterGVN::optimize` does a lot of logic that may not be relevant here (such as IGVN verification and IGV printing). This way we can avoid changes to IGVN in the future accidentally impacting lowering in unexpected ways. I actually think it is a good idea to have verification and printing. Since Lowering does IGVN-like transformations, they should behave in generally the same way. If it turns out that we actually need a separate entry then we can create it then. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1823709719 From jkarthikeyan at openjdk.org Thu Oct 31 03:43:30 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 31 Oct 2024 03:43:30 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v2] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 08:50:33 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Suggestions from review > > src/hotspot/share/opto/addnode.cpp line 1268: > >> 1266: Node* MaxINode::Identity(PhaseGVN* phase) { >> 1267: const TypeInt* t1 = phase->type(in(1))->is_int(); >> 1268: const TypeInt* t2 = phase->type(in(2))->is_int(); > > Could any input be `TOP`? I think we can't encounter `TOP` here because we run `Value()` before `Identity()`, so if TOP is returned by Value() the idealization process exits to return the top node before running Identity(). > test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdentities.java line 35: > >> 33: * @summary Test identities of MinNodes and MaxNodes. >> 34: * @key randomness >> 35: * @requires (os.simpleArch == "x64" & vm.cpu.features ~= ".*avx.*") | os.arch == "aarch64" | os.arch == "riscv64" > > Is there a chance we can add these `requires` to the `@IR` rules instead? That way we can still do the result verification on all other platforms, which could be valuable on its own. From my understanding this isn't possible as-is since CPU features seem to be checked regardless of whether the architecture supports it or not, so we can't simply check for AVX because that would fail on aarch64 and riscv64. I think we could work around this with `applyIfCPUFeatureOr = {"avx", "true", "asimd", "true", "rvv", "true"}` to force a check for all 3 platforms but it'd be filtering more platforms than strictly necessary. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1823736920 PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1823736946 From thartmann at openjdk.org Thu Oct 31 07:00:44 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 31 Oct 2024 07:00:44 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v32] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:36:50 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. 
>> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: > > - Review resolutions. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - Factor out IR tests and Transforms to follow-up PRs. > - Replacing flag based checks with CPU feature checks in IR validation test. > - Remove Saturating IRNode patterns. > - Restrict IR validation to newly added UMin/UMax transforms. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - Prod build fix > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - New IR tests + additional IR transformations > - ... and 27 more: https://git.openjdk.org/jdk/compare/158b93d1...0e10139c This caused a regression: [JDK-8343246](https://bugs.openjdk.org/browse/JDK-8343246) Jatin, could you please have a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2449161173 From thartmann at openjdk.org Thu Oct 31 07:23:29 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 31 Oct 2024 07:23:29 GMT Subject: RFR: 8335977: Deoptimization fails with assert "object should be reallocated already" [v3] In-Reply-To: References: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> Message-ID: On Wed, 30 Oct 2024 17:42:30 GMT, Cesar Soares Lucas wrote: >> Please, review this patch to fix an issue that may occur when serializing debug information related to reduce allocation merges. 
The problem happens when there are more than one JVMS in a `uncommon_trap` and a _younger_ JVMS doesn't have the RAM inputs as a local/expression/monitor but an older JVMS does. In that situation the loop at line 1173 of output.cpp will set the `is_root` property of the ObjectValue to `false` when processing the younger JVMS even though it may have been set to `true` when visiting the older JVMS. >> >> Tested on: >> - Win, Mac & Linux tier1-4 on x64 & Aarch64. >> - CTW with some thousands of jars. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > fix indentation Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21624#pullrequestreview-2407108600 From thartmann at openjdk.org Thu Oct 31 07:27:28 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 31 Oct 2024 07:27:28 GMT Subject: RFR: 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 00:40:22 GMT, Cesar Soares Lucas wrote: > Please, consider this patch to fix an issue that happens when a Phi previously considered reducible become later irreducible. The overall situation that causes the problem is like so: > > - Consider that there are at least 2 scalar replaceable objects (Obj1 and Obj2; Obj2 is stored in a field of Obj1) when we start iterating the loop at escape.cpp:301 > > - In the first iteration of the loop the call chain starting with `adjust_scalar_replaceable_state` ends up calling `can_reduce_phi` and considering Phi1 as reducible. This Phi has only Obj1 as *SR* input. > > - In another iteration of the loop Obj2 is flagged as NSR. For instance, because we are storing Obj2 in an unknown position of an array. This will cause `found_nsr_alloc` to be set to `true`. > > After the loop finishes, the execution will go to `find_scalar_replaceable_allocs`. 
The code will process Obj1, because it's still scalar replaceable, but will find that this object is stored in a field of a - **now** - NSR object. Therefore, correctly, Obj1 will also be marked as NSR. When Obj1 is marked as NSR Phi1 becomes irreducible because it doesn't have any more scalar replaceable input. > > The solution I'm proposing is simply to revisit the "reducibility" of the Phis when an object is marked as NSR. > > --------- > > ### Tests > > Win, Mac & Linux tier1-4 on x64 & Aarch64. test/hotspot/jtreg/compiler/escapeAnalysis/TestReduceAllocationAndNonReduciblePhi.java line 31: > 29: * its SR inputs is flagged as NSR. > 30: * @run main/othervm compiler.escapeAnalysis.TestReduceAllocationAndNonReduciblePhi > 31: * @run main compiler.escapeAnalysis.TestReduceAllocationAndNonReduciblePhi Why do you need both? Should `@run main/othervm` add `-Xbatch`? test/hotspot/jtreg/compiler/escapeAnalysis/TestReduceAllocationAndNonReduciblePhi.java line 40: > 38: int result = 0; > 39: > 40: for (int i=0; i<20000; i++) { Suggestion: for (int i = 0; i < 20000; i++) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21778#discussion_r1823930666 PR Review Comment: https://git.openjdk.org/jdk/pull/21778#discussion_r1823933985 From epeter at openjdk.org Thu Oct 31 07:33:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 31 Oct 2024 07:33:36 GMT Subject: RFR: 8342958: Use jvmArgs consistently in microbenchmarks In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:52:57 GMT, Claes Redestad wrote: > Many OpenJDK micros use `@Fork(jvmArgs/-Append/-Prepend)` to add reasonable or necessary JVM flags, but when deploying and running micros we often want to add or replace flags to tune to the machine, test different GCs, etc. The inconsistent use of the different `jvmArgs` options makes it error prone, and we've had a few recent cases where we've not been testing with the expected set of flags. > > This PR suggests using `jvmArgs` consistently.
I think this aligns with the intuition that when you use `jvmArgsAppend/-Prepend` intent is to add to a set of existing flags, while if you supply `jvmArgs` intent is "run with these and nothing else". Perhaps there are other opinions/preferences, and I don't feel strongly about which to consolidate to as long as we do so consistently. One argument could be made to consolidate on `jvmArgsAppend` since that one is (likely accidentally) the current most popular (143 compared to 59 `jvmArgsPrepend` and 18 `jvmArgs`).

Hi Claes,

I'm having some strange results with your JMH PR: https://github.com/openjdk/jdk/pull/21683
And my benchmark: https://github.com/openjdk/jdk/blame/master/test/micro/org/openjdk/bench/vm/compiler/VectorLoadToStoreForwarding.java

Running it like this:
`make test TEST="micro:vm.compiler.VectorLoadToStoreForwarding.*benchmark_03" CONF=linux-x64`

And I'm getting this result:

Benchmark                                                                        (SIZE)  (seed)  Mode  Cnt     Score    Error  Units
VectorLoadToStoreForwarding.VectorLoadToStoreForwardingNoSuperWord.benchmark_03    2048       0  avgt   10  4563.110 ± 64.988  ns/op
VectorLoadToStoreForwarding.VectorLoadToStoreForwardingSuperWord.benchmark_03      2048       0  avgt   10  4549.337 ± 32.239  ns/op

But when I revert your change, back to `jvmArgsPrepend`:

Benchmark                                                                        (SIZE)  (seed)  Mode  Cnt     Score    Error  Units
VectorLoadToStoreForwarding.VectorLoadToStoreForwardingNoSuperWord.benchmark_03    2048       0  avgt   10  1040.538 ± 14.660  ns/op
VectorLoadToStoreForwarding.VectorLoadToStoreForwardingSuperWord.benchmark_03      2048       0  avgt   10  4533.227 ±  5.161  ns/op

It seems that with your PR, the flags don't have the intended effect any more!
With your change, I see this: `# VM options: -Djava.library.path=/oracle-work/jdk-fork6/build/linux-x64/images/test/micro/native` And reverting back, I see this: `# VM options: -XX:-UseSuperWord -Djava.library.path=/oracle-work/jdk-fork6/build/linux-x64/images/test/micro/native` ------------- PR Comment: https://git.openjdk.org/jdk/pull/21683#issuecomment-2449207896 From rehn at openjdk.org Thu Oct 31 07:36:27 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 31 Oct 2024 07:36:27 GMT Subject: RFR: 8343122: RISC-V: C2: Small improvement for real runtime callouts In-Reply-To: <3AbaT2SwHVxQcQRu82L8CWzKBhhAxukYOMT5Bjgjt_c=.197639f3-dc6b-46ec-9ecd-82569e7eb074@github.com> References: <3AbaT2SwHVxQcQRu82L8CWzKBhhAxukYOMT5Bjgjt_c=.197639f3-dc6b-46ec-9ecd-82569e7eb074@github.com> Message-ID: On Mon, 28 Oct 2024 04:39:17 GMT, Fei Yang wrote: > Hi, please review this small improvement. > > Currently, we do 11 instructions for real C2 runtime callouts (See riscv_enc_java_to_runtime). > Seems we can materialize the pointer faster with `movptr2`, which will reduce 2 instructions. > But we will need to reorder the original calling sequence a bit to make `t0` available for `movptr2`. > > Testing on linux-riscv64 platform: > - [x] tier1-tier3 (release) > - [x] hotspot:tier1 (fastdebug) Nice! But I don't see the code change for this comment change: - // sd(t1, Address(sp, wordSize)) -> sd + // sd(t0, Address(sp, wordSize)) -> sd What am I missing ? EDIT: Sorry ignore above, early morning... Marked as reviewed by rehn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21733#pullrequestreview-2407190032 PR Review: https://git.openjdk.org/jdk/pull/21733#pullrequestreview-2407200564 From thartmann at openjdk.org Thu Oct 31 07:43:31 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 31 Oct 2024 07:43:31 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 10:54:31 GMT, Damon Fenacci wrote: > # Issue > > The third test of `compiler/codecache/CheckLargePages.java` checks that non-segmented 1GB code cache can be allocated with 1GB large pages. > > On linux (the only supported platform) in order to allocate them, 1GB huge pages have to be enabled (checkable from `/proc/meminfo` and `/sys/kernel/mm/hugepages/hugepages-xxxx`) and their number has to be set to >0 (checkable from `/sys/kernel/mm/hugepages/hugepages-xxxx/nr_hugepages`). > > If 1GB huge pages are enabled but their number is 0, the test fails because it looks for a string that matches `CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]` but the actual output is `CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M`. This happens because the VM tries to allocate 1GB huge pages but it fails because the number of allocatable ones is 0 and the VM reverts to smaller large page sizes (2MB). > > # Solution > > The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are. Nevertheless, this seems to be the correct behaviour, not least because their number can be changed dynamically.
> So, the correct thing to do seems to be to "relax" the check made by the test to include both cases: > * when 1GB huge pages are supported and can be allocated correctly > * when 1GB huge pages are supported but cannot be allocated correctly (because there are none available) and the VM reverts to 2MB huge pages (if there are no 2MB pages available the test doesn't run at all). test/hotspot/jtreg/compiler/codecache/CheckLargePages.java line 120: > 118: // 1GB large pages configured but none available > 119: "Failed to reserve and commit memory with given page size\\. " + > 120: "req_addr: [^ ]+ size: 1[gG], page size: 1[gG], \\(errno = 12\\)"); Took me a while to figure that these are `OR` matches due to the `|` hiding at the end of the first line. Would it make sense to update the comment to something like this? // 1GB large pages configured and available "CodeCache:\\s+min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]|" + // or 1GB large pages configured but none available ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21757#discussion_r1823994543 From thartmann at openjdk.org Thu Oct 31 07:43:31 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 31 Oct 2024 07:43:31 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 07:39:03 GMT, Tobias Hartmann wrote: >> # Issue >> >> The third test of `compiler/codecache/CheckLargePages.java` checks that non-segmented 1GB code cache can be allocated with 1GB large pages. >> >> On linux (the only supported platform) in order to allocate them, 1GB huge pages have to be enabled (checkable from `/proc/meminfo` and `/sys/kernel/mm/hugepages/hugepages-xxxx`) and their number has to be set to >0 (checkable from `/sys/kernel/mm/hugepages/hugepages-xxxx/nr_hugepages`). 
>> >> If 1GB huge pages are enabled but their number is 0, the test fails because it looks for a string that matches `CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]` but the actual output is `CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M`. This happens because the VM tries to allocate 1GB huge pages but it fails because the number of allocatable ones is 0 and the VM reverts to smaller large page sizes (2MB). >> >> # Solution >> >> The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are. Nevertheless, this seems to be the correct behaviour, not least because their number can be changed dynamically. >> So, the correct thing to do seems to be to "relax" the check made by the test to include both cases: >> * when 1GB huge pages are supported and can be allocated correctly >> * when 1GB huge pages are supported but cannot be allocated correctly (because there are none available) and the VM reverts to 2MB huge pages (if there are no 2MB pages available the test doesn't run at all). > > test/hotspot/jtreg/compiler/codecache/CheckLargePages.java line 120: > >> 118: // 1GB large pages configured but none available >> 119: "Failed to reserve and commit memory with given page size\\. " + >> 120: "req_addr: [^ ]+ size: 1[gG], page size: 1[gG], \\(errno = 12\\)"); > > Took me a while to figure that these are `OR` matches due to the `|` hiding at the end of the first line. Would it make sense to update the comment to something like this? > > // 1GB large pages configured and available > "CodeCache:\\s+min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]|" + > // or 1GB large pages configured but none available Also, isn't there a `CodeCache:\` line in the output in the failing case as well that should be added here in the OR part?
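
For readers following the `OR` discussion, here is a minimal, self-contained sketch of how the two alternatives behave when joined by `|` in a single `java.util.regex` pattern. The class name `LargePagePatternSketch` and the sample "Failed to reserve" log line are assumptions made for illustration (only the two `CodeCache:` lines come from the PR description); this is not the actual code of `CheckLargePages.java`.

```java
import java.util.regex.Pattern;

public class LargePagePatternSketch {
    // Hypothetical pattern modelled on the snippet quoted above.
    static final Pattern EXPECTED = Pattern.compile(
        // 1GB large pages configured and available
        "CodeCache:\\s+min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]|" +
        // or 1GB large pages configured but none available
        "Failed to reserve and commit memory with given page size\\. " +
        "req_addr: [^ ]+ size: 1[gG], page size: 1[gG], \\(errno = 12\\)");

    // True if either alternative occurs anywhere in the given output.
    static boolean matches(String output) {
        return EXPECTED.matcher(output).find();
    }

    public static void main(String[] args) {
        String available = "CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=1G";
        // Assumed shape of the failure message; the real VM output may differ.
        String noneAvailable = "Failed to reserve and commit memory with given page size. "
            + "req_addr: 0x0000000000000000 size: 1G, page size: 1G, (errno = 12)";
        String fallbackOnly = "CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M";

        System.out.println(matches(available));
        System.out.println(matches(noneAvailable));
        System.out.println(matches(fallbackOnly));
    }
}
```

Note that the 2M `CodeCache:` line on its own matches neither alternative, which is why adding it to the `OR` part, as asked above, would change what the test accepts.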
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21757#discussion_r1823999523 From dlunden at openjdk.org Thu Oct 31 07:49:38 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 31 Oct 2024 07:49:38 GMT Subject: RFR: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 [v5] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 12:25:51 GMT, Daniel Lundén wrote: >> Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced, due to unfortunate rounding in the register mask size computation, the available space for incoming/outgoing method arguments in register masks. >> >> ### Changeset >> >> - Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4. >> - Add a regression test. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. There is no observable difference in C2 compilation time. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/adlc/formsopt.cpp > > Co-authored-by: Christian Hagedorn Thanks everyone!
------------- PR Comment: https://git.openjdk.org/jdk/pull/21612#issuecomment-2449242393 From dlunden at openjdk.org Thu Oct 31 07:49:39 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 31 Oct 2024 07:49:39 GMT Subject: Integrated: 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 In-Reply-To: References: Message-ID: <1EXFkcw7hdj0v9pm5h2TstvxZl6HVOGRNH7klsz8Jg0=.630f96a9-e087-46b0-bf97-a102b9d53c2c@github.com> On Mon, 21 Oct 2024 14:05:54 GMT, Daniel Lundén wrote: > Adding C2 register allocation support for APX EGPRs ([JDK-8329032](https://bugs.openjdk.org/browse/JDK-8329032)) reduced, due to unfortunate rounding in the register mask size computation, the available space for incoming/outgoing method arguments in register masks. > > ### Changeset > > - Bump the number of 32-bit words dedicated to incoming/outgoing arguments in register masks from 3 to 4. > - Add a regression test. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11436050131) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - C2 compilation time benchmarking for DaCapo on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. There is no observable difference in C2 compilation time. This pull request has now been integrated.
Changeset: 388d44fb Author: Daniel Lundén URL: https://git.openjdk.org/jdk/commit/388d44fbf0126f253860edc88c2efd57f86e5a2b Stats: 54 lines in 2 files changed: 51 ins; 0 del; 3 mod 8342156: C2: Compilation failure with fewer arguments after JDK-8329032 Co-authored-by: Christian Hagedorn Reviewed-by: rcastanedalo, chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21612 From rehn at openjdk.org Thu Oct 31 08:10:28 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 31 Oct 2024 08:10:28 GMT Subject: RFR: 8343121: RISC-V: More accurate max size for C2SafepointPollStub and C2EntryBarrierStub In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 04:09:57 GMT, Fei Yang wrote: > Hi, please review this small change. > > The current max size of these two stubs is a bit overestimated and thus is more than needed. > Since the `la`, `far_call` and `far_jump` assembler routines used by these two stubs will always > emit 2 instructions for an address inside the code cache, we can make the max size more accurate. > > Testing on linux-riscv64 platform: > - [x] tier1-tier3 (release) > - [x] hotspot:tier1 (fastdebug) Seems fine, thanks. ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21732#pullrequestreview-2407309954 From redestad at openjdk.org Thu Oct 31 08:54:33 2024 From: redestad at openjdk.org (Claes Redestad) Date: Thu, 31 Oct 2024 08:54:33 GMT Subject: RFR: 8342958: Use jvmArgs consistently in microbenchmarks In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:52:57 GMT, Claes Redestad wrote: > Many OpenJDK micros use `@Fork(jvmArgs/-Append/-Prepend)` to add reasonable or necessary JVM flags, but when deploying and running micros we often want to add or replace flags to tune to the machine, test different GCs, etc. The inconsistent use of the different `jvmArgs` options makes it error prone, and we've had a few recent cases where we've not been testing with the expected set of flags.
> > This PR suggests using `jvmArgs` consistently. I think this aligns with the intuition that when you use `jvmArgsAppend/-Prepend` intent is to add to a set of existing flags, while if you supply `jvmArgs` intent is "run with these and nothing else". Perhaps there are other opinions/preferences, and I don't feel strongly about which to consolidate to as long as we do so consistently. One argument could be made to consolidate on `jvmArgsAppend` since that one is (likely accidentally) the current most popular (143 compared to 59 `jvmArgsPrepend` and 18 `jvmArgs`). An oversight, but I forgot to change the `RunTests.gmk`. Filed https://bugs.openjdk.org/browse/JDK-8343345 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21683#issuecomment-2449345803 From epeter at openjdk.org Thu Oct 31 09:00:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 31 Oct 2024 09:00:34 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v2] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 03:41:18 GMT, Jasmine Karthikeyan wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdentities.java line 35: >> >>> 33: * @summary Test identities of MinNodes and MaxNodes. >>> 34: * @key randomness >>> 35: * @requires (os.simpleArch == "x64" & vm.cpu.features ~= ".*avx.*") | os.arch == "aarch64" | os.arch == "riscv64" >> >> Is there a chance we can add these `requires` to the `@IR` rules instead? That way we can still do the result verification on all other platforms, which could be valuable on its own. > > From my understanding this isn't possible as-is since CPU features seem to be checked regardless of whether the architecture supports it or not, so we can't simply check for AVX because that would fail on aarch64 and riscv64. I think we could work around this with `applyIfCPUFeatureOr = {"avx", "true", "asimd", "true", "rvv", "true"}` to force a check for all 3 platforms but it'd be filtering more platforms than strictly necessary. 
Which platforms would be filtered "more platforms than strictly necessary"? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1824102379 From epeter at openjdk.org Thu Oct 31 09:13:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 31 Oct 2024 09:13:36 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v11] In-Reply-To: <6wnkmrDfLXJaccB6xsKXP2sgAG7tXVdEKrqE4vlTsdI=.4af0716c-ad1c-4148-8119-41722c5996d7@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> <6wnkmrDfLXJaccB6xsKXP2sgAG7tXVdEKrqE4vlTsdI=.4af0716c-ad1c-4148-8119-41722c5996d7@github.com> Message-ID: On Tue, 29 Oct 2024 18:29:04 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! 
>> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix distance assert FYI: I ran performance testing, with no significant changes detected. That was expected (this is a niche optimization, that probably does not feature prominently in the benchmarks - especially Unsafe stores are not that prevalent). Still, this change is justified by my added micro-benchmarks.
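
To make the "tracks if there was ever an overflow" idea from the quoted description concrete, here is a small sketch in Java. It is only an analogue written for illustration: the class `NoOverflowIntSketch`, its methods, and the `isAdjacent` helper are hypothetical names, and HotSpot's actual `NoOverflowInt` is a C++ class whose interface may differ. The adjacency helper assumes two pointers of the form `con + sum_i(scale_i * variable_i)` that already share the same variable part, so only their constants need comparing.

```java
public final class NoOverflowIntSketch {
    private final boolean overflowed; // sticky: once set, every later result is overflowed too
    private final int value;          // only meaningful when overflowed == false

    private NoOverflowIntSketch(boolean overflowed, int value) {
        this.overflowed = overflowed;
        this.value = value;
    }

    public static NoOverflowIntSketch of(int value) {
        return new NoOverflowIntSketch(false, value);
    }

    // Narrow a 64-bit intermediate result; flag overflow if it does not fit in an int.
    private static NoOverflowIntSketch fromLong(long result) {
        return (result == (int) result) ? of((int) result) : new NoOverflowIntSketch(true, 0);
    }

    public NoOverflowIntSketch add(NoOverflowIntSketch other) {
        if (overflowed || other.overflowed) return new NoOverflowIntSketch(true, 0);
        return fromLong((long) value + other.value);
    }

    public NoOverflowIntSketch mul(NoOverflowIntSketch other) {
        if (overflowed || other.overflowed) return new NoOverflowIntSketch(true, 0);
        return fromLong((long) value * other.value);
    }

    public boolean isOverflowed() {
        return overflowed;
    }

    public int value() {
        if (overflowed) throw new IllegalStateException("value is undefined after overflow");
        return value;
    }

    // Two accesses with identical variable parts are adjacent if the first one
    // ends exactly where the second one starts, and computing that end did not overflow.
    public static boolean isAdjacent(int con1, int sizeInBytes, int con2) {
        NoOverflowIntSketch end = of(con1).add(of(sizeInBytes));
        return !end.isOverflowed() && end.value() == con2;
    }

    public static void main(String[] args) {
        System.out.println(isAdjacent(16, 4, 20));
        System.out.println(isAdjacent(Integer.MAX_VALUE, 4, Integer.MIN_VALUE + 3));
    }
}
```

The second call in `main` is the interesting case: with plain wrapping `int` arithmetic the two constants would compare equal, whereas the sticky overflow flag makes the check conservatively answer "not adjacent".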
------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2449379770 From chagedorn at openjdk.org Thu Oct 31 09:20:34 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 31 Oct 2024 09:20:34 GMT Subject: RFR: 8343296: IGV: Show pre/main/post at CountedLoopNodes [v2] In-Reply-To: <1XQshU5s-sTqVK-9uKf_TNBJ8l3YKepyKj0YJZXmLF8=.fee84a26-68fc-4ed8-80e5-aad2a3c5dbef@github.com> References: <1XQshU5s-sTqVK-9uKf_TNBJ8l3YKepyKj0YJZXmLF8=.fee84a26-68fc-4ed8-80e5-aad2a3c5dbef@github.com> Message-ID: On Wed, 30 Oct 2024 15:38:57 GMT, Christian Hagedorn wrote: >> It's sometimes difficult to keep track of which counted loops are pre, main or post loops. This patch adds this property to `CountedLoops`, similarly to the other node info that we print, for example, like a method name for a `CallStaticJava`. >> >> This is achieved by emitting a new property to `CountedLoopNodes` and then extending the node info filter in IGV. This will show the pre/main/post info like the other node info below the node name: >> >> ![image](https://github.com/user-attachments/assets/69abe3f4-3e97-4dfe-983a-0153057b1a8d) >> >> Thanks to @robcasloz for helping me with the IGV filter changes! >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - add new line > - remove empty line Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21788#issuecomment-2449389100 From chagedorn at openjdk.org Thu Oct 31 09:20:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 31 Oct 2024 09:20:35 GMT Subject: Integrated: 8343296: IGV: Show pre/main/post at CountedLoopNodes In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 13:36:53 GMT, Christian Hagedorn wrote: > It's sometimes difficult to keep track of which counted loops are pre, main or post loops. 
This patch adds this property to `CountedLoops`, similarly to the other node info that we print, for example, like a method name for a `CallStaticJava`. > > This is achieved by emitting a new property to `CountedLoopNodes` and then extending the node info filter in IGV. This will show the pre/main/post info like the other node info below the node name: > > ![image](https://github.com/user-attachments/assets/69abe3f4-3e97-4dfe-983a-0153057b1a8d) > > Thanks to @robcasloz for helping me with the IGV filter changes! > > Thanks, > Christian This pull request has now been integrated. Changeset: c40bb762 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/c40bb7621c0e49581dac587b6900b6d281572813 Stats: 30 lines in 4 files changed: 30 ins; 0 del; 0 mod 8343296: IGV: Show pre/main/post at CountedLoopNodes Co-authored-by: Roberto Castañeda Lozano Reviewed-by: rcastanedalo, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21788 From thartmann at openjdk.org Thu Oct 31 10:03:29 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 31 Oct 2024 10:03:29 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v3] In-Reply-To: <4_84pZqk5-pV1iTUdpf5wmVczTdHq-9-Re1qjbGU7Eo=.0fb46e18-883f-45f8-827d-567602373431@github.com> References: <4_84pZqk5-pV1iTUdpf5wmVczTdHq-9-Re1qjbGU7Eo=.0fb46e18-883f-45f8-827d-567602373431@github.com> Message-ID: On Wed, 30 Oct 2024 12:17:39 GMT, Coleen Phillimore wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Typo2 > > src/hotspot/share/opto/compile.cpp line 3498: > >> 3496: assert(false, "Interface or abstract class pointers should not be compressed"); >> 3497: } else { >> 3498: new_in2 = ConNode::make(t->make_narrowklass()); > > When I was looking through this code, I was hoping there'd be some sort of assert in the make_narrowklass function so any caller would assert but maybe you don't have
that info? Right, I was hoping for that too and tried to move the assert into `TypeNarrowKlass::make`. We do have all the information there but we hit false positives in rare cases like this when `MyAbstract` does not have any subtypes at compile time (mostly with `-Xcomp`): MyAbstract obj = ...; obj.getClass(); C2 will add a dependency that will invalidate the code once a subclass is loaded and then optimizes the narrow class load from `obj` to be of constant narrow class type `MyAbstract`. The assert will trigger but we will never emit a compressed class pointer because the narrow class load + decode is folded to a non-narrow constant. We could move the assert to a later stage though. I'll give that a try. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21784#discussion_r1824187919 From thartmann at openjdk.org Thu Oct 31 10:10:47 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 31 Oct 2024 10:10:47 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v4] In-Reply-To: References: Message-ID: > @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. > > I think it's not an issue in current code though. 
The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: > https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 > > Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: > https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 > > And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. > > I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Moved assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21784/files - new: https://git.openjdk.org/jdk/pull/21784/files/c69022b3..3da09500 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21784&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21784&range=02-03 Stats: 15 lines in 1 file changed: 8 ins; 6 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21784.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21784/head:pull/21784 PR: https://git.openjdk.org/jdk/pull/21784 From thartmann at openjdk.org Thu Oct 31 10:10:47 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 31 Oct 2024 10:10:47 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v3] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 11:53:40 GMT, Tobias Hartmann wrote: >> @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. 
After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. >> >> I think it's not an issue in current code though. The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 >> >> Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 >> >> And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. >> >> I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Typo2 I now moved the assert to when we visit all constant narrow class pointers during final graph reshaping to ensure that we never compress an interface or abstract class pointer. Thoughts? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21784#issuecomment-2449485514 From aph at openjdk.org Thu Oct 31 10:23:38 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 31 Oct 2024 10:23:38 GMT Subject: RFR: 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v3] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 15:55:24 GMT, Amit Kumar wrote: >> Add match rules for UDivI, UModI, UDivL, UModL. And also adds `dlr` and `dlgr` instruction. >> >> Tier1 test are clean for fastdebug vm; >> >> Before this patch, `compiler/c2/TestDivModNodes.java` was failing (see jbs issue) but with this patch test is passing. 
>>
>> Without Patch:
>>
>>
>> Benchmark                                  (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score     Error  Units
>> IntegerDivMod.testDivideRemainderUnsigned           1024          mixed  avgt   15  1935.176 ±   2.191  ns/op
>> IntegerDivMod.testDivideRemainderUnsigned           1024       positive  avgt   15  1934.915 ±   3.207  ns/op
>> IntegerDivMod.testDivideRemainderUnsigned           1024       negative  avgt   15  1934.325 ±   1.108  ns/op
>> IntegerDivMod.testDivideUnsigned                    1024          mixed  avgt   15  1809.782 ±  49.341  ns/op
>> IntegerDivMod.testDivideUnsigned                    1024       positive  avgt   15  1769.326 ±   2.607  ns/op
>> IntegerDivMod.testDivideUnsigned                    1024       negative  avgt   15  1784.053 ±  71.190  ns/op
>> IntegerDivMod.testRemainderUnsigned                 1024          mixed  avgt   15  2026.978 ±   1.534  ns/op
>> IntegerDivMod.testRemainderUnsigned                 1024       positive  avgt   15  2028.039 ±   3.812  ns/op
>> IntegerDivMod.testRemainderUnsigned                 1024       negative  avgt   15  2437.843 ± 636.808  ns/op
>> Finished running test 'micro:java.lang.IntegerDivMod'
>>
>>
>> Benchmark                               (BUFFER_SIZE)  (divisorType)  Mode  Cnt     Score     Error  Units
>> LongDivMod.testDivideRemainderUnsigned           1024          mixed  avgt   15  4524.897 ±  16.566  ns/op
>> LongDivMod.testDivideRemainderUnsigned           1024       positive  avgt   15  4373.714 ±   9.514  ns/op
>> LongDivMod.testDivideRemainderUnsigned           1024       negative  avgt   15  2018.309 ±   1.788  ns/op
>> LongDivMod.testDivideUnsigned                    1024          mixed  avgt   15  4320.382 ±  19.055  ns/op
>> LongDivMod.testDivideUnsigned                    1024       positive  avgt   15  3988.953 ±   8.770  ns/op
>> LongDivMod.testDivideUnsigned                    1024       negative  avgt   15  1069.703 ±   1.525  ns/op
>> LongDivMod.testRemainderUnsigned                 1024          mixed  avgt   15  5589.319 ±   4.247  ns/op
>> LongDivMod.testRemainderUnsigned                 1024       positive  avgt   15  3904.555 ±   3.191  ns/op
>> LongDivMod.testRemainderUnsigned 10...
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
> removes extra whitespaces

Marked as reviewed by aph (Reviewer).
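
As background for the benchmark names above, the following short sketch shows what the Java library methods being intrinsified compute. It uses only the standard `Integer`/`Long` unsigned helpers; the class name `UnsignedDivSketch` is just a placeholder for this illustration and has nothing to do with the s390x match rules themselves.

```java
public class UnsignedDivSketch {
    public static void main(String[] args) {
        int allOnes = -1; // bit pattern 0xFFFFFFFF, i.e. 4294967295 when read as unsigned

        // Signed division treats the operand as -1.
        System.out.println(allOnes / 2);
        // Unsigned division treats the same bits as 4294967295.
        System.out.println(Integer.divideUnsigned(allOnes, 2));
        System.out.println(Integer.remainderUnsigned(allOnes, 2));

        // The long variants behave the same way on 64-bit patterns.
        System.out.println(Long.divideUnsigned(-1L, 2L) == Long.MAX_VALUE);
        System.out.println(Long.remainderUnsigned(-1L, 2L));
    }
}
```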
------------- PR Review: https://git.openjdk.org/jdk/pull/21559#pullrequestreview-2407574843 From fgao at openjdk.org Thu Oct 31 10:41:30 2024 From: fgao at openjdk.org (Fei Gao) Date: Thu, 31 Oct 2024 10:41:30 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> References: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> Message-ID: On Tue, 22 Oct 2024 09:28:36 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. >> This pr is based on https://github.com/openjdk/jdk/pull/20781. >> >> Thanks! >> >> ## Test >> ### tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> ### options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Tests >> jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector. It's good to have these tests in jdk main stream, I will do it in a separate pr later. (These tests are auto-generated tests from some script&template, it's good to also have those scrip&template in jdk main stream, but those scrip&template generates more other tests than what we need here, so better to add these tests and script&template in another pr). 
>> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Performance data >> I have re-tested, there is no much difference from https://github.com/openjdk/jdk/pull/18605, so please check performance data in that pr. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add comment for tanh LGTM! Thanks for update. ------------- Marked as reviewed by fgao (Committer). PR Review: https://git.openjdk.org/jdk/pull/21502#pullrequestreview-2407608861 From eosterlund at openjdk.org Thu Oct 31 11:04:35 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 31 Oct 2024 11:04:35 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v4] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 10:10:47 GMT, Tobias Hartmann wrote: >> @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. >> >> I think it's not an issue in current code though. 
The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 >> >> Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 >> >> And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. >> >> I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Moved assert This makes me feel less nervous. Thanks for adding this verification. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21784#pullrequestreview-2407650716 From eastigeevich at openjdk.org Thu Oct 31 11:56:28 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 31 Oct 2024 11:56:28 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 10:54:31 GMT, Damon Fenacci wrote: > The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are. 
https://bugs.openjdk.org/browse/JDK-8321526 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21757#issuecomment-2449669602 From chagedorn at openjdk.org Thu Oct 31 12:33:01 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 31 Oct 2024 12:33:01 GMT Subject: RFR: 8343380: C2: assert(iff->in(1)->is_OpaqueNotNull()) failed: must be OpaqueNotNull Message-ID: The assert added in [JDK-8342043](https://bugs.openjdk.org/browse/JDK-8342043) turns out to be too strong as shown with the test cases. I was unsure about that in the first place when I added it here: https://github.com/openjdk/jdk/pull/21608#discussion_r1808732859 The assert was more of a best guess and just an additional guarantee that does not provide any benefit. I've found two cases where we have once an `OuterStripMinedLoopEnd` node and once a `ParsePredicate` in `ConnectionGraph::can_reduce_check_users()` which trigger the assert. How we end up with such a graph is explained in the comments at the test cases. I don't think it's worth to tweak the assert as we simply bail out afterwards anyway. I therefore propose to simply get rid of the assert again.
Thanks, Christian ------------- Commit messages: - 8343380: C2: assert(iff->in(1)->is_OpaqueNotNull()) failed: must be OpaqueNotNull Changes: https://git.openjdk.org/jdk/pull/21805/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21805&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343380 Stats: 111 lines in 2 files changed: 108 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21805.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21805/head:pull/21805 PR: https://git.openjdk.org/jdk/pull/21805 From duke at openjdk.org Thu Oct 31 12:38:17 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Thu, 31 Oct 2024 12:38:17 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v6] In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: Improved a comment in CompilerThread.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21285/files - new: https://git.openjdk.org/jdk/pull/21285/files/e07d4448..7e0f1a42 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=04-05 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21285/head:pull/21285 PR: https://git.openjdk.org/jdk/pull/21285 From duke at openjdk.org Thu Oct 31 12:38:17 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Thu, 31 Oct 2024 12:38:17 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v5] In-Reply-To: References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> <4X5xzYDU45UnuXtoC0sEJ7dF5seNqQDJxhdNrdktsV8=.03a8e2b5-7e43-44b7-92b4-b4440dc26770@github.com> Message-ID: On Thu, 10 Oct 2024 07:29:46 GMT, Doug Simon wrote: >> Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: >> >> Simplified C2V_BLOCK. > > src/hotspot/share/compiler/compilerThread.cpp line 58: > >> 56: >> 57: void CompilerThread::set_compiler(AbstractCompiler* c) { >> 58: /* > > The comment could be a little shorter: > > /* > * Compiler threads need to make Java upcalls to the jargraal compiler. > * Java upcalls are also needed by the InterpreterRuntime when using jargraal. > */ Resolved in https://github.com/openjdk/jdk/pull/21285/commits/7e0f1a4227f388dc8e22e6200dc026f056d26eed.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21285#discussion_r1824373085 From thartmann at openjdk.org Thu Oct 31 12:51:29 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 31 Oct 2024 12:51:29 GMT Subject: RFR: 8343380: C2: assert(iff->in(1)->is_OpaqueNotNull()) failed: must be OpaqueNotNull In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 12:27:45 GMT, Christian Hagedorn wrote: > The assert added in [JDK-8342043](https://bugs.openjdk.org/browse/JDK-8342043) turns out to be too strong as shown with the test cases. I was unsure about that in the first place when I added it here: > > https://github.com/openjdk/jdk/pull/21608#discussion_r1808732859 > > The assert was more of a best guess and just an additional guarantee that does not provide any benefit. I've found two cases where we have once an `OuterStripMinedLoopEnd` node and once a `ParsePredicate` in `ConnectionGraph::can_reduce_check_users()` which trigger the assert. How we end up with such a graph is explained in the comments at the test cases. > > I don't think it's worth to tweak the assert as we simply bail out afterwards anyway. I therefore propose to simply get rid of the assert again. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21805#pullrequestreview-2407848824 From dnsimon at openjdk.org Thu Oct 31 12:52:29 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 31 Oct 2024 12:52:29 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v6] In-Reply-To: References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: On Thu, 31 Oct 2024 12:38:17 GMT, Tomáš Zezula wrote: >> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls.
This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. >> >> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). >> >> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. > > Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: > > Improved a comment in CompilerThread. Still look good to me. ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21285#pullrequestreview-2407850562 From thartmann at openjdk.org Thu Oct 31 12:55:29 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 31 Oct 2024 12:55:29 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v4] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 10:10:47 GMT, Tobias Hartmann wrote: >> @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. >> >> I think it's not an issue in current code though.
The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 >> >> Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 >> >> And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. >> >> I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Moved assert Thanks for the review Erik! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21784#issuecomment-2449773748 From chagedorn at openjdk.org Thu Oct 31 12:56:33 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 31 Oct 2024 12:56:33 GMT Subject: RFR: 8343380: C2: assert(iff->in(1)->is_OpaqueNotNull()) failed: must be OpaqueNotNull In-Reply-To: References: Message-ID: <5qoK-AZV3tMQD6XyLeJqWaepgzCx6_0wT_7Vz3aGjcM=.f58bf08f-bf23-4e75-b5e0-9b07bde9ed10@github.com> On Thu, 31 Oct 2024 12:27:45 GMT, Christian Hagedorn wrote: > The assert added in [JDK-8342043](https://bugs.openjdk.org/browse/JDK-8342043) turns out to be too strong as shown with the test cases. I was unsure about that in the first place when I added it here: > > https://github.com/openjdk/jdk/pull/21608#discussion_r1808732859 > > The assert was more of a best guess and just an additional guarantee that does not provide any benefit. I've found two cases where we have once an `OuterStripMinedLoopEnd` node and once a `ParsePredicate` in `ConnectionGraph::can_reduce_check_users()` which trigger the assert. 
How we end up with such a graph is explained in the comments at the test cases. > > I don't think it's worth to tweak the assert as we simply bail out afterwards anyway. I therefore propose to simply get rid of the assert again. > > Thanks, > Christian Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21805#issuecomment-2449776759 From eastigeevich at openjdk.org Thu Oct 31 13:18:28 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 31 Oct 2024 13:18:28 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 In-Reply-To: References: Message-ID: <0tK_3KUqMNg0R5YdFFUlxsSeYZvF57UP_U0b6wdDhG8=.084bfe90-911f-4de2-aa5f-19ed208657b4@github.com> On Thu, 31 Oct 2024 11:53:45 GMT, Evgeny Astigeevich wrote: >> # Issue >> >> The third test of `compiler/codecache/CheckLargePages.java` checks that non-segmented 1GB code cache can be allocated with 1GB large pages. >> >> On linux (the only supported platform) in order to allocate them, 1GB huge pages have to be enabled (checkable from `/proc/meminfo` and `/sys/kernel/mm/hugepages/hugepages-xxxx`) and their number has to be set to >0 (checkable from `/sys/kernel/mm/hugepages/hugepages-xxxx/nr_hugepages`). >> >> If 1GB huge pages are enabled but their number is 0, the test fails because it looks for a string that matches `CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]` but the actual output is `CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M`. This happens because the VM tries to allocate 1GB huge pages but it fails beause the number of allocatable ones is 0 and the VM reverts to smaller large page sizes (2MB). >> >> # Solution >> >> The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are. 
Nevertheless, this seems to be the correct behaviour, not least because their number can be changed dynamically. >> So, the correct thing to do seems to be to "relax" the check made by the test to include both cases: >> * when 1GB huge pages are supported and can be allocated correctly >> * when 1GB huge pages are supported but cannot be allocated correctly (because there are none available) and the VM reverts to 2MB huge pages (if there are no 2MB pages available the test doesn't run at all). > >> The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are. > > https://bugs.openjdk.org/browse/JDK-8321526 > @eastig I noticed that you are the author of the original `testNonSegmented1GbCodeCacheWith1GbLargePages` test. Could I ask you to have a look at this change? Thanks a lot! `testDefaultCodeCacheWith1GbLargePages` and `testNonSegmented1GbCodeCacheWith1GbLargePages` should only be run if a system provides 1Gb pages. This is mentioned in their names: `...With1GbLargePages`. If there are no 1Gb pages available, the test should not be run. I suggest to check `/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages >= 1`. If not, output "Skipping testDefaultCodeCacheWith1GbLargePages and testDefaultCodeCacheWith1GbLargePages, no 1Gb pages available" ------------- PR Comment: https://git.openjdk.org/jdk/pull/21757#issuecomment-2449819859 From fjiang at openjdk.org Thu Oct 31 13:32:30 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 31 Oct 2024 13:32:30 GMT Subject: RFR: 8343121: RISC-V: More accurate max size for C2SafepointPollStub and C2EntryBarrierStub In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 04:09:57 GMT, Fei Yang wrote: > Hi, please review this small change. > > The current max size of these two stubs is a bit overestimated and thus is more than needed.
> Since `la`, `far_call` and `far_jump` assembler routines used by these two stubs will always > emit 2 instructions for address inside the code cache, we can make the max size more accurate. > > Testing on linux-riscv64 platform: > - [x] tier1-tier3 (release) > - [x] hotspot:tier1 (fastdebug) Looks good, thanks! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/21732#pullrequestreview-2407981233 From fjiang at openjdk.org Thu Oct 31 13:44:31 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 31 Oct 2024 13:44:31 GMT Subject: RFR: 8343122: RISC-V: C2: Small improvement for real runtime callouts In-Reply-To: <3AbaT2SwHVxQcQRu82L8CWzKBhhAxukYOMT5Bjgjt_c=.197639f3-dc6b-46ec-9ecd-82569e7eb074@github.com> References: <3AbaT2SwHVxQcQRu82L8CWzKBhhAxukYOMT5Bjgjt_c=.197639f3-dc6b-46ec-9ecd-82569e7eb074@github.com> Message-ID: <_yjh59rF4YV80YvALJzTo-IcAWok-oR5xTMQKRG2gjs=.806551e3-73c9-433e-a886-f6c92d538e79@github.com> On Mon, 28 Oct 2024 04:39:17 GMT, Fei Yang wrote: > Hi, please review this small improvement. > > Currently, we do 11 instructions for real C2 runtime callouts (See riscv_enc_java_to_runtime). > Seems we can materialize the pointer faster with `movptr2`, which will help reduce 2 instructions. > But we will need to reorder the original calling sequence a bit to make `t0` available for `movptr2`. > > Testing on linux-riscv64 platform: > - [x] tier1-tier3 (release) > - [x] hotspot:tier1 (fastdebug) LGTM ------------- Marked as reviewed by fjiang (Committer).
PR Review: https://git.openjdk.org/jdk/pull/21733#pullrequestreview-2408020114 From jkarthikeyan at openjdk.org Thu Oct 31 14:15:29 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 31 Oct 2024 14:15:29 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v2] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 08:57:28 GMT, Emanuel Peter wrote: >> From my understanding this isn't possible as-is since CPU features seem to be checked regardless of whether the architecture supports it or not, so we can't simply check for AVX because that would fail on aarch64 and riscv64. I think we could work around this with `applyIfCPUFeatureOr = {"avx", "true", "asimd", "true", "rvv", "true"}` to force a check for all 3 platforms but it'd be filtering more platforms than strictly necessary. > > Which platforms would be filtered "more platforms than strictly necessary"? With the workaround to check for CPU features on all 3 platforms, we'd be not checking the IR when `asimd = false` or `rvv = false`, but the IR check should pass with those features too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1824549814 From epeter at openjdk.org Thu Oct 31 14:50:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 31 Oct 2024 14:50:28 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v2] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 14:12:52 GMT, Jasmine Karthikeyan wrote: >> Which platforms would be filtered "more platforms than strictly necessary"? > > With the workaround to check for CPU features on all 3 platforms, we'd be not checking the IR when `asimd = false` or `rvv = false`, but the IR check should pass with those features too. I think your tags are actually available, with `applyIfPlatform`. 
Look at `irTestingPlatforms` in https://github.com/openjdk/jdk/blame/master/test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java#L69 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1824602936 From epeter at openjdk.org Thu Oct 31 14:50:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 31 Oct 2024 14:50:29 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v2] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 14:45:22 GMT, Emanuel Peter wrote: >> With the workaround to check for CPU features on all 3 platforms, we'd be not checking the IR when `asimd = false` or `rvv = false`, but the IR check should pass with those features too. > > I think your tags are actually available, with `applyIfPlatform`. Look at `irTestingPlatforms` in https://github.com/openjdk/jdk/blame/master/test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java#L69 Example: `./test/hotspot/jtreg/compiler/loopopts/superword/TestGeneralizedReductions.java: @IR(applyIfPlatform = {"riscv64", "true"},` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1824605598 From epeter at openjdk.org Thu Oct 31 14:50:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 31 Oct 2024 14:50:29 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v2] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 14:46:53 GMT, Emanuel Peter wrote: >> I think your tags are actually available, with `applyIfPlatform`. 
Look at `irTestingPlatforms` in https://github.com/openjdk/jdk/blame/master/test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java#L69 > > Example: > `./test/hotspot/jtreg/compiler/loopopts/superword/TestGeneralizedReductions.java: @IR(applyIfPlatform = {"riscv64", "true"},` The usage of these is quite rare - usually we focus more on the CPU features, and not the platform tags ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1824606611 From jkarthikeyan at openjdk.org Thu Oct 31 15:00:30 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 31 Oct 2024 15:00:30 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v2] In-Reply-To: References: Message-ID: <4cQMEn8vNq0swD5ZNrgPzFfBj7Oo6Mv76llUFjzU0F4=.68c9ff72-535a-4df5-872a-809cc9d7773f@github.com> On Thu, 31 Oct 2024 14:47:29 GMT, Emanuel Peter wrote: >> Example: >> `./test/hotspot/jtreg/compiler/loopopts/superword/TestGeneralizedReductions.java: @IR(applyIfPlatform = {"riscv64", "true"},` > > The usage of these is quite rare - usually we focus more on the CPU features, and not the platform tags ;) Ah, I was meaning that with `applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}` we would still need at least `applyIfCPUFeature = {"avx", true"}` because on x86 we only make Min/MaxF and Min/MaxD with AVX. But that applyIfCPUFeature will check for AVX on aarch64 and riscv64 as well, but will fail because it's not available for those platforms. That's why I suggested the workaround of checking the other CPU features, to make the test at least run on the other platforms. It'd be nice to be able to express platform and CPU feature combinations like with `@requires`, but the use-case here is pretty niche. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1824625007 From epeter at openjdk.org Thu Oct 31 15:40:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 31 Oct 2024 15:40:34 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v2] In-Reply-To: <4cQMEn8vNq0swD5ZNrgPzFfBj7Oo6Mv76llUFjzU0F4=.68c9ff72-535a-4df5-872a-809cc9d7773f@github.com> References: <4cQMEn8vNq0swD5ZNrgPzFfBj7Oo6Mv76llUFjzU0F4=.68c9ff72-535a-4df5-872a-809cc9d7773f@github.com> Message-ID: On Thu, 31 Oct 2024 14:57:47 GMT, Jasmine Karthikeyan wrote: >> The usage of these is quite rare - usually we focus more on the CPU features, and not the platform tags ;) > > Ah, I was meaning that with `applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}` we would still need at least `applyIfCPUFeature = {"avx", "true"}` because on x86 we only make Min/MaxF and Min/MaxD with AVX. But that applyIfCPUFeature will check for AVX on aarch64 and riscv64 as well, but will fail because it's not available for those platforms. That's why I suggested the workaround of checking the other CPU features, to make the test at least run on the other platforms. It'd be nice to be able to express platform and CPU feature combinations like with `@requires`, but the use-case here is pretty niche. First: yes, I'm fine with just using CPU features - it will IR test it on fewer platforms than maybe desired, but that is ok, I think. I guess the issue with `@IR` is that it only allows `AND` or `OR` clauses ... and not the more complicated mix of `&` and `|` from `@requires`. But in theory, you can just have multiple `@IR` rules (that simulates the OR): - One for `x64` (applyIfPlatform) and `avx` (applyIfCPUFeature) - One IR rule for each of: `aarch64` and `riscv64` (applyIfPlatform) But again, I'm ok with only checking for CPU features... it is more simple.
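The "several `@IR` rules simulate the OR" idea above can be sketched outside the IR framework as a small standalone model. This is a hypothetical illustration only: `IrRuleSketch`, `Rule`, and `anyRuleApplies` are made-up names and not part of `compiler.lib.ir_framework`, and the platform and feature strings are simply the ones mentioned in this thread. It assumes each rule is an AND of its own constraints and that the set of rules behaves like an OR.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model, NOT the IR framework's implementation: each rule is an
// AND of its constraints, and a list of rules behaves like an OR.
public class IrRuleSketch {

    // A rule applies when the platform matches and, if a CPU feature is named,
    // the host reports that feature. A null feature means "no feature constraint".
    public record Rule(String platform, String cpuFeature) {
        public boolean applies(String hostPlatform, Set<String> hostFeatures) {
            boolean platformOk = platform.equals(hostPlatform);
            boolean featureOk = cpuFeature == null || hostFeatures.contains(cpuFeature);
            return platformOk && featureOk;
        }
    }

    // Models: (x64 AND avx) OR aarch64 OR riscv64.
    public static final List<Rule> RULES = List.of(
            new Rule("x64", "avx"),
            new Rule("aarch64", null),
            new Rule("riscv64", null));

    // The IR check would be performed if at least one rule applies to the host.
    public static boolean anyRuleApplies(String hostPlatform, Set<String> hostFeatures) {
        return RULES.stream().anyMatch(r -> r.applies(hostPlatform, hostFeatures));
    }

    public static void main(String[] args) {
        System.out.println(anyRuleApplies("x64", Set.of("avx")));
        System.out.println(anyRuleApplies("x64", Set.of("sse4.1")));
        System.out.println(anyRuleApplies("aarch64", Set.of()));
        System.out.println(anyRuleApplies("riscv64", Set.of()));
    }
}
```

In this model an aarch64 or riscv64 host is covered even when it reports no vector feature at all, which is the case the single combined CPU-feature rule would skip.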
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1824690675 From epeter at openjdk.org Thu Oct 31 15:40:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 31 Oct 2024 15:40:34 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v2] In-Reply-To: References: <4cQMEn8vNq0swD5ZNrgPzFfBj7Oo6Mv76llUFjzU0F4=.68c9ff72-535a-4df5-872a-809cc9d7773f@github.com> Message-ID: On Thu, 31 Oct 2024 15:37:03 GMT, Emanuel Peter wrote: >> Ah, I was meaning that with `applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}` we would still need at least `applyIfCPUFeature = {"avx", "true"}` because on x86 we only make Min/MaxF and Min/MaxD with AVX. But that applyIfCPUFeature will check for AVX on aarch64 and riscv64 as well, but will fail because it's not available for those platforms. That's why I suggested the workaround of checking the other CPU features, to make the test at least run on the other platforms. It'd be nice to be able to express platform and CPU feature combinations like with `@requires`, but the use-case here is pretty niche. > > First: yes, I'm fine with just using CPU features - it will IR test it on fewer platforms than maybe desired, but that is ok, I think. > > I guess the issue with `@IR` is that it only allows `AND` or `OR` clauses ... and not the more complicated mix of `&` and `|` from `@requires`. But in theory, you can just have multiple `@IR` rules (that simulates the OR): > - One for `x64` (applyIfPlatform) and `avx` (applyIfCPUFeature) > - One IR rule for each of: `aarch64` and `riscv64` (applyIfPlatform) > > But again, I'm ok with only checking for CPU features... it is more simple.
@chhagedorn you may be interested in this conversation as well ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1824691185 From duke at openjdk.org Thu Oct 31 15:54:33 2024 From: duke at openjdk.org (duke) Date: Thu, 31 Oct 2024 15:54:33 GMT Subject: RFR: 8335977: Deoptimization fails with assert "object should be reallocated already" [v3] In-Reply-To: References: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> Message-ID: On Wed, 30 Oct 2024 17:42:30 GMT, Cesar Soares Lucas wrote: >> Please, review this patch to fix an issue that may occur when serializing debug information related to reduce allocation merges. The problem happens when there are more than one JVMS in a `uncommon_trap` and a _younger_ JVMS doesn't have the RAM inputs as a local/expression/monitor but an older JVMS does. In that situation the loop at line 1173 of output.cpp will set the `is_root` property of the ObjectValue to `false` when processing the younger JVMS even though it may have been set to `true` when visiting the older JVMS. >> >> Tested on: >> - Win, Mac & Linux tier1-4 on x64 & Aarch64. >> - CTW with some thousands of jars. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > fix indentation @JohnTortugo Your change (at version 818b09e6d2c74a0b08981a5bb92a36792f8ab743) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21624#issuecomment-2450225690 From cslucas at openjdk.org Thu Oct 31 15:57:46 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 31 Oct 2024 15:57:46 GMT Subject: RFR: 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" [v2] In-Reply-To: References: Message-ID: > Please, consider this patch to fix an issue that happens when a Phi previously considered reducible become later irreducible. 
The overall situation that causes the problem is like so: > > - Consider that there are at least 2 scalar replaceable objects (Obj1 and Obj2; Obj2 is stored in a field of Obj1) when we start iterating the loop at escape.cpp:301 > > - In the first iteration of the loop the call chain starting with `adjust_scalar_replaceable_state` ends up calling `can_reduce_phi` and considering Phi1 as reducible. This Phi has only Obj1 as *SR* input. > > - In another iteration of the loop Obj2 is flagged as NSR. For instance, because we are storing Obj2 in an unknown position of an array. This will cause `found_nsr_alloc` to be set to `true`. > > After the loop finishes, the execution will go to `find_scalar_replaceable_allocs`. The code will process Obj1, because it's still scalar replaceable, but will find that this object is stored in a field of a - **now** - NSR object. Therefore, correctly, Obj1 will also be marked as NSR. When Obj1 is marked as NSR Phi1 becomes irreducible because it doesn't have any more scalar replaceable input. > > The solution I'm proposing is simply revisit the "reducibility" of the Phis when an object is marked as NSR. > > --------- > > ### Tests > > Win, Mac & Linux tier1-4 on x64 & Aarch64. 
Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/escapeAnalysis/TestReduceAllocationAndNonReduciblePhi.java Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21778/files - new: https://git.openjdk.org/jdk/pull/21778/files/0361f272..d549e0de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21778&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21778&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21778/head:pull/21778 PR: https://git.openjdk.org/jdk/pull/21778 From lmesnik at openjdk.org Thu Oct 31 16:13:53 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 31 Oct 2024 16:13:53 GMT Subject: RFR: 8343173: Remove ZGC-specific non-JVMCI test groups [v2] In-Reply-To: References: Message-ID: > JVMCI should be supported by all GCs, so the specific > hotspot_compiler_all_gcs > group is no longer needed. > > A few JVMCI tests fail with ZGC; the bug > https://bugs.openjdk.org/browse/JDK-8343233 > has been filed and the corresponding tests are problem-listed. Leonid Mesnik has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains three commits: - typo fixed - Merge branch 'master' of https://github.com/openjdk/jdk into 8343173 - 8343173: Remove ZGC-specific non-JVMCI test groups ------------- Changes: https://git.openjdk.org/jdk/pull/21774/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21774&range=01 Stats: 12 lines in 2 files changed: 8 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21774.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21774/head:pull/21774 PR: https://git.openjdk.org/jdk/pull/21774 From jkarthikeyan at openjdk.org Thu Oct 31 16:53:57 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 31 Oct 2024 16:53:57 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v3] In-Reply-To: References: Message-ID: <3A-W4pcQj_I0QNWlUU3qibf6SQbNnZyO1JxeH1ym9Lw=.d343a0a6-10f4-4a3a-89fc-06e4cef04d02@github.com> > Hi all, > This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>
>                                                        Baseline                      Patch
> Benchmark                                 Mode  Cnt      Score     Error  Units      Score    Error  Units  Improvement
> BigIntegers.testAdd                       avgt   15     25.096 ±   3.936  ns/op     19.214 ±  0.521  ns/op  (+ 26.5%)
> PatternBench.charPatternCompile           avgt    8    453.727 ± 117.265  ns/op    370.054 ± 26.106  ns/op  (+ 20.3%)
> PatternBench.charPatternMatch             avgt    8    917.604 ± 121.766  ns/op    810.560 ± 38.437  ns/op  (+ 12.3%)
> PatternBench.charPatternMatchWithCompile  avgt    8   1477.703 ± 255.783  ns/op   1224.460 ± 28.220  ns/op  (+ 18.7%)
> PatternBench.longStringGraphemeMatches    avgt    8    860.909 ± 124.661  ns/op    743.729 ± 22.877  ns/op  (+ 14.6%)
> PatternBench.splitFlags                   avgt    8    420.506 ±  76.252  ns/op    321.911 ± 11.661  ns/op  (+ 26.6%)
>
> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated! Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Add platform checks to IR - Merge branch 'master' into minmax_identities - Suggestions from review - Min/Max identities ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21439/files - new: https://git.openjdk.org/jdk/pull/21439/files/b4b96143..39f7d047 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21439&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21439&range=01-02 Stats: 292091 lines in 2825 files changed: 243748 ins; 34186 del; 14157 mod Patch: https://git.openjdk.org/jdk/pull/21439.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21439/head:pull/21439 PR: https://git.openjdk.org/jdk/pull/21439 From jkarthikeyan at openjdk.org Thu Oct 31 16:53:58 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 31 Oct 2024 16:53:58 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v2] In-Reply-To: References: <4cQMEn8vNq0swD5ZNrgPzFfBj7Oo6Mv76llUFjzU0F4=.68c9ff72-535a-4df5-872a-809cc9d7773f@github.com> Message-ID: On Thu, 31 Oct 2024 15:37:24 GMT, Emanuel Peter wrote: >> First: yes, I'm fine with just using CPU features - it will IR test it on fewer platforms than maybe desired, but that is ok, I think. >> >> I guess the issue with `@IR` is that it only allows `AND` or `OR` clauses ... and not the more complicated mix of `&` and `|` from `@requires`.
But in theory, you can just have multiple `@IR` rules (that simulates the OR): >> - One for `x64` (applyIfPlatform) and `avx` (applyIfCPUFeature) >> - One IR rule for each of: `aarch64` and `riscv64` (applyIfPlatform) >> >> But again, I'm ok with only checking for CPU features... it is more simple. > > @chhagedorn you may be interested in this conversation as well ;) I hadn't considered using multiple IR checks, that would definitely work as well. I've pushed a commit that just uses the CPU features, to keep it simple. Let me know what you think! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1824820822 From mdoerr at openjdk.org Thu Oct 31 17:08:40 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 31 Oct 2024 17:08:40 GMT Subject: RFR: 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory Message-ID: This PR adds a quick check + bail out in order to avoid excessive usage of slow checks. Especially, it avoids querying the available memory so often. 
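
The "quick check + bail out" pattern described in this PR can be sketched roughly as follows. This is a minimal Java illustration under stated assumptions, not the actual C++ change in `compileBroker.cpp`: the class `QuickCheckSketch`, its fields, the `tasksPerThread` ratio, and the `availableMemory` supplier (a stand-in for the slow memory query) are all hypothetical names chosen for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

// Hypothetical sketch: cheap lock-free test first, expensive checks only if needed.
class QuickCheckSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final AtomicInteger threadCount = new AtomicInteger(1);
    private final AtomicInteger queuedTasks = new AtomicInteger(0);
    private final int maxThreads;
    private final int tasksPerThread;
    private final long memoryPerThread;
    private final LongSupplier availableMemory; // assumed to be slow to call

    QuickCheckSketch(int maxThreads, int tasksPerThread,
                     long memoryPerThread, LongSupplier availableMemory) {
        this.maxThreads = maxThreads;
        this.tasksPerThread = tasksPerThread;
        this.memoryPerThread = memoryPerThread;
        this.availableMemory = availableMemory;
    }

    void setQueuedTasks(int n) { queuedTasks.set(n); }
    int threads() { return threadCount.get(); }

    // Number of threads the current queue length would justify.
    private int wantedThreads() {
        return Math.min(maxThreads, queuedTasks.get() / tasksPerThread);
    }

    boolean possiblyAddThread() {
        // Quick check without the lock: bail out if no extra thread is wanted.
        if (wantedThreads() <= threadCount.get()) {
            return false;
        }
        lock.lock();
        try {
            // Re-check under the lock, since the counts may have changed.
            if (wantedThreads() <= threadCount.get()) {
                return false;
            }
            // Only now pay for the expensive memory query.
            if (availableMemory.getAsLong() < memoryPerThread) {
                return false;
            }
            threadCount.incrementAndGet();
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        QuickCheckSketch s = new QuickCheckSketch(4, 2, 100L, () -> 1000L);
        s.setQueuedTasks(1);
        System.out.println("short queue, added: " + s.possiblyAddThread());
        s.setQueuedTasks(10);
        System.out.println("long queue, added: " + s.possiblyAddThread());
    }
}
```

The point of the ordering is that the common case (a short queue, nothing to do) returns before the lock is taken and before the memory query runs, so frequent polling stays cheap.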
------------- Commit messages: - 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory Changes: https://git.openjdk.org/jdk/pull/21812/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21812&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343205 Stats: 20 lines in 1 file changed: 14 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21812/head:pull/21812 PR: https://git.openjdk.org/jdk/pull/21812 From kvn at openjdk.org Thu Oct 31 17:14:37 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 31 Oct 2024 17:14:37 GMT Subject: RFR: 8335977: Deoptimization fails with assert "object should be reallocated already" [v3] In-Reply-To: References: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> Message-ID: On Wed, 30 Oct 2024 17:42:30 GMT, Cesar Soares Lucas wrote: >> Please review this patch to fix an issue that may occur when serializing debug information related to reduce allocation merges (RAM). The problem happens when there is more than one JVMS in an `uncommon_trap` and a _younger_ JVMS doesn't have the RAM inputs as a local/expression/monitor but an older JVMS does. In that situation, the loop at line 1173 of output.cpp will set the `is_root` property of the ObjectValue to `false` when processing the younger JVMS even though it may have been set to `true` when visiting the older JVMS. >> >> Tested on: >> - Win, Mac & Linux tier1-4 on x64 & Aarch64. >> - CTW with some thousands of jars. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > fix indentation Marked as reviewed by kvn (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/21624#pullrequestreview-2408586719 From cslucas at openjdk.org Thu Oct 31 17:14:38 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 31 Oct 2024 17:14:38 GMT Subject: Integrated: 8335977: Deoptimization fails with assert "object should be reallocated already" In-Reply-To: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> References: <21tDaaQyPOTLo3_G4I7Iw2AZ6R5pBXZAqhvXDq_bBIQ=.a90d1e0f-f418-4ea5-b62b-339bfe6808fb@github.com> Message-ID: <-R-iudoLBHue6NjznybfpAJJ7P4P0nPtN62HsIJRkHg=.c20cacee-9148-4bcc-91fc-bedb6c5b584e@github.com> On Mon, 21 Oct 2024 20:27:10 GMT, Cesar Soares Lucas wrote: > Please review this patch to fix an issue that may occur when serializing debug information related to reduce allocation merges (RAM). The problem happens when there is more than one JVMS in an `uncommon_trap` and a _younger_ JVMS doesn't have the RAM inputs as a local/expression/monitor but an older JVMS does. In that situation, the loop at line 1173 of output.cpp will set the `is_root` property of the ObjectValue to `false` when processing the younger JVMS even though it may have been set to `true` when visiting the older JVMS. > > Tested on: > - Win, Mac & Linux tier1-4 on x64 & Aarch64. > - CTW with some thousands of jars. This pull request has now been integrated.
Changeset: 7d8bd21e Author: Cesar Soares Lucas URL: https://git.openjdk.org/jdk/commit/7d8bd21eb0187647ec574abf4fac4f99c435c60b Stats: 106 lines in 2 files changed: 99 ins; 0 del; 7 mod 8335977: Deoptimization fails with assert "object should be reallocated already" Co-authored-by: Christian Hagedorn Reviewed-by: thartmann, kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/21624 From kvn at openjdk.org Thu Oct 31 17:30:30 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 31 Oct 2024 17:30:30 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v4] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 10:10:47 GMT, Tobias Hartmann wrote: >> @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. >> >> I think it's not an issue in current code though. The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 >> >> Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 >> >> And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. >> >> I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. 
>> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Moved assert Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21784#pullrequestreview-2408633560 From kvn at openjdk.org Thu Oct 31 17:31:31 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 31 Oct 2024 17:31:31 GMT Subject: RFR: 8343380: C2: assert(iff->in(1)->is_OpaqueNotNull()) failed: must be OpaqueNotNull In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 12:27:45 GMT, Christian Hagedorn wrote: > The assert added in [JDK-8342043](https://bugs.openjdk.org/browse/JDK-8342043) turns out to be too strong as shown with the test cases. I was unsure about that in the first place when I added it here: > > https://github.com/openjdk/jdk/pull/21608#discussion_r1808732859 > > The assert was more of a best guess and just an additional guarantee that does not provide any benefit. I've found two cases where we have once an `OuterStripMinedLoopEnd` node and once a `ParsePredicate` in `ConnectionGraph::can_reduce_check_users()` which trigger the assert. How we end up with such a graph is explained in the comments at the test cases. > > I don't think it's worth to tweak the assert as we simply bail out afterwards anyway. I therefore propose to simply get rid of the assert again. > > Thanks, > Christian Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21805#pullrequestreview-2408636502 From kvn at openjdk.org Thu Oct 31 17:49:32 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 31 Oct 2024 17:49:32 GMT Subject: RFR: 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 17:03:33 GMT, Martin Doerr wrote: > This PR adds a quick check + bail out in order to avoid excessive usage of slow checks. 
Especially, it avoids querying the available memory so often. src/hotspot/share/compiler/compileBroker.cpp line 1027: > 1025: > 1026: int old_c2_count = 0, new_c2_count = 0, old_c1_count = 0, new_c1_count = 0; > 1027: const int c2_tasks_per_thread = 2, c1_tasks_per_thread = 4; Is there a reason for these particular numbers (2 and 4)? Were any experiments done to select the best numbers? src/hotspot/share/compiler/compileBroker.cpp line 1029: > 1027: const int c2_tasks_per_thread = 2, c1_tasks_per_thread = 4; > 1028: > 1029: // Do a quick check first without taking the lock. The later ones are more expensive. Please expand the comment to explain which check is done here. src/hotspot/share/compiler/compileBroker.cpp line 1031: > 1029: // Do a quick check first without taking the lock. The later ones are more expensive. > 1030: if (_c2_compile_queue != nullptr) { > 1031: old_c2_count = _compilers[1]->num_compiler_threads(); Can you use the accessors `get_c1_thread_count()` and `get_c2_thread_count()` here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21812#discussion_r1824907203 PR Review Comment: https://git.openjdk.org/jdk/pull/21812#discussion_r1824900534 PR Review Comment: https://git.openjdk.org/jdk/pull/21812#discussion_r1824899251 From coleenp at openjdk.org Thu Oct 31 18:36:29 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 31 Oct 2024 18:36:29 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v4] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 10:10:47 GMT, Tobias Hartmann wrote: >> @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. >> >> I think it's not an issue in current code though.
The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 >> >> Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 >> >> And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. >> >> I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Moved assert src/hotspot/share/opto/compile.cpp line 3789: > 3787: const TypePtr* tp = n->as_Type()->type()->make_ptr(); > 3788: ciKlass* klass = tp->is_klassptr()->exact_klass(); > 3789: assert(!klass->is_interface() && !klass->is_abstract(), "Interface or abstract class pointers should not be compressed"); Can you make this assert be instead: #include "oops/compressedKlass.hpp" ... assert(CompressedKlassPointers::is_encodable(klass), "should be encodable"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21784#discussion_r1825005254 From coleenp at openjdk.org Thu Oct 31 18:41:34 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 31 Oct 2024 18:41:34 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v4] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 10:10:47 GMT, Tobias Hartmann wrote: >> @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. 
After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. >> >> I think it's not an issue in current code though. The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 >> >> Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 >> >> And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. >> >> I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Moved assert Can you change the assert to check that the Klass is in the encodable range, if it's not too hard to get to it from ciKlass? We were trying not to expose that these were the cases that didn't result in encoding. ------------- Changes requested by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21784#pullrequestreview-2408831402 From cslucas at openjdk.org Thu Oct 31 21:53:45 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 31 Oct 2024 21:53:45 GMT Subject: RFR: 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" [v3] In-Reply-To: References: Message-ID: > Please consider this patch to fix an issue that happens when a Phi previously considered reducible later becomes irreducible.
The overall situation that causes the problem is as follows: > > - Consider that there are at least 2 scalar replaceable objects (Obj1 and Obj2; Obj2 is stored in a field of Obj1) when we start iterating the loop at escape.cpp:301 > > - In the first iteration of the loop, the call chain starting with `adjust_scalar_replaceable_state` ends up calling `can_reduce_phi` and considering Phi1 as reducible. This Phi has only Obj1 as *SR* input. > > - In another iteration of the loop, Obj2 is flagged as NSR (not scalar replaceable), for instance because we are storing Obj2 at an unknown position of an array. This will cause `found_nsr_alloc` to be set to `true`. > > After the loop finishes, the execution will go to `find_scalar_replaceable_allocs`. The code will process Obj1, because it's still scalar replaceable, but will find that this object is stored in a field of a - **now** - NSR object. Therefore, correctly, Obj1 will also be marked as NSR. When Obj1 is marked as NSR, Phi1 becomes irreducible because it no longer has any scalar replaceable input. > > The solution I'm proposing is simply to revisit the "reducibility" of the Phis when an object is marked as NSR. > > --------- > > ### Tests > > Win, Mac & Linux tier1-4 on x64 & Aarch64. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Address PR feedback: include test execution options.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21778/files - new: https://git.openjdk.org/jdk/pull/21778/files/d549e0de..2449e42c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21778&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21778&range=01-02 Stats: 11 lines in 1 file changed: 10 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21778/head:pull/21778 PR: https://git.openjdk.org/jdk/pull/21778 From cslucas at openjdk.org Thu Oct 31 21:53:46 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 31 Oct 2024 21:53:46 GMT Subject: RFR: 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" [v3] In-Reply-To: References: Message-ID: <1T9CpHOAA4aeqcF8ZmZx4964Z-KMc5noaCCW8tT8NDs=.173cf905-4754-4e05-99ff-b08d5b43dc54@github.com> On Thu, 31 Oct 2024 07:23:28 GMT, Tobias Hartmann wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Address PR feedback: include test execution options. > > test/hotspot/jtreg/compiler/escapeAnalysis/TestReduceAllocationAndNonReduciblePhi.java line 31: > >> 29: * its SR inputs is flagged as NSR. >> 30: * @run main/othervm compiler.escapeAnalysis.TestReduceAllocationAndNonReduciblePhi >> 31: * @run main compiler.escapeAnalysis.TestReduceAllocationAndNonReduciblePhi > > Why do you need both? Should `@run main/othervm` add `-Xbatch`? In my experiments the issue was reproducing without any flag. Following your advice I included -Xbatch and the usual CompileCommands as well. I also kept the second "@run" command to run the method also in a no-custom-flags mode. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21778#discussion_r1825207077 From sparasa at openjdk.org Thu Oct 31 23:32:44 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 31 Oct 2024 23:32:44 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v2] In-Reply-To: References: Message-ID: <6WsHoJBMzT_7J5Cq9QfSoACxr9gUTw2Aycgiu_OPcO0=.fff3e702-2b9d-4ed5-a4d5-7515f7b9e44a@github.com> > The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) > > The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21770/files - new: https://git.openjdk.org/jdk/pull/21770/files/963bdc08..9d8f4193 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=00-01 Stats: 41 lines in 1 file changed: 40 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21770/head:pull/21770 PR: https://git.openjdk.org/jdk/pull/21770 From sparasa at openjdk.org Thu Oct 31 23:46:42 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 31 Oct 2024 23:46:42 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v3] In-Reply-To: References: Message-ID: > The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in 
[JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) > > The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: add missing comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21770/files - new: https://git.openjdk.org/jdk/pull/21770/files/9d8f4193..5049d3aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=01-02 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21770/head:pull/21770 PR: https://git.openjdk.org/jdk/pull/21770