From kxu at openjdk.org Tue Oct 1 02:12:18 2024
From: kxu at openjdk.org (Kangcheng Xu)
Date: Tue, 1 Oct 2024 02:12:18 GMT
Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v7]
In-Reply-To:
References:
Message-ID:

> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`.
>
> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible).
>
> Some notable examples of this transformation include:
> - `a + a + a` => `a*3` => `(a<<1) + a`
> - `a + a + a + a` => `a*4` => `a<<2`
> - `a*3 + a` => `a*4` => `a<<2`
> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4` => `a<<2`
>
> See included IR unit tests for more.
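For readers who want to convince themselves that the rewrites listed in the description preserve results, including under overflow, here is a small Java sketch. It is not part of the pull request; the class and method names (`SerialAdditions`, `addRepeatedly`, `identitiesHold`) are made up for illustration. It relies only on the fact that Java `int` addition, multiplication and left shift all wrap modulo 2^32.

```java
final class SerialAdditions {
    /** Computes a + a + ... + a with n terms, using wrapping int addition. */
    static int addRepeatedly(int a, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += a;
        }
        return sum;
    }

    /** Returns true if every rewrite listed in the description agrees for this value of a. */
    static boolean identitiesHold(int a) {
        // a + a + a => a*3 => (a<<1) + a
        boolean three = (a + a + a) == a * 3 && a * 3 == (a << 1) + a;
        // a + a + a + a => a*4 => a<<2
        boolean four = (a + a + a + a) == a * 4 && a * 4 == (a << 2);
        // a*3 + a => a*4 => a<<2
        boolean mulPlusOne = (a * 3 + a) == (a << 2);
        // (a << 1) + a + a => a<<2
        boolean shiftPlusTwo = ((a << 1) + a + a) == (a << 2);
        return three && four && mulPlusOne && shiftPlusTwo;
    }

    public static void main(String[] args) {
        // Hypothetical sample values, chosen to include overflow cases.
        int[] samples = {0, 1, -1, 7, 123_456_789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            System.out.println(a + " -> " + identitiesHold(a));
        }
    }
}
```

The check only demonstrates that the source and target expressions are arithmetically equal; it says nothing about how C2 actually matches or rewrites the nodes.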
Kangcheng Xu has updated the pull request incrementally with three additional commits since the last revision: - extract pattern matching to separate functions - WIP: extract pattern matching to separate functions - WIP: refactor as suggested by review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20754/files - new: https://git.openjdk.org/jdk/pull/20754/files/0de4feea..6e65e13f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=05-06 Stats: 171 lines in 2 files changed: 44 ins; 54 del; 73 mod Patch: https://git.openjdk.org/jdk/pull/20754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20754/head:pull/20754 PR: https://git.openjdk.org/jdk/pull/20754 From kxu at openjdk.org Tue Oct 1 02:12:21 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 1 Oct 2024 02:12:21 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v6] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 13:33:52 GMT, Roland Westrelin wrote: >> Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 33 commits: >> >> - resolve conflicts >> - resolve conflicts >> - Arithmetic canonicalization v3 (#3) >> >> * 8340144: C1: remove unused Compilation::_max_spills >> >> Reviewed-by: thartmann, shade >> >> * 8340176: Replace usage of -noclassgc with -Xnoclassgc in test/jdk/java/lang/management/MemoryMXBean/LowMemoryTest2.java >> >> Reviewed-by: kevinw, lmesnik >> >> * 8339793: Fix incorrect APX feature enabling with -XX:-UseAPX >> >> Reviewed-by: kvn, thartmann, sviswanathan >> >> * 8340184: Bug in CompressedKlassPointers::is_in_encodable_range >> >> Reviewed-by: coleenp, rkennke, jsjolen >> >> * 8332442: C2: refactor Mod cases in Compile::final_graph_reshaping_main_switch() >> >> Reviewed-by: roland, chagedorn, jkarthikeyan >> >> * 8340119: Remove oopDesc::size_might_change() >> >> Reviewed-by: stefank, iwalulya >> >> * 8340009: Improve the output from assert_different_registers >> >> Reviewed-by: aboldtch, dholmes, shade, mli >> >> * 8340273: Remove CounterHalfLifeTime >> >> Reviewed-by: chagedorn, dholmes >> >> * 8338566: Lazy creation of exception instances is not thread safe >> >> Reviewed-by: shade, kvn, dlong >> >> * 8339648: ZGC: Division by zero in rule_major_allocation_rate >> >> Reviewed-by: aboldtch, lucy, tschatzl >> >> * 8329816: Add SLEEF version 3.6.1 >> >> Reviewed-by: erikj, mli, luhenry >> >> * 8315273: (fs) Path.toRealPath(LinkOption.NOFOLLOW_LINKS) fails when "../../" follows a link (win) >> >> Reviewed-by: djelinski >> >> * 8339574: Behavior of File.is{Directory,File,Hidden} is not documented with respect to symlinks >> >> Reviewed-by: djelinski, alanb >> >> * 8340200: Misspelled constant `AttributesProcessingOption.DROP_UNSTABLE_ATRIBUTES` >> >> Reviewed-by: liach >> >> * 8339934: Simplify Math.scalb(double) method >> >> Reviewed-by: darcy >> >> * 8339790: Support Intel APX setzucc instruction >> >> Reviewed-by: sviswanathan, jkarthikeyan, kvn >> >> * 8340323: Test jdk/classfile/OptionsTest.java fails after 
JDK-8340200 >> >> Reviewed-by: alanb >> >> * 8338686: App classpath mismatch if a jar from the Class-Path attribute is on the classpath >> >> Reviewed-by: dholmes, iklam >> >> * 8337563: NMT: rename MEMFLAGS to MemTag >> >> ... > > src/hotspot/share/opto/addnode.cpp line 422: > >> 420: // Convert (a + a) + a to 3 * a >> 421: // Look for LHS pattern: AddNode(a, a) >> 422: if (in1_op == Op_Add(bt) && in1->in(1) == in1->in(2)) { > > It seems each of the if blocks in this method could be its own method that returns true and `multiplier` (passed by reference, I suppose) if pattern matching succeeds. Refactored to do so. Thanks for the input! > src/hotspot/share/opto/addnode.cpp line 487: > >> 485: // AddNode(LShiftNode(a, CON1), LShiftNode(a, CON2)/a) >> 486: // AddNode(LShiftNode(a, CON1)/a, LShiftNode(a, CON2)) >> 487: for (int i = 0; i < 2; i++) { > > I wouldn't use a loop here. I would put the loop body into its own method and call it twice, once with `lhs`, `lhs_base` as arguments, once with `rhs`, `rhs_base`. I refactored even further to combine checking for optimized `mul`s and extracting multipliers to use the same logic. This code is now obsolete. > src/hotspot/share/opto/addnode.cpp line 540: > >> 538: >> 539: PhaseIterGVN* igvn = phase->is_IterGVN(); >> 540: if (igvn != nullptr) { > > Why do you need that? > I think it's fine to return a new node from Ideal. You are right. This is leftover code from last version. Removed. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1782026217
PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1782026033
PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1782025208

From jbhateja at openjdk.org Tue Oct 1 05:09:25 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 1 Oct 2024 05:09:25 GMT
Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v19]
In-Reply-To:
References:
Message-ID:

> Hi All,
>
> As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new vector operators.
>
>
>     . SUADD : Saturating unsigned addition.
>     . SADD  : Saturating signed addition.
>     . SUSUB : Saturating unsigned subtraction.
>     . SSUB  : Saturating signed subtraction.
>     . UMAX  : Unsigned max.
>     . UMIN  : Unsigned min.
>
>
> The new vector operators are applicable only to integral types, since their values wrap around in overflowing/underflowing scenarios after setting appropriate status flags. For floating-point types, as per the IEEE 754 specification, there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an infinite value.
>
> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios.
>
> Summary of changes:
> - Java side implementation of new vector operators.
> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them.
> - C2 compiler IR and inline expander changes.
> - Optimized x86 backend implementation for new vector operators and their predicated counterparts.
> - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Merge stashing and re-commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/28b29bc6..952920ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=17-18 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From roland at openjdk.org Tue Oct 1 07:23:42 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Oct 2024 07:23:42 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 02:12:18 GMT, Kangcheng Xu wrote: >> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. >> >> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). 
This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible).
>>
>> Some notable examples of this transformation include:
>> - `a + a + a` => `a*3` => `(a<<1) + a`
>> - `a + a + a + a` => `a*4` => `a<<2`
>> - `a*3 + a` => `a*4` => `a<<2`
>> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4` => `a<<2`
>>
>> See included IR unit tests for more.
>
> Kangcheng Xu has updated the pull request incrementally with three additional commits since the last revision:
>
> - extract pattern matching to separate functions
> - WIP: extract pattern matching to separate functions
> - WIP: refactor as suggested by review

Thanks for making the changes. It's easier to follow the various steps the way it is now.

src/hotspot/share/opto/addnode.cpp line 409:

> 407: // Convert a + a + ... + a into a*n
> 408: Node* AddNode::convert_serial_additions(PhaseGVN* phase, bool can_reshape, BasicType bt) {
> 409: if (find_power_of_two_addition_pattern(this, bt, nullptr) != nullptr) {

Can you add a comment that explains the need for this (essentially what you replied in the PR comment)?

src/hotspot/share/opto/addnode.cpp line 498:

> 496:
> 497: // swap LShiftNode to lhs for easier matching
> 498: if (!lhs->is_LShift()) {

Can you use `Op_LShift(bt)` here?

src/hotspot/share/opto/addnode.cpp line 503:

> 501:
> 502: // AddNode(LShiftNode(a, CON), *)?
> 503: if (!lhs->is_LShift() || !lhs->in(2)->is_Con()) {

Same here.

src/hotspot/share/opto/addnode.cpp line 527:

> 525:
> 526: // AddNode(LShiftNode(a, CON), LShiftNode(a, CON2))?
> 527: if (rhs->is_LShift() && lhs->in(1) == rhs->in(1) && rhs->in(2)->is_Con()) {

Same here.
src/hotspot/share/opto/addnode.cpp line 549: > 547: Node* AddNode::find_power_of_two_subtraction_pattern(Node* n, BasicType bt, jlong* multiplier) { > 548: // Look for pattern: SubNode(LShiftNode(a, CON), a) > 549: if (n->Opcode() == Op_Sub(bt) && n->in(1)->is_LShift() && n->in(1)->in(2)->is_Con()) { same here. ------------- PR Review: https://git.openjdk.org/jdk/pull/20754#pullrequestreview-2339315520 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1782238602 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1782239220 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1782239478 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1782239740 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1782239936 From dnsimon at openjdk.org Tue Oct 1 08:03:47 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 1 Oct 2024 08:03:47 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 06:05:15 GMT, Doug Simon wrote: >> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in `-Xcomp` or `-Xbatch` mode. >> >> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). >> >> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. 
> > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava Closing this so @tzezula can open a new one for the same issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21171#issuecomment-2385061953 From dnsimon at openjdk.org Tue Oct 1 08:03:48 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 1 Oct 2024 08:03:48 GMT Subject: Withdrawn: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 22:48:00 GMT, Doug Simon wrote: > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in `-Xcomp` or `-Xbatch` mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/21171 From duke at openjdk.org Tue Oct 1 08:05:42 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Tue, 1 Oct 2024 08:05:42 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v2] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 14:47:47 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request incrementally with two additional commits since the last revision: > > - num_8b_elems_in_vec --> nof_vec_elems > - Removed checks for (MaxVectorSize >= 16) per @RealFYang suggestion. . ------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2385065561 From duke at openjdk.org Tue Oct 1 08:40:14 2024 From: duke at openjdk.org (Raphael Mosaner) Date: Tue, 1 Oct 2024 08:40:14 GMT Subject: RFR: 8337493: [JVMCI] Number of libgraal threads might be too low Message-ID: The `-XX:JVMCINativeLibraryThreadFraction` flag defines the ratio between JVMCI threads and C1 threads. With a default value of 0.33 the number of JVMCI threads is significantly smaller than the number of C2 threads would be. This can lead to unexpected warmup behavior with `-XX:+UseJVMCICompiler`. This PR changes the default value of `-XX:JVMCINativeLibraryThreadFraction` to yield the same number of JVMCI threads as C2 threads. ------------- Commit messages: - Use the same number of JVMCI threads as C2 threads per default. 
Changes: https://git.openjdk.org/jdk/pull/21279/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21279&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337493 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21279.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21279/head:pull/21279 PR: https://git.openjdk.org/jdk/pull/21279 From dnsimon at openjdk.org Tue Oct 1 08:47:39 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 1 Oct 2024 08:47:39 GMT Subject: RFR: 8337493: [JVMCI] Number of libgraal threads might be too low In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 08:33:49 GMT, Raphael Mosaner wrote: > The `-XX:JVMCINativeLibraryThreadFraction` flag defines the ratio between JVMCI threads and C1 threads. > With a default value of 0.33 the number of JVMCI threads is significantly smaller than the number of C2 threads would be. > > This can lead to unexpected warmup behavior with `-XX:+UseJVMCICompiler`. This PR changes the default value of `-XX:JVMCINativeLibraryThreadFraction` to yield the same number of JVMCI threads as C2 threads. LGTM. ------------- Marked as reviewed by dnsimon (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21279#pullrequestreview-2339514983 From roland at openjdk.org Tue Oct 1 09:42:37 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Oct 2024 09:42:37 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert In-Reply-To: <85C55cSsG9xUcs6GV3nCiq-idMJhEOywFN3atCFOO78=.53e95ff8-e9f4-4c86-a97d-5bff2cb009d3@github.com> References: <85C55cSsG9xUcs6GV3nCiq-idMJhEOywFN3atCFOO78=.53e95ff8-e9f4-4c86-a97d-5bff2cb009d3@github.com> Message-ID: <4cGhoDWjhLY3K9PY5CydEPh0mdwDn6EPVMAWQWU4U3M=.60c7cf50-32ca-4bf4-8d53-13c1ae5dabac@github.com> On Mon, 30 Sep 2024 07:02:10 GMT, Tobias Hartmann wrote: >> When converting a `LongCountedLoop` into a loop nest, c2 needs jvm >> state to add predicates to the inner loop. For that, it peels an >> iteration of the loop and uses the state of the safepoint at the end >> of the loop. That's only legal if there's no side effect between the >> safepoint and the backedge that goes back into the loop. The assert >> failure here happens in code that checks that. >> >> That code compares the memory states at the safepoint and at the >> backedge. If they are the same then there's no side effect. To check >> consistency, the `MergeMem` at the safepoint is cloned. As the logic >> iterates over the backedge state, it clears every component of the >> state it encounters from the `MergeMem`. Once done, the cloned >> `MergeMem` should be "empty". In the case of this failure, no side >> effect is found but the cloned `MergeMem` is not empty. That happens >> because of EA: it adds edges to the `MergeMem` at the safepoint that >> it doesn't add to the backedge `Phis`. >> >> So it's the verification code that fails and I propose dealing with >> this by ignoring memory state added by EA in the verification code. 
> > src/hotspot/share/opto/loopnode.cpp line 708: > >> 706: for (MergeMemStream mms(mem->as_MergeMem()); mms.next_non_empty(); ) { >> 707: // Loop invariant memory state won't be reset by no_side_effect_since_safepoint(). Do it here. >> 708: // Escape Analysis can add state to mm that it doesn't add to the backedge memory Phis, breaking verification > > Where exactly does that happen in EA? When an allocation is non escaping and made scalar replaceable, new slices are allocated for the fields of the allocation and the memory graph is updated so allocation/stores/loads to the new slices are connected together. In the process, `MergeMem` nodes need to be updated as well. In this case, I'm not sure this particular `MergeMem` node needs to be updated by EA but it's harmless in any case. The verification code doesn't expect "more" state to be recorded at the safepoint because of the `MergeMem` than at the backedge. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21009#discussion_r1782460709 From jbhateja at openjdk.org Tue Oct 1 09:51:27 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 1 Oct 2024 09:51:27 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v14] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. 
The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values held in the index vector lanes must lie within the valid two-vector index range [0, 2*VLEN), else an IndexOutOfBoundsException is thrown.
>
> Summary of changes:
> - Java side implementation of new selectFrom API.
> - C2 compiler IR and inline expander changes.
> - In absence of a direct two-vector permutation instruction in the target ISA, a lowering transformation dismantles the new IR into constituent IR supported by target platforms.
> - Optimized x86 backend implementation for AVX512 and legacy targets.
> - Functional tests covering the new API.
>
> The JMH micro included with this patch shows around 10-15x gain over the existing rearrange API:
> Test System: Intel(R) Xeon(R) Platinum 8480+ [Sapphire Rapids Server]
>
>
> Benchmark                                     (size)   Mode  Cnt      Score   Error   Units
> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762          ops/ms
> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550          ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605          ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004          ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758          ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192          ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459          ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556          ops/ms
> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830          ops/ms
> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174          ops/ms
> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804          ops/ms
> SelectFromBenchmark.selectFromIntVector         2048  thrpt    2   5398.2...

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Review comments resolutions.
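The two-vector selection semantics quoted above can be modelled on plain arrays. The sketch below is a scalar reference model written for illustration only; it does not use `jdk.incubator.vector`, and the class name `TwoVectorSelect` and its array-based signature are assumptions, not the API proposed in the pull request. It follows the description as quoted in this message (out-of-range indexes throw); the review discussion elsewhere in the thread considers wrapping indexes instead, so the final behavior may differ.

```java
import java.util.Arrays;

final class TwoVectorSelect {
    /**
     * Scalar model of the described semantics: v1 and v2 together form a
     * table of 2*VLEN entries, and each lane of the result is looked up by
     * the corresponding lane of the index array.
     */
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = v1.length;
        if (v2.length != vlen || indexes.length != vlen) {
            throw new IllegalArgumentException("all three arrays must have the same length");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = indexes[i];
            // Valid two-vector index range is [0, 2*VLEN).
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException(
                        "index " + idx + " outside [0, " + (2 * vlen) + ")");
            }
            // Indexes below VLEN pick from v1, the rest pick from v2.
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] indexes = {0, 5, 2, 7};
        // Lanes 0 and 2 come from v1, lanes 1 and 3 from v2.
        System.out.println(Arrays.toString(selectFrom(indexes, v1, v2)));
    }
}
```

With the sample data in `main`, the model prints `[10, 21, 12, 23]`.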
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/42ca80c5..7327736f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=12-13 Stats: 126 lines in 4 files changed: 60 ins; 65 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Tue Oct 1 09:55:39 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 1 Oct 2024 09:55:39 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Mon, 30 Sep 2024 22:39:09 GMT, Sandhya Viswanathan wrote: >> I think you have to do the masking before conversion - `vec.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle()` is not the same as `vec.toShuffle()` for all inputs. >> >> >> jshell> IntVector indexes = IntVector.fromArray(IntVector.SPECIES_256, new int[] {0, 1, 8, 9, 16, 17, 24, 25}, 0); >> indexes ==> [0, 1, 8, 9, 16, 17, 24, 25] >> >> jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1) >> $19 ==> [0, 1, 8, 9, 0, 1, 8, 9] >> >> jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1).toShuffle() >> $20 ==> Shuffle[0, 1, -8, -7, 0, 1, -8, -7] >> >> jshell> indexes.toShuffle() >> $21 ==> Shuffle[0, 1, -8, -7, -8, -7, -8, -7] > > Thanks for the example. Yes, you have a point there. 
So we would need to do:
> src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2);

> This could instead be:
src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2);
Or even simplified to:
src1.rearrange(this.toShuffle(), src2);

Yes, this may save the additional allocation penalty of the result array, which may slightly improve fallback performance, but a logical operation cannot be directly applied to floating-point vectors. So we would need an explicit conversion to an integral vector, which is why I opted for the current fallback implementation, which is in line with the rest of the code.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1782480053

From duke at openjdk.org Tue Oct 1 11:02:56 2024
From: duke at openjdk.org (Tomáš Zezula)
Date: Tue, 1 Oct 2024 11:02:56 GMT
Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread
Message-ID: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com>

[JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode.

However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java).

This PR provides the escape hatch such that Truffle-specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic.

-------------

Commit messages:
 - Using tristate CompilerThread::_can_call_java.
 - rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava
 - added CompilerThreadCanCallJavaScope

Changes: https://git.openjdk.org/jdk/pull/21285/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8340733
  Stats: 160 lines in 8 files changed: 134 ins; 2 del; 24 mod
  Patch: https://git.openjdk.org/jdk/pull/21285.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21285/head:pull/21285

PR: https://git.openjdk.org/jdk/pull/21285

From duke at openjdk.org Tue Oct 1 11:14:33 2024
From: duke at openjdk.org (duke)
Date: Tue, 1 Oct 2024 11:14:33 GMT
Subject: RFR: 8337493: [JVMCI] Number of libgraal threads might be too low
In-Reply-To:
References:
Message-ID:

On Tue, 1 Oct 2024 08:33:49 GMT, Raphael Mosaner wrote:

> The `-XX:JVMCINativeLibraryThreadFraction` flag defines the ratio between JVMCI threads and C1 threads.
> With a default value of 0.33 the number of JVMCI threads is significantly smaller than the number of C2 threads would be.
>
> This can lead to unexpected warmup behavior with `-XX:+UseJVMCICompiler`. This PR changes the default value of `-XX:JVMCINativeLibraryThreadFraction` to yield the same number of JVMCI threads as C2 threads.

@rmosaner
Your change (at version 9e0a318831b5df4137104438626f22bb508cbc42) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21279#issuecomment-2385496949

From rcastanedalo at openjdk.org Tue Oct 1 11:24:52 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 1 Oct 2024 11:24:52 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27]
In-Reply-To:
References:
Message-ID:

On Mon, 30 Sep 2024 16:56:30 GMT, Vladimir Kozlov wrote:

> Good.

Thanks, Vladimir!
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2385515540 From duke at openjdk.org Tue Oct 1 11:48:39 2024 From: duke at openjdk.org (Raphael Mosaner) Date: Tue, 1 Oct 2024 11:48:39 GMT Subject: Integrated: 8337493: [JVMCI] Number of libgraal threads might be too low In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 08:33:49 GMT, Raphael Mosaner wrote: > The `-XX:JVMCINativeLibraryThreadFraction` flag defines the ratio between JVMCI threads and C1 threads. > With a default value of 0.33 the number of JVMCI threads is significantly smaller than the number of C2 threads would be. > > This can lead to unexpected warmup behavior with `-XX:+UseJVMCICompiler`. This PR changes the default value of `-XX:JVMCINativeLibraryThreadFraction` to yield the same number of JVMCI threads as C2 threads. This pull request has now been integrated. Changeset: 7cc7c080 Author: Raphael Mosaner Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/7cc7c080b5dbab61914512bf63227944697c0cbe Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod 8337493: [JVMCI] Number of libgraal threads might be too low Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/21279 From roland at openjdk.org Tue Oct 1 13:22:23 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Oct 2024 13:22:23 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert [v2] In-Reply-To: References: Message-ID: > When converting a `LongCountedLoop` into a loop nest, c2 needs jvm > state to add predicates to the inner loop. For that, it peels an > iteration of the loop and uses the state of the safepoint at the end > of the loop. That's only legal if there's no side effect between the > safepoint and the backedge that goes back into the loop. The assert > failure here happens in code that checks that. > > That code compares the memory states at the safepoint and at the > backedge. 
If they are the same then there's no side effect. To check > consistency, the `MergeMem` at the safepoint is cloned. As the logic > iterates over the backedge state, it clears every component of the > state it encounters from the `MergeMem`. Once done, the cloned > `MergeMem` should be "empty". In the case of this failure, no side > effect is found but the cloned `MergeMem` is not empty. That happens > because of EA: it adds edges to the `MergeMem` at the safepoint that > it doesn't add to the backedge `Phis`. > > So it's the verification code that fails and I propose dealing with > this by ignoring memory state added by EA in the verification code. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8336702 - test indentation - fix & test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21009/files - new: https://git.openjdk.org/jdk/pull/21009/files/463d6a21..a4263e28 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21009&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21009&range=00-01 Stats: 193974 lines in 1550 files changed: 175338 ins; 10446 del; 8190 mod Patch: https://git.openjdk.org/jdk/pull/21009.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21009/head:pull/21009 PR: https://git.openjdk.org/jdk/pull/21009 From roland at openjdk.org Tue Oct 1 13:22:24 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Oct 2024 13:22:24 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert [v2] In-Reply-To: <0sw5s6nN8FKInMD7qNCuBBa4w2uK-FBV505eke63dA4=.1fc70e4e-01e1-4763-ade6-98f841f84b9f@github.com> References: <0sw5s6nN8FKInMD7qNCuBBa4w2uK-FBV505eke63dA4=.1fc70e4e-01e1-4763-ade6-98f841f84b9f@github.com> Message-ID: On Wed, 18 
Sep 2024 12:05:57 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8336702 >> - test indentation >> - fix & test > > test/hotspot/jtreg/compiler/longcountedloops/TestSafePointWithEAState.java line 59: > >> 57: float n; >> 58: h(float n) { this.n = n; } >> 59: } > > Java indentation is supposed to be 4 spaces ;) > Adding some explicit brackets would also be nice, but that is more subjective. Right. Fixed in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21009#discussion_r1782784463 From yzheng at openjdk.org Tue Oct 1 13:24:10 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 1 Oct 2024 13:24:10 GMT Subject: RFR: 8341333: [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI Message-ID: This is required for adapting https://github.com/openjdk/jdk/pull/19454 in JVMCI compiler ------------- Commit messages: - [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI Changes: https://git.openjdk.org/jdk/pull/21287/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21287&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341333 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21287.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21287/head:pull/21287 PR: https://git.openjdk.org/jdk/pull/21287 From roland at openjdk.org Tue Oct 1 13:36:22 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Oct 2024 13:36:22 GMT Subject: RFR: 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons() [v3] In-Reply-To: References: Message-ID: > The list of interfaces for a `TypeInterfaces` is contained in a `GrowableArray` that's allocated in the type arena. 
When `hashcons()` deletes a `TypeInterfaces` object because an identical one exists, it can't reclaim memory for the object because it can only free the last thing that was allocated and that's the backing store for the `GrowableArray`, not the `TypeInterfaces` object. > > This patch changes the array of interfaces stored in `TypeInterfaces` into a pointer to a `GrowableArray`. `TypeInterfaces::make` calls `hashcons` with a temporary copy of the array of interfaces allocated in the current thread's resource area. This way, if `hashcons` deletes the `TypeInterfaces`, it is the last thing allocated in the type arena and memory can be reclaimed. Memory for the `GrowableArray` is freed as well on return from `TypeInterfaces::make`. If the newly allocated `TypeInterfaces` survives `hashcons`, then a permanent array of interfaces is allocated in the type arena and linked from the `TypeInterfaces` object. > > I ran into this issue while working on a fix for a separate issue that causes more Type objects to be created. With that prototype fix, TestScalarReplacementMaxLiveNodes uses 1.4 GB of memory at peak. With the patch I propose here on top, memory usage goes down to 150-200 MB, which is in line with the peak memory usage for TestScalarReplacementMaxLiveNodes when run with current master. > > When I opened the PR initially, the fix I proposed was to let `GrowableArray` try to reclaim memory from an arena when destroyed. With that patch, some gtests failed when a `GrowableArray` is created by a copy constructor and the backing store for the array is shared. As a consequence, I reconsidered the fix and thought it was safer to go with a fix that only affects `TypeInterfaces`. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains nine additional commits since the last revision: - comment - Merge branch 'master' into JDK-8340824 - more - more - single memory area - Revert "type interfaces footprint" This reverts commit 43e2e91c6aaf029e62760e641e207d9d17a3a943. - type interfaces footprint - Revert "fix" This reverts commit 3598dc08625269aa0a6ecff2a6903c4217b801ee. - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21163/files - new: https://git.openjdk.org/jdk/pull/21163/files/43e2e91c..de23a5a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21163&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21163&range=01-02 Stats: 32999 lines in 625 files changed: 26119 ins; 3741 del; 3139 mod Patch: https://git.openjdk.org/jdk/pull/21163.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21163/head:pull/21163 PR: https://git.openjdk.org/jdk/pull/21163 From roland at openjdk.org Tue Oct 1 13:36:23 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Oct 2024 13:36:23 GMT Subject: RFR: 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons() [v2] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 18:51:49 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/type.cpp line 3270: >> >>> 3268: } >>> 3269: >>> 3270: const TypeInterfaces* TypeInterfaces::make(const GrowableArray* interfaces) { >> >> I think you can make `_interface` a `ciInstanceKlass**` and do this: >> >> void* ptr = Type::operator new(sizeof(TypeInterfaces) + sizeof(ciInstanceKlass*) * interfaces->length()) >> >> Then `delete ptr` should drop the whole thing. > > A `GrowableArrayFromArray` would be mostly compatible with the interface of `GrowableArray`, too. Ah! nice. I wasn't aware of `GrowableArrayFromArray`. Updated change follows your suggestion. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21163#discussion_r1782833378 From dnsimon at openjdk.org Tue Oct 1 13:56:35 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 1 Oct 2024 13:56:35 GMT Subject: RFR: 8341333: [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 13:17:53 GMT, Yudi Zheng wrote: > This is required for adapting https://github.com/openjdk/jdk/pull/19454 in JVMCI compiler LGTM ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21287#pullrequestreview-2340426626 From yzheng at openjdk.org Tue Oct 1 14:02:46 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 1 Oct 2024 14:02:46 GMT Subject: RFR: 8341333: [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 13:17:53 GMT, Yudi Zheng wrote: > This is required for adapting https://github.com/openjdk/jdk/pull/19454 in JVMCI compiler Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21287#issuecomment-2386049645 From yzheng at openjdk.org Tue Oct 1 14:02:47 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 1 Oct 2024 14:02:47 GMT Subject: Integrated: 8341333: [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI In-Reply-To: References: Message-ID: <5TjZNvwPLhZIj9JMOSlhDJNbZ19sA4k9hsu40hw4Glk=.05bf8bd5-5b85-4d68-a65a-73a0aa8a1f42@github.com> On Tue, 1 Oct 2024 13:17:53 GMT, Yudi Zheng wrote: > This is required for adapting https://github.com/openjdk/jdk/pull/19454 in JVMCI compiler This pull request has now been integrated. 
Changeset: 2120a841 Author: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/2120a8414ef9c34d5875d33ac9a16594908fe403 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8341333: [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/21287 From mbaesken at openjdk.org Tue Oct 1 14:43:46 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 1 Oct 2024 14:43:46 GMT Subject: RFR: 8340109: Ubsan: ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask' Message-ID: When running ubsan-enabled optimized binaries on Linux x86_64, test compiler/startup/StartupOutput.java triggers this ubsan issue : jdk/src/hotspot/share/ci/ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask' #0 0x7fe7443fc88d in ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1660 #1 0x7fe746c22047 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) src/hotspot/share/utilities/vmError.cpp:1872 #2 0x7fe7447dd429 in report_fatal(VMErrorType, char const*, int, char const*, ...) 
src/hotspot/share/utilities/debug.cpp:214 #3 0x7fe7445c614d in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413 #4 0x7fe744259ceb in Runtime1::generate_blob(BufferBlob*, int, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:230 #5 0x7fe74425a273 in Runtime1::generate_blob_for(BufferBlob*, Runtime1::StubID) src/hotspot/share/c1/c1_Runtime1.cpp:259 #6 0x7fe74425a273 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:268 #7 0x7fe743fc04a1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53 #8 0x7fe743fc04a1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74 #9 0x7fe7446aaad7 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771 #10 0x7fe7446b83cf in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913 #11 0x7fe74516edca in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:758 #12 0x7fe7469d3c9a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 #13 0x7fe746048cd1 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 #14 0x7fe74b1e66e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 052f7e2a0045f08cb7e7a291f8066a4b7be2521d) #15 0x7fe74aaf158e in clone (/lib64/libc.so.6+0x11858e) (BuildId: cfb059a57e69ac95d5dadab831626b3bd48a4309) ------------- Commit messages: - JDK-8340109 Changes: https://git.openjdk.org/jdk/pull/21288/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21288&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340109 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21288.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21288/head:pull/21288 PR: https://git.openjdk.org/jdk/pull/21288 From mdoerr at openjdk.org Tue Oct 1 14:54:36 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 1 Oct 2024 
14:54:36 GMT Subject: RFR: 8340792: -XX:+PrintInterpreter: instructions should only be printed if printing all InterpreterCodelets In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 13:37:18 GMT, Richard Reingruber wrote: > With `-XX:+PrintInterpreter` the instructions of the generated interpreter are printed at startup. > > But also after startup when printing an interpreted frame the instructions of the corresponding `InterpreterCodelet` are also printed. This can become a problem with `-Xlog:continuations=trace` because large sections of the interpreter are printed repeatedly and redundantly. > > With this change the instructions are only printed when printing all codelets, i.e. normally once at start-up. > > I've tested with the reproducer from the JBS-Issue. LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21158#pullrequestreview-2340609133 From coleenp at openjdk.org Tue Oct 1 15:01:37 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 1 Oct 2024 15:01:37 GMT Subject: RFR: 8340792: -XX:+PrintInterpreter: instructions should only be printed if printing all InterpreterCodelets In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 13:37:18 GMT, Richard Reingruber wrote: > With `-XX:+PrintInterpreter` the instructions of the generated interpreter are printed at startup. > > But also after startup when printing an interpreted frame the instructions of the corresponding `InterpreterCodelet` are also printed. This can become a problem with `-Xlog:continuations=trace` because large sections of the interpreter are printed repeatedly and redundantly. > > With this change the instructions are only printed when printing all codelets, i.e. normally once at start-up. > > I've tested with the reproducer from the JBS-Issue. Looks fine. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21158#pullrequestreview-2340628678 From kvn at openjdk.org Tue Oct 1 15:47:36 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 1 Oct 2024 15:47:36 GMT Subject: RFR: 8340109: Ubsan: ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask' In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 14:37:50 GMT, Matthias Baesken wrote: > When running ubsan-enabled optimized binaries on Linux x86_64, test > compiler/startup/StartupOutput.java > triggers this ubsan issue : > > > jdk/src/hotspot/share/ci/ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask' > #0 0x7fe7443fc88d in ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1660 > #1 0x7fe746c22047 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) src/hotspot/share/utilities/vmError.cpp:1872 > #2 0x7fe7447dd429 in report_fatal(VMErrorType, char const*, int, char const*, ...) 
src/hotspot/share/utilities/debug.cpp:214 > #3 0x7fe7445c614d in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413 > #4 0x7fe744259ceb in Runtime1::generate_blob(BufferBlob*, int, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:230 > #5 0x7fe74425a273 in Runtime1::generate_blob_for(BufferBlob*, Runtime1::StubID) src/hotspot/share/c1/c1_Runtime1.cpp:259 > #6 0x7fe74425a273 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:268 > #7 0x7fe743fc04a1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53 > #8 0x7fe743fc04a1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74 > #9 0x7fe7446aaad7 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771 > #10 0x7fe7446b83cf in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913 > #11 0x7fe74516edca in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:758 > #12 0x7fe7469d3c9a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > #13 0x7fe746048cd1 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > #14 0x7fe74b1e66e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 052f7e2a0045f08cb7e7a291f8066a4b7be2521d) > #15 0x7fe74aaf158e in clone (/lib64/libc.so.6+0x11858e) (BuildId: cfb059a57e69ac95d5dadab831626b3bd48a4309) Good. I would say it is trivial. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21288#pullrequestreview-2340770636 From kvn at openjdk.org Tue Oct 1 16:04:37 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 1 Oct 2024 16:04:37 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: On Tue, 1 Oct 2024 10:57:58 GMT, Tomáš Zezula wrote: > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle-specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. The `/compiler` part of the changes is fine. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21285#pullrequestreview-2340808550 From kvn at openjdk.org Tue Oct 1 16:36:35 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 1 Oct 2024 16:36:35 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 22:52:18 GMT, Dean Long wrote: >> Instead of bailing out as in the alternative approach, we can change `cha_monomorphic_target` to `nullptr` in code which is looking for it in previous lines.
`target` will be used for the call and we will lose a little performance when JVMTI is used instead of skipping compilation. Am I missing something? > > @vnkozlov , I like the alternative approach better. I went with the current approach because I was thinking it would be simpler than the bailout, but I changed my mind after writing both out. > > Yes, we can change cha_monomorphic_target to nullptr instead of bailing out. But my understanding is any use of old/redefined methods will cause the compilation to be thrown out when we try to create the nmethod, so we are avoiding wasted work by bailing out early. @dean-long, maybe I misunderstand your statement. Are you rewriting the fix or keeping the current one? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2386474610 From rehn at openjdk.org Tue Oct 1 18:00:37 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 1 Oct 2024 18:00:37 GMT Subject: RFR: 8341146: RISC-V: Unnecessary fences used for load-acquire in template interpreter In-Reply-To: <_FRFrY50v0MTOOsTdZDhqcXjPORwH-f2YrVOkjU6Y0Q=.de5f994c-febd-46bc-be07-cfd67ae7a85d@github.com> References: <_FRFrY50v0MTOOsTdZDhqcXjPORwH-f2YrVOkjU6Y0Q=.de5f994c-febd-46bc-be07-cfd67ae7a85d@github.com> Message-ID: On Sun, 29 Sep 2024 10:52:25 GMT, Feilong Jiang wrote: > Hi, please consider. > > RISC-V does not currently have plain load and store opcodes with aq or rl annotations, so load-acquire and > store-release operations are implemented using fences instead. Initially, we followed the RISC-V spec > and placed a FENCE RW,RW in front of each load-acquire operation when porting the template interpreter. > The purpose is to enforce a store-release-to-load-acquire ordering (where there must be a FENCE RW,RW > between the store-release and load-acquire). But it turns out these fences are unnecessary for our use > cases in the template interpreter.
In fact, we only need to do a single FENCE R,RW after a normal memory > load in order to implement a load-acquire operation. We should remove those unnecessary fences for both > performance reasons and for consistency with the rest of the port (i.e., C1 and C2 JIT). > > Testing: > - [x] JCstress > - [x] hs-tier1 - hs-tier4 > - [x] ~5% improvement on SPECJbb2005 score (-Xint -XX:+UseParallelGC) Thank you! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21248#pullrequestreview-2341039591 From sviswanathan at openjdk.org Tue Oct 1 18:05:44 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 1 Oct 2024 18:05:44 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 1 Oct 2024 09:53:02 GMT, Jatin Bhateja wrote: >> Thanks for the example. Yes, you have a point there. So we would need to do: >> src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2); > >> This could instead be: src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2); Or even simplified to: src1.rearrange(this.toShuffle(), src2); > > Yes, this may save additional allocation penalty of result array allocation which may slightly improve fall back performance, but logical operation cannot be directly applied over floating point vectors. so, we will need an explicit conversion to integral vector, which is why I opted for current fallback implementation which is in line with rest of the code. I see the problem with float/double vectors. Let us do the rearrange form only for Integral (byte, short, int, long) vectors then. For float/double vector we could keep the code that you have currently. 
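A minimal scalar sketch of the wrapped two-vector lookup discussed above, assuming plain `int[]` arrays stand in for vectors and that the lane count is a power of two. The class and method names are hypothetical and only model the semantics of `src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2)`; this is not the Vector API implementation.

```java
import java.util.Arrays;

public class SelectFromSketch {
    // Hypothetical scalar model: each index lane selects from the
    // 2*VLEN-entry table formed by v1 followed by v2.
    // Assumes v1, v2 and index have the same power-of-two length.
    static int[] selectFromTwo(int[] index, int[] v1, int[] v2) {
        int vlen = v1.length;
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            // Wrap the index into [0, 2*VLEN) with a bitwise AND,
            // mirroring the AND with 2 * VLENGTH - 1 in the message.
            int wrapped = index[i] & (2 * vlen - 1);
            result[i] = wrapped < vlen ? v1[wrapped] : v2[wrapped - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] index = {0, 5, 3, 9}; // 9 is out of range and wraps to 1
        System.out.println(Arrays.toString(selectFromTwo(index, v1, v2)));
    }
}
```

Because the mask is applied to an integer lane, the same trick does not carry over directly to float/double index vectors, which is the limitation the reply above points out.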
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1783278063 From sviswanathan at openjdk.org Tue Oct 1 18:12:39 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 1 Oct 2024 18:12:39 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v14] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <5wF2qLX9Z_tquvURMW0HVnrmMla1awxtz6C0UYI0lh4=.94df7340-1a27-4d93-80fa-d4c561641a97@github.com> On Tue, 1 Oct 2024 09:51:27 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions. src/hotspot/share/opto/vectorIntrinsics.cpp line 2797: > 2795: > 2796: Node* operation = lowerSelectFromOp ? > 2797: LowerSelectFromTwoVectorOperation(gvn(), opd1, opd2, opd3, vt) : Thanks for bringing the lowering right here. It opens up an optimization opportunity: currently for float/double we have two casts for index (e.g. from float -> int at line 2786 and from int -> byte at line 2661 as part of LowerSelectFromTwoVectorOperation. Could this be done by one cast? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1783296741 From vlivanov at openjdk.org Tue Oct 1 21:25:35 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Oct 2024 21:25:35 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 00:35:47 GMT, Dean Long wrote: > JVMTI can add and delete methods Can you elaborate on that point, please? The JVMTI spec states that redefinition/retransformation "must not add, remove or rename fields or methods" [1] [2]. [1] https://docs.oracle.com/en/java/javase/23/docs/specs/jvmti.html#RedefineClasses [2] https://docs.oracle.com/en/java/javase/23/docs/specs/jvmti.html#RetransformClasses ------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2387101310 From vlivanov at openjdk.org Tue Oct 1 21:29:34 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Oct 2024 21:29:34 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 16:33:45 GMT, Vladimir Kozlov wrote: >> @vnkozlov , I like the alternative approach better. I went with the current approach because I was thinking it would be simpler than the bailout, but I changed my mind after writing both out. >> >> Yes, we can change cha_monomorphic_target to nullptr instead of bailing out. But my understanding is any use of old/redefined methods will cause the compilation to be thrown out when we try to create the nmethod, so we are avoiding wasted work by bailing out early. > > @dean-long, maybe I misunderstand your statement. Are you rewriting the fix or keeping the current one? I like @vnkozlov's suggestion to null out `cha_monomorphic_target`. Moreover, the validation can be performed inside `ciMethod::find_monomorphic_target()`, which is used to compute `cha_monomorphic_target`.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2387105860 From kxu at openjdk.org Tue Oct 1 21:31:12 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 1 Oct 2024 21:31:12 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v8] In-Reply-To: References: Message-ID: > This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. > > As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). > > Some notable examples of this transformation include: > - `a + a + a` => `a*3` => `(a<<1) + a` > - `a + a + a + a` => `a*4` => `a<<2` > - `a*3 + a` => `a*4` => `a<<2` > - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` > > See included IR unit tests for more. 
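The identities listed above can be checked at the source level with a small sketch. It assumes Java `int` arithmetic (which wraps on overflow) and uses hypothetical method names; it only confirms that the rewritten forms compute the same values and says nothing about what C2 actually emits.

```java
public class AddSeriesSketch {
    // a + a + a + a, the addition chain as written in source.
    static int addFour(int a) { return a + a + a + a; }

    // The strength-reduced form of a*4.
    static int shiftFour(int a) { return a << 2; }

    // a + a + a and its shift-and-add form for a*3.
    static int addThree(int a) { return a + a + a; }
    static int shiftAddThree(int a) { return (a << 1) + a; }

    // (a << C) + a versus (2^C + 1) * a, here with C = 3.
    static int shiftPlusSelf(int a) { return (a << 3) + a; }
    static int mulNine(int a) { return 9 * a; }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            // Each pair must agree even when the result overflows.
            if (addFour(a) != shiftFour(a)) throw new AssertionError("a*4 mismatch for " + a);
            if (addThree(a) != shiftAddThree(a)) throw new AssertionError("a*3 mismatch for " + a);
            if (shiftPlusSelf(a) != mulNine(a)) throw new AssertionError("a*9 mismatch for " + a);
        }
        System.out.println("all identities hold");
    }
}
```

Note that Java parses `a << C + a` as `a << (C + a)`, so the sketch parenthesizes the shift explicitly to match the intended `(a << C) + a`.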
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: update comments, use explicit opcode comparisons for LShift nodes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20754/files - new: https://git.openjdk.org/jdk/pull/20754/files/6e65e13f..af6f8084 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=06-07 Stats: 8 lines in 1 file changed: 3 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20754/head:pull/20754 PR: https://git.openjdk.org/jdk/pull/20754 From vlivanov at openjdk.org Tue Oct 1 21:39:36 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Oct 2024 21:39:36 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 00:35:47 GMT, Dean Long wrote: > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). > > An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. src/hotspot/share/ci/ciMethod.cpp line 800: > 798: Method* m1 = this->get_Method(); > 799: Method* m2 = m->get_Method(); > 800: guarantee(!m1->is_private() && !m1->is_deleted(), "see usage note"); Some changes inside `ciMethod::equals` look irrelevant to checking method equality (e.g., asserting that a method is not private). Alternatively, if you decide to keep the current shape of the fix, the code can be moved closer to the use site as a helper function. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1783559452 From sviswanathan at openjdk.org Tue Oct 1 22:51:43 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 1 Oct 2024 22:51:43 GMT Subject: Integrated: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan wrote: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya This pull request has now been integrated. 
Changeset: 83dcb02d Author: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/83dcb02d776448aa04f3f41df489bd4355443a4d Stats: 697 lines in 47 files changed: 549 ins; 34 del; 114 mod 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes Reviewed-by: jbhateja, psandoz ------------- PR: https://git.openjdk.org/jdk/pull/20634 From vlivanov at openjdk.org Tue Oct 1 23:38:41 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Oct 2024 23:38:41 GMT Subject: RFR: 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons() [v3] In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 13:36:22 GMT, Roland Westrelin wrote: >> The list of interfaces for a `TypeInterfaces` is contained in a `GrowableArray` that's allocated in the type arena. When `hashcons()` deletes a `TypeInterfaces` object because an identical one exists, it can't reclaim memory for the object because it can only free the last thing that was allocated and that's the backing store for the `GrowableArray`, not the `TypeInterfaces` object. >> >> This patch changes the array of interfaces stored in `TypeInterfaces` into a pointer to a `GrowableArray`. `TypeInterfaces::make` calls `hashcons` with a temporary copy of the array of interfaces allocated in the current thread's resource area. This way if `hascons` deletes the `TypeInterfaces`, it is the last thing allocated in the type arena and memory can be reclaimed. Memory for the `GrowableArray` is freed as well on return from `TypeInterfaces::make`. If the newly allocated `TypeInterfaces` survives `hashcons` then a permanent array of interfaces is allocated in the type arena and linked from the `TypeInterfaces` object. >> >> I ran into this issue while working on a fix for a separate issue that causes more Type objects to be created. With that prototype fix, TestScalarReplacementMaxLiveNodes uses 1.4GB of memory at peak. 
With the patch I propose here on top, memory usage goes down to 150-200 MB which is in line with the peak memory usage for TestScalarReplacementMaxLiveNodes when run with current master. >> >> When I opened the PR initially, the fix I proposed was to let `GrowableArray` try to reclaim memory from an arena when destroyed. With that patch, some gtests failed when a `GrowableArray` is created by a copy constructor and the backing store for the array is shared. As a consequence, I reconsidered the fix and thought it was safer to go with a fix that only affects `TypeInterfaces`. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - comment > - Merge branch 'master' into JDK-8340824 > - more > - more > - single memory area > - Revert "type interfaces footprint" > > This reverts commit 43e2e91c6aaf029e62760e641e207d9d17a3a943. > - type interfaces footprint > - Revert "fix" > > This reverts commit 3598dc08625269aa0a6ecff2a6903c4217b801ee. > - fix Looks good. It feels a bit weird to see `GrowableArray` used to represent a read-only data structure, but I understand that you still benefit from some helper methods it provides. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21163#pullrequestreview-2341626070 From vlivanov at openjdk.org Tue Oct 1 23:55:49 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Oct 2024 23:55:49 GMT Subject: RFR: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access [v8] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 14:45:01 GMT, Tobias Holenstein wrote: >> We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`. 
>> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 >> The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` >> >> We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 >> - with **base** = `147 CheckCastPP` >> - `118 ConP === 0 [[[ 106 101 71 ] #null` >> type >> >> Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. >> 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: >> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 >> >> 2. 
For `UNSAFE.getShortUnaligned(array, 1)`, in contrast, we get kind = `Type::OopPtr`
>> https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078
>> and insert a null check
>> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090
>> In both cases we call `basic_plus_adr(..)` on a base that is `top()`, which returns **adr** = `1 Con === 0 [[ ]] #top`
>>
>> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_
>>
>> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_
>>
>> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_
>>
>> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_
>>
>> ### Fix (updated on 18th Sep 2024)
>> T...
>
> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision:
>
>   add second uncast (Vladimirs suggestion)

Looks good.

-------------

Marked as reviewed by vlivanov (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20033#pullrequestreview-2341636531

From tholenstein at openjdk.org Tue Oct 1 23:55:50 2024
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Tue, 1 Oct 2024 23:55:50 GMT
Subject: Integrated: 8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access
In-Reply-To:
References:
Message-ID:

On Thu, 4 Jul 2024 12:17:51 GMT, Tobias Holenstein wrote:

> We failed in `LibraryCallKit::inline_unsafe_access()` while trying to inline `Unsafe::getShortUnaligned`.
> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/test/hotspot/jtreg/compiler/parsing/TestUnsafeArrayAccessWithNullBase.java#L86 > The reason is that base (the array) is `ConP #null` hidden behind two `CheckCastPP` with `speculative=byte[int:>=0]` > > We call `Node* adr = make_unsafe_address(base, offset, type, kind == Relaxed);` > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2361 > - with **base** = `147 CheckCastPP` > - `118 ConP === 0 [[[ 106 101 71 ] #null` > type > > Depending on the **offset** we go two different paths in `LibraryCallKit::make_unsafe_address` which both lead to the same error in the end. > 1. For `UNSAFE.getShortUnaligned(array, 1_049_000)` we get kind = `Type::AnyPtr` because `offset >= os::vm_page_size()`. Since we assume base can't be null we insert an assert: > https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2111 > > 2. 
For `UNSAFE.getShortUnaligned(array, 1)`, in contrast, we get kind = `Type::OopPtr`
> https://github.com/openjdk/jdk/blob/c17fa910cf3bad48547a3f0d68a30795ec3194e6/src/hotspot/share/opto/library_call.cpp#L2078
> and insert a null check
> https://github.com/openjdk/jdk/blob/34c6e0deac567c0f4ed08aa2824671551d843e95/src/hotspot/share/opto/library_call.cpp#L2090
> In both cases we call `basic_plus_adr(..)` on a base that is `top()`, which returns **adr** = `1 Con === 0 [[ ]] #top`
>
> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2386 => `_gvn.type(adr)` is _top_
>
> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2394 => `adr_type` is _nullptr_
>
> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2405-L2406 => `BasicType bt` is _T_ILLEGAL_
>
> https://github.com/openjdk/jdk/blob/3d5d51e228c19aa216451f647023101ae8bdbc79/src/hotspot/share/opto/library_call.cpp#L2424 => we fail here with `SIGSEGV: null pointer dereference` because `alias_type->adr_type()` is _nullptr_
>
> ### Fix (updated on 18th Sep 2024)
> The fix modifies the `LibraryCallKit::classify_unsafe_addr()`...

This pull request has now been integrated.
Changeset: 8d6d37fe
Author: Tobias Holenstein
URL: https://git.openjdk.org/jdk/commit/8d6d37fea133380d4143f5db38ad3790efa84f68
Stats: 117 lines in 3 files changed: 114 ins; 1 del; 2 mod

8320308: C2 compilation crashes in LibraryCallKit::inline_unsafe_access

Reviewed-by: thartmann, kvn, vlivanov, epeter, roland

-------------

PR: https://git.openjdk.org/jdk/pull/20033

From qamai at openjdk.org Wed Oct 2 01:36:38 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Wed, 2 Oct 2024 01:36:38 GMT
Subject: RFR: 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons() [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 1 Oct 2024 13:36:22 GMT, Roland Westrelin wrote:

>> The list of interfaces for a `TypeInterfaces` is contained in a `GrowableArray` that's allocated in the type arena. When `hashcons()` deletes a `TypeInterfaces` object because an identical one exists, it can't reclaim memory for the object because it can only free the last thing that was allocated and that's the backing store for the `GrowableArray`, not the `TypeInterfaces` object.
>>
>> This patch changes the array of interfaces stored in `TypeInterfaces` into a pointer to a `GrowableArray`. `TypeInterfaces::make` calls `hashcons` with a temporary copy of the array of interfaces allocated in the current thread's resource area. This way, if `hashcons` deletes the `TypeInterfaces`, it is the last thing allocated in the type arena and memory can be reclaimed. Memory for the `GrowableArray` is freed as well on return from `TypeInterfaces::make`. If the newly allocated `TypeInterfaces` survives `hashcons`, then a permanent array of interfaces is allocated in the type arena and linked from the `TypeInterfaces` object.
>>
>> I ran into this issue while working on a fix for a separate issue that causes more Type objects to be created. With that prototype fix, TestScalarReplacementMaxLiveNodes uses 1.4 GB of memory at peak.
With the patch I propose here on top, memory usage goes down to 150-200 MB which is in line with the peak memory usage for TestScalarReplacementMaxLiveNodes when run with current master. >> >> When I opened the PR initially, the fix I proposed was to let `GrowableArray` try to reclaim memory from an arena when destroyed. With that patch, some gtests failed when a `GrowableArray` is created by a copy constructor and the backing store for the array is shared. As a consequence, I reconsidered the fix and thought it was safer to go with a fix that only affects `TypeInterfaces`. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - comment > - Merge branch 'master' into JDK-8340824 > - more > - more > - single memory area > - Revert "type interfaces footprint" > > This reverts commit 43e2e91c6aaf029e62760e641e207d9d17a3a943. > - type interfaces footprint > - Revert "fix" > > This reverts commit 3598dc08625269aa0a6ecff2a6903c4217b801ee. > - fix Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21163#pullrequestreview-2341768112 From roland at openjdk.org Wed Oct 2 07:13:51 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 2 Oct 2024 07:13:51 GMT Subject: RFR: 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons() [v3] In-Reply-To: References: Message-ID: <9vr6Uk48dB75INt4SYSyQ-qoLfkEg4--WyWjHtI4nWc=.ff42aac4-12d0-49ed-8921-d0b34896ca6c@github.com> On Tue, 1 Oct 2024 23:35:42 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision:
>>
>>  - comment
>>  - Merge branch 'master' into JDK-8340824
>>  - more
>>  - more
>>  - single memory area
>>  - Revert "type interfaces footprint"
>>
>>    This reverts commit 43e2e91c6aaf029e62760e641e207d9d17a3a943.
>>  - type interfaces footprint
>>  - Revert "fix"
>>
>>    This reverts commit 3598dc08625269aa0a6ecff2a6903c4217b801ee.
>>  - fix
>
> Looks good.
>
> It feels a bit weird to see `GrowableArray` used to represent a read-only data structure, but I understand that you still benefit from some helper methods it provides.

@iwanowww @merykitty thanks for the reviews

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21163#issuecomment-2387773938

From roland at openjdk.org Wed Oct 2 07:13:51 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Wed, 2 Oct 2024 07:13:51 GMT
Subject: Integrated: 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons()
In-Reply-To:
References:
Message-ID:

On Tue, 24 Sep 2024 15:53:06 GMT, Roland Westrelin wrote:

> The list of interfaces for a `TypeInterfaces` is contained in a `GrowableArray` that's allocated in the type arena. When `hashcons()` deletes a `TypeInterfaces` object because an identical one exists, it can't reclaim memory for the object because it can only free the last thing that was allocated and that's the backing store for the `GrowableArray`, not the `TypeInterfaces` object.
>
> This patch changes the array of interfaces stored in `TypeInterfaces` into a pointer to a `GrowableArray`. `TypeInterfaces::make` calls `hashcons` with a temporary copy of the array of interfaces allocated in the current thread's resource area. This way, if `hashcons` deletes the `TypeInterfaces`, it is the last thing allocated in the type arena and memory can be reclaimed. Memory for the `GrowableArray` is freed as well on return from `TypeInterfaces::make`.
If the newly allocated `TypeInterfaces` survives `hashcons` then a permanent array of interfaces is allocated in the type arena and linked from the `TypeInterfaces` object. > > I ran into this issue while working on a fix for a separate issue that causes more Type objects to be created. With that prototype fix, TestScalarReplacementMaxLiveNodes uses 1.4GB of memory at peak. With the patch I propose here on top, memory usage goes down to 150-200 MB which is in line with the peak memory usage for TestScalarReplacementMaxLiveNodes when run with current master. > > When I opened the PR initially, the fix I proposed was to let `GrowableArray` try to reclaim memory from an arena when destroyed. With that patch, some gtests failed when a `GrowableArray` is created by a copy constructor and the backing store for the array is shared. As a consequence, I reconsidered the fix and thought it was safer to go with a fix that only affects `TypeInterfaces`. This pull request has now been integrated. Changeset: 90c944fe Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/90c944fefe4a7827c08a8e6a81f137c3157a749b Stats: 89 lines in 2 files changed: 14 ins; 11 del; 64 mod 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons() Reviewed-by: vlivanov, qamai ------------- PR: https://git.openjdk.org/jdk/pull/21163 From lucy at openjdk.org Wed Oct 2 07:54:39 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 2 Oct 2024 07:54:39 GMT Subject: RFR: 8340109: Ubsan: ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask' In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 14:37:50 GMT, Matthias Baesken wrote: > When running ubsan-enabled optimized binaries on Linux x86_64, test > compiler/startup/StartupOutput.java > triggers this ubsan issue : > > > jdk/src/hotspot/share/ci/ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask' > #0 0x7fe7443fc88d in 
ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1660 > #1 0x7fe746c22047 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) src/hotspot/share/utilities/vmError.cpp:1872 > #2 0x7fe7447dd429 in report_fatal(VMErrorType, char const*, int, char const*, ...) src/hotspot/share/utilities/debug.cpp:214 > #3 0x7fe7445c614d in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413 > #4 0x7fe744259ceb in Runtime1::generate_blob(BufferBlob*, int, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:230 > #5 0x7fe74425a273 in Runtime1::generate_blob_for(BufferBlob*, Runtime1::StubID) src/hotspot/share/c1/c1_Runtime1.cpp:259 > #6 0x7fe74425a273 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:268 > #7 0x7fe743fc04a1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53 > #8 0x7fe743fc04a1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74 > #9 0x7fe7446aaad7 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771 > #10 0x7fe7446b83cf in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913 > #11 0x7fe74516edca in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:758 > #12 0x7fe7469d3c9a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > #13 0x7fe746048cd1 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > #14 0x7fe74b1e66e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 052f7e2a0045f08cb7e7a291f8066a4b7be2521d) > #15 0x7fe74aaf158e in clone (/lib64/libc.so.6+0x11858e) (BuildId: cfb059a57e69ac95d5dadab831626b3bd48a4309) Looks good. ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21288#pullrequestreview-2342102150 From mbaesken at openjdk.org Wed Oct 2 08:00:44 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 2 Oct 2024 08:00:44 GMT Subject: RFR: 8340109: Ubsan: ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask' In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 14:37:50 GMT, Matthias Baesken wrote: > When running ubsan-enabled optimized binaries on Linux x86_64, test > compiler/startup/StartupOutput.java > triggers this ubsan issue : > > > jdk/src/hotspot/share/ci/ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask' > #0 0x7fe7443fc88d in ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1660 > #1 0x7fe746c22047 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) src/hotspot/share/utilities/vmError.cpp:1872 > #2 0x7fe7447dd429 in report_fatal(VMErrorType, char const*, int, char const*, ...) 
src/hotspot/share/utilities/debug.cpp:214 > #3 0x7fe7445c614d in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413 > #4 0x7fe744259ceb in Runtime1::generate_blob(BufferBlob*, int, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:230 > #5 0x7fe74425a273 in Runtime1::generate_blob_for(BufferBlob*, Runtime1::StubID) src/hotspot/share/c1/c1_Runtime1.cpp:259 > #6 0x7fe74425a273 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:268 > #7 0x7fe743fc04a1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53 > #8 0x7fe743fc04a1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74 > #9 0x7fe7446aaad7 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771 > #10 0x7fe7446b83cf in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913 > #11 0x7fe74516edca in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:758 > #12 0x7fe7469d3c9a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > #13 0x7fe746048cd1 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > #14 0x7fe74b1e66e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 052f7e2a0045f08cb7e7a291f8066a4b7be2521d) > #15 0x7fe74aaf158e in clone (/lib64/libc.so.6+0x11858e) (BuildId: cfb059a57e69ac95d5dadab831626b3bd48a4309) Thanks for the reviews ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21288#issuecomment-2387851841 From mbaesken at openjdk.org Wed Oct 2 08:00:44 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 2 Oct 2024 08:00:44 GMT Subject: Integrated: 8340109: Ubsan: ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask' In-Reply-To: References: Message-ID: <_qqrQZuWDvRfqPZR7hoclhiQ6HJIw4sgRxewbxefosY=.b544c413-4a4f-4d1d-a923-f9c88ce0e7a9@github.com> On Tue, 1 Oct 2024 14:37:50 GMT, Matthias Baesken wrote: > When running ubsan-enabled optimized binaries on Linux x86_64, test > compiler/startup/StartupOutput.java > triggers this ubsan issue : > > > jdk/src/hotspot/share/ci/ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask' > #0 0x7fe7443fc88d in ciEnv::dump_replay_data_helper(outputStream*) src/hotspot/share/ci/ciEnv.cpp:1660 > #1 0x7fe746c22047 in VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) src/hotspot/share/utilities/vmError.cpp:1872 > #2 0x7fe7447dd429 in report_fatal(VMErrorType, char const*, int, char const*, ...) 
src/hotspot/share/utilities/debug.cpp:214 > #3 0x7fe7445c614d in RuntimeStub::new_runtime_stub(char const*, CodeBuffer*, short, int, OopMapSet*, bool, bool) src/hotspot/share/code/codeBlob.cpp:413 > #4 0x7fe744259ceb in Runtime1::generate_blob(BufferBlob*, int, char const*, bool, StubAssemblerCodeGenClosure*) src/hotspot/share/c1/c1_Runtime1.cpp:230 > #5 0x7fe74425a273 in Runtime1::generate_blob_for(BufferBlob*, Runtime1::StubID) src/hotspot/share/c1/c1_Runtime1.cpp:259 > #6 0x7fe74425a273 in Runtime1::initialize(BufferBlob*) src/hotspot/share/c1/c1_Runtime1.cpp:268 > #7 0x7fe743fc04a1 in Compiler::init_c1_runtime() src/hotspot/share/c1/c1_Compiler.cpp:53 > #8 0x7fe743fc04a1 in Compiler::initialize() src/hotspot/share/c1/c1_Compiler.cpp:74 > #9 0x7fe7446aaad7 in CompileBroker::init_compiler_runtime() src/hotspot/share/compiler/compileBroker.cpp:1771 > #10 0x7fe7446b83cf in CompileBroker::compiler_thread_loop() src/hotspot/share/compiler/compileBroker.cpp:1913 > #11 0x7fe74516edca in JavaThread::thread_main_inner() src/hotspot/share/runtime/javaThread.cpp:758 > #12 0x7fe7469d3c9a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > #13 0x7fe746048cd1 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > #14 0x7fe74b1e66e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 052f7e2a0045f08cb7e7a291f8066a4b7be2521d) > #15 0x7fe74aaf158e in clone (/lib64/libc.so.6+0x11858e) (BuildId: cfb059a57e69ac95d5dadab831626b3bd48a4309) This pull request has now been integrated. 
Changeset: efe3573b Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/efe3573b9b4ecec0630fdc1c61c765713a5b68e6 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod 8340109: Ubsan: ciEnv.cpp:1660:65: runtime error: member call on null pointer of type 'struct CompileTask' Reviewed-by: kvn, lucy ------------- PR: https://git.openjdk.org/jdk/pull/21288 From roland at openjdk.org Wed Oct 2 08:04:37 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 2 Oct 2024 08:04:37 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v2] In-Reply-To: References: <-TWzHU1sjg49BOttKN_muQ6ICHAMbAnBoNUurRbqHDg=.a27bbf72-a451-4257-914f-d37e181ce257@github.com> Message-ID: On Tue, 17 Sep 2024 09:33:38 GMT, Christian Hagedorn wrote: >>> If this is your intention, then please ignore this message. >> >> Yes, this is my intention. >> >> --- >> >> My previous approach of identifying optimized `Mul->shift + add/sub` (e.g., `a*6` becomes `(a<<1) + (a<<2)` by `MulNode::Ideal()`) was inherently flawed. I was solely determining this with the number of terms. It is not reliable. In the `TestLargeTreeOfSubNodes` example, it replaces already optimized Mul nodes and a new Mul node and repeats the process, causing performance regression (and timeouts). >> >> The new approach matches the exact patterns of optimized `MulNode`s. Additionally, a recursion depth limit of 5 (a rather arbitrary number) is in effect during *iterative* GVN to mitigate the risk of exhausting resources. Untransformed nodes are added to the worklist and will be eventually transformed. >> >> Please note, in the case of `TestLargeTreeOfSubNodes` with flags mentioned above, the compilation is skipped without a large enough `-XX:MaxLabelRootDepth`. This is the same behaviour as the current master. >> >> Please re-review once GHA is confirmed passing. Thanks! 
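The arithmetic identities behind this transformation can be checked directly. The following is a minimal Java sketch, not code from the pull request: the class `SerialAdditions` and its methods are hypothetical names chosen for illustration. It compares a naive series of additions against the multiplication and shift forms mentioned in this thread, relying on Java's wrap-around `int` arithmetic.

```java
// Illustrative only: SerialAdditions and its methods are hypothetical names,
// not part of HotSpot or of the patch under review.
final class SerialAdditions {
    // Computes a + a + ... + a with n terms (n >= 1), one addition at a time.
    static int addSeries(int a, int n) {
        int sum = a;
        for (int i = 1; i < n; i++) {
            sum += a; // int addition wraps around on overflow
        }
        return sum;
    }

    // Shift-and-add forms of small constant multiplications, as quoted in the thread.
    static int times3(int a) { return (a << 1) + a; }        // a*3
    static int times4(int a) { return a << 2; }              // a*4
    static int times6(int a) { return (a << 1) + (a << 2); } // a*6

    // Returns true if the addition, multiplication and shift forms all agree for a.
    static boolean identitiesHold(int a) {
        return addSeries(a, 3) == 3 * a && times3(a) == 3 * a
            && addSeries(a, 4) == 4 * a && times4(a) == 4 * a
            && addSeries(a, 6) == 6 * a && times6(a) == 6 * a;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -7, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            if (!identitiesHold(a)) {
                throw new AssertionError("identity failed for a = " + a);
            }
        }
        System.out.println("all identities hold");
    }
}
```

Because every form is computed modulo 2^32, the results agree even when the sum overflows, so the rewrite does not change observable `int` behaviour.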
> >> Please note, in the case of TestLargeTreeOfSubNodes with flags mentioned above, the compilation is skipped without a large enough -XX:MaxLabelRootDepth. This is the same behaviour as the current master. > > Have you found out why this is the case? I thought that the original fix which added `TestLargeTreeOfSubNodes` wanted to fix the problem of running out of nodes. > > I gave your patch another spin. We still see various failures and timeouts. For example: > > `compiler/intrinsics/sha/TestDigest.java` times out with various flag combinations (for example `-server -Xmixed`). Here is the stack at the timeout: > > > Thread 7 (Thread 0x7fc808490700 (LWP 22433)): > #0 0x00007fc80d648051 in Node::find_integer_type(BasicType) const () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so > #1 0x00007fc80c793214 in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so > #2 0x00007fc80c79306c in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so > ... 
> #90 0x00007fc80c79306c in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so > #91 0x00007fc80c793082 in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so > #92 0x00007fc80c79306c in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so > #93 0x00007fc80c793351 in AddNode::convert_serial_additions(PhaseGVN*, bool, BasicType) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so > #94 0x00007fc80c7937c5 in AddNode... @chhagedorn would you mind running the latest version patch through testing? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20754#issuecomment-2387860251 From fjiang at openjdk.org Wed Oct 2 09:17:39 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 2 Oct 2024 09:17:39 GMT Subject: RFR: 8341146: RISC-V: Unnecessary fences used for load-acquire in template interpreter In-Reply-To: References: <_FRFrY50v0MTOOsTdZDhqcXjPORwH-f2YrVOkjU6Y0Q=.de5f994c-febd-46bc-be07-cfd67ae7a85d@github.com> Message-ID: On Tue, 1 Oct 2024 17:57:53 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> RISC-V does not currently have plain load and store opcodes with aq or rl annotations, load-acquire and >> store-release operations are implemented using fences instead. 
Initially, we followed the RISC-V spec >> and placed FENCE RW,RW fence in front of load-acquire operation when porting the template interpreter. >> The purpose is to enforce a store-release-to-load-acquire ordering (where there must be a FENCE RW,RW >> between the store-release and load-acquire). But it turns out these fences are unnecessary for our use >> cases in the template interpreter. In fact, we only need to do a single FENCE R,RW after a normal memory >> load in order to implement a load-acquire operation. We should remove those unnecessary fences for both >> performance reasons and for consistency with the rest of the port (i.e., C1 and C2 JIT). >> >> Testing: >> - [x] JCstress >> - [x] hs-tier1 - hs-tier4 >> - [x] ~5% improvement on SPECJbb2005 score (-Xint -XX:+UseParallelGC) > > Thank you! Thanks! @robehn @RealFYang ------------- PR Comment: https://git.openjdk.org/jdk/pull/21248#issuecomment-2387996811 From fjiang at openjdk.org Wed Oct 2 09:17:40 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 2 Oct 2024 09:17:40 GMT Subject: Integrated: 8341146: RISC-V: Unnecessary fences used for load-acquire in template interpreter In-Reply-To: <_FRFrY50v0MTOOsTdZDhqcXjPORwH-f2YrVOkjU6Y0Q=.de5f994c-febd-46bc-be07-cfd67ae7a85d@github.com> References: <_FRFrY50v0MTOOsTdZDhqcXjPORwH-f2YrVOkjU6Y0Q=.de5f994c-febd-46bc-be07-cfd67ae7a85d@github.com> Message-ID: On Sun, 29 Sep 2024 10:52:25 GMT, Feilong Jiang wrote: > Hi, please consider. > > RISC-V does not currently have plain load and store opcodes with aq or rl annotations, load-acquire and > store-release operations are implemented using fences instead. Initially, we followed the RISC-V spec > and placed FENCE RW,RW fence in front of load-acquire operation when porting the template interpreter. > The purpose is to enforce a store-release-to-load-acquire ordering (where there must be a FENCE RW,RW > between the store-release and load-acquire). 
But it turns out these fences are unnecessary for our use cases in the template interpreter. In fact, we only need to do a single FENCE R,RW after a normal memory load in order to implement a load-acquire operation. We should remove those unnecessary fences both for performance reasons and for consistency with the rest of the port (i.e., C1 and C2 JIT).
>
> Testing:
> - [x] JCstress
> - [x] hs-tier1 - hs-tier4
> - [x] ~5% improvement on SPECJbb2005 score (-Xint -XX:+UseParallelGC)

This pull request has now been integrated.

Changeset: a4ca6267
Author: Feilong Jiang
URL: https://git.openjdk.org/jdk/commit/a4ca6267e17815153f8fa119db19b97b1da2bd84
Stats: 9 lines in 1 file changed: 0 ins; 9 del; 0 mod

8341146: RISC-V: Unnecessary fences used for load-acquire in template interpreter

Reviewed-by: fyang, rehn

-------------

PR: https://git.openjdk.org/jdk/pull/21248

From mli at openjdk.org Wed Oct 2 10:15:54 2024
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 2 Oct 2024 10:15:54 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27]
In-Reply-To:
References:
Message-ID:

On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>>
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>>
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>>
>> ## Summary of the Changes
>>
>> ### Platform-Independent Changes (`src/hotspot/share`)
>>
>> These consist mainly of:
>>
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>>
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>>
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>>
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>>
>> #### ADL Changes
>>
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>>
>> #### `G1BarrierSetAssembler` Changes
>>
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
> > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port refactor > - Remove temporary support code > - Merge jdk-24+17 > - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization > - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions > - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes > - Merge jdk-24+16 > - Ensure that detected encode-and-store patterns are matched > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - ... and 43 more: https://git.openjdk.org/jdk/compare/9566d51f...14483b83 Hi, have some comments on riscv part code. I'm not sure if the same comments also apply to other code, please have a look if necessary. src/hotspot/cpu/riscv/gc/g1/g1_riscv.ad line 55: > 53: } > 54: for (RegSetIterator reg = no_preserve.begin(); *reg != noreg; ++reg) { > 55: stub->dont_preserve(*reg); Could `no_preserve` and `preserve` overlap? If false, then seems it's not necessary to do `dont_preserve` for `no_preserve` If true, seems it's not safe to `dont_preserve` these regs? I'm not sure. src/hotspot/cpu/riscv/gc/g1/g1_riscv.ad line 169: > 167: predicate(UseG1GC && n->as_LoadStore()->barrier_data() != 0); > 168: match(Set res (CompareAndExchangeP mem (Binary oldval newval))); > 169: effect(TEMP res, TEMP tmp1, TEMP tmp2); should `res` be `TEMP_DEF`?
src/hotspot/cpu/riscv/gc/g1/g1_riscv.ad line 201: > 199: predicate(UseG1GC && needs_acquiring_load_reserved(n) && n->as_LoadStore()->barrier_data() != 0); > 200: match(Set res (CompareAndExchangeP mem (Binary oldval newval))); > 201: effect(TEMP res, TEMP tmp1, TEMP tmp2); should `res` be `TEMP_DEF`? src/hotspot/cpu/riscv/gc/g1/g1_riscv.ad line 233: > 231: predicate(UseG1GC && n->as_LoadStore()->barrier_data() != 0); > 232: match(Set res (CompareAndExchangeN mem (Binary oldval newval))); > 233: effect(TEMP res, TEMP tmp1, TEMP tmp2, TEMP tmp3); should `res` be `TEMP_DEF`? src/hotspot/cpu/riscv/gc/g1/g1_riscv.ad line 263: > 261: predicate(UseG1GC && needs_acquiring_load_reserved(n) && n->as_LoadStore()->barrier_data() != 0); > 262: match(Set res (CompareAndExchangeN mem (Binary oldval newval))); > 263: effect(TEMP res, TEMP tmp1, TEMP tmp2, TEMP tmp3); should `res` be `TEMP_DEF`? And same comment for following instructs? ------------- PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2342455263 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784240549 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784209154 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784210589 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784211728 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784212185 From thartmann at openjdk.org Wed Oct 2 10:44:38 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Oct 2024 10:44:38 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert [v2] In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 13:22:23 GMT, Roland Westrelin wrote: >> When converting a `LongCountedLoop` into a loop nest, c2 needs jvm >> state to add predicates to the inner loop. For that, it peels an >> iteration of the loop and uses the state of the safepoint at the end >> of the loop. 
That's only legal if there's no side effect between the >> safepoint and the backedge that goes back into the loop. The assert >> failure here happens in code that checks that. >> >> That code compares the memory states at the safepoint and at the >> backedge. If they are the same then there's no side effect. To check >> consistency, the `MergeMem` at the safepoint is cloned. As the logic >> iterates over the backedge state, it clears every component of the >> state it encounters from the `MergeMem`. Once done, the cloned >> `MergeMem` should be "empty". In the case of this failure, no side >> effect is found but the cloned `MergeMem` is not empty. That happens >> because of EA: it adds edges to the `MergeMem` at the safepoint that >> it doesn't add to the backedge `Phis`. >> >> So it's the verification code that fails and I propose dealing with >> this by ignoring memory state added by EA in the verification code. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8336702 > - test indentation > - fix & test Looks good to me. Testing passed. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21009#pullrequestreview-2342590569 From thartmann at openjdk.org Wed Oct 2 10:44:39 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Oct 2024 10:44:39 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert [v2] In-Reply-To: <4cGhoDWjhLY3K9PY5CydEPh0mdwDn6EPVMAWQWU4U3M=.60c7cf50-32ca-4bf4-8d53-13c1ae5dabac@github.com> References: <85C55cSsG9xUcs6GV3nCiq-idMJhEOywFN3atCFOO78=.53e95ff8-e9f4-4c86-a97d-5bff2cb009d3@github.com> <4cGhoDWjhLY3K9PY5CydEPh0mdwDn6EPVMAWQWU4U3M=.60c7cf50-32ca-4bf4-8d53-13c1ae5dabac@github.com> Message-ID: On Tue, 1 Oct 2024 09:40:14 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopnode.cpp line 708: >> >>> 706: for (MergeMemStream mms(mem->as_MergeMem()); mms.next_non_empty(); ) { >>> 707: // Loop invariant memory state won't be reset by no_side_effect_since_safepoint(). Do it here. >>> 708: // Escape Analysis can add state to mm that it doesn't add to the backedge memory Phis, breaking verification >> >> Where exactly does that happen in EA? > > When an allocation is non escaping and made scalar replaceable, new slices are allocated for the fields of the allocation and the memory graph is updated so allocation/stores/loads to the new slices are connected together. In the process, `MergeMem` nodes need to be updated as well. In this case, I'm not sure this particular `MergeMem` node needs to be updated by EA but it's harmless in any case. The verification code doesn't expect "more" state to be recorded at the safepoint because of the `MergeMem` than at the backedge. Okay, thanks for the details! 
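
For readers following the thread, the verification being discussed can be modelled roughly as below. This is only a toy sketch under stated assumptions: `ToyMemState`, `all_memory_state_processed` and the explicit `ea_added_slices` argument are hypothetical names invented for illustration, and the real code walks `MergeMem` and `Phi` nodes rather than maps. It shows the cloned safepoint state being cleared slice by slice as the backedge state is visited, and why slices that only EA added to the safepoint `MergeMem` have to be ignored for the "everything processed" check to hold.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical model: alias slice index -> id of the memory state node on that slice.
using ToyMemState = std::map<int, int>;

// Returns true if every slice recorded at the safepoint has been accounted for,
// either because the backedge state carries it or because it is one of the
// slices assumed (for this sketch) to have been added by Escape Analysis.
bool all_memory_state_processed(const ToyMemState& safepoint,
                                const ToyMemState& backedge,
                                const std::set<int>& ea_added_slices) {
  ToyMemState remaining = safepoint;  // stands in for cloning the MergeMem
  for (const auto& entry : backedge) {
    remaining.erase(entry.first);     // clear each slice seen on the backedge
  }
  for (int slice : ea_added_slices) {
    remaining.erase(slice);           // ignore state that only EA recorded
  }
  return remaining.empty();
}
```

With a safepoint state that has one extra EA-only slice, the check fails when the EA set is empty and passes once that slice is listed, which matches the shape of the assert failure and of the proposed fix.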
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21009#discussion_r1784285660 From thartmann at openjdk.org Wed Oct 2 11:02:36 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Oct 2024 11:02:36 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v12] In-Reply-To: <0Ce3kXIzVDXuzpKGBzekEEOuJftpvkCvLwDkVOtpPR0=.ec2c1c1b-2b78-497d-97d3-a613ef736d2f@github.com> References: <0Ce3kXIzVDXuzpKGBzekEEOuJftpvkCvLwDkVOtpPR0=.ec2c1c1b-2b78-497d-97d3-a613ef736d2f@github.com> Message-ID: <_6h1VZCWQ25jOovnzdnQkR1OljZGcmx7SEY7ezhGE-g=.8805d48d-01f1-4f89-b396-4f7660919d6a@github.com> On Tue, 24 Sep 2024 16:53:51 GMT, Daniel Skantz wrote: >> This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). >> >> We check two invariants. >> a) Bailouts should be successful starting from any given `failing()` check. >> b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). >> >> a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. >> >> The added flag should not have any effect in debug mode. >> >> Testing: >> >> T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > left over Great work Daniel! The changes look good to me. 
------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19646#pullrequestreview-2342618110 From roland at openjdk.org Wed Oct 2 11:26:47 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 2 Oct 2024 11:26:47 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop Message-ID: The patch includes 2 test cases for this: test1() causes the assert failure in the bug description, test2() causes an incorrect execution where a load floats above a store that it should be dependent on. In the test cases, `field` is accessed on object `a` of type `A`. When the field is accessed, the type that c2 has for `a` is `A` with interface `I`. The holder of the field is class `A` which implements no interface. The reason the type of `a` and the type of the holder are slightly different is because `a` is the result of a merge of objects of subclasses `B` and `C` which implements `I`. The root cause of the bug is that `Compile::flatten_alias_type()` doesn't change `A` + interface `I` into `A`, the actual holder of the field. So `field` in `A` + interface `I` and `field` in `A` get different slices which is wrong. At parse time, the logic that creates the `Store` node uses: C->alias_type(field)->adr_type() to compute the slice which is the slice for `field` in `A`. So the slice used at parse time is the right one but during igvn, when the slice is computed from the input address, a different slice (the one for `A` + interface `I`) is used. That causes load/store nodes when they are processed by igvn to use the wrong memory state. 
In `Compile::flatten_alias_type()`: if (!ik->equals(canonical_holder) || tj->offset() != offset) { if( is_known_inst ) { tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); } else { tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); } } only flattens the type if it's not the canonical holder but it should test that the type doesn't implement interfaces that the canonical holder doesn't. To keep the logic simple, the fix I propose creates a new type whenever there's a chance that a type implements extra interfaces (the type is not exact). I also added asserts in `GraphKit::make_load()` and `GraphKit::store_to_memory()` to make sure the slice that is passed and the address type agree. Those asserts fire with the new test cases. When running testing, I found that they also catch a few cases in `library_call.cpp` where an incorrect slice is passed. As further clean up, maybe we want to drop the slice argument to `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to their callers) given it's redundant with the address type and error prone. 
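
The slice mismatch described above can be illustrated with a small toy model. Everything here is an assumption made for illustration: `ToyAddrType`, `ToyAliasTable` and the two flatten helpers are hypothetical and do not correspond to HotSpot's `TypeInstPtr` or alias table APIs. The sketch only shows that a flattening step which leaves `A` + interface `I` untouched yields a different slice than `A`, while one that always rebuilds the type from the canonical holder yields the same slice.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <tuple>

// Hypothetical stand-in for an address type: holder class, extra interfaces, field offset.
struct ToyAddrType {
  std::string holder;
  std::set<std::string> interfaces;
  int offset;
  bool operator<(const ToyAddrType& o) const {
    return std::tie(holder, interfaces, offset) <
           std::tie(o.holder, o.interfaces, o.offset);
  }
};

// Hypothetical alias table: one slice index per distinct flattened type.
class ToyAliasTable {
  std::map<ToyAddrType, int> _slices;
 public:
  int slice_for(const ToyAddrType& t) {
    auto it = _slices.find(t);
    if (it != _slices.end()) return it->second;
    int idx = static_cast<int>(_slices.size());
    _slices.emplace(t, idx);
    return idx;
  }
};

// Toy version of the problematic shape: the type is only rebuilt when the
// holder differs, so extra interfaces on the canonical holder survive.
ToyAddrType flatten_holder_only(ToyAddrType t, const std::string& canonical_holder) {
  if (t.holder != canonical_holder) {
    t.holder = canonical_holder;
    t.interfaces.clear();
  }
  return t;
}

// Toy version of the fixed shape: always rebuild from the canonical holder,
// dropping any interfaces the holder itself does not implement.
ToyAddrType flatten_drop_interfaces(ToyAddrType t, const std::string& canonical_holder) {
  t.holder = canonical_holder;
  t.interfaces.clear();
  return t;
}
```

In this model the parse-time type (`A`, no interfaces) and the igvn-time type (`A` + `I`) map to two slices under `flatten_holder_only` and to one slice under `flatten_drop_interfaces`. The actual fix is narrower than the toy: it creates a new type only when the type is not exact.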
------------- Commit messages: - test cleanup - fix & test Changes: https://git.openjdk.org/jdk/pull/21303/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21303&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340214 Stats: 121 lines in 4 files changed: 109 ins; 1 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/21303.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21303/head:pull/21303 PR: https://git.openjdk.org/jdk/pull/21303 From thartmann at openjdk.org Wed Oct 2 11:29:40 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Oct 2024 11:29:40 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v19] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Mon, 30 Sep 2024 13:36:19 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > update comments in TestParallelIvInIntCountedLoop.java Changes requested by thartmann (Reviewer). test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 315: > 313: } > 314: > 315: return a; Shouldn't there also be tests for the `int a` `long i` variant? test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 319: > 317: > 318: private static void testCorrectness() { > 319: Random rng = new Random(); You should use `Utils.getRandomInstance()` instead which logs the seed for better reproducibility. 
Also add `@key randomness` to the test header. test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 321: > 319: Random rng = new Random(); > 320: > 321: // Since we can't easily determined expected values if loop varibles overflow, we make sure i is less than (MAX_VALUE - stride). Suggestion: // Since we can't easily determine expected values if loop variables overflow, we make sure i is less than (MAX_VALUE - stride). test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 325: > 323: > 324: for (int i : iterations) { > 325: Asserts.assertEQ(i, testIntCountedLoopWithIntIV(i)); Code in this loop is not guaranteed to be even C2 compiled because IR verification will be executed in a separate VM. IR framework tests that also want to verify the output, should be written like this: https://github.com/openjdk/jdk/blob/9bd478593cc92a716151d1373f3426f1d92143bb/test/hotspot/jtreg/testlibrary_tests/ir_framework/examples/CustomRunTestExample.java#L84-L97 ------------- PR Review: https://git.openjdk.org/jdk/pull/18489#pullrequestreview-2342647468 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1784324768 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1784331782 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1784325142 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1784333260 From rcastanedalo at openjdk.org Wed Oct 2 11:42:52 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 2 Oct 2024 11:42:52 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: <7L7jYDlFa0WnVvgiyNHI9KZrcffYwNnBB899AuMS56Q=.40b031e7-07b8-4a15-b319-c53b38a17a49@github.com> On Wed, 2 Oct 2024 10:10:12 GMT, Hamlin Li wrote: >> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: >> >> - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion >> - riscv port refactor >> - Remove temporary support code >> - Merge jdk-24+17 >> - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization >> - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions >> - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes >> - Merge jdk-24+16 >> - Ensure that detected encode-and-store patterns are matched >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - ... and 43 more: https://git.openjdk.org/jdk/compare/486c5b0d...14483b83 > > src/hotspot/cpu/riscv/gc/g1/g1_riscv.ad line 55: > >> 53: } >> 54: for (RegSetIterator reg = no_preserve.begin(); *reg != noreg; ++reg) { >> 55: stub->dont_preserve(*reg); > > Could `no_preserve` and `preserve` overlap? > If false, then seems it's not necessary to do `dont_preserve` for `no_preserve` > If true, seems it's not safe to `dont_preserve` these regs? I'm not sure. In the G1 case, the use of `dont_preserve` is an optimization to avoid spilling and reloading, in the slow path of the pre-barrier, registers (`res`) that are not live at that point. It is not necessary for correctness, but saves a few bytes in the generated code. If `res` was not marked as `dont_preserve`, it would be included in the pre-barrier stub's preserve set (`BarrierStubC2::preserve_set()`) because it is live out of the entire AD instruction (as computed by `BarrierSetC2::compute_liveness_at_stubs()`). 
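
The interplay between liveness and `dont_preserve` explained above can be sketched with a toy model. This is an illustration under assumptions, not the HotSpot implementation: `ToyBarrierStub` and its string-named registers are hypothetical, and the real `BarrierStubC2` works on register masks computed by the compiler. The sketch shows that a register which is live out of the instruction lands in the preserve set by default, and that marking it `dont_preserve` removes it, so an overlap between the two sets is exactly the case the call is meant for.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical model of a barrier stub's register bookkeeping.
class ToyBarrierStub {
  std::set<std::string> _live;           // registers live across the instruction
  std::set<std::string> _dont_preserve;  // registers explicitly excluded from saving
 public:
  explicit ToyBarrierStub(std::set<std::string> live) : _live(std::move(live)) {}

  void dont_preserve(const std::string& reg) { _dont_preserve.insert(reg); }

  // Registers the slow path would spill and reload: live registers minus exclusions.
  std::set<std::string> preserve_set() const {
    std::set<std::string> result;
    for (const std::string& reg : _live) {
      if (_dont_preserve.count(reg) == 0) {
        result.insert(reg);
      }
    }
    return result;
  }
};
```

In the model, `res` starts out in the preserve set because it is live, and drops out once `dont_preserve("res")` is called, which corresponds to the code-size saving described in the reply.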
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784346898 From rcastanedalo at openjdk.org Wed Oct 2 11:53:52 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 2 Oct 2024 11:53:52 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Wed, 2 Oct 2024 09:58:29 GMT, Hamlin Li wrote: >> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: >> >> - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion >> - riscv port refactor >> - Remove temporary support code >> - Merge jdk-24+17 >> - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization >> - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions >> - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes >> - Merge jdk-24+16 >> - Ensure that detected encode-and-store patterns are matched >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - ... and 43 more: https://git.openjdk.org/jdk/compare/0dc16d16...14483b83 > > src/hotspot/cpu/riscv/gc/g1/g1_riscv.ad line 169: > >> 167: predicate(UseG1GC && n->as_LoadStore()->barrier_data() != 0); >> 168: match(Set res (CompareAndExchangeP mem (Binary oldval newval))); >> 169: effect(TEMP res, TEMP tmp1, TEMP tmp2); > > should `res` be `TEMP_DEF`? It could, but the effect would be the same (see [JDK-8058880](https://bugs.openjdk.org/browse/JDK-8058880)). I went with `TEMP` for the x64 and aarch64 platforms for consistency with the analogous ZGC ADL code, see e.g.
https://github.com/openjdk/jdk/blob/855c8a7def21025bc2fc47594f7285a55924c213/src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad#L182-L204. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784358586 From chagedorn at openjdk.org Wed Oct 2 12:00:44 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Oct 2024 12:00:44 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v12] In-Reply-To: <0Ce3kXIzVDXuzpKGBzekEEOuJftpvkCvLwDkVOtpPR0=.ec2c1c1b-2b78-497d-97d3-a613ef736d2f@github.com> References: <0Ce3kXIzVDXuzpKGBzekEEOuJftpvkCvLwDkVOtpPR0=.ec2c1c1b-2b78-497d-97d3-a613ef736d2f@github.com> Message-ID: <2F_C30zYunRTYqFh4cphJcHHyosVVyiKjESHiBGjRlE=.b7f5f6a5-9848-4972-8a7d-ccf38c42be7d@github.com> On Tue, 24 Sep 2024 16:53:51 GMT, Daniel Skantz wrote: >> This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). >> >> We check two invariants. >> a) Bailouts should be successful starting from any given `failing()` check. >> b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). >> >> a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. >> >> The added flag should not have any effect in debug mode. >> >> Testing: >> >> T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. 
> > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > left over Nice stress mode! Some small comments but otherwise, looks good to me, too. src/hotspot/share/opto/compile.hpp line 792: > 790: > 791: #ifdef ASSERT > 792: bool phase_verify_ideal_loop() { return _phase_verify_ideal_loop; } can be made `const`: Suggestion: bool phase_verify_ideal_loop() const { return _phase_verify_ideal_loop; } src/hotspot/share/opto/compile.hpp line 838: > 836: const CompilationFailureInfo* first_failure_details() const { return _first_failure_details; } > 837: > 838: bool failing(DEBUG_ONLY(bool no_stress_bailout = false)) { It's somehow difficult to read what `failing(false/true)` now exactly mean. When having `failing(true)`, don't we get the same behavior as if we call `failing_internal()`? If `failing_internal()` is false, then we would only return false because we are not entering if (StressBailout && !no_stress_bailout) { return fail_randomly(); } So, I'm wondering if we cannot just use `failing_internal()` instead of `failing(true)` and remove the parameter completely? src/hotspot/share/opto/compile.hpp line 843: > 841: } > 842: #ifdef ASSERT > 843: // Disable stress code for PhaseIdealLoop verification Can you expand the comment here and add the reason why? From a comment above, you mentioned that it is not easy to make it work. I guess it's fine to just mention that here. 
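
The question about `failing(true)` versus `failing_internal()` can be checked against a simplified model of the control flow quoted in the review. The class below is a hypothetical sketch: `ToyCompile`, `set_stress` and the deterministic `fail_randomly` are assumptions made so the behaviour can be asserted, and the real `Compile` class records failure details and draws from a random source. Under those assumptions, `failing(true)` never reaches the stress branch and therefore returns exactly what `failing_internal()` returns.

```cpp
#include <cassert>

// Hypothetical, simplified model of the control flow under discussion.
class ToyCompile {
  bool _failed = false;          // a real bailout has been recorded
  bool _stress_bailout = false;  // stands in for the StressBailout flag
  bool _next_random = false;     // stands in for the random stress decision
 public:
  void record_failure() { _failed = true; }
  void set_stress(bool enabled, bool next_random) {
    _stress_bailout = enabled;
    _next_random = next_random;
  }

  bool failing_internal() const { return _failed; }

  // Deterministic placeholder for the random decision (assumption for testing).
  bool fail_randomly() const { return _next_random; }

  bool failing(bool no_stress_bailout = false) const {
    if (failing_internal()) {
      return true;
    }
    if (_stress_bailout && !no_stress_bailout) {
      return fail_randomly();
    }
    return false;
  }
};
```

In this model the parameter only ever suppresses the stress branch, which supports the suggestion that callers passing `true` could call `failing_internal()` directly.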
------------- PR Review: https://git.openjdk.org/jdk/pull/19646#pullrequestreview-2342682921 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1784348743 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1784363994 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1784351478 From chagedorn at openjdk.org Wed Oct 2 12:33:38 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Oct 2024 12:33:38 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop In-Reply-To: References: Message-ID: On Wed, 2 Oct 2024 11:21:43 GMT, Roland Westrelin wrote: > The patch includes 2 test cases for this: test1() causes the assert > failure in the bug description, test2() causes an incorrect execution > where a load floats above a store that it should be dependent on. > > In the test cases, `field` is accessed on object `a` of type `A`. When > the field is accessed, the type that c2 has for `a` is `A` with > interface `I`. The holder of the field is class `A` which implements > no interface. The reason the type of `a` and the type of the holder > are slightly different is because `a` is the result of a merge of > objects of subclasses `B` and `C` which implements `I`. > > The root cause of the bug is that `Compile::flatten_alias_type()` > doesn't change `A` + interface `I` into `A`, the actual holder of the > field. So `field` in `A` + interface `I` and `field` in `A` get > different slices which is wrong. At parse time, the logic that creates > the `Store` node uses: > > > C->alias_type(field)->adr_type() > > > to compute the slice which is the slice for `field` in `A`. So the > slice used at parse time is the right one but during igvn, when the > slice is computed from the input address, a different slice (the one > for `A` + interface `I`) is used. That causes load/store nodes when > they are processed by igvn to use the wrong memory state. 
> > In `Compile::flatten_alias_type()`: > > > if (!ik->equals(canonical_holder) || tj->offset() != offset) { > if( is_known_inst ) { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); > } else { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); > } > } > > > only flattens the type if it's not the canonical holder but it should > test that the type doesn't implement interfaces that the canonical > holder doesn't. To keep the logic simple, the fix I propose creates a > new type whenever there's a chance that a type implements extra > interfaces (the type is not exact). > > I also added asserts in `GraphKit::make_load()` and > `GraphKit::store_to_memory()` to make sure the slice that is passed > and the address type agree. Those asserts fire with the new test > cases. When running testing, I found that they also catch a few cases > in `library_call.cpp` where an incorrect slice is passed. > > As further clean up, maybe we want to drop the slice argument to > `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to > their callers) given it's redundant with the address type and error > prone. Looks reasonable. This assert has proven to be quite valuable to find problems in the memory graph that we would otherwise miss. It was also one of the few assert that triggered when having a corrupted graph due to missing Assertion Predicates. I'm wondering if we need more such memory graph checks in general. Anyway, that's just a thought for some future RFE. 
src/hotspot/share/opto/compile.cpp line 1468: > 1466: ciInstanceKlass *canonical_holder = ik->get_canonical_holder(offset); > 1467: assert(offset < canonical_holder->layout_helper_size_in_bytes(), ""); > 1468: assert(tj->offset() == offset, "not change to offset expected"); Suggestion: assert(tj->offset() == offset, "no change to offset expected"); test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java line 56: > 54: A a; > 55: if (flag) { > 56: a = b; Indentation is off: Suggestion: a = b; ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21303#pullrequestreview-2342747463 PR Review Comment: https://git.openjdk.org/jdk/pull/21303#discussion_r1784414779 PR Review Comment: https://git.openjdk.org/jdk/pull/21303#discussion_r1784400108 From thartmann at openjdk.org Wed Oct 2 12:33:39 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Oct 2024 12:33:39 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop In-Reply-To: References: Message-ID: On Wed, 2 Oct 2024 11:21:43 GMT, Roland Westrelin wrote: > The patch includes 2 test cases for this: test1() causes the assert > failure in the bug description, test2() causes an incorrect execution > where a load floats above a store that it should be dependent on. > > In the test cases, `field` is accessed on object `a` of type `A`. When > the field is accessed, the type that c2 has for `a` is `A` with > interface `I`. The holder of the field is class `A` which implements > no interface. The reason the type of `a` and the type of the holder > are slightly different is because `a` is the result of a merge of > objects of subclasses `B` and `C` which implements `I`. > > The root cause of the bug is that `Compile::flatten_alias_type()` > doesn't change `A` + interface `I` into `A`, the actual holder of the > field. 
So `field` in `A` + interface `I` and `field` in `A` get > different slices which is wrong. At parse time, the logic that creates > the `Store` node uses: > > > C->alias_type(field)->adr_type() > > > to compute the slice which is the slice for `field` in `A`. So the > slice used at parse time is the right one but during igvn, when the > slice is computed from the input address, a different slice (the one > for `A` + interface `I`) is used. That causes load/store nodes when > they are processed by igvn to use the wrong memory state. > > In `Compile::flatten_alias_type()`: > > > if (!ik->equals(canonical_holder) || tj->offset() != offset) { > if( is_known_inst ) { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); > } else { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); > } > } > > > only flattens the type if it's not the canonical holder but it should > test that the type doesn't implement interfaces that the canonical > holder doesn't. To keep the logic simple, the fix I propose creates a > new type whenever there's a chance that a type implements extra > interfaces (the type is not exact). > > I also added asserts in `GraphKit::make_load()` and > `GraphKit::store_to_memory()` to make sure the slice that is passed > and the address type agree. Those asserts fire with the new test > cases. When running testing, I found that they also catch a few cases > in `library_call.cpp` where an incorrect slice is passed. > > As further clean up, maybe we want to drop the slice argument to > `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to > their callers) given it's redundant with the address type and error > prone. Looks reasonable to me. I submitted testing and will report back once it passed. 
> As further clean up, maybe we want to drop the slice argument to GraphKit::make_load() and GraphKit::store_to_memory() (and to their callers) given it's redundant with the address type and error prone. Yes, let's do that. Please file a starter RFE. src/hotspot/share/opto/compile.cpp line 1468: > 1466: ciInstanceKlass *canonical_holder = ik->get_canonical_holder(offset); > 1467: assert(offset < canonical_holder->layout_helper_size_in_bytes(), ""); > 1468: assert(tj->offset() == offset, "not change to offset expected"); Suggestion: assert(tj->offset() == offset, "no change to offset expected"); src/hotspot/share/opto/compile.cpp line 1475: > 1473: assert(tj == TypeInstPtr::make(to->ptr(), canonical_holder, is_known_inst, nullptr, offset, instance_id), "exact type should be canonical type"); > 1474: } else { > 1475: assert(xk || !is_known_inst, "Known instance should be exact type"); Maybe add a comment here and explain the two cases when we create a new type. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21303#pullrequestreview-2342672361 PR Review Comment: https://git.openjdk.org/jdk/pull/21303#discussion_r1784341679 PR Review Comment: https://git.openjdk.org/jdk/pull/21303#discussion_r1784422309 From mli at openjdk.org Wed Oct 2 12:57:48 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 2 Oct 2024 12:57:48 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: <7L7jYDlFa0WnVvgiyNHI9KZrcffYwNnBB899AuMS56Q=.40b031e7-07b8-4a15-b319-c53b38a17a49@github.com> References: <7L7jYDlFa0WnVvgiyNHI9KZrcffYwNnBB899AuMS56Q=.40b031e7-07b8-4a15-b319-c53b38a17a49@github.com> Message-ID: On Wed, 2 Oct 2024 11:40:18 GMT, Roberto Castañeda Lozano wrote: > If `res` was not marked as `dont_preserve`, it would be included in the pre-barrier stub's preserve set (`BarrierStubC2::preserve_set()`) because it is live out of the entire AD instruction (as computed by `BarrierSetC2::compute_liveness_at_stubs()`). Thanks for explanation! I did not realize this, if that's the case, then it's good. >> src/hotspot/cpu/riscv/gc/g1/g1_riscv.ad line 169: >> >>> 167: predicate(UseG1GC && n->as_LoadStore()->barrier_data() != 0); >>> 168: match(Set res (CompareAndExchangeP mem (Binary oldval newval))); >>> 169: effect(TEMP res, TEMP tmp1, TEMP tmp2); >> >> should `res` be `TEMP_DEF`? > > It could, but the effect would be the same (see [JDK-8058880](https://bugs.openjdk.org/browse/JDK-8058880)). I went with `TEMP` for the x64 and aarch64 platforms for consistency with the analogous ZGC ADL code, see e.g. https://github.com/openjdk/jdk/blob/855c8a7def21025bc2fc47594f7285a55924c213/src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad#L182-L204. I saw the riscv one in z_riscv.ad is: `effect(TEMP oldval_tmp, TEMP newval_tmp, TEMP tmp1, TEMP_DEF res);`, maybe it's good to change riscv one?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784479784 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1784479526 From roland at openjdk.org Wed Oct 2 13:02:11 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 2 Oct 2024 13:02:11 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v2] In-Reply-To: References: Message-ID: <9prDCh5_yHkuEwmfeUfE_v8AZch3DkQjBkRXMIqy820=.85e297f0-6589-4731-a825-7665d26af08b@github.com> > The patch includes 2 test cases for this: test1() causes the assert > failure in the bug description, test2() causes an incorrect execution > where a load floats above a store that it should be dependent on. > > In the test cases, `field` is accessed on object `a` of type `A`. When > the field is accessed, the type that c2 has for `a` is `A` with > interface `I`. The holder of the field is class `A` which implements > no interface. The reason the type of `a` and the type of the holder > are slightly different is because `a` is the result of a merge of > objects of subclasses `B` and `C` which implements `I`. > > The root cause of the bug is that `Compile::flatten_alias_type()` > doesn't change `A` + interface `I` into `A`, the actual holder of the > field. So `field` in `A` + interface `I` and `field` in `A` get > different slices which is wrong. At parse time, the logic that creates > the `Store` node uses: > > > C->alias_type(field)->adr_type() > > > to compute the slice which is the slice for `field` in `A`. So the > slice used at parse time is the right one but during igvn, when the > slice is computed from the input address, a different slice (the one > for `A` + interface `I`) is used. That causes load/store nodes when > they are processed by igvn to use the wrong memory state. 
>
> In `Compile::flatten_alias_type()`:
>
>     if (!ik->equals(canonical_holder) || tj->offset() != offset) {
>       if( is_known_inst ) {
>         tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id());
>       } else {
>         tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset);
>       }
>     }
>
> only flattens the type if it's not the canonical holder but it should
> test that the type doesn't implement interfaces that the canonical
> holder doesn't. To keep the logic simple, the fix I propose creates a
> new type whenever there's a chance that a type implements extra
> interfaces (the type is not exact).
>
> I also added asserts in `GraphKit::make_load()` and
> `GraphKit::store_to_memory()` to make sure the slice that is passed
> and the address type agree. Those asserts fire with the new test
> cases. When running testing, I found that they also catch a few cases
> in `library_call.cpp` where an incorrect slice is passed.
>
> As further clean up, maybe we want to drop the slice argument to
> `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to
> their callers) given it's redundant with the address type and error
> prone.
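For readers who want a concrete picture of the class shape described in this mail, the following is a minimal Java sketch. It is an assumption-laden reduction, not the actual `TestBadMemSliceWithInterfaces` test: the names `MergedReceiverSketch` and `storeThenLoad` are hypothetical, and running it does not by itself reproduce the C2 failure. It only shows the merge of `B` and `C` (both implementing `I`) into a reference of type `A`, followed by a store and a dependent load of `field`.

```java
// Hypothetical reduction of the scenario in the PR description.
// Class and method names are illustrative, not taken from the real test.
final class MergedReceiverSketch {
    interface I {}

    // Holder of `field`; implements no interface.
    static class A { int field; }

    // Both subclasses implement I, so a merge of B and C can be typed
    // as "A with interface I" by the compiler.
    static class B extends A implements I {}
    static class C extends A implements I {}

    static int storeThenLoad(boolean flag, B b, C c, int value) {
        A a;
        if (flag) {
            a = b;
        } else {
            a = c;
        }
        a.field = value;   // store through the merged reference
        return a.field;    // this load must observe the store above
    }

    public static void main(String[] args) {
        B b = new B();
        C c = new C();
        // Loop so that a JIT compiler has a chance to compile the method.
        for (int i = 0; i < 20_000; i++) {
            int got = storeThenLoad((i & 1) == 0, b, c, i);
            if (got != i) {
                throw new AssertionError("load did not observe store at i=" + i);
            }
        }
        System.out.println("ok");
    }
}
```

Whatever slice the compiler assigns to the store, a correct compilation has to return the value just written, which is the property `test2()` is described as checking.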
Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/compile.cpp Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21303/files - new: https://git.openjdk.org/jdk/pull/21303/files/09f2e987..913a82b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21303&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21303&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21303.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21303/head:pull/21303 PR: https://git.openjdk.org/jdk/pull/21303 From roland at openjdk.org Wed Oct 2 13:11:18 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 2 Oct 2024 13:11:18 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v3] In-Reply-To: References: Message-ID: <_VDcL5UcOCF1JkAVrpqVkOmJkHfxEbp0NxF1kZHTkXU=.23df893b-c0cf-4a0a-8c3a-49375e91e0f0@github.com> > The patch includes 2 test cases for this: test1() causes the assert > failure in the bug description, test2() causes an incorrect execution > where a load floats above a store that it should be dependent on. > > In the test cases, `field` is accessed on object `a` of type `A`. When > the field is accessed, the type that c2 has for `a` is `A` with > interface `I`. The holder of the field is class `A` which implements > no interface. The reason the type of `a` and the type of the holder > are slightly different is because `a` is the result of a merge of > objects of subclasses `B` and `C` which implements `I`. > > The root cause of the bug is that `Compile::flatten_alias_type()` > doesn't change `A` + interface `I` into `A`, the actual holder of the > field. 
So `field` in `A` + interface `I` and `field` in `A` get > different slices which is wrong. At parse time, the logic that creates > the `Store` node uses: > > > C->alias_type(field)->adr_type() > > > to compute the slice which is the slice for `field` in `A`. So the > slice used at parse time is the right one but during igvn, when the > slice is computed from the input address, a different slice (the one > for `A` + interface `I`) is used. That causes load/store nodes when > they are processed by igvn to use the wrong memory state. > > In `Compile::flatten_alias_type()`: > > > if (!ik->equals(canonical_holder) || tj->offset() != offset) { > if( is_known_inst ) { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); > } else { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); > } > } > > > only flattens the type if it's not the canonical holder but it should > test that the type doesn't implement interfaces that the canonical > holder doesn't. To keep the logic simple, the fix I propose creates a > new type whenever there's a chance that a type implements extra > interfaces (the type is not exact). > > I also added asserts in `GraphKit::make_load()` and > `GraphKit::store_to_memory()` to make sure the slice that is passed > and the address type agree. Those asserts fire with the new test > cases. When running testing, I found that they also catch a few cases > in `library_call.cpp` where an incorrect slice is passed. > > As further clean up, maybe we want to drop the slice argument to > `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to > their callers) given it's redundant with the address type and error > prone. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - review - Merge branch 'master' into JDK-8340214 - Update src/hotspot/share/opto/compile.cpp Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java Co-authored-by: Christian Hagedorn - test cleanup - fix & test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21303/files - new: https://git.openjdk.org/jdk/pull/21303/files/913a82b4..46042b26 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21303&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21303&range=01-02 Stats: 3240 lines in 131 files changed: 2545 ins; 329 del; 366 mod Patch: https://git.openjdk.org/jdk/pull/21303.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21303/head:pull/21303 PR: https://git.openjdk.org/jdk/pull/21303 From roland at openjdk.org Wed Oct 2 13:23:36 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 2 Oct 2024 13:23:36 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v3] In-Reply-To: References: Message-ID: <994Q512PXsbEDCXktORppesCubBHFpD0wcI4EnQDQdc=.9b6f8d01-139c-4b1e-b218-c2906a2eccd5@github.com> On Wed, 2 Oct 2024 12:30:30 GMT, Tobias Hartmann wrote: > > As further clean up, maybe we want to drop the slice argument to GraphKit::make_load() and GraphKit::store_to_memory() (and to their callers) given it's redundant with the address type and error prone. > > Yes, let's do that. Please file a starter RFE. https://bugs.openjdk.org/browse/JDK-8341411 > src/hotspot/share/opto/compile.cpp line 1475: > >> 1473: assert(tj == TypeInstPtr::make(to->ptr(), canonical_holder, is_known_inst, nullptr, offset, instance_id), "exact type should be canonical type"); >> 1474: } else { >> 1475: assert(xk || !is_known_inst, "Known instance should be exact type"); > > Maybe add a comment here and explain the two cases when we create a new type. 
Done in new commit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21303#issuecomment-2388631754 PR Review Comment: https://git.openjdk.org/jdk/pull/21303#discussion_r1784521453 From thartmann at openjdk.org Wed Oct 2 13:34:38 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Oct 2024 13:34:38 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v3] In-Reply-To: <_VDcL5UcOCF1JkAVrpqVkOmJkHfxEbp0NxF1kZHTkXU=.23df893b-c0cf-4a0a-8c3a-49375e91e0f0@github.com> References: <_VDcL5UcOCF1JkAVrpqVkOmJkHfxEbp0NxF1kZHTkXU=.23df893b-c0cf-4a0a-8c3a-49375e91e0f0@github.com> Message-ID: On Wed, 2 Oct 2024 13:11:18 GMT, Roland Westrelin wrote: >> The patch includes 2 test cases for this: test1() causes the assert >> failure in the bug description, test2() causes an incorrect execution >> where a load floats above a store that it should be dependent on. >> >> In the test cases, `field` is accessed on object `a` of type `A`. When >> the field is accessed, the type that c2 has for `a` is `A` with >> interface `I`. The holder of the field is class `A` which implements >> no interface. The reason the type of `a` and the type of the holder >> are slightly different is because `a` is the result of a merge of >> objects of subclasses `B` and `C` which implements `I`. >> >> The root cause of the bug is that `Compile::flatten_alias_type()` >> doesn't change `A` + interface `I` into `A`, the actual holder of the >> field. So `field` in `A` + interface `I` and `field` in `A` get >> different slices which is wrong. At parse time, the logic that creates >> the `Store` node uses: >> >> >> C->alias_type(field)->adr_type() >> >> >> to compute the slice which is the slice for `field` in `A`. So the >> slice used at parse time is the right one but during igvn, when the >> slice is computed from the input address, a different slice (the one >> for `A` + interface `I`) is used. 
That causes load/store nodes when >> they are processed by igvn to use the wrong memory state. >> >> In `Compile::flatten_alias_type()`: >> >> >> if (!ik->equals(canonical_holder) || tj->offset() != offset) { >> if( is_known_inst ) { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); >> } else { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); >> } >> } >> >> >> only flattens the type if it's not the canonical holder but it should >> test that the type doesn't implement interfaces that the canonical >> holder doesn't. To keep the logic simple, the fix I propose creates a >> new type whenever there's a chance that a type implements extra >> interfaces (the type is not exact). >> >> I also added asserts in `GraphKit::make_load()` and >> `GraphKit::store_to_memory()` to make sure the slice that is passed >> and the address type agree. Those asserts fire with the new test >> cases. When running testing, I found that they also catch a few cases >> in `library_call.cpp` where an incorrect slice is passed. >> >> As further clean up, maybe we want to drop the slice argument to >> `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to >> their callers) given it's redundant with th... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8340214 > - Update src/hotspot/share/opto/compile.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java > > Co-authored-by: Christian Hagedorn > - test cleanup > - fix & test Marked as reviewed by thartmann (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21303#pullrequestreview-2342966421 From thartmann at openjdk.org Wed Oct 2 13:34:39 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Oct 2024 13:34:39 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v3] In-Reply-To: <994Q512PXsbEDCXktORppesCubBHFpD0wcI4EnQDQdc=.9b6f8d01-139c-4b1e-b218-c2906a2eccd5@github.com> References: <994Q512PXsbEDCXktORppesCubBHFpD0wcI4EnQDQdc=.9b6f8d01-139c-4b1e-b218-c2906a2eccd5@github.com> Message-ID: On Wed, 2 Oct 2024 13:21:34 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/compile.cpp line 1475: >> >>> 1473: assert(tj == TypeInstPtr::make(to->ptr(), canonical_holder, is_known_inst, nullptr, offset, instance_id), "exact type should be canonical type"); >>> 1474: } else { >>> 1475: assert(xk || !is_known_inst, "Known instance should be exact type"); >> >> Maybe add a comment here and explain the two cases when we create a new type. > > Done in new commit. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21303#discussion_r1784537173 From chagedorn at openjdk.org Wed Oct 2 13:52:38 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Oct 2024 13:52:38 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v3] In-Reply-To: <_VDcL5UcOCF1JkAVrpqVkOmJkHfxEbp0NxF1kZHTkXU=.23df893b-c0cf-4a0a-8c3a-49375e91e0f0@github.com> References: <_VDcL5UcOCF1JkAVrpqVkOmJkHfxEbp0NxF1kZHTkXU=.23df893b-c0cf-4a0a-8c3a-49375e91e0f0@github.com> Message-ID: On Wed, 2 Oct 2024 13:11:18 GMT, Roland Westrelin wrote: >> The patch includes 2 test cases for this: test1() causes the assert >> failure in the bug description, test2() causes an incorrect execution >> where a load floats above a store that it should be dependent on. >> >> In the test cases, `field` is accessed on object `a` of type `A`. 
When >> the field is accessed, the type that c2 has for `a` is `A` with >> interface `I`. The holder of the field is class `A` which implements >> no interface. The reason the type of `a` and the type of the holder >> are slightly different is because `a` is the result of a merge of >> objects of subclasses `B` and `C` which implements `I`. >> >> The root cause of the bug is that `Compile::flatten_alias_type()` >> doesn't change `A` + interface `I` into `A`, the actual holder of the >> field. So `field` in `A` + interface `I` and `field` in `A` get >> different slices which is wrong. At parse time, the logic that creates >> the `Store` node uses: >> >> >> C->alias_type(field)->adr_type() >> >> >> to compute the slice which is the slice for `field` in `A`. So the >> slice used at parse time is the right one but during igvn, when the >> slice is computed from the input address, a different slice (the one >> for `A` + interface `I`) is used. That causes load/store nodes when >> they are processed by igvn to use the wrong memory state. >> >> In `Compile::flatten_alias_type()`: >> >> >> if (!ik->equals(canonical_holder) || tj->offset() != offset) { >> if( is_known_inst ) { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); >> } else { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); >> } >> } >> >> >> only flattens the type if it's not the canonical holder but it should >> test that the type doesn't implement interfaces that the canonical >> holder doesn't. To keep the logic simple, the fix I propose creates a >> new type whenever there's a chance that a type implements extra >> interfaces (the type is not exact). >> >> I also added asserts in `GraphKit::make_load()` and >> `GraphKit::store_to_memory()` to make sure the slice that is passed >> and the address type agree. Those asserts fire with the new test >> cases. 
When running testing, I found that they also catch a few cases >> in `library_call.cpp` where an incorrect slice is passed. >> >> As further clean up, maybe we want to drop the slice argument to >> `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to >> their callers) given it's redundant with th... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8340214 > - Update src/hotspot/share/opto/compile.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java > > Co-authored-by: Christian Hagedorn > - test cleanup > - fix & test Still good, one more minor thing. test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java line 68: > 66: A a; > 67: if (flag) { > 68: a = b; Suggestion: a = b; ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21303#pullrequestreview-2343020437 PR Review Comment: https://git.openjdk.org/jdk/pull/21303#discussion_r1784567122 From roland at openjdk.org Wed Oct 2 13:58:14 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 2 Oct 2024 13:58:14 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v4] In-Reply-To: References: Message-ID: > The patch includes 2 test cases for this: test1() causes the assert > failure in the bug description, test2() causes an incorrect execution > where a load floats above a store that it should be dependent on. > > In the test cases, `field` is accessed on object `a` of type `A`. When > the field is accessed, the type that c2 has for `a` is `A` with > interface `I`. The holder of the field is class `A` which implements > no interface. 
The reason the type of `a` and the type of the holder > are slightly different is because `a` is the result of a merge of > objects of subclasses `B` and `C` which implements `I`. > > The root cause of the bug is that `Compile::flatten_alias_type()` > doesn't change `A` + interface `I` into `A`, the actual holder of the > field. So `field` in `A` + interface `I` and `field` in `A` get > different slices which is wrong. At parse time, the logic that creates > the `Store` node uses: > > > C->alias_type(field)->adr_type() > > > to compute the slice which is the slice for `field` in `A`. So the > slice used at parse time is the right one but during igvn, when the > slice is computed from the input address, a different slice (the one > for `A` + interface `I`) is used. That causes load/store nodes when > they are processed by igvn to use the wrong memory state. > > In `Compile::flatten_alias_type()`: > > > if (!ik->equals(canonical_holder) || tj->offset() != offset) { > if( is_known_inst ) { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); > } else { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); > } > } > > > only flattens the type if it's not the canonical holder but it should > test that the type doesn't implement interfaces that the canonical > holder doesn't. To keep the logic simple, the fix I propose creates a > new type whenever there's a chance that a type implements extra > interfaces (the type is not exact). > > I also added asserts in `GraphKit::make_load()` and > `GraphKit::store_to_memory()` to make sure the slice that is passed > and the address type agree. Those asserts fire with the new test > cases. When running testing, I found that they also catch a few cases > in `library_call.cpp` where an incorrect slice is passed. 
> > As further clean up, maybe we want to drop the slice argument to > `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to > their callers) given it's redundant with the address type and error > prone. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21303/files - new: https://git.openjdk.org/jdk/pull/21303/files/46042b26..6cdd2337 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21303&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21303&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21303.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21303/head:pull/21303 PR: https://git.openjdk.org/jdk/pull/21303 From vlivanov at openjdk.org Wed Oct 2 18:34:45 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 2 Oct 2024 18:34:45 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v6] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 12:20:13 GMT, Jorn Vernee wrote: >> As discussed in the JBS issue: >> >> FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. 
So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` >> >> However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. >> >> The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. >> >>
>> Performance numbers
>> x64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>>
>> aarch64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>>
>> >> As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. >> >> Testing: tier 1-4 > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: > > - use resolve_global_jobject on s390 > - Merge branch 'master' into LoadVMTraget > - remove PC save/restore on s390 > - use fatal() > - add RISC-V as target platform > - Adjust ppc & RISC-V code > - Add s390 changes > - Merge branch 'master' into LoadVMTraget > - Don't save/restore LR/CR + resolve_jobject on s390 > - eyeball other platforms > - ... and 14 more: https://git.openjdk.org/jdk/compare/2faf8b8d...b703b162 Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20479#pullrequestreview-2343806028 From jvernee at openjdk.org Wed Oct 2 18:58:44 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 2 Oct 2024 18:58:44 GMT Subject: RFR: 8337753: Target class of upcall stub may be unloaded [v6] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 12:20:13 GMT, Jorn Vernee wrote: >> As discussed in the JBS issue: >> >> FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref. 
So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` >> >> However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. >> >> The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. >> >>
>> Performance numbers
>> x64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>>
>> aarch64:
>>
>> before:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>>
>> after:
>>
>> Benchmark             Mode  Cnt   Score   Error  Units
>> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>>
>> >> As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. >> >> Testing: tier 1-4 > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: > > - use resolve_global_jobject on s390 > - Merge branch 'master' into LoadVMTraget > - remove PC save/restore on s390 > - use fatal() > - add RISC-V as target platform > - Adjust ppc & RISC-V code > - Add s390 changes > - Merge branch 'master' into LoadVMTraget > - Don't save/restore LR/CR + resolve_jobject on s390 > - eyeball other platforms > - ... and 14 more: https://git.openjdk.org/jdk/compare/2faf8b8d...b703b162 Thanks for all the reviews! I will do one more round of testing before integrating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20479#issuecomment-2389467122 From rcastanedalo at openjdk.org Wed Oct 2 19:43:50 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 2 Oct 2024 19:43:50 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: <7L7jYDlFa0WnVvgiyNHI9KZrcffYwNnBB899AuMS56Q=.40b031e7-07b8-4a15-b319-c53b38a17a49@github.com> Message-ID: On Wed, 2 Oct 2024 12:55:13 GMT, Hamlin Li wrote: >> It could, but the effect would be the same (see [JDK-8058880](https://bugs.openjdk.org/browse/JDK-8058880)). I went with `TEMP` for the x64 and aarch64 platforms for consistency with the analogous ZGC ADL code, see e.g. https://github.com/openjdk/jdk/blob/855c8a7def21025bc2fc47594f7285a55924c213/src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad#L182-L204. > > I saw the riscv one in z_riscv.ad is: `effect(TEMP oldval_tmp, TEMP newval_tmp, TEMP tmp1, TEMP_DEF res);`, maybe it's good to change riscv one? 
I suggest postponing these types of refactorings to follow-up enhancements, given that the pull request in its current form is stable, thoroughly tested, and approved by reviewers. I intend to integrate it within the following 24 hours, provided final test results look good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1785135652 From kxu at openjdk.org Wed Oct 2 19:57:55 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 2 Oct 2024 19:57:55 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v20] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel ivs. This PR adds support for long-typed ivs to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops, which will depend on this PR. 
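As a rough illustration of the loop shape this PR description refers to, here is a minimal Java sketch of an int counted loop carrying a long-typed parallel induction variable, next to the closed form it is equivalent to. The class and method names are hypothetical, and the sketch makes no claim about the exact IR C2 produces; it only demonstrates that the two computations agree.

```java
// Hypothetical example; names are illustrative and not taken from the PR.
final class ParallelIvSketch {
    // Int counted loop: `i` is the int loop counter, `a` is a long-typed
    // value that advances by a constant stride each iteration.
    static long withLoop(int stop) {
        long a = 0;
        for (int i = 0; i < stop; i++) {
            a += 2;
        }
        return a;
    }

    // The value of `a` expressed directly in terms of the trip count.
    static long closedForm(int stop) {
        return 2L * Math.max(stop, 0);
    }

    public static void main(String[] args) {
        for (int stop = -3; stop <= 1000; stop++) {
            if (withLoop(stop) != closedForm(stop)) {
                throw new AssertionError("mismatch at stop=" + stop);
            }
        }
        System.out.println("ok");
    }
}
```

The stride 2 is arbitrary; any loop-invariant constant stride gives the same kind of relationship between the long variable and the int counter.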
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: fix typos Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/6cad8c19..4e2735ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=18-19 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From kxu at openjdk.org Wed Oct 2 19:57:56 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 2 Oct 2024 19:57:56 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v19] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Wed, 2 Oct 2024 11:18:52 GMT, Tobias Hartmann wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> update comments in TestParallelIvInIntCountedLoop.java > > test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 315: > >> 313: } >> 314: >> 315: return a; > > Shouldn't there also be tests for the `int a` `long i` variant? `long i` would make it a long counted loop, for which HotSpot doesn't perform parallel iv replacement at this time. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1785147608 From duke at openjdk.org Wed Oct 2 22:49:43 2024 From: duke at openjdk.org (duke) Date: Wed, 2 Oct 2024 22:49:43 GMT Subject: Withdrawn: 8321008: RISC-V: C2 MulAddVS2VI In-Reply-To: References: Message-ID: <8Xm4kpGgp2U2NFhSdCCHJ_u2UrP-2lLtYxkScRL4x9w=.144122d3-89a6-484e-9bf1-74909cc00712@github.com> On Tue, 23 Apr 2024 15:02:10 GMT, Hamlin Li wrote:
> Hi,
> Can you help to review the patch?
>
> The motivation is to implement `MulAddVS2VI`.
> But to enable `MulAddVS2VI`, `MulAddS2I` is a prerequisite, although `MulAddS2I` does not bring extra benefit on riscv as we don't have a specific muladd instruction on riscv.
> So, this patch implements both `MulAddVS2VI` and `MulAddS2I`.
>
>
> Thanks
>
> ## Performance
> ### Summary
> #### MulAddS2I
> When +UseSuperWord
> * There is performance gain in MulAddS2I.testa/b/c.
> * There is performance regression in MulAddS2I.testd-testi.
>
> When -UseSuperWord
> * There is performance regression in all tests.
>
> #### VectorReduction
> There is no performance regression in VectorReduction
>
> ### when +UseSuperWord
> data
>
> Benchmark on bananapi, +UseSuperWord | (COUNT) | (COUNT_DOUBLE) | (COUNT_FLOAT) | (ITER) | (RANGE) | (seed) | Mode | Cnt | Score +intrinsic | Error | Units | Score -intrinsic | Improvement
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> MulAddS2I.testa | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 65863.434 | 12082.469 | ns/op | 92576.189 | 1.406
> MulAddS2I.testb | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 74741.045 | 14608.942 | ns/op | 104428.457 | 1.397
> MulAddS2I.testc | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 42013.168 | 6029.504 | ns/op | 69380.849 | 1.651
> MulAddS2I.testd | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 99644.082 | 3078.374 | ns/op | 84316.883 | 0.846
> MulAddS2I.teste | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 98910.181 | 3170.046 | ns/op | 86023.681 | 0.87
> MulAddS2I.testf | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 101752.531 | 10994.494 | ns/op | 85473.52 | 0.84
> MulAddS2I.testg | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 99513.05 | 2919.032 | ns/op | 86680.144 | 0.871
> MulAddS2I.testh | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 100753.291 | 3449.613 | ns/op | 84424.63 | 0.838
> MulAddS2I.testi | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 100626.168 | 2924.72 | ns/op | 85477.079 | 0.849
> MulAddS2I.testj | N/A | N/A | N/A | 8191 | 16384 | 0 | avgt | 10 | 100990.584 | 3756.096 | ns/op | 87010.947 | 0.862
> MulAddS2I.testk | N/A | N/A | N/A | 8191...

This pull request has been closed without being integrated.
------------- PR: https://git.openjdk.org/jdk/pull/18919 From jbhateja at openjdk.org Thu Oct 3 05:09:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Oct 2024 05:09:22 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v15] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. 
>
> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :-
> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>
> Benchmark                                     (size)   Mode  Cnt      Score   Error   Units
> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762          ops/ms
> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550          ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605          ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004          ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758          ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192          ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459          ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556          ops/ms
> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830          ops/ms
> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174          ops/ms
> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804          ops/ms
> SelectFromBenchmark.selectFromIntVector         2048  thrpt    2   5398.2...

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits:

 - Review comments resolution.
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338023
 - Review comments resolutions.
 - Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments.
 - Incorporating review and documentation suggestions.
 - Jcheck clearance
 - Review comments resolution.
 - Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level.
 - Documentation suggestions from Paul.
 - Review resolutions.
 - ...
and 8 more: https://git.openjdk.org/jdk/compare/bdfb41f9...6215ab91 ------------- Changes: https://git.openjdk.org/jdk/pull/20508/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=14 Stats: 2804 lines in 89 files changed: 2785 ins; 18 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Thu Oct 3 05:09:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Oct 2024 05:09:22 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v14] In-Reply-To: <5wF2qLX9Z_tquvURMW0HVnrmMla1awxtz6C0UYI0lh4=.94df7340-1a27-4d93-80fa-d4c561641a97@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <5wF2qLX9Z_tquvURMW0HVnrmMla1awxtz6C0UYI0lh4=.94df7340-1a27-4d93-80fa-d4c561641a97@github.com> Message-ID: On Tue, 1 Oct 2024 18:10:10 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions. > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2797: > >> 2795: >> 2796: Node* operation = lowerSelectFromOp ? >> 2797: LowerSelectFromTwoVectorOperation(gvn(), opd1, opd2, opd3, vt) : > > Thanks for bringing the lowering right here. It opens up an optimization opportunity: currently for float/double we have two casts for index (e.g. from float -> int at line 2786 and from int -> byte at line 2661 as part of LowerSelectFromTwoVectorOperation). Could this be done by one cast? This is not sub-optimal: a float to sub-word cast is a two-step process in which we first convert the float value to an integer, followed by an integer down-cast to the sub-word type. So the resulting JIT code will be the same whether we emit F2X directly or handle it the way it is handled currently. All existing targets that support F2X take this route. But it's good to be safe.
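As a scalar analogy to the two-step cast described in this reply, the sketch below relies on plain Java narrowing conversion, where a `float` to `byte` cast is itself defined as `float` to `int` followed by `int` to `byte`. It says nothing about the actual C2 vector cast nodes or the generated code; the class and method names are hypothetical.

```java
class FloatToByteSketch {
    // Single source-level cast: Java defines this as float -> int, then int -> byte.
    static byte direct(float f) {
        return (byte) f;
    }

    // The same conversion with the intermediate int made explicit.
    static byte twoStep(float f) {
        int asInt = (int) f;   // rounds toward zero, saturates at the int range, NaN becomes 0
        return (byte) asInt;   // keeps only the low 8 bits
    }

    public static void main(String[] args) {
        float[] samples = {300.7f, -129.9f, 1e10f, Float.NaN, 5.0f};
        for (float f : samples) {
            System.out.println(f + " -> " + direct(f) + " / " + twoStep(f));
        }
    }
}
```

Both methods agree on every input, which mirrors the point that emitting one combined cast or two explicit ones describes the same conversion.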
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1785634731 From jbhateja at openjdk.org Thu Oct 3 05:09:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Oct 2024 05:09:22 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 1 Oct 2024 18:03:06 GMT, Sandhya Viswanathan wrote: >>> This could instead be: src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2); Or even simplified to: src1.rearrange(this.toShuffle(), src2); >> >> Yes, this may save additional allocation penalty of result array allocation which may slightly improve fall back performance, but logical operation cannot be directly applied over floating point vectors. so, we will need an explicit conversion to integral vector, which is why I opted for current fallback implementation which is in line with rest of the code. > > I see the problem with float/double vectors. Let us do the rearrange form only for Integral (byte, short, int, long) vectors then. For float/double vector we could keep the code that you have currently. You will also need additional handling for NPOT vector sizes which is handled by existing fallback implementation. 
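The NPOT point in this exchange can be illustrated with a small scalar model. The sketch below works on plain `int` arrays rather than the Vector API, and it assumes the index-wrapping behaviour implied by the `AND, 2 * VLENGTH - 1` expression quoted above; the class name, method names and sample values are hypothetical and are not taken from the patch.

```java
class SelectFromSketch {
    // Scalar model: v1 followed by v2 forms a table of 2 * VLEN entries,
    // and each index lane picks one entry after wrapping into [0, 2 * VLEN).
    // Assumes index, v1 and v2 all have the same length.
    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = v1.length;
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int wrapped = Math.floorMod(index[i], 2 * vlen);
            result[i] = wrapped < vlen ? v1[wrapped] : v2[wrapped - vlen];
        }
        return result;
    }

    // Wrapping with a bit mask, as in the AND-based form quoted in the thread.
    static int wrapWithMask(int idx, int vlen) {
        return idx & (2 * vlen - 1);
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] picked = selectFrom(new int[] {0, 5, 2, 7}, v1, v2);
        System.out.println(java.util.Arrays.toString(picked));

        // Power-of-two length: mask and modulo agree.
        System.out.println(wrapWithMask(9, 4) + " vs " + Math.floorMod(9, 8));
        // Non-power-of-two length (3 lanes): the mask no longer equals the modulo.
        System.out.println(wrapWithMask(2, 3) + " vs " + Math.floorMod(2, 6));
    }
}
```

With 3 lanes the mask `2 * 3 - 1 = 5` maps index 2 to 0 while the modulo keeps it at 2, which is one way to see why a mask-only fallback needs extra handling for NPOT vector sizes.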
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1785634658 From mli at openjdk.org Thu Oct 3 06:50:48 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 3 Oct 2024 06:50:48 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: <7L7jYDlFa0WnVvgiyNHI9KZrcffYwNnBB899AuMS56Q=.40b031e7-07b8-4a15-b319-c53b38a17a49@github.com> Message-ID: <4S2raWNwXSaEN1p2bAXEUKlHdqSY9AqrR7cBZDhs2QI=.e6ecddb3-be2b-4bda-88ac-8cd9fcb1301b@github.com> On Wed, 2 Oct 2024 19:41:26 GMT, Roberto Casta?eda Lozano wrote: >> I saw the riscv one in z_riscv.ad is: `effect(TEMP oldval_tmp, TEMP newval_tmp, TEMP tmp1, TEMP_DEF res);`, maybe it's good to change riscv one? > > I suggest to postpone these types of refactorings to follow-up enhancements, given that the pull request in its current form is stable, thoroughly tested, and approved by reviewers. I intend to integrate it within the following 24 hours, provided final test results look good. Sounds good too. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1785711504 From chagedorn at openjdk.org Thu Oct 3 06:58:40 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Oct 2024 06:58:40 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v4] In-Reply-To: References: Message-ID: On Wed, 2 Oct 2024 13:58:14 GMT, Roland Westrelin wrote: >> The patch includes 2 test cases for this: test1() causes the assert >> failure in the bug description, test2() causes an incorrect execution >> where a load floats above a store that it should be dependent on. >> >> In the test cases, `field` is accessed on object `a` of type `A`. When >> the field is accessed, the type that c2 has for `a` is `A` with >> interface `I`. The holder of the field is class `A` which implements >> no interface. 
The reason the type of `a` and the type of the holder >> are slightly different is because `a` is the result of a merge of >> objects of subclasses `B` and `C` which implements `I`. >> >> The root cause of the bug is that `Compile::flatten_alias_type()` >> doesn't change `A` + interface `I` into `A`, the actual holder of the >> field. So `field` in `A` + interface `I` and `field` in `A` get >> different slices which is wrong. At parse time, the logic that creates >> the `Store` node uses: >> >> >> C->alias_type(field)->adr_type() >> >> >> to compute the slice which is the slice for `field` in `A`. So the >> slice used at parse time is the right one but during igvn, when the >> slice is computed from the input address, a different slice (the one >> for `A` + interface `I`) is used. That causes load/store nodes when >> they are processed by igvn to use the wrong memory state. >> >> In `Compile::flatten_alias_type()`: >> >> >> if (!ik->equals(canonical_holder) || tj->offset() != offset) { >> if( is_known_inst ) { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); >> } else { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); >> } >> } >> >> >> only flattens the type if it's not the canonical holder but it should >> test that the type doesn't implement interfaces that the canonical >> holder doesn't. To keep the logic simple, the fix I propose creates a >> new type whenever there's a chance that a type implements extra >> interfaces (the type is not exact). >> >> I also added asserts in `GraphKit::make_load()` and >> `GraphKit::store_to_memory()` to make sure the slice that is passed >> and the address type agree. Those asserts fire with the new test >> cases. When running testing, I found that they also catch a few cases >> in `library_call.cpp` where an incorrect slice is passed. 
>> >> As further clean up, maybe we want to drop the slice argument to >> `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to >> their callers) given it's redundant with th... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21303#pullrequestreview-2344788165 From aboldtch at openjdk.org Thu Oct 3 07:16:04 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 3 Oct 2024 07:16:04 GMT Subject: RFR: 8341451: Remove C2HandleAnonOMOwnerStub Message-ID: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> [JDK-8319796](https://bugs.openjdk.org/browse/JDK-8319796) has been implemented on all platforms which had previous C2 implementations of LM_LIGHTWEIGHT that made use of C2HandleAnonOMOwnerStub. The declaration is still left in the shared code, and a couple of platforms have the definitions still lingering. I propose we remove them.
------------- Commit messages: - 8341451: Remove C2HandleAnonOMOwnerStub Changes: https://git.openjdk.org/jdk/pull/21319/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21319&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341451 Stats: 70 lines in 3 files changed: 0 ins; 70 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21319/head:pull/21319 PR: https://git.openjdk.org/jdk/pull/21319 From fyang at openjdk.org Thu Oct 3 08:09:35 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 3 Oct 2024 08:09:35 GMT Subject: RFR: 8341451: Remove C2HandleAnonOMOwnerStub In-Reply-To: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> References: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> Message-ID: On Thu, 3 Oct 2024 07:09:53 GMT, Axel Boldt-Christmas wrote: > [JDK-8319796](https://bugs.openjdk.org/browse/JDK-8319796) has been implemented on all platforms which had previous C2 implementations of LM_LIGHTWEIGHT that made use of C2HandleAnonOMOwnerStub. > > The declaration is still left in the shared code, and a couple of platforms have the definitions still lingering. I propose we remove them. Marked as reviewed by fyang (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/21319#pullrequestreview-2344924640 From chagedorn at openjdk.org Thu Oct 3 08:33:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Oct 2024 08:33:41 GMT Subject: RFR: 8341451: Remove C2HandleAnonOMOwnerStub In-Reply-To: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> References: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> Message-ID: On Thu, 3 Oct 2024 07:09:53 GMT, Axel Boldt-Christmas wrote: > [JDK-8319796](https://bugs.openjdk.org/browse/JDK-8319796) has been implemented on all platforms which had previous C2 implementations of LM_LIGHTWEIGHT that made use of C2HandleAnonOMOwnerStub. > > The declaration is still left in the shared code, and a couple of platforms have the definitions still lingering. I propose we remove them. Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21319#pullrequestreview-2344975438 From rcastanedalo at openjdk.org Thu Oct 3 08:35:54 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 3 Oct 2024 08:35:54 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... 
> > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port refactor > - Remove temporary support code > - Merge jdk-24+17 > - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization > - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions > - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes > - Merge jdk-24+16 > - Ensure that detected encode-and-store patterns are matched > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - ... and 43 more: https://git.openjdk.org/jdk/compare/0cf6df31...14483b83 Thanks to everyone who contributed to this JEP, integrating now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2390833194 From rcastanedalo at openjdk.org Thu Oct 3 08:39:57 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 3 Oct 2024 08:39:57 GMT Subject: Integrated: 8334060: Implementation of Late Barrier Expansion for G1 In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 09:49:25 GMT, Roberto Castañeda Lozano wrote: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24.
The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions.
Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... This pull request has now been integrated. Changeset: 0b467e90 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/0b467e902d591ae9feeec1669918d1588987cd1c Stats: 7372 lines in 58 files changed: 5924 ins; 985 del; 463 mod 8334060: Implementation of Late Barrier Expansion for G1 Co-authored-by: Roberto Castañeda Lozano Co-authored-by: Erik Österlund Co-authored-by: Siyao Liu Co-authored-by: Kim Barrett Co-authored-by: Amit Kumar Co-authored-by: Martin Doerr Co-authored-by: Feilong Jiang Co-authored-by: Sergey Nazarkin Reviewed-by: kvn, tschatzl, fyang, ayang, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/19746 From jvernee at openjdk.org Thu Oct 3 12:05:46 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 3 Oct 2024 12:05:46 GMT Subject: Integrated: 8337753: Target class of upcall stub may be unloaded In-Reply-To: References: Message-ID: <2THc5A3PP0cegVF4ySYMLsgc4FO2ieqBgOEI02XgxOk=.0f92be1b-ddbc-4486-ac22-2c303f442ba2@github.com> On Tue, 6 Aug 2024 17:26:55 GMT, Jorn Vernee wrote: > As discussed in the JBS issue: > > FFM upcall stubs embed a `Method*` of the target method in the stub. This `Method*` is read from the `LambdaForm::vmentry` field associated with the target method handle at the time when the upcall stub is generated. The MH instance itself is stashed in a global JNI ref.
So, there should be a reachability chain to the holder class: `MH (receiver) -> LF (form) -> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` > > However, it appears that, due to multiple threads racing to initialize the `vmentry` field of the `LambdaForm` of the target method handle of an upcall stub, it is possible that the `vmentry` is updated _after_ we embed the corresponding `Method`* into an upcall stub (or rather, the latest update is not visible to the thread generating the upcall stub). Technically, it is fine to keep using a 'stale' `vmentry`, but the problem is that now the reachability chain is broken, since the upcall stub only extracts the target `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class can then be unloaded, resulting in a crash. > > The fix I've chosen for this is to mimic what we already do in `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from the target method handle each time. Luckily, this does not really seem to impact performance. > >
> Performance numbers
> x64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
>
> aarch64:
>
> before:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
>
> after:
>
> Benchmark             Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
>
> > As for the added TestUpcallStress test, it takes about 800 seconds to run this test on the dev machine I'm using, so I've set the timeout quite high. Since it runs for so long, I've dropped it from the default `jdk_foreign` test suite, which runs in tier2. Instead the new test will run in tier4. > > Testing: tier 1-4 This pull request has now been integrated. Changeset: 6af13580 Author: Jorn Vernee URL: https://git.openjdk.org/jdk/commit/6af13580c2086afefde489275bc2353c2320ff3f Stats: 333 lines in 23 files changed: 255 ins; 26 del; 52 mod 8337753: Target class of upcall stub may be unloaded Reviewed-by: amitkumar, vlivanov, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/20479 From kbarrett at openjdk.org Thu Oct 3 12:56:48 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 3 Oct 2024 12:56:48 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB Message-ID: Please review this change to TypeRawPtr::add_offset to prevent a compiler from inferring things based on prior pointer arithmetic not invoking UB. As noted in the bug report, clang is actually doing this. To accomplish this, changed to integral arithmetic. Also added over/underflow checks. Also made a couple of minor touchups. Replaced an implicit conversion to bool with an explicit compare to nullptr (per style guide). Removed a no longer needed dummy return after a (now) noreturn function. Testing: mach5 tier1-7 That testing was with calls to "fatal" for the over/underflow cases and the sum==0 case. There were no hits. I'm not sure how to construct a test that would hit those. 
------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/21324/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21324&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341178 Stats: 14 lines in 1 file changed: 9 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21324.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21324/head:pull/21324 PR: https://git.openjdk.org/jdk/pull/21324 From kxu at openjdk.org Thu Oct 3 16:31:15 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 3 Oct 2024 16:31:15 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v21] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. 
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: correctly verify outputs with custom @Run methods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/4e2735ae..32bedd00 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=19-20 Stats: 201 lines in 1 file changed: 122 ins; 60 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From kxu at openjdk.org Thu Oct 3 16:47:44 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 3 Oct 2024 16:47:44 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v19] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Wed, 2 Oct 2024 11:27:08 GMT, Tobias Hartmann wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> update comments in TestParallelIvInIntCountedLoop.java > > test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java line 325: > >> 323: >> 324: for (int i : iterations) { >> 325: Asserts.assertEQ(i, testIntCountedLoopWithIntIV(i)); > > Code in this loop is not guaranteed to be even C2 compiled because IR verification will be executed in a separate VM. IR framework tests that also want to verify the output, should be written like this: > > https://github.com/openjdk/jdk/blob/9bd478593cc92a716151d1373f3426f1d92143bb/test/hotspot/jtreg/testlibrary_tests/ir_framework/examples/CustomRunTestExample.java#L84-L97 Updated to use custom run methods instead. Thanks for the info! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1786536420 From kvn at openjdk.org Thu Oct 3 17:12:44 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 3 Oct 2024 17:12:44 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB In-Reply-To: References: Message-ID: <3iWhHL3A0KJIVBfPcMP--NRbbN00pEtFqWrURscE84E=.68f0d6be-58bc-4c12-af83-f7f9f34601ed@github.com> On Thu, 3 Oct 2024 12:50:55 GMT, Kim Barrett wrote: > Please review this change to TypeRawPtr::add_offset to prevent a compiler from > inferring things based on prior pointer arithmetic not invoking UB. As noted in > the bug report, clang is actually doing this. > > To accomplish this, changed to integral arithmetic. Also added over/underflow > checks. > > Also made a couple of minor touchups. Replaced an implicit conversion to bool > with an explicit compare to nullptr (per style guide). Removed a no longer > needed dummy return after a (now) noreturn function. > > Testing: mach5 tier1-7 > That testing was with calls to "fatal" for the over/underflow cases and the > sum==0 case. There were no hits. I'm not sure how to construct a test that > would hit those. Looks reasonable. Just one nit comment. src/hotspot/share/opto/type.cpp line 3136: > 3134: > 3135: const TypeRawPtr *TypeRawPtr::make( address bits ) { > 3136: assert( bits != nullptr, "Use TypePtr for null" ); Please, remove spaces after open and before close `()`. 
------------- PR Review: https://git.openjdk.org/jdk/pull/21324#pullrequestreview-2346113508 PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1786529286 From shade at openjdk.org Thu Oct 3 17:15:03 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 3 Oct 2024 17:15:03 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. 
>> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port refactor > - Remove temporary support code > - Merge jdk-24+17 > - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization > - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions > - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes > - Merge jdk-24+16 > - Ensure that detected encode-and-store patterns are matched > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - ... 
and 43 more: https://git.openjdk.org/jdk/compare/9bb3ef4e...14483b83 src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 335: > 333: assert(!use_ReduceInitialCardMarks(), > 334: "post-barriers are only needed for tightly-coupled initialization stores when ReduceInitialCardMarks is disabled"); > 335: access.set_barrier_data(access.barrier_data() ^ G1C2BarrierPre); I have been looking at this code after integration, and I wonder if `^` is really correct here? Was the intent to remove `G1C2BarrierPre` from the barrier data? What happens if `get_store_barrier` does not actually set it? Do we flip the bit back? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1786573527 From sviswanathan at openjdk.org Thu Oct 3 17:30:45 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 3 Oct 2024 17:30:45 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v19] In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 05:09:25 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e.
the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Merge stashing and re-commit src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 140: > 138: * @param b the second operand. > 139: * @return the saturating addition of the operands. > 140: * @see VectorOperators#SADD This should be SUADD. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 167: > 165: * @param b the second operand. > 166: * @return the saturating difference of the operands. > 167: * @see VectorOperators#SSUB This should be SUSUB. 
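To illustrate why the two `@see` targets matter, here is a minimal scalar sketch of signed versus unsigned saturation on `int`. The class and method names (`SaturatingSketch`, `addSaturatingSigned`, and so on) are hypothetical and are not the `VectorMath` methods under review; the sketch only assumes the saturating semantics described in the quoted PR text.

```java
class SaturatingSketch {
    // Signed saturating add: the exact sum is clamped to [Integer.MIN_VALUE, Integer.MAX_VALUE].
    static int addSaturatingSigned(int a, int b) {
        long sum = (long) a + (long) b;
        if (sum > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (sum < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) sum;
    }

    // Unsigned saturating add: both operands are read as unsigned 32-bit values
    // and the result is clamped to 0xFFFFFFFF (which is -1 as an int).
    static int addSaturatingUnsigned(int a, int b) {
        int sum = a + b;
        // Unsigned overflow happened iff the wrapped sum is unsigned-less than an operand.
        return Integer.compareUnsigned(sum, a) < 0 ? -1 : sum;
    }

    // Unsigned saturating subtract: clamps to 0 instead of wrapping below zero.
    static int subSaturatingUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    public static void main(String[] args) {
        // The same bit patterns give different results under the two interpretations.
        System.out.println(addSaturatingSigned(Integer.MAX_VALUE, 1));
        System.out.println(Integer.toUnsignedString(addSaturatingUnsigned(Integer.MAX_VALUE, 1)));
    }
}
```

For the inputs `Integer.MAX_VALUE` and `1`, the signed variant clamps while the unsigned variant does not, since 2^31 still fits in an unsigned 32-bit value.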
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1786595393 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1786595850 From sviswanathan at openjdk.org Thu Oct 3 17:53:42 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 3 Oct 2024 17:53:42 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Thu, 3 Oct 2024 05:04:35 GMT, Jatin Bhateja wrote: >> I see the problem with float/double vectors. Let us do the rearrange form only for Integral (byte, short, int, long) vectors then. For float/double vector we could keep the code that you have currently. > > You will also need additional handling for NPOT vector sizes which is handled by existing fallback implementation. The intrinsic is limited to power of two. We can safely do src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2) for integral types. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1786637638 From sviswanathan at openjdk.org Thu Oct 3 18:18:44 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 3 Oct 2024 18:18:44 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v15] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Thu, 3 Oct 2024 05:09:22 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. 
Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 18 commits: > > - Review comments resolution. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338023 > - Review comments resolutions. > - Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments. > - Incorporating review and documentation suggestions. > - Jcheck clearance > - Review comments resolution. > - Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. > - Documentation suggestions from Paul. > - Review resolutions. > - ... and 8 more: https://git.openjdk.org/jdk/compare/bdfb41f9...6215ab91 Thanks for making the changes. It looks to me that the following checks at lines 2963-2071 in file vectorIntrinsics.cpp are now only needed when lowerSelectFromOp is false. Could you please verify and update accordingly? if (is_floating_point_type(elem_bt)) { if (!arch_supports_vector(Op_AndV, num_elem, index_elem_bt, VecMaskNotUsed) || !arch_supports_vector(cast_vopc, num_elem, index_elem_bt, VecMaskNotUsed) || !arch_supports_vector(Op_Replicate, num_elem, index_elem_bt, VecMaskNotUsed)) { log_if_needed(" ** index wrapping not supported: vlen=%d etype=%s" , num_elem, type2name(elem_bt)); return false; // not supported } } ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2392036048 From sviswanathan at openjdk.org Thu Oct 3 18:41:40 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 3 Oct 2024 18:41:40 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <95BWoQiYfM-c7esOvzluxwrXbh_sQD9MAUm9-5JhULc=.c3f1f31e-5b13-4698-9481-e02a763b1ce6@github.com> On Thu, 3 Oct 2024 05:04:35 GMT, Jatin Bhateja wrote: >> I see the problem with float/double vectors.
Let us do the rearrange form only for Integral (byte, short, int, long) vectors then. For float/double vector we could keep the code that you have currently. > > You will also need additional handling for NPOT vector sizes which is handled by existing fallback implementation. Agree, so we can't assume power of two in fallback. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1786691519 From jbhateja at openjdk.org Thu Oct 3 19:05:14 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Oct 2024 19:05:14 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v16] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <3OXPOIGRRD4KoZ21PsL1viyEDvZsh_8GtacPQHcuQq4=.e5f4f05d-d21f-4a6c-b41e-c78268b8e2fe@github.com> > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. 
> - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Sharpening intrinsic exit check. 
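The two-vector table lookup described in the quoted text can be modelled on plain arrays as in the sketch below. This is only a scalar illustration of the semantics as stated in this message (indices must lie in [0, 2*VLEN), otherwise an exception is thrown); `SelectFromSketch` is a hypothetical class and does not use or reproduce the Vector API implementation. Other messages in this thread discuss wrapping the indices instead, which is not modelled here.

```java
class SelectFromSketch {
    // Scalar model: lane i of the result is taken from the 2*VLEN-entry table
    // formed by v1 followed by v2, at position index[i].
    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = index[i];
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("index " + idx + " not in [0, " + (2 * vlen) + ")");
            }
            // The first VLEN indices address v1, the next VLEN address v2.
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] r = selectFrom(new int[] {0, 5, 2, 7},
                             new int[] {10, 11, 12, 13},
                             new int[] {20, 21, 22, 23});
        System.out.println(java.util.Arrays.toString(r));
    }
}
```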
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/6215ab91..1cca8e24 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=14-15 Stats: 5 lines in 1 file changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Thu Oct 3 19:13:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Oct 2024 19:13:22 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v20] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators.
> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Typographic error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/952920ae..f5b5e6f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=18-19 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From psandoz at openjdk.org Thu Oct 3 19:21:43 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 3 Oct 2024 19:21:43 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v20] In-Reply-To: References: Message-ID: On Thu, 3 Oct 2024 19:13:22 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. 
>> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Typographic error src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 46: > 44: * @return the smaller of {@code a} and {@code b}. > 45: * @see VectorOperators#UMIN > 46: * @since 24 Remove `@since 24` in the documentation of each method and place in the documentation on the class.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1786732581 From jbhateja at openjdk.org Thu Oct 3 19:55:03 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Oct 2024 19:55:03 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v21] In-Reply-To: References: Message-ID: <59ZQPsSgxrGE2E4vGKs0PvO7KJIJdAhKCkZb8OPv4qI=.7762bee0-fcb0-4ab5-ae29-1069d7d64ca4@github.com> > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback.
> > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Doc fixups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/f5b5e6f5..3beac2db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=19-20 Stats: 26 lines in 1 file changed: 2 ins; 24 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From psandoz at openjdk.org Thu Oct 3 19:55:04 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 3 Oct 2024 19:55:04 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v21] In-Reply-To: <59ZQPsSgxrGE2E4vGKs0PvO7KJIJdAhKCkZb8OPv4qI=.7762bee0-fcb0-4ab5-ae29-1069d7d64ca4@github.com> References: <59ZQPsSgxrGE2E4vGKs0PvO7KJIJdAhKCkZb8OPv4qI=.7762bee0-fcb0-4ab5-ae29-1069d7d64ca4@github.com> Message-ID: On Thu, 3 Oct 2024 19:51:31 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. 
For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Doc fixups src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 30: > 28: * The class {@code VectorMath} contains methods for performing > 29: * scalar numeric operations in support of vector numeric operations. > 30: * @author Paul Sandoz We no longer use the `@author` tag on newly added classes, can you please remove it?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1786769928 From jbhateja at openjdk.org Thu Oct 3 19:55:04 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Oct 2024 19:55:04 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v20] In-Reply-To: References: Message-ID: On Thu, 3 Oct 2024 19:18:38 GMT, Paul Sandoz wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Typographic error > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 46: > >> 44: * @return the smaller of {@code a} and {@code b}. >> 45: * @see VectorOperators#UMIN >> 46: * @since 24 > > Remove `@since 24` in the documentation of each method and place in the documentation on the class. DONE ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1786767732 From sviswanathan at openjdk.org Thu Oct 3 21:07:40 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 3 Oct 2024 21:07:40 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v16] In-Reply-To: <3OXPOIGRRD4KoZ21PsL1viyEDvZsh_8GtacPQHcuQq4=.e5f4f05d-d21f-4a6c-b41e-c78268b8e2fe@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <3OXPOIGRRD4KoZ21PsL1viyEDvZsh_8GtacPQHcuQq4=.e5f4f05d-d21f-4a6c-b41e-c78268b8e2fe@github.com> Message-ID: On Thu, 3 Oct 2024 19:05:14 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. 
Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... 
> > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Sharpening intrinsic exit check. Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2346694947 From jbhateja at openjdk.org Fri Oct 4 00:01:59 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 4 Oct 2024 00:01:59 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v22] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts.
> - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Update VectorMath.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/3beac2db..550eeb9c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=20-21 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From kbarrett at openjdk.org Fri Oct 4 04:56:35 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 4 Oct 2024 04:56:35 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB In-Reply-To: <3iWhHL3A0KJIVBfPcMP--NRbbN00pEtFqWrURscE84E=.68f0d6be-58bc-4c12-af83-f7f9f34601ed@github.com> References: <3iWhHL3A0KJIVBfPcMP--NRbbN00pEtFqWrURscE84E=.68f0d6be-58bc-4c12-af83-f7f9f34601ed@github.com> Message-ID: On Thu, 3 Oct 2024 16:38:34 GMT, Vladimir Kozlov wrote: >> Please review this change to TypeRawPtr::add_offset to prevent a compiler from >> inferring things based on prior pointer arithmetic not invoking UB. As noted in >> the bug report, clang is actually doing this. >> >> To accomplish this, changed to integral arithmetic. Also added over/underflow >> checks. >> >> Also made a couple of minor touchups. Replaced an implicit conversion to bool >> with an explicit compare to nullptr (per style guide). Removed a no longer >> needed dummy return after a (now) noreturn function. 
>> >> Testing: mach5 tier1-7 >> That testing was with calls to "fatal" for the over/underflow cases and the >> sum==0 case. There were no hits. I'm not sure how to construct a test that >> would hit those. > > src/hotspot/share/opto/type.cpp line 3136: > >> 3134: >> 3135: const TypeRawPtr *TypeRawPtr::make( address bits ) { >> 3136: assert( bits != nullptr, "Use TypePtr for null" ); > > Please, remove spaces after open and before close `()`. I'm not fond of those spaces, but they follow the style used throughout this file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1787152954 From duke at openjdk.org Fri Oct 4 06:30:11 2024 From: duke at openjdk.org (Daniel Skantz) Date: Fri, 4 Oct 2024 06:30:11 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v13] In-Reply-To: References: Message-ID: > This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). > > We check two invariants. > a) Bailouts should be successful starting from any given `failing()` check. > b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). > > a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. > > The added flag should not have any effect in debug mode. > > Testing: > > T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. 
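As an illustration of invariants a) and b), the following stand-alone model shows one way such a stress check could be structured. It is a minimal sketch and not the HotSpot implementation: `ToyCompile`, `record_failure`, `run_phases`, and the constructor parameters are hypothetical names; only `failing()`, `failing_internal()`, and the idea of randomly starting a bailout with a user-given probability come from the description above.

```cpp
#include <cassert>
#include <random>

// Hypothetical stand-in for a compilation context; not the real C2 Compile class.
class ToyCompile {
 public:
  ToyCompile(double stress_probability, unsigned seed)
      : _probability(stress_probability), _rng(seed) {}

  // Plain check: reports a pending bailout without any stress injection.
  bool failing_internal() const { return _failure_reason != nullptr; }

  // Stress-aware check: may randomly start a bailout at this call site.
  bool failing() {
    if (failing_internal()) {
      return true;
    }
    const bool inject = _probability > 0.0 &&
                        (_probability >= 1.0 || _dist(_rng) < _probability);
    if (inject) {
      record_failure("stress bailout");
      return true;
    }
    return false;
  }

  // Models invariant b): a bailout must not be recorded while one is pending.
  void record_failure(const char* reason) {
    assert(_failure_reason == nullptr && "bailout recorded while one is pending");
    _failure_reason = reason;
  }

  const char* failure_reason() const { return _failure_reason; }

 private:
  double _probability;
  std::mt19937 _rng;
  std::uniform_real_distribution<double> _dist{0.0, 1.0};
  const char* _failure_reason = nullptr;
};

// Toy pipeline: checks failing() before each phase and stops at the first
// bailout. Models invariant a): unwinding from any failing() check is safe.
// Returns the number of phases that ran to completion.
int run_phases(ToyCompile& compile, int phase_count) {
  int completed = 0;
  for (int i = 0; i < phase_count; i++) {
    if (compile.failing()) {
      break;
    }
    completed++;
  }
  return completed;
}
```

In this model a probability of `0.0` lets every phase run and records nothing, while `1.0` starts a bailout at the first `failing()` check, so `run_phases` returns `0`. Intermediate values depend on the seed.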
Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: use failing_internal instead; add a const; clarify skip ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19646/files - new: https://git.openjdk.org/jdk/pull/19646/files/d91bc068..cb748fb8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19646&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19646&range=11-12 Stats: 19 lines in 8 files changed: 1 ins; 0 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/19646.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19646/head:pull/19646 PR: https://git.openjdk.org/jdk/pull/19646 From aboldtch at openjdk.org Fri Oct 4 06:58:39 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 4 Oct 2024 06:58:39 GMT Subject: Integrated: 8341451: Remove C2HandleAnonOMOwnerStub In-Reply-To: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> References: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> Message-ID: <-0xVtaAqP_jhjHJ9G7Jgxm59BXbu6X4t0Z2b0JO94us=.b5b55e04-7046-42c6-ab5c-367aa70e0492@github.com> On Thu, 3 Oct 2024 07:09:53 GMT, Axel Boldt-Christmas wrote: > [JDK-8319796](https://bugs.openjdk.org/browse/JDK-8319796) has been implemented on all platforms which had previous C2 implementations of LM_LIGHTWEIGHT that made use of C2HandleAnonOMOwnerStub. > > The declaration is still left in the shared code, and a couple of platforms have the definitions still lingering. I propose we remove them. This pull request has now been integrated. 
Changeset: 3f420fac Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/3f420fac842153372e17222e7153cbc71c5789a7 Stats: 70 lines in 3 files changed: 0 ins; 70 del; 0 mod 8341451: Remove C2HandleAnonOMOwnerStub Reviewed-by: fyang, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/21319 From aboldtch at openjdk.org Fri Oct 4 06:58:38 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 4 Oct 2024 06:58:38 GMT Subject: RFR: 8341451: Remove C2HandleAnonOMOwnerStub In-Reply-To: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> References: <7-q6m1OQynLYt0BjjtuD3fqFVOBywLQv9tzVjJs_XKI=.ffa07e67-eedf-4dcf-a18a-cf02434e0503@github.com> Message-ID: On Thu, 3 Oct 2024 07:09:53 GMT, Axel Boldt-Christmas wrote: > [JDK-8319796](https://bugs.openjdk.org/browse/JDK-8319796) has been implemented on all platforms which had previous C2 implementations of LM_LIGHTWEIGHT that made use of C2HandleAnonOMOwnerStub. > > The declaration is still left in the shared code, and a couple of platforms have the definitions still lingering. I propose we remove them. Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21319#issuecomment-2392957973 From rrich at openjdk.org Fri Oct 4 08:28:41 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 4 Oct 2024 08:28:41 GMT Subject: RFR: 8340792: -XX:+PrintInterpreter: instructions should only be printed if printing all InterpreterCodelets In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 13:37:18 GMT, Richard Reingruber wrote: > With `-XX:+PrintInterpreter` the instructions of the generated interpreter are printed at startup. > > But after startup, when printing an interpreted frame, the instructions of the corresponding `InterpreterCodelet` are also printed. This can become a problem with `-Xlog:continuations=trace` because large sections of the interpreter are printed repeatedly and redundantly. 
> > With this change the instructions are only printed when printing all codelets, i.e. normally once at start-up. > > I've tested with the reproducer from the JBS-Issue. Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21158#issuecomment-2393129872 From rrich at openjdk.org Fri Oct 4 08:28:42 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 4 Oct 2024 08:28:42 GMT Subject: Integrated: 8340792: -XX:+PrintInterpreter: instructions should only be printed if printing all InterpreterCodelets In-Reply-To: References: Message-ID: <8fDEjlAycetKYDoWvOI9_2IeeX4xVH_DGDmZWDLmMCM=.089fa2b5-94c2-4bf1-8318-20cd7a86a6a9@github.com> On Tue, 24 Sep 2024 13:37:18 GMT, Richard Reingruber wrote: > With `-XX:+PrintInterpreter` the instructions of the generated interpreter are printed at startup. > > But after startup, when printing an interpreted frame, the instructions of the corresponding `InterpreterCodelet` are also printed. This can become a problem with `-Xlog:continuations=trace` because large sections of the interpreter are printed repeatedly and redundantly. > > With this change the instructions are only printed when printing all codelets, i.e. normally once at start-up. > > I've tested with the reproducer from the JBS-Issue. This pull request has now been integrated. Changeset: a63ac5a6 Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/a63ac5a699a5d40c76d14f94a502b8003753f4dd Stats: 10 lines in 3 files changed: 7 ins; 0 del; 3 mod 8340792: -XX:+PrintInterpreter: instructions should only be printed if printing all InterpreterCodelets Reviewed-by: mdoerr, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/21158 From mli at openjdk.org Fri Oct 4 08:45:43 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 4 Oct 2024 08:45:43 GMT Subject: RFR: 8340880: RISC-V: add t3-t6 alias into assemler_riscv.hpp Message-ID: Hi, Can you help review this simple patch to add the t3-t6 aliases? 
I also modified some particularly obvious places that use x28-x31 where t3-t6 is clearly intended, but kept the others as they are, because that is more consistent with the surrounding code. Thanks! ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/21349/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21349&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340880 Stats: 14 lines in 2 files changed: 4 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/21349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21349/head:pull/21349 PR: https://git.openjdk.org/jdk/pull/21349 From rcastanedalo at openjdk.org Fri Oct 4 09:20:59 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 4 Oct 2024 09:20:59 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Thu, 3 Oct 2024 17:12:04 GMT, Aleksey Shipilev wrote: >> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: >> >> - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion >> - riscv port refactor >> - Remove temporary support code >> - Merge jdk-24+17 >> - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization >> - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions >> - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes >> - Merge jdk-24+16 >> - Ensure that detected encode-and-store patterns are matched >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - ... 
and 43 more: https://git.openjdk.org/jdk/compare/0165cb32...14483b83 > > src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 335: > >> 333: assert(!use_ReduceInitialCardMarks(), >> 334: "post-barriers are only needed for tightly-coupled initialization stores when ReduceInitialCardMarks is disabled"); >> 335: access.set_barrier_data(access.barrier_data() ^ G1C2BarrierPre); > > I have been looking at this code after integration, and I wonder if `^` is really correct here? Was the intent to remove `G1C2BarrierPre` from the barrier data? What happens if `get_store_barrier` does not actually set it? Do we flip the bit back? Yes, the intent (and actual effect) is to remove `G1C2BarrierPre` from the barrier data. Using an XOR (`^`) is correct because at that program point `G1C2BarrierPre` is guaranteed to be set. This is because an `access` corresponding to a tightly-coupled initialization store is always of type `C2OptAccess`, hence `!access.is_parse_access()` and `get_store_barrier(access)` trivially returns `G1C2BarrierPre | G1C2BarrierPost`. Having said this, it would clearly be less convoluted to simply clear `G1C2BarrierPre` instead of flipping it. I will file an RFE, thanks. As a side note, this complexity is necessary to handle `!ReduceInitialCardMarks`. I keep wondering if the benefit of being able to disable `ReduceInitialCardMarks` [1,2,3] is worth the significant complexity required in the GC-C2 interface to deal with it. 
[1] https://docs.oracle.com/en/java/javase/23/gctuning/garbage-first-garbage-collector-tuning.html [2] https://bugs.openjdk.org/browse/JDK-8166899 [3] https://bugs.openjdk.org/browse/JDK-8167077 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1787425169 From kbarrett at openjdk.org Fri Oct 4 09:17:35 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 4 Oct 2024 09:17:35 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB In-Reply-To: References: <3iWhHL3A0KJIVBfPcMP--NRbbN00pEtFqWrURscE84E=.68f0d6be-58bc-4c12-af83-f7f9f34601ed@github.com> Message-ID: <_pWIfp7Z686EEpIHxA1w1RCNHCO-_QP1_ZZbk5BPijQ=.8d026cef-6f44-46a4-94eb-510e281f8f9e@github.com> On Fri, 4 Oct 2024 04:53:47 GMT, Kim Barrett wrote: >> src/hotspot/share/opto/type.cpp line 3136: >> >>> 3134: >>> 3135: const TypeRawPtr *TypeRawPtr::make( address bits ) { >>> 3136: assert( bits != nullptr, "Use TypePtr for null" ); >> >> Please, remove spaces after open and before close `()`. > > I'm not fond of those spaces, but they follow the style used throughout this file. Although it looks like only 1/3 of the asserts in this file have extra whitespace, including the one being touched here. So sure, I can remove the extraneous whitespace from this function, since touching it anyway. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1787421941 From kbarrett at openjdk.org Fri Oct 4 09:27:52 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 4 Oct 2024 09:27:52 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: References: Message-ID: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> > Please review this change to TypeRawPtr::add_offset to prevent a compiler from > inferring things based on prior pointer arithmetic not invoking UB. As noted in > the bug report, clang is actually doing this. 
> > To accomplish this, changed to integral arithmetic. Also added over/underflow > checks. > > Also made a couple of minor touchups. Replaced an implicit conversion to bool > with an explicit compare to nullptr (per style guide). Removed a no longer > needed dummy return after a (now) noreturn function. > > Testing: mach5 tier1-7 > That testing was with calls to "fatal" for the over/underflow cases and the > sum==0 case. There were no hits. I'm not sure how to construct a test that > would hit those. Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: remove surrounding whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21324/files - new: https://git.openjdk.org/jdk/pull/21324/files/48833715..cc1f2ac8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21324&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21324&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21324.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21324/head:pull/21324 PR: https://git.openjdk.org/jdk/pull/21324 From rcastanedalo at openjdk.org Fri Oct 4 09:37:53 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 4 Oct 2024 09:37:53 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 09:17:47 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 335: >> >>> 333: assert(!use_ReduceInitialCardMarks(), >>> 334: "post-barriers are only needed for tightly-coupled initialization stores when ReduceInitialCardMarks is disabled"); >>> 335: access.set_barrier_data(access.barrier_data() ^ G1C2BarrierPre); >> >> I have been looking at this code after integration, and I wonder if `^` is really correct here? Was the intent to remove `G1C2BarrierPre` from the barrier data? 
What happens if `get_store_barrier` does not actually set it? Do we flip the bit back? > > Yes, the intent (and actual effect) is to remove `G1C2BarrierPre` from the barrier data. Using an XOR (`^`) is correct because at that program point `G1C2BarrierPre` is guaranteed to be set. This is because an `access` corresponding to a tightly-coupled initialization store is always of type `C2OptAccess`, hence `!access.is_parse_access()` and `get_store_barrier(access)` trivially returns `G1C2BarrierPre | G1C2BarrierPost`. Having said this, it would clearly be less convoluted to simply clear `G1C2BarrierPre` instead of flipping it. I will file an RFE, thanks. > > As a side note, this complexity is necessary to handle `!ReduceInitialCardMarks`. I keep wondering if the benefit of being able to disable `ReduceInitialCardMarks` [1,2,3] is worth the significant complexity required in the GC-C2 interface to deal with it. > > [1] https://docs.oracle.com/en/java/javase/23/gctuning/garbage-first-garbage-collector-tuning.html > [2] https://bugs.openjdk.org/browse/JDK-8166899 > [3] https://bugs.openjdk.org/browse/JDK-8167077 Reported here: [JDK-8341525](https://bugs.openjdk.org/browse/JDK-8341525). 
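The difference between flipping and clearing a flag, as discussed in this exchange, can be shown in isolation. The following is a minimal sketch with hypothetical flag values and helper names (`kBarrierPre`, `kBarrierPost`, `flip_flag`, `clear_flag`); it does not use the real `G1C2BarrierPre` constants or the C2 access API.

```cpp
#include <cstdint>

// Hypothetical flag values, for illustration only.
constexpr std::uint8_t kBarrierPre  = 1;
constexpr std::uint8_t kBarrierPost = 2;

// XOR toggles the flag: it removes the flag if set, but adds it if not set.
constexpr std::uint8_t flip_flag(std::uint8_t data, std::uint8_t flag) {
  return static_cast<std::uint8_t>(data ^ flag);
}

// AND-NOT clears the flag unconditionally, whether or not it was set.
constexpr std::uint8_t clear_flag(std::uint8_t data, std::uint8_t flag) {
  return static_cast<std::uint8_t>(data & ~flag);
}

// When the flag is known to be set, both operations agree.
static_assert(flip_flag(kBarrierPre | kBarrierPost, kBarrierPre) == kBarrierPost,
              "flip removes a flag that is set");
static_assert(clear_flag(kBarrierPre | kBarrierPost, kBarrierPre) == kBarrierPost,
              "clear removes a flag that is set");

// When the flag is not set, flipping adds it while clearing is a no-op.
static_assert(flip_flag(kBarrierPost, kBarrierPre) == (kBarrierPre | kBarrierPost),
              "flip adds a flag that is not set");
static_assert(clear_flag(kBarrierPost, kBarrierPre) == kBarrierPost,
              "clear leaves other flags untouched");
```

Under the guarantee stated above (the pre-barrier flag is always set at that point) the two forms give the same result; the clearing form simply does not rely on that guarantee.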
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1787448241 From duke at openjdk.org Fri Oct 4 14:25:45 2024 From: duke at openjdk.org (Daniel Skantz) Date: Fri, 4 Oct 2024 14:25:45 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v12] In-Reply-To: <2F_C30zYunRTYqFh4cphJcHHyosVVyiKjESHiBGjRlE=.b7f5f6a5-9848-4972-8a7d-ccf38c42be7d@github.com> References: <0Ce3kXIzVDXuzpKGBzekEEOuJftpvkCvLwDkVOtpPR0=.ec2c1c1b-2b78-497d-97d3-a613ef736d2f@github.com> <2F_C30zYunRTYqFh4cphJcHHyosVVyiKjESHiBGjRlE=.b7f5f6a5-9848-4972-8a7d-ccf38c42be7d@github.com> Message-ID: On Wed, 2 Oct 2024 11:55:42 GMT, Christian Hagedorn wrote: >> Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: >> >> left over > > src/hotspot/share/opto/compile.hpp line 838: > >> 836: const CompilationFailureInfo* first_failure_details() const { return _first_failure_details; } >> 837: >> 838: bool failing(DEBUG_ONLY(bool no_stress_bailout = false)) { > > It's somewhat difficult to tell what `failing(false/true)` now exactly means. With `failing(true)`, don't we get the same behavior as if we called `failing_internal()`? If `failing_internal()` is false, then we would only return false because we are not entering > > if (StressBailout && !no_stress_bailout) { > return fail_randomly(); > } > > So, I'm wondering if we cannot just use `failing_internal()` instead of `failing(true)` and remove the parameter completely? Thanks for the suggestions! Updated the PR. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1787800936 From duke at openjdk.org Fri Oct 4 15:04:49 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Fri, 4 Oct 2024 15:04:49 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v2] In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: <34e0yknYMI9CrosG-mI80E2BB6BkyIYF_uXhwwZtHbQ=.3ae4c4d9-47f6-4e1d-8fe8-cc9dc30b6baf@github.com> > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. Tomáš Zezula has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Libgraal does not allow _can_call_java. 
- rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava - added CompilerThreadCanCallJavaScope ------------- Changes: https://git.openjdk.org/jdk/pull/21285/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=01 Stats: 132 lines in 6 files changed: 116 ins; 4 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/21285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21285/head:pull/21285 PR: https://git.openjdk.org/jdk/pull/21285 From duke at openjdk.org Fri Oct 4 15:18:38 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Fri, 4 Oct 2024 15:18:38 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v2] In-Reply-To: <34e0yknYMI9CrosG-mI80E2BB6BkyIYF_uXhwwZtHbQ=.3ae4c4d9-47f6-4e1d-8fe8-cc9dc30b6baf@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> <34e0yknYMI9CrosG-mI80E2BB6BkyIYF_uXhwwZtHbQ=.3ae4c4d9-47f6-4e1d-8fe8-cc9dc30b6baf@github.com> Message-ID: <6fWXm3zv1NNYxvEd6zlefj1CH7U9gVxatL2i18wM8jA=.3dc9115e-32bd-4903-83e2-4e253fb61062@github.com> On Fri, 4 Oct 2024 15:04:49 GMT, Tomáš Zezula wrote: >> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. >> >> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). 
>> >> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. > > Tomáš Zezula has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Libgraal does not allow _can_call_java. > - rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava > - added CompilerThreadCanCallJavaScope I have simplified the `_can_call_java` transitions. The only feature in the libjvmci compiler that requires Java calls is the Truffle compiler, which utilizes JNI to invoke the Truffle runtime methods. Given that we now have `CompilerThreadCanCallJavaScope`, which Truffle can use to explicitly enable Java calls, we can safely disable Java calls by default for the libjvmci compiler. For the Java JVMCI compiler, we still need to permit Java calls to accommodate upcalls to the Graal compiler and for InterpreterRuntime while running the Java JVMCI compiler. The simplification eliminates the need for `TriBool` for `_can_call_java`; it can remain a `bool`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21285#issuecomment-2393939688 From duke at openjdk.org Fri Oct 4 15:25:13 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Fri, 4 Oct 2024 15:25:13 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3] In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: <_ps5zsSWm32YEnHQax-4bP8mi0LS1U4IpZQpg7crlTE=.947c1915-3958-4816-a585-571bceaa6a81@github.com> > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. 
Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: UseJVMCINativeLibrary check must be guarded by JVMCI_ONLY. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21285/files - new: https://git.openjdk.org/jdk/pull/21285/files/dfd72497..f687c82e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21285/head:pull/21285 PR: https://git.openjdk.org/jdk/pull/21285 From dnsimon at openjdk.org Fri Oct 4 15:34:36 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 4 Oct 2024 15:34:36 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3] In-Reply-To: <_ps5zsSWm32YEnHQax-4bP8mi0LS1U4IpZQpg7crlTE=.947c1915-3958-4816-a585-571bceaa6a81@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> <_ps5zsSWm32YEnHQax-4bP8mi0LS1U4IpZQpg7crlTE=.947c1915-3958-4816-a585-571bceaa6a81@github.com> Message-ID: On Fri, 4 Oct 2024 15:25:13 GMT, Tomáš 
Zezula wrote: >> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. >> >> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). >> >> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. > > Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: > > UseJVMCINativeLibrary check must be guarded by JVMCI_ONLY. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 193: > 191: __block_can_call_java = CompilerThread::cast(thread)->can_call_java(); \ > 192: } else { \ > 193: __block_can_call_java = false; \ For non-CompilerThreads, `__block_can_call_java` should be true I think since they are not affected by `-Xcomp` or `-Xbatch`. A TruffleCompileThread is such a thread. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21285#discussion_r1787892422 From duke at openjdk.org Fri Oct 4 16:02:38 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Fri, 4 Oct 2024 16:02:38 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3] In-Reply-To: References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> <_ps5zsSWm32YEnHQax-4bP8mi0LS1U4IpZQpg7crlTE=.947c1915-3958-4816-a585-571bceaa6a81@github.com> Message-ID: On Fri, 4 Oct 2024 15:31:52 GMT, Doug Simon wrote: >> Tomáš 
Zezula has updated the pull request incrementally with one additional commit since the last revision: >> >> UseJVMCINativeLibrary check must be guarded by JVMCI_ONLY. > > src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 193: > >> 191: __block_can_call_java = CompilerThread::cast(thread)->can_call_java(); \ >> 192: } else { \ >> 193: __block_can_call_java = false; \ > > For non-CompilerThreads, `__block_can_call_java` should be true I think since they are not affected by `-Xcomp` or `-Xbatch`. A TruffleCompileThread is such a thread. For a non-compiler thread the new value is never used because [CompilerThreadCanCallJava::update](https://github.com/openjdk/jdk/blob/f687c82ef9ede1d9d02ca0965c896bcf658c450a/src/hotspot/share/jvmci/jvmci.cpp#L58) does not modify the `CompilerThread::_can_call_java` value in this case. However, using `true` may improve readability. I will change it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21285#discussion_r1787925558 From kvn at openjdk.org Fri Oct 4 16:06:37 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 4 Oct 2024 16:06:37 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> Message-ID: <_UfXDCsUh7XlJS1APDR6uEdAFZyDktB56D3l5idS0OA=.f6ed2d6d-251d-4237-ad0d-3dd8298e538b@github.com> On Fri, 4 Oct 2024 09:27:52 GMT, Kim Barrett wrote: >> Please review this change to TypeRawPtr::add_offset to prevent a compiler from >> inferring things based on prior pointer arithmetic not invoking UB. As noted in >> the bug report, clang is actually doing this. >> >> To accomplish this, changed to integral arithmetic. Also added over/underflow >> checks. >> >> Also made a couple of minor touchups. 
Replaced an implicit conversion to bool >> with an explicit compare to nullptr (per style guide). Removed a no longer >> needed dummy return after a (now) noreturn function. >> >> Testing: mach5 tier1-7 >> That testing was with calls to "fatal" for the over/underflow cases and the >> sum==0 case. There were no hits. I'm not sure how to construct a test that >> would hit those. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove surrounding whitespace Good. Side note: please enable GHA testing for your repo. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21324#pullrequestreview-2348418050 PR Comment: https://git.openjdk.org/jdk/pull/21324#issuecomment-2394024252 From duke at openjdk.org Fri Oct 4 16:07:14 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Fri, 4 Oct 2024 16:07:14 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v4] In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). 
> > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: Set __block_can_call_java to true for non compiler threads. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21285/files - new: https://git.openjdk.org/jdk/pull/21285/files/f687c82e..346f8982 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21285/head:pull/21285 PR: https://git.openjdk.org/jdk/pull/21285 From duke at openjdk.org Fri Oct 4 16:34:54 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Fri, 4 Oct 2024 16:34:54 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v5] In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: <4X5xzYDU45UnuXtoC0sEJ7dF5seNqQDJxhdNrdktsV8=.03a8e2b5-7e43-44b7-92b4-b4440dc26770@github.com> > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. 
Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: Simplified C2V_BLOCK. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21285/files - new: https://git.openjdk.org/jdk/pull/21285/files/346f8982..e07d4448 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=03-04 Stats: 8 lines in 1 file changed: 0 ins; 7 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21285/head:pull/21285 PR: https://git.openjdk.org/jdk/pull/21285 From qamai at openjdk.org Sun Oct 6 08:32:20 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 6 Oct 2024 08:32:20 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. > - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews.
Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains one commit: [vectorapi] Refactor VectorShuffle implementation ------------- Changes: https://git.openjdk.org/jdk/pull/21042/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=01 Stats: 5013 lines in 64 files changed: 2737 ins; 1068 del; 1208 mod Patch: https://git.openjdk.org/jdk/pull/21042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21042/head:pull/21042 PR: https://git.openjdk.org/jdk/pull/21042 From qamai at openjdk.org Sun Oct 6 10:11:48 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 6 Oct 2024 10:11:48 GMT Subject: RFR: 8341102: Add element type information to vector types [v3] In-Reply-To: References: Message-ID: > Hi, > > This patch adds the type information of each element in a `TypeVect`. This helps constant folding vectors as well as strength reduction of several complex operations such as `Rearrange`. Some notable points: > > - I only implement `ConV` rule on x86, looking at other architectures it seems that I would not only need to implement the `ConV` implementations, but several other rules that match `ReplicateNode` of a constant. > - I changed the implementation of an array constant in `constanttable`, I think working with `jbyte` is easier as it allows `memcpy` and at this point, we are close to the metal anyway. > - Constant folding for a `VectorUnboxNode`, this is special because an element of a normal stable array is only constant if it is non-zero, so implementing constant folding on a load node seems less trivial. > - Memory fences because `Vector::payload` is a final field and we should respect that. > - Several places expect a `const Type*` when in reality it expects a `BasicType`, I refactor that so that the intent is clearer and there is less room for possible errors, this is needed because `byte`, `short` and `int` share the same kind of `const Type*`. > > Please take a look and leave your reviews, thanks a lot. 
Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: add element types to vector types ------------- Changes: https://git.openjdk.org/jdk/pull/21229/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21229&range=02 Stats: 1431 lines in 39 files changed: 887 ins; 330 del; 214 mod Patch: https://git.openjdk.org/jdk/pull/21229.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21229/head:pull/21229 PR: https://git.openjdk.org/jdk/pull/21229 From qamai at openjdk.org Sun Oct 6 10:27:35 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 6 Oct 2024 10:27:35 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: On Sun, 6 Oct 2024 08:32:20 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. >> >> Regarding the related issues: >> >> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. >> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` >> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. >> >> Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. >> >> Please take a look and leave reviews. Thanks a lot. >> >> The description of the original PR: >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. 
This poses several drawbacks: >> >> Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byt... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > [vectorapi] Refactor VectorShuffle implementation I have adapted the patch in accordance with https://github.com/openjdk/jdk/pull/20634, I moved the index wrapping into C2 instead of making it a separate step as I think it seems clearer. Also, I think in the future we can eliminate this step so putting it in C2 would make the progress easier. Please take a look, thanks a lot. 
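For readers unfamiliar with what "index wrapping" could mean for a `rearrange`, here is a minimal scalar sketch. It assumes that wrapping reduces each shuffle index modulo the vector length, so that out-of-range indices still select a valid lane. That reading is an assumption and is not confirmed by this message. The class and method names (`RearrangeSketch`, `wrapIndex`, `rearrange`) are hypothetical; this is plain Java on arrays, not the Vector API and not the C2 implementation.

```java
// Hypothetical scalar model; not the Vector API and not the C2 code discussed above.
final class RearrangeSketch {
    // Assumed meaning of "wrapping": force any index into [0, vlength) by
    // reducing it modulo the vector length (floorMod handles negative values).
    static int wrapIndex(int index, int vlength) {
        return Math.floorMod(index, vlength);
    }

    // dst[i] = src[wrapped indices[i]]; both arrays are assumed to have the same length.
    static int[] rearrange(int[] src, int[] indices) {
        int n = src.length;
        int[] dst = new int[n];
        for (int i = 0; i < n; i++) {
            dst[i] = src[wrapIndex(indices[i], n)];
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40};
        int[] indices = {3, 2, 5, -1}; // 5 and -1 are out of range and get wrapped
        System.out.println(java.util.Arrays.toString(rearrange(src, indices)));
    }
}
```

For a power-of-two length, `Math.floorMod(index, vlength)` gives the same result as `index & (vlength - 1)`, which is one reason such a step can be cheap to fold into compiled code.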
------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2395383093 From chagedorn at openjdk.org Mon Oct 7 05:27:44 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Oct 2024 05:27:44 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v13] In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 06:30:11 GMT, Daniel Skantz wrote: >> This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). >> >> We check two invariants. >> a) Bailouts should be successful starting from any given `failing()` check. >> b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). >> >> a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. >> >> The added flag should not have any effect in debug mode. >> >> Testing: >> >> T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > use failing_internal instead; add a const; clarify skip Thanks for the update, looks good! Some minor code style comments regarding existing code that you touched which I think you could also fix while at it. 
src/hotspot/share/opto/compile.cpp line 4375: > 4373: > 4374: Compile::TracePhase::~TracePhase() { > 4375: if (_compile->failing_internal()) return; // timing code, not stressing bailouts. While at it, I suggest to add braces: Suggestion: if (_compile->failing_internal()) { return; // timing code, not stressing bailouts. } Same below at some places. src/hotspot/share/opto/graphKit.cpp line 343: > 341: // regions do not appear except in this function, and in use_exception_state. > 342: void GraphKit::combine_exception_states(SafePointNode* ex_map, SafePointNode* phi_map) { > 343: if (failing_internal()) return; // dying anyway... Suggestion: if (failing_internal()) { return; // dying anyway... } src/hotspot/share/opto/graphKit.cpp line 2059: > 2057: bool must_throw, > 2058: bool keep_exact_action) { > 2059: if (failing_internal()) stop(); Suggestion: if (failing_internal()) { stop(); } src/hotspot/share/opto/loopnode.cpp line 4938: > 4936: > 4937: PhaseIdealLoop phase_verify(_igvn, this); > 4938: if (C->failing_internal()) return; Suggestion: if (C->failing_internal()) { return; } src/hotspot/share/opto/output.cpp line 3394: > 3392: > 3393: // Emitting into the scratch buffer should not fail > 3394: assert (!C->failing_internal() || C->failure_is_artificial(), "Must not have pending failure. Reason is: %s", C->failure_reason()); Suggestion: assert(!C->failing_internal() || C->failure_is_artificial(), "Must not have pending failure. Reason is: %s", C->failure_reason()); src/hotspot/share/opto/parse.hpp line 429: > 427: > 428: // Must this parse be aborted? > 429: bool failing() { return C->failing_internal(); } // might have cascading effects, not stressing bailouts for now. Can be made const: Suggestion: bool failing() const { return C->failing_internal(); } // might have cascading effects, not stressing bailouts for now. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19646#pullrequestreview-2350875835 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1789512799 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1789513200 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1789513401 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1789513596 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1789513864 PR Review Comment: https://git.openjdk.org/jdk/pull/19646#discussion_r1789512066 From thartmann at openjdk.org Mon Oct 7 05:46:40 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 7 Oct 2024 05:46:40 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> Message-ID: On Fri, 4 Oct 2024 09:27:52 GMT, Kim Barrett wrote: >> Please review this change to TypeRawPtr::add_offset to prevent a compiler from >> inferring things based on prior pointer arithmetic not invoking UB. As noted in >> the bug report, clang is actually doing this. >> >> To accomplish this, changed to integral arithmetic. Also added over/underflow >> checks. >> >> Also made a couple of minor touchups. Replaced an implicit conversion to bool >> with an explicit compare to nullptr (per style guide). Removed a no longer >> needed dummy return after a (now) noreturn function. >> >> Testing: mach5 tier1-7 >> That testing was with calls to "fatal" for the over/underflow cases and the >> sum==0 case. There were no hits. I'm not sure how to construct a test that >> would hit those. 
> > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove surrounding whitespace What about using `intptr_t` for `TypeRawPtr::_bits` instead? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21324#issuecomment-2395956241 From thartmann at openjdk.org Mon Oct 7 06:03:35 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 7 Oct 2024 06:03:35 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v4] In-Reply-To: References: Message-ID: <4ckwGsbjH0VDbGmBBTuG4HUc6ARwbISQ7L8xsVCeqDs=.28d696f2-47ba-4323-855d-e71369242876@github.com> On Wed, 2 Oct 2024 13:58:14 GMT, Roland Westrelin wrote: >> The patch includes 2 test cases for this: test1() causes the assert >> failure in the bug description, test2() causes an incorrect execution >> where a load floats above a store that it should be dependent on. >> >> In the test cases, `field` is accessed on object `a` of type `A`. When >> the field is accessed, the type that c2 has for `a` is `A` with >> interface `I`. The holder of the field is class `A` which implements >> no interface. The reason the type of `a` and the type of the holder >> are slightly different is because `a` is the result of a merge of >> objects of subclasses `B` and `C` which implements `I`. >> >> The root cause of the bug is that `Compile::flatten_alias_type()` >> doesn't change `A` + interface `I` into `A`, the actual holder of the >> field. So `field` in `A` + interface `I` and `field` in `A` get >> different slices which is wrong. At parse time, the logic that creates >> the `Store` node uses: >> >> >> C->alias_type(field)->adr_type() >> >> >> to compute the slice which is the slice for `field` in `A`. So the >> slice used at parse time is the right one but during igvn, when the >> slice is computed from the input address, a different slice (the one >> for `A` + interface `I`) is used. 
That causes load/store nodes when >> they are processed by igvn to use the wrong memory state. >> >> In `Compile::flatten_alias_type()`: >> >> >> if (!ik->equals(canonical_holder) || tj->offset() != offset) { >> if( is_known_inst ) { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); >> } else { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); >> } >> } >> >> >> only flattens the type if it's not the canonical holder but it should >> test that the type doesn't implement interfaces that the canonical >> holder doesn't. To keep the logic simple, the fix I propose creates a >> new type whenever there's a chance that a type implements extra >> interfaces (the type is not exact). >> >> I also added asserts in `GraphKit::make_load()` and >> `GraphKit::store_to_memory()` to make sure the slice that is passed >> and the address type agree. Those asserts fire with the new test >> cases. When running testing, I found that they also catch a few cases >> in `library_call.cpp` where an incorrect slice is passed. >> >> As further clean up, maybe we want to drop the slice argument to >> `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to >> their callers) given it's redundant with th... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java > > Co-authored-by: Christian Hagedorn Marked as reviewed by thartmann (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21303#pullrequestreview-2350926679 From thartmann at openjdk.org Mon Oct 7 06:38:37 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 7 Oct 2024 06:38:37 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v8] In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 21:31:12 GMT, Kangcheng Xu wrote: >> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. >> >> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). >> >> Some notable examples of this transformation include: >> - `a + a + a` => `a*3` => `(a<<1) + a` >> - `a + a + a + a` => `a*4` => `a<<2` >> - `a*3 + a` => `a*4` => `a<<2` >> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` >> >> See included IR unit tests for more. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > update comments, use explicit opcode comparisons for LShift nodes Testing all passed. 
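The rewrites listed in the quoted description rely on Java `int` arithmetic wrapping modulo 2^32, so the sum, the product and the shift agree even when they overflow. The following is a small sketch that checks this on a few sample values. The class and method names are hypothetical and only illustrate the arithmetic; they say nothing about how C2 implements the transformation.

```java
// Hypothetical illustration class; not part of the JDK or of the PR.
final class SerialAdditionSketch {
    // a + a + a + a, the chain of additions the description starts from.
    static int sumOfFour(int a) {
        return a + a + a + a;
    }

    // The same value written as a multiplication by the constant 4.
    static int timesFour(int a) {
        return a * 4;
    }

    // The same value written as a left shift by 2.
    static int shiftByTwo(int a) {
        return a << 2;
    }

    // C * a + a compared against (C + 1) * a for a constant C.
    static boolean foldsConstant(int a, int c) {
        return c * a + a == (c + 1) * a;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            if (sumOfFour(a) != timesFour(a) || timesFour(a) != shiftByTwo(a)) {
                throw new AssertionError("sum, product and shift differ for a = " + a);
            }
            if (!foldsConstant(a, 3)) {
                throw new AssertionError("C*a + a != (C+1)*a for a = " + a);
            }
        }
        System.out.println("all identities hold on the sample values");
    }
}
```

The overflow cases are the interesting ones: for `Integer.MAX_VALUE` all three forms wrap to the same value, so replacing one form with another does not change the observable result.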
------------- PR Comment: https://git.openjdk.org/jdk/pull/20754#issuecomment-2396023249 From roland at openjdk.org Mon Oct 7 07:55:41 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 7 Oct 2024 07:55:41 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v8] In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 21:31:12 GMT, Kangcheng Xu wrote: >> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. >> >> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). >> >> Some notable examples of this transformation include: >> - `a + a + a` => `a*3` => `(a<<1) + a` >> - `a + a + a + a` => `a*4` => `a<<2` >> - `a*3 + a` => `a*4` => `a<<2` >> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` >> >> See included IR unit tests for more. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > update comments, use explicit opcode comparisons for LShift nodes Marked as reviewed by roland (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20754#pullrequestreview-2351182436 From roland at openjdk.org Mon Oct 7 07:55:41 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 7 Oct 2024 07:55:41 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v8] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 06:36:06 GMT, Tobias Hartmann wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> update comments, use explicit opcode comparisons for LShift nodes > > Testing all passed. @TobiHartmann @chhagedorn thanks for running tests ------------- PR Comment: https://git.openjdk.org/jdk/pull/20754#issuecomment-2396180050 From thartmann at openjdk.org Mon Oct 7 07:58:39 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 7 Oct 2024 07:58:39 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v21] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: <1R6tWNQF3WSJ4joPaoWR3N_JKy0W-BB0Zntg00N-mlU=.e8b25703-93ea-42be-ba1d-46e06c17987c@github.com> On Thu, 3 Oct 2024 16:31:15 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. 
> > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > correctly verify outputs with custom @Run methods `compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java` times out in our testing both with `-XX:StressLongCountedLoop=200000000` and with `-XX:+UnlockExperimentalVMOptions -XX:PerMethodSpecTrapLimit=0 -XX:PerMethodTrapLimit=0`: "main" #1 [2771172] prio=5 os_prio=0 cpu=500187.70ms elapsed=503.08s allocated=6554K defined_classes=227 tid=0x0000ffff9002d550 nid=2771172 runnable [0x0000ffff972bf000] java.lang.Thread.State: RUNNABLE Thread: 0x0000ffff9002d550 [0x2a48e4] State: _at_safepoint _at_poll_safepoint 1 JavaThread state: _thread_blocked at compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.testIntCountedLoopWithIntIVLeq(TestParallelIvInIntCountedLoop.java:93) at compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.runTestIntCountedLoopWithIntIVLeq(TestParallelIvInIntCountedLoop.java:103) at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base at 24-internal/DirectMethodHandle$Holder) at java.lang.invoke.LambdaForm$MH/0x0000ffff58460870.invoke(java.base at 24-internal/LambdaForm$MH) at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base at 24-internal/Invokers$Holder) at jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(java.base at 24-internal/DirectMethodHandleAccessor.java:154) at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(java.base at 24-internal/DirectMethodHandleAccessor.java:104) at java.lang.reflect.Method.invoke(java.base at 24-internal/Method.java:573) at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:159) at compiler.lib.ir_framework.test.AbstractTest.run(AbstractTest.java:98) at compiler.lib.ir_framework.test.CustomRunTest.run(CustomRunTest.java:89) at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:861) at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:252) at 
compiler.lib.ir_framework.test.TestVM.main(TestVM.java:165) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18489#issuecomment-2396187667 From luhenry at openjdk.org Mon Oct 7 08:22:39 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 7 Oct 2024 08:22:39 GMT Subject: RFR: 8340880: RISC-V: add t3-t6 alias into assemler_riscv.hpp In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 08:39:56 GMT, Hamlin Li wrote: > Hi, > > Can you help review this simple patch to add t3-t6? > I also modified some particularly obvious places that use x28-x31 where t3-t6 was clearly intended, but left the others as they are, since that is more consistent with the surrounding code. > > Thanks! Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21349#pullrequestreview-2351243290 From fyang at openjdk.org Mon Oct 7 08:29:35 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 7 Oct 2024 08:29:35 GMT Subject: RFR: 8340880: RISC-V: add t3-t6 alias into assemler_riscv.hpp In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 08:39:56 GMT, Hamlin Li wrote: > Hi, > > Can you help review this simple patch to add t3-t6? > I also modified some particularly obvious places that use x28-x31 where t3-t6 was clearly intended, but left the others as they are, since that is more consistent with the surrounding code. > > Thanks! Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21349#pullrequestreview-2351259208 From duke at openjdk.org Mon Oct 7 08:32:22 2024 From: duke at openjdk.org (Daniel Skantz) Date: Mon, 7 Oct 2024 08:32:22 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v14] In-Reply-To: References: Message-ID: > This patch adds a diagnostic/stress flag for C2 bailouts.
It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). > > We check two invariants. > a) Bailouts should be successful starting from any given `failing()` check. > b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). > > a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. > > The added flag should not have any effect in debug mode. > > Testing: > > T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. 
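The idea of randomly starting a bailout at `failing()` checks, and of rejecting a second bailout while one is pending, can be sketched in a few lines. This is only a toy model under stated assumptions: the class `StressBailoutSketch`, its fields and its `compile` loop are hypothetical and written in Java for brevity, whereas the real change is in HotSpot's C++ code and is not reproduced here. Only the names `failing()` and `failing_internal()` come from the discussion.

```java
import java.util.Random;

// Hypothetical toy model of stress bailouts; not the HotSpot implementation.
final class StressBailoutSketch {
    private final Random random;
    private final double probability; // assumed user-given chance to inject a bailout
    private String failureReason;     // null while no bailout is pending

    StressBailoutSketch(long seed, double probability) {
        this.random = new Random(seed);
        this.probability = probability;
    }

    // Plain check: reports a pending bailout and never injects one
    // (modelled after the failing_internal() mentioned in the review).
    boolean failingInternal() {
        return failureReason != null;
    }

    // Stressed check: may start an artificial bailout with the given probability.
    boolean failing() {
        if (failingInternal()) {
            return true;
        }
        if (random.nextDouble() < probability) {
            recordFailure("artificial stress bailout");
        }
        return failingInternal();
    }

    // Invariant b): recording a bailout while one is already pending is an error.
    void recordFailure(String reason) {
        if (failureReason != null) {
            throw new IllegalStateException("bailout recorded while one is pending: " + failureReason);
        }
        failureReason = reason;
    }

    // Invariant a): the loop must stop cleanly at whichever check starts the bailout.
    // Returns the number of phases completed before a bailout was observed.
    int compile(int phases) {
        int completed = 0;
        for (int i = 0; i < phases; i++) {
            if (failing()) {
                break;
            }
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) {
        StressBailoutSketch stressed = new StressBailoutSketch(42L, 0.25);
        int completed = stressed.compile(100);
        System.out.println("phases completed before bailout: " + completed);
    }
}
```

With a probability of 0 the model never bails out, and with a probability of 1 it bails out at the first check, which makes both ends easy to test deterministically.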
Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19646/files - new: https://git.openjdk.org/jdk/pull/19646/files/cb748fb8..b6eb9a84 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19646&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19646&range=12-13 Stats: 14 lines in 5 files changed: 8 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19646.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19646/head:pull/19646 PR: https://git.openjdk.org/jdk/pull/19646 From chagedorn at openjdk.org Mon Oct 7 08:33:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Oct 2024 08:33:41 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v8] In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 21:31:12 GMT, Kangcheng Xu wrote: >> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. >> >> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). >> >> Some notable examples of this transformation include: >> - `a + a + a` => `a*3` => `(a<<1) + a` >> - `a + a + a + a` => `a*4` => `a<<2` >> - `a*3 + a` => `a*4` => `a<<2` >> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` >> >> See included IR unit tests for more. 
> > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > update comments, use explicit opcode comparisons for LShift nodes Good updates, it's now easy to follow the logic and understand the code. I have some more comments/suggestions. src/hotspot/share/opto/addnode.cpp line 429: > 427: ? (Node*) phase->intcon((jint) multiplier) // intentional type narrowing to allow overflow at max_jint > 428: : (Node*) phase->longcon(multiplier); > 429: return MulNode::make(con, in(2), bt); Could you use `in2` here? Suggestion: return MulNode::make(con, in2, bt); src/hotspot/share/opto/addnode.cpp line 437: > 435: // Match `a + a`, extract `a` and `2` > 436: Node* AddNode::find_simple_addition_pattern(Node* n, BasicType bt, jlong* multiplier) { > 437: // Look for pattern: AddNode(a, a) Could also be added as method comment above. Same for other `find*` methods. src/hotspot/share/opto/addnode.cpp line 446: > 444: } > 445: > 446: // Match `a << CON`, extract `a` and `1 << CON` "extract" was a bit confusing at first. So, what you mean is return `a` and set `multiplier` to `1 << CON`. Maybe you want to update the comment to make this more explicit? Maybe something like that: // Try to match `a << CON`. On success, return `a` and set `1 << CON` as `multiplier`. You could do the same for the other `find*` methods. src/hotspot/share/opto/addnode.cpp line 547: > 545: > 546: return nullptr; > 547: } I think you could remove the new lines for more compactness here: Suggestion: } return nullptr; } return nullptr; } src/hotspot/share/opto/addnode.cpp line 567: > 565: > 566: // We can't simply return the lshift node even if ((a << CON) - a) + a cancels out. Ideal() must return a new node. > 567: *multiplier = ((jlong) 1 << con->get_int()) - 1; Can't this be an `Identity()` transformation where you can return existing nodes? 
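The arithmetic behind the `((a << CON) - a) + a` case in the last comment can be checked directly: `(a << CON) - a` equals `a * ((1 << CON) - 1)`, and adding `a` back yields `a << CON` again. The sketch below uses hypothetical names and restricts `con` to 0..31, since Java masks `int` shift counts; it illustrates the identity only and takes no position on whether `Ideal()` or `Identity()` is the right place for it.

```java
// Hypothetical illustration of the (a << CON) - a pattern; not the C2 code.
final class ShiftMinusSelfSketch {
    // Multiplier implied by (a << con) - a, mirroring the quoted
    // `((jlong) 1 << con->get_int()) - 1`. Assumes 0 <= con <= 31.
    static long multiplierOf(int con) {
        return (1L << con) - 1;
    }

    // The pattern as written in source: (a << con) - a.
    static int shiftMinusSelf(int a, int con) {
        return (a << con) - a;
    }

    // The same value as a single multiplication, narrowed to int so it wraps.
    static int viaMultiplier(int a, int con) {
        return a * (int) multiplierOf(con);
    }

    // Adding a back cancels the subtraction, leaving a << con.
    static int plusSelf(int a, int con) {
        return shiftMinusSelf(a, con) + a;
    }

    public static void main(String[] args) {
        int[] samples = {0, 7, -7, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            for (int con = 0; con <= 31; con++) {
                if (shiftMinusSelf(a, con) != viaMultiplier(a, con)) {
                    throw new AssertionError("multiplier mismatch for a=" + a + ", con=" + con);
                }
                if (plusSelf(a, con) != (a << con)) {
                    throw new AssertionError("cancellation mismatch for a=" + a + ", con=" + con);
                }
            }
        }
        System.out.println("identities hold on the sample values");
    }
}
```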
src/hotspot/share/opto/addnode.hpp line 46:

> 44:   virtual uint hash() const;
> 45:
> 46: private:

Can be removed since these methods are already private by default here since it's a `class` and not a `struct`.

Suggestion:

test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 61:

> 59:     @Arguments(values = {Argument.RANDOM_EACH})
> 60:     @IR(counts = { IRNode.ADD_I, "1" })
> 61:     @IR(failOn = {IRNode.LSHIFT_I})

Generally, for single strings, you can remove the braces:

Suggestion:

    @IR(failOn = IRNode.LSHIFT_I)

test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 64:

> 62:     private static void addTo2(int a) {
> 63:         int sum = a + a; // Simple additions like a + a should be kept as-is
> 64:         verifyResult(a, 2, sum);

Generally, we should move all verification code out of the `@Test` methods to avoid side effects and worrying about whether the result checking is now compiled or not (we must ensure that the result checking code is interpreted to catch wrong executions with miscompiled code).

I suggest the following (not tested):

Introduce a `@Run` method, which is never compiled, for your `@Test` methods. You can still call methods from there but then you should ensure that they are not compiled either with `@DontCompile`:

    static final Random RANDOM = Utils.getRandomInstance();

    ...

    @DontCompile
    private static void verifyResult(int base, int factor, int observed) { ... }

    ...

    @Test
    @IR(counts = {IRNode.ADD_I, "1", IRNode.LSHIFT_I, "1"})
    private static int addTo3(int a) {
        return a + a + a; // a*3 => (a<<1) + a
    }

    @Run(test = "addTo3")
    void runAddTo3() {
        int a = RANDOM.nextInt();
        int result = addTo3(a);
        verifyResult(a, 3, result);
    }

Since the tests are all very similar and require the same setup and verification, you could even go a step further and provide a single shared `@Run` method which is possible:

    @Test
    @IR(counts = {IRNode.ADD_I, "1", IRNode.LSHIFT_I, "1"})
    private static int addTo3(int a) {
        return a + a + a; // a*3 => (a<<1) + a
    }

    @Test
    @IR(failOn = IRNode.ADD_I)
    @IR(counts = {IRNode.LSHIFT_I, "1"})
    private static int addTo4(int a) {
        return a + a + a + a; // a*4 => a<<2
    }

    @Run(test = {"addTo3", "addTo4"}) // List all @Test methods here and make sure you call all of them below.
    void runTests() {
        int a = RANDOM.nextInt();
        verifyResult(a, 3, addTo3(a));
        verifyResult(a, 4, addTo4(a));
    }

This also allows you to run with some more edge case values like `a == 0` or `a == min_int` etc. which gives us even some more confidence.

test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 70:

> 68:     @Arguments(values = {Argument.RANDOM_EACH})
> 69:     @IR(counts = { IRNode.ADD_I, "1" })
> 70:     @IR(counts = {IRNode.LSHIFT_I, "1"})

Generally, you can merge these together:

Suggestion:

    @IR(counts = {IRNode.ADD_I, "1", IRNode.LSHIFT_I, "1"})

-------------

Changes requested by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20754#pullrequestreview-2351092284 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789654678 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789657284 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789679835 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789700721 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789725214 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789722502 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789759013 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789744964 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1789760417 From chagedorn at openjdk.org Mon Oct 7 09:01:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Oct 2024 09:01:41 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v14] In-Reply-To: References: Message-ID: <5OZ8ScEw3a_dazfA83RTIOPQdBbn8ZctXj8mMbvlZv0=.23fe019f-7b74-4e58-9a77-ca183f5e4a9c@github.com> On Mon, 7 Oct 2024 08:32:22 GMT, Daniel Skantz wrote: >> This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). >> >> We check two invariants. >> a) Bailouts should be successful starting from any given `failing()` check. >> b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). >> >> a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. 
>> >> The added flag should not have any effect in debug mode. >> >> Testing: >> >> T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Looks good, thanks for the updates! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19646#pullrequestreview-2351336877 From duke at openjdk.org Mon Oct 7 09:10:52 2024 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane) Date: Mon, 7 Oct 2024 09:10:52 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging [v4] In-Reply-To: References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: <-VhlbMrk05I4TJjr4U_ejcmX02d8ywyaUyQlv8diCHE=.ccd4b85a-3cd4-4f04-95a0-ae9dd59a8c0f@github.com> On Tue, 24 Sep 2024 16:43:57 GMT, Ant?n Seoane wrote: >> Hi all, >> >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. 
Such a match is based on the following: >> >> - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. >> - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. >> - Additionally, defaults may target a specific log level. >> >> Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. >> >> In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. >> >> Please consider this PR, and thanks! > > Ant?n Seoane has updated the pull request incrementally with one additional commit since the last revision: > > Removed whitespace After some review and discussion, I am closing this PR and opening a new (simplified) version of this that aligns with the needed use cases in [8341622: Tag-specific disabled default decorators for UnifiedLogging](https://github.com/openjdk/jdk/pull/21383). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2396352901 From duke at openjdk.org Mon Oct 7 09:10:52 2024 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane) Date: Mon, 7 Oct 2024 09:10:52 GMT Subject: Withdrawn: 8340363: Tag-specific default decorators for UnifiedLogging In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: <6FI0_lFOFEAktdR8fDEyglCSi_mL_zZv8QJdDvTJ5L8=.e0a68b15-9677-48df-8b0a-f263b2357bc5@github.com> On Fri, 13 Sep 2024 09:03:55 GMT, Ant?n Seoane wrote: > Hi all, > > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. > > To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: > > - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. > - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. > - Additionally, defaults may target a specific log level. > > Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. 
> > In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. > > Please consider this PR, and thanks! This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/20988 From mli at openjdk.org Mon Oct 7 09:32:39 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 7 Oct 2024 09:32:39 GMT Subject: RFR: 8340880: RISC-V: add t3-t6 alias into assemler_riscv.hpp In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 08:20:21 GMT, Ludovic Henry wrote: >> Hi, >> >> Can you help to review this simple patch to add the t3-t6 aliases? >> I also modified some particularly obvious places that use x28-x31 where t3-t6 was clearly intended, but kept the others as they are, because that is more consistent with the surrounding code. >> >> Thanks! > > Marked as reviewed by luhenry (Committer). Thanks @luhenry @RealFYang for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21349#issuecomment-2396403264 From mli at openjdk.org Mon Oct 7 09:35:39 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 7 Oct 2024 09:35:39 GMT Subject: Integrated: 8340880: RISC-V: add t3-t6 alias into assemler_riscv.hpp In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 08:39:56 GMT, Hamlin Li wrote: > Hi, > > Can you help to review this simple patch to add the t3-t6 aliases? > I also modified some particularly obvious places that use x28-x31 where t3-t6 was clearly intended, but kept the others as they are, because that is more consistent with the surrounding code. > > Thanks! This pull request has now been integrated.
Changeset: 28977972 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/28977972a0129892543222eada4dc99f4cd62574 Stats: 14 lines in 2 files changed: 4 ins; 0 del; 10 mod 8340880: RISC-V: add t3-t6 alias into assemler_riscv.hpp Reviewed-by: luhenry, fyang ------------- PR: https://git.openjdk.org/jdk/pull/21349 From rcastanedalo at openjdk.org Mon Oct 7 09:44:38 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 7 Oct 2024 09:44:38 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 08:56:40 GMT, Ant?n Seoane wrote: > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. > > To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. > > The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. 
> > This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. Labeling the PR as `hotspot-compiler` because it proposes disabling default decorators of `jit+inlining`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21383#issuecomment-2396429323 From jsjolen at openjdk.org Mon Oct 7 11:20:42 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 7 Oct 2024 11:20:42 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 08:56:40 GMT, Ant?n Seoane wrote: > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. > > To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. > > The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. 
> > This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. src/hotspot/share/logging/logDecorators.cpp line 110: > 108: bool LogDecorators::has_disabled_default_decorators(const LogSelection& selection, const DefaultUndecoratedSelection* defaults, size_t defaults_count) { > 109: for (size_t i = 0; i < defaults_count; ++i) { > 110: auto current_default = defaults[i]; Please expand with deduced type. src/hotspot/share/logging/logSelectionList.cpp line 62: > 60: } > 61: } > 62: return LogDecorators(0); Here I'd like to see either an explanation of 0 as `LogDecorators(0 /* comment */)` or a meaningful name. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1790018720 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1790025195 From duke at openjdk.org Mon Oct 7 11:46:19 2024 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane) Date: Mon, 7 Oct 2024 11:46:19 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v2] In-Reply-To: References: Message-ID: > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. > > To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. 
Additionally, defaults may target a specific log level. > > The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. > > This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. Ant?n Seoane has updated the pull request incrementally with one additional commit since the last revision: Review changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21383/files - new: https://git.openjdk.org/jdk/pull/21383/files/3e0a0613..e1878be5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21383/head:pull/21383 PR: https://git.openjdk.org/jdk/pull/21383 From duke at openjdk.org Mon Oct 7 11:46:19 2024 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane) Date: Mon, 7 Oct 2024 11:46:19 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v2] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 11:17:18 GMT, Johan Sj?len wrote: >> Ant?n Seoane has updated the pull request incrementally with one additional commit since the last revision: >> >> Review changes > > src/hotspot/share/logging/logSelectionList.cpp line 62: > >> 60: } >> 61: } >> 62: return LogDecorators(0); > > Here I'd like to see either an explanation of 0 as `LogDecorators(0 /* comment */)` or a meaningful name. 
I have used the `mask_from_decorators` function, I think it should be cleaner now ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1790064483 From aboldtch at openjdk.org Mon Oct 7 11:46:19 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 7 Oct 2024 11:46:19 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v2] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 11:42:37 GMT, Antón Seoane wrote: >> src/hotspot/share/logging/logSelectionList.cpp line 62: >> >>> 60: } >>> 61: } >>> 62: return LogDecorators(0); >> >> Here I'd like to see either an explanation of 0 as `LogDecorators(0 /* comment */)` or a meaningful name. > > I have used the `mask_from_decorators` function, I think it should be cleaner now There is `LogDecorators::None` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1790065559 From jsjolen at openjdk.org Mon Oct 7 11:57:37 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 7 Oct 2024 11:57:37 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v2] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 11:46:19 GMT, Antón Seoane wrote: >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults.
Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. >> >> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. >> >> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. > > Ant?n Seoane has updated the pull request incrementally with one additional commit since the last revision: > > Review changes Code is OK, please consider Axel's advice and see if it's applicable. ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21383#pullrequestreview-2351732452 From duke at openjdk.org Mon Oct 7 13:14:21 2024 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane) Date: Mon, 7 Oct 2024 13:14:21 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v3] In-Reply-To: References: Message-ID: > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. > > To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. 
Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. > > The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. > > This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. Ant?n Seoane has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 35 additional commits since the last revision: - Merge commit '19642bd3833fa96eb4bc7a8a11e902782e0b7844' into ul-defaults-simplified - Review changes - Final changes - Renaming, test adaptation - Renaming - Temporarily commenting out testing code - Preliminary simplification of UL tag-specific defaults to only target defaults on/off - Removed whitespace - Initialization of _decorators field in logDecorators - Test adaptations to new focus - ... 
and 25 more: https://git.openjdk.org/jdk/compare/02d8dd79...fdf6ac02 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21383/files - new: https://git.openjdk.org/jdk/pull/21383/files/e1878be5..fdf6ac02 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=01-02 Stats: 209846 lines in 1840 files changed: 187264 ins; 12599 del; 9983 mod Patch: https://git.openjdk.org/jdk/pull/21383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21383/head:pull/21383 PR: https://git.openjdk.org/jdk/pull/21383 From duke at openjdk.org Mon Oct 7 13:23:45 2024 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane) Date: Mon, 7 Oct 2024 13:23:45 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v3] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 11:43:23 GMT, Axel Boldt-Christmas wrote: >> I have used the mask_from_decorators function, I think it should be cleaner now > > There is `LogDecorators::None` LogDecorators::None is defined in the .cpp, so I'd either have to make it "visible" or use the alternative NoDecorators. Both options are fine for me ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1790215272 From duke at openjdk.org Mon Oct 7 13:26:56 2024 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane) Date: Mon, 7 Oct 2024 13:26:56 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v4] In-Reply-To: References: Message-ID: <60YluOey0QgBIIcU-QQk58zr3lioQUjkNETfoyR5yCA=.164295af-bc30-4ed6-96d4-cdb272134abf@github.com> > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. 
For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. > > To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. > > The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. > > This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. 
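The inclusion rule described above can be read as a subset test on tag sets. The following is a rough sketch of that reading in Java, not the actual C++ in `logDecorators.cpp`; every name in it is hypothetical, and the optional log-level targeting is left out because the description does not spell out how levels are compared.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the "inclusion rule"; not the HotSpot implementation.
final class UndecoratedDefaultsSketch {
    // Splits a tag expression such as "jit+compilation" into its tags.
    static Set<String> tags(String expression) {
        return new HashSet<>(Arrays.asList(expression.split("\\+")));
    }

    // A default registered for `defaultTags` matches a selection if every tag
    // of the default also appears in the selection.
    static boolean matches(String defaultTags, String selection) {
        return tags(selection).containsAll(tags(defaultTags));
    }

    // Default decorators are dropped only when the user supplied none
    // and at least one registered default matches the selection.
    static boolean dropDefaultDecorators(boolean userSuppliedDecorators,
                                         String selection,
                                         String... undecoratedDefaults) {
        if (userSuppliedDecorators) {
            return false; // user input is never overridden
        }
        for (String candidate : undecoratedDefaults) {
            if (matches(candidate, selection)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(dropDefaultDecorators(false, "jit+compilation", "jit"));
        System.out.println(dropDefaultDecorators(true, "jit+compilation", "jit"));
    }
}
```

Under this reading, a default for `jit` applies to `-Xlog:jit+compilation`, while a default for `jit+compilation` does not apply to a plain `jit` selection.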
Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: Update full name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21383/files - new: https://git.openjdk.org/jdk/pull/21383/files/fdf6ac02..deef63ff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=02-03 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21383/head:pull/21383 PR: https://git.openjdk.org/jdk/pull/21383 From kbarrett at openjdk.org Mon Oct 7 15:16:43 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 7 Oct 2024 15:16:43 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> Message-ID: On Mon, 7 Oct 2024 05:43:47 GMT, Tobias Hartmann wrote: > What about using `intptr_t` for `TypeRawPtr::_bits` instead? That has more fan-out, into code I'm not familiar with. The proposed change fixes the immediate "miscompilation". A change of the type could be done as a further enhancement, if that makes sense to do. I'd rather leave that to someone from the compiler team. If that approach is what's wanted to fix the immediate problem, then I'm going to want to hand this issue off. Also, `uintptr_t` might be more appropriate than `intptr_t`.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21324#issuecomment-2397213908 From thartmann at openjdk.org Mon Oct 7 16:22:05 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 7 Oct 2024 16:22:05 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching Message-ID: C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction being careful to flush caches and doing it in an MP-safe way. Now the problem is that although the patch body is not executed, one thread can update the oop immediate of the `mov` in the patch body: https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 While another thread walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC and then encounters a half-written oop if the immediate crosses a page or cache-line boundary: https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 Updating the oop immediate is not atomic because the address of the immediate is not 8-byte aligned. I propose to simply align it in `PatchingStub::emit_code`. 
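The alignment argument above comes down to address arithmetic: an 8-byte immediate that starts at a multiple of 8 can never straddle a larger power-of-two boundary such as a cache line or a page, whereas an unaligned one can. A small Java sketch of that arithmetic follows; it is not HotSpot code, the names are hypothetical, and the 64-byte line size used in `main` is an assumption for illustration only.

```java
// Illustrative address arithmetic only; not HotSpot code. All names are hypothetical.
final class ImmediateAlignmentSketch {
    static final int OOP_SIZE = 8; // width in bytes of the 64-bit immediate

    // True if the 8 bytes starting at `address` are naturally aligned.
    static boolean isAligned(long address) {
        return (address & (OOP_SIZE - 1)) == 0;
    }

    // True if [address, address + 8) spans two blocks of `blockSize` bytes.
    // Assumes a non-negative address and a positive block size.
    static boolean crossesBoundary(long address, long blockSize) {
        return (address / blockSize) != ((address + OOP_SIZE - 1) / blockSize);
    }

    // Padding bytes needed in front of `address` to make it 8-byte aligned.
    static int paddingFor(long address) {
        return (int) ((OOP_SIZE - (address & (OOP_SIZE - 1))) & (OOP_SIZE - 1));
    }

    public static void main(String[] args) {
        long assumedLineSize = 64; // assumption for this example
        for (long address = 0x1000; address < 0x1100; address++) {
            long aligned = address + paddingFor(address);
            // After padding, the immediate never straddles an assumed cache line.
            if (!isAligned(aligned) || crossesBoundary(aligned, assumedLineSize)) {
                throw new AssertionError("unexpected straddle at " + aligned);
            }
        }
        System.out.println("aligned immediates stay within one line");
    }
}
```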
The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remove them again because they are x64 specific and I don't think it's worth the effort of adding a platform independent way of alignment checking. AArch64 is not affected because we always deopt instead of patching but other platforms might be affected as well. Should this fix be accepted, I'll ping the maintainers of the respective platforms. Thanks, Tobias ------------- Commit messages: - Increased timeout - Removed platform specific asserts from shared code - 8340313: Crash due to invalid oop in nmethod after C1 patching Changes: https://git.openjdk.org/jdk/pull/21389/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21389&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340313 Stats: 152 lines in 3 files changed: 147 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21389/head:pull/21389 PR: https://git.openjdk.org/jdk/pull/21389 From kxu at openjdk.org Mon Oct 7 18:44:37 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 7 Oct 2024 18:44:37 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v8] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 07:28:35 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> update comments, use explicit opcode comparisons for LShift nodes > > 
src/hotspot/share/opto/addnode.cpp line 446: > >> 444: } >> 445: >> 446: // Match `a << CON`, extract `a` and `1 << CON` > > "extract" was a bit confusing at first. So, what you mean is return `a` and set `multiplier` to `1 << CON`. Maybe you want to update the comment to make this more explicit? Maybe something like that: > > // Try to match `a << CON`. On success, return `a` and set `1 << CON` as `multiplier`. > > You could do the same for the other `find*` methods. Updated comments. Thanks for the suggestion! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1790709000 From kxu at openjdk.org Mon Oct 7 18:50:57 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 7 Oct 2024 18:50:57 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v9] In-Reply-To: References: Message-ID: > This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. > > As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). > > Some notable examples of this transformation include: > - `a + a + a` => `a*3` => `(a<<1) + a` > - `a + a + a + a` => `a*4` => `a<<2` > - `a*3 + a` => `a*4` => `a<<2` > - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` > > See included IR unit tests for more. 
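The rewrites listed in the description are plain identities of wrapping integer arithmetic, which can be checked outside the compiler. The sketch below does that in Java for the four examples and for the two general forms, written with explicit parentheses as `C*a + a` and `(a << C) + a`. It only illustrates the arithmetic; the class and method names are made up, and it says nothing about how C2 performs the matching.

```java
// Illustrative check of the identities from the PR description; not the C2 code itself.
final class SerialAdditionsSketch {
    static int addTo3(int a) { return a + a + a; }              // a*3 => (a<<1) + a
    static int addTo4(int a) { return a + a + a + a; }          // a*4 => a<<2
    static int mulAndAdd(int a) { return a * 3 + a; }           // a*3 + a => a*4
    static int shiftAndAdd(int a) { return (a << 1) + a + a; }  // => a*3 + a => a*4

    // General forms: C*a + a == (C+1)*a and (a << C) + a == (2^C + 1)*a,
    // both modulo 2^32 because int arithmetic wraps.
    static boolean generalFormsHold(int a, int c) {
        int shift = c & 31; // Java masks int shift counts to 5 bits
        return c * a + a == (c + 1) * a
            && (a << shift) + a == ((1 << shift) + 1) * a;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            if (addTo3(a) != (a << 1) + a) throw new AssertionError();
            if (addTo4(a) != (a << 2)) throw new AssertionError();
            if (mulAndAdd(a) != (a << 2)) throw new AssertionError();
            if (shiftAndAdd(a) != (a << 2)) throw new AssertionError();
            for (int c : samples) {
                if (!generalFormsHold(a, c)) throw new AssertionError();
            }
        }
        System.out.println("all identities hold");
    }
}
```

Edge values such as `0`, `Integer.MIN_VALUE` and `Integer.MAX_VALUE` are included on purpose, since overflow is where a wrong multiplier would show up first.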
Kangcheng Xu has updated the pull request incrementally with three additional commits since the last revision: - remove matching power-of-2 subtractions since it's already handled by Identity() - verify results with custom test methods - update comments to be more descriptive, remove unused can_reshape argument ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20754/files - new: https://git.openjdk.org/jdk/pull/20754/files/af6f8084..ecee68ce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=07-08 Stats: 234 lines in 3 files changed: 62 ins; 91 del; 81 mod Patch: https://git.openjdk.org/jdk/pull/20754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20754/head:pull/20754 PR: https://git.openjdk.org/jdk/pull/20754 From kxu at openjdk.org Mon Oct 7 18:50:58 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 7 Oct 2024 18:50:58 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v8] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 08:02:35 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> update comments, use explicit opcode comparisons for LShift nodes > > src/hotspot/share/opto/addnode.cpp line 567: > >> 565: >> 566: // We can't simply return the lshift node even if ((a << CON) - a) + a cancels out. Ideal() must return a new node. >> 567: *multiplier = ((jlong) 1 << con->get_int()) - 1; > > Can't this be an `Identity()` transformation where you can return existing nodes? Good point. I realized `(x - y) + y => x` is already handled by `AddINode::Identity` and `AddLNode::Identity`, so I don't need to repeat it here. 
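As a small sanity check of the identity mentioned above, the following plain-Java sketch (hypothetical names, not HotSpot code) shows that `(x - y) + y` yields `x` even when the intermediate subtraction wraps, and that `(a << CON) - a` equals `a * (2^CON - 1)`, which is the multiplier the removed matcher computed. It assumes `CON` is in the range 0..31, since Java masks `int` shift counts to five bits.

```java
// Hypothetical illustration only; the method names are not from addnode.cpp.
final class AddIdentitySketch {
    // (x - y) + y: the shape that the Identity() transformation folds back to x.
    static int subThenAdd(int x, int y) { return (x - y) + y; }

    // (a << con) - a, the "power of two minus one" shape.
    static int shiftMinusSelf(int a, int con) { return (a << con) - a; }

    // The corresponding constant factor 2^con - 1, computed in long like the quoted C++.
    static long multiplierFor(int con) { return (1L << con) - 1; }

    // Checks both identities for one operand and one shift count (0 <= con <= 31).
    static boolean holds(int a, int con) {
        int viaShift = shiftMinusSelf(a, con);
        int viaMul = (int) (multiplierFor(con) * a); // truncation keeps the low 32 bits
        boolean cancels = subThenAdd(a << con, a) + a == (a << con) + a
                       && subThenAdd(a << con, a) == (a << con);
        return viaShift == viaMul && cancels;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int con = 0; con <= 31; con++) {
                if (!holds(a, con)) {
                    throw new AssertionError("mismatch for a = " + a + ", con = " + con);
                }
            }
        }
        System.out.println("identities hold");
    }
}
```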
> test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 64: > >> 62: private static void addTo2(int a) { >> 63: int sum = a + a; // Simple additions like a + a should be kept as-is >> 64: verifyResult(a, 2, sum); > > Generally, we should move all verification code out of the `@Test` methods to avoid side effects and worrying about whether the result checking is now compiled or not (we must ensure that the result checking code is interpreted to catch wrong executions with miscompiled code). > > I suggest the following (not tested): > > Introduce a `@Run` method, which is never compiled, for your `@Test` methods. You can still call methods from there but then you should ensure that they are not compiled either with `@DontCompile`: > > static final Random RANDOM = Utils.getRandomInstance(); > > ... > > @DontCompile > private static void verifyResult(int base, int factor, int observed) { ... } > > ... > > @Test > @IR(counts = {IRNode.ADD_I, "1", IRNode.LSHIFT_I, "1"}) > private static int addTo3(int a) { > return a + a + a; // a*3 => (a<<1) + a > } > > @Run(test = "addTo3") > void runAddTo3() { > int a = RANDOM.nextInt(); > int result = addTo3(a); > verifyResult(a, 3, result); > } > > Since the tests are all very similar and require the same setup and verification, you could even go a step further and provide a single shared `@Run` method which is possible: > > @Test > @IR(counts = {IRNode.ADD_I, "1", IRNode.LSHIFT_I, "1"}) > private static int addTo3(int a) { > return a + a + a; // a*3 => (a<<1) + a > } > > @Test > @IR(failOn = IRNode.ADD_I) > @IR(counts = {IRNode.LSHIFT_I, "1"}) > private static int addTo4(int a) { > return a + a + a + a; // a*4 => a<<2 > } > > @Run(test = {"addTo3", "addTo4"}) // List all @Test methods here and make sure you call all of them below. 
> void runTests() { > int a = RANDOM.nextInt(); > verifyResult(a, 3, addTo3(a)); > verifyResult(a, 4, addTo4(a)); > } > > > This also allows you to run with some more edge case values like `a == 0` or `a == min_int` etc. which gives us even some more confidence. Thanks for the idea. Converted to custom `@Run` methods and test with `a = 0, 1, min, max, rand` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1790711922 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1790713703 From kvn at openjdk.org Mon Oct 7 19:18:36 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 7 Oct 2024 19:18:36 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching In-Reply-To: References: Message-ID: <-W7RQShH8RK7mJ0_FNh-7nYeqCC_1IFiFRiATETFAaw=.9a5c31a8-e781-42f1-b877-7d5122b67730@github.com> On Mon, 7 Oct 2024 13:39:28 GMT, Tobias Hartmann wrote: > C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction being careful to flush caches and doing it in an MP-safe way. 
> > Now the problem is that although the patch body is not executed, one thread can update the oop immediate of the `mov` in the patch body: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 > > While another thread walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC and then encounters a half-written oop if the immediate crosses a page or cache-line boundary: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 > > Updating the oop immediate is not atomic because the address of the immediate is not 8-byte aligned. > > I propose to simply align it in `PatchingStub::emit_code`. > > The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 > > I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remove them again because they are x64 specific and I don't think it's worth the effort of adding a platform independent way of alignment checking. > > AArch64 is not affected because we always deopt instead of patching but other platforms might be affected as well. Should this fix be accepted, I'll ping the maintainers of the respective platforms. > > Thanks, > Tobias src/hotspot/cpu/x86/c1_CodeStubs_x86.cpp line 334: > 332: // 8-byte align the address of the oop immediate to guarantee atomicity > 333: // when patching since the GC might walk nmethod oops concurrently. 
> 334: __ align(8, __ offset() + NativeMovConstReg::data_offset_rex); In a 32-bit VM, oops are 4 bytes, so 8 bytes is overkill, but I am fine with unified alignment. Should we align mov_metadata() too, or is it guaranteed to be aligned already? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21389#discussion_r1790750290 From dlong at openjdk.org Mon Oct 7 21:30:38 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 7 Oct 2024 21:30:38 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> Message-ID: On Fri, 4 Oct 2024 09:27:52 GMT, Kim Barrett wrote: >> Please review this change to TypeRawPtr::add_offset to prevent a compiler from >> inferring things based on prior pointer arithmetic not invoking UB. As noted in >> the bug report, clang is actually doing this. >> >> To accomplish this, changed to integral arithmetic. Also added over/underflow >> checks. >> >> Also made a couple of minor touchups. Replaced an implicit conversion to bool >> with an explicit compare to nullptr (per style guide). Removed a no longer >> needed dummy return after a (now) noreturn function. >> >> Testing: mach5 tier1-7 >> That testing was with calls to "fatal" for the over/underflow cases and the >> sum==0 case. There were no hits. I'm not sure how to construct a test that >> would hit those. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove surrounding whitespace src/hotspot/share/opto/type.cpp line 3226: > 3224: return this; > 3225: case TypePtr::Null: > 3226: return make( (address)offset ); Shouldn't this assert that _bits == 0? Looking at the code, however, I can't find anywhere that we actually create a TypeRawPtr with TypePtr::Null. 
We could probably remove this case and let it fall through to the default ShouldNotReachHere(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1790898473 From kbarrett at openjdk.org Mon Oct 7 22:01:27 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 7 Oct 2024 22:01:27 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> Message-ID: On Mon, 7 Oct 2024 21:27:58 GMT, Dean Long wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> remove surrounding whitespace > > src/hotspot/share/opto/type.cpp line 3226: > >> 3224: return this; >> 3225: case TypePtr::Null: >> 3226: return make( (address)offset ); > > Shouldn't this assert that _bits == 0? Looking at the code, however, I can't find anywhere that we actually create a TypeRawPtr with TypePtr::Null. We could probably remove this case and let it fall through to the default ShouldNotReachHere(). 
Initialization of `TypePtr::NULL_PTR` here: https://github.com/openjdk/jdk/blob/4d50cbb5a73ad1f84ecd6a895045ecfdb0835adc/src/hotspot/share/opto/type.cpp#L538 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1790914960 From dlong at openjdk.org Mon Oct 7 22:08:35 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 7 Oct 2024 22:08:35 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> Message-ID: <5LaZcZeK2ShpVHDF5LuK1m_Z0RLeQOIdHYXd9t_Vl5c=.ed093c6b-3949-4757-ba4b-a963067ce0ab@github.com> On Mon, 7 Oct 2024 21:45:31 GMT, Kim Barrett wrote: >> src/hotspot/share/opto/type.cpp line 3226: >> >>> 3224: return this; >>> 3225: case TypePtr::Null: >>> 3226: return make( (address)offset ); >> >> Shouldn't this assert that _bits == 0? Looking at the code, however, I can't find anywhere that we actually create a TypeRawPtr with TypePtr::Null. We could probably remove this case and let it fall through to the default ShouldNotReachHere(). > > Initialization of `TypePtr::NULL_PTR` here: > https://github.com/openjdk/jdk/blob/4d50cbb5a73ad1f84ecd6a895045ecfdb0835adc/src/hotspot/share/opto/type.cpp#L538 I saw that too, but it creates a TypePtr, not a TypeRawPtr. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1790935162 From dlong at openjdk.org Mon Oct 7 23:33:02 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 7 Oct 2024 23:33:02 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 13:39:28 GMT, Tobias Hartmann wrote: > C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction being careful to flush caches and doing it in an MP-safe way. > > Now the problem is that although the patch body is not executed, one thread can update the oop immediate of the `mov` in the patch body: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 > > While another thread walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC and then encounters a half-written oop if the immediate crosses a page or cache-line boundary: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 > > Updating the oop immediate is not atomic because the address of the immediate is not 8-byte aligned. > > I propose to simply align it in `PatchingStub::emit_code`. 
> > The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 > > I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remove them again because they are x64 specific and I don't think it's worth the effort of adding a platform independent way of alignment checking. > > AArch64 is not affected because we always deopt instead of patching but other platforms might be affected as well. Should this fix be accepted, I'll ping the maintainers of the respective platforms. > > Thanks, > Tobias Wouldn't it be better to get rid of the concurrency? We could grab CodeCache_lock and Patching_lock in the same block, so we serialize the patching and register_nmethod. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2398150822 From duke at openjdk.org Tue Oct 8 06:15:01 2024 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 8 Oct 2024 06:15:01 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v14] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 08:32:22 GMT, Daniel Skantz wrote: >> This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). >> >> We check two invariants. 
>> a) Bailouts should be successful starting from any given `failing()` check. >> b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). >> >> a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. >> >> The added flag should not have any effect in debug mode. >> >> Testing: >> >> T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19646#issuecomment-2398935540 From duke at openjdk.org Tue Oct 8 06:15:01 2024 From: duke at openjdk.org (duke) Date: Tue, 8 Oct 2024 06:15:01 GMT Subject: RFR: 8330157: C2: Add a stress flag for bailouts [v14] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 08:32:22 GMT, Daniel Skantz wrote: >> This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). >> >> We check two invariants. >> a) Bailouts should be successful starting from any given `failing()` check. >> b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). 
>> >> a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. >> >> The added flag should not have any effect in debug mode. >> >> Testing: >> >> T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Christian Hagedorn @danielogh Your change (at version b6eb9a843e18b05ff2a23a3faecbe28c9118aa79) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19646#issuecomment-2398937479 From thartmann at openjdk.org Tue Oct 8 06:16:02 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Oct 2024 06:16:02 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v2] In-Reply-To: References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: On Fri, 27 Sep 2024 19:46:15 GMT, Igor Veresov wrote: >> This is essentially a defensive forward port of a solution for an issue discovered in 11, where a use of pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependences()` >> >> The `insert_anti_dependences()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations but with pinned we don't. 
>> >> Compare the good graph shape (with control dependency set to `UnknownControl`): >> Good graph >> >> With the graph produce with the nodes pinned: >> Bad graph >> >> With the "bad graph" loads and store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependences()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there. > > Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: > > 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result > Summary: Make sure insert_anti_dependencies() starts from the right root Looks good to me otherwise. You might want to run performance testing just to make sure. src/hotspot/share/opto/gcm.cpp line 750: > 748: Node* initial_mem = load->in(MemNode::Memory); > 749: > 750: // We don't optimize memory graph for pinned loads, so we may need to raise the Suggestion: // We don't optimize the memory graph for pinned loads, so we may need to raise the ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21222#pullrequestreview-2353479727 PR Review Comment: https://git.openjdk.org/jdk/pull/21222#discussion_r1791256151 From thartmann at openjdk.org Tue Oct 8 06:16:03 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Oct 2024 06:16:03 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v2] In-Reply-To: References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> <9kF42tnA1qVGkpW1hRaObBtueDCnO0wgZfhpOArNLnI=.11131d42-f12f-4559-9c47-24aa4e527782@github.com> Message-ID: On Fri, 27 Sep 2024 19:13:48 GMT, Vladimir Kozlov wrote: >> Also if there is a MergeMem as a root for some weird reason then `insert_anti_dependencies()` may very well miss an interfering store. So we'd have to do this loop for correctness. > > Okay Would it still make sense to assert `load->control_dependency() == Pinned` here for now? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21222#discussion_r1791250944 From kbarrett at openjdk.org Tue Oct 8 06:25:57 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 8 Oct 2024 06:25:57 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: <5LaZcZeK2ShpVHDF5LuK1m_Z0RLeQOIdHYXd9t_Vl5c=.ed093c6b-3949-4757-ba4b-a963067ce0ab@github.com> References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> <5LaZcZeK2ShpVHDF5LuK1m_Z0RLeQOIdHYXd9t_Vl5c=.ed093c6b-3949-4757-ba4b-a963067ce0ab@github.com> Message-ID: On Mon, 7 Oct 2024 22:06:24 GMT, Dean Long wrote: >> Initialization of `TypePtr::NULL_PTR` here: >> https://github.com/openjdk/jdk/blob/4d50cbb5a73ad1f84ecd6a895045ecfdb0835adc/src/hotspot/share/opto/type.cpp#L538 > > I saw that too, but it creates a TypePtr, not a TypeRawPtr. Oh, you are right. And TypeRawPtr::make asserts the PTR is neither Constant nor Null. 
Which makes both switch cases under modification here supposedly unreachable. That would explain why I never hit either after running lots of tests. All of the change proposed here can be eliminated, and instead change both cases to fall through to the default ShouldNotReachHere(). (And that would be another way to remove the -Wzero-as-null-pointer-constant warning that was how I got here in the first place. :) ) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1791266904 From rcastanedalo at openjdk.org Tue Oct 8 07:02:50 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 8 Oct 2024 07:02:50 GMT Subject: RFR: 8341619: C2: remove unused StoreCM node Message-ID: This cleanup removes C2's `StoreCM` (store card mark) node and all its special handling code. This node used to model card mark stores in early-expanded G1 post-barriers, and is no longer needed after [JEP 475](https://openjdk.org/jeps/475). __Testing:__ tier1-5 (linux-x64, linux-aarch64, windows-x64, macosx-x64, and macosx-aarch64; release and debug mode). 
------------- Commit messages: - Remove StoreCM node Changes: https://git.openjdk.org/jdk/pull/21385/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21385&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341619 Stats: 388 lines in 23 files changed: 0 ins; 376 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/21385.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21385/head:pull/21385 PR: https://git.openjdk.org/jdk/pull/21385 From chagedorn at openjdk.org Tue Oct 8 07:13:01 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 8 Oct 2024 07:13:01 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v9] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 18:50:57 GMT, Kangcheng Xu wrote: >> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. >> >> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). >> >> Some notable examples of this transformation include: >> - `a + a + a` => `a*3` => `(a<<1) + a` >> - `a + a + a + a` => `a*4` => `a<<2` >> - `a*3 + a` => `a*4` => `a<<2` >> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` >> >> See included IR unit tests for more. 
> > Kangcheng Xu has updated the pull request incrementally with three additional commits since the last revision: > > - remove matching power-of-2 subtractions since it's already handled by Identity() > - verify results with custom test methods > - update comments to be more descriptive, remove unused can_reshape argument Thanks for the updates! Good conversion of the tests. I'll give this another spin in our testing. src/hotspot/share/opto/addnode.cpp line 439: > 437: > 438: // Try to match `a + a`. On success, return `a` and set `2` as `multiplier`. > 439: // The method matches `n` to for pattern: AddNode(a, a). Suggestion: // The method matches `n` for pattern: AddNode(a, a). test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 65: > 63: "mulAndAddToZero", // > 64: "mulAndAddToMinus1", // > 65: "mulAndAddToMinus42" // Why did you add the trailing `//`? ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20754#pullrequestreview-2353577415 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1791311803 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1791313522 From chagedorn at openjdk.org Tue Oct 8 07:13:03 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 8 Oct 2024 07:13:03 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v8] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 18:44:41 GMT, Kangcheng Xu wrote: >> src/hotspot/share/opto/addnode.cpp line 567: >> >>> 565: >>> 566: // We can't simply return the lshift node even if ((a << CON) - a) + a cancels out. Ideal() must return a new node. >>> 567: *multiplier = ((jlong) 1 << con->get_int()) - 1; >> >> Can't this be an `Identity()` transformation where you can return existing nodes? > > Good point. I realized `(x - y) + y => x` is already handled by `AddINode::Identity` and `AddLNode::Identity`. I don't need to repeat here. That's great! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1791317004 From chagedorn at openjdk.org Tue Oct 8 07:16:02 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 8 Oct 2024 07:16:02 GMT Subject: RFR: 8341619: C2: remove unused StoreCM node In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 11:28:19 GMT, Roberto Castañeda Lozano wrote: > This cleanup removes C2's `StoreCM` (store card mark) node and all its special handling code. This node used to model card mark stores in early-expanded G1 post-barriers, and is no longer needed after [JEP 475](https://openjdk.org/jeps/475). > > __Testing:__ tier1-5 (linux-x64, linux-aarch64, windows-x64, macosx-x64, and macosx-aarch64; release and debug mode). Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21385#pullrequestreview-2353597839 From rcastanedalo at openjdk.org Tue Oct 8 07:21:57 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 8 Oct 2024 07:21:57 GMT Subject: RFR: 8341619: C2: remove unused StoreCM node In-Reply-To: References: Message-ID: On Tue, 8 Oct 2024 07:13:42 GMT, Christian Hagedorn wrote: > Looks good! Thanks, Christian! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21385#issuecomment-2399042363 From thartmann at openjdk.org Tue Oct 8 08:57:59 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Oct 2024 08:57:59 GMT Subject: RFR: 8341619: C2: remove unused StoreCM node In-Reply-To: References: Message-ID: <66NAr7bZnmYCELHL00gm1ge8PgrXUF5MVC_I8--pLxw=.4350170b-b782-48c1-bd15-15df5f71d91b@github.com> On Mon, 7 Oct 2024 11:28:19 GMT, Roberto Castañeda Lozano wrote: > This cleanup removes C2's `StoreCM` (store card mark) node and all its special handling code. This node used to model card mark stores in early-expanded G1 post-barriers, and is no longer needed after [JEP 475](https://openjdk.org/jeps/475). 
> > __Testing:__ tier1-5 (linux-x64, linux-aarch64, windows-x64, macosx-x64, and macosx-aarch64; release and debug mode). Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21385#pullrequestreview-2353857802 From chagedorn at openjdk.org Tue Oct 8 09:51:00 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 8 Oct 2024 09:51:00 GMT Subject: RFR: 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking [v3] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 07:42:54 GMT, Christian Hagedorn wrote: >> This patch introduces new predicate classes which implement a new `Predicate` interface. These classes represent the different predicates in the C2 IR. They are used in combination with new predicate iterator and visitors classes to provide an easy way to walk and process predicates in the IR. >> >> ### Predicate Interfaces and Implementing Classes >> - `Predicate` interface is implemented by four predicate classes: >> - `ParsePredicate` (existing class) >> - `RuntimePredicate` (existing and updated class) >> - `TemplateAssertionPredicate` (new class) >> - `InitializedAssertionPredicate` (new class, renamed old `InitializedAssertionPredicate` class to `InitializedAssertionPredicateCreator`) >> >> ### Predicate Iterator with Visitor classes >> There is a new `PredicateIterator` class which can be used to iterate through the predicates of a loop. For each predicate, a `PredicateVisitor` can be applied. The user can implement the `PredicateIterator` interface and override the default do-nothing implementations to the specific needs of the code. I've done this for a couple of places in the code by defining new visitors: >> - `ParsePredicateUsefulMarker`: This visitor marks all Parse Predicates as useful. >> - Replaces the old now retired `ParsePredicateIterator`. 
>> - `DominatedPredicates`: This visitor checks the dominance relation to an `early` node when trying to figure out the latest legal placement for a node. The goal is to skip as many predicates as possible to avoid interference with Loop Predication and/or creating a Loop Limit Check Predicate. >> - Replaces the old now retired `PredicateEntryIterator`. >> - `Predicates::dump()`: Newly added dumping code for the predicates above a loop which uses a new `PredicatePrinter` visitor. This helps debugging issues with predicates. >> >> #### To Be Replaced soon >> There are a couple of places where we use similar code to walk predicates and apply some transformation/modifications to the IR. The goal is to replace these locations with the new visitors as well. This will incrementally be done with the next couple of PRs. >> >> ### More Information >> More information about specific classes and changes can be found as code comments and PR comments. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add missing public for UnifiedPredicateVisitor Anyone for a second review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21161#issuecomment-2399381127 From rcastanedalo at openjdk.org Tue Oct 8 11:04:59 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 8 Oct 2024 11:04:59 GMT Subject: RFR: 8341619: C2: remove unused StoreCM node In-Reply-To: <66NAr7bZnmYCELHL00gm1ge8PgrXUF5MVC_I8--pLxw=.4350170b-b782-48c1-bd15-15df5f71d91b@github.com> References: <66NAr7bZnmYCELHL00gm1ge8PgrXUF5MVC_I8--pLxw=.4350170b-b782-48c1-bd15-15df5f71d91b@github.com> Message-ID: On Tue, 8 Oct 2024 08:55:07 GMT, Tobias Hartmann wrote: > Looks good to me too. Thanks, Tobias! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21385#issuecomment-2399535146 From rcastanedalo at openjdk.org Tue Oct 8 11:14:59 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 8 Oct 2024 11:14:59 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v4] In-Reply-To: <60YluOey0QgBIIcU-QQk58zr3lioQUjkNETfoyR5yCA=.164295af-bc30-4ed6-96d4-cdb272134abf@github.com> References: <60YluOey0QgBIIcU-QQk58zr3lioQUjkNETfoyR5yCA=.164295af-bc30-4ed6-96d4-cdb272134abf@github.com> Message-ID: On Mon, 7 Oct 2024 13:26:56 GMT, Antón Seoane wrote: >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. >> >> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter.
>> >> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. > > Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: > > Update full name Thanks for working on this, Antón! The new test files look good to me, I also agree on hiding decorators by default for `jit+inlining`. I just have a few minor comments. test/hotspot/gtest/logging/test_logDefaultDecorators.cpp line 29: > 27: #include "logging/logDecorators.hpp" > 28: #include "runtime/os.hpp" > 29: #include "unittest.hpp" Please sort the included files alphabetically (except for `precompiled.hpp` which should go first) for consistency with the other test files in the directory. Also, `runtime/os.hpp` is unused. Suggestion: #include "precompiled.hpp" #include "jvm.h" #include "logging/logDecorators.hpp" #include "logging/logTag.hpp" #include "unittest.hpp" test/hotspot/jtreg/runtime/logging/DefaultLogDecoratorsTest.java line 27: > 25: * @test > 26: * @requires vm.flagless > 27: * @summary Running -Xlog with tags which have default decorators should pick them This summary reflects the old proposal in JDK-8340363, please update. test/hotspot/jtreg/runtime/logging/DefaultLogDecoratorsTest.java line 51: > 49: for (String string : xlog) { > 50: argsList.add(string); > 51: } Suggestion: List argsList = new ArrayList(Arrays.asList(xlog)); test/hotspot/jtreg/runtime/logging/DefaultLogDecoratorsTest.java line 69: > 67: doTest(false, "-Xlog:jit+inlining*=trace:decorators.log"); > 68: > 69: Nit: unnecessary extra line (same for the other lines below). ------------- Changes requested by rcastanedalo (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21383#pullrequestreview-2353875217 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1791493896 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1791501653 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1791672411 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1791512229 From duke at openjdk.org Tue Oct 8 11:57:01 2024 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane) Date: Tue, 8 Oct 2024 11:57:01 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v4] In-Reply-To: References: <60YluOey0QgBIIcU-QQk58zr3lioQUjkNETfoyR5yCA=.164295af-bc30-4ed6-96d4-cdb272134abf@github.com> Message-ID: On Tue, 8 Oct 2024 09:07:08 GMT, Roberto Castañeda Lozano wrote: >> Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: >> >> Update full name > > test/hotspot/jtreg/runtime/logging/DefaultLogDecoratorsTest.java line 27: > >> 25: * @test >> 26: * @requires vm.flagless >> 27: * @summary Running -Xlog with tags which have default decorators should pick them > > This summary reflects the old proposal in JDK-8340363, please update. Oh, I missed that! Thanks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1791734747 From duke at openjdk.org Tue Oct 8 12:12:31 2024 From: duke at openjdk.org (=?UTF-8?B?QW50w7Nu?= Seoane) Date: Tue, 8 Oct 2024 12:12:31 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v5] In-Reply-To: References: Message-ID: > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics.
For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. > > To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. > > The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. > > This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket.
Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: Applying review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21383/files - new: https://git.openjdk.org/jdk/pull/21383/files/deef63ff..a80d5fe1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=03-04 Stats: 11 lines in 2 files changed: 0 ins; 8 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21383/head:pull/21383 PR: https://git.openjdk.org/jdk/pull/21383 From rcastanedalo at openjdk.org Tue Oct 8 12:53:03 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 8 Oct 2024 12:53:03 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v5] In-Reply-To: References: Message-ID: On Tue, 8 Oct 2024 12:12:31 GMT, Antón Seoane wrote: >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level.
>> >> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. >> >> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. > > Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: > > Applying review comments Thanks for your work and for addressing the comments, Antón! The test code and the decision to hide decorators by default for `jit+inlining` look good to me. Note that this is only a partial review; a second review of the `src/hotspot/share/logging` changes is still required. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21383#pullrequestreview-2354397406 From thartmann at openjdk.org Tue Oct 8 14:34:03 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Oct 2024 14:34:03 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 13:39:28 GMT, Tobias Hartmann wrote: > C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction being careful to flush caches and doing it in an MP-safe way.
> > Now the problem is that although the patch body is not executed, one thread can update the oop immediate of the `mov` in the patch body: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 > > While another thread walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC and then encounters a half-written oop if the immediate crosses a page or cache-line boundary: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 > > Updating the oop immediate is not atomic because the address of the immediate is not 8-byte aligned. > > I propose to simply align it in `PatchingStub::emit_code`. > > The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 > > I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remove them again because they are x64 specific and I don't think it's worth the effort of adding a platform independent way of alignment checking. > > AArch64 is not affected because we always deopt instead of patching but other platforms might be affected as well. Should this fix be accepted, I'll ping the maintainers of the respective platforms. > > Thanks, > Tobias Thanks for looking at this, Vladimir and Dean! > Wouldn't it be better to get rid of the concurrency? We could grab CodeCache_lock and Patching_lock in the same block, so we serialize the patching and register_nmethod. Yes, that would be an alternative solution. 
I went with the alignment because I thought it has the least impact. I'll ping the GC team, maybe they want to have a say in this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2400023164 From thartmann at openjdk.org Tue Oct 8 14:34:04 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Oct 2024 14:34:04 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching In-Reply-To: <-W7RQShH8RK7mJ0_FNh-7nYeqCC_1IFiFRiATETFAaw=.9a5c31a8-e781-42f1-b877-7d5122b67730@github.com> References: <-W7RQShH8RK7mJ0_FNh-7nYeqCC_1IFiFRiATETFAaw=.9a5c31a8-e781-42f1-b877-7d5122b67730@github.com> Message-ID: On Mon, 7 Oct 2024 19:16:27 GMT, Vladimir Kozlov wrote: >> C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction being careful to flush caches and doing it in an MP-safe way. >> >> Now the problem is that although the patch body is not executed, one thread can update the oop immediate of the `mov` in the patch body: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 >> >> While another thread walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC and then encounters a half-written oop if the immediate crosses a page or cache-line boundary: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 >> >> Updating the oop immediate is not atomic because the address of the immediate is not 8-byte aligned. >> >> I propose to simply align it in `PatchingStub::emit_code`. 
>> >> The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 >> >> I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remove them again because they are x64 specific and I don't think it's worth the effort of adding a platform independent way of alignment checking. >> >> AArch64 is not affected because we always deopt instead of patching but other platforms might be affected as well. Should this fix be accepted, I'll ping the maintainers of the respective platforms. >> >> Thanks, >> Tobias > > src/hotspot/cpu/x86/c1_CodeStubs_x86.cpp line 334: > >> 332: // 8-byte align the address of the oop immediate to guarantee atomicity >> 333: // when patching since the GC might walk nmethod oops concurrently. >> 334: __ align(8, __ offset() + NativeMovConstReg::data_offset_rex); > > In 32-bit VM oops are 4 bytes so 8 bytes is overkill but I am fine with unified alignment. > Should we align mov_metadata() too or it is guarantee aligned already? I don't think we need to guarantee atomicity for metadata because it's not observed concurrently as far as I know, right? 
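The padding arithmetic behind the `align(8, ...)` call quoted above can be sketched as follows. This is only an illustration in Java, not HotSpot code: the class and method names are hypothetical, and the 2-byte immediate offset and 64-byte cache line used in `main` are assumptions chosen to show one case where an 8-byte immediate would straddle a boundary.

```java
public class ImmediateAlignmentDemo {
    // Number of padding bytes to emit before the instruction so that the
    // immediate (located immediateOffset bytes into the instruction) starts
    // at a multiple of `alignment`.
    static int paddingFor(int instructionOffset, int immediateOffset, int alignment) {
        int immediateAddress = instructionOffset + immediateOffset;
        return Math.floorMod(-immediateAddress, alignment);
    }

    // True if the bytes [address, address + size) span two different
    // boundary-sized blocks (for example two cache lines).
    static boolean crossesBoundary(int address, int size, int boundary) {
        return address / boundary != (address + size - 1) / boundary;
    }

    public static void main(String[] args) {
        int instructionOffset = 61; // hypothetical offset in the code buffer
        int immediateOffset = 2;    // assumed distance from instruction start to immediate
        int before = instructionOffset + immediateOffset;
        int padding = paddingFor(instructionOffset, immediateOffset, 8);
        int after = before + padding;
        System.out.println("unpadded immediate crosses a 64-byte line: "
                + crossesBoundary(before, 8, 64));
        System.out.println("padding bytes: " + padding);
        System.out.println("padded immediate crosses a 64-byte line: "
                + crossesBoundary(after, 8, 64));
    }
}
```

An 8-byte value that starts at a multiple of 8 can never straddle a 64-byte line or a page, which is the property the proposed fix relies on.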
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21389#discussion_r1791996064 From thartmann at openjdk.org Tue Oct 8 15:20:01 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Oct 2024 15:20:01 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> Message-ID: On Mon, 7 Oct 2024 15:13:47 GMT, Kim Barrett wrote: > > What about using `intptr_t` for `TypeRawPtr::_bits` instead? > > That has more fan-out, into code I'm not familiar with. The proposed change fixes the immediate "miscompilation". A change of the type could be done as a further enhancement, if that makes sense to do. I'd rather leave that to someone from the compiler team. If that approach is what's wanted to fix the immediate problem, then I'm going to want to hand this issue off. Also, uintptr_t might be more appropriate than intptr_t. Okay, that's fine with me.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21324#issuecomment-2400143674 From iveresov at openjdk.org Tue Oct 8 15:39:37 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 8 Oct 2024 15:39:37 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v3] In-Reply-To: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: > This is essentially a defensive forward port of a solution for an issue discovered in 11, where a use of pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependences()` > > The `insert_anti_dependences()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations but with pinned we don't. > > Compare the good graph shape (with control dependency set to `UnknownControl`): > Good graph > > With the graph produced with the nodes pinned: > Bad graph > > With the "bad graph" loads and store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependences()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there.
Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/gcm.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21222/files - new: https://git.openjdk.org/jdk/pull/21222/files/e9295d93..e80084ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21222&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21222&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21222.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21222/head:pull/21222 PR: https://git.openjdk.org/jdk/pull/21222 From iveresov at openjdk.org Tue Oct 8 15:41:59 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 8 Oct 2024 15:41:59 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v2] In-Reply-To: References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: On Tue, 8 Oct 2024 06:13:41 GMT, Tobias Hartmann wrote: > Looks good to me otherwise. You might want to run performance testing just to make sure. While working on it I inserted a printf in it and the loop almost never happens since Loads are typically normalized. So, I don't think there is any impact on performance and I don't think checking for controlled dependency is necessary. It would also require us to carry this information to mach nodes... 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21222#issuecomment-2400194839 From chagedorn at openjdk.org Tue Oct 8 15:48:02 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 8 Oct 2024 15:48:02 GMT Subject: RFR: 8341131: Some jdk/jfr/event/compiler tests shouldn't be executed with Xcomp In-Reply-To: <0s7WMhOzLNXKcDjZkg5NEzvaeCvswLtIbTVEdsezr8M=.56c5ea42-ac74-4be1-87ef-353373f5bd6b@github.com> References: <0s7WMhOzLNXKcDjZkg5NEzvaeCvswLtIbTVEdsezr8M=.56c5ea42-ac74-4be1-87ef-353373f5bd6b@github.com> Message-ID: On Sat, 28 Sep 2024 03:47:50 GMT, Leonid Mesnik wrote: > Few jdk/jfr/event/compiler tests sensitive to compile flags and shouldn't be executed with Xcomp. Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21239#pullrequestreview-2354895186 From chagedorn at openjdk.org Tue Oct 8 15:52:08 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 8 Oct 2024 15:52:08 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v9] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 18:50:57 GMT, Kangcheng Xu wrote: >> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. >> >> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). 
>> >> Some notable examples of this transformation include: >> - `a + a + a` => `a*3` => `(a<<1) + a` >> - `a + a + a + a` => `a*4` => `a<<2` >> - `a*3 + a` => `a*4` => `a<<2` >> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4` => `a<<2` >> >> See included IR unit tests for more. > > Kangcheng Xu has updated the pull request incrementally with three additional commits since the last revision: > > - remove matching power-of-2 subtractions since it's already handled by Identity() > - verify results with custom test methods > - update comments to be more descriptive, remove unused can_reshape argument Testing looked good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20754#issuecomment-2400216991 From kvn at openjdk.org Tue Oct 8 16:27:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 8 Oct 2024 16:27:00 GMT Subject: RFR: 8341619: C2: remove unused StoreCM node In-Reply-To: References: Message-ID: <-rmpO6zs0OfYl616EhcyeL1Izlx-kpx84VUmFjTZ3HM=.7ede4276-5e3e-479e-ae13-c8b2e8f34275@github.com> On Mon, 7 Oct 2024 11:28:19 GMT, Roberto Castañeda Lozano wrote: > This cleanup removes C2's `StoreCM` (store card mark) node and all its special handling code. This node used to model card mark stores in early-expanded G1 post-barriers, and is no longer needed after [JEP 475](https://openjdk.org/jeps/475). > > __Testing:__ tier1-5 (linux-x64, linux-aarch64, windows-x64, macosx-x64, and macosx-aarch64; release and debug mode). Good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21385#pullrequestreview-2354992010 From rcastanedalo at openjdk.org Tue Oct 8 16:36:05 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 8 Oct 2024 16:36:05 GMT Subject: RFR: 8341619: C2: remove unused StoreCM node In-Reply-To: <-rmpO6zs0OfYl616EhcyeL1Izlx-kpx84VUmFjTZ3HM=.7ede4276-5e3e-479e-ae13-c8b2e8f34275@github.com> References: <-rmpO6zs0OfYl616EhcyeL1Izlx-kpx84VUmFjTZ3HM=.7ede4276-5e3e-479e-ae13-c8b2e8f34275@github.com> Message-ID: <7Q8caYWlp3OGt-DLjZG65wnx1dKA4tllGuY2G8lmX50=.3f88d91c-53e4-4af5-8e9c-c697405dccb5@github.com> On Tue, 8 Oct 2024 16:24:32 GMT, Vladimir Kozlov wrote: > Good. Thanks, Vladimir! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21385#issuecomment-2400338019 From psandoz at openjdk.org Tue Oct 8 16:40:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 8 Oct 2024 16:40:17 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v22] In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 00:01:59 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e.
the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update VectorMath.java Java changes look good (see comments to fix some typos). Needs another HotSpot reviewer. Marked as reviewed by psandoz (Reviewer). src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 275: > 273: * @param b the second operand. > 274: * @return the saturating addition of the operands. > 275: * @see VectorOperators#SADD Suggestion: * @see VectorOperators#SUADD src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 301: > 299: * @param b the second operand. > 300: * @return the saturating difference of the operands. > 301: * @see VectorOperators#SSUB Suggestion: * @see VectorOperators#SUSUB src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 413: > 411: * @param b the second operand. > 412: * @return the saturating addition of the operands. 
> 413: * @see VectorOperators#SADD Suggestion: * @see VectorOperators#SUADD src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 439: > 437: * @param b the second operand. > 438: * @return the saturating difference of the operands. > 439: * @see VectorOperators#SSUB Suggestion: * @see VectorOperators#SUSUB src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 551: > 549: * @param b the second operand. > 550: * @return the saturating addition of the operands. > 551: * @see VectorOperators#SADD Suggestion: * @see VectorOperators#SUADD src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java line 577: > 575: * @param b the second operand. > 576: * @return the saturating difference of the operands. > 577: * @see VectorOperators#SSUB Suggestion: * @see VectorOperators#SUSUB ------------- PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2354993291 PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2355019508 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1792178593 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1792178872 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1792179260 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1792179485 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1792179780 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1792180281 From psandoz at openjdk.org Tue Oct 8 17:13:10 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 8 Oct 2024 17:13:10 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v16] In-Reply-To: <3OXPOIGRRD4KoZ21PsL1viyEDvZsh_8GtacPQHcuQq4=.e5f4f05d-d21f-4a6c-b41e-c78268b8e2fe@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> 
<3OXPOIGRRD4KoZ21PsL1viyEDvZsh_8GtacPQHcuQq4=.e5f4f05d-d21f-4a6c-b41e-c78268b8e2fe@github.com> Message-ID: On Thu, 3 Oct 2024 19:05:14 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Sharpening intrinsic exit check. test/jdk/jdk/incubator/vector/templates/Unit-header.template line 408: > 406: for (j = 0; j < vector_len; j++) { > 407: idx = i + j; > 408: wrapped_index =(((int)order[idx]) & (2 * vector_len -1)); This assumes a power of two, can we change to use `Math.floorMod`? 
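The point about the mask can be illustrated with a small Java sketch. It compares wrapping an index with `& (range - 1)` against `Math.floorMod` for a power-of-two range and for a range that is not a power of two. The class and method names are hypothetical and `range` stands in for `2 * vector_len` in the quoted template; this is not the test template itself.

```java
public class WrapIndexDemo {
    // Mask-based wrap: matches modular reduction only when range is a power of two.
    static int wrapWithMask(int index, int range) {
        return index & (range - 1);
    }

    // floorMod-based wrap: the result lies in [0, range) for any positive range,
    // including negative indices.
    static int wrapWithFloorMod(int index, int range) {
        return Math.floorMod(index, range);
    }

    public static void main(String[] args) {
        int[] samples = {-3, 0, 5, 17};
        for (int range : new int[] {16, 12}) {
            for (int index : samples) {
                System.out.println("range=" + range + " index=" + index
                        + " mask=" + wrapWithMask(index, range)
                        + " floorMod=" + wrapWithFloorMod(index, range));
            }
        }
    }
}
```

For range 16 the two columns agree on every sample; for range 12 they diverge (for example index 17 gives 1 with the mask and 5 with `floorMod`), which is why the mask form silently depends on the power-of-two assumption.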
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1792232986 From lmesnik at openjdk.org Tue Oct 8 17:47:06 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 8 Oct 2024 17:47:06 GMT Subject: Integrated: 8341131: Some jdk/jfr/event/compiler tests shouldn't be executed with Xcomp In-Reply-To: <0s7WMhOzLNXKcDjZkg5NEzvaeCvswLtIbTVEdsezr8M=.56c5ea42-ac74-4be1-87ef-353373f5bd6b@github.com> References: <0s7WMhOzLNXKcDjZkg5NEzvaeCvswLtIbTVEdsezr8M=.56c5ea42-ac74-4be1-87ef-353373f5bd6b@github.com> Message-ID: <4IPHgtEK4RfBTA2R1pMLvCCdzCB8A6jQLpQZFfebiCU=.400c4381-ee42-44ed-93dc-e24ba53b0b36@github.com> On Sat, 28 Sep 2024 03:47:50 GMT, Leonid Mesnik wrote: > Few jdk/jfr/event/compiler tests sensitive to compile flags and shouldn't be executed with Xcomp. This pull request has now been integrated. Changeset: 7312eea3 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/7312eea382eed048b6abdb6409c006fc8e8f45b4 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod 8341131: Some jdk/jfr/event/compiler tests shouldn't be executed with Xcomp Reviewed-by: chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/21239 From dlong at openjdk.org Tue Oct 8 18:36:12 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 8 Oct 2024 18:36:12 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> <5LaZcZeK2ShpVHDF5LuK1m_Z0RLeQOIdHYXd9t_Vl5c=.ed093c6b-3949-4757-ba4b-a963067ce0ab@github.com> Message-ID: On Tue, 8 Oct 2024 06:23:47 GMT, Kim Barrett wrote: >> I saw that too, but it creates a TypePtr, not a TypeRawPtr. > > Oh, you are right. And TypeRawPtr::make asserts the PTR is neither Constant nor Null. Which makes > both switch cases under modification here supposedly unreachable. That would explain why I never hit > either after running lots of tests. 
All of the change proposed here can be eliminated, and instead change > both cases to fall through to the default ShouldNotReachHere(). (And that would be another way to > remove the -Wzero-as-null-pointer-constant warning that was how I got here in the first place. :) ) There's TypeRawPtr::make(enum PTR ptr) which doesn't allow Constant or Null, but we are using TypeRawPtr::make(address bits) here. We may need to keep the Constant case. I wouldn't be surprised if there was a way to trigger that path using Unsafe. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1792333243 From dlong at openjdk.org Tue Oct 8 19:15:57 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 8 Oct 2024 19:15:57 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: <5t5vgIdcLZyg10tHz23C9NV1c4mFvsDrSDhBp49Ugk0=.b5ba825b-2373-4af5-ba69-062450711bbb@github.com> On Wed, 25 Sep 2024 22:52:18 GMT, Dean Long wrote: >> Instead of bailout in alternative approach we can change `cha_monomorphic_target` to `nullptr` in code which is looking for it in previous lines. `target` will be used for call and we will lose a little performance when JVMTI is used instead of skipping compilation. Am I missing something? > > @vnkozlov , I like the alternative approach better. I went with the current approach because I was thinking it would be simpler than the bailout, but I changed my mind after writing both out. > > Yes, we can change cha_monomorphic_target to nullptr instead of bailing out. But my understanding is any use of old/redefined methods will cause the compilation to be thrown out when we try to create the nmethod, so we are avoiding wasted work by bailing out early. > @dean-long, maybe I misunderstand your statement. Are you re-writing the fix or keeping the current one? Either one is fine with me. I could make a separate draft PR with the alternative solution if that helps reviewers decide.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2400626113 From dlong at openjdk.org Tue Oct 8 19:21:58 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 8 Oct 2024 19:21:58 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 21:23:28 GMT, Vladimir Ivanov wrote: > > JVMTI can add and delete methods > > Can you elaborate on that point, please? JVMTI spec states that redefinition/retransformation "must not add, remove or rename fields or methods" [1] [2]. > > [1] https://docs.oracle.com/en/java/javase/23/docs/specs/jvmti.html#RedefineClasses [2] https://docs.oracle.com/en/java/javase/23/docs/specs/jvmti.html#RetransformClasses It's because of the AllowRedefinitionToAddDeleteMethods flag: https://github.com/openjdk/jdk/blob/7312eea382eed048b6abdb6409c006fc8e8f45b4/src/hotspot/share/prims/jvmtiRedefineClasses.cpp#L928 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2400635053 From jbhateja at openjdk.org Tue Oct 8 19:25:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 8 Oct 2024 19:25:24 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v23] In-Reply-To: References: Message-ID: <0LkBvvNPq5jmWfOdjItIXGedRDtpiivJM06BAx7vP0I=.c5417544-edef-4623-beaa-08cd7c565361@github.com> > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. 
For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow; one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: - Merge branch 'JDK-8338201' of http://github.com/jatin-bhateja/jdk into JDK-8338201 - Update VectorMath.java - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 - Typographical error fixups - Doc fixups - Typographic error - Merge stashing and re-commit - Tuning extra spaces. - Tests for newly added VectorMath.* operations - Test cleanups. - ...
and 16 more: https://git.openjdk.org/jdk/compare/7312eea3...ce76c3e5 ------------- Changes: https://git.openjdk.org/jdk/pull/20507/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=22 Stats: 9206 lines in 51 files changed: 8778 ins; 27 del; 401 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From qamai at openjdk.org Tue Oct 8 19:50:32 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 8 Oct 2024 19:50:32 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* Message-ID: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> Hi, This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. Please take a look and leave your reviews, Thanks a lot. ------------- Commit messages: - more cleanup - copyright - fix tests - cleanup TypeVect Changes: https://git.openjdk.org/jdk/pull/21414/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21414&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341784 Stats: 188 lines in 18 files changed: 4 ins; 73 del; 111 mod Patch: https://git.openjdk.org/jdk/pull/21414.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21414/head:pull/21414 PR: https://git.openjdk.org/jdk/pull/21414 From dlong at openjdk.org Tue Oct 8 20:30:34 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 8 Oct 2024 20:30:34 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v2] In-Reply-To: References: Message-ID: > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. 
I'm open to suggestions for a better name than equals_ignore_version(). > > An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. Dean Long has updated the pull request incrementally with one additional commit since the last revision: simplification based on reviewer comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21148/files - new: https://git.openjdk.org/jdk/pull/21148/files/3b258664..0705b33e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=00-01 Stats: 45 lines in 3 files changed: 11 ins; 33 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21148/head:pull/21148 PR: https://git.openjdk.org/jdk/pull/21148 From dlong at openjdk.org Tue Oct 8 20:30:34 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 8 Oct 2024 20:30:34 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 16:33:45 GMT, Vladimir Kozlov wrote: >> @vnkozlov , I like the alternative approach better. I went with the current approach because I was thinking it would be simpler than the bailout, but I changed my mind after writing both out. >> >> Yes, we can change cha_monomorphic_target to nullptr instead of bailing out. But my understanding is any use of old/redefined methods will cause the compilation to be thrown out when we try to create the nmethod, so we are avoiding wasted work by bailing out early. > > @dean-long, maybe I misunderstand your statement. Are you re-writing the fix or keeping the current one? > I like @vnkozlov suggestion to null out `cha_monomorphic_target`. Moreover, the validation can be performed inside `ciMethod::find_monomorphic_target()` which is used to compute `cha_monomorphic_target`.
I like this idea. I pushed a new version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2400759341 From iveresov at openjdk.org Tue Oct 8 20:40:59 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 8 Oct 2024 20:40:59 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v3] In-Reply-To: References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: On Tue, 8 Oct 2024 15:39:37 GMT, Igor Veresov wrote: >> This is essentially a defensive forward port of a solution for an issue discovered in 11, where a use of a pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependences()`. >> >> The `insert_anti_dependences()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations but with pinned loads we don't. >> >> Compare the good graph shape (with control dependency set to `UnknownControl`): >> Good graph >> >> With the graph produced with the nodes pinned: >> Bad graph >> >> With the "bad graph" loads and store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependences()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/gcm.cpp > > Co-authored-by: Tobias Hartmann Oh, I guess I need one of you guys to approve it again after I fixed the comment per Tobias' recommendation.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21222#issuecomment-2400776905 From kvn at openjdk.org Tue Oct 8 22:24:59 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 8 Oct 2024 22:24:59 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v2] In-Reply-To: References: Message-ID: On Tue, 8 Oct 2024 20:30:34 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > simplification based on reviewer comments This looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21148#pullrequestreview-2355641562 From kvn at openjdk.org Tue Oct 8 22:24:59 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 8 Oct 2024 22:24:59 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v3] In-Reply-To: References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: On Tue, 8 Oct 2024 15:39:37 GMT, Igor Veresov wrote: >> This is essentially a defensive forward port of a solution for an issue discovered in 11, where a use of a pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependences()`. >> >> The `insert_anti_dependences()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations but with pinned loads we don't. >> >> Compare the good graph shape (with control dependency set to `UnknownControl`): >> Good graph >> >> With the graph produced with the nodes pinned: >> Bad graph >> >> With the "bad graph" loads and store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependences()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/gcm.cpp > > Co-authored-by: Tobias Hartmann Re-approved. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21222#pullrequestreview-2355642340 From iveresov at openjdk.org Tue Oct 8 22:34:01 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 8 Oct 2024 22:34:01 GMT Subject: RFR: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result [v3] In-Reply-To: References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: On Tue, 8 Oct 2024 15:39:37 GMT, Igor Veresov wrote: >> This is essentially a defensive forward port of a solution for an issue discovered in 11, where a use of a pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependences()`. >> >> The `insert_anti_dependences()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations but with pinned loads we don't. >> >> Compare the good graph shape (with control dependency set to `UnknownControl`): >> Good graph >> >> With the graph produced with the nodes pinned: >> Bad graph >> >> With the "bad graph" loads and store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependences()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/gcm.cpp > > Co-authored-by: Tobias Hartmann Thanks, Vladimir!
------------- PR Comment: https://git.openjdk.org/jdk/pull/21222#issuecomment-2400931753 From kvn at openjdk.org Tue Oct 8 23:07:58 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 8 Oct 2024 23:07:58 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* In-Reply-To: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> Message-ID: On Tue, 8 Oct 2024 19:46:12 GMT, Quan Anh Mai wrote: > Hi, > > This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as in our current implementation. This conveys the element information in a clearer and safer manner. > > Please take a look and leave your reviews, > Thanks a lot. Looks reasonable. Need to test it internally. src/hotspot/share/opto/type.cpp line 2531: > 2529: > 2530: //------------------------------meet------------------------------------------- > 2531: // Compute the MEET of two types. It returns a new Type object. It never returns a new type now.
------------- PR Review: https://git.openjdk.org/jdk/pull/21414#pullrequestreview-2355671466 PR Review Comment: https://git.openjdk.org/jdk/pull/21414#discussion_r1792593799 From iveresov at openjdk.org Tue Oct 8 23:25:05 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 8 Oct 2024 23:25:05 GMT Subject: Integrated: 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result In-Reply-To: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> References: <7-uUPaQL7Kaav2qNqw5sTyNau-qnx_GhWePXLiWordI=.b5bafb40-4058-4cb9-8abd-110a68e06162@github.com> Message-ID: On Fri, 27 Sep 2024 16:02:29 GMT, Igor Veresov wrote: > This is essentially a defensive forward port of a solution for an issue discovered in 11, where a use of a pinned load (for which we disable almost all transforms in `LoadNode::Ideal()`) leads to a graph shape that is not expected by `insert_anti_dependences()`. > > The `insert_anti_dependences()` assumes that the memory input for a load is the root of the memory subgraph that it has to search for the possibly conflicting store. Usually this is true if we run all the memory optimizations but with pinned loads we don't. > > Compare the good graph shape (with control dependency set to `UnknownControl`): > Good graph > > With the graph produced with the nodes pinned: > Bad graph > > With the "bad graph" loads and store don't share the same memory graph root, and therefore are not considered by `insert_anti_dependences()`. The solution, I think, could be to walk up the memory chain of the load, skipping MergeMems, in order to get to the real root and then run the precedence edge insertion algorithm from there. This pull request has now been integrated.
Changeset: 7eab0a50 Author: Igor Veresov URL: https://git.openjdk.org/jdk/commit/7eab0a506adffac7bed940cc020e37754f0adbdb Stats: 59 lines in 2 files changed: 59 ins; 0 del; 0 mod 8337066: Repeated call of StringBuffer.reverse with double byte string returns wrong result Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/21222 From sviswanathan at openjdk.org Wed Oct 9 00:14:00 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 9 Oct 2024 00:14:00 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 16:44:57 GMT, hanklo6 wrote: > Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > ### Generate test instructions > With `binutils = 2.43` > * `python3 x86-asmtest.py > asmtest.out.h` > ### Run test > * `make test TEST="gtest:AssemblerX86"` test/hotspot/gtest/x86/test_assemblerx86.cpp line 1: > 1: #include "precompiled.hpp" Need to add copyright header to this file at the beginning. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1792637049 From kxu at openjdk.org Wed Oct 9 02:52:38 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 9 Oct 2024 02:52:38 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v10] In-Reply-To: References: Message-ID: <708mNBltFukvzi1tAy1jileWyCS80mGr6UJ2vBAds9E=.7b13b64f-eb7e-440d-9439-fca40b327032@github.com> > This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. > > As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). > > Some notable examples of this transformation include: > - `a + a + a` => `a*3` => `(a<<1) + a` > - `a + a + a + a` => `a*4` => `a<<2` > - `a*3 + a` => `a*4` => `a<<2` > - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` > > See included IR unit tests for more. 
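As a rough illustration of the rewrites listed in the quoted description, the sketch below checks at the Java level that the source and target forms compute the same `int` value. It says nothing about how C2 implements the transformation internally; the class and method names are hypothetical. One detail worth keeping in mind when reading the description: in Java source, `a << C + a` parses as `a << (C + a)`, so the intended form has to be written `(a << C) + a`.

```java
class SerialAddDemo {
    // Adds a to itself n times with plain int arithmetic (wraps on overflow).
    static int addSeries(int a, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += a;
        }
        return sum;
    }

    // a*3 expressed as a shift plus an add.
    static int times3ShiftAdd(int a) {
        return (a << 1) + a;
    }

    // a*4 expressed as a single shift.
    static int times4Shift(int a) {
        return a << 2;
    }

    // True if all example rewrites from the description agree for this a.
    static boolean identitiesHold(int a) {
        return addSeries(a, 3) == a * 3
            && a * 3 == times3ShiftAdd(a)
            && addSeries(a, 4) == a * 4
            && a * 4 == times4Shift(a)
            && (a * 3 + a) == times4Shift(a)
            && ((a << 1) + a + a) == times4Shift(a);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            System.out.println("a=" + a + " identitiesHold=" + identitiesHold(a));
        }
    }
}
```

Because `int` addition, multiplication, and left shift all wrap modulo 2^32, the forms agree even for values that overflow, which is why the rewrite is value-preserving for every input.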
Kangcheng Xu has updated the pull request incrementally with two additional commits since the last revision: - remove trailing empty comments - fix comment grammar Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20754/files - new: https://git.openjdk.org/jdk/pull/20754/files/ecee68ce..b5bc4f92 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20754&range=08-09 Stats: 29 lines in 2 files changed: 0 ins; 0 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/20754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20754/head:pull/20754 PR: https://git.openjdk.org/jdk/pull/20754 From kxu at openjdk.org Wed Oct 9 02:52:38 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 9 Oct 2024 02:52:38 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v9] In-Reply-To: References: Message-ID: On Tue, 8 Oct 2024 15:48:55 GMT, Christian Hagedorn wrote: > Testing looked good. Thank you @chhagedorn. Could you please grant an approval once again (after updates only to comments) so we can merge this? Thanks! > test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 65: > >> 63: "mulAndAddToZero", // >> 64: "mulAndAddToMinus1", // >> 65: "mulAndAddToMinus42" // > > Why did you add the trailing `//`? Those were added to prevent the formatter from collapsing these lines into one. I've gone ahead and removed them.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20754#issuecomment-2401163656 PR Review Comment: https://git.openjdk.org/jdk/pull/20754#discussion_r1792714150 From chagedorn at openjdk.org Wed Oct 9 05:29:59 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 9 Oct 2024 05:29:59 GMT Subject: RFR: 8325495: C2: implement optimization for series of Add of unique value [v10] In-Reply-To: <708mNBltFukvzi1tAy1jileWyCS80mGr6UJ2vBAds9E=.7b13b64f-eb7e-440d-9439-fca40b327032@github.com> References: <708mNBltFukvzi1tAy1jileWyCS80mGr6UJ2vBAds9E=.7b13b64f-eb7e-440d-9439-fca40b327032@github.com> Message-ID: On Wed, 9 Oct 2024 02:52:38 GMT, Kangcheng Xu wrote: >> This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. >> >> As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). >> >> Some notable examples of this transformation include: >> - `a + a + a` => `a*3` => `(a<<1) + a` >> - `a + a + a + a` => `a*4` => `a<<2` >> - `a*3 + a` => `a*4` => `a<<2` >> - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` >> >> See included IR unit tests for more. > > Kangcheng Xu has updated the pull request incrementally with two additional commits since the last revision: > > - remove trailing empty comments > - fix comment grammar > > Co-authored-by: Christian Hagedorn Still good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20754#pullrequestreview-2356049327 From thartmann at openjdk.org Wed Oct 9 07:03:02 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Oct 2024 07:03:02 GMT Subject: RFR: 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking [v3] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 07:42:54 GMT, Christian Hagedorn wrote: >> This patch introduces new predicate classes which implement a new `Predicate` interface. These classes represent the different predicates in the C2 IR. They are used in combination with new predicate iterator and visitors classes to provide an easy way to walk and process predicates in the IR. >> >> ### Predicate Interfaces and Implementing Classes >> - `Predicate` interface is implemented by four predicate classes: >> - `ParsePredicate` (existing class) >> - `RuntimePredicate` (existing and updated class) >> - `TemplateAssertionPredicate` (new class) >> - `InitializedAssertionPredicate` (new class, renamed old `InitializedAssertionPredicate` class to `InitializedAssertionPredicateCreator`) >> >> ### Predicate Iterator with Visitor classes >> There is a new `PredicateIterator` class which can be used to iterate through the predicates of a loop. For each predicate, a `PredicateVisitor` can be applied. The user can implement the `PredicateIterator` interface and override the default do-nothing implementations to the specific needs of the code. I've done this for a couple of places in the code by defining new visitors: >> - `ParsePredicateUsefulMarker`: This visitor marks all Parse Predicates as useful. >> - Replaces the old now retired `ParsePredicateIterator`. >> - `DominatedPredicates`: This visitor checks the dominance relation to an `early` node when trying to figure out the latest legal placement for a node. 
The goal is to skip as many predicates as possible to avoid interference with Loop Predication and/or creating a Loop Limit Check Predicate. >> - Replaces the old now retired `PredicateEntryIterator`. >> - `Predicates::dump()`: Newly added dumping code for the predicates above a loop which uses a new `PredicatePrinter` visitor. This helps debugging issues with predicates. >> >> #### To Be Replaced soon >> There are a couple of places where we use similar code to walk predicates and apply some transformation/modifications to the IR. The goal is to replace these locations with the new visitors as well. This will incrementally be done with the next couple of PRs. >> >> ### More Information >> More information about specific classes and changes can be found as code comments and PR comments. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add missing public for UnifiedPredicateVisitor Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21161#pullrequestreview-2356226816 From duke at openjdk.org Wed Oct 9 07:04:09 2024 From: duke at openjdk.org (Daniel Skantz) Date: Wed, 9 Oct 2024 07:04:09 GMT Subject: Integrated: 8330157: C2: Add a stress flag for bailouts In-Reply-To: References: Message-ID: On Tue, 11 Jun 2024 07:14:20 GMT, Daniel Skantz wrote: > This patch adds a diagnostic/stress flag for C2 bailouts. It can be used to support testing of existing bailouts to prevent issues like [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445), and can test for issues only seen at runtime such as [JDK-8326376](https://bugs.openjdk.org/browse/JDK-8326376). It can also be useful if we want to add more bailouts ([JDK-8318900](https://bugs.openjdk.org/browse/JDK-8318900)). > > We check two invariants. > a) Bailouts should be successful starting from any given `failing()` check. 
> b) The VM should not record a bailout when one is pending (in which case we have continued to optimize for too long). > > a), b) are checked by randomly starting a bailout at calls to `failing()` with a user-given probability. > > The added flag should not have any effect in debug mode. > > Testing: > > T1-5, with flag and without it. We want to check that this does not cause any test failures without the flag set, and no unexpected failures with it. Tests failing because of timeout or because an error is printed to output when compilation fails can be expected in some cases. This pull request has now been integrated. Changeset: d3f3c6a8 Author: Daniel Skantz URL: https://git.openjdk.org/jdk/commit/d3f3c6a8cdf862df3a72f60c824ce50d37231061 Stats: 201 lines in 17 files changed: 167 ins; 0 del; 34 mod 8330157: C2: Add a stress flag for bailouts Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/19646 From roland at openjdk.org Wed Oct 9 07:19:00 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 9 Oct 2024 07:19:00 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert [v2] In-Reply-To: References: Message-ID: <6ZPkkz5717Lsy7F4KF4qKkWJkM0qoOXgdcCUeFlwvm0=.e6b22c97-df8b-4b20-8cfc-7041d71450b2@github.com> On Tue, 1 Oct 2024 13:22:23 GMT, Roland Westrelin wrote: >> When converting a `LongCountedLoop` into a loop nest, c2 needs jvm >> state to add predicates to the inner loop. For that, it peels an >> iteration of the loop and uses the state of the safepoint at the end >> of the loop. That's only legal if there's no side effect between the >> safepoint and the backedge that goes back into the loop. The assert >> failure here happens in code that checks that. >> >> That code compares the memory states at the safepoint and at the >> backedge. If they are the same then there's no side effect. To check >> consistency, the `MergeMem` at the safepoint is cloned. 
As the logic >> iterates over the backedge state, it clears every component of the >> state it encounters from the `MergeMem`. Once done, the cloned >> `MergeMem` should be "empty". In the case of this failure, no side >> effect is found but the cloned `MergeMem` is not empty. That happens >> because of EA: it adds edges to the `MergeMem` at the safepoint that >> it doesn't add to the backedge `Phis`. >> >> So it's the verification code that fails and I propose dealing with >> this by ignoring memory state added by EA in the verification code. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8336702 > - test indentation > - fix & test Anyone else for a review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21009#issuecomment-2401525131 From roland at openjdk.org Wed Oct 9 07:19:00 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 9 Oct 2024 07:19:00 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v4] In-Reply-To: <4ckwGsbjH0VDbGmBBTuG4HUc6ARwbISQ7L8xsVCeqDs=.28d696f2-47ba-4323-855d-e71369242876@github.com> References: <4ckwGsbjH0VDbGmBBTuG4HUc6ARwbISQ7L8xsVCeqDs=.28d696f2-47ba-4323-855d-e71369242876@github.com> Message-ID: <9txXoz-5H9NuAz9aubEzaYp-VLYNXIJCFe_InxZ3-zQ=.94c9facc-c44d-4766-851a-4bf31e7ba76f@github.com> On Mon, 7 Oct 2024 06:00:57 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java >> >> Co-authored-by: Christian Hagedorn > > Marked as reviewed by thartmann (Reviewer). @TobiHartmann Do you have an update on testing? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21303#issuecomment-2401523971 From thartmann at openjdk.org Wed Oct 9 07:31:02 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Oct 2024 07:31:02 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v4] In-Reply-To: References: Message-ID: <3okDpS1wmMEyxWhDIJyCNb2jmSGic_7GWhB3KPt4VdA=.b1eb8167-0b28-4a57-af72-5e50cf3fea74@github.com> On Wed, 2 Oct 2024 13:58:14 GMT, Roland Westrelin wrote: >> The patch includes 2 test cases for this: test1() causes the assert >> failure in the bug description, test2() causes an incorrect execution >> where a load floats above a store that it should be dependent on. >> >> In the test cases, `field` is accessed on object `a` of type `A`. When >> the field is accessed, the type that c2 has for `a` is `A` with >> interface `I`. The holder of the field is class `A` which implements >> no interface. The reason the type of `a` and the type of the holder >> are slightly different is because `a` is the result of a merge of >> objects of subclasses `B` and `C` which implements `I`. >> >> The root cause of the bug is that `Compile::flatten_alias_type()` >> doesn't change `A` + interface `I` into `A`, the actual holder of the >> field. So `field` in `A` + interface `I` and `field` in `A` get >> different slices which is wrong. At parse time, the logic that creates >> the `Store` node uses: >> >> >> C->alias_type(field)->adr_type() >> >> >> to compute the slice which is the slice for `field` in `A`. So the >> slice used at parse time is the right one but during igvn, when the >> slice is computed from the input address, a different slice (the one >> for `A` + interface `I`) is used. That causes load/store nodes when >> they are processed by igvn to use the wrong memory state. 
>> >> In `Compile::flatten_alias_type()`: >> >> >> if (!ik->equals(canonical_holder) || tj->offset() != offset) { >> if( is_known_inst ) { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); >> } else { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); >> } >> } >> >> >> only flattens the type if it's not the canonical holder but it should >> test that the type doesn't implement interfaces that the canonical >> holder doesn't. To keep the logic simple, the fix I propose creates a >> new type whenever there's a chance that a type implements extra >> interfaces (the type is not exact). >> >> I also added asserts in `GraphKit::make_load()` and >> `GraphKit::store_to_memory()` to make sure the slice that is passed >> and the address type agree. Those asserts fire with the new test >> cases. When running testing, I found that they also catch a few cases >> in `library_call.cpp` where an incorrect slice is passed. >> >> As further clean up, maybe we want to drop the slice argument to >> `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to >> their callers) given it's redundant with th... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java > > Co-authored-by: Christian Hagedorn Sorry, that slipped through. Testing looked good. Let me re-run some quick testing with the latest updates. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21303#issuecomment-2401546792 From chagedorn at openjdk.org Wed Oct 9 08:03:09 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 9 Oct 2024 08:03:09 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert [v2] In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 13:22:23 GMT, Roland Westrelin wrote: >> When converting a `LongCountedLoop` into a loop nest, c2 needs jvm >> state to add predicates to the inner loop. For that, it peels an >> iteration of the loop and uses the state of the safepoint at the end >> of the loop. That's only legal if there's no side effect between the >> safepoint and the backedge that goes back into the loop. The assert >> failure here happens in code that checks that. >> >> That code compares the memory states at the safepoint and at the >> backedge. If they are the same then there's no side effect. To check >> consistency, the `MergeMem` at the safepoint is cloned. As the logic >> iterates over the backedge state, it clears every component of the >> state it encounters from the `MergeMem`. Once done, the cloned >> `MergeMem` should be "empty". In the case of this failure, no side >> effect is found but the cloned `MergeMem` is not empty. That happens >> because of EA: it adds edges to the `MergeMem` at the safepoint that >> it doesn't add to the backedge `Phis`. >> >> So it's the verification code that fails and I propose dealing with >> this by ignoring memory state added by EA in the verification code. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8336702 > - test indentation > - fix & test Looks good to me, too! 
------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21009#pullrequestreview-2356368812 From duke at openjdk.org Wed Oct 9 09:13:37 2024 From: duke at openjdk.org (Antón Seoane) Date: Wed, 9 Oct 2024 09:13:37 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v6] In-Reply-To: References: Message-ID: > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. > > To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. > > The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. > > This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket.
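The inclusion rule in the description above can be pictured as a subset test on tag sets. The following is only a sketch under assumptions: the `Tag` bitmask, `default_applies`, and its parameters are hypothetical names chosen for illustration, not the Unified Logging API, and the optional log-level restriction is left out.

```cpp
#include <cstdint>

// Hypothetical tag encoding, one bit per tag (not the real UL representation).
enum Tag : std::uint32_t {
  kJit         = 1u << 0,
  kCompilation = 1u << 1,
  kGc          = 1u << 2
};

// A pre-specified default is considered only when the user supplied no
// decorators, and only when every tag of the default is contained in the
// tag set selected through -Xlog (the inclusion rule).
constexpr bool default_applies(std::uint32_t selected_tags,
                               std::uint32_t default_tags,
                               bool user_supplied_decorators) {
  return !user_supplied_decorators &&
         (selected_tags & default_tags) == default_tags;
}

// -Xlog:jit+compilation with no decorators: a default registered for jit matches.
static_assert(default_applies(kJit | kCompilation, kJit, false), "jit default should match");
// -Xlog:gc: a default registered for jit does not match.
static_assert(!default_applies(kGc, kJit, false), "jit default must not match gc");
// Decorators given explicitly by the user are never overridden.
static_assert(!default_applies(kJit | kCompilation, kJit, true), "user input wins");
```

The three compile-time checks mirror the cases named in the description: a match by inclusion, no match for unrelated tags, and user-supplied decorators always taking precedence.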
Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: Replace LogDecorators::Decorator::NoDecorator with LogDecorators::None ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21383/files - new: https://git.openjdk.org/jdk/pull/21383/files/a80d5fe1..5c933c06 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21383&range=04-05 Stats: 13 lines in 3 files changed: 0 ins; 11 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21383/head:pull/21383 PR: https://git.openjdk.org/jdk/pull/21383 From jbhateja at openjdk.org Wed Oct 9 09:59:11 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 9 Oct 2024 09:59:11 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: > This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR pallets. > > > MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) > MulL (URShift SRC1 , 32) (URShift SRC2, 32) > MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) > MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) > > > > A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors.
> > If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. > > VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. > > Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- > > > Sierra Forest :- > ============ > Baseline:- > Benchmark (SIZE) Mode Cnt Score Error Units > VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms > > With Optimization:- > Benchmark (SIZE) Mode Cnt Score Error Units > VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 1299.407 ops/ms > VectorXXH3HashingB... Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction ------------- Changes: https://git.openjdk.org/jdk/pull/21244/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=01 Stats: 354 lines in 12 files changed: 343 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/21244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21244/head:pull/21244 PR: https://git.openjdk.org/jdk/pull/21244 From jbhateja at openjdk.org Wed Oct 9 10:11:03 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 9 Oct 2024 10:11:03 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR pallets. >> >> >> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. 
>> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. >> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimization:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 > - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction Hi @iwanowww , @sviswa7, @merykitty, Can you kindly review this. 
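The arithmetic argument in the quoted description can be checked on scalars. The sketch below is an illustration, not code from the patch: `vpmuludq_lane` and `mul64` are hypothetical helpers that model a single 64-bit lane, and the input constants are arbitrary.

```cpp
#include <cstdint>

// Scalar model of one VPMULUDQ lane (hypothetical helper): multiply the low
// 32 bits of each 64-bit input as unsigned values, producing a 64-bit result.
constexpr std::uint64_t vpmuludq_lane(std::uint64_t a, std::uint64_t b) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(a)) *
         static_cast<std::uint64_t>(static_cast<std::uint32_t>(b));
}

// Ordinary 64-bit multiply, keeping the low 64 bits of the product.
constexpr std::uint64_t mul64(std::uint64_t a, std::uint64_t b) {
  return a * b;
}

constexpr std::uint64_t kMask = 0xFFFFFFFFULL;
constexpr std::uint64_t kA = 0x123456789ABCDEF0ULL;  // arbitrary test input
constexpr std::uint64_t kB = 0x0FEDCBA987654321ULL;  // arbitrary test input

// MulL (And SRC1, 0xFFFFFFFF) (And SRC2, 0xFFFFFFFF)
static_assert(mul64(kA & kMask, kB & kMask) == vpmuludq_lane(kA, kB), "and/and");
// MulL (URShift SRC1, 32) (URShift SRC2, 32)
static_assert(mul64(kA >> 32, kB >> 32) == vpmuludq_lane(kA >> 32, kB >> 32), "shift/shift");
// MulL (URShift SRC1, 32) (And SRC2, 0xFFFFFFFF)
static_assert(mul64(kA >> 32, kB & kMask) == vpmuludq_lane(kA >> 32, kB), "shift/and");
```

In each pattern both operands are known to fit in 32 bits, so their product fits in 64 bits and the 32x32-to-64 unsigned multiply gives the same value as the full 64-bit multiply.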
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2401895553 From jbhateja at openjdk.org Wed Oct 9 10:12:57 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 9 Oct 2024 10:12:57 GMT Subject: RFR: 8341832: Incorrect continuation address of synthetic SIGSEGV for APX in product builds In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 08:09:29 GMT, Jatin Bhateja wrote: > - Enable APX EGPRs state save restoration check which triggers synthetic SIGSEGV and verifies modified EGPRs state across OS signal handling for non-product builds to match with [corresponding logic in signal handlers.](https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp#L251) > > - Currently we haven't enabled APX support in product builds and intend to do so once entire planned support ([JDK-8329030](https://bugs.openjdk.org/browse/JDK-8329030)) is validated and checked into JDK-mainline, we are following incremental development approach for APX and hence don't want partial APX support to be enabled in intermediate releases. > > Kindly review. > > Best Regards, > Jatin Hi @TobiHartmann , @vnkozlov , @sviswa7 can you kindly check this small patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21419#issuecomment-2401895714 From thartmann at openjdk.org Wed Oct 9 10:36:58 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Oct 2024 10:36:58 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v4] In-Reply-To: References: Message-ID: On Wed, 2 Oct 2024 13:58:14 GMT, Roland Westrelin wrote: >> The patch includes 2 test cases for this: test1() causes the assert >> failure in the bug description, test2() causes an incorrect execution >> where a load floats above a store that it should be dependent on. >> >> In the test cases, `field` is accessed on object `a` of type `A`. 
When >> the field is accessed, the type that c2 has for `a` is `A` with >> interface `I`. The holder of the field is class `A` which implements >> no interface. The reason the type of `a` and the type of the holder >> are slightly different is because `a` is the result of a merge of >> objects of subclasses `B` and `C` which implements `I`. >> >> The root cause of the bug is that `Compile::flatten_alias_type()` >> doesn't change `A` + interface `I` into `A`, the actual holder of the >> field. So `field` in `A` + interface `I` and `field` in `A` get >> different slices which is wrong. At parse time, the logic that creates >> the `Store` node uses: >> >> >> C->alias_type(field)->adr_type() >> >> >> to compute the slice which is the slice for `field` in `A`. So the >> slice used at parse time is the right one but during igvn, when the >> slice is computed from the input address, a different slice (the one >> for `A` + interface `I`) is used. That causes load/store nodes when >> they are processed by igvn to use the wrong memory state. >> >> In `Compile::flatten_alias_type()`: >> >> >> if (!ik->equals(canonical_holder) || tj->offset() != offset) { >> if( is_known_inst ) { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); >> } else { >> tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); >> } >> } >> >> >> only flattens the type if it's not the canonical holder but it should >> test that the type doesn't implement interfaces that the canonical >> holder doesn't. To keep the logic simple, the fix I propose creates a >> new type whenever there's a chance that a type implements extra >> interfaces (the type is not exact). >> >> I also added asserts in `GraphKit::make_load()` and >> `GraphKit::store_to_memory()` to make sure the slice that is passed >> and the address type agree. Those asserts fire with the new test >> cases. 
When running testing, I found that they also catch a few cases >> in `library_call.cpp` where an incorrect slice is passed. >> >> As further clean up, maybe we want to drop the slice argument to >> `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to >> their callers) given it's redundant with th... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java > > Co-authored-by: Christian Hagedorn All testing passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21303#issuecomment-2401946988 From thartmann at openjdk.org Wed Oct 9 11:24:59 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Oct 2024 11:24:59 GMT Subject: RFR: 8341832: Incorrect continuation address of synthetic SIGSEGV for APX in product builds In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 08:09:29 GMT, Jatin Bhateja wrote: > - Enable APX EGPRs state save restoration check which triggers synthetic SIGSEGV and verifies modified EGPRs state across OS signal handling for non-product builds to match with [corresponding logic in signal handlers.](https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp#L251) > > - Currently we haven't enabled APX support in product builds and intend to do so once entire planned support ([JDK-8329030](https://bugs.openjdk.org/browse/JDK-8329030)) is validated and checked into JDK-mainline, we are following incremental development approach for APX and hence don't want partial APX support to be enabled in intermediate releases. > > Kindly review. > > Best Regards, > Jatin Looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21419#pullrequestreview-2356853675 From chagedorn at openjdk.org Wed Oct 9 11:45:12 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 9 Oct 2024 11:45:12 GMT Subject: RFR: 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking [v3] In-Reply-To: References: Message-ID: <2g1Fk2CW1jURKwEvmKKbHsBWlBlTQzLsdOwwDRpoboM=.0bb892bf-74fe-4a34-9bce-1b19ec641b58@github.com> On Thu, 26 Sep 2024 07:42:54 GMT, Christian Hagedorn wrote: >> This patch introduces new predicate classes which implement a new `Predicate` interface. These classes represent the different predicates in the C2 IR. They are used in combination with new predicate iterator and visitors classes to provide an easy way to walk and process predicates in the IR. >> >> ### Predicate Interfaces and Implementing Classes >> - `Predicate` interface is implemented by four predicate classes: >> - `ParsePredicate` (existing class) >> - `RuntimePredicate` (existing and updated class) >> - `TemplateAssertionPredicate` (new class) >> - `InitializedAssertionPredicate` (new class, renamed old `InitializedAssertionPredicate` class to `InitializedAssertionPredicateCreator`) >> >> ### Predicate Iterator with Visitor classes >> There is a new `PredicateIterator` class which can be used to iterate through the predicates of a loop. For each predicate, a `PredicateVisitor` can be applied. The user can implement the `PredicateIterator` interface and override the default do-nothing implementations to the specific needs of the code. I've done this for a couple of places in the code by defining new visitors: >> - `ParsePredicateUsefulMarker`: This visitor marks all Parse Predicates as useful. >> - Replaces the old now retired `ParsePredicateIterator`. >> - `DominatedPredicates`: This visitor checks the dominance relation to an `early` node when trying to figure out the latest legal placement for a node. 
The goal is to skip as many predicates as possible to avoid interference with Loop Predication and/or creating a Loop Limit Check Predicate. >> - Replaces the old now retired `PredicateEntryIterator`. >> - `Predicates::dump()`: Newly added dumping code for the predicates above a loop which uses a new `PredicatePrinter` visitor. This helps debugging issues with predicates. >> >> #### To Be Replaced soon >> There are a couple of places where we use similar code to walk predicates and apply some transformation/modifications to the IR. The goal is to replace these locations with the new visitors as well. This will incrementally be done with the next couple of PRs. >> >> ### More Information >> More information about specific classes and changes can be found as code comments and PR comments. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add missing public for UnifiedPredicateVisitor Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21161#issuecomment-2402086101 From chagedorn at openjdk.org Wed Oct 9 11:45:13 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 9 Oct 2024 11:45:13 GMT Subject: Integrated: 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 14:19:41 GMT, Christian Hagedorn wrote: > This patch introduces new predicate classes which implement a new `Predicate` interface. These classes represent the different predicates in the C2 IR. They are used in combination with new predicate iterator and visitors classes to provide an easy way to walk and process predicates in the IR. 
> > ### Predicate Interfaces and Implementing Classes > - `Predicate` interface is implemented by four predicate classes: > - `ParsePredicate` (existing class) > - `RuntimePredicate` (existing and updated class) > - `TemplateAssertionPredicate` (new class) > - `InitializedAssertionPredicate` (new class, renamed old `InitializedAssertionPredicate` class to `InitializedAssertionPredicateCreator`) > > ### Predicate Iterator with Visitor classes > There is a new `PredicateIterator` class which can be used to iterate through the predicates of a loop. For each predicate, a `PredicateVisitor` can be applied. The user can implement the `PredicateIterator` interface and override the default do-nothing implementations to the specific needs of the code. I've done this for a couple of places in the code by defining new visitors: > - `ParsePredicateUsefulMarker`: This visitor marks all Parse Predicates as useful. > - Replaces the old now retired `ParsePredicateIterator`. > - `DominatedPredicates`: This visitor checks the dominance relation to an `early` node when trying to figure out the latest legal placement for a node. The goal is to skip as many predicates as possible to avoid interference with Loop Predication and/or creating a Loop Limit Check Predicate. > - Replaces the old now retired `PredicateEntryIterator`. > - `Predicates::dump()`: Newly added dumping code for the predicates above a loop which uses a new `PredicatePrinter` visitor. This helps debugging issues with predicates. > > #### To Be Replaced soon > There are a couple of places where we use similar code to walk predicates and apply some transformation/modifications to the IR. The goal is to replace these locations with the new visitors as well. This will incrementally be done with the next couple of PRs. > > ### More Information > More information about specific classes and changes can be found as code comments and PR comments. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 3fba1702 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/3fba1702cd8dc817b11bfa51077c41424d289281 Stats: 566 lines in 4 files changed: 420 ins; 62 del; 84 mod 8340786: Introduce Predicate classes with predicate iterators and visitors for simplified walking Reviewed-by: roland, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/21161 From duke at openjdk.org Wed Oct 9 12:21:58 2024 From: duke at openjdk.org (Antón Seoane) Date: Wed, 9 Oct 2024 12:21:58 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v6] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 13:20:36 GMT, Antón Seoane wrote: >> There is `LogDecorators::None` > > LogDecorators::None is defined in the .cpp, so I'd either have to make it "visible" or use the alternative NoDecorators. Both options are fine for me I am using `LogDecorators::None` now, I think it is cleaner than the `NoDecorators` alternative ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1793418398 From stefank at openjdk.org Wed Oct 9 13:42:12 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 9 Oct 2024 13:42:12 GMT Subject: RFR: 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 13:11:58 GMT, Fredrik Bredberg wrote: > This bug was created in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > `C2_MacroAssembler::fast_unlock_lightweight()` on x86 issues a `testl(monitor, monitor);` instruction for the sole purpose of clearing the zero-flag, which should force us to go into the slow path. > > However, this instruction incorrectly only checks the lower 32-bits, which results in setting the zero-flag if the ObjectMonitor has all-zeros in the lower 32-bits. For some reason this seems to be quite common on macosx-x64, where we tend to get an ObjectMonitor address that is 0x0000600000000000.
> > The reason we wanted to go into the slow path was that we've observed that there is a thread queued on either the EntryList or cxq, and there is no successor. However since we failed to clear the zero-flag, we will go into the fast path and no one will wake up the stranded thread. Thus the system will hang and any test system will timeout. Looks good to me. ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21422#pullrequestreview-2357165925 From fbredberg at openjdk.org Wed Oct 9 13:42:12 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 9 Oct 2024 13:42:12 GMT Subject: RFR: 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 Message-ID: This bug was created in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). `C2_MacroAssembler::fast_unlock_lightweight()` on x86 issues a `testl(monitor, monitor);` instruction for the sole purpose of clearing the zero-flag, which should force us to go into the slow path. However, this instruction incorrectly only checks the lower 32-bits, which results in setting the zero-flag if the ObjectMonitor has all-zeros in the lower 32-bits. For some reason this seems to be quite common on macosx-x64, where we tend to get an ObjectMonitor address that is 0x0000600000000000. The reason we wanted to go into the slow path was that we've observed that there is a thread queued on either the EntryList or cxq, and there is no successor. However since we failed to clear the zero-flag, we will go into the fast path and no one will wake up the stranded thread. Thus the system will hang and any test system will timeout. 
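The failure mode described above can be reproduced with plain integer arithmetic. This is a minimal model, not the HotSpot code and not necessarily how the one-line fix is written: `zf_after_testl` and `zf_after_testq` are hypothetical helpers that compute the zero flag a 32-bit or 64-bit `test reg, reg` would produce.

```cpp
#include <cstdint>

// Model of the x86 zero flag after `test r, r`: ZF is set when (r & r) == 0.
// A 32-bit test looks only at the low 32 bits of the register.
constexpr bool zf_after_testl(std::uint64_t r) {
  return static_cast<std::uint32_t>(r) == 0;
}

// A 64-bit test looks at the whole register.
constexpr bool zf_after_testq(std::uint64_t r) {
  return r == 0;
}

// The ObjectMonitor address quoted in the report.
constexpr std::uint64_t kMonitor = 0x0000600000000000ULL;

// Low 32 bits are all zero, so the 32-bit test sets ZF and the fast path is
// taken by mistake.
static_assert(zf_after_testl(kMonitor), "32-bit test sets ZF for this address");
// A full-width test of the same non-null address leaves ZF clear.
static_assert(!zf_after_testq(kMonitor), "64-bit test clears ZF for this address");
```

Any non-null address whose low 32 bits are zero behaves the same way, which matches the observation that the hang depends on where the ObjectMonitor happens to be allocated.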
------------- Commit messages: - 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 Changes: https://git.openjdk.org/jdk/pull/21422/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21422&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341854 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21422.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21422/head:pull/21422 PR: https://git.openjdk.org/jdk/pull/21422 From aboldtch at openjdk.org Wed Oct 9 13:42:12 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 9 Oct 2024 13:42:12 GMT Subject: RFR: 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 13:11:58 GMT, Fredrik Bredberg wrote: > This bug was created in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > `C2_MacroAssembler::fast_unlock_lightweight()` on x86 issues a `testl(monitor, monitor);` instruction for the sole purpose of clearing the zero-flag, which should force us to go into the slow path. > > However, this instruction incorrectly only checks the lower 32-bits, which results in setting the zero-flag if the ObjectMonitor has all-zeros in the lower 32-bits. For some reason this seems to be quite common on macosx-x64, where we tend to get an ObjectMonitor address that is 0x0000600000000000. > > The reason we wanted to go into the slow path was that we've observed that there is a thread queued on either the EntryList or cxq, and there is no successor. However since we failed to clear the zero-flag, we will go into the fast path and no one will wake up the stranded thread. Thus the system will hang and any test system will timeout. lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21422#pullrequestreview-2357186012 From pchilanomate at openjdk.org Wed Oct 9 13:44:59 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 9 Oct 2024 13:44:59 GMT Subject: RFR: 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 13:11:58 GMT, Fredrik Bredberg wrote: > This bug was created in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > `C2_MacroAssembler::fast_unlock_lightweight()` on x86 issues a `testl(monitor, monitor);` instruction for the sole purpose of clearing the zero-flag, which should force us to go into the slow path. > > However, this instruction incorrectly only checks the lower 32-bits, which results in setting the zero-flag if the ObjectMonitor has all-zeros in the lower 32-bits. For some reason this seems to be quite common on macosx-x64, where we tend to get an ObjectMonitor address that is 0x0000600000000000. > > The reason we wanted to go into the slow path was that we've observed that there is a thread queued on either the EntryList or cxq, and there is no successor. However since we failed to clear the zero-flag, we will go into the fast path and no one will wake up the stranded thread. Thus the system will hang and any test system will timeout. Looks good, thanks for fixing this. ------------- Marked as reviewed by pchilanomate (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21422#pullrequestreview-2357206121 From kbarrett at openjdk.org Wed Oct 9 14:57:37 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 9 Oct 2024 14:57:37 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v3] In-Reply-To: References: Message-ID: > Please review this change to TypeRawPtr::add_offset to prevent a compiler from > inferring things based on prior pointer arithmetic not invoking UB. As noted in > the bug report, clang is actually doing this. 
> > To accomplish this, changed to integral arithmetic. Also added over/underflow > checks. > > Also made a couple of minor touchups. Replaced an implicit conversion to bool > with an explicit compare to nullptr (per style guide). Removed a no longer > needed dummy return after a (now) noreturn function. > > Testing: mach5 tier1-7 > That testing was with calls to "fatal" for the over/underflow cases and the > sum==0 case. There were no hits. I'm not sure how to construct a test that > would hit those. Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: remove unreachable TypePtr::Null case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21324/files - new: https://git.openjdk.org/jdk/pull/21324/files/cc1f2ac8..c3dc62e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21324&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21324&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21324.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21324/head:pull/21324 PR: https://git.openjdk.org/jdk/pull/21324 From kbarrett at openjdk.org Wed Oct 9 14:57:38 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 9 Oct 2024 14:57:38 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> <5LaZcZeK2ShpVHDF5LuK1m_Z0RLeQOIdHYXd9t_Vl5c=.ed093c6b-3949-4757-ba4b-a963067ce0ab@github.com> Message-ID: On Tue, 8 Oct 2024 18:32:54 GMT, Dean Long wrote: >> Oh, you are right. And TypeRawPtr::make asserts the PTR is neither Constant nor Null. Which makes >> both switch cases under modification here supposedly unreachable. That would explain why I never hit >> either after running lots of tests. 
All of the change proposed here can be eliminated, and instead change >> both cases to fall through to the default ShouldNotReachHere(). (And that would be another way to >> remove the -Wzero-as-null-pointer-constant warning that was how I got here in the first place. :) ) > > There's TypeRawPtr::make(enum PTR ptr) which doesn't allow Constant or Null, but we are using TypeRawPtr::make(address bits) here. > We may need to keep the Constant case. I wouldn't be surprised if there was a way to trigger that path using Unsafe. Yeah, keeping it makes sense. I've removed the TypePtr::Null case, allowing that one to default to ShouldNotReachHere(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21324#discussion_r1793675908 From roland at openjdk.org Wed Oct 9 15:02:09 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 9 Oct 2024 15:02:09 GMT Subject: RFR: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop [v4] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 10:33:57 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/types/TestBadMemSliceWithInterfaces.java >> >> Co-authored-by: Christian Hagedorn > > All testing passed. @TobiHartmann @chhagedorn thanks for the reviews.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21303#issuecomment-2402582761 From roland at openjdk.org Wed Oct 9 15:02:12 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 9 Oct 2024 15:02:12 GMT Subject: RFR: 8336702: C2 compilation fails with "all memory state should have been processed" assert [v2] In-Reply-To: References: Message-ID: <0TN0cCdVnmSXZIojKFYweszMATqugx7m7TSfXSSF5X8=.0db400f0-b313-41c4-8fdf-9f321217c250@github.com> On Wed, 2 Oct 2024 10:42:14 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8336702 >> - test indentation >> - fix & test > > Looks good to me. Testing passed. @TobiHartmann @chhagedorn thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21009#issuecomment-2402577712 From roland at openjdk.org Wed Oct 9 15:02:14 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 9 Oct 2024 15:02:14 GMT Subject: Integrated: 8336702: C2 compilation fails with "all memory state should have been processed" assert In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 08:34:44 GMT, Roland Westrelin wrote: > When converting a `LongCountedLoop` into a loop nest, c2 needs jvm > state to add predicates to the inner loop. For that, it peels an > iteration of the loop and uses the state of the safepoint at the end > of the loop. That's only legal if there's no side effect between the > safepoint and the backedge that goes back into the loop. The assert > failure here happens in code that checks that. > > That code compares the memory states at the safepoint and at the > backedge. If they are the same then there's no side effect. To check > consistency, the `MergeMem` at the safepoint is cloned. 
As the logic > iterates over the backedge state, it clears every component of the > state it encounters from the `MergeMem`. Once done, the cloned > `MergeMem` should be "empty". In the case of this failure, no side > effect is found but the cloned `MergeMem` is not empty. That happens > because of EA: it adds edges to the `MergeMem` at the safepoint that > it doesn't add to the backedge `Phis`. > > So it's the verification code that fails and I propose dealing with > this by ignoring memory state added by EA in the verification code. This pull request has now been integrated. Changeset: ecc77a5b Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/ecc77a5b4a84c84ffa1580174872af6df3a4f6ca Stats: 75 lines in 2 files changed: 73 ins; 0 del; 2 mod 8336702: C2 compilation fails with "all memory state should have been processed" assert Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/21009 From roland at openjdk.org Wed Oct 9 15:02:10 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 9 Oct 2024 15:02:10 GMT Subject: Integrated: 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop In-Reply-To: References: Message-ID: <-9qrfoG6r9BG2K8oxSRpTImnaiQQCuXRiT4IggzqVkU=.2b3cf70a-76ec-46a8-b1f5-ecc63f407e53@github.com> On Wed, 2 Oct 2024 11:21:43 GMT, Roland Westrelin wrote: > The patch includes 2 test cases for this: test1() causes the assert > failure in the bug description, test2() causes an incorrect execution > where a load floats above a store that it should be dependent on. > > In the test cases, `field` is accessed on object `a` of type `A`. When > the field is accessed, the type that c2 has for `a` is `A` with > interface `I`. The holder of the field is class `A` which implements > no interface. 
The reason the type of `a` and the type of the holder > are slightly different is because `a` is the result of a merge of > objects of subclasses `B` and `C` which implements `I`. > > The root cause of the bug is that `Compile::flatten_alias_type()` > doesn't change `A` + interface `I` into `A`, the actual holder of the > field. So `field` in `A` + interface `I` and `field` in `A` get > different slices which is wrong. At parse time, the logic that creates > the `Store` node uses: > > > C->alias_type(field)->adr_type() > > > to compute the slice which is the slice for `field` in `A`. So the > slice used at parse time is the right one but during igvn, when the > slice is computed from the input address, a different slice (the one > for `A` + interface `I`) is used. That causes load/store nodes when > they are processed by igvn to use the wrong memory state. > > In `Compile::flatten_alias_type()`: > > > if (!ik->equals(canonical_holder) || tj->offset() != offset) { > if( is_known_inst ) { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, true, nullptr, offset, to->instance_id()); > } else { > tj = to = TypeInstPtr::make(to->ptr(), canonical_holder, false, nullptr, offset); > } > } > > > only flattens the type if it's not the canonical holder but it should > test that the type doesn't implement interfaces that the canonical > holder doesn't. To keep the logic simple, the fix I propose creates a > new type whenever there's a chance that a type implements extra > interfaces (the type is not exact). > > I also added asserts in `GraphKit::make_load()` and > `GraphKit::store_to_memory()` to make sure the slice that is passed > and the address type agree. Those asserts fire with the new test > cases. When running testing, I found that they also catch a few cases > in `library_call.cpp` where an incorrect slice is passed. 
> > As further clean up, maybe we want to drop the slice argument to > `GraphKit::make_load()` and `GraphKit::store_to_memory()` (and to > their callers) given it's redundant with the address type and error > prone. This pull request has now been integrated. Changeset: ff2f39f2 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/ff2f39f24018436556a8956ec55da433dc697437 Stats: 124 lines in 4 files changed: 112 ins; 1 del; 11 mod 8340214: C2 compilation asserts with "no node with a side effect" in PhaseIdealLoop::try_sink_out_of_loop Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/21303 From kxu at openjdk.org Wed Oct 9 15:11:12 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 9 Oct 2024 15:11:12 GMT Subject: Integrated: 8325495: C2: implement optimization for series of Add of unique value In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 19:27:29 GMT, Kangcheng Xu wrote: > This pull request resolves [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) by converting series of additions of the same operand into multiplications. I.e., `a + a + ... + a + a + a => n*a`. > > As an added benefit, it also converts `C * a + a` into `(C+1) * a` and `a << C + a` into `(2^C + 1) * a` (with respect to constant `C`). This is actually a side effect of IGVN being iterative: at converting the `i`-th addition, the previous `i-1` additions would have already been optimized to multiplication (and thus, further into bit shifts and additions/subtractions if possible). > > Some notable examples of this transformation include: > - `a + a + a` => `a*3` => `(a<<1) + a` > - `a + a + a + a` => `a*4` => `a<<2` > - `a*3 + a` => `a*4` => `a<<2` > - `(a << 1) + a + a` => `a*2 + a + a` => `a*3 + a` => `a*4 => a<<2` > > See included IR unit tests for more. This pull request has now been integrated. 
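The rewrites listed in the description above rely on plain two's-complement identities, so they hold even when the additions overflow. A minimal sketch that checks a few of them on sample values follows; the class and method names are hypothetical and are not taken from the PR or its IR tests.

```java
public class AddSeriesIdentities {
    // a + a + a + a, written the way the source pattern appears.
    static int addFourTimes(int a) {
        return a + a + a + a;
    }

    // The shifted form that the description gives as the end result for a*4.
    static int shiftByTwo(int a) {
        return a << 2;
    }

    // The `C * a + a` pattern with C = 3.
    static int mulThreePlusA(int a) {
        return a * 3 + a;
    }

    // (a << 1) + a, the shift-and-add form of a*3.
    static int shiftPlusA(int a) {
        return (a << 1) + a;
    }

    public static void main(String[] args) {
        // Includes the extremes so that wrap-around is exercised too.
        int[] samples = {0, 1, -1, 7, 123_456_789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            boolean ok = addFourTimes(a) == shiftByTwo(a)
                    && mulThreePlusA(a) == a * 4
                    && shiftPlusA(a) == a * 3;
            System.out.println(a + ": " + ok);
        }
    }
}
```

This only demonstrates that the source and target expressions compute the same value; it says nothing about what C2 actually emits for a given method.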
Changeset: c30ad012 Author: Kangcheng Xu URL: https://git.openjdk.org/jdk/commit/c30ad0124e7743f3a4c29ef901761f8fcc53de10 Stats: 414 lines in 3 files changed: 414 ins; 0 del; 0 mod 8325495: C2: implement optimization for series of Add of unique value Reviewed-by: chagedorn, roland ------------- PR: https://git.openjdk.org/jdk/pull/20754 From dcubed at openjdk.org Wed Oct 9 16:08:58 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 9 Oct 2024 16:08:58 GMT Subject: RFR: 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 13:11:58 GMT, Fredrik Bredberg wrote: > This bug was created in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > `C2_MacroAssembler::fast_unlock_lightweight()` on x86 issues a `testl(monitor, monitor);` instruction for the sole purpose of clearing the zero-flag, which should force us to go into the slow path. > > However, this instruction incorrectly only checks the lower 32-bits, which results in setting the zero-flag if the ObjectMonitor has all-zeros in the lower 32-bits. For some reason this seems to be quite common on macosx-x64, where we tend to get an ObjectMonitor address that is 0x0000600000000000. > > The reason we wanted to go into the slow path was that we've observed that there is a thread queued on either the EntryList or cxq, and there is no successor. However since we failed to clear the zero-flag, we will go into the fast path and no one will wake up the stranded thread. Thus the system will hang and any test system will timeout. > > Tested ok in tier1-3 on all x64 related platforms. Also ran the vm.lang.LockUnlock.testContendedLock test. Thumbs up. I think this is a trivial fix since the new instruction: `orl(t, 1)` is one of the well known ways to set ZF to 0. It's just the opposite of the well known way to set ZF to 1 used on L835 below: `xorl(t, t)`. ------------- Marked as reviewed by dcubed (Reviewer). 
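The difference between the old and new instruction can be modelled in a few lines. The sketch below is not HotSpot code: it only imitates, under the assumption that ZF is set exactly when the 32-bit result is zero, what a 32-bit `test reg, reg` and a 32-bit `or reg, 1` would report for the address quoted in the bug description. The class and method names are hypothetical.

```java
public class ZeroFlagModel {
    // Models ZF after a 32-bit `test reg, reg`: only the low 32 bits are examined.
    static boolean zfAfterTest32(long value) {
        return (int) value == 0;
    }

    // Models ZF after a 32-bit `or reg, 1`: bit 0 of the result is always set,
    // so the result can never be zero.
    static boolean zfAfterOr32WithOne(long value) {
        return (((int) value) | 1) == 0;
    }

    public static void main(String[] args) {
        long monitor = 0x0000600000000000L; // address from the bug description
        System.out.println(zfAfterTest32(monitor));      // true: ZF set, wrongly signalling the fast path
        System.out.println(zfAfterOr32WithOne(monitor)); // false: ZF cleared for any input
    }
}
```

The second method returns `false` for every input, which is why the `or` form is a reliable way to clear ZF regardless of the monitor address.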
PR Review: https://git.openjdk.org/jdk/pull/21422#pullrequestreview-2357597296 From sviswanathan at openjdk.org Wed Oct 9 16:29:06 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 9 Oct 2024 16:29:06 GMT Subject: RFR: 8341832: Incorrect continuation address of synthetic SIGSEGV for APX in product builds In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 08:09:29 GMT, Jatin Bhateja wrote: > - Enable APX EGPRs state save restoration check which triggers synthetic SIGSEGV and verifies modified EGPRs state across OS signal handling for non-product builds to match with [corresponding logic in signal handlers.](https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp#L251) > > - Currently we haven't enabled APX support in product builds and intend to do so once entire planned support ([JDK-8329030](https://bugs.openjdk.org/browse/JDK-8329030)) is validated and checked into JDK-mainline, we are following incremental development approach for APX and hence don't want partial APX support to be enabled in intermediate releases. > > Kindly review. > > Best Regards, > Jatin Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21419#pullrequestreview-2357643470 From fbredberg at openjdk.org Wed Oct 9 16:43:58 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 9 Oct 2024 16:43:58 GMT Subject: RFR: 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 13:11:58 GMT, Fredrik Bredberg wrote: > This bug was created in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > `C2_MacroAssembler::fast_unlock_lightweight()` on x86 issues a `testl(monitor, monitor);` instruction for the sole purpose of clearing the zero-flag, which should force us to go into the slow path. 
> > However, this instruction incorrectly only checks the lower 32-bits, which results in setting the zero-flag if the ObjectMonitor has all-zeros in the lower 32-bits. For some reason this seems to be quite common on macosx-x64, where we tend to get an ObjectMonitor address that is 0x0000600000000000. > > The reason we wanted to go into the slow path was that we've observed that there is a thread queued on either the EntryList or cxq, and there is no successor. However since we failed to clear the zero-flag, we will go into the fast path and no one will wake up the stranded thread. Thus the system will hang and any test system will timeout. > > Tested ok in tier1-3 on all x64 related platforms. Also ran the vm.lang.LockUnlock.testContendedLock test. Thanks everyone for the quick review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21422#issuecomment-2402812881 From fbredberg at openjdk.org Wed Oct 9 16:49:02 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 9 Oct 2024 16:49:02 GMT Subject: Integrated: 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 13:11:58 GMT, Fredrik Bredberg wrote: > This bug was created in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > `C2_MacroAssembler::fast_unlock_lightweight()` on x86 issues a `testl(monitor, monitor);` instruction for the sole purpose of clearing the zero-flag, which should force us to go into the slow path. > > However, this instruction incorrectly only checks the lower 32-bits, which results in setting the zero-flag if the ObjectMonitor has all-zeros in the lower 32-bits. For some reason this seems to be quite common on macosx-x64, where we tend to get an ObjectMonitor address that is 0x0000600000000000. > > The reason we wanted to go into the slow path was that we've observed that there is a thread queued on either the EntryList or cxq, and there is no successor. 
However since we failed to clear the zero-flag, we will go into the fast path and no one will wake up the stranded thread. Thus the system will hang and any test system will timeout. > > Tested ok in tier1-3 on all x64 related platforms. Also ran the vm.lang.LockUnlock.testContendedLock test. This pull request has now been integrated. Changeset: fcc9c8d5 Author: Fredrik Bredberg URL: https://git.openjdk.org/jdk/commit/fcc9c8d570396506068e0a1d4123e32b195e6653 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8341854: Incorrect clearing of ZF in fast_unlock_lightweight on x86 Reviewed-by: stefank, aboldtch, pchilanomate, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/21422 From kvn at openjdk.org Wed Oct 9 16:59:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 9 Oct 2024 16:59:01 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* In-Reply-To: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> Message-ID: <0GutJI2Gd9s2w3wK9R4UJBYhbyrYZIakEGEaz34jvZw=.95efd71c-d350-4a34-bfda-4cbf06599c1e@github.com> On Tue, 8 Oct 2024 19:46:12 GMT, Quan Anh Mai wrote: > Hi, > > This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. > > Please take a look and leave your reviews, > Thanks a lot. My testing passed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21414#issuecomment-2402839331 From kvn at openjdk.org Wed Oct 9 17:10:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 9 Oct 2024 17:10:00 GMT Subject: RFR: 8341832: Incorrect continuation address of synthetic SIGSEGV for APX in product builds In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 08:09:29 GMT, Jatin Bhateja wrote: > - Enable APX EGPRs state save restoration check which triggers synthetic SIGSEGV and verifies modified EGPRs state across OS signal handling for non-product builds to match with [corresponding logic in signal handlers.](https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp#L251) > > - Currently we haven't enabled APX support in product builds and intend to do so once entire planned support ([JDK-8329030](https://bugs.openjdk.org/browse/JDK-8329030)) is validated and checked into JDK-mainline, we are following incremental development approach for APX and hence don't want partial APX support to be enabled in intermediate releases. > > Kindly review. > > Best Regards, > Jatin Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21419#pullrequestreview-2357732186 From qamai at openjdk.org Wed Oct 9 17:12:33 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 9 Oct 2024 17:12:33 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v2] In-Reply-To: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> Message-ID: > Hi, > > This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. > > Please take a look and leave your reviews, > Thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: style changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21414/files - new: https://git.openjdk.org/jdk/pull/21414/files/78b88e46..90f11d40 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21414&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21414&range=00-01 Stats: 22 lines in 2 files changed: 1 ins; 6 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/21414.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21414/head:pull/21414 PR: https://git.openjdk.org/jdk/pull/21414 From qamai at openjdk.org Wed Oct 9 17:12:34 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 9 Oct 2024 17:12:34 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* In-Reply-To: <0GutJI2Gd9s2w3wK9R4UJBYhbyrYZIakEGEaz34jvZw=.95efd71c-d350-4a34-bfda-4cbf06599c1e@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> <0GutJI2Gd9s2w3wK9R4UJBYhbyrYZIakEGEaz34jvZw=.95efd71c-d350-4a34-bfda-4cbf06599c1e@github.com> Message-ID: On Wed, 9 Oct 2024 16:55:55 GMT, Vladimir Kozlov wrote: >> Hi, >> >> This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. >> >> Please take a look and leave your reviews, >> Thanks a lot. > > My testing passed. @vnkozlov Thanks for your reviews and testings, the latest commit addresses your concern, as well as contains some minor style changes and a removal of switch case duplication. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21414#issuecomment-2402865288 From jbhateja at openjdk.org Wed Oct 9 17:47:07 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 9 Oct 2024 17:47:07 GMT Subject: Integrated: 8341832: Incorrect continuation address of synthetic SIGSEGV for APX in product builds In-Reply-To: References: Message-ID: <3LlcLPjSu70smZz7MpxZw4TGI9F7N3qWfarmBBA5ET8=.2bd0f976-836d-415c-ae2b-38357629db27@github.com> On Wed, 9 Oct 2024 08:09:29 GMT, Jatin Bhateja wrote: > - Enable APX EGPRs state save restoration check which triggers synthetic SIGSEGV and verifies modified EGPRs state across OS signal handling for non-product builds to match with [corresponding logic in signal handlers.](https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp#L251) > > - Currently we haven't enabled APX support in product builds and intend to do so once entire planned support ([JDK-8329030](https://bugs.openjdk.org/browse/JDK-8329030)) is validated and checked into JDK-mainline, we are following incremental development approach for APX and hence don't want partial APX support to be enabled in intermediate releases. > > Kindly review. > > Best Regards, > Jatin This pull request has now been integrated. 
Changeset: 3180aaa3 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/3180aaa370de16eb1835e1f57664b9fb15a6bb01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8341832: Incorrect continuation address of synthetic SIGSEGV for APX in product builds Reviewed-by: thartmann, sviswanathan, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21419 From kvn at openjdk.org Wed Oct 9 17:52:59 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 9 Oct 2024 17:52:59 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v2] In-Reply-To: References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> Message-ID: On Wed, 9 Oct 2024 17:12:33 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. >> >> Please take a look and leave your reviews, >> Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > style changes Good. You need second review because change is not trivial. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21414#pullrequestreview-2357817812 From kxu at openjdk.org Wed Oct 9 18:21:30 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 9 Oct 2024 18:21:30 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v22] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. 
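For readers unfamiliar with the term, a parallel iv is a second variable that advances in lockstep with the loop counter. The sketch below shows the loop shape in question (an `int` counter with a `long` accumulator) next to a closed form that yields the same value. It is only an illustration: the names are hypothetical, and the closed form is an assumption about the kind of result the optimization aims for, not a statement of what C2 generates.

```java
public class ParallelIvSketch {
    // `a` is a long-typed parallel iv: it grows by a constant on every
    // iteration of an int counted loop.
    static long loopForm(int stop) {
        long a = 0;
        for (int i = 0; i < stop; i++) {
            a += 3;
        }
        return a;
    }

    // The same value computed directly from the trip count.
    static long closedForm(int stop) {
        return stop > 0 ? 3L * stop : 0L;
    }

    public static void main(String[] args) {
        for (int stop : new int[] {-5, 0, 1, 1000}) {
            System.out.println(stop + ": " + (loopForm(stop) == closedForm(stop)));
        }
    }
}
```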
> > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: remove <= test cases, disable StressLongCountedLoop and PerMethodTrapLimit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/32bedd00..845e34cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=20-21 Stats: 65 lines in 1 file changed: 6 ins; 37 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From kxu at openjdk.org Wed Oct 9 18:21:30 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 9 Oct 2024 18:21:30 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v21] In-Reply-To: <1R6tWNQF3WSJ4joPaoWR3N_JKy0W-BB0Zntg00N-mlU=.e8b25703-93ea-42be-ba1d-46e06c17987c@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> <1R6tWNQF3WSJ4joPaoWR3N_JKy0W-BB0Zntg00N-mlU=.e8b25703-93ea-42be-ba1d-46e06c17987c@github.com> Message-ID: On Mon, 7 Oct 2024 07:56:19 GMT, Tobias Hartmann wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> correctly verify outputs with custom @Run methods > > `compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java` times out in our testing both with `-XX:StressLongCountedLoop=200000000` and with `-XX:+UnlockExperimentalVMOptions -XX:PerMethodSpecTrapLimit=0 -XX:PerMethodTrapLimit=0`: > > > "main" #1 [2771172] prio=5 os_prio=0 
cpu=500187.70ms elapsed=503.08s allocated=6554K defined_classes=227 tid=0x0000ffff9002d550 nid=2771172 runnable [0x0000ffff972bf000] > java.lang.Thread.State: RUNNABLE > Thread: 0x0000ffff9002d550 [0x2a48e4] State: _at_safepoint _at_poll_safepoint 1 > JavaThread state: _thread_blocked > at compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.testIntCountedLoopWithIntIVLeq(TestParallelIvInIntCountedLoop.java:93) > at compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.runTestIntCountedLoopWithIntIVLeq(TestParallelIvInIntCountedLoop.java:103) > at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base@24-internal/DirectMethodHandle$Holder) > at java.lang.invoke.LambdaForm$MH/0x0000ffff58460870.invoke(java.base@24-internal/LambdaForm$MH) > at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@24-internal/Invokers$Holder) > at jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(java.base@24-internal/DirectMethodHandleAccessor.java:154) > at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(java.base@24-internal/DirectMethodHandleAccessor.java:104) > at java.lang.reflect.Method.invoke(java.base@24-internal/Method.java:573) > at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:159) > at compiler.lib.ir_framework.test.AbstractTest.run(AbstractTest.java:98) > at compiler.lib.ir_framework.test.CustomRunTest.run(CustomRunTest.java:89) > at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:861) > at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:252) > at compiler.lib.ir_framework.test.TestVM.main(TestVM.java:165) @TobiHartmann, thanks for the feedback! I did some investigation; the timeouts have three causes: 1. Tests with `i <= stop` are not counted loops in the first place and should be removed: Now I remember why I originally didn't test for them. Consider `for (int i = 0; i <= stop; i++);` when `stop = Integer.MAX_VALUE`.
Overflow in Java is well-defined, which means the code must loop indefinitely and optimizations of any kind can't break this. Therefore, loops using `<=` are not counted loops to begin with. `@IR(failOn = {IRNode.COUNTED_LOOP})` doesn't fail either. I removed these test cases. 2. It is normal to time out with `-XX:StressLongCountedLoop=200000000` for all test cases: A value other than `0` for this flag forcefully converts int counted loops to long counted loops, for which C2 doesn't do parallel IV replacement at this point. This is the same issue as [JDK-8294839](https://bugs.openjdk.org/browse/JDK-8294838). Loops are still loops. For a large random `stop` value, this will take a long time to loop through. 3. It is normal to time out with `-XX:PerMethodTrapLimit=0` for test cases with a stride other than `1`: Take `for (int i = 0; i < stop; i += 2)` as an example. Since there is a chance for the increment to push `i` beyond `stop` (and eventually overflow), there must be some sort of runtime check for `stop`. Normally, a `loop_limit_check` trap is compiled to take the slow path (deoptimization). However, the zero trap limit forces C2 to loop and check `i < stop` on every iteration. For a large random `stop` value, this will take a long time. For the latter two reasons, I added `runWithFlags()` to essentially disable the flags in question. https://github.com/openjdk/jdk/blob/845e34cc7a82ef5cb69620a12f487adaca9d2613/test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java#L47-L51 ------------- PR Comment: https://git.openjdk.org/jdk/pull/18489#issuecomment-2402984653 From azvegint at openjdk.org Wed Oct 9 18:24:40 2024 From: azvegint at openjdk.org (Alexander Zvegintsev) Date: Wed, 9 Oct 2024 18:24:40 GMT Subject: Integrated: 8341860: ProblemList applications/ctw/modules/java_base_2.java on linux-x64 In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 18:02:09 GMT, Daniel D.
Daugherty wrote: > A trivial fix to ProblemList applications/ctw/modules/java_base_2.java on linux-x64. Marked as reviewed by azvegint (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21430#pullrequestreview-2357847049 From dcubed at openjdk.org Wed Oct 9 18:24:40 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 9 Oct 2024 18:24:40 GMT Subject: Integrated: 8341860: ProblemList applications/ctw/modules/java_base_2.java on linux-x64 In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 18:06:25 GMT, Alexander Zvegintsev wrote: >> A trivial fix to ProblemList applications/ctw/modules/java_base_2.java on linux-x64. > > Marked as reviewed by azvegint (Reviewer). @azvegint - Thanks for the lightning fast review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21430#issuecomment-2402968716 From dcubed at openjdk.org Wed Oct 9 18:24:40 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 9 Oct 2024 18:24:40 GMT Subject: Integrated: 8341860: ProblemList applications/ctw/modules/java_base_2.java on linux-x64 In-Reply-To: References: Message-ID: <8xEj6FBjBgN_nnLs4fVRN-5v2nhuCdys87o7qQ5NInY=.5e43e62f-1728-4337-b1dc-c0a70cc9accd@github.com> On Wed, 9 Oct 2024 18:02:09 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList applications/ctw/modules/java_base_2.java on linux-x64. This pull request has now been integrated. Changeset: a45abf13 Author: Daniel D. Daugherty URL: https://git.openjdk.org/jdk/commit/a45abf131be9ee52828c5db18a18847c45ae6994 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8341860: ProblemList applications/ctw/modules/java_base_2.java on linux-x64 Reviewed-by: azvegint ------------- PR: https://git.openjdk.org/jdk/pull/21430 From dcubed at openjdk.org Wed Oct 9 18:24:40 2024 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Wed, 9 Oct 2024 18:24:40 GMT Subject: Integrated: 8341860: ProblemList applications/ctw/modules/java_base_2.java on linux-x64 Message-ID: A trivial fix to ProblemList applications/ctw/modules/java_base_2.java on linux-x64. ------------- Commit messages: - 8341860: ProblemList applications/ctw/modules/java_base_2.java on linux-x64 Changes: https://git.openjdk.org/jdk/pull/21430/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21430&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341860 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21430.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21430/head:pull/21430 PR: https://git.openjdk.org/jdk/pull/21430 From svkamath at openjdk.org Wed Oct 9 18:31:41 2024 From: svkamath at openjdk.org (Smita Kamath) Date: Wed, 9 Oct 2024 18:31:41 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v3] In-Reply-To: References: Message-ID: > 8341052: SHA-512 implementation using SHA-NI Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Addressed a review comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20633/files - new: https://git.openjdk.org/jdk/pull/20633/files/afeb5028..85c1aea9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20633&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20633&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20633.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20633/head:pull/20633 PR: https://git.openjdk.org/jdk/pull/20633 From dlong at openjdk.org Wed Oct 9 18:53:07 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 9 Oct 2024 18:53:07 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 16:33:45 GMT, Vladimir Kozlov wrote: >> @vnkozlov , I like the alternative approach 
better. I went with the current approach because I was thinking it would be simpler than the bailout, but I changed my mind after writing both out. >> >> Yes, we can change cha_monomorphic_target to nullptr instead of bailing out. But my understanding is any use of old/redefined methods will cause the compilation to be thrown out when we try to create the nmethod, so we are avoiding wasted work by bailing out early. > > @dean-long, may be I misunderstand your statement. Are you re-writing the fix or keep current? Thanks @vnkozlov. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2403058861 From jkarthikeyan at openjdk.org Wed Oct 9 19:29:18 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 9 Oct 2024 19:29:18 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v2] In-Reply-To: References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> Message-ID: On Wed, 9 Oct 2024 17:12:33 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. >> >> Please take a look and leave your reviews, >> Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > style changes I think this is a really nice cleanup, it makes working with vectors more clear. I've added some minor stylistic fixes you could do since you're already changing these lines. src/hotspot/share/opto/type.cpp line 2532: > 2530: //------------------------------meet------------------------------------------- > 2531: // Compute the MEET of two types. Since each TypeVect is the only instance of > 2532: // its species, meetting often returns itself Suggestion: // its species, meeting often returns itself. 
src/hotspot/share/opto/vectorIntrinsics.cpp line 602: > 600: } > 601: > 602: const TypeVect * vt = TypeVect::make(elem_bt, num_elem); Suggestion: const TypeVect* vt = TypeVect::make(elem_bt, num_elem); src/hotspot/share/opto/vectorIntrinsics.cpp line 624: > 622: > 623: Node * mod_val = gvn().makecon(TypeInt::make(num_elem-1)); > 624: Node * bcast_mod = gvn().transform(VectorNode::scalar2vector(mod_val, num_elem, elem_bt)); Suggestion: Node* bcast_mod = gvn().transform(VectorNode::scalar2vector(mod_val, num_elem, elem_bt)); src/hotspot/share/opto/vectorIntrinsics.cpp line 2202: > 2200: > 2201: // cast index vector from elem_bt vector to byte vector > 2202: const TypeVect * byte_vt = TypeVect::make(T_BYTE, num_elem); Suggestion: const TypeVect* byte_vt = TypeVect::make(T_BYTE, num_elem); ------------- Marked as reviewed by jkarthikeyan (Committer). PR Review: https://git.openjdk.org/jdk/pull/21414#pullrequestreview-2358047181 PR Review Comment: https://git.openjdk.org/jdk/pull/21414#discussion_r1794077620 PR Review Comment: https://git.openjdk.org/jdk/pull/21414#discussion_r1794068522 PR Review Comment: https://git.openjdk.org/jdk/pull/21414#discussion_r1794070557 PR Review Comment: https://git.openjdk.org/jdk/pull/21414#discussion_r1794069515 From dlong at openjdk.org Wed Oct 9 20:15:17 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 9 Oct 2024 20:15:17 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v3] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 14:57:37 GMT, Kim Barrett wrote: >> Please review this change to TypeRawPtr::add_offset to prevent a compiler from >> inferring things based on prior pointer arithmetic not invoking UB. As noted in >> the bug report, clang is actually doing this. >> >> To accomplish this, changed to integral arithmetic. Also added over/underflow >> checks. >> >> Also made a couple of minor touchups. 
Replaced an implicit conversion to bool >> with an explicit compare to nullptr (per style guide). Removed a no longer >> needed dummy return after a (now) noreturn function. >> >> Testing: mach5 tier1-7 >> That testing was with calls to "fatal" for the over/underflow cases and the >> sum==0 case. There were no hits. I'm not sure how to construct a test that >> would hit those. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove unreachable TypePtr::Null case Looks good. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21324#pullrequestreview-2358219138 From sviswanathan at openjdk.org Wed Oct 9 20:16:13 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 9 Oct 2024 20:16:13 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v3] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 18:31:41 GMT, Smita Kamath wrote: >> Hi, I want to submit an optimization for SHA-512 algorithm using SHA instructions (sha512msg1, sha512msg2 and sha512rnds2) . Kindly review the code and provide feedback. Thank you. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Addressed a review comment Marked as reviewed by sviswanathan (Reviewer). Looks good to me. ------------- PR Review: https://git.openjdk.org/jdk/pull/20633#pullrequestreview-2358219543 PR Comment: https://git.openjdk.org/jdk/pull/20633#issuecomment-2403346825 From vlivanov at openjdk.org Wed Oct 9 21:20:18 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 9 Oct 2024 21:20:18 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v2] In-Reply-To: References: Message-ID: On Tue, 8 Oct 2024 20:30:34 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. 
The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > simplification based on reviewer comments src/hotspot/share/ci/ciMethod.cpp line 692: > 690: > 691: // Redefinition support. > 692: if (this->get_Method()->is_old() || root_m->get_Method()->is_old()) { Is it safe to access raw `Method*` from a compiler thread which is not in VM state? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1794240981 From duke at openjdk.org Wed Oct 9 21:47:21 2024 From: duke at openjdk.org (hanklo6) Date: Wed, 9 Oct 2024 21:47:21 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v2] In-Reply-To: References: Message-ID: <8iiMADD8HDJ8-b_SluuLPn9795YmnKYiyQgw5OybGtY=.357a6919-be67-43e5-bc24-6431cf824084@github.com> > Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. 
These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > ### Generate test instructions > With `binutils = 2.43` > * `python3 x86-asmtest.py > asmtest.out.h` > ### Run test > * `make test TEST="gtest:AssemblerX86"` hanklo6 has updated the pull request incrementally with one additional commit since the last revision: Add copyright header ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20857/files - new: https://git.openjdk.org/jdk/pull/20857/files/2f258ba9..766582d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20857&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20857&range=00-01 Stats: 44 lines in 2 files changed: 44 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20857.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20857/head:pull/20857 PR: https://git.openjdk.org/jdk/pull/20857 From sviswanathan at openjdk.org Wed Oct 9 21:47:35 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 9 Oct 2024 21:47:35 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v2] In-Reply-To: <8iiMADD8HDJ8-b_SluuLPn9795YmnKYiyQgw5OybGtY=.357a6919-be67-43e5-bc24-6431cf824084@github.com> References: <8iiMADD8HDJ8-b_SluuLPn9795YmnKYiyQgw5OybGtY=.357a6919-be67-43e5-bc24-6431cf824084@github.com> Message-ID: On Wed, 9 Oct 2024 21:47:21 GMT, hanklo6 wrote: >> Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. 
For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. >> >> ### Generate test instructions >> With `binutils = 2.43` >> * `python3 x86-asmtest.py > asmtest.out.h` >> ### Run test >> * `make test TEST="gtest:AssemblerX86"` > > hanklo6 has updated the pull request incrementally with one additional commit since the last revision: > > Add copyright header Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20857#pullrequestreview-2358391276 From sviswanathan at openjdk.org Wed Oct 9 21:57:12 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 9 Oct 2024 21:57:12 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v2] In-Reply-To: <8iiMADD8HDJ8-b_SluuLPn9795YmnKYiyQgw5OybGtY=.357a6919-be67-43e5-bc24-6431cf824084@github.com> References: <8iiMADD8HDJ8-b_SluuLPn9795YmnKYiyQgw5OybGtY=.357a6919-be67-43e5-bc24-6431cf824084@github.com> Message-ID: On Wed, 9 Oct 2024 21:47:21 GMT, hanklo6 wrote: >> Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. 
These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. >> >> ### Generate test instructions >> With `binutils = 2.43` >> * `python3 x86-asmtest.py > asmtest.out.h` >> ### Run test >> * `make test TEST="gtest:AssemblerX86"` > > hanklo6 has updated the pull request incrementally with one additional commit since the last revision: > > Add copyright header @vnkozlov We look forward to your inputs on this encoding test PR. It takes care of the testing action item that came up during the review of APX instruction encoding PR (https://github.com/openjdk/jdk/pull/18476). ------------- PR Comment: https://git.openjdk.org/jdk/pull/20857#issuecomment-2403497773 From kvn at openjdk.org Wed Oct 9 22:18:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 9 Oct 2024 22:18:13 GMT Subject: RFR: 8339507: Test generation tool and gtest for testing APX encoding of extended gpr instructions [v2] In-Reply-To: <8iiMADD8HDJ8-b_SluuLPn9795YmnKYiyQgw5OybGtY=.357a6919-be67-43e5-bc24-6431cf824084@github.com> References: <8iiMADD8HDJ8-b_SluuLPn9795YmnKYiyQgw5OybGtY=.357a6919-be67-43e5-bc24-6431cf824084@github.com> Message-ID: On Wed, 9 Oct 2024 21:47:21 GMT, hanklo6 wrote: >> Add test generation tool and gtest for testing APX encoding of instructions with extended general-purpose registers. >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
>> >> ### Generate test instructions >> With `binutils = 2.43` >> * `python3 x86-asmtest.py > asmtest.out.h` >> ### Run test >> * `make test TEST="gtest:AssemblerX86"` > > hanklo6 has updated the pull request incrementally with one additional commit since the last revision: > > Add copyright header Is this test for both 32- and 64-bits instructions/VMs? How complete the set of instructions covered by the test? test/hotspot/gtest/x86/test_assemblerx86.cpp line 26: > 24: #include "precompiled.hpp" > 25: > 26: #if defined(X86) You may add ` && !defined(ZERO)` similar to `test_assembler_aarch64.cpp` test. test/hotspot/gtest/x86/test_assemblerx86.cpp line 93: > 91: address entry = __ pc(); > 92: > 93: // python x86-asmtest.py | expand > asmtest.out.h The PR description shows different instructions to build: With binutils = 2.43 python3 x86-asmtest.py > asmtest.out.h I would like to have comment with correct and detailed instructions how to build `asmtest.out.h` ------------- PR Review: https://git.openjdk.org/jdk/pull/20857#pullrequestreview-2358422678 PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1794304617 PR Review Comment: https://git.openjdk.org/jdk/pull/20857#discussion_r1794301774 From qamai at openjdk.org Thu Oct 10 01:09:06 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 10 Oct 2024 01:09:06 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v3] In-Reply-To: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> Message-ID: <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> > Hi, > > This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. 
> > Please take a look and leave your reviews, > Thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: more style changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21414/files - new: https://git.openjdk.org/jdk/pull/21414/files/90f11d40..a99a7434 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21414&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21414&range=01-02 Stats: 11 lines in 3 files changed: 0 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/21414.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21414/head:pull/21414 PR: https://git.openjdk.org/jdk/pull/21414 From qamai at openjdk.org Thu Oct 10 01:10:18 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 10 Oct 2024 01:10:18 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v2] In-Reply-To: References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> Message-ID: On Wed, 9 Oct 2024 19:26:50 GMT, Jasmine Karthikeyan wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> style changes > > I think this is a really nice cleanup, it makes working with vectors more clear. I've added some minor stylistic fixes you could do since you're already changing these lines. @jaskarth Nice suggestions, I have reviewed the patch and done similar changes to nearby lines. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21414#issuecomment-2403696783 From jkarthikeyan at openjdk.org Thu Oct 10 03:04:47 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 10 Oct 2024 03:04:47 GMT Subject: RFR: 8341781: Improve Min/Max node identities Message-ID: Hi all, This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. 
In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:

                                                     Baseline                   Patch
Benchmark                                 Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)

I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!
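
The identities named in this description can be checked at the Java level with a small sketch. This is only an illustration of the algebra, not the C2 implementation; the class and method names (`MinMaxIdentities`, `maxMax`, `maxMin`, `minMax`, `maxMinD`, `holdsForAllSamples`) are hypothetical and do not appear in the patch. It also shows why the absorption rule `Max(A, Min(A, B)) == A` is restricted to non-floating-point values: with `B == NaN` the result is `NaN`, not `A`.

```java
// Illustrative only: a plain-Java analogue of the Min/Max identities, not C2 code.
final class MinMaxIdentities {
    // Max(A, Max(A, B)) should equal Max(A, B).
    static long maxMax(long a, long b) { return Math.max(a, Math.max(a, b)); }

    // Absorption: Max(A, Min(A, B)) should equal A for integral values.
    static long maxMin(long a, long b) { return Math.max(a, Math.min(a, b)); }

    // Dual absorption: Min(A, Max(A, B)) should equal A for integral values.
    static long minMax(long a, long b) { return Math.min(a, Math.max(a, b)); }

    // Same shape as maxMin, but on doubles, where NaN breaks the identity.
    static double maxMinD(double a, double b) { return Math.max(a, Math.min(a, b)); }

    // Checks the three integral identities over a small set of sample values.
    static boolean holdsForAllSamples() {
        long[] samples = {Long.MIN_VALUE, -7L, 0L, 3L, Long.MAX_VALUE};
        for (long a : samples) {
            for (long b : samples) {
                if (maxMax(a, b) != Math.max(a, b)) return false;
                if (maxMin(a, b) != a) return false;
                if (minMax(a, b) != a) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("integer identities hold: " + holdsForAllSamples());
        // Math.min(1.0, NaN) is NaN, and Math.max(1.0, NaN) is NaN as well.
        System.out.println("max(1.0, min(1.0, NaN)) = " + maxMinD(1.0, Double.NaN));
    }
}
```

The sample set is deliberately small and includes both extremes of `long`; it is a sanity check of the identities, not a proof.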
------------- Commit messages: - Min/Max identities Changes: https://git.openjdk.org/jdk/pull/21439/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21439&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341781 Stats: 293 lines in 5 files changed: 287 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21439.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21439/head:pull/21439 PR: https://git.openjdk.org/jdk/pull/21439 From jkarthikeyan at openjdk.org Thu Oct 10 03:06:17 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 10 Oct 2024 03:06:17 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v3] In-Reply-To: <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> Message-ID: On Thu, 10 Oct 2024 01:09:06 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. >> >> Please take a look and leave your reviews, >> Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > more style changes Thanks, looks good to me! ------------- Marked as reviewed by jkarthikeyan (Committer). PR Review: https://git.openjdk.org/jdk/pull/21414#pullrequestreview-2358765443 From liach at openjdk.org Thu Oct 10 05:16:14 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 10 Oct 2024 05:16:14 GMT Subject: RFR: 8341781: Improve Min/Max node identities In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 02:59:14 GMT, Jasmine Karthikeyan wrote: > Hi all, > This patch implements some missing identities for Min/Max nodes. 
It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>
>                                                      Baseline                   Patch
> Benchmark                                 Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)
>
> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!

src/hotspot/share/opto/addnode.hpp line 270:

> 268:   virtual int Opcode() const = 0;
> 269:   virtual int max_opcode() const = 0;
> 270:   virtual int min_opcode() const = 0;

The old comment above

// all the behavior of addition on a ring. Only new thing is that we allow
// 2 equal inputs to be equal.

seems outdated.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1794661997

From chagedorn at openjdk.org Thu Oct 10 06:11:12 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 10 Oct 2024 06:11:12 GMT
Subject: RFR: 8341781: Improve Min/Max node identities
In-Reply-To:
References:
Message-ID:

On Thu, 10 Oct 2024 02:59:14 GMT, Jasmine Karthikeyan wrote:

> Hi all,
> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>
>                                                      Baseline                   Patch
> Benchmark                                 Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)
>
> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!

Few comments, otherwise, looks good to me.
src/hotspot/share/opto/addnode.cpp line 1478: > 1476: } > 1477: > 1478: // If the operations are different return the operand, as Max(A, Min(A, B)) == A if the value isn't a floating point value, Suggestion: // If the operations are different return the operand 'A', as Max(A, Min(A, B)) == A if the value isn't a floating point value, src/hotspot/share/opto/addnode.cpp line 1479: > 1477: > 1478: // If the operations are different return the operand, as Max(A, Min(A, B)) == A if the value isn't a floating point value, > 1479: // as if B == NaN the identity doesn't hold. Reads as "as if". Maybe rephrase to Suggestion: // For floating points, the identity does not hold if B == NaN. ? src/hotspot/share/opto/addnode.cpp line 1485: > 1483: } > 1484: > 1485: return nullptr; I guess you can remove this since we return nullptr below anyway. Suggestion: test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdentities.java line 116: > 114: > 115: @Test > 116: @IR(applyIfPlatform = { "riscv64", "false" }, phase = { CompilePhase.BEFORE_MACRO_EXPANSION }, counts = { IRNode.MIN_L, "1" }) Can you add a comment here why we cannot apply the rules for riscv? test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdentities.java line 122: > 120: > 121: @Test > 122: @IR(applyIfPlatform = { "riscv64", "false" }, failOn = { IRNode.MIN_L, IRNode.MAX_L }) Since `MinL/MaxL` are expanded in macro expansion, this rule will also succeed even if the optimization is not applied. I suggest to also add `phase = CompilePhase.BEFORE_MACRO_EXPANSION`. Same below. test/hotspot/jtreg/compiler/vectorization/runner/BasicShortOpTest.java line 220: > 218: short[] res = new short[SIZE]; > 219: for (int i = 0; i < SIZE; i++) { > 220: res[i] = (short) Math.min(a[i], b[i]); I guess without this change, this collapses to a constant which enables vectorization which was not expected before? 
-------------

PR Review: https://git.openjdk.org/jdk/pull/21439#pullrequestreview-2359045864
PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1794700959
PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1794705393
PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1794707261
PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1794711053
PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1794714816
PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1794720021

From dnsimon at openjdk.org Thu Oct 10 07:42:12 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Thu, 10 Oct 2024 07:42:12 GMT
Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v5]
In-Reply-To: <4X5xzYDU45UnuXtoC0sEJ7dF5seNqQDJxhdNrdktsV8=.03a8e2b5-7e43-44b7-92b4-b4440dc26770@github.com>
References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> <4X5xzYDU45UnuXtoC0sEJ7dF5seNqQDJxhdNrdktsV8=.03a8e2b5-7e43-44b7-92b4-b4440dc26770@github.com>
Message-ID:

On Fri, 4 Oct 2024 16:34:54 GMT, Tomáš Zezula wrote:

>> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode.
>>
>> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java).
>>
>> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic.
>
> Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision:
>
>   Simplified C2V_BLOCK.

Looks good to me.

src/hotspot/share/compiler/compilerThread.cpp line 58:

> 56:
> 57: void CompilerThread::set_compiler(AbstractCompiler* c) {
> 58:   /*

The comment could be a little shorter:

/*
 * Compiler threads need to make Java upcalls to the jargraal compiler.
 * Java upcalls are also needed by the InterpreterRuntime when using jargraal.
 */

-------------

Marked as reviewed by dnsimon (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21285#pullrequestreview-2359296330
PR Review Comment: https://git.openjdk.org/jdk/pull/21285#discussion_r1794843319

From rcastanedalo at openjdk.org Thu Oct 10 08:37:16 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 10 Oct 2024 08:37:16 GMT
Subject: Integrated: 8341619: C2: remove unused StoreCM node
In-Reply-To:
References:
Message-ID:

On Mon, 7 Oct 2024 11:28:19 GMT, Roberto Castañeda Lozano wrote:

> This cleanup removes C2's `StoreCM` (store card mark) node and all its special handling code. This node used to model card mark stores in early-expanded G1 post-barriers, and is no longer needed after [JEP 475](https://openjdk.org/jeps/475).
>
> __Testing:__ tier1-5 (linux-x64, linux-aarch64, windows-x64, macosx-x64, and macosx-aarch64; release and debug mode).

This pull request has now been integrated.

Changeset: 16042556
Author: Roberto Castañeda Lozano
URL: https://git.openjdk.org/jdk/commit/16042556f394adfa93e54173944198397ad29dea
Stats: 388 lines in 23 files changed: 0 ins; 376 del; 12 mod

8341619: C2: remove unused StoreCM node

Reviewed-by: chagedorn, thartmann, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/21385

From chagedorn at openjdk.org Thu Oct 10 09:06:13 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 10 Oct 2024 09:06:13 GMT
Subject: RFR: 8341781: Improve Min/Max node identities
In-Reply-To:
References:
Message-ID:

On Thu, 10 Oct 2024 02:59:14 GMT, Jasmine Karthikeyan wrote:

> Hi all,
> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>
>                                                      Baseline                   Patch
> Benchmark                                 Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)
>
> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!

Your new test fails on Linux with `-XX:UseAVX=0`:

One or more @IR rules failed:

Failed IR Rules (8) of Methods (8)
----------------------------------
1) Method "public double compiler.c2.irTests.TestMinMaxIdentities.doubleMaxMax(double,double)" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MAX_D#_", "1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(MaxD.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 = 1 [given]
           - No nodes matched!

2) Method "public double compiler.c2.irTests.TestMinMaxIdentities.doubleMaxMin(double,double)" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_D#_", "1", "_#MAX_D#_", "1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(MinD.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 = 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(MaxD.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 = 1 [given]
           - No nodes matched!
3) Method "public double compiler.c2.irTests.TestMinMaxIdentities.doubleMinMax(double,double)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_D#_", "1", "_#MAX_D#_", "1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MinD.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(MaxD.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 4) Method "public double compiler.c2.irTests.TestMinMaxIdentities.doubleMinMin(double,double)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_D#_", "1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MinD.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 5) Method "public float compiler.c2.irTests.TestMinMaxIdentities.floatMaxMax(float,float)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MAX_F#_", "1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MaxF.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 
6) Method "public float compiler.c2.irTests.TestMinMaxIdentities.floatMaxMin(float,float)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_F#_", "1", "_#MAX_F#_", "1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MinF.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(MaxF.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 7) Method "public float compiler.c2.irTests.TestMinMaxIdentities.floatMinMax(float,float)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_F#_", "1", "_#MAX_F#_", "1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MinF.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(MaxF.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 
8) Method "public float compiler.c2.irTests.TestMinMaxIdentities.floatMinMin(float,float)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_F#_", "1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MinF.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! Looks like we do not emit `Min/MaxF/D` nodes with `UseAVX=0`. I quickly checked the code and indeed, the intrinsics are only enabled if `UseAVX >= 1`: https://github.com/openjdk/jdk/blob/16042556f394adfa93e54173944198397ad29dea/src/hotspot/cpu/x86/x86.ad#L1542-L1549 You can probably just update your tests to exclude IR matching for this setup. Maybe you also want to double check the other architectures. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21439#issuecomment-2404520133 From chagedorn at openjdk.org Thu Oct 10 09:14:39 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 10 Oct 2024 09:14:39 GMT Subject: RFR: 8341328: Refactor initial Assertion Predicate creation into separate classes Message-ID: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> This PR refactors the initial Assertion Predicate creation (i.e. when initially creating them, not when copy/copy-updating them from existing Template Assertion Predicates). The patch includes the following changes: - `PhaseIdealLoop::add_template_assertion_predicate()`, `add_range_check_elimination_assertion_predicate()`and the preparation code to call it, and `clone_template_assertion_predicate()` have similar code. 
I tried to share the common bits with new classes: - `TemplateAssertionPredicateCreator`: Creates a new Template Assertion Predicate either with an UCT (done in Loop Predication) or a Halt node (done in Range Check Elimination). - `InitializedAssertionPredicateCreator`: Creates a new Initialized Assertion Predicate with a Halt Node. This is an existing class which provided a method to clone a Template Assertion Predicate expression and create a new Initialized Assertion Predicate with it. Now it's extended to create one without an existing template. - `AssertionPredicateIfCreator`: Used by both classes above and also by `clone_template_assertion_predicate()` (it clones the Assertion Predicate expression first and then just needs to create the `If`) - `AssertionPredicateExpressionCreator`: Create a new Assertion Predicate expression, either with an `Opaque4` (for Template Assertion Predicates) or an `OpaqueInitializedAssertionPredicate` (for Initialized Assertion Predicates). - Some renaming to get more consistency (e.g. use `new_control` instead of `control` or `new_ctrl`) - Adding new `AssertionPredicateType::FinalIv` which was missed to account for in [JDK-8335393](https://bugs.openjdk.org/browse/JDK-8335393) where a new Initialized Assertion Predicate was added for the final IV in Range Check Elimination for a special case when removing an empty main loop. 
Thanks, Christian ------------- Commit messages: - 8341328: Refactor initial Assertion Predicate creation into separate classes Changes: https://git.openjdk.org/jdk/pull/21446/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21446&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341328 Stats: 529 lines in 6 files changed: 302 ins; 118 del; 109 mod Patch: https://git.openjdk.org/jdk/pull/21446.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21446/head:pull/21446 PR: https://git.openjdk.org/jdk/pull/21446 From chagedorn at openjdk.org Thu Oct 10 09:14:39 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 10 Oct 2024 09:14:39 GMT Subject: RFR: 8341328: Refactor initial Assertion Predicate creation into separate classes In-Reply-To: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> References: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> Message-ID: On Thu, 10 Oct 2024 09:04:21 GMT, Christian Hagedorn wrote: > This PR refactors the initial Assertion Predicate creation (i.e. when initially creating them, not when copy/copy-updating them from existing Template Assertion Predicates). > > The patch includes the following changes: > - `PhaseIdealLoop::add_template_assertion_predicate()`, `add_range_check_elimination_assertion_predicate()`and the preparation code to call it, and `clone_template_assertion_predicate()` have similar code. I tried to share the common bits with new classes: > - `TemplateAssertionPredicateCreator`: Creates a new Template Assertion Predicate either with an UCT (done in Loop Predication) or a Halt node (done in Range Check Elimination). > - `InitializedAssertionPredicateCreator`: Creates a new Initialized Assertion Predicate with a Halt Node. This is an existing class which provided a method to clone a Template Assertion Predicate expression and create a new Initialized Assertion Predicate with it. 
Now it's extended to create one without an existing template. > - `AssertionPredicateIfCreator`: Used by both classes above and also by `clone_template_assertion_predicate()` (it clones the Assertion Predicate expression first and then just needs to create the `If`) > - `AssertionPredicateExpressionCreator`: Create a new Assertion Predicate expression, either with an `Opaque4` (for Template Assertion Predicates) or an `OpaqueInitializedAssertionPredicate` (for Initialized Assertion Predicates). > - Some renaming to get more consistency (e.g. use `new_control` instead of `control` or `new_ctrl`) > - Adding new `AssertionPredicateType::FinalIv` which was missed to account for in [JDK-8335393](https://bugs.openjdk.org/browse/JDK-8335393) where a new Initialized Assertion Predicate was added for the final IV in Range Check Elimination for a special case when removing an empty main loop. > > Thanks, > Christian src/hotspot/share/opto/loopPredicate.cpp line 1277: > 1275: IfTrueNode* template_assertion_predicate_proj = > 1276: create_template_assertion_predicate(if_opcode, cl, parse_predicate_proj, upper_bound_proj, scale, offset, range, > 1277: deopt_reason); We only use the opcode from the `iff`. `init`, `limit` and `stride` can be fetched from the `CountedLoop` again. src/hotspot/share/opto/loopTransform.cpp line 3088: > 3086: set_ctrl(iffm->in(1), new_limit_ctrl); > 3087: > 3088: C->print_method(PHASE_AFTER_RANGE_CHECK_ELIMINATION, 4, cl); Moved this down because we missed some transformations when having this earlier. Additionally, if there are multiple range checks, we can see the intermediate state for one transformation with the next `PHASE_BEFORE_RANGE_CHECK_ELIMINATION` dump. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21446#discussion_r1795020752 PR Review Comment: https://git.openjdk.org/jdk/pull/21446#discussion_r1795027412 From jbhateja at openjdk.org Thu Oct 10 12:22:20 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 10 Oct 2024 12:22:20 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v3] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 18:31:41 GMT, Smita Kamath wrote: >> Hi, I want to submit an optimization for SHA-512 algorithm using SHA instructions (sha512msg1, sha512msg2 and sha512rnds2) . Kindly review the code and provide feedback. Thank you. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Addressed a review comment src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp line 1522: > 1520: } > 1521: > 1522: void MacroAssembler::sha512_update_ni_x1(Register arg_hash, Register arg_msg, Register ofs, Register limit, bool multi_block) { Please add a comment on this mentioning the source of algorithm. https://github.com/intel/intel-ipsec-mb/blob/main/lib/avx2_t4/sha512_x1_ni_avx2.asm src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp line 1602: > 1600: vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23] > 1601: vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17] > 1602: vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17] [Algorithm](https://github.com/intel/intel-ipsec-mb/blob/main/lib/avx2_t4/sha512_x1_ni_avx2.asm) is specifically crafted for 256 bit vectors and with 512 bit extension we modify it. Do you think we should factor out following pattern and add an alternative implementation for it ? 
```
vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23]
vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17]
vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17]
```
This fixed pattern appears 4 times within the computation loop and once outside the loop. We are permuting two vectors with constant permutation masks and blending them using an immediate mask. This is a very valid use case for the two-table permutation instruction VPERMI2Q (available for AVX512VL targets). We can store the permutation pattern in a vector outside the loop and then re-use it within the loop. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 1587: > 1585: __ sha512_AVX2(msg, state0, state1, msgtmp0, msgtmp1, msgtmp2, msgtmp3, msgtmp4, > 1586: buf, state, ofs, limit, rsp, multi_block, shuf_mask); > 1587: } Suggestion:
    const XMMRegister msg = xmm0;
    const XMMRegister state0 = xmm1;
    const XMMRegister state1 = xmm2;
    const XMMRegister msgtmp0 = xmm3;
    const XMMRegister msgtmp1 = xmm4;
    const XMMRegister msgtmp2 = xmm5;
    const XMMRegister msgtmp3 = xmm6;
    const XMMRegister msgtmp4 = xmm7;
    const XMMRegister shuf_mask = xmm8;
    __ sha512_AVX2(msg, state0, state1, msgtmp0, msgtmp1, msgtmp2, msgtmp3, msgtmp4,
                   buf, state, ofs, limit, rsp, multi_block, shuf_mask);
  }
src/hotspot/cpu/x86/stubRoutines_x86.cpp line 446: > 444: 0x5fcb6fab3ad6faecULL, 0x6c44198c4a475817ULL, > 445: }; > 446: Remove this newline.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20633#discussion_r1795316551 PR Review Comment: https://git.openjdk.org/jdk/pull/20633#discussion_r1795279620 PR Review Comment: https://git.openjdk.org/jdk/pull/20633#discussion_r1785638858 PR Review Comment: https://git.openjdk.org/jdk/pull/20633#discussion_r1785638760 From iveresov at openjdk.org Thu Oct 10 15:33:38 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 10 Oct 2024 15:33:38 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" Message-ID: `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. ------------- Commit messages: - 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" Changes: https://git.openjdk.org/jdk/pull/21455/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21455&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341831 Stats: 9 lines in 1 file changed: 8 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21455/head:pull/21455 PR: https://git.openjdk.org/jdk/pull/21455 From iveresov at openjdk.org Thu Oct 10 15:37:22 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 10 Oct 2024 15:37:22 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v2] In-Reply-To: References: Message-ID: > `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - Remove the test from the problem list - 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" Summary: Relax assert to deal with CacheWB nodes ------------- Changes: https://git.openjdk.org/jdk/pull/21455/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21455&range=01 Stats: 11 lines in 2 files changed: 8 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21455/head:pull/21455 PR: https://git.openjdk.org/jdk/pull/21455 From jbhateja at openjdk.org Thu Oct 10 16:27:25 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 10 Oct 2024 16:27:25 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v23] In-Reply-To: <0LkBvvNPq5jmWfOdjItIXGedRDtpiivJM06BAx7vP0I=.c5417544-edef-4623-beaa-08cd7c565361@github.com> References: <0LkBvvNPq5jmWfOdjItIXGedRDtpiivJM06BAx7vP0I=.c5417544-edef-4623-beaa-08cd7c565361@github.com> Message-ID: On Tue, 8 Oct 2024 19:25:24 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators.
>>
>> . SUADD : Saturating unsigned addition.
>> . SADD : Saturating signed addition.
>> . SUSUB : Saturating unsigned subtraction.
>> . SSUB : Saturating signed subtraction.
>> . UMAX : Unsigned max.
>> . UMIN : Unsigned min.
>>
>> New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e.
the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: > > - Merge branch 'JDK-8338201' of http://github.com/jatin-bhateja/jdk into JDK-8338201 > - Update VectorMath.java > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338201 > - Typographical error fixups > - Doc fixups > - Typographic error > - Merge stashing and re-commit > - Tuning extra spaces. > - Tests for newly added VectorMath.* operations > - Test cleanups. > - ... and 16 more: https://git.openjdk.org/jdk/compare/7312eea3...ce76c3e5 Hi @vnkozlov , Can you kindly run this through your test infrastructure. We have two review approvals for Java and x86 backend code. 
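For readers following the thread, the scalar semantics of the six operators listed in the quoted PR description can be sketched in plain Java. This is only an illustrative sketch under stated assumptions: the class name `SaturatingSketch` and all method names below are hypothetical and are not the API added by the PR; only `Integer.compareUnsigned` and the `Integer` constants are existing JDK API.

```java
// Hypothetical scalar model of SADD/SSUB/SUADD/SUSUB/UMAX/UMIN on 32-bit ints.
// Names are illustrative assumptions, not the methods introduced by the PR.
final class SaturatingSketch {
    // SADD: signed addition capped at Integer.MIN_VALUE / Integer.MAX_VALUE.
    static int saturatingAdd(int a, int b) {
        long r = (long) a + (long) b; // widen so the exact sum is representable
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, r));
    }

    // SSUB: signed subtraction capped at the same bounds.
    static int saturatingSub(int a, int b) {
        long r = (long) a - (long) b;
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, r));
    }

    // SUADD: unsigned addition capped at 0xFFFFFFFF (bit pattern of -1).
    static int saturatingUnsignedAdd(int a, int b) {
        int r = a + b;
        // The wrapped sum is unsigned-smaller than an operand iff it overflowed.
        return Integer.compareUnsigned(r, a) < 0 ? -1 : r;
    }

    // SUSUB: unsigned subtraction capped at 0.
    static int saturatingUnsignedSub(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    // UMAX / UMIN: compare the operands as unsigned 32-bit values.
    static int unsignedMax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    static int unsignedMin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(saturatingAdd(Integer.MAX_VALUE, 1));
        System.out.println(Integer.toUnsignedString(saturatingUnsignedAdd(-1, 1)));
        System.out.println(saturatingUnsignedSub(1, 2));
    }
}
```

Widening to `long` keeps the signed cases obviously correct at the cost of speed; a vectorized backend would presumably use dedicated saturating instructions instead, which is outside the scope of this sketch.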
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2405554905 From qamai at openjdk.org Thu Oct 10 16:52:17 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 10 Oct 2024 16:52:17 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* In-Reply-To: <0GutJI2Gd9s2w3wK9R4UJBYhbyrYZIakEGEaz34jvZw=.95efd71c-d350-4a34-bfda-4cbf06599c1e@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> <0GutJI2Gd9s2w3wK9R4UJBYhbyrYZIakEGEaz34jvZw=.95efd71c-d350-4a34-bfda-4cbf06599c1e@github.com> Message-ID: <5vIqh2WNKarobKtio8JND9Yf81Mt67pTJ9YlTj59bFE=.069563d4-f269-4df8-9232-2cf1862147ff@github.com> On Wed, 9 Oct 2024 16:55:55 GMT, Vladimir Kozlov wrote: >> Hi, >> >> This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. >> >> Please take a look and leave your reviews, >> Thanks a lot. > > My testing passed. @vnkozlov Could you re-review this, please, it seems required. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21414#issuecomment-2405603659 From kvn at openjdk.org Thu Oct 10 17:00:17 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Oct 2024 17:00:17 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v2] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 15:37:22 GMT, Igor Veresov wrote: >> `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. > > Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: > > - Remove the test from the problem list > - 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" > Summary: Relax assert to deal with CacheWB nodes Please add comment into code explaining change. ------------- PR Review: https://git.openjdk.org/jdk/pull/21455#pullrequestreview-2360944637 From kvn at openjdk.org Thu Oct 10 17:05:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Oct 2024 17:05:13 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v3] In-Reply-To: <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> Message-ID: On Thu, 10 Oct 2024 01:09:06 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. >> >> Please take a look and leave your reviews, >> Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > more style changes Update is good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21414#pullrequestreview-2360953623 From iveresov at openjdk.org Thu Oct 10 17:26:25 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 10 Oct 2024 17:26:25 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v3] In-Reply-To: References: Message-ID: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> > `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. 
I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21455/files - new: https://git.openjdk.org/jdk/pull/21455/files/b8e0d8bd..ae69ee4b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21455&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21455&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21455/head:pull/21455 PR: https://git.openjdk.org/jdk/pull/21455 From iveresov at openjdk.org Thu Oct 10 17:26:25 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 10 Oct 2024 17:26:25 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v2] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 16:57:24 GMT, Vladimir Kozlov wrote: > Please add comment into code explaining change. Done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21455#issuecomment-2405665750 From svkamath at openjdk.org Thu Oct 10 18:52:30 2024 From: svkamath at openjdk.org (Smita Kamath) Date: Thu, 10 Oct 2024 18:52:30 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v4] In-Reply-To: References: Message-ID: > Hi, I want to submit an optimization for SHA-512 algorithm using SHA instructions (sha512msg1, sha512msg2 and sha512rnds2) . Kindly review the code and provide feedback. Thank you. 
Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Updated code as per review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20633/files - new: https://git.openjdk.org/jdk/pull/20633/files/85c1aea9..3cb9175a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20633&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20633&range=02-03 Stats: 13 lines in 2 files changed: 1 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/20633.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20633/head:pull/20633 PR: https://git.openjdk.org/jdk/pull/20633 From svkamath at openjdk.org Thu Oct 10 18:52:31 2024 From: svkamath at openjdk.org (Smita Kamath) Date: Thu, 10 Oct 2024 18:52:31 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v3] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 11:52:36 GMT, Jatin Bhateja wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed a review comment > > src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp line 1602: > >> 1600: vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23] >> 1601: vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17] >> 1602: vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17] > > I assume [Algorithm](https://github.com/intel/intel-ipsec-mb/blob/main/lib/avx2_t4/sha512_x1_ni_avx2.asm) is specifically crafted for 256 bit vectors and with 512 bit extension we modify it. Do you think we should factor out following pattern and add an alternative implementation for it ? 
>
> ```
> vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23]
> vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17]
> vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17]
> ```
>
> This is a fixed pattern seen 4 times within computation loop and once outside the loop.
> We are permuting two vectors with constant permutation mask and blending them using immediate mask.
> This is a very valid use case for two table permutation instruction VPERMI2Q (available for AVX512VL targets)
> We can store permutation pattern outside the loop into a vector and then re-use it within the loop.
We can do this change in a separate PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20633#discussion_r1795938470 From kvn at openjdk.org Thu Oct 10 19:21:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Oct 2024 19:21:12 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v3] In-Reply-To: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> References: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> Message-ID: On Thu, 10 Oct 2024 17:26:25 GMT, Igor Veresov wrote: >> `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Add comment LoadStore nodes should have the same issue. Why are they not affected?
------------- PR Comment: https://git.openjdk.org/jdk/pull/21455#issuecomment-2405863892 From iveresov at openjdk.org Thu Oct 10 21:04:11 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 10 Oct 2024 21:04:11 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v3] In-Reply-To: References: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> Message-ID: On Thu, 10 Oct 2024 19:18:51 GMT, Vladimir Kozlov wrote: > LoadStore nodes should have the same issue. Why they are not affected? Because LoadStore is an official store. It consumes a memory state and produces memory state. CacheWB is not really a store, that is it doesn't produce memory effects from the perspective of the backend (its match rule is not a Set). It's hard to tell what's the best way to model it, so I just decided not to mess with its semantics right now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21455#issuecomment-2406030612 From kvn at openjdk.org Thu Oct 10 21:21:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Oct 2024 21:21:11 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v3] In-Reply-To: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> References: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> Message-ID: On Thu, 10 Oct 2024 17:26:25 GMT, Igor Veresov wrote: >> `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Add comment Marked as reviewed by kvn (Reviewer). 
Okay ------------- PR Review: https://git.openjdk.org/jdk/pull/21455#pullrequestreview-2361441227 PR Comment: https://git.openjdk.org/jdk/pull/21455#issuecomment-2406055260 From dlong at openjdk.org Thu Oct 10 21:46:14 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 10 Oct 2024 21:46:14 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v2] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 21:17:46 GMT, Vladimir Ivanov wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> simplification based on reviewer comments > > src/hotspot/share/ci/ciMethod.cpp line 692: > >> 690: >> 691: // Redefinition support. >> 692: if (this->get_Method()->is_old() || root_m->get_Method()->is_old()) { > > Is it safe to access raw `Method*` from a compiler thread which is not in VM state? No, probably not. I'll fix it. I was assuming the whole function was in the VM state. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1796131749 From dlong at openjdk.org Thu Oct 10 22:40:44 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 10 Oct 2024 22:40:44 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). > > An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. 
Dean Long has updated the pull request incrementally with one additional commit since the last revision: make sure to be in VM state when checking is_old ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21148/files - new: https://git.openjdk.org/jdk/pull/21148/files/0705b33e..80c9ae67 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=01-02 Stats: 16 lines in 2 files changed: 10 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21148/head:pull/21148 PR: https://git.openjdk.org/jdk/pull/21148 From vlivanov at openjdk.org Thu Oct 10 22:56:12 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 10 Oct 2024 22:56:12 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 22:40:44 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > make sure to be in VM state when checking is_old src/hotspot/share/ci/ciMethod.cpp line 695: > 693: // Redefinition support. > 694: if (this->is_old() || root_m->is_old()) { > 695: return nullptr; IMO you can safely drop this particular check. 
The one after `Dependencies::find_unique_concrete_method()` should be enough to preserve the invariant (`target == cha_monomorphic_target`). But thinking more about it, now I'm curious what happens when an old method is actually encountered. The fix conservatively rejects a possible inlining opportunity, but it seems it doesn't invalidate the resulting nmethod anymore. So, no recompilation attempt follows to recuperate that. We could either record an evol dependency on a stale `Method` (to fail during nmethod installation step) or fail-fast the compilation (probably, implies additional checks to propagate the failure status). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1796190689 From duke at openjdk.org Thu Oct 10 23:04:10 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 10 Oct 2024 23:04:10 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v3] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 18:31:41 GMT, Smita Kamath wrote: >> Hi, I want to submit an optimization for SHA-512 algorithm using SHA instructions (sha512msg1, sha512msg2 and sha512rnds2). Kindly review the code and provide feedback. Thank you. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Addressed a review comment This implementation looks good to me. I went through the implementation of `sha512_update_ni_x1`. Looked at it line by line and compared it to the ipsec [implementation](https://github.com/intel/intel-ipsec-mb/blob/main/lib/avx2_t4/sha512_x1_ni_avx2.asm). Thanks, Srinivas Vamsi Parasa (Intel) ------------- Marked as reviewed by vamsi-parasa at github.com (no known OpenJDK username).
PR Review: https://git.openjdk.org/jdk/pull/20633#pullrequestreview-2361568171 From dlong at openjdk.org Thu Oct 10 23:08:13 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 10 Oct 2024 23:08:13 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 22:53:03 GMT, Vladimir Ivanov wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> make sure to be in VM state when checking is_old > > src/hotspot/share/ci/ciMethod.cpp line 695: > >> 693: // Redefinition support. >> 694: if (this->is_old() || root_m->is_old()) { >> 695: return nullptr; > > IMO you can safely drop this particular check. The one after `Dependencies::find_unique_concrete_method()` should be enough to preserve the invariant (`target == cha_monomorphic_target`) . > > But thinking more about it, now I'm curious what happens when an old method is actually encountered. The fix conservatively rejects possible inlining opportunity, but it seems it doesn't invalidate resulting nmethod anymore. So, no recompilation attempt follows to recuperate that. > > We could either record a evol dependency on a stale `Method` (to fail during nmethod installation step) or fail-fast the compilation (probably, implies additional checks to propagate the failure status). Even if we check for stale Methods in various places, including invoke(), there is nothing to prevent the method from going stale after the last spot-check. My understanding was that we already handle stale metadata as a precondition to creating the nmethod. If we have a loophole there that lets stale metadata get through, then that's a separate existing bug for C1 and C2. I was tempted to add a bailout, but the reason would be as a performance improvement to short-circuit wasted work, not to correct a stale metadata problem. 
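The spot-check argument above can be illustrated with a minimal Java sketch. It is only an analogy, under the assumption stated in the message that stale metadata is rejected as a precondition to creating the nmethod. `MethodInfo`, `Compilation`, `spotCheck`, `recordEvolDependency` and `tryInstall` are hypothetical names invented for this illustration, not HotSpot APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of these types exist in HotSpot.
class StaleCheckSketch {
    /** A method whose "old" flag may flip at any time, for example after a redefinition. */
    static final class MethodInfo {
        private volatile boolean old;
        boolean isOld() { return old; }
        void redefine() { old = true; }
    }

    /** One compilation: spot-checks are advisory, validation at install time is authoritative. */
    static final class Compilation {
        private final List<MethodInfo> evolDependencies = new ArrayList<>();

        // A spot-check only tells us the method was current at this instant.
        boolean spotCheck(MethodInfo m) { return !m.isOld(); }

        // Remember the method so that it is re-validated before installation.
        void recordEvolDependency(MethodInfo m) { evolDependencies.add(m); }

        // Assumed precondition for installing code: no recorded method has gone stale.
        boolean tryInstall() {
            for (MethodInfo m : evolDependencies) {
                if (m.isOld()) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Returns true if the compiled code would be installed. */
    static boolean compile(boolean redefineAfterSpotCheck) {
        MethodInfo target = new MethodInfo();
        Compilation c = new Compilation();
        if (!c.spotCheck(target)) {
            return false; // early bailout: only saves wasted work
        }
        c.recordEvolDependency(target);
        if (redefineAfterSpotCheck) {
            target.redefine(); // the method goes stale after the last spot-check
        }
        return c.tryInstall();
    }

    public static void main(String[] args) {
        System.out.println(compile(false)); // prints "true"
        System.out.println(compile(true));  // prints "false"
    }
}
```

In this model the early bailout is purely an optimization: correctness comes from the check in `tryInstall`, which is the only one that cannot be invalidated by a later redefinition.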
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1796199310 From dlong at openjdk.org Thu Oct 10 23:11:10 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 10 Oct 2024 23:11:10 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 23:05:18 GMT, Dean Long wrote: >> src/hotspot/share/ci/ciMethod.cpp line 695: >> >>> 693: // Redefinition support. >>> 694: if (this->is_old() || root_m->is_old()) { >>> 695: return nullptr; >> >> IMO you can safely drop this particular check. The one after `Dependencies::find_unique_concrete_method()` should be enough to preserve the invariant (`target == cha_monomorphic_target`) . >> >> But thinking more about it, now I'm curious what happens when an old method is actually encountered. The fix conservatively rejects possible inlining opportunity, but it seems it doesn't invalidate resulting nmethod anymore. So, no recompilation attempt follows to recuperate that. >> >> We could either record a evol dependency on a stale `Method` (to fail during nmethod installation step) or fail-fast the compilation (probably, implies additional checks to propagate the failure status). > > Even if we check for stale Methods in various places, including invoke(), there is nothing to prevent the method from going stale after the last spot-check. My understanding was that we already handle stale metadata as a precondition to creating the nmethod. If we have a loophole there that lets stale metadata get through, then that's a separate existing bug for C1 and C2. > I was tempted to add a bailout, but the reason would be as a performance improvement to short-circuit wasted work, not to correct a stale metadata problem. > IMO you can safely drop this particular check. The one after Dependencies::find_unique_concrete_method() should be enough to preserve the invariant (target == cha_monomorphic_target) . 
If I do that, then I can also revert the VM state changes. However, I wasn't able to convince myself that this check is not needed. If we end up returning root_m here as cha_monomorphic_target, it seems possible that it could be a new version of the method, and then target == cha_monomorphic_target would fail. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1796201992 From duke at openjdk.org Thu Oct 10 23:16:11 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 10 Oct 2024 23:16:11 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v3] In-Reply-To: References: Message-ID: <0EVgFAWZ9O9e_dlRj4Na8Xf7QEC6FGU03wAs1bwVq5c=.ff19b62d-3b2c-4320-8be2-c7dee7cafdae@github.com> On Thu, 10 Oct 2024 18:49:38 GMT, Smita Kamath wrote: >> src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp line 1602: >> >>> 1600: vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23] >>> 1601: vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17] >>> 1602: vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17] >> >> I assume [Algorithm](https://github.com/intel/intel-ipsec-mb/blob/main/lib/avx2_t4/sha512_x1_ni_avx2.asm) is specifically crafted for 256 bit vectors and with 512 bit extension we modify it. Do you think we should factor out following pattern and add an alternative implementation for it? >> >> ``` >> vpermq(xmm8, xmm4, 0x1b, Assembler::AVX_256bit);//ymm8 = W[20] W[21] W[22] W[23] >> vpermq(xmm9, xmm3, 0x39, Assembler::AVX_256bit);//ymm9 = W[16] W[19] W[18] W[17] >> vpblendd(xmm7, xmm8, xmm9, 0x3f, Assembler::AVX_256bit);//ymm7 = W[20] W[19] W[18] W[17] >> ``` >> >> This is a fixed pattern seen 4 times within computation loop and once outside the loop. >> We are permuting two vectors with constant permutation mask and blending them using immediate mask.
>> This is a very valid use case for two table permutation instruction VPERMI2Q (available for AVX512VL targets) >> We can store permutation pattern outside the loop into a vector and then re-use it within the loop. > > We can do this change in a separate PR. I agree with Smita. The current implementation has a one-to-one correspondence with the ipsec implementation. Any new changes or refactoring will require a new round of exhaustive testing and could be implemented as a separate PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20633#discussion_r1796204440 From dlong at openjdk.org Thu Oct 10 23:24:12 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 10 Oct 2024 23:24:12 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 23:09:00 GMT, Dean Long wrote: >> Even if we check for stale Methods in various places, including invoke(), there is nothing to prevent the method from going stale after the last spot-check. My understanding was that we already handle stale metadata as a precondition to creating the nmethod. If we have a loophole there that lets stale metadata get through, then that's a separate existing bug for C1 and C2. >> I was tempted to add a bailout, but the reason would be as a performance improvement to short-circuit wasted work, not to correct a stale metadata problem. > >> IMO you can safely drop this particular check. The one after Dependencies::find_unique_concrete_method() should be enough to preserve the invariant (target == cha_monomorphic_target) . > > If I do that, then I can also revert the VM state changes. However, I wasn't able to convince myself that this check is not needed. If we end up returning root_m here as cha_monomorphic_target, it seems possible that it could be a new version of the method, and then target == cha_monomorphic_target would fail. 
C1 does call `dependency_recorder()->assert_evol_method(inline_target);`, where `inline_target` in the CHA case would be `cha_monomorphic_target`, not `target`, so it looks like we may not detect if `target` is stale as long as `cha_monomorphic_target` is not. It seems like a minor loophole, but I'm not sure what kind of problems it could cause, especially if the bytecodes of `target` are not used. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1796211261 From dlong at openjdk.org Fri Oct 11 01:06:46 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 01:06:46 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v4] In-Reply-To: References: Message-ID: > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). > > An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==.
Dean Long has updated the pull request incrementally with one additional commit since the last revision: fix errors ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21148/files - new: https://git.openjdk.org/jdk/pull/21148/files/80c9ae67..55988fd3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=02-03 Stats: 15 lines in 2 files changed: 11 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21148/head:pull/21148 PR: https://git.openjdk.org/jdk/pull/21148 From dlong at openjdk.org Fri Oct 11 01:06:46 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 01:06:46 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 22:40:44 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > make sure to be in VM state when checking is_old Hold off on re-reviews. I need to fix some errors introduced by moving the VM state transitions. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2406329451 From dlong at openjdk.org Fri Oct 11 01:37:53 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 01:37:53 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v5] In-Reply-To: References: Message-ID: > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). > > An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. Dean Long has updated the pull request incrementally with one additional commit since the last revision: redo VM state ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21148/files - new: https://git.openjdk.org/jdk/pull/21148/files/55988fd3..2c7fc099 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=03-04 Stats: 29 lines in 2 files changed: 7 ins; 17 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21148/head:pull/21148 PR: https://git.openjdk.org/jdk/pull/21148 From dlong at openjdk.org Fri Oct 11 01:47:10 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 01:47:10 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v3] In-Reply-To: References: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> Message-ID: <-8mlP1ioQUoMffiB5nO9sDlyPH_9kEImNl9NqdVqC6s=.b2442fa3-cf7a-424f-9e70-cefd0eb70419@github.com> On Thu, 10 Oct 2024 21:01:26 GMT, Igor 
Veresov wrote: > > LoadStore nodes should have the same issue. Why they are not affected? > > Because LoadStore is an official store. It consumes a memory state and produces memory state. CacheWB is not really a store, that is it doesn't produce memory effects from the perspective of the backend (its match rule is not a Set). It's hard to tell what's the best way to model it, so I just decided not to mess with its semantics right now. Should it be treated like a memory barrier? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21455#issuecomment-2406371663 From dlong at openjdk.org Fri Oct 11 01:54:17 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 01:54:17 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v5] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 01:37:53 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > redo VM state OK, fixed version pushed. I moved the first group of is_old checks into resolve_invoke(). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2406382772 From iveresov at openjdk.org Fri Oct 11 03:09:11 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 11 Oct 2024 03:09:11 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v3] In-Reply-To: <-8mlP1ioQUoMffiB5nO9sDlyPH_9kEImNl9NqdVqC6s=.b2442fa3-cf7a-424f-9e70-cefd0eb70419@github.com> References: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> <-8mlP1ioQUoMffiB5nO9sDlyPH_9kEImNl9NqdVqC6s=.b2442fa3-cf7a-424f-9e70-cefd0eb70419@github.com> Message-ID: On Fri, 11 Oct 2024 01:44:22 GMT, Dean Long wrote: > > > LoadStore nodes should have the same issue. Why they are not affected? > > > > > > Because LoadStore is an official store. It consumes a memory state and produces memory state. CacheWB is not really a store, that is it doesn't produce memory effects from the perspective of the backend (its match rule is not a Set). It's hard to tell what's the best way to model it, so I just decided not to mess with its semantics right now. > > Should it be treated like a memory barrier? I'm not sure why it's not, I guess they wanted a more relaxed behavior? It's more like the opposite of prefetch really. I didn't want to touch the semantics of it in this bug fix because it feels like it will likely open another can of worms. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21455#issuecomment-2406471308 From jkarthikeyan at openjdk.org Fri Oct 11 04:30:11 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 11 Oct 2024 04:30:11 GMT Subject: RFR: 8341781: Improve Min/Max node identities In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 05:57:09 GMT, Christian Hagedorn wrote: >> Hi all, >> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. 
In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>>
>>                                                      Baseline                   Patch
>> Benchmark                                 Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
>> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
>> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
>> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
>> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
>> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
>> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)
>>
>> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated! > > test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdentities.java line 116: > >> 114: >> 115: @Test >> 116: @IR(applyIfPlatform = { "riscv64", "false" }, phase = { CompilePhase.BEFORE_MACRO_EXPANSION }, counts = { IRNode.MIN_L, "1" }) > > Can you add a comment here why we cannot apply the rules for riscv? This is a good call, the IR rules don't apply to RISC-V because it doesn't have support for CMoves so the MinL/MaxL nodes aren't made at all. Since `Math.min/max(LL)` isn't intrinsified it first needs to be matched into CMove, then Min/Max, and then the identity needs to be called. Since #20098 implements the intrinsic we could remove the special casing after it's merged.
I've added a comment to the source code as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1796408365 From jkarthikeyan at openjdk.org Fri Oct 11 04:35:12 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 11 Oct 2024 04:35:12 GMT Subject: RFR: 8341781: Improve Min/Max node identities In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 06:06:19 GMT, Christian Hagedorn wrote: >> Hi all, >> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>>
>>                                                      Baseline                   Patch
>> Benchmark                                 Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
>> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
>> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
>> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
>> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
>> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
>> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)
>>
>> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!
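The two simplifications named in the description are the idempotence/associativity and absorption laws of a lattice. A minimal Java sketch that checks them on plain `long` values is given here; it only exercises `Math.min`/`Math.max` and says nothing about how C2 implements the identities. The class name `MinMaxIdentitiesSketch` is hypothetical, and the two `Min`-rooted duals are included on the assumption that the patch treats them symmetrically.

```java
// Hypothetical helper class for illustration; not part of the patch or of HotSpot.
class MinMaxIdentitiesSketch {
    /** Checks the identities from the description, plus their assumed duals, for one pair. */
    static boolean identitiesHold(long a, long b) {
        // Max(A, Max(A, B)) == Max(A, B)
        boolean maxOfMax = Math.max(a, Math.max(a, b)) == Math.max(a, b);
        // Max(A, Min(A, B)) == A  (absorption)
        boolean maxOfMin = Math.max(a, Math.min(a, b)) == a;
        // Duals, assumed by symmetry:
        boolean minOfMin = Math.min(a, Math.min(a, b)) == Math.min(a, b);
        boolean minOfMax = Math.min(a, Math.max(a, b)) == a;
        return maxOfMax && maxOfMin && minOfMin && minOfMax;
    }

    /** Runs the check over a small grid of values, including both extremes of long. */
    static boolean holdForAllSamples() {
        long[] samples = { Long.MIN_VALUE, -7L, -1L, 0L, 1L, 42L, Long.MAX_VALUE };
        for (long a : samples) {
            for (long b : samples) {
                if (!identitiesHold(a, b)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holdForAllSamples()); // prints "true"
    }
}
```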
> > test/hotspot/jtreg/compiler/vectorization/runner/BasicShortOpTest.java line 220: > >> 218: short[] res = new short[SIZE]; >> 219: for (int i = 0; i < SIZE; i++) { >> 220: res[i] = (short) Math.min(a[i], b[i]); > > I guess without this change, this collapses to a constant which enables vectorization which was not expected before? Yeah, exactly - since `65536` is larger than the maximum short value, with this patch it can optimize the MinI node away entirely. I changed it to use the `b` array, which is what the `vectorMax` test case below uses. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1796410477 From jkarthikeyan at openjdk.org Fri Oct 11 05:03:12 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 11 Oct 2024 05:03:12 GMT Subject: RFR: 8341781: Improve Min/Max node identities In-Reply-To: References: Message-ID: <1T3CTv_0tE6J0fU6QJx20vyfikrIxPG2uzTVRNJGPwA=.059935c2-32c8-43c5-ab9b-d9541f65e25f@github.com> On Fri, 11 Oct 2024 04:28:00 GMT, Jasmine Karthikeyan wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdentities.java line 116: >> >>> 114: >>> 115: @Test >>> 116: @IR(applyIfPlatform = { "riscv64", "false" }, phase = { CompilePhase.BEFORE_MACRO_EXPANSION }, counts = { IRNode.MIN_L, "1" }) >> >> Can you add a comment here why we cannot apply the rules for riscv? > > This is a good call, the IR rules don't apply to RISC-V because it doesn't have support for CMoves so the MinL/MaxL nodes aren't made at all. Since `Math.min/max(LL)` isn't intrinsified it first needs to be matched into CMove, then Min/Max, and then the identity needs to be called. Since #20098 implements the intrinsic we could remove the special casing after it's merged. I've added a comment to the source code as well. On closer inspection it seems that because of the CMove cost model, the outer min/max operation doesn't get turned into a CMove so the Long IR rules don't reliably get matched anywhere.
It must have slipped through the cracks because of the way that my IR rules were structured; I only realized this after I added the compile phase to the other 2 rules. I think for this to work it would need the intrinsics from the other PR. Do you think we should continue with this PR with the Long cases disabled and enable it afterwards, or should we wait for #20098 to be merged? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1796424851 From chagedorn at openjdk.org Fri Oct 11 07:06:10 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 11 Oct 2024 07:06:10 GMT Subject: RFR: 8341781: Improve Min/Max node identities In-Reply-To: <1T3CTv_0tE6J0fU6QJx20vyfikrIxPG2uzTVRNJGPwA=.059935c2-32c8-43c5-ab9b-d9541f65e25f@github.com> References: <1T3CTv_0tE6J0fU6QJx20vyfikrIxPG2uzTVRNJGPwA=.059935c2-32c8-43c5-ab9b-d9541f65e25f@github.com> Message-ID: On Fri, 11 Oct 2024 05:00:06 GMT, Jasmine Karthikeyan wrote: >> This is a good call, the IR rules don't apply to RISC-V because it doesn't have support for CMoves so the MinL/MaxL nodes aren't made at all. Since `Math.min/max(LL)` isn't intrinsified it first needs to be matched into CMove, then Min/Max, and then the identity needs to be called. Since #20098 implements the intrinsic we could remove the special casing after it's merged. I've added a comment to the source code as well. > > On closer inspection it seems that because of the CMove cost model, the outer min/max operation doesn't get turned into a CMove so the Long IR rules don't reliably get matched anywhere. It must have slipped through the cracks because of the way that my IR rules were structured; I only realized this after I added the compile phase to the other 2 rules. I think for this to work it would need the intrinsics from the other PR. Do you think we should continue with this PR with the Long cases disabled and enable it afterwards, or should we wait for #20098 to be merged? Thanks for sharing more details.
I think it's perfectly fine to still add them now but leave them disabled with a reference to JDK-8307513 since you already wrote them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1796513108 From thartmann at openjdk.org Fri Oct 11 07:56:50 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 11 Oct 2024 07:56:50 GMT Subject: RFR: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found Message-ID: Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses. Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already: https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965 Thanks, Tobias ------------- Commit messages: - Fix - 8336726: C2: assert(\!do_asserts || projs->fallthrough_ioproj \!= nullptr) failed: must be found Changes: https://git.openjdk.org/jdk/pull/21450/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21450&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336726 Stats: 83 lines in 4 files changed: 77 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21450.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21450/head:pull/21450 PR: https://git.openjdk.org/jdk/pull/21450 From thartmann at openjdk.org Fri Oct 11 08:36:12 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 11 Oct 2024 08:36:12 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 23:30:03 GMT, Dean Long wrote: >> C1 patching (explained in detail 
[here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. >> >> Now the problem is that there can be multiple patch sites in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 >> >> While another `Thread B` is done with patching another patch site and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC. `Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 >> >> In short: >> - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. >> - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of a `mov` in the patch body because the store is not atomic. >> - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. >> >> Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. >> I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity.
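The alignment idea in the proposal comes down to simple offset arithmetic, sketched here in Java. This is not the code in `PatchingStub::emit_code`; the class and method names are hypothetical, and the offsets are made-up numbers chosen only to show an 8-byte immediate that straddles a 64-byte boundary before padding and sits inside one aligned word afterwards.

```java
// Hypothetical helper for illustration; it does not mirror any HotSpot assembler API.
class ImmediateAlignmentSketch {
    /**
     * Number of padding bytes to emit so that an immediate that would start at
     * immOffset starts at a multiple of alignment instead.
     * Assumes immOffset >= 0 and alignment is a power of two.
     */
    static int paddingBefore(int immOffset, int alignment) {
        if (immOffset < 0 || alignment <= 0 || (alignment & (alignment - 1)) != 0) {
            throw new IllegalArgumentException("need offset >= 0 and power-of-two alignment");
        }
        int misalignment = immOffset & (alignment - 1);
        return (alignment - misalignment) & (alignment - 1);
    }

    /** True if the byte range [offset, offset + size) spans more than one boundary-sized block. */
    static boolean crossesBoundary(int offset, int size, int boundary) {
        return offset / boundary != (offset + size - 1) / boundary;
    }

    public static void main(String[] args) {
        int immOffset = 61;                            // made-up offset of the 8-byte immediate
        int pad = paddingBefore(immOffset, 8);         // 3 bytes of padding
        System.out.println(pad);                                       // prints "3"
        System.out.println(crossesBoundary(immOffset, 8, 64));         // prints "true"
        System.out.println(crossesBoundary(immOffset + pad, 8, 64));   // prints "false"
    }
}
```

An 8-byte value that starts at a multiple of 8 can never straddle a cache line or page, which is the property the proposal relies on for the store to be atomic.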
>> >> The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 >> >> I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) t... > > Wouldn't it be better to get rid of the concurrency? We could grab CodeCache_lock and Patching_lock in the same block, so we serialize the patching and register_nmethod. @dean-long I discussed this with @tschatzl and, on his request, improved the PR description a bit. He would also prefer the alignment solution because it does not increase the scope of the lock (and we already rely on word-aligned word-sized memory accesses being atomic in many other places). What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2406908303 From chagedorn at openjdk.org Fri Oct 11 13:40:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 11 Oct 2024 13:40:11 GMT Subject: RFR: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 13:29:11 GMT, Tobias Hartmann wrote: > Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses. 
> > Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already: > https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965 > > Thanks, > Tobias That looks reasonable to me. As we have discussed offline, it's probably not worth/too complex to verify that we always end in an infinite loop afterwards. test/hotspot/jtreg/compiler/c2/TestCallDevirtualizationWithInfiniteLoop.java line 53: > 51: public static void test(boolean flag) { > 52: // Avoid executing endless loop > 53: if (flag) return; You should add braces here: Suggestion: if (flag) { return; } ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21450#pullrequestreview-2362827732 PR Review Comment: https://git.openjdk.org/jdk/pull/21450#discussion_r1796968384 From thartmann at openjdk.org Fri Oct 11 14:23:48 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 11 Oct 2024 14:23:48 GMT Subject: RFR: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found [v2] In-Reply-To: References: Message-ID: > Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses. 
> > Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already: > https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965 > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/c2/TestCallDevirtualizationWithInfiniteLoop.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21450/files - new: https://git.openjdk.org/jdk/pull/21450/files/8ffdbd01..ba06b702 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21450&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21450&range=00-01 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21450.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21450/head:pull/21450 PR: https://git.openjdk.org/jdk/pull/21450 From thartmann at openjdk.org Fri Oct 11 14:23:48 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 11 Oct 2024 14:23:48 GMT Subject: RFR: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 13:29:11 GMT, Tobias Hartmann wrote: > Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses. 
> > Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already: > https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965 > > Thanks, > Tobias Thanks for the review, Christian! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21450#issuecomment-2407518566 From chagedorn at openjdk.org Fri Oct 11 14:33:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 11 Oct 2024 14:33:11 GMT Subject: RFR: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found [v2] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 14:23:48 GMT, Tobias Hartmann wrote: >> Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses. >> >> Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already: >> https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965 >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/c2/TestCallDevirtualizationWithInfiniteLoop.java > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21450#pullrequestreview-2362954248 From jkarthikeyan at openjdk.org Fri Oct 11 15:15:54 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 11 Oct 2024 15:15:54 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v2] In-Reply-To: References: Message-ID: > Hi all, > This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements: > > Baseline Patch > Benchmark Mode Cnt Score Error Units Score Error Units Improvement > BigIntegers.testAdd avgt 15 25.096 ± 3.936 ns/op 19.214 ± 0.521 ns/op (+ 26.5%) > PatternBench.charPatternCompile avgt 8 453.727 ± 117.265 ns/op 370.054 ± 26.106 ns/op (+ 20.3%) > PatternBench.charPatternMatch avgt 8 917.604 ± 121.766 ns/op 810.560 ± 38.437 ns/op (+ 12.3%) > PatternBench.charPatternMatchWithCompile avgt 8 1477.703 ± 255.783 ns/op 1224.460 ± 28.220 ns/op (+ 18.7%) > PatternBench.longStringGraphemeMatches avgt 8 860.909 ± 124.661 ns/op 743.729 ± 22.877 ns/op (+ 14.6%) > PatternBench.splitFlags avgt 8 420.506 ± 76.252 ns/op 321.911 ± 11.661 ns/op (+ 26.6%) > > I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!
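The two identities named in the description, `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`, can be checked at the Java level with a minimal sketch. The class and method names below are hypothetical and this is not the C2 implementation, only a check of the arithmetic on `long` values. The `Min` duals are included on the assumption that they follow from the same lattice rules.

```java
// Hypothetical illustration only; not part of the patch under review.
final class MinMaxIdentities {
    // Absorption: Max(A, Min(A, B)) == A, and the dual Min(A, Max(A, B)) == A.
    static boolean absorptionHolds(long a, long b) {
        return Math.max(a, Math.min(a, b)) == a
            && Math.min(a, Math.max(a, b)) == a;
    }

    // Nested same-operand: Max(A, Max(A, B)) == Max(A, B), and the Min dual.
    static boolean nestingHolds(long a, long b) {
        return Math.max(a, Math.max(a, b)) == Math.max(a, b)
            && Math.min(a, Math.min(a, b)) == Math.min(a, b);
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -1L, 0L, 1L, 42L, Long.MAX_VALUE};
        for (long a : samples) {
            for (long b : samples) {
                if (!absorptionHolds(a, b) || !nestingHolds(a, b)) {
                    throw new AssertionError("identity failed for " + a + ", " + b);
                }
            }
        }
        System.out.println("identities hold for all sample pairs");
    }
}
```

The sketch uses integral values only; floating-point Min/Max has extra cases (NaN, signed zero) that it does not cover.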
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Suggestions from review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21439/files - new: https://git.openjdk.org/jdk/pull/21439/files/af771cff..b4b96143 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21439&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21439&range=00-01 Stats: 15 lines in 3 files changed: 5 ins; 3 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/21439.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21439/head:pull/21439 PR: https://git.openjdk.org/jdk/pull/21439 From jkarthikeyan at openjdk.org Fri Oct 11 15:15:54 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 11 Oct 2024 15:15:54 GMT Subject: RFR: 8341781: Improve Min/Max node identities In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 02:59:14 GMT, Jasmine Karthikeyan wrote: > Hi all, > This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements: > > Baseline Patch > Benchmark Mode Cnt Score Error Units Score Error Units Improvement > BigIntegers.testAdd avgt 15 25.096 ± 3.936 ns/op 19.214 ± 0.521 ns/op (+ 26.5%) > PatternBench.charPatternCompile avgt 8 453.727 ± 117.265 ns/op 370.054 ± 26.106 ns/op (+ 20.3%) > PatternBench.charPatternMatch avgt 8 917.604 ± 121.766 ns/op 810.560 ±
38.437 ns/op (+ 12.3%) > PatternBench.charPatternMatchWithCompile avgt 8 1477.703 ± 255.783 ns/op 1224.460 ± 28.220 ns/op (+ 18.7%) > PatternBench.longStringGraphemeMatches avgt 8 860.909 ± 124.661 ns/op 743.729 ± 22.877 ns/op (+ 14.6%) > PatternBench.splitFlags avgt 8 420.506 ± 76.252 ns/op 321.911 ± 11.661 ns/op (+ 26.6%) > > I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated! Thanks for the suggestions and testing, @liach and @chhagedorn! I've taken a look at the backend implementations, and it seems that aarch64 and RISC-V unconditionally support floating point Min/Max while x64 only supports them with `UseAVX >= 1`, as described. I made it so that the test only runs when it matches that criteria. I've pushed a commit that should address all the suggestions here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21439#issuecomment-2407621359 From jkarthikeyan at openjdk.org Fri Oct 11 15:15:55 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 11 Oct 2024 15:15:55 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v2] In-Reply-To: References: <1T3CTv_0tE6J0fU6QJx20vyfikrIxPG2uzTVRNJGPwA=.059935c2-32c8-43c5-ab9b-d9541f65e25f@github.com> Message-ID: On Fri, 11 Oct 2024 07:03:44 GMT, Christian Hagedorn wrote: >> On closer inspection it seems that because of the CMove cost model, the outer min/max operation doesn't get turned into a CMove so the Long IR rules don't reliably get matched anywhere. It must have slipped through the cracks because of the way that my IR rules were structured, I only realized this after I added the compile phase to the other 2 rules. I think for this to work it would need the intrinsics from the other PR. Do you think we should continue with this PR with the Long cases disabled and enable it afterwards, or should we wait for #20098 to be merged? > > Thanks for sharing more details.
I think it's perfectly fine to still add them now but leave them disabled with a reference to JDK-8307513 since you already wrote them. Sounds good! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1797098107 From qamai at openjdk.org Fri Oct 11 15:31:17 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 11 Oct 2024 15:31:17 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v3] In-Reply-To: <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> Message-ID: On Thu, 10 Oct 2024 01:09:06 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. This conveys the element information in a clearer and safer manner. >> >> Please take a look and leave your reviews, >> Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > more style changes Thanks a lot for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/21414#issuecomment-2407651084 From qamai at openjdk.org Fri Oct 11 15:31:19 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 11 Oct 2024 15:31:19 GMT Subject: Integrated: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* In-Reply-To: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> Message-ID: On Tue, 8 Oct 2024 19:46:12 GMT, Quan Anh Mai wrote: > Hi, > > This patch refactors `TypeVect` to use a `BasicType` instead of a `const Type*` as our current implementation. 
This conveys the element information in a clearer and safer manner. > > Please take a look and leave your reviews, > Thanks a lot. This pull request has now been integrated. Changeset: 7276a1be Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/7276a1bec0d90f63e9e433fdcdfd6564b70dc9bb Stats: 208 lines in 18 files changed: 3 ins; 77 del; 128 mod 8341784: Refactor TypeVect to use a BasicType instead of a const Type* Reviewed-by: kvn, jkarthikeyan ------------- PR: https://git.openjdk.org/jdk/pull/21414 From jkarthikeyan at openjdk.org Fri Oct 11 15:34:15 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 11 Oct 2024 15:34:15 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR pallets. >> >> >> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. 
>> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. >> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimization:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 > - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction Hmm, do you think this pattern could be matched in the ad-files instead of the middle end? 
I think that might be a lot cleaner since the backend already has systems for matching node trees, which could avoid a lot of the complexity here. I think it could make the patch a lot smaller and simpler. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2407658405 From qamai at openjdk.org Fri Oct 11 15:54:47 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 11 Oct 2024 15:54:47 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop Message-ID: Hi, This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. My proposal is that if a node is not reassigned inside a loop, and will be spilt there, we spill it eagerly at the loop entry instead. This can lead to more reload inside the loop, but as the loop-carried dependencies are eliminated, a load is negligible. Please take a look and leave your reviews, thanks a lot. 
------------- Commit messages: - add benchmark - don't eagerly spill if we are reassigned anyway - eagerly spill a node in the loop entry Changes: https://git.openjdk.org/jdk/pull/21472/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341697 Stats: 87 lines in 3 files changed: 81 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21472.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21472/head:pull/21472 PR: https://git.openjdk.org/jdk/pull/21472 From qamai at openjdk.org Fri Oct 11 16:01:09 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 11 Oct 2024 16:01:09 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 15:50:20 GMT, Quan Anh Mai wrote: > Hi, > > This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. My proposal is that if a node is not reassigned inside a loop, and will be spilt there, we spill it eagerly at the loop entry instead. This can lead to more reload inside the loop, but as the loop-carried dependencies are eliminated, a load is negligible. > > Please take a look and leave your reviews, thanks a lot. The benchmark result: Benchmark Mode Cnt Score Error Units LoopCounterBench.field_ret avgt 3 417.865 ± 2.914 ns/op LoopCounterBench.localVar_ret avgt 3 332.657 ± 109.310 ns/op The inner loop is free of spills because the spill has been hoisted to the loop entry: 0x00007fdf9821b546: mov r9d,DWORD PTR [r11+0xc] ;*getfield increment {reexecute=0 rethrow=0 return_oop=0}
; - org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 1 (line 56) ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190) 0.03% 0x00007fdf9821b54a: mov esi,DWORD PTR [r12+r8*8+0xc] ; implicit exception: dispatches to 0x00007fdf9821b6f4 ;*lastore {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 27 (line 58) ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190) 0x00007fdf9821b54f: lea rax,[r12+r14*8] 0x00007fdf9821b553: lea r13,[r12+r8*8] 0.03% 0x00007fdf9821b557: xor edi,edi THE SPILL 0x00007fdf9821b559: vmovq xmm0,rbp 0x00007fdf9821b55e: xchg ax,ax ;*aload_0 {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 16 (line 58) ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190) 0x00007fdf9821b560: cmp edi,r10d 1.66% 0x00007fdf9821b563: jae 0x00007fdf9821b587 0x00007fdf9821b565: mov rbp,QWORD PTR [rax+rdi*8+0x10];*laload {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 26 (line 58) ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190) 5.43% 0x00007fdf9821b56a: cmp edi,esi 0.17% 0x00007fdf9821b56c: jae 0x00007fdf9821b5c8 0x00007fdf9821b56e: mov QWORD PTR [r13+rdi*8+0x10],rbp;*goto {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 32 (line 57) ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190) 1.40% 0x00007fdf9821b573: add edi,r9d ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 30 (line 57) ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190) 3.40% 0x00007fdf9821b576: mov rbp,QWORD PTR [r15+0x450] ; ImmutableOopMap {r11=Oop r8=NarrowOop rcx=Oop rbx=Oop rdx=Oop rax=Oop r13=Oop r14=NarrowOop } ;*goto {reexecute=1 rethrow=0 return_oop=0} ; - (reexecute) org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 32 (line 57) ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190) 1.80% 0x00007fdf9821b57d: test DWORD PTR [rbp+0x0],eax ;*goto {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 32 (line 57) ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190) ; {poll} 84.42% 0x00007fdf9821b580: cmp edi,r10d 0.30% 0x00007fdf9821b583: jl 0x00007fdf9821b560 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.bench.vm.compiler.LoopCounterBench::localVar_ret at 13 (line 57) ; - org.openjdk.bench.vm.compiler.jmh_generated.LoopCounterBench_localVar_ret_jmhTest::localVar_ret_avgt_jmhStub at 17 (line 190) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2407697430 From qamai at openjdk.org Fri Oct 11 16:05:11 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 11 Oct 2024 16:05:11 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 15:50:20 GMT, Quan Anh Mai wrote: > Hi, > > This patch improves the spill placement in the presence of loops.
Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. My proposal is that if a node is not reassigned inside a loop, and will be spilt there, we spill it eagerly at the loop entry instead. This can lead to more reload inside the loop, but as the loop-carried dependencies are eliminated, a load is negligible. > > Please take a look and leave your reviews, thanks a lot. Thanks to @shipilev for the benchmark, could you verify that this can solve the issue in the original benchmark as I imagine this is a simplified version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2407710352 From jbhateja at openjdk.org Fri Oct 11 16:19:17 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 11 Oct 2024 16:19:17 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v3] In-Reply-To: References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> Message-ID: On Fri, 11 Oct 2024 15:27:34 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> more style changes > > Thanks a lot for your reviews Hi @merykitty , LGTM. Best Regards. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21414#issuecomment-2407733997 From qamai at openjdk.org Fri Oct 11 16:34:18 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 11 Oct 2024 16:34:18 GMT Subject: RFR: 8341784: Refactor TypeVect to use a BasicType instead of a const Type* [v3] In-Reply-To: References: <7I_ilGaelEbSPsnVouXj0hGpW9eXxH8yiPY6s4leEDs=.2d2fc13c-b937-48b5-aee8-a28b2df80e85@github.com> <59KHIjiKHAJlqjuzs8pDq4a1nZUB97IGko_9n0ksRAA=.96ed09ba-b6dd-4a3c-b5f6-ac59f8a38875@github.com> Message-ID: <4k1WvfgPtwKa4RSDzjGnJYo2_O1dzDKdfHQrbLX5730=.040ea20c-7318-43e8-b39d-d0c2d44b3a27@github.com> On Fri, 11 Oct 2024 16:16:46 GMT, Jatin Bhateja wrote: >> Thanks a lot for your reviews > > Hi @merykitty , LGTM. > > Best Regards. @jatin-bhateja Thanks a lot for your reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21414#issuecomment-2407757842 From qamai at openjdk.org Fri Oct 11 16:57:13 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 11 Oct 2024 16:57:13 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR pallets. >> >> >> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. 
Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. >> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimization:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 > - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction Another approach is to do similarly to `MacroLogicVNode`. You can make another node and transform `MulVL` to it before matching, this is more flexible than using match rules. I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering`. It can be used to do e.g split `ExtractI` into the 128-bit lane extraction and the element extraction from that lane. This allows us to do `GVN` on those and `v.lane(5) + v.lane(7)` can be compiled nicely as: vextracti128 xmm0, ymm1, 1 pextrd eax, xmm0, 1 // vextracti128 xmm0, ymm1, 1 here will be gvn-ed pextrd ecx, xmm0, 3 add eax, ecx ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2407793168 From jkarthikeyan at openjdk.org Fri Oct 11 17:15:08 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 11 Oct 2024 17:15:08 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Fri, 11 Oct 2024 16:54:23 GMT, Quan Anh Mai wrote: > I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering` I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for better performance with bit masks, but ran into a problem where it doesn't have an immediate encoding so we'd need to manifest a constant into a temporary register every time. With an (x86-specific) ideal node, we could simply let the register allocator handle placing the constant. It would also be nice to avoid needing to put similar backend-specific lowerings (such as `MacroLogicV`) in shared code. 
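For readers following the VPMULUDQ discussion in this thread: the quoted `MulL` patterns all multiply two operands whose upper 32 bits are known to be zero, either through `And ..., 0xFFFFFFFF` or through `URShift ..., 32`. A scalar Java sketch of why such a product can be computed by an unsigned 32x32-to-64-bit multiply is given below. The class and method names are hypothetical, and this is a per-lane scalar model under that assumption, not the vector code in the patch.

```java
// Hypothetical scalar model of one 64-bit lane; not taken from the patch.
final class LowWordMultiply {
    static final long MASK = 0xFFFFFFFFL;

    // MulL (And a, 0xFFFFFFFF) (And b, 0xFFFFFFFF)
    static long maskedMul(long a, long b) {
        return (a & MASK) * (b & MASK);
    }

    // MulL (URShift a, 32) (URShift b, 32)
    static long shiftedMul(long a, long b) {
        return (a >>> 32) * (b >>> 32);
    }

    // Unsigned 32x32 -> 64 multiply of two doublewords. The product of two
    // values below 2^32 is below 2^64, so no bits are lost.
    static long unsigned32Mul(int lo1, int lo2) {
        return Integer.toUnsignedLong(lo1) * Integer.toUnsignedLong(lo2);
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, 0x123456789ABCDEF0L, Long.MIN_VALUE, Long.MAX_VALUE};
        for (long a : samples) {
            for (long b : samples) {
                if (maskedMul(a, b) != unsigned32Mul((int) a, (int) b)) {
                    throw new AssertionError("masked pattern mismatch");
                }
                if (shiftedMul(a, b) != unsigned32Mul((int) (a >>> 32), (int) (b >>> 32))) {
                    throw new AssertionError("shifted pattern mismatch");
                }
            }
        }
        System.out.println("both patterns match the unsigned 32x32->64 product");
    }
}
```

The mixed patterns (`URShift` on one side, `And` on the other) follow the same reasoning, since each operand independently has zero upper bits.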
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2407821557 From kvn at openjdk.org Fri Oct 11 18:34:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 11 Oct 2024 18:34:15 GMT Subject: RFR: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found [v2] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 14:23:48 GMT, Tobias Hartmann wrote: >> Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses. >> >> Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already: >> https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965 >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/c2/TestCallDevirtualizationWithInfiniteLoop.java > > Co-authored-by: Christian Hagedorn Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21450#pullrequestreview-2363370518 From jrose at openjdk.org Fri Oct 11 18:50:16 2024 From: jrose at openjdk.org (John R Rose) Date: Fri, 11 Oct 2024 18:50:16 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v6] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 09:13:37 GMT, Antón Seoane wrote: >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog.
However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. >> >> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. >> >> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. > > Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: > > Replace LogDecorators::Decorator::NoDecorator with LogDecorators::None For the compiler outputs which have no tags, what happens with (a) lines that begin with something like `[42] ` and (b) multi-line outputs? In both cases a log parser could (on a bad day) struggle to interpret the UL records correctly. I see that strict compatibility with existing compiler outputs can lead to additional parsing ambiguities, which will have to be dealt with at some point in the future. (Is there a leading space? I think not. So a leading `[42]` could be a problem if it crops up.
Perhaps we need a targeted way to discriminate such things, such as injecting one leading space in some cases TBD.) Note that I am not advocating, here, for an immediate solution for parsing ambiguities, but I do want us to track such issues. Another side note, just FTR: There is a third issue with UL output from compilation, which is the grouping of logically connected log outputs. In the compiler logs we use XML nesting today for such logical grouping. This grouping, in addition to unambiguous delimiting of decorations, is yet another use, by compilation logs, of a basic property of XML: The syntax is not only somewhat readable, but also well defined. I suppose if XML syntax is encapsulated in UL syntax, that would provide a parseable ("tool-friendly") solution. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21383#issuecomment-2407955314 From qamai at openjdk.org Fri Oct 11 18:56:39 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 11 Oct 2024 18:56:39 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v2] In-Reply-To: References: Message-ID: > Hi, > > This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. My proposal is that if a node is not reassigned inside a loop, and will be spilt there, we spill it eagerly at the loop entry instead. This can lead to more reload inside the loop, but as the loop-carried dependencies are eliminated, a load is negligible. > > Please take a look and leave your reviews, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: refinement ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21472/files - new: https://git.openjdk.org/jdk/pull/21472/files/21600d7d..85a2c266 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=00-01 Stats: 44 lines in 2 files changed: 34 ins; 3 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/21472.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21472/head:pull/21472 PR: https://git.openjdk.org/jdk/pull/21472 From vlivanov at openjdk.org Fri Oct 11 19:04:15 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 11 Oct 2024 19:04:15 GMT Subject: RFR: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found [v2] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 14:23:48 GMT, Tobias Hartmann wrote: >> Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses. >> >> Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already: >> https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965 >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/c2/TestCallDevirtualizationWithInfiniteLoop.java > > Co-authored-by: Christian Hagedorn Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21450#pullrequestreview-2363412880 From dlong at openjdk.org Fri Oct 11 19:12:13 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 19:12:13 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v3] In-Reply-To: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> References: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> Message-ID: On Thu, 10 Oct 2024 17:26:25 GMT, Igor Veresov wrote: >> `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Add comment src/hotspot/share/opto/gcm.cpp line 773: > 771: if (use_mem_state->is_Mach()) { > 772: int ideal_op = use_mem_state->as_Mach()->ideal_Opcode(); > 773: is_cache_wb = (ideal_op == Op_CacheWB || ideal_op == Op_CacheWBPostSync || ideal_op == Op_CacheWBPreSync); The match rules for CacheWBPostSync and CacheWBPreSync don't have memory operands. Is needs_anti_dependence_check() really returning true for them? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21455#discussion_r1797333904 From qamai at openjdk.org Fri Oct 11 19:14:28 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 11 Oct 2024 19:14:28 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v3] In-Reply-To: References: Message-ID: > Hi, > > This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. 
This introduces loop-carried dependencies, greatly reduces loop throughput. My proposal is that if a node is not reassigned inside a loop, and will be spilt there, we spill it eagerly at the loop entry instead. This can lead to more reload inside the loop, but as the loop-carried dependencies are eliminated, a load is negligible. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: refactor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21472/files - new: https://git.openjdk.org/jdk/pull/21472/files/85a2c266..b6e78eb8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=01-02 Stats: 30 lines in 1 file changed: 16 ins; 4 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/21472.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21472/head:pull/21472 PR: https://git.openjdk.org/jdk/pull/21472 From dlong at openjdk.org Fri Oct 11 20:00:17 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 20:00:17 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 13:39:28 GMT, Tobias Hartmann wrote: > C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. 
> > Now the problem is that there can be multiple patch sides in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 > > While another `Thread B` is done with patching another patch side and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC. `Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 > > In short: > - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. > - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of an `mov` in the patch body because the store is not atomic. > - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. > > Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. > I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity. 
> > The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 > > I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remo... Sorry, I'm still not convinced this is safe. I took another look at C1 patching, and not only are we trying to scan oops while they are being patched, we are also patching the reloc information at the same time (see the call to change_reloc_info_for_address()). So that means there is a window where the instruction is patched but the reloc information is stale. If the scope of the lock is the only issue, then we could try to address that with a finer-grained lock or even a per-nmethod lock. @tschatzl , when we call register_nmethod(), do we really need to scan the oops immediately, or could that be delayed until the next safepoint? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2408043697 From vlivanov at openjdk.org Fri Oct 11 20:31:14 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 11 Oct 2024 20:31:14 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 23:21:03 GMT, Dean Long wrote: >>> IMO you can safely drop this particular check. The one after Dependencies::find_unique_concrete_method() should be enough to preserve the invariant (target == cha_monomorphic_target) . >> >> If I do that, then I can also revert the VM state changes. However, I wasn't able to convince myself that this check is not needed. 
If we end up returning root_m here as cha_monomorphic_target, it seems possible that it could be a new version of the method, and then target == cha_monomorphic_target would fail. > > C1 does call > > 2159 dependency_recorder()->assert_evol_method(inline_target); > > which in the CHA case would be `cha_monomorphic_target`, not `target`, so it looks like we may not detect if `target` is stale as long as `cha_monomorphic_target` is not. It seems like a minor loophole, but I'm not sure what kind of problems it could cause, especially if the bytecodes of `target` are not used. The scenario which concerns me is performance-related. If CHA conservatively disables inlining when concurrent class redefinition takes place during parsing, then there is no mechanism in place to recuperate possible loss of performance. Normally, if inlined method is redefined later during compilation, nmethod installation fails during dependency validation. But here no inlining happens (missed optimization opportunity) and call sites in generated code are resolved based on symbolic information (except rare cases when resolved method is attached to the call site, see `SharedRuntime::find_callee_info_helper()` for details), so there are no guarantees the stale `Method*` is recorded. I agree that the window for such sequence of events is narrow, but it may be a source of surprising performance anomalies in rare cases. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1797396108 From iveresov at openjdk.org Fri Oct 11 20:38:24 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 11 Oct 2024 20:38:24 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v4] In-Reply-To: References: Message-ID: > `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. 
Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Address Dean's comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21455/files - new: https://git.openjdk.org/jdk/pull/21455/files/ae69ee4b..914b97ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21455&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21455&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21455/head:pull/21455 PR: https://git.openjdk.org/jdk/pull/21455 From iveresov at openjdk.org Fri Oct 11 20:38:25 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 11 Oct 2024 20:38:25 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v3] In-Reply-To: References: <4LwxITr1Kx92sHhdaCYRS_iZuxH_Um6x56TrgUfk_UY=.d87cccd8-e57c-4df0-af05-8ce67f462b1c@github.com> Message-ID: On Fri, 11 Oct 2024 19:09:06 GMT, Dean Long wrote: >> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comment > > src/hotspot/share/opto/gcm.cpp line 773: > >> 771: if (use_mem_state->is_Mach()) { >> 772: int ideal_op = use_mem_state->as_Mach()->ideal_Opcode(); >> 773: is_cache_wb = (ideal_op == Op_CacheWB || ideal_op == Op_CacheWBPostSync || ideal_op == Op_CacheWBPreSync); > > The match rules for CacheWBPostSync and CacheWBPreSync don't have memory operands. Is needs_anti_dependence_check() really returning true for them? Yes, you're right. I'll remove those. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21455#discussion_r1797399651 From dlong at openjdk.org Fri Oct 11 21:11:15 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 21:11:15 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 20:28:31 GMT, Vladimir Ivanov wrote: >> C1 does call >> >> 2159 dependency_recorder()->assert_evol_method(inline_target); >> >> which in the CHA case would be `cha_monomorphic_target`, not `target`, so it looks like we may not detect if `target` is stale as long as `cha_monomorphic_target` is not. It seems like a minor loophole, but I'm not sure what kind of problems it could cause, especially if the bytecodes of `target` are not used. > > The scenario which concerns me is performance-related. If CHA conservatively disables inlining when concurrent class redefinition takes place during parsing, then there is no mechanism in place to recuperate possible loss of performance. Normally, if inlined method is redefined later during compilation, nmethod installation fails during dependency validation. But here no inlining happens (missed optimization opportunity) and call sites in generated code are resolved based on symbolic information (except rare cases when resolved method is attached to the call site, see `SharedRuntime::find_callee_info_helper()` for details), so there are no guarantees the stale `Method*` is recorded. > > I agree that the window for such sequence of events is narrow, but it may be a source of surprising performance anomalies in rare cases. OK, it sounds like we have two choices. Either record an evol dependency every time we short-circuit an optimization based on is_old, or bail out. I vote for bailing out. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1797425086 From kbarrett at openjdk.org Fri Oct 11 21:15:27 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 11 Oct 2024 21:15:27 GMT Subject: RFR: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB [v2] In-Reply-To: <_UfXDCsUh7XlJS1APDR6uEdAFZyDktB56D3l5idS0OA=.f6ed2d6d-251d-4237-ad0d-3dd8298e538b@github.com> References: <69yscci9MIFeBcB0i9TAluXgucCy1EgzX4DScFxjPbc=.c28f7b1a-0b03-4677-83e2-95478a72f396@github.com> <_UfXDCsUh7XlJS1APDR6uEdAFZyDktB56D3l5idS0OA=.f6ed2d6d-251d-4237-ad0d-3dd8298e538b@github.com> Message-ID: On Fri, 4 Oct 2024 16:03:36 GMT, Vladimir Kozlov wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> remove surrounding whitespace > > Side note: please enable GHA testing for your repo. Thanks for reviews, @vnkozlov and @dean-long ------------- PR Comment: https://git.openjdk.org/jdk/pull/21324#issuecomment-2408124892 From kbarrett at openjdk.org Fri Oct 11 21:15:27 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 11 Oct 2024 21:15:27 GMT Subject: Integrated: 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB In-Reply-To: References: Message-ID: On Thu, 3 Oct 2024 12:50:55 GMT, Kim Barrett wrote: > Please review this change to TypeRawPtr::add_offset to prevent a compiler from > inferring things based on prior pointer arithmetic not invoking UB. As noted in > the bug report, clang is actually doing this. > > To accomplish this, changed to integral arithmetic. Also added over/underflow > checks. > > Also made a couple of minor touchups. Replaced an implicit conversion to bool > with an explicit compare to nullptr (per style guide). Removed a no longer > needed dummy return after a (now) noreturn function. > > Testing: mach5 tier1-7 > That testing was with calls to "fatal" for the over/underflow cases and the > sum==0 case. There were no hits. 
I'm not sure how to construct a test that > would hit those. This pull request has now been integrated. Changeset: 0a57fe1d Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/0a57fe1df6f3431cfb2d5d868597c61ef6af3806 Stats: 15 lines in 1 file changed: 8 ins; 2 del; 5 mod 8341178: TypeRawPtr::add_offset may be "miscompiled" due to UB Reviewed-by: dlong, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21324 From kvn at openjdk.org Fri Oct 11 21:19:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 11 Oct 2024 21:19:13 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v5] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 01:37:53 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > redo VM state Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21148#pullrequestreview-2363574423 From kvn at openjdk.org Fri Oct 11 21:19:14 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 11 Oct 2024 21:19:14 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 21:08:56 GMT, Dean Long wrote: >> The scenario which concerns me is performance-related. 
If CHA conservatively disables inlining when concurrent class redefinition takes place during parsing, then there is no mechanism in place to recuperate possible loss of performance. Normally, if inlined method is redefined later during compilation, nmethod installation fails during dependency validation. But here no inlining happens (missed optimization opportunity) and call sites in generated code are resolved based on symbolic information (except rare cases when resolved method is attached to the call site, see `SharedRuntime::find_callee_info_helper()` for details), so there are no guarantees the stale `Method*` is recorded. >> >> I agree that the window for such sequence of events is narrow, but it may be a source of surprising performance anomalies in rare cases. > > OK, it sounds like we have two choices. Either record an evol dependency every time we short-circuit an optimization based on is_old, or bail out. I vote for bailing out. I vote for bail out. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1797429171 From dlong at openjdk.org Fri Oct 11 21:34:17 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 11 Oct 2024 21:34:17 GMT Subject: RFR: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" [v4] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 20:38:24 GMT, Igor Veresov wrote: >> `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Address Dean's comment Marked as reviewed by dlong (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21455#pullrequestreview-2363588573 From vlivanov at openjdk.org Fri Oct 11 22:20:20 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 11 Oct 2024 22:20:20 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v3] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 21:15:21 GMT, Vladimir Kozlov wrote: >> OK, it sounds like we have two choices. Either record an evol dependency every time we short-circuit an optimization based on is_old, or bail out. I vote for bailing out. > > I vote for bail out. I prefer bailing out as well, but, please, check it doesn't mark the root method as non-compilable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21148#discussion_r1797464343 From sviswanathan at openjdk.org Fri Oct 11 23:35:06 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 11 Oct 2024 23:35:06 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 Message-ID: When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. Also a regression test case is added accordingly. 
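The byte counts in the description above can be checked with a small sketch. This is illustrative arithmetic only: `F2HFStoreWidth`, `storeBytes`, and `excessBytes` are hypothetical names and do not appear in HotSpot. The only assumption is that each converted float16 value occupies two bytes, as it does when held in a Java `short`.

```java
// Illustrative arithmetic only; F2HFStoreWidth is a hypothetical name, not HotSpot code.
final class F2HFStoreWidth {
    /** Size in bytes of one float16 value when stored as a short. */
    static final int HALF_BYTES = Short.BYTES; // 2

    /** Bytes written by a store of {@code elements} converted float16 values. */
    static int storeBytes(int elements) {
        return elements * HALF_BYTES;
    }

    /** Bytes written beyond the intended destination when a wider store is emitted. */
    static int excessBytes(int intendedElements, int emittedElements) {
        return storeBytes(emittedElements) - storeBytes(intendedElements);
    }

    public static void main(String[] args) {
        // A 2-element vector should store 4 bytes; a 4-element store writes 8.
        System.out.println("intended store: " + storeBytes(2) + " bytes");
        System.out.println("emitted store:  " + storeBytes(4) + " bytes");
        System.out.println("excess:         " + excessBytes(2, 4) + " bytes");
    }
}
```

Under these assumptions a 4-element memory store writes 4 bytes more than a 2-element vector should, which matches the "8 bytes instead of desired 4 bytes" figure in the description.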
Best Regards, Sandhya ------------- Commit messages: - 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 Changes: https://git.openjdk.org/jdk/pull/21480/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21480&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338126 Stats: 20 lines in 2 files changed: 19 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21480.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21480/head:pull/21480 PR: https://git.openjdk.org/jdk/pull/21480 From qamai at openjdk.org Sat Oct 12 10:30:51 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 12 Oct 2024 10:30:51 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v4] In-Reply-To: References: Message-ID: > Hi, > > This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. My proposal is that if a node is not reassigned inside a loop, and will be spilt there, we spill it eagerly at the loop entry instead. This can lead to more reload inside the loop, but as the loop-carried dependencies are eliminated, a load is negligible. > > Please take a look and leave your reviews, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: add LoopAwaredSpilling flag, refine implementation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21472/files - new: https://git.openjdk.org/jdk/pull/21472/files/b6e78eb8..5f572bbb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=02-03 Stats: 167 lines in 3 files changed: 97 ins; 6 del; 64 mod Patch: https://git.openjdk.org/jdk/pull/21472.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21472/head:pull/21472 PR: https://git.openjdk.org/jdk/pull/21472 From qamai at openjdk.org Sat Oct 12 10:52:50 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 12 Oct 2024 10:52:50 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v5] In-Reply-To: References: Message-ID: > Hi, > > This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. > > My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites, this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the otherhand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. 
This eliminates loop-carried dependencies. > > A downfall of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt. > > - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive. > - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly. > - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: add comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21472/files - new: https://git.openjdk.org/jdk/pull/21472/files/5f572bbb..74fbc7d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=03-04 Stats: 22 lines in 1 file changed: 19 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21472.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21472/head:pull/21472 PR: https://git.openjdk.org/jdk/pull/21472 From qamai at openjdk.org Sat Oct 12 10:55:10 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 12 Oct 2024 10:55:10 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v4] In-Reply-To: References: Message-ID: On Sat, 12 Oct 2024 10:30:51 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. 
This introduces loop-carried dependencies, greatly reduces loop throughput. >> >> My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites, this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the otherhand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies. >> >> A downfall of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt. >> >> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive. >> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly. >> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add LoopAwaredSpilling flag, refine implementation New benchmark results:

                                                              Before                  After
Benchmark                           (prob)  Mode  Cnt       Score      Error      Score     Error  Units
LoopCounterBench.field_ret             N/A  avgt    5     425.678 ±    5.086    419.819 ±   1.965  ns/op
LoopCounterBench.localVar_ret          N/A  avgt    5    1126.937 ±    1.078    325.651 ±   5.309  ns/op
LoopCounterBench.reloadAtEntry_ret     N/A  avgt    5     582.465 ±    2.649    491.421 ±   0.909  ns/op
LoopCounterBench.spillUncommon_ret     0.0  avgt    5     490.901 ±    5.505    490.981 ±   2.118  ns/op
LoopCounterBench.spillUncommon_ret    0.01  avgt    5    2491.557 ±    4.837   1912.170 ±  19.208  ns/op
LoopCounterBench.spillUncommon_ret     0.1  avgt    5   21316.571 ±   88.198  10518.618 ± 183.380  ns/op
LoopCounterBench.spillUncommon_ret     0.2  avgt    5   42095.064 ±  210.995  19908.240 ± 313.108  ns/op
LoopCounterBench.spillUncommon_ret     0.5  avgt    5  113825.492 ± 1637.428  48194.341 ± 719.049  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2408520138 From qamai at openjdk.org Sun Oct 13 07:03:04 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 13 Oct 2024 07:03:04 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v6] In-Reply-To: References: Message-ID: > Hi, > > This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. > > My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites, this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the otherhand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies.
> > A downfall of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt. > > - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive. > - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly. > - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: refine comments + typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21472/files - new: https://git.openjdk.org/jdk/pull/21472/files/74fbc7d5..12d1a2b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=04-05 Stats: 10 lines in 2 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/21472.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21472/head:pull/21472 PR: https://git.openjdk.org/jdk/pull/21472 From shade at openjdk.org Sun Oct 13 08:04:15 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Sun, 13 Oct 2024 08:04:15 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v6] In-Reply-To: References: Message-ID: On Sun, 13 Oct 2024 07:03:04 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. 
This introduces loop-carried dependencies, greatly reducing loop throughput. >> >> My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites; this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the other hand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies. >> >> A downfall of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt. >> >> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive. >> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly. >> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that. >> >> Please take a look and leave your reviews, thanks a lot.
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > refine comments + typo This was really found by @rschwietzke, maybe he would like to test it :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2408872608 From jbhateja at openjdk.org Sun Oct 13 09:57:00 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 13 Oct 2024 09:57:00 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v24] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max. > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of which is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts.
> - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Update adlc changes. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/ce76c3e5..506ae299 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=22-23 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Sun Oct 13 11:18:01 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 13 Oct 2024 11:18:01 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. 
The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundsException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > >
> Benchmark                                     (size)   Mode  Cnt      Score   Error   Units
> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762          ops/ms
> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550          ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605          ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004          ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758          ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192          ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459          ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556          ops/ms
> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830          ops/ms
> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174          ops/ms
> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804          ops/ms
> SelectFromBenchmark.selectFromIntVector         2048  thrpt    2   5398.2...
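The two-vector `selectFrom` semantics described above can be modelled in plain scalar Java. This is only a sketch of the lane-wise behaviour as the description states it, not the Vector API implementation: the class `SelectFromSketch` and its method are hypothetical, `int[]` arrays stand in for vectors, and the out-of-range behaviour follows the wording of this revision of the description.

```java
// Hypothetical scalar model of two-vector selectFrom; arrays stand in for vectors.
class SelectFromSketch {
    // index[lane] in [0, VLEN) selects v1[index[lane]];
    // index[lane] in [VLEN, 2*VLEN) selects v2[index[lane] - VLEN].
    public static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all three vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            int idx = index[lane];
            // Assumption: out-of-range indices throw, as stated in the description above.
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("index " + idx + " not in [0, " + (2 * vlen) + ")");
            }
            result[lane] = (idx < vlen) ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }
}
```

With `index = {0, 5, 2, 7}`, `v1 = {10, 11, 12, 13}` and `v2 = {20, 21, 22, 23}`, lanes 1 and 3 read from the second table because their indices are at least `VLEN = 4`.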
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Updating tests to use floorMod ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/1cca8e24..79ee29c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=15-16 Stats: 31 lines in 31 files changed: 0 ins; 0 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Sun Oct 13 17:12:11 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 13 Oct 2024 17:12:11 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v4] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 18:52:30 GMT, Smita Kamath wrote: >> Hi, I want to submit an optimization for SHA-512 algorithm using SHA instructions (sha512msg1, sha512msg2 and sha512rnds2) . Kindly review the code and provide feedback. Thank you. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated code as per review comments Marked as reviewed by jbhateja (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20633#pullrequestreview-2364957828 From jbhateja at openjdk.org Sun Oct 13 17:12:12 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 13 Oct 2024 17:12:12 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v3] In-Reply-To: <0EVgFAWZ9O9e_dlRj4Na8Xf7QEC6FGU03wAs1bwVq5c=.ff19b62d-3b2c-4320-8be2-c7dee7cafdae@github.com> References: <0EVgFAWZ9O9e_dlRj4Na8Xf7QEC6FGU03wAs1bwVq5c=.ff19b62d-3b2c-4320-8be2-c7dee7cafdae@github.com> Message-ID: On Thu, 10 Oct 2024 23:13:11 GMT, Srinivas Vamsi Parasa wrote: >> We can do this change in a separate PR. > > I agree with Smita. 
The current implementation has a one-to-one correspondence with the ipsec implementation. Any new changes or refactoring could be implemented as a separate PR. I agree. In principle, any optimization crafted for AVX2 is also applicable to AVX512 targets. In the future, with AVX10.2 (converged ISA), we will have 256-bit flavors of the two-table permute for non-AVX512 targets. For now, AVX-SHA512 is only available on client parts (upcoming Lunar Lake), and it's OK to follow the IPsec algorithm in toto. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20633#discussion_r1798463742 From thartmann at openjdk.org Mon Oct 14 05:30:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 14 Oct 2024 05:30:23 GMT Subject: RFR: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found [v2] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 14:23:48 GMT, Tobias Hartmann wrote: >> Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses. >> >> Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already: >> https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965 >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/c2/TestCallDevirtualizationWithInfiniteLoop.java > > Co-authored-by: Christian Hagedorn Thanks for the reviews, Vladimir and Vladimir!
------------- PR Comment: https://git.openjdk.org/jdk/pull/21450#issuecomment-2409974472 From thartmann at openjdk.org Mon Oct 14 05:30:24 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 14 Oct 2024 05:30:24 GMT Subject: Integrated: 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 13:29:11 GMT, Tobias Hartmann wrote: > Post-parse call devirtualization asserts when calling `CallNode::extract_projections` on a virtual call that does not have the `fallthrough_ioproj` anymore. The projection was removed because the call is followed by an endless loop that does not have any IO uses. > > Similar to incremental inlining, we should not assert that all call projections are still there for post-parse call devirtualization because parts of the graph might have been removed already: > https://github.com/openjdk/jdk/blob/580eb62dc097efeb51c76b095c1404106859b673/src/hotspot/share/opto/callnode.cpp#L963-L965 > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 8d0975a2 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/8d0975a27d826f7aa487a612131827586abaefd5 Stats: 85 lines in 4 files changed: 79 ins; 0 del; 6 mod 8336726: C2: assert(!do_asserts || projs->fallthrough_ioproj != nullptr) failed: must be found Reviewed-by: chagedorn, kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/21450 From duke at openjdk.org Mon Oct 14 06:41:12 2024 From: duke at openjdk.org (Rene Schwietzke) Date: Mon, 14 Oct 2024 06:41:12 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v4] In-Reply-To: References: Message-ID: On Sat, 12 Oct 2024 10:52:11 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> add LoopAwaredSpilling flag, refine implementation > > New benchmark results: > >
>                                                         Before                    After
> Benchmark                           (prob)  Mode  Cnt       Score      Error        Score     Error  Units
> LoopCounterBench.field_ret             N/A  avgt    5     425.678 ±    5.086      419.819 ±   1.965  ns/op
> LoopCounterBench.localVar_ret          N/A  avgt    5    1126.937 ±    1.078      325.651 ±   5.309  ns/op
> LoopCounterBench.reloadAtEntry_ret     N/A  avgt    5     582.465 ±    2.649      491.421 ±   0.909  ns/op
> LoopCounterBench.spillUncommon_ret     0.0  avgt    5     490.901 ±    5.505      490.981 ±   2.118  ns/op
> LoopCounterBench.spillUncommon_ret    0.01  avgt    5    2491.557 ±    4.837     1912.170 ±  19.208  ns/op
> LoopCounterBench.spillUncommon_ret     0.1  avgt    5   21316.571 ±   88.198    10518.618 ± 183.380  ns/op
> LoopCounterBench.spillUncommon_ret     0.2  avgt    5   42095.064 ±  210.995    19908.240 ± 313.108  ns/op
> LoopCounterBench.spillUncommon_ret     0.5  avgt    5  113825.492 ± 1637.428    48194.341 ± 719.049  ns/op

Sure thing, I will give it a try in the coming days. @merykitty's results look promising.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2409100311 From epeter at openjdk.org Mon Oct 14 07:50:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 14 Oct 2024 07:50:37 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v22] In-Reply-To: References: Message-ID: > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. > > **The CompileFramework** > > Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. > An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. > > I implemented a first, simple version of the framework. I added some tests and examples.
> > **Example** > >
> CompileFramework comp = new CompileFramework();
> comp.add(SourceCode.newJavaSourceCode("XYZ", ""));
> comp.compile();
> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5);
> > > https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 > > **Below some use cases: tests that would have been better with the CompileFramework** > > **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** > > I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. > > For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, > > to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW...
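The generate, compile, invoke workflow described above can be approximated with the standard `javax.tools` API when the CompileFramework itself is not available. This is a minimal sketch of the general technique only, not the framework's implementation: the class `CompileFromStringSketch` and the helper `compileAndInvoke` are hypothetical names, only Java sources are handled (no jasm), and a full JDK is assumed at run time.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper; it only illustrates the compile-then-invoke idea.
class CompileFromStringSketch {
    // Writes `source` to a temp directory, compiles it with the system javac,
    // loads class `className`, and invokes its static method `methodName`.
    public static Object compileAndInvoke(String className, String source,
                                          String methodName, Object... args) {
        try {
            Path dir = Files.createTempDirectory("compile-sketch");
            Path file = dir.resolve(className + ".java");
            Files.writeString(file, source);

            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            if (javac == null) {
                throw new IllegalStateException("no system compiler: a JDK is required");
            }
            // Null streams mean "use System.in / System.out / System.err".
            int rc = javac.run(null, null, null, "-d", dir.toString(), file.toString());
            if (rc != 0) {
                throw new IllegalStateException("compilation failed with exit code " + rc);
            }

            try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
                Class<?> clazz = loader.loadClass(className);
                for (Method m : clazz.getMethods()) {
                    if (m.getName().equals(methodName)) {
                        return m.invoke(null, args); // static method, so no receiver
                    }
                }
                throw new IllegalStateException("no public method named " + methodName);
            }
        } catch (IllegalStateException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("compile-and-invoke failed", e);
        }
    }
}
```

A generated test would then be driven by something like `compileAndInvoke("XYZ", generatedSource, "test", 5)`, where `generatedSource` is the randomly produced `String` holding a `public class XYZ` with a static `test` method.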
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Add example where I use the framework with VM flags ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/ad3865bb..2d4a8ff0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=20-21 Stats: 132 lines in 2 files changed: 132 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Mon Oct 14 08:36:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 14 Oct 2024 08:36:12 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v23] In-Reply-To: References: Message-ID: > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. 
At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. > > **The CompileFramework** > > Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. > An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. > > I implemented a first, simple version of the framework. I added some tests and examples. > > **Example** > >
> CompileFramework comp = new CompileFramework();
> comp.add(SourceCode.newJavaSourceCode("XYZ", ""));
> comp.compile();
> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5);
> > > https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 > > **Below some use cases: tests that would have been better with the CompileFramework** > > **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** > > I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. > > For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, > > to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 76 additional commits since the last revision: - Merge branch 'master' into fuzzer-test - test refactoring - Add example where I use the framework with VM flags - Apply suggestions from code review Co-authored-by: Christian Hagedorn - move some code for Christian - more for Christian - Apply suggestions from code review Co-authored-by: Christian Hagedorn - another small suggestion from Christian - more fixup for Christian - Apply suggestions from code review Co-authored-by: Christian Hagedorn - ... and 66 more: https://git.openjdk.org/jdk/compare/3e81a0a4...5178e7c2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/2d4a8ff0..5178e7c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=21-22 Stats: 211263 lines in 1584 files changed: 195772 ins; 7833 del; 7658 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From thartmann at openjdk.org Mon Oct 14 08:54:28 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 14 Oct 2024 08:54:28 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type Message-ID: After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`: https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723 Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` because the type is not a primitive array type: https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971 -> 
https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded: https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979 I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded. 
Thanks, Tobias ------------- Commit messages: - First prototype Changes: https://git.openjdk.org/jdk/pull/21470/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21470&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339694 Stats: 111 lines in 4 files changed: 108 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21470.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21470/head:pull/21470 PR: https://git.openjdk.org/jdk/pull/21470 From duke at openjdk.org Mon Oct 14 08:59:18 2024 From: duke at openjdk.org (Rene Schwietzke) Date: Mon, 14 Oct 2024 08:59:18 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v6] In-Reply-To: References: Message-ID: <3rG5xHJ4ikWxAeo6XP_XCbrUCuRa9M8KDUpU7L1iEOU=.fb54b7ad-056a-4f06-b034-1d3123a44db7@github.com> On Sun, 13 Oct 2024 07:03:04 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reducing loop throughput. >> >> My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites; this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the other hand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies.
>> >> A downfall of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt. >> >> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive. >> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly. >> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > refine comments + typo Fix confirmed. Performance matches the user expectation when pulling data local. I will look into the runtime difference for the plain loop and systemCopy.

### Old - JDK 23.0.0
Benchmark                                   (SIZE)  Mode  Cnt     Score     Error  Units
Example8ArrayCopying.manualCopy1              1000  avgt   10    70.222 ±   3.549  ns/op
Example8ArrayCopying.manualCopy2              1000  avgt   10    70.011 ±   0.880  ns/op
Example8ArrayCopying.manualCopyAntiUnroll1    1000  avgt   10   394.275 ±  20.067  ns/op
Example8ArrayCopying.manualCopyAntiUnroll2    1000  avgt   10   636.158 ± 101.505  ns/op
Example8ArrayCopying.manualCopyAntiUnroll3    1000  avgt   10  1646.330 ±  23.042  ns/op
Example8ArrayCopying.systemCopy               1000  avgt   10    74.845 ±   1.535  ns/op

### New - JDK 24-internal (merrykitty/improveregalloc, 12d1a2b21fc62145dac04fecf43f267f539b2aa5)
Example8ArrayCopying.manualCopy1              1000  avgt   10    80.155 ±   4.504  ns/op
Example8ArrayCopying.manualCopy2              1000  avgt   10    81.122 ±   3.074  ns/op
Example8ArrayCopying.manualCopyAntiUnroll1    1000  avgt   10   394.094 ±   6.809  ns/op
Example8ArrayCopying.manualCopyAntiUnroll2    1000  avgt   10   626.155 ±  13.055  ns/op
Example8ArrayCopying.manualCopyAntiUnroll3    1000  avgt   10   564.199 ±  23.854  ns/op
Example8ArrayCopying.systemCopy               1000  avgt   10    99.393 ±   0.634  ns/op

Source code for reference: https://github.com/Xceptance/jmh-training/blob/1dbcc9c38553b0e8b683c6f70475a25150b66635/src/main/java/org/xc/jmh/Example8ArrayCopying.java ------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2410501449 From amitkumar at openjdk.org Mon Oct 14 09:03:10 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 14 Oct 2024 09:03:10 GMT Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double In-Reply-To: References: Message-ID: <1pH2MHT0z7llvLP9DFnu1H9V1YKdEHOfDvksC1nEhVk=.800bafbe-598d-47e7-b047-ad4cab5d73e5@github.com> On Fri, 4 Oct 2024 10:39:25 GMT, Amit Kumar wrote: > This is a trivial PR to change the data type of some "*Threshold" variables from `intx` to `double`. Hi, can I get reviews for this trivial change? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21354#issuecomment-2410511223 From thartmann at openjdk.org Mon Oct 14 09:31:33 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 14 Oct 2024 09:31:33 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v2] In-Reply-To: References: Message-ID: > After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`: > > https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723 > > Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` because the type is not a primitive array type: > > https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971 > > -> >
https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 > > The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. > > Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded: > https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979 > > I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded. 
> > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Missed a return ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21470/files - new: https://git.openjdk.org/jdk/pull/21470/files/94259abe..0e9e0219 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21470&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21470&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21470.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21470/head:pull/21470 PR: https://git.openjdk.org/jdk/pull/21470 From thartmann at openjdk.org Mon Oct 14 10:46:50 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 14 Oct 2024 10:46:50 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v3] In-Reply-To: References: Message-ID: > After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`: > > https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723 > > Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` because the type is not a primitive array type: > > https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971 > > -> > https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 > > The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. 
> > Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded: > https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979 > > I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Modified ciTypeFlow::can_trap ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21470/files - new: https://git.openjdk.org/jdk/pull/21470/files/0e9e0219..4a48a793 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21470&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21470&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21470.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21470/head:pull/21470 PR: https://git.openjdk.org/jdk/pull/21470 From chagedorn at openjdk.org Mon Oct 14 11:06:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 14 Oct 2024 11:06:30 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v23] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 08:36:12 GMT, Emanuel Peter wrote: >> **Motivation** >> >> I want to write small dedicated 
fuzzers:
>> - Generate `java` and `jasm` source code: just some `String`.
>> - Quickly compile it (with this framework).
>> - Execute the compiled code.
>>
>> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very, very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**.
>>
>> **The CompileFramework**
>>
>> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`.
>> An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests.
>>
>> I implemented a first, simple version of the framework. I added some tests and examples.
>>
>> **Example**
>>
>>
>> CompileFramework comp = new CompileFramework();
>> comp.add(SourceCode.newJavaSourceCode("XYZ", ""));
>> comp.compile();
>> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5);
>>
>>
>> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74
>>
>> **Below some use cases: tests that would have been better with the CompileFramework**
>>
>> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java**
>>
>> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`.
>>
>> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`,
>>
>> to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something i...
>
> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 76 additional commits since the last revision:
>
>  - Merge branch 'master' into fuzzer-test
>  - test refactoring
>  - Add example where I use the framework with VM flags
>  - Apply suggestions from code review
>
>    Co-authored-by: Christian Hagedorn
>  - move some code for Christian
>  - more for Christian
>  - Apply suggestions from code review
>
>    Co-authored-by: Christian Hagedorn
>  - another small suggestion from Christian
>  - more fixup for Christian
>  - Apply suggestions from code review
>
>    Co-authored-by: Christian Hagedorn
>  - ... and 66 more: https://git.openjdk.org/jdk/compare/bcd1673b...5178e7c2

Nice that you added an additional test. Still looks good.
test/hotspot/jtreg/compiler/lib/compile_framework/README.md line 51: > 49: Should one require the modified classpath that includes the compiled classes, this is available with `compileFramework.getEscapedClassPathOfCompiledClasses()`. This can be necessary if the test launches any other VMs that also access the compiled classes. This is for example necessary when using the IR Framework. > 50: > 51: ### Running the compiled code in a new VM Following the capital letter style from the other titles: Suggestion: ### Running the Compiled Code in a New VM test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/RunWithFlagsExample.java line 66: > 64: CompileFramework comp = new CompileFramework(); > 65: > 66: // Add a java source file. Suggestion: // Add a Java source file. test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/RunWithFlagsExample.java line 78: > 76: comp.getEscapedClassPathOfCompiledClasses(), > 77: // Pass additional flags here. > 78: // "-Xbatch" is a harmless VM flag, so this example runs everywhere without issue. Suggestion: // "-Xbatch" is a harmless VM flag, so this example runs everywhere without issues. test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/RunWithFlagsExample.java line 87: > 85: > 86: // Execute the command, and capture the output. > 87: // The JTREG VM options are automatically passed to the test VM. Suggestion: // The JTREG Java and VM options are automatically passed to the test VM. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20184#pullrequestreview-2366239197
PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1799268346
PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1799272237
PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1799272889
PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1799274775

From duke at openjdk.org  Mon Oct 14 11:25:20 2024
From: duke at openjdk.org (Antón Seoane)
Date: Mon, 14 Oct 2024 11:25:20 GMT
Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v6]
In-Reply-To:
References:
Message-ID:

On Wed, 9 Oct 2024 09:13:37 GMT, Antón Seoane wrote:

>> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient.
>>
>> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level.
>>
>> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter.
>>
>> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket.
>
> Antón Seoane has updated the pull request incrementally with one additional commit since the last revision:
>
>   Replace LogDecorators::Decorator::NoDecorator with LogDecorators::None

Right now, no space is being added at the beginning. I agree that it should be important to at least distinguish decorators from the message itself and I hope to address that, alongside multiline/grouping, in a future PR (adding a space between them seems an easy and sensible choice IMO, and in the case of no decorators we would just have a starting space, which does not affect human readability).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21383#issuecomment-2410918753

From jbhateja at openjdk.org  Mon Oct 14 12:15:11 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 14 Oct 2024 12:15:11 GMT
Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2]
In-Reply-To:
References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com>
Message-ID:

On Fri, 11 Oct 2024 17:12:49 GMT, Jasmine Karthikeyan wrote:

> > I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering`
>
> I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for better performance with bit masks, but ran into a problem where it doesn't have an immediate encoding so we'd need to manifest a constant into a temporary register every time. With an (x86-specific) ideal node, we could simply let the register allocator handle placing the constant. It would also be nice to avoid needing to put similar backend-specific lowerings (such as `MacroLogicV`) in shared code.

Hey @jaskarth, @merykitty, we already have an infrastructure where, during parsing, we create macro nodes that can be lowered / expanded to multiple IR nodes during macro expansion. What we need in this case is a target-specific IR pattern check, since not all targets may support 32x32 multiplication with quadword saturation. The idea is to avoid creating a new IR node and piggyback the needed information on the existing MulVL IR; we already use such tricks for relaxed unsafe reductions. The patch performs a point optimization for a specific set of constrained multiplication patterns.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411053693

From thartmann at openjdk.org  Mon Oct 14 12:18:11 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 14 Oct 2024 12:18:11 GMT
Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2
In-Reply-To:
References:
Message-ID:

On Fri, 11 Oct 2024 23:27:35 GMT, Sandhya Viswanathan wrote:

> When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above.
> Also a regression test case is added accordingly.
>
> Best Regards,
> Sandhya

That looks good to me. @eme64 should have a look as well.

I submitted testing and will report back once it has passed.

-------------

Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21480#pullrequestreview-2366448884 From epeter at openjdk.org Mon Oct 14 12:21:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 14 Oct 2024 12:21:16 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 23:27:35 GMT, Sandhya Viswanathan wrote: > When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. > Also a regression test case is added accordingly. > > Best Regards, > Sandhya test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java line 76: > 74: sout[i+1] = Float.floatToFloat16(finp[i+1]); > 75: } > 76: } Your test looks different than the one that I added on JIRA. Can you please add that one as well? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1799399301 From epeter at openjdk.org Mon Oct 14 12:26:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 14 Oct 2024 12:26:14 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 23:27:35 GMT, Sandhya Viswanathan wrote: > When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. > Also a regression test case is added accordingly. 
> > Best Regards, > Sandhya src/hotspot/cpu/x86/x86.ad line 3679: > 3677: > 3678: instruct vconvF2HF_mem_reg(memory mem, vec src) %{ > 3679: predicate(Matcher::vector_length_in_bytes(n->in(3)->in(1)) >= 16); Ok, and so what alternative path is the matcher now going to take if we have `vector_length = 8`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1799405906 From epeter at openjdk.org Mon Oct 14 12:33:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 14 Oct 2024 12:33:44 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v24] In-Reply-To: References: Message-ID: > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. > > **The CompileFramework** > > Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. 
> An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests.
>
> I implemented a first, simple version of the framework. I added some tests and examples.
>
> **Example**
>
>
> CompileFramework comp = new CompileFramework();
> comp.add(SourceCode.newJavaSourceCode("XYZ", ""));
> comp.compile();
> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5);
>
>
> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74
>
> **Below some use cases: tests that would have been better with the CompileFramework**
>
> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java**
>
> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`.
>
> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`,
>
> to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW...
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Update test/hotspot/jtreg/compiler/lib/compile_framework/README.md Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/5178e7c2..4eeab363 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=22-23 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From qamai at openjdk.org Mon Oct 14 13:45:23 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 14 Oct 2024 13:45:23 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v6] In-Reply-To: References: Message-ID: On Sun, 13 Oct 2024 07:03:04 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. >> >> My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites, this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. 
On the other hand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies.
>>
>> A downfall of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt.
>>
>> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive.
>> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly.
>> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that.
>>
>> Please take a look and leave your reviews, thanks a lot.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
>   refine comments + typo

Thanks for the source code. That's really interesting: running the benchmark multiple times may give different results, and even when there is a difference in the observed throughputs, the 2 compiled methods are exactly the same. So I think we are running into different quirks here, probably due to the fact that this benchmark saturates the memory bandwidth.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2411312474 From qamai at openjdk.org Mon Oct 14 14:14:13 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 14 Oct 2024 14:14:13 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: <2g_Hm5UuVBqoklekkaxtnYn05JYKmosnzaMefQi_q3s=.aea039bb-d80c-4863-986b-d73d7cf71fcc@github.com> On Mon, 14 Oct 2024 12:12:58 GMT, Jatin Bhateja wrote: >>> I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering` >> >> I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for better performance with bit masks, but ran into a problem where it doesn't have an immediate encoding so we'd need to manifest a constant into a temporary register every time. With an (x86-specific) ideal node, we could simply let the register allocator handle placing the constant. It would also be nice to avoid needing to put similar backend-specific lowerings (such as `MacroLogicV`) in shared code. > >> > I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering` >> >> I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for better performance with bit masks, but ran into a problem where it doesn't have an immediate encoding so we'd need to manifest a constant into a temporary register every time. With an (x86-specific) ideal node, we could simply let the register allocator handle placing the constant. It would also be nice to avoid needing to put similar backend-specific lowerings (such as `MacroLogicV`) in shared code. 
> > Hey @jaskarth , @merykitty , we already have an infrastructure where during parsing we create Macro Nodes which can be lowered / expanded to multiple IRs nodes during macro expansion, what we need in this case is a target specific IR pattern check since not all targets may support 32x32 multiplication with quadword saturation, idea is to avoid creating a new IR and piggyback needed information on existing MulVL IR, we already use such tricks for relaxed unsafe reductions. Going forward, infusion of KnownBits into our data flow analysis infrastructure will streamline such optimizations, this patch is performing point optimization for specific set of constrained multiplication patterns. @jatin-bhateja That is machine-independent lowering, we are talking about machine-dependent lowering to which `MacroLogicV` transformation belongs. You can have `phaselowering_x86` and not have to add another method to `Matcher` as well as add default implementations to various architecture files. You can reuse `MulVL` node for that but I believe these transformations should be done as late as possible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411389030 From qamai at openjdk.org Mon Oct 14 14:17:09 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 14 Oct 2024 14:17:09 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v7] In-Reply-To: References: Message-ID: > Hi, > > This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. > > My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. 
In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites. This may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the other hand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies.
>
> A downfall of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt.
>
> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive.
> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly.
> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that.
>
> Please take a look and leave your reviews, thanks a lot.
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: fix uncommon_freq ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21472/files - new: https://git.openjdk.org/jdk/pull/21472/files/12d1a2b2..1d36cb4c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21472&range=05-06 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21472.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21472/head:pull/21472 PR: https://git.openjdk.org/jdk/pull/21472 From jkarthikeyan at openjdk.org Mon Oct 14 15:07:21 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 14 Oct 2024 15:07:21 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR pallets. >> >> >> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. 
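
[Editorial illustration, not part of the quoted message.] The partial-product description above can be checked with plain scalar arithmetic. The following is a minimal sketch in ordinary Java, assuming nothing about the actual C2 implementation; the class and method names (`PartialProductSketch`, `mulLowFromPartialProducts`, `mulLowHalves`) are hypothetical and exist only for this example. It shows that the low 64 bits of a 64x64 product are the low-by-low product plus the shifted cross terms, so when both upper halves are zero only the low-by-low product remains.

```java
// Hypothetical illustration only; not part of the patch or of HotSpot.
class PartialProductSketch {
    static final long LOW_MASK = 0xFFFFFFFFL;

    // Low 64 bits of a * b, assembled from 32-bit partial products.
    static long mulLowFromPartialProducts(long a, long b) {
        long aLo = a & LOW_MASK, aHi = a >>> 32;
        long bLo = b & LOW_MASK, bHi = b >>> 32;
        long lowProduct = aLo * bLo;         // 32x32 -> 64 unsigned product
        long cross = aHi * bLo + aLo * bHi;  // cross terms, weighted by 2^32
        // aHi * bHi is weighted by 2^64, so it never reaches the low 64 bits.
        return lowProduct + (cross << 32);
    }

    // What an unsigned 32x32 -> 64 multiply of the low halves computes.
    static long mulLowHalves(long a, long b) {
        return (a & LOW_MASK) * (b & LOW_MASK);
    }

    public static void main(String[] args) {
        long a = 0x0123456789ABCDEFL;
        long b = 0xFEDCBA9876543210L;
        // The decomposition agrees with the ordinary 64-bit multiply.
        System.out.println(mulLowFromPartialProducts(a, b) == a * b);

        long x = a & LOW_MASK;  // upper 32 bits cleared, as with And 0xFFFFFFFF
        long y = b >>> 32;      // upper 32 bits cleared, as with URShift 32
        // With zero upper halves, the low-halves product is the full product.
        System.out.println(mulLowHalves(x, y) == x * y);
    }
}
```

Both comparisons print `true`. The second case corresponds to the operand shapes listed in the PR description (`And` with `0xFFFFFFFF`, or `URShift` by 32), where the cross terms vanish.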
>> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. >> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimization:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 > - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction For the record I think in this PR we could simply match the IR patterns in the ad file, since (from my understanding) the patterns we are matching could be supported there. 
We should do platform-specific lowering in a separate patch because it is pretty nuanced, and we could potentially move it to the new system afterwards. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411538179 From psandoz at openjdk.org Mon Oct 14 15:37:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 14 Oct 2024 15:37:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Sun, 13 Oct 2024 11:18:01 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Updating tests to use floorMod Marked as reviewed by psandoz (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2367019017 From iveresov at openjdk.org Mon Oct 14 16:48:27 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Mon, 14 Oct 2024 16:48:27 GMT Subject: Integrated: 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" In-Reply-To: References: Message-ID: <1Oi70hebODn90MIbP6HFaaHObA0zX57DCgsh-4LnJK8=.400ceef1-812d-4cea-8f75-50f3d36a210c@github.com> On Thu, 10 Oct 2024 15:22:20 GMT, Igor Veresov wrote: > `CacheWB` nodes are peculiar in a sense that they both are anti-dependent and produce memory. I think it's reasonable to relax the assert in `insert_anti_dependences()` to work around their properties. This pull request has now been integrated. 
Changeset: a8a8b2de Author: Igor Veresov URL: https://git.openjdk.org/jdk/commit/a8a8b2deba854ac105ed760c09e65701c4d0f6fc Stats: 13 lines in 2 files changed: 10 ins; 2 del; 1 mod 8341831: PhaseCFG::insert_anti_dependences asserts with "no loads" Reviewed-by: dlong, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21455 From aboldtch at openjdk.org Mon Oct 14 17:36:13 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 14 Oct 2024 17:36:13 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v6] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 09:13:37 GMT, Antón Seoane wrote: >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. >> >> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter.
>> >> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. > > Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: > > Replace LogDecorators::Decorator::NoDecorator with LogDecorators::None I have a few smaller nits, and one larger issue. The rest of the implementation and logic looks fine. src/hotspot/share/logging/logDecorators.cpp line 30: > 28: const LogLevelType AnyLevel = LogLevelType::NotMentioned; > 29: #define UNDECORATED_DEFAULTS \ > 30: UNDECORATED_DEFAULT(AnyLevel, LOG_TAGS(jit, inlining)) Maybe move this down next to where it is used and then `#undef UNDECORATED_DEFAULTS` src/hotspot/share/logging/logDecorators.cpp line 55: > 53: #define UNDECORATED_DEFAULT(level, ...) LogDecorators::DefaultUndecoratedSelection(level, __VA_ARGS__), > 54: UNDECORATED_DEFAULTS > 55: #undef UNDECORATED_TAGSET Suggestion: #undef UNDECORATED_DEFAULT Typo, I think this was meant to match the `#define`. src/hotspot/share/logging/logDecorators.cpp line 57: > 55: #undef UNDECORATED_TAGSET > 56: }; > 57: const size_t LogDecorators::number_of_default_decorators = sizeof(default_decorators) / sizeof(LogDecorators::DefaultUndecoratedSelection); I think this reads better and is less error-prone. Suggestion: const size_t LogDecorators::number_of_default_decorators = ARRAY_SIZE(default_decorators); src/hotspot/share/logging/logDecorators.hpp line 142: > 140: // Check if we have some default decorators for a given LogSelection.
If that is the case, > 141: // the output parameter mask will contain the defaults-specified decorators mask > 142: static bool has_disabled_default_decorators(const LogSelection& selection, const DefaultUndecoratedSelection* defaults = default_decorators, size_t defaults_count = number_of_default_decorators); I was trying to think if we could make this mockable without the incomplete object types (`const DefaultUndecoratedSelection* defaults = default_decorators, size_t defaults_count = number_of_default_decorators`). Maybe have the mockable part private (we already friend the gtest) and only have a `static bool has_disabled_default_decorators(const LogSelection& selection)` public (which calls this on the inside). But I am fine with this as it currently is. ------------- Changes requested by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21383#pullrequestreview-2367213509 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1799860489 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1799862505 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1799845556 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1799869065 From aboldtch at openjdk.org Mon Oct 14 17:39:12 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 14 Oct 2024 17:39:12 GMT Subject: RFR: 8341622: Tag-specific disabled default decorators for UnifiedLogging [v6] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 09:13:37 GMT, Antón Seoane wrote: >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. However, some specific logging cases do not need decorations, and manually having to disable them results in cumbersome extra input and loss of ergonomics. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient.
>> >> To address this, this PR enables the possibility of adding tag-specific disabling of default decorators to UL. These disables are in no way overriding user input -- they will only act whenever -Xlog has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on an inclusion rule: e.g. if -Xlog:jit+compilation is provided, a default for jit may be applied. Additionally, defaults may target a specific log level. >> >> The original use case for this is related to C2 logging migration to UnifiedLogging, as currently no decorators are found in compiler logs and it would be expected to stay the same without the extra explicit removal every time via -Xlog. However, this would ease the migration of other logging that was initially deterred by this, such as -XX:+PrintInterpreter. >> >> This PR is a simplification of the [8340363](https://bugs.openjdk.org/browse/JDK-8340363) (closed) ticket. > > Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: > > Replace LogDecorators::Decorator::NoDecorator with LogDecorators::None Seems like one of my comments (the large one) got lost. Trying this again. :) src/hotspot/share/logging/logDecorators.hpp line 96: > 94: > 95: const LogSelection& selection() const { return _selection; } > 96: }; I am uncomfortable with this type erasure. `LogTagType[LogTag::MaxTags + 1 /* = 6 */]` -> `LogTagType*` -> `LogTagType[LogTag::MaxTags /* = 5 */]`. I think this should be rewritten so that `tag_arr` is typed as a `LogTagType[5]`. I think everywhere we have a `const LogTagType parameter[LogTag::MaxTags]` really should have been `const LogTagType (&parameter)[LogTag::MaxTags]` so that this would have been a compile error.
My suggestion is to either do the following: Suggestion: public: DefaultUndecoratedSelection(LogLevelType level, LogTagType t0, LogTagType t1 = LogTag::__NO_TAG, LogTagType t2 = LogTag::__NO_TAG, LogTagType t3 = LogTag::__NO_TAG, LogTagType t4 = LogTag::__NO_TAG, LogTagType guard_tag = LogTag::__NO_TAG) : _selection(LogSelection::Invalid) { assert(guard_tag == LogTag::__NO_TAG, "Too many tags specified!"); LogTagType tag_arr[LogTag::MaxTags] = { t0, t1, t2, t3, t4 }; _selection = LogSelection(tag_arr, false, level); } const LogSelection& selection() const { return _selection; } }; or maybe even better, do what we do for the `LogTagSet` and have a static helper and a private constructor, so that we can turn all the asserts into compile errors. Something like: Suggestion: DefaultUndecoratedSelection(LogLevelType level, LogTagType t0, LogTagType t1, LogTagType t2, LogTagType t3, LogTagType t4) : _selection(LogSelection::Invalid) { LogTagType tag_arr[LogTag::MaxTags] = { t0, t1, t2, t3, t4 }; _selection = LogSelection(tag_arr, false, level); } public: template static DefaultUndecoratedSelection make() { STATIC_ASSERT(GuardTag == LogTag::__NO_TAG); return DefaultUndecoratedSelection(Level, T0, T1, T2, T3, T4); } const LogSelection& selection() const { return _selection; } }; And we can then use `LogDecorators::DefaultUndecoratedSelection::make()` to create them. ------------- Changes requested by aboldtch (Reviewer). 
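For readers following the type-erasure point in this review, here is a minimal, self-contained sketch of why a reference-to-array parameter rejects a wrongly sized array at compile time while a pointer-style parameter does not. `Tag`, `kMaxTags`, `count_decayed`, `count_checked`, the `accepts_*` traits and `kExample` are hypothetical stand-ins invented for this illustration; they are not the HotSpot `LogTagType` / `LogSelection` code, and the sketch assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for illustration; not the HotSpot logging types.
enum class Tag { NoTag, Jit, Inlining };
constexpr std::size_t kMaxTags = 5;

// The declared bound is ignored here: the parameter type is really
// `const Tag*`, so an array of any length is silently accepted.
inline std::size_t count_decayed(const Tag tags[kMaxTags]) {
  std::size_t n = 0;
  while (n < kMaxTags && tags[n] != Tag::NoTag) ++n;
  return n;
}

// Reference to an array of exactly kMaxTags elements: any other length
// fails to bind, so the mismatch is diagnosed by the compiler.
inline std::size_t count_checked(const Tag (&tags)[kMaxTags]) {
  std::size_t n = 0;
  while (n < kMaxTags && tags[n] != Tag::NoTag) ++n;
  return n;
}

// Detection traits: true when the corresponding call would compile.
template <typename Arr, typename = void>
struct accepts_decayed : std::false_type {};
template <typename Arr>
struct accepts_decayed<
    Arr, std::void_t<decltype(count_decayed(std::declval<Arr&>()))>>
    : std::true_type {};

template <typename Arr, typename = void>
struct accepts_checked : std::false_type {};
template <typename Arr>
struct accepts_checked<
    Arr, std::void_t<decltype(count_checked(std::declval<Arr&>()))>>
    : std::true_type {};

static_assert(accepts_decayed<const Tag[kMaxTags + 1]>::value,
              "an oversized array decays to a pointer and is accepted");
static_assert(accepts_checked<const Tag[kMaxTags]>::value,
              "an array of the exact size binds to the reference");
static_assert(!accepts_checked<const Tag[kMaxTags + 1]>::value,
              "an oversized array is rejected at compile time");

// Elements without an initializer are value-initialized to Tag::NoTag (0).
constexpr Tag kExample[kMaxTags] = {Tag::Jit, Tag::Inlining};
```

Both functions return 2 for `kExample`; the difference only shows up in which argument types the compiler is willing to accept, which is what the `static_assert` lines record.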
PR Review: https://git.openjdk.org/jdk/pull/21383#pullrequestreview-2367260954 PR Review Comment: https://git.openjdk.org/jdk/pull/21383#discussion_r1799872623 From jbhateja at openjdk.org Mon Oct 14 17:50:14 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 14 Oct 2024 17:50:14 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Mon, 14 Oct 2024 15:04:54 GMT, Jasmine Karthikeyan wrote: > For the record I think in this PR we could simply match the IR patterns in the ad file, since (from my understanding) the patterns we are matching could be supported there. We should do platform-specific lowering in a separate patch because it is pretty nuanced, and we could potentially move it to the new system afterwards. Hi @jaskarth , Bigger pattern matching is sensitive to [IR level node sharing](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/matcher.cpp#L1724), thus it may not be foolproof for the above 4 patterns. The current patch takes care of this limitation. > @jatin-bhateja That is machine-independent lowering, we are talking about machine-dependent lowering to which `MacroLogicV` transformation belongs. You can have `phaselowering_x86` and not have to add another method to `Matcher` as well as add default implementations to various architecture files. You can reuse `MulVL` node for that but I believe these transformations should be done as late as possible. Hi @merykitty, I see some scope for refactoring and carving out a separate target-specific lowering pass going forward; I have brought this up in the past too. Existing optimizations are in line with the current infrastructure and guard target-specific optimizations with target-specific match_rule_supported checks, e.g. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L2898.
As @jaskarth suggests we can pick this up going forward. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411884206 From duke at openjdk.org Mon Oct 14 17:56:28 2024 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 14 Oct 2024 17:56:28 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn Message-ID: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) Crash occurs in C1 during OSR when copying locks from interpreter frame to compiled frame. All loads used immediate offset regardless of offset size causing crash when it is over the max size for the instruction (32760). Fix is to check the size before performing the load and storing the offset in a register if needed. I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction. The instruction is only updated when the `offset_ok_for_immed` check fails which would cause the crash anyways ------------- Commit messages: - Add regression test - Remove unnecessary use of rscratch2 - 8335662: [AArch64] C2: guarantee(val < (1ULL << nbits)) failed: Field too big for insn Changes: https://git.openjdk.org/jdk/pull/21473/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21473&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335662 Stats: 46 lines in 3 files changed: 43 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21473.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21473/head:pull/21473 PR: https://git.openjdk.org/jdk/pull/21473 From aph at openjdk.org Mon Oct 14 17:56:28 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 14 Oct 2024 17:56:28 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn In-Reply-To:
<8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> Message-ID: On Fri, 11 Oct 2024 16:51:16 GMT, Chad Rakoczy wrote: > [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) > > Crash occurs in C1 during OSR when copying locks from interpreter frame to compiled frame. All loads used immediate offset regardless of offset size causing crash when it is over the max size for the instruction (32760). Fix is to check the size before preforming the load and storing the offset in a register if needed. > > I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction. The instruction is only updated when the `offset_ok_for_immed` check fails which would cause the crash anyways Thanks. Fix looks reasonable, but i think we need a regression test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21473#issuecomment-2408568858 From duke at openjdk.org Mon Oct 14 17:56:28 2024 From: duke at openjdk.org (Chad Rakoczy) Date: Mon, 14 Oct 2024 17:56:28 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn In-Reply-To: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> Message-ID: <1hTqbs0Xtv8J3MbMfHiGxmktyEjwnJ49jK20ojCc27I=.1994823d-94f9-4ae9-96e8-3c527be72825@github.com> On Fri, 11 Oct 2024 16:51:16 GMT, Chad Rakoczy wrote: > [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) > > Crash occurs in C1 during OSR when copying locks from interpreter frame to compiled frame. All loads used immediate offset regardless of offset size causing crash when it is over the max size for the instruction (32760). 
Fix is to check the size before performing the load and storing the offset in a register if needed. > > I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction. The instruction is only updated when the `offset_ok_for_immed` check fails which would cause the crash anyways Yeah, working on adding a regression test ------------- PR Comment: https://git.openjdk.org/jdk/pull/21473#issuecomment-2411744674 From sviswanathan at openjdk.org Mon Oct 14 18:38:12 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 14 Oct 2024 18:38:12 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 12:23:25 GMT, Emanuel Peter wrote: >> When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. >> Also a regression test case is added accordingly. >> >> Best Regards, >> Sandhya > > src/hotspot/cpu/x86/x86.ad line 3679: > >> 3677: >> 3678: instruct vconvF2HF_mem_reg(memory mem, vec src) %{ >> 3679: predicate(Matcher::vector_length_in_bytes(n->in(3)->in(1)) >= 16); > > Ok, and so what alternative path is the matcher now going to take if we have `vector_length = 8`? @eme64 It is now going to load from memory into a register, use the register/register version for the conversion, and then store to memory. Please see before and after code snippets below.
Generated code snippet for 2 element float vector to float16 vector conversion Before: vmovq 0x10(%rdx,%rbx,4),%xmm2 ; load 8 bytes from memory into xmm2 (correct) vcvtps2ph $0x4,%xmm2,0x10(%rsi,%rbx,2) ; convert to float16 and store 8 bytes to memory (incorrect) After: vmovq 0x10(%rdx,%rbx,4),%xmm15 ; load 8 bytes from memory into xmm15 (correct) vcvtps2ph $0x4,%xmm15,%xmm0 ; convert to float16 into register (correct) vmovd %xmm0,0x10(%rsi,%rbx,2) ; store 4 byte into memory (correct) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1799938212 From kvn at openjdk.org Mon Oct 14 19:32:14 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 14 Oct 2024 19:32:14 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v3] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 10:46:50 GMT, Tobias Hartmann wrote: >> After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`: >> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723 >> >> Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` because the type is not a primitive array type: >> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971 >> >> -> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 >> >> The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. 
>> >> Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded: >> https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979 >> >> I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Modified ciTypeFlow::can_trap src/hotspot/share/ci/ciTypeFlow.cpp line 2220: > 2218: case Bytecodes::_ldc_w: > 2219: case Bytecodes::_ldc2_w: > 2220: return str.is_in_error() || !str.get_constant().is_loaded(); There is also a `con.is_valid()` check in `do_ldc()`. But I do not know what memory is referenced in "OutOfMemoryError in the CI while loading a String constant" when it is invalid. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21470#discussion_r1799984194 From kvn at openjdk.org Mon Oct 14 19:42:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 14 Oct 2024 19:42:13 GMT Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 10:39:25 GMT, Amit Kumar wrote: > This is a trivial PR to change the data type of some "*Threshold" variables from `intx` to `double`.
Any reason the `Tier2*Threshold` flags were not changed? For consistency. ------------- PR Review: https://git.openjdk.org/jdk/pull/21354#pullrequestreview-2367471555 From svkamath at openjdk.org Mon Oct 14 20:54:12 2024 From: svkamath at openjdk.org (Smita Kamath) Date: Mon, 14 Oct 2024 20:54:12 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI [v4] In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 18:52:30 GMT, Smita Kamath wrote: >> Hi, I want to submit an optimization for SHA-512 algorithm using SHA instructions (sha512msg1, sha512msg2 and sha512rnds2). Kindly review the code and provide feedback. Thank you. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated code as per review comments @ascarpino, I have approvals for this PR. Would it be possible for you to run tests and let me know the results? I appreciate your help. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20633#issuecomment-2412323595 From vlivanov at openjdk.org Mon Oct 14 21:19:12 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 14 Oct 2024 21:19:12 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v3] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 10:46:50 GMT, Tobias Hartmann wrote: >> After [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473), the CI will represent an unresolved constant dynamic as unloaded `ciConstant` of a `java/lang/Object` `ciInstanceKlass`: >> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciEnv.cpp#L721-L723 >> >> Now with a constant dynamic of array type, we hit an assert in type flow analysis on `*astore` because the type is not a primitive array type: >> >> https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.cpp#L967-L971 >> >> -> >>
https://github.com/openjdk/jdk/blob/6133866150cf6131ab578f1537f84c239703fa67/src/hotspot/share/ci/ciTypeFlow.hpp#L343-L346 >> >> The same problem exists with object array types. New `TestUnresolvedConstantDynamic` triggers both. >> >> Similar to `unloaded_ciinstance` used by [JDK-8280473](https://bugs.openjdk.org/browse/JDK-8280473) (see [here](https://github.com/openjdk/jdk/commit/88fc3bfdff7f89a02fcfb16909df144e6173c658#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450R784)), we could now add an unloaded `ciObjArray` and `ciTypeArray` to represent an unresolved dynamic constant of array type. We would then add a trap when parsing the array access. However, this requires rather invasive changes for this edge case and it wouldn't make much sense because even though type flow analysis would continue, we would add a trap when parsing `_ldc` anyway if the constant is not loaded: >> https://github.com/openjdk/jdk/blob/037f11b864734734dd7fbce029b2e8b4bc17f3ab/src/hotspot/share/opto/parse2.cpp#L1962-L1979 >> >> I propose to do the same in type flow analysis, i.e., trap early in `_ldc` when the constant is not loaded. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Modified ciTypeFlow::can_trap The proposed fix is broader than strictly needed to fix the immediate problem observed with condy. It affects all LDCs with not-yet-resolved CP entries. IMO it should be fine, but I haven't thought it through. (Also, the comment at https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciTypeFlow.cpp#L2210 is outdated now.) And [`Parse::do_one_bytecode()`](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/parse2.cpp#L1962) should always see the resolved case now (`constant.is_loaded() == true`).
------------- PR Comment: https://git.openjdk.org/jdk/pull/21470#issuecomment-2412358307 From jkarthikeyan at openjdk.org Mon Oct 14 21:53:15 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 14 Oct 2024 21:53:15 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR pallets. >> >> >> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. 
It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. >> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimization:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 > - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction > Hi @jaskarth , Bigger pattern matching is sensitive to [IR level node sharing](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/matcher.cpp#L1724), thus it may not full proof for above 4 patterns. Current patch takes care of this limitation. I think this is a good point. I've taken a look at the patch and added some comments below. src/hotspot/cpu/x86/matcher_x86.hpp line 184: > 182: // Does the CPU supports doubleword multiplication with quadword saturation. > 183: static constexpr bool supports_double_word_mult_with_quadword_staturation(void) { > 184: return true; Should this be `UseAVX > 0`? I'm wondering since we have a `MulVL` rule that applies when `UseAVX == 0`. 
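As a scalar illustration of the four `MulL` patterns quoted in the PR description above: when each operand has already been reduced to 32 significant bits, either by masking with 0xFFFFFFFF or by an unsigned right shift of 32, the 64-bit product depends only on those doublewords and always fits in 64 bits. The sketch below models a single 64-bit lane in plain C++. The function names and sample constants are hypothetical, chosen for this illustration, and the sketch says nothing about how the patch itself matches the vector IR nodes.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kLowMask = 0xFFFFFFFFULL;

// Scalar model of one 64-bit lane of an unsigned 32x32 -> 64 multiply:
// only the low doubleword of each operand takes part in the product.
constexpr std::uint64_t mul_low_dwords(std::uint64_t a, std::uint64_t b) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(a)) *
         static_cast<std::uint64_t>(static_cast<std::uint32_t>(b));
}

// MulL (And a, 0xFFFFFFFF) (And b, 0xFFFFFFFF)
constexpr std::uint64_t mul_and_and(std::uint64_t a, std::uint64_t b) {
  return (a & kLowMask) * (b & kLowMask);
}

// MulL (URShift a, 32) (URShift b, 32)
constexpr std::uint64_t mul_shr_shr(std::uint64_t a, std::uint64_t b) {
  return (a >> 32) * (b >> 32);
}

// MulL (URShift a, 32) (And b, 0xFFFFFFFF)
constexpr std::uint64_t mul_shr_and(std::uint64_t a, std::uint64_t b) {
  return (a >> 32) * (b & kLowMask);
}

// MulL (And a, 0xFFFFFFFF) (URShift b, 32)
constexpr std::uint64_t mul_and_shr(std::uint64_t a, std::uint64_t b) {
  return (a & kLowMask) * (b >> 32);
}

// Arbitrary sample operands with non-zero upper halves.
constexpr std::uint64_t kA = 0x123456789ABCDEF0ULL;
constexpr std::uint64_t kB = 0xFFFFFFFFFFFFFFFFULL;

// Each full 64-bit multiply equals the doubleword multiply applied to the
// already-reduced operands.
static_assert(mul_and_and(kA, kB) == mul_low_dwords(kA, kB), "and/and");
static_assert(mul_shr_shr(kA, kB) == mul_low_dwords(kA >> 32, kB >> 32), "shr/shr");
static_assert(mul_shr_and(kA, kB) == mul_low_dwords(kA >> 32, kB), "shr/and");
static_assert(mul_and_shr(kA, kB) == mul_low_dwords(kA, kB >> 32), "and/shr");

// The largest possible product, (2^32 - 1)^2, still fits in 64 bits.
static_assert(mul_low_dwords(kB, kB) == 0xFFFFFFFE00000001ULL, "no overflow");
```

The last assertion is the reason no partial products for the upper halves are needed: with both inputs below 2^32, the product is below 2^64.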
src/hotspot/share/opto/vectornode.cpp line 2089: > 2087: if (Matcher::supports_double_word_mult_with_quadword_staturation() && > 2088: !is_mult_lower_double_word()) { > 2089: auto is_clear_upper_double_word_uright_shift_op = [](const Node *n) { Suggestion: auto is_clear_upper_double_word_uright_shift_op = [](const Node* n) { src/hotspot/share/opto/vectornode.cpp line 2093: > 2091: n->in(2)->Opcode() == Op_RShiftCntV && n->in(2)->in(1)->is_Con() && > 2092: n->in(2)->in(1)->bottom_type()->isa_int() && > 2093: n->in(2)->in(1)->bottom_type()->is_int()->get_con() == 32L; Suggestion: n->in(2)->in(1)->bottom_type()->is_int()->get_con() == 32; Since you are comparing with a `TypeInt` I think this shouldn't be `32L`. src/hotspot/share/opto/vectornode.cpp line 2098: > 2096: auto is_lower_double_word_and_mask_op = [](const Node *n) { > 2097: if (n->Opcode() == Op_AndV) { > 2098: Node *replicate_operand = n->in(1)->Opcode() == Op_Replicate ? n->in(1) Suggestion: Node* replicate_operand = n->in(1)->Opcode() == Op_Replicate ? n->in(1) src/hotspot/share/opto/vectornode.cpp line 2124: > 2122: // MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) > 2123: if ((is_lower_double_word_and_mask_op(in(1)) || > 2124: is_lower_double_word_and_mask_op(in(1)) || `is_lower_double_word_and_mask_op(in(1)) || is_lower_double_word_and_mask_op(in(1))` is redundant, right? Shouldn't you only need it once? Same for the other 3 calls, which are similarly repeated. test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java line 41: > 39: */ > 40: > 41: public class VectorMultiplyOpt { Could it be possible to also do IR verification in this test? It would be good to check that we don't generate `AndVL` or `URShiftVL` with this transform. test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java line 43: > 41: public class VectorMultiplyOpt { > 42: > 43: public static long [] src1; Suggestion: public static long[] src1; And for the rest of the `long []` in this file too. 
test/micro/org/openjdk/bench/jdk/incubator/vector/VectorXXH3HashingBenchmark.java line 39: > 37: @Param({"1024", "2048", "4096", "8192"}) > 38: private int SIZE; > 39: private long [] accumulators; Suggestion: private long[] accumulators; ------------- PR Review: https://git.openjdk.org/jdk/pull/21244#pullrequestreview-2367683334 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800159123 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800153755 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800153568 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800153842 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800151177 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800167403 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800165261 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800169840 From sviswanathan at openjdk.org Mon Oct 14 23:35:43 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 14 Oct 2024 23:35:43 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2] In-Reply-To: References: Message-ID: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> > When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. > Also a regression test case is added accordingly. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Update test case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21480/files - new: https://git.openjdk.org/jdk/pull/21480/files/dedb4a0a..ed299327 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21480&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21480&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21480.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21480/head:pull/21480 PR: https://git.openjdk.org/jdk/pull/21480 From sviswanathan at openjdk.org Mon Oct 14 23:35:43 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 14 Oct 2024 23:35:43 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 12:18:30 GMT, Emanuel Peter wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test case > > test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java line 76: > >> 74: sout[i+1] = Float.floatToFloat16(finp[i+1]); >> 75: } >> 76: } > > Your test looks different than the one that I added on JIRA. Can you please add that one as well? Thanks for pointing that out. I have modified the contents of the loop kernel to match your testcase loop kernel now. I also verified that it fails before the fix and passes after the fix. 
Before the fix the test fails: Test results: failed: 1 And the jtr file shows the following: Custom Run Test: @Run: kernel_test_float_float16 - @Tests: {test_float_float16,test_float_float16_strided,test_float_float16_short_vector}: compiler.lib.ir_framework.shared.TestRunException: There was an error while invoking @Run method public void compiler.vectorization.TestFloatConversionsVector.kernel_test_float_float16() at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:162) at compiler.lib.ir_framework.test.CustomRunTest.run(CustomRunTest.java:87) at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:861) at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:252) at compiler.lib.ir_framework.test.TestVM.main(TestVM.java:165) Caused by: java.lang.reflect.InvocationTargetException at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:119) at java.base/java.lang.reflect.Method.invoke(Method.java:573) at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:159) ... 4 more Caused by: java.lang.RuntimeException: assertEquals expected: 18483 but was: 0 at jdk.test.lib.Asserts.fail(Asserts.java:691) at jdk.test.lib.Asserts.assertEquals(Asserts.java:204) at jdk.test.lib.Asserts.assertEquals(Asserts.java:191) at compiler.vectorization.TestFloatConversionsVector.kernel_test_float_float16(TestFloatConversionsVector.java:112) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) ... 6 more After the fix the test passes with no failures: Test results: passed: 1 Please let me know if this works or you would like to see any other change. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1800232110 From vlivanov at openjdk.org Tue Oct 15 00:31:12 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 15 Oct 2024 00:31:12 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ instruction for following IR pallets. >> >> >> MulL ( And SRC1, 0xFFFFFFFF) ( And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMULUDQ instruction performs unsigned multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. 
It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. >> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimization:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 > - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction Some time ago, there was a relevant experiment to optimize vectorized Poly1305 implementation by utilizing VPMULDQ instruction on x86 (see [JDK-8219881](https://bugs.openjdk.org/browse/JDK-8219881) for details). The implementation used int-to-long vector casts and produced the following IR shape: `MulVL (VectorCastI2X src1) (VectorCastI2X src2)`. Does it make sense to cover it as part of this particular enhancement? 
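As an illustration of the arithmetic behind the patterns discussed in this thread, here is a minimal scalar Java sketch. It is not taken from the patch: the class and method names are hypothetical, and it says nothing about which instructions C2 actually selects. It only checks that when the upper 32 bits of both operands are cleared (by an `And` with 0xFFFFFFFF or a `URShift` by 32), the 64-bit product equals the exact unsigned 32x32 product, and that the int-to-long cast shape mentioned for Poly1305 is a signed 32x32 to 64-bit multiply.

```java
import java.math.BigInteger;

// Hypothetical helper class, for illustration only.
final class LowWordMultiply {
    static final long LOW_MASK = 0xFFFFFFFFL;

    // Scalar analogue of: MulL (And a, 0xFFFFFFFF) (And b, 0xFFFFFFFF)
    static long mulLowWords(long a, long b) {
        return (a & LOW_MASK) * (b & LOW_MASK);
    }

    // Scalar analogue of: MulL (URShift a, 32) (URShift b, 32)
    static long mulHighWords(long a, long b) {
        return (a >>> 32) * (b >>> 32);
    }

    // Scalar analogue of the cast shape: both ints are sign-extended before multiplying.
    static long mulSignedInts(int x, int y) {
        return (long) x * (long) y;
    }

    // Exact product of two values in [0, 2^32), computed without any wrap-around.
    static BigInteger exactUnsignedProduct(long x, long y) {
        return BigInteger.valueOf(x).multiply(BigInteger.valueOf(y));
    }

    // Reinterpret a long as an unsigned 64-bit quantity.
    static BigInteger asUnsigned(long v) {
        return new BigInteger(Long.toUnsignedString(v));
    }

    public static void main(String[] args) {
        long a = 0xDEADBEEFCAFEBABEL;
        long b = 0x0123456789ABCDEFL;

        // The 64-bit multiply loses nothing because the product of two 32-bit values fits in 64 bits.
        boolean lowOk = asUnsigned(mulLowWords(a, b))
                .equals(exactUnsignedProduct(a & LOW_MASK, b & LOW_MASK));
        boolean highOk = asUnsigned(mulHighWords(a, b))
                .equals(exactUnsignedProduct(a >>> 32, b >>> 32));
        // Math.multiplyFull(int, int) returns the exact signed 64-bit product.
        boolean signedOk = mulSignedInts(-3, 7) == Math.multiplyFull(-3, 7);

        System.out.println(lowOk && highOk && signedOk);
    }
}
```

The mixed shapes in the quoted list (one `And` operand, one `URShift` operand) follow the same reasoning, since each operand is independently reduced to a value below 2^32.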
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2412582542 From dlong at openjdk.org Tue Oct 15 04:34:09 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 15 Oct 2024 04:34:09 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn In-Reply-To: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> Message-ID: On Fri, 11 Oct 2024 16:51:16 GMT, Chad Rakoczy wrote: > [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) > > Crash occurs in C1 during OSR when copying locks from interpreter frame to compiled frame. All loads used immediate offset regardless of offset size causing crash when it is over the max size for the instruction (32760). Fix is to check the size before preforming the load and storing the offset in a register if needed. > > I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction. The instruction is only updated when the `offset_ok_for_immed` check fails which would cause the crash anyways > > Confirmed that added test fails before patch and passes after Why is part of the test a binary .class file? ------------- Changes requested by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21473#pullrequestreview-2368140781 From rrich at openjdk.org Tue Oct 15 06:36:52 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 15 Oct 2024 06:36:52 GMT Subject: RFR: 8341862: PPC64: C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR Message-ID: Make sure `LIR_Assembler::emit_unwind_handler()` jumps to the slow path directly for unlocking a synchronized method if `LM_MONITOR` is used. On the fast paths assertions are added that the mode is actually handled. The change passed our CI testing: Tier 1-4 of hotspot and jdk. 
All of Langtools and jaxp. Renaissance Suite and SAP specific tests. Testing was done on the main platforms and also on Linux/PPC64le and AIX. ------------- Commit messages: - C1: fix unlock in unwind handler for LM_MONITOR Changes: https://git.openjdk.org/jdk/pull/21497/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21497&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341862 Stats: 9 lines in 2 files changed: 8 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21497.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21497/head:pull/21497 PR: https://git.openjdk.org/jdk/pull/21497 From mdoerr at openjdk.org Tue Oct 15 06:36:52 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 15 Oct 2024 06:36:52 GMT Subject: RFR: 8341862: PPC64: C1 unwind_handler fails to unlock synchronized methods with LM_MONITOR In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 13:51:21 GMT, Richard Reingruber wrote: > Make sure `LIR_Assembler::emit_unwind_handler()` jumps to the slow path directly for unlocking a synchronized method if `LM_MONITOR` is used. > On the fast paths assertions are added that the mode is actually handled. > > The change passed our CI testing: > Tier 1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. Good catch! LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21497#pullrequestreview-2366762944 From amitkumar at openjdk.org Tue Oct 15 06:42:51 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 15 Oct 2024 06:42:51 GMT Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double [v2] In-Reply-To: References: Message-ID: > This is a trivial PR to change the data type of some "*Threshold" variables from `intx` to `double`.
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: updates tier2 threshold datatype ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21354/files - new: https://git.openjdk.org/jdk/pull/21354/files/a53535f5..ce4ff580 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21354&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21354&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21354.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21354/head:pull/21354 PR: https://git.openjdk.org/jdk/pull/21354 From amitkumar at openjdk.org Tue Oct 15 06:49:10 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 15 Oct 2024 06:49:10 GMT Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double [v2] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 19:39:07 GMT, Vladimir Kozlov wrote: >Any reasons Tier2*Threshold flags were bit changed? For consistency. I guess you're asking why I left them unchanged? I looked into the project, and couldn't find where those flags are being used, so I left them unchanged at first. However, I've now updated them to `double` as well. Thanks for the suggestion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21354#issuecomment-2413030380 From rrich at openjdk.org Tue Oct 15 06:55:37 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 15 Oct 2024 06:55:37 GMT Subject: RFR: 8341715: PPC64: ObjectMonitor::_owner should be reset unconditionally in nmethod unlocking Message-ID: This removes the `ObjectMonitor::_owner` check when a nmethod unlocks an inflated monitor on ppc64. Monitor operations by nmethods are guaranteed to be balanced (see JBS-item for a reference) therefore the check is redundant. Other platforms don't have it either. 
I've removed the assertion that the unlocking thread owns the monitor again because it won't work with vthread monitor support in the loom repository. The fix passed our CI testing with `LockingMode` set to `LM_LEGACY` Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. Testing was done on the main platforms and also on Linux/PPC64le and AIX. The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with `LM_LEGACY`. ------------- Commit messages: - Remove assertion - compiler_fast_unlock_object: no need to check ObjectMonitor::_owner Changes: https://git.openjdk.org/jdk/pull/21494/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21494&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341715 Stats: 33 lines in 1 file changed: 26 ins; 7 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21494.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21494/head:pull/21494 PR: https://git.openjdk.org/jdk/pull/21494 From rrich at openjdk.org Tue Oct 15 06:59:39 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 15 Oct 2024 06:59:39 GMT Subject: RFR: 8342042: PPC64: compiler_fast_unlock_object flags failure instead of success Message-ID: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> This change inverts the `EQ` bit in `flag` with a condition register nand instruction in order to meet the post condition given at L2751-L2752: // flag == EQ indicates success, decrement held monitor count // flag == NE indicates failure The fix passed our CI testing with LockingMode set to LM_LEGACY Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. Testing was done on the main platforms and also on Linux/PPC64le and AIX. The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with LM_LEGACY. 
------------- Commit messages: - Must reach success with flag == EQ Changes: https://git.openjdk.org/jdk/pull/21496/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21496&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342042 Stats: 6 lines in 1 file changed: 2 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21496.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21496/head:pull/21496 PR: https://git.openjdk.org/jdk/pull/21496 From mdoerr at openjdk.org Tue Oct 15 06:59:39 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 15 Oct 2024 06:59:39 GMT Subject: RFR: 8342042: PPC64: compiler_fast_unlock_object flags failure instead of success In-Reply-To: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> References: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> Message-ID: <3GAiIctv0lvgogniDgMHaVVuiRorbnLQgN7GwgxN-ek=.b58817d5-2d00-4507-8451-2e2313aa561f@github.com> On Mon, 14 Oct 2024 13:29:43 GMT, Richard Reingruber wrote: > This change inverts the `EQ` bit in `flag` with a condition register nand instruction in order to meet the post condition given at L2751-L2752: > > > // flag == EQ indicates success, decrement held monitor count > // flag == NE indicates failure > > > The fix passed our CI testing with LockingMode set to LM_LEGACY > Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. > > The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with LM_LEGACY. Good catch! LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21496#pullrequestreview-2366744824 From epeter at openjdk.org Tue Oct 15 07:00:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Oct 2024 07:00:11 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 18:35:52 GMT, Sandhya Viswanathan wrote: >> src/hotspot/cpu/x86/x86.ad line 3679: >> >>> 3677: >>> 3678: instruct vconvF2HF_mem_reg(memory mem, vec src) %{ >>> 3679: predicate(Matcher::vector_length_in_bytes(n->in(3)->in(1)) >= 16); >> >> Ok, and so what alternative path is the matcher now going to take if we have `vector_length = 8`? > > @eme64 It is now going to do the load from memory in register and use the register/register version for the conversion and then use the store to memory. Please see before and after code snippets below. > > Generated code snippet for 2 element float vector to float16 vector conversion > Before: > vmovq 0x10(%rdx,%rbx,4),%xmm2 ; load 8 bytes from memory into xmm2 (correct) > vcvtps2ph $0x4,%xmm2,0x10(%rsi,%rbx,2) ; convert to float16 and store 8 bytes to memory (incorrect) > > After: > vmovq 0x10(%rdx,%rbx,4),%xmm15 ; load 8 bytes from memory into xmm15 (correct) > vcvtps2ph $0x4,%xmm15,%xmm0 ; convert to float16 into register (correct) > vmovd %xmm0,0x10(%rsi,%rbx,2) ; store 4 byte into memory (correct) Ah, I see. You are using a 4-element register-only `vcvtps2ph` instruction, but only use the first 2-elements of it. 
Great :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1800564054 From rrich at openjdk.org Tue Oct 15 06:59:39 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 15 Oct 2024 06:59:39 GMT Subject: RFR: 8342042: PPC64: compiler_fast_unlock_object flags failure instead of success In-Reply-To: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> References: <_kTZvxuTWBUsj3YwEk8y5NgOhioScEiIdelsTDXye5Q=.79f2e71b-08a2-4fca-8833-233971508215@github.com> Message-ID: On Mon, 14 Oct 2024 13:29:43 GMT, Richard Reingruber wrote: > This change inverts the `EQ` bit in `flag` with a condition register nand instruction in order to meet the post condition given at L2751-L2752: > > > // flag == EQ indicates success, decrement held monitor count > // flag == NE indicates failure > > > The fix passed our CI testing with LockingMode set to LM_LEGACY > Tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. > > The test runtime/logging/MonitorInflationTest.java failed on all platforms. Apparently it has issues with LM_LEGACY. Thanks for the quick review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21496#issuecomment-2411369527 From chagedorn at openjdk.org Tue Oct 15 07:03:10 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Oct 2024 07:03:10 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v2] In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 15:15:54 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. 
These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements: >> >> Baseline Patch >> Benchmark Mode Cnt Score Error Units Score Error Units Improvement >> BigIntegers.testAdd avgt 15 25.096 ? 3.936 ns/op 19.214 ? 0.521 ns/op (+ 26.5%) >> PatternBench.charPatternCompile avgt 8 453.727 ? 117.265 ns/op 370.054 ? 26.106 ns/op (+ 20.3%) >> PatternBench.charPatternMatch avgt 8 917.604 ? 121.766 ns/op 810.560 ? 38.437 ns/op (+ 12.3%) >> PatternBench.charPatternMatchWithCompile avgt 8 1477.703 ? 255.783 ns/op 1224.460 ? 28.220 ns/op (+ 18.7%) >> PatternBench.longStringGraphemeMatches avgt 8 860.909 ? 124.661 ns/op 743.729 ? 22.877 ns/op (+ 14.6%) >> PatternBench.splitFlags avgt 8 420.506 ? 76.252 ns/op 321.911 ? 11.661 ns/op (+ 26.6%) >> >> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Suggestions from review Looks good, thanks for the update! I'll give this another spinning in our testing. ------------- Marked as reviewed by chagedorn (Reviewer). 
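The Min/Max identities quoted in the message above can be checked on scalar values with a short Java sketch. This is only an illustration of the lattice rules, not the C2 `Identity` code from the PR; the class and method names are hypothetical, and the third method (the dual of the absorption rule) is an addition that the message does not list explicitly.

```java
// Hypothetical helper class, for illustration only.
final class MinMaxIdentities {
    // Max(A, Max(A, B)): expected to simplify to Max(A, B).
    static long nestedMax(long a, long b) {
        return Math.max(a, Math.max(a, b));
    }

    // Max(A, Min(A, B)): expected to simplify to A (absorption law).
    static long maxOfMin(long a, long b) {
        return Math.max(a, Math.min(a, b));
    }

    // Assumed dual, not named in the message: Min(A, Max(A, B)) is A.
    static long minOfMax(long a, long b) {
        return Math.min(a, Math.max(a, b));
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -7L, 0L, 3L, Long.MAX_VALUE};
        for (long a : samples) {
            for (long b : samples) {
                if (nestedMax(a, b) != Math.max(a, b)) {
                    throw new AssertionError("Max(A, Max(A, B)) != Max(A, B)");
                }
                if (maxOfMin(a, b) != a) {
                    throw new AssertionError("Max(A, Min(A, B)) != A");
                }
                if (minOfMax(a, b) != a) {
                    throw new AssertionError("Min(A, Max(A, B)) != A");
                }
            }
        }
        System.out.println("identities hold on the sample grid");
    }
}
```

The sample grid includes both extremes of the `long` range, since these rules rely only on ordering and involve no arithmetic that could overflow.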
PR Review: https://git.openjdk.org/jdk/pull/21439#pullrequestreview-2368344706 From epeter at openjdk.org Tue Oct 15 07:04:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Oct 2024 07:04:11 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2] In-Reply-To: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> References: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> Message-ID: On Mon, 14 Oct 2024 23:35:43 GMT, Sandhya Viswanathan wrote: >> When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. >> Also a regression test case is added accordingly. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Update test case Thanks for the updates! It looks good to me now. I have one more wish: Could you allow to run the test on all platforms please? `test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java` Currently, it only runs on selected platforms, see `@requires`. We should really run this on all platforms and compilers. The IR rules can be limited to platforms and features. That ensures that we do not have similar bugs elsewhere, and our test-coverage would be increased. ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21480#pullrequestreview-2368347957 From chagedorn at openjdk.org Tue Oct 15 07:08:17 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Oct 2024 07:08:17 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v24] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 12:33:44 GMT, Emanuel Peter wrote: >> **Motivation** >> >> I want to write small dedicated fuzzers: >> - Generate `java` and `jasm` source code: just some `String`. >> - Quickly compile it (with this framework). >> - Execute the compiled code. >> >> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. >> >> **The CompileFramework** >> >> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. >> An important factor: Integration with the IR-Framwrork (`TestFramework`): we want to be able to generate IR-rules for our tests. >> >> I implemented a first, simple version of the framework. I added some tests and examples. 
>> >> **Example** >> >> >> CompileFramework comp = new CompileFramework(); >> comp.add(SourceCode.newJavaSourceCode("XYZ", "")); >> comp.compile(); >> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); >> >> >> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 >> >> **Below some use cases: tests that would have been better with the CompileFramework** >> >> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** >> >> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. >> >> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, >> >> to be able to chose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something i... > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/hotspot/jtreg/compiler/lib/compile_framework/README.md > > Co-authored-by: Christian Hagedorn > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20184#pullrequestreview-2368354538 From mli at openjdk.org Tue Oct 15 08:06:25 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 15 Oct 2024 08:06:25 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF Message-ID: Hi, Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. This pr is based on https://github.com/openjdk/jdk/pull/20781. Thanks! 
## Test

### tests:
* test/jdk/jdk/incubator/vector/
* test/hotspot/jtreg/compiler/vectorapi/

### options:
* -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
* -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
* -XX:+EnableVectorSupport -XX:-UseVectorStubs

## Performance

### Tests
The JMH tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from the vectorIntrinsics branch in panama-vector. It would be good to have these tests in JDK mainline; I will do that in a separate PR later. (These tests are auto-generated from a script and template. It would also be good to have the script and template in JDK mainline, but they generate more tests than we need here, so it is better to add these tests in another PR.)

### Options
* +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
* -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'

### Performance data
I have re-tested, and there is not much difference from https://github.com/openjdk/jdk/pull/18605, so please check the performance data in that PR.
------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/21502/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21502&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312425 Stats: 161 lines in 6 files changed: 156 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21502.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21502/head:pull/21502 PR: https://git.openjdk.org/jdk/pull/21502 From thartmann at openjdk.org Tue Oct 15 08:07:13 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Oct 2024 08:07:13 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v3] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 19:29:52 GMT, Vladimir Kozlov wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Modified ciTypeFlow::can_trap > > src/hotspot/share/ci/ciTypeFlow.cpp line 2220: > >> 2218: case Bytecodes::_ldc_w: >> 2219: case Bytecodes::_ldc2_w: >> 2220: return str.is_in_error() || !str.get_constant().is_loaded(); > > There is also `con.is_valid()` check in `do_ldc()`. But I do know what memory is referenced in "OutOfMemoryError in the CI while loading a String constant" when it is invalid. But in that case no exception is installed and we bail out from compilation, right? 
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciTypeFlow.cpp#L746 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21470#discussion_r1800656778 From tschatzl at openjdk.org Tue Oct 15 08:11:20 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 15 Oct 2024 08:11:20 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching In-Reply-To: References: Message-ID: On Fri, 11 Oct 2024 19:57:25 GMT, Dean Long wrote: > @tschatzl , when we call register_nmethod(), do we really need to scan the oops immediately, or could that be delayed until the next safepoint? Could be delayed at least for the STW collectors, but we want to avoid doing any work during gc as much as possible. This may be more tricky with concurrent gcs. After some talk with @TobiHartmann we think that it is best and safer to extend the lock scope. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2413184441 From thartmann at openjdk.org Tue Oct 15 08:15:10 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Oct 2024 08:15:10 GMT Subject: RFR: 8339694: ciTypeFlow does not correctly handle unresolved constant dynamic of array type [v3] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 21:16:51 GMT, Vladimir Ivanov wrote: > Proposed fix is broader than strictly needed to fix the immediate problem observed with condy Right, to be on the safe side, I could add a `str.is_dynamic_constant()` check to limit the trap to condy. What do you think? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21470#issuecomment-2413191943 From jbhateja at openjdk.org Tue Oct 15 08:20:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 15 Oct 2024 08:20:24 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2] In-Reply-To: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> References: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> Message-ID: On Mon, 14 Oct 2024 23:35:43 GMT, Sandhya Viswanathan wrote: >> When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. >> Also a regression test case is added accordingly. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Update test case src/hotspot/cpu/x86/x86.ad line 3679: > 3677: > 3678: instruct vconvF2HF_mem_reg(memory mem, vec src) %{ > 3679: predicate(Matcher::vector_length_in_bytes(n->in(3)->in(1)) >= 16); You can add an elegant predicate check like the following instead of accessing bare inputs: `n->as_StoreVector()->memory_size() >= 16`. test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java line 110: > 108: } > 109: > 110: // Verifying the result Since we are using the IR framework, we can leverage the existing [@Check](https://github.com/openjdk/jdk/blob/521effe017b9b6322036f1851220056a637d6b1c/test/hotspot/jtreg/compiler/lib/ir_framework/Check.java#L32) annotation for verification. It works in conjunction with the @Test method and automatically invokes validation after the test method executes. This may need a little refactoring.
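The byte counts in the description above can be worked through with a small sketch. This is plain Java arithmetic, not HotSpot code: the class and method names are hypothetical, and the only inputs are the lane sizes (4 bytes per float, 2 bytes per half-precision value) and the statement in the PR description that the memory form of the instruction writes four lanes.

```java
// Hypothetical helper for illustration only; it models sizes, not the matcher.
final class HalfFloatStoreSize {
    static final int FLOAT_BYTES = 4; // one float source lane
    static final int HALF_BYTES = 2;  // one half-precision result lane

    // Assumption from the PR description: the memory form writes 4 lanes.
    static final int LANES_WRITTEN_BY_MEMORY_FORM = 4;

    // Bytes occupied by the float source vector.
    static int sourceBytes(int lanes) {
        return lanes * FLOAT_BYTES;
    }

    // Bytes the store is supposed to write for this many lanes.
    static int storedBytes(int lanes) {
        return lanes * HALF_BYTES;
    }

    // The memory form is only safe if it does not write more than intended.
    static boolean memoryFormIsSafe(int lanes) {
        return storedBytes(lanes) >= storedBytes(LANES_WRITTEN_BY_MEMORY_FORM);
    }

    public static void main(String[] args) {
        for (int lanes : new int[] {2, 4, 8}) {
            System.out.println(lanes + " lanes: source " + sourceBytes(lanes)
                + " bytes, stored " + storedBytes(lanes)
                + " bytes, memory form safe: " + memoryFormIsSafe(lanes));
        }
    }
}
```

Under this model a 4-lane conversion has a 16-byte source and an 8-byte store. If `memory_size()` reports the stored bytes (an assumption here, not verified against the sources), a predicate phrased on the store would need a different threshold than one phrased on the source vector.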
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1800662857 PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1800673072 From thartmann at openjdk.org Tue Oct 15 09:17:50 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Oct 2024 09:17:50 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 13:39:28 GMT, Tobias Hartmann wrote: > C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. > > Now the problem is that there can be multiple patch sides in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 > > While another `Thread B` is done with patching another patch side and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC. `Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 > > In short: > - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. 
> - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of an `mov` in the patch body because the store is not atomic. > - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. > > Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. > I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity. > > The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 > > I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remo... Thanks for the discussions! I updated the PR to extend the scope of the `Patching_lock`. I also had to decrease the iterations in the test due to timeouts with debug on slow machines. 
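Why a wider lock scope closes the race can be shown with a minimal Java model. This is a sketch under stated assumptions, not HotSpot code: `PatchLockModel`, `patch` and `walk` are hypothetical names, the two `int` halves stand in for an unaligned 8-byte immediate that cannot be written in one step, and a `ReentrantLock` stands in for the `Patching_lock`. The point is only that the walker never sees a half-written value once the write and the walk share one lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of a non-atomic 8-byte update guarded by one lock.
final class PatchLockModel {
    // Stand-in for the oop immediate: written as two separate 4-byte halves.
    private final int[] halves = new int[2];
    // Stand-in for the lock whose scope covers both patching and walking.
    private final ReentrantLock patchingLock = new ReentrantLock();

    // Writer side: updates both halves while holding the lock.
    void patch(long value) {
        patchingLock.lock();
        try {
            halves[0] = (int) value;
            halves[1] = (int) (value >>> 32);
        } finally {
            patchingLock.unlock();
        }
    }

    // Walker side: reads both halves under the same lock.
    long walk() {
        patchingLock.lock();
        try {
            return ((long) halves[1] << 32) | (halves[0] & 0xFFFFFFFFL);
        } finally {
            patchingLock.unlock();
        }
    }

    // Runs one writer thread against a walker and counts torn values.
    static int countTornReads(int iterations) {
        PatchLockModel model = new PatchLockModel();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                // Alternate between all-zero and all-one bit patterns.
                model.patch((i & 1) == 0 ? 0L : -1L);
            }
        });
        writer.start();
        int torn = 0;
        for (int i = 0; i < iterations; i++) {
            long seen = model.walk();
            if (seen != 0L && seen != -1L) {
                torn++; // a mix of old and new halves was observed
            }
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return torn;
    }

    public static void main(String[] args) {
        System.out.println("torn reads: " + countTornReads(100_000));
    }
}
```

Removing the lock from `walk()` in this model would allow a mixed value to be observed, which corresponds to the partially written oop described above.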
------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2413338038 From thartmann at openjdk.org Tue Oct 15 09:17:50 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Oct 2024 09:17:50 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching [v2] In-Reply-To: References: Message-ID: <5-ppux2sUtARhvrnb09TdA53oUPC8yX1j020dmfWw00=.e76e4077-c448-470c-a1c9-a2f43906e754@github.com> > C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. > > Now the problem is that there can be multiple patch sides in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 > > While another `Thread B` is done with patching another patch side and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC. `Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 > > In short: > - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. 
> - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of an `mov` in the patch body because the store is not atomic. > - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. > > Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. > I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity. > > The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: > https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 > > I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) triggers in both scenarios. I had to remo... Tobias Hartmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Extend the Patching_lock instead - Merge branch 'master' into 8340313 - Extending patching lock - Increased timeout - Removed platform specific asserts from shared code - 8340313: Crash due to invalid oop in nmethod after C1 patching ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21389/files - new: https://git.openjdk.org/jdk/pull/21389/files/050e2c8f..ec5d105b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21389&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21389&range=00-01 Stats: 16085 lines in 245 files changed: 13548 ins; 1161 del; 1376 mod Patch: https://git.openjdk.org/jdk/pull/21389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21389/head:pull/21389 PR: https://git.openjdk.org/jdk/pull/21389 From epeter at openjdk.org Tue Oct 15 09:38:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Oct 2024 09:38:22 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Sun, 13 Oct 2024 11:18:01 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. 
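The semantics described above can be sketched in scalar Java. This is an illustration under assumptions, not the Vector API implementation: plain `int` arrays stand in for vector lanes, `TwoVectorSelectSketch` is a hypothetical name, and out-of-range indexes throw as the description states (the API as finally integrated may treat them differently).

```java
// Hypothetical scalar model of the two-vector selectFrom semantics.
final class TwoVectorSelectSketch {
    // index plays the role of "this"; v1 and v2 together form a 2*VLEN table.
    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            int idx = index[lane];
            // Valid two-vector index range is [0, 2*VLEN).
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("lane " + lane + ": index " + idx);
            }
            // Indexes below VLEN pick from v1, the rest pick from v2.
            result[lane] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] picked = selectFrom(new int[] {0, 4, 3, 7},
                                  new int[] {10, 11, 12, 13},
                                  new int[] {20, 21, 22, 23});
        System.out.println(java.util.Arrays.toString(picked));
    }
}
```

With four lanes, index 4 selects the first element of the second vector and index 7 selects its last element.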
>> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Updating tests to use floorMod I gave it a quick scan, and I have no further comments. LGTM. ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2368730929 From epeter at openjdk.org Tue Oct 15 10:22:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Oct 2024 10:22:21 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v24] In-Reply-To: References: Message-ID: <704HcEgAdeR1380vEK4z0bG0KiJ1kjRVSBCa9EFrt0w=.bee85693-033c-4d85-9f89-3e186ca3c2fb@github.com> On Sun, 13 Oct 2024 09:57:00 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], this patch adds support for the following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max. >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts.
>> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update adlc changes. Are there any IR rules that verify that the correct C2 nodes are used? Is that a thing you generally do with the VectorAPI, just to make sure things get correctly intrinsified? src/hotspot/share/opto/vectornode.hpp line 161: > 159: // Needed for proper cloning. > 160: virtual uint size_of() const { return sizeof(*this); } > 161: bool is_unsigned() { return _is_unsigned; } Can you put this in the `print_spec`, so the IR dump shows if it is unsigned? ------------- PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2368845862 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1800870852 From ihse at openjdk.org Tue Oct 15 11:07:16 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 15 Oct 2024 11:07:16 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 14:57:46 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. > This pr is based on https://github.com/openjdk/jdk/pull/20781. > > Thanks! 
> > ## Test > ### tests: > * test/jdk/jdk/incubator/vector/ > * test/hotspot/jtreg/compiler/vectorapi/ > > ### options: > * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs > * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs > * -XX:+EnableVectorSupport -XX:-UseVectorStubs > > ## Performance > > ### Tests > The JMH tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from the vectorIntrinsics branch of panama-vector. It would be good to have these tests in JDK mainline; I will do that in a separate PR later. (These tests are auto-generated from a script and template. It would be good to also have the script and template in JDK mainline, but they generate more tests than we need here, so it is better to add these tests together with the script and template in another PR.) > > ### Options > * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' > * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' > > ### Performance data > I have re-tested; there is not much difference from https://github.com/openjdk/jdk/pull/18605, so please check the performance data in that PR. make/autoconf/flags-cflags.m4 line 920: > 918: # ACLE and this flag are required to build the aarch64 SVE related functions in > 919: # libvectormath.
> 920: if test "x${OPENJDK_TARGET_CPU}" = "xaarch64"; then Suggestion: if test "x$OPENJDK_TARGET_CPU" = "xaarch64"; then ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21502#discussion_r1800940513 From eosterlund at openjdk.org Tue Oct 15 11:50:12 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 15 Oct 2024 11:50:12 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching [v2] In-Reply-To: <5-ppux2sUtARhvrnb09TdA53oUPC8yX1j020dmfWw00=.e76e4077-c448-470c-a1c9-a2f43906e754@github.com> References: <5-ppux2sUtARhvrnb09TdA53oUPC8yX1j020dmfWw00=.e76e4077-c448-470c-a1c9-a2f43906e754@github.com> Message-ID: On Tue, 15 Oct 2024 09:17:50 GMT, Tobias Hartmann wrote: >> C1 patching (explained in detail [here](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L835)) works by rewriting the [patch body](https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L860), usually a move, and then copying it into place over top of the jmp instruction, being careful to flush caches and doing it in an MP-safe way. >> >> Now the problem is that there can be multiple patch sides in one nmethod (for example, one for each field access in `TestConcurrentPatching::test`) and multiple threads executing that nmethod can trigger patching concurrently. Although the patch body is not executed, one `Thread A` can update the oop immediate of the `mov` in the patch body: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1185 >> >> While another `Thread B` is done with patching another patch side and already walks the nmethod oops via `Universe::heap()->register_nmethod(nm)` to notify the GC.
`Thread B` might then encounter a half-written oop from `Thread A` if the immediate crosses a page or cache-line boundary: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1282 >> >> In short: >> - `Thread A`: Patches location 1 in an nmethod and executes `register_nmethod()`, walking all immediate oops. >> - `Thread B`: Patches location 2 in the same nmethod and just wrote half of the oop immediate of an `mov` in the patch body because the store is not atomic. >> - `Thread A`: Crashes when walking the immediate oops of the nmethod and encountering the oop just partially written by `Thread B` concurrently. >> >> Updating the oop immediate is not atomic on x86_64 because the address of the immediate is not 8-byte aligned. >> I propose to simply align it in `PatchingStub::emit_code` to guarantee atomicity. >> >> The new regression test triggers the issue reliably for the `load_mirror_patching_id` case but unfortunately, I was not able to trigger the `load_appendix_patching_id` case which should be affected as well: >> https://github.com/openjdk/jdk/blob/212e32931cafe446d94219d6c3ffd92261984dff/src/hotspot/share/c1/c1_Runtime1.cpp#L1197 >> >> I still added a corresponding test case `testIndy` and a [new assert that checks proper alignment](https://github.com/openjdk/jdk/commit/f418bc01c946b4c76f4bceac1ad503dabe182df7) t... > > Tobias Hartmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Extend the Patching_lock instead > - Merge branch 'master' into 8340313 > - Extending patching lock > - Increased timeout > - Removed platform specific asserts from shared code > - 8340313: Crash due to invalid oop in nmethod after C1 patching I'm fine with the fix. 
I can't help but reflect, though, on the ever-diminishing role of the Patching_lock. It used to be used quite a lot, but has had its lunch eaten by the CompiledMethod_lock, CompiledIC_lock and CodeCache_lock over time. Today, the Patching_lock is used in exactly one place: in this exact C1 patching that we are looking at now. And now we found that holding that lock wasn't enough because we need the CodeCache_lock as well. We could instead extend the CodeCache_lock critical section a bit, and then there is no need for the Patching_lock at all. Is it time for this lock to retire? It's had a good run. Thoughts? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2413691556 From thartmann at openjdk.org Tue Oct 15 11:58:10 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Oct 2024 11:58:10 GMT Subject: RFR: 8340313: Crash due to invalid oop in nmethod after C1 patching [v2] In-Reply-To: References: <5-ppux2sUtARhvrnb09TdA53oUPC8yX1j020dmfWw00=.e76e4077-c448-470c-a1c9-a2f43906e754@github.com> Message-ID: On Tue, 15 Oct 2024 11:47:22 GMT, Erik Österlund wrote: > We could instead extend the CodeCache_lock critical section a bit, and then there is no need for the Patching_lock at all. Is it time for this lock to retire? +1 to retiring the `Patching_lock` and using the `CodeCache_lock` instead. Let's see what others think. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21389#issuecomment-2413709220 From mli at openjdk.org Tue Oct 15 12:16:28 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 15 Oct 2024 12:16:28 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. > This pr is based on https://github.com/openjdk/jdk/pull/20781. > > Thanks!
> > ## Test > ### tests: > * test/jdk/jdk/incubator/vector/ > * test/hotspot/jtreg/compiler/vectorapi/ > > ### options: > * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs > * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs > * -XX:+EnableVectorSupport -XX:-UseVectorStubs > > ## Performance > > ### Tests > jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector. It's good to have these tests in jdk main stream, I will do it in a separate pr later. (These tests are auto-generated tests from some script&template, it's good to also have those scrip&template in jdk main stream, but those scrip&template generates more other tests than what we need here, so better to add these tests and script&template in another pr). > > ### Options > * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' > * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' > > ### Performance data > I have re-tested, there is no much difference from https://github.com/openjdk/jdk/pull/18605, so please check performance data in that pr. 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Update make/autoconf/flags-cflags.m4 Co-authored-by: Magnus Ihse Bursie ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21502/files - new: https://git.openjdk.org/jdk/pull/21502/files/9baa41d9..3aaf1c46 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21502&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21502&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21502.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21502/head:pull/21502 PR: https://git.openjdk.org/jdk/pull/21502 From mli at openjdk.org Tue Oct 15 12:16:28 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 15 Oct 2024 12:16:28 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: Message-ID: On Tue, 15 Oct 2024 11:04:40 GMT, Magnus Ihse Bursie wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Update make/autoconf/flags-cflags.m4 >> >> Co-authored-by: Magnus Ihse Bursie > > make/autoconf/flags-cflags.m4 line 920: > >> 918: # ACLE and this flag are required to build the aarch64 SVE related functions in >> 919: # libvectormath. 
>> 920: if test "x${OPENJDK_TARGET_CPU}" = "xaarch64"; then > > Suggestion: > > if test "x$OPENJDK_TARGET_CPU" = "xaarch64"; then Thanks, Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21502#discussion_r1801048212 From thartmann at openjdk.org Tue Oct 15 12:33:11 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Oct 2024 12:33:11 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn In-Reply-To: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> Message-ID: On Fri, 11 Oct 2024 16:51:16 GMT, Chad Rakoczy wrote: > [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) > > The crash occurs in C1 during OSR when copying locks from the interpreter frame to the compiled frame. All loads used an immediate offset regardless of the offset size, causing a crash when the offset exceeds the maximum for the instruction (32760). The fix is to check the size before performing the load and to store the offset in a register if needed. > > I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction. The instruction is only updated when the `offset_ok_for_immed` check fails, which would cause the crash anyway. > > Confirmed that the added test fails before the patch and passes after. The class file is from the original bug report; it should be converted to a jasm file. test/hotspot/jtreg/compiler/c1/Test8335662.java line 27: > 25: * @test > 26: * @bug 8335662 > 27: * @summary Execute main() method Please use a more descriptive summary of the test. test/hotspot/jtreg/compiler/c1/Test8335662.java line 35: > 33: import java.lang.reflect.Method; > 34: > 35: public class Test8335662 { We don't use bug numbers for test names (anymore). ------------- Changes requested by thartmann (Reviewer).
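The 32760 limit quoted in the PR description for 8335662 follows from the AArch64 unsigned-offset load form, which encodes a 12-bit immediate scaled by the access size. A simplified Java sketch of the check is below. It is a model under assumptions, not the HotSpot `offset_ok_for_immed` implementation: the names are hypothetical and the unscaled signed-offset form is ignored.

```java
// Hypothetical model of the scaled unsigned 12-bit immediate offset check.
final class ScaledImmediateSketch {
    enum AddressingPlan { IMMEDIATE_OFFSET, OFFSET_IN_REGISTER }

    // True if offset is a non-negative multiple of the access size whose
    // scaled value fits in 12 bits.
    static boolean fitsScaledImmediate(long offset, int accessBytes) {
        if (offset < 0 || offset % accessBytes != 0) {
            return false;
        }
        return offset / accessBytes < (1L << 12);
    }

    // Mirrors the described fix: keep the immediate when it fits, otherwise
    // materialize the offset in a register first.
    static AddressingPlan planLoad(long offset, int accessBytes) {
        return fitsScaledImmediate(offset, accessBytes)
            ? AddressingPlan.IMMEDIATE_OFFSET
            : AddressingPlan.OFFSET_IN_REGISTER;
    }

    public static void main(String[] args) {
        // Largest 8-byte-aligned offset that fits: 4095 * 8.
        System.out.println(32760 + " -> " + planLoad(32760, 8));
        System.out.println(32768 + " -> " + planLoad(32768, 8));
    }
}
```

For 8-byte accesses the largest encodable offset is 4095 * 8 = 32760, which matches the figure in the description; anything larger takes the register path in this model.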
PR Review: https://git.openjdk.org/jdk/pull/21473#pullrequestreview-2369186037 PR Review Comment: https://git.openjdk.org/jdk/pull/21473#discussion_r1801070515 PR Review Comment: https://git.openjdk.org/jdk/pull/21473#discussion_r1801073113 From thartmann at openjdk.org Tue Oct 15 12:49:15 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Oct 2024 12:49:15 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v22] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Wed, 9 Oct 2024 18:21:30 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > remove <= test cases, disable StressLongCountedLoop and PerMethodTrapLimit Thanks for the detailed investigation and feedback. The changes look good to me, I'll re-run testing and report back. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18489#issuecomment-2413818288 From thartmann at openjdk.org Tue Oct 15 13:03:10 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Oct 2024 13:03:10 GMT Subject: RFR: 8341328: Refactor initial Assertion Predicate creation into separate classes In-Reply-To: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> References: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> Message-ID: <1hBYun5fdCgGojbidnasoaJ7r0qYYQOXu4pYaIOukqU=.26ecdf8f-e40e-4ae1-90ff-5ac52fc318c4@github.com> On Thu, 10 Oct 2024 09:04:21 GMT, Christian Hagedorn wrote: > This PR refactors the initial Assertion Predicate creation (i.e. when initially creating them, not when copy/copy-updating them from existing Template Assertion Predicates). > > The patch includes the following changes: > - `PhaseIdealLoop::add_template_assertion_predicate()`, `add_range_check_elimination_assertion_predicate()`and the preparation code to call it, and `clone_template_assertion_predicate()` have similar code. I tried to share the common bits with new classes: > - `TemplateAssertionPredicateCreator`: Creates a new Template Assertion Predicate either with an UCT (done in Loop Predication) or a Halt node (done in Range Check Elimination). > - `InitializedAssertionPredicateCreator`: Creates a new Initialized Assertion Predicate with a Halt Node. This is an existing class which provided a method to clone a Template Assertion Predicate expression and create a new Initialized Assertion Predicate with it. Now it's extended to create one without an existing template. 
> - `AssertionPredicateIfCreator`: Used by both classes above and also by `clone_template_assertion_predicate()` (it clones the Assertion Predicate expression first and then just needs to create the `If`) > - `AssertionPredicateExpressionCreator`: Create a new Assertion Predicate expression, either with an `Opaque4` (for Template Assertion Predicates) or an `OpaqueInitializedAssertionPredicate` (for Initialized Assertion Predicates). > - Some renaming to get more consistency (e.g. use `new_control` instead of `control` or `new_ctrl`) > - Adding new `AssertionPredicateType::FinalIv` which was missed to account for in [JDK-8335393](https://bugs.openjdk.org/browse/JDK-8335393) where a new Initialized Assertion Predicate was added for the final IV in Range Check Elimination for a special case when removing an empty main loop. > > Thanks, > Christian Difficult to review but looks good to me overall. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21446#pullrequestreview-2369272515 From chagedorn at openjdk.org Tue Oct 15 13:13:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Oct 2024 13:13:11 GMT Subject: RFR: 8341328: Refactor initial Assertion Predicate creation into separate classes In-Reply-To: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> References: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> Message-ID: On Thu, 10 Oct 2024 09:04:21 GMT, Christian Hagedorn wrote: > This PR refactors the initial Assertion Predicate creation (i.e. when initially creating them, not when copy/copy-updating them from existing Template Assertion Predicates). > > The patch includes the following changes: > - `PhaseIdealLoop::add_template_assertion_predicate()`, `add_range_check_elimination_assertion_predicate()`and the preparation code to call it, and `clone_template_assertion_predicate()` have similar code. 
I tried to share the common bits with new classes: > - `TemplateAssertionPredicateCreator`: Creates a new Template Assertion Predicate either with an UCT (done in Loop Predication) or a Halt node (done in Range Check Elimination). > - `InitializedAssertionPredicateCreator`: Creates a new Initialized Assertion Predicate with a Halt Node. This is an existing class which provided a method to clone a Template Assertion Predicate expression and create a new Initialized Assertion Predicate with it. Now it's extended to create one without an existing template. > - `AssertionPredicateIfCreator`: Used by both classes above and also by `clone_template_assertion_predicate()` (it clones the Assertion Predicate expression first and then just needs to create the `If`) > - `AssertionPredicateExpressionCreator`: Create a new Assertion Predicate expression, either with an `Opaque4` (for Template Assertion Predicates) or an `OpaqueInitializedAssertionPredicate` (for Initialized Assertion Predicates). > - Some renaming to get more consistency (e.g. use `new_control` instead of `control` or `new_ctrl`) > - Adding new `AssertionPredicateType::FinalIv` which was missed to account for in [JDK-8335393](https://bugs.openjdk.org/browse/JDK-8335393) where a new Initialized Assertion Predicate was added for the final IV in Range Check Elimination for a special case when removing an empty main loop. > > Thanks, > Christian Thanks Tobias for your review! I agree, it ended up more on the complex side than originally anticipated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21446#issuecomment-2413875570 From ihse at openjdk.org Tue Oct 15 13:53:11 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 15 Oct 2024 13:53:11 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: Message-ID: On Tue, 15 Oct 2024 12:16:28 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? 
Previously it was https://github.com/openjdk/jdk/pull/18605. >> This PR is based on https://github.com/openjdk/jdk/pull/20781. >> >> Thanks! >> >> ## Test >> ### tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> ### options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Tests >> The JMH tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from the vectorIntrinsics branch in panama-vector. It would be good to have these tests in JDK mainline; I will do that in a separate PR later. (These tests are auto-generated from a script and template. It would be good to have the script and template in JDK mainline too, but they generate more tests than we need here, so it is better to add the tests, script and template in another PR.) >> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Performance data >> I have re-tested; there is not much difference from https://github.com/openjdk/jdk/pull/18605, so please check the performance data in that PR. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Update make/autoconf/flags-cflags.m4 > > Co-authored-by: Magnus Ihse Bursie Build changes look fine. ------------- Marked as reviewed by ihse (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21502#pullrequestreview-2369422806 From enikitin at openjdk.org Tue Oct 15 14:11:23 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Tue, 15 Oct 2024 14:11:23 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v24] In-Reply-To: References: Message-ID: <49ybhAOwbGMbp4G0gdR9cj14L20sWrSLrSIrxKmzfsw=.7eceb93c-be34-428c-b531-a3ce592bcb9a@github.com> On Mon, 14 Oct 2024 12:33:44 GMT, Emanuel Peter wrote: >> **Motivation** >> >> I want to write small dedicated fuzzers: >> - Generate `java` and `jasm` source code: just some `String`. >> - Quickly compile it (with this framework). >> - Execute the compiled code. >> >> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. >> >> **The CompileFramework** >> >> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. >> An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. >> >> I implemented a first, simple version of the framework.
I added some tests and examples. >> >> **Example** >> >> >> CompileFramework comp = new CompileFramework(); >> comp.add(SourceCode.newJavaSourceCode("XYZ", "")); >> comp.compile(); >> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); >> >> >> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 >> >> **Below some use cases: tests that would have been better with the CompileFramework** >> >> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** >> >> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. >> >> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, >> >> to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something i... > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/hotspot/jtreg/compiler/lib/compile_framework/README.md > > Co-authored-by: Christian Hagedorn > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn test/hotspot/jtreg/compiler/lib/compile_framework/Compile.java line 65: > 63: List command = new ArrayList<>(); > 64: > 65: command.add("%s/bin/javac".formatted(System.getProperty("compile.jdk"))); 1. Use ```jdk.test.lib.JDKToolFinder.getJDKTool("javac");``` ? 2. Store in a static variable once during initialization? To not request properties / call format string parsing every time? test/hotspot/jtreg/compiler/lib/compile_framework/Compile.java line 101: > 99: List command = new ArrayList<>(); > 100: > 101: command.add("%s/bin/java".formatted(System.getProperty("compile.jdk"))); 1. Use ```jdk.test.lib.JDKToolFinder.getJDKTool("java");``` ? 2.
Store in a static variable once during initialization, to avoid requesting properties and parsing the format string every time? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1801209700 PR Review Comment: https://git.openjdk.org/jdk/pull/20184#discussion_r1801210692 From psandoz at openjdk.org Tue Oct 15 16:06:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 15 Oct 2024 16:06:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 15 Oct 2024 09:35:23 GMT, Emanuel Peter wrote: > I gave it a quick scan, and I have no further comments. LGTM. Thank you, I will kick off an internal test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2414431367 From epeter at openjdk.org Tue Oct 15 16:09:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Oct 2024 16:09:15 GMT Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double [v2] In-Reply-To: References: Message-ID: On Tue, 15 Oct 2024 06:42:51 GMT, Amit Kumar wrote: >> This is a trivial PR to change the data type of some "*Threshold" variables from `intx` to `double`. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > updates tier2 threshold datatype Instead of changing the `product` flags (is a CSR needed for that?), you could also just cast to `double` at every use site. Would that also work? src/hotspot/share/opto/bytecodeInfo.cpp line 316: > 314: int call_site_count = caller_method->scale_count(profile.count()); > 315: int invoke_count = caller_method->interpreter_invocation_count(); > 316: assert(invoke_count >= 0, "require invocation count greater than zero"); Technically, the comment is now wrong.
It is no longer "greater than" but "greater than or equal to zero". Is that intended? Otherwise you should use `>`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21354#issuecomment-2414437954 PR Review Comment: https://git.openjdk.org/jdk/pull/21354#discussion_r1801504146 From epeter at openjdk.org Tue Oct 15 16:32:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Oct 2024 16:32:15 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: On Sun, 6 Oct 2024 08:32:20 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. >> >> Regarding the related issues: >> >> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. >> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` >> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. >> >> Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. >> >> Please take a look and leave reviews. Thanks a lot. >> >> The description of the original PR: >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> Redundant expansions in `rearrange` operations. 
On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byt... > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > [vectorapi] Refactor VectorShuffle implementation src/hotspot/cpu/x86/x86.ad line 2172: > 2170: > 2171: // Return true if Vector::rearrange needs preparation of the shuffle argument > 2172: bool Matcher::vector_needs_load_shuffle(BasicType elem_bt, int vlen) { I think the name needs to be more expressive. If I read it alone, then I would think that it is about all kinds of vectors ... and it is confusing because what is a "load shuffle"? Are we shuffling loads or loading shuffles? src/hotspot/share/opto/vectornode.hpp line 1618: > 1616: public: > 1617: VectorLoadShuffleNode(Node* in, const TypeVect* vt) > 1618: : VectorNode(in, vt) {} Can you add a comment above "class VectorLoadShuffleNode" to say what its semantics are? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1801531980 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1801536233 From qamai at openjdk.org Tue Oct 15 16:33:20 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 15 Oct 2024 16:33:20 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring the VPMULUDQ instruction for the following IR patterns. >> >> >> MulL (And SRC1, 0xFFFFFFFF) (And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1, 32) (URShift SRC2, 32) >> MulL (URShift SRC1, 32) (And SRC2, 0xFFFFFFFF) >> MulL (And SRC1, 0xFFFFFFFF) (URShift SRC2, 32) >> >> >> >> A 64x64 bit multiplication produces a 128 bit result, and can be performed by individually multiplying the upper and lower doublewords of the multiplier with the multiplicand and assembling the partial products to compute the full-width result. Targets supporting vector quadword multiplication have separate instructions to compute the upper and lower quadwords of the 128 bit result. Therefore the existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If the upper 32 bits of the quadword multiplier and multiplicand are always zero, then the result of the multiplication depends only on the partial product of their lower doublewords and can be computed using an unsigned 32 bit multiplication instruction with quadword saturation. The patch matches this pattern in a target-dependent manner without introducing a new IR node. >> >> The VPMULUDQ instruction performs unsigned multiplication between even-numbered doubleword lanes of two long vectors and produces a 64 bit result.
It has much lower latency than the full 64 bit multiplication instruction "VPMULLQ". In addition, non-AVX512DQ targets do not support direct quadword multiplication, so we can save the redundant partial products for the zeroed-out upper 32 bits. This results in throughput improvements on both P and E core Xeons. >> >> Please find below the performance of the [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimization:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 > - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction Personally, I think this optimization is not essential, so we should proceed with introducing lowering first, then add this transformation to that phase, instead of trying to integrate this transformation and then refactoring it into phase lowering, which seems like a net extra step.
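For readers following the thread, the arithmetic behind the quoted patterns can be sketched in scalar Java. This is only an illustration of why a 32x32 -> 64 bit unsigned multiply is enough once the upper halves are masked or shifted away; it is not code from the patch, and the class and method names (`LowWordMultiplySketch`, `mulLowWords`, `mulHighWords`, `unsignedMul32`) are hypothetical.

```java
public class LowWordMultiplySketch {
    private static final long LOW_MASK = 0xFFFFFFFFL;

    // Scalar analogue of: MulL (And SRC1, 0xFFFFFFFF) (And SRC2, 0xFFFFFFFF)
    public static long mulLowWords(long a, long b) {
        return (a & LOW_MASK) * (b & LOW_MASK);
    }

    // Scalar analogue of: MulL (URShift SRC1, 32) (URShift SRC2, 32)
    public static long mulHighWords(long a, long b) {
        return (a >>> 32) * (b >>> 32);
    }

    // Reference: unsigned 32x32 -> 64 bit multiply of two int bit patterns.
    // Both operands are below 2^32, so the product always fits in 64 bits.
    public static long unsignedMul32(int x, int y) {
        return Integer.toUnsignedLong(x) * Integer.toUnsignedLong(y);
    }

    public static void main(String[] args) {
        long a = 0x123456789ABCDEF0L;
        long b = 0xFEDCBA9876543210L;
        // The masked 64-bit multiply only ever depends on the low doublewords.
        System.out.println(mulLowWords(a, b) == unsignedMul32((int) a, (int) b));
        // The shifted form depends only on the high doublewords.
        System.out.println(mulHighWords(a, b)
                == unsignedMul32((int) (a >>> 32), (int) (b >>> 32)));
    }
}
```

Both comparisons hold for any inputs, which is the property the pattern match relies on; the mixed And/URShift patterns follow the same reasoning.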
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2414491182 From psandoz at openjdk.org Tue Oct 15 16:42:18 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 15 Oct 2024 16:42:18 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v23] In-Reply-To: References: <0LkBvvNPq5jmWfOdjItIXGedRDtpiivJM06BAx7vP0I=.c5417544-edef-4623-beaa-08cd7c565361@github.com> Message-ID: On Thu, 10 Oct 2024 16:24:35 GMT, Jatin Bhateja wrote: > Hi @vnkozlov , Can you kindly run this through your test infrastructure. We have two review approvals for Java and x86 backend code. I have kicked off some internal tests (FYI @vnkozlov) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2414510216 From jkarthikeyan at openjdk.org Tue Oct 15 17:03:15 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 15 Oct 2024 17:03:15 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring the VPMULUDQ instruction for the following IR patterns. >> >> >> MulL (And SRC1, 0xFFFFFFFF) (And SRC2, 0xFFFFFFFF) >> MulL (URShift SRC1, 32) (URShift SRC2, 32) >> MulL (URShift SRC1, 32) (And SRC2, 0xFFFFFFFF) >> MulL (And SRC1, 0xFFFFFFFF) (URShift SRC2, 32) >> >> >> >> A 64x64 bit multiplication produces a 128 bit result, and can be performed by individually multiplying the upper and lower doublewords of the multiplier with the multiplicand and assembling the partial products to compute the full-width result. Targets supporting vector quadword multiplication have separate instructions to compute the upper and lower quadwords of the 128 bit result.
Therefore the existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If the upper 32 bits of the quadword multiplier and multiplicand are always zero, then the result of the multiplication depends only on the partial product of their lower doublewords and can be computed using an unsigned 32 bit multiplication instruction with quadword saturation. The patch matches this pattern in a target-dependent manner without introducing a new IR node. >> >> The VPMULUDQ instruction performs unsigned multiplication between even-numbered doubleword lanes of two long vectors and produces a 64 bit result. It has much lower latency than the full 64 bit multiplication instruction "VPMULLQ". In addition, non-AVX512DQ targets do not support direct quadword multiplication, so we can save the redundant partial products for the zeroed-out upper 32 bits. This results in throughput improvements on both P and E core Xeons. >> >> Please find below the performance of the [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimization:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 > - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction I'm pretty ambivalent; I think implementing it either way would be alright.
Especially with unit tests, I think the lowering implementation wouldn't be that difficult. Maybe another reviewer has an opinion? About PhaseLowering though, I've found some more interesting things we could do with it, especially with improving vectorization support in the backend. @merykitty have you already started to work on it? I was thinking about prototyping it soon. Just wanted to make sure we're not doing the same work twice :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2414553899 From epeter at openjdk.org Tue Oct 15 17:05:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Oct 2024 17:05:40 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v25] In-Reply-To: References: Message-ID: <6wHlTOx8Nc4wPZg0L7XSvTjWX_WTET45i_YsvV2ByKY=.ce572319-e1ca-47e8-9be5-95c823fb32e4@github.com> > **Motivation** > > I want to write small dedicated fuzzers: > - Generate `java` and `jasm` source code: just some `String`. > - Quickly compile it (with this framework). > - Execute the compiled code. > > The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. 
At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. > > **The CompileFramework** > > Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. > An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. > > I implemented a first, simple version of the framework. I added some tests and examples. > > **Example** > > > CompileFramework comp = new CompileFramework(); > comp.add(SourceCode.newJavaSourceCode("XYZ", "")); > comp.compile(); > comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); > > > https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 > > **Below some use cases: tests that would have been better with the CompileFramework** > > **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** > > I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. > > For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, > > to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something inevitably breaks in the VM code (i.e. most likely in loop-opts or SuperW...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: use JDKToolFinder for Evgeny ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20184/files - new: https://git.openjdk.org/jdk/pull/20184/files/4eeab363..d50b6e1a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20184&range=23-24 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20184.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20184/head:pull/20184 PR: https://git.openjdk.org/jdk/pull/20184 From epeter at openjdk.org Tue Oct 15 17:10:19 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Oct 2024 17:10:19 GMT Subject: RFR: 8337221: CompileFramework: test library to conveniently compile java and jasm sources for fuzzing [v25] In-Reply-To: <6wHlTOx8Nc4wPZg0L7XSvTjWX_WTET45i_YsvV2ByKY=.ce572319-e1ca-47e8-9be5-95c823fb32e4@github.com> References: <6wHlTOx8Nc4wPZg0L7XSvTjWX_WTET45i_YsvV2ByKY=.ce572319-e1ca-47e8-9be5-95c823fb32e4@github.com> Message-ID: <7Qlg20-QukNORu89brmvzlj6IyyOIf8taAfUHQF5Ve4=.33d85c10-749a-4226-83ef-e0ee35d79a60@github.com> On Tue, 15 Oct 2024 17:05:40 GMT, Emanuel Peter wrote: >> **Motivation** >> >> I want to write small dedicated fuzzers: >> - Generate `java` and `jasm` source code: just some `String`. >> - Quickly compile it (with this framework). >> - Execute the compiled code. >> >> The primary users of the CompileFramework are Compiler-Engineers. Imagine you are working on some optimization. You already have a list of **hand-written tests**, but you are worried that this does not give you good coverage. You also do not trust that an existing Fuzzer will catch your bugs (at least not fast enough). Hence, you want to **script-generate** a large list of tests. But where do you put this script? 
It would be nice if it was also checked in on git, so that others can modify and maintain the test easily. But with such a script, you can only generate a **static test**. In some cases that is good enough, but sometimes the list of all possible tests your script would generate is very very large. Too large. So you need to randomly sample some of the tests. At this point, it would be nice to generate different tests with every run: a "mini-fuzzer" or a **fuzzer dedicated to a compiler feature**. >> >> **The CompileFramework** >> >> Java sources are compiled with `javac`, jasm sources with `asmtools` that are delivered with `jtreg`. >> An important factor: Integration with the IR-Framework (`TestFramework`): we want to be able to generate IR-rules for our tests. >> >> I implemented a first, simple version of the framework. I added some tests and examples. >> >> **Example** >> >> >> CompileFramework comp = new CompileFramework(); >> comp.add(SourceCode.newJavaSourceCode("XYZ", "")); >> comp.compile(); >> comp.invoke("XYZ", "test", new Object[] {5}); // XYZ.test(5); >> >> >> https://github.com/openjdk/jdk/blob/e869cce8092ee995cf2f3ad1ab2bca69c5e256ab/test/hotspot/jtreg/testlibrary_tests/compile_framework/examples/SimpleJavaExample.java#L42-L74 >> >> **Below some use cases: tests that would have been better with the CompileFramework** >> >> **Use case: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java** >> >> I needed to test loops with various `init / stride / limit / scale / unrolling-factor / ...`. >> >> For this I used `MethodHandle constant = MethodHandles.constant(int.class, value);`, >> >> to be able to choose different values before the C2 compilation, and then the C2 compilation would see them as constants and optimize assuming those constants. This works, but is difficult to extract reproducers once something i...
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > use JDKToolFinder for Evgeny @lepestock thanks for the hint! I applied your suggestion :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20184#issuecomment-2414567493 From qamai at openjdk.org Tue Oct 15 17:29:12 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 15 Oct 2024 17:29:12 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.04582d26-8f0b-46e5-a5c0-7d6ea4164e63@github.com> Message-ID: On Tue, 15 Oct 2024 17:00:26 GMT, Jasmine Karthikeyan wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 >> - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction > > I'm pretty ambivalent, I think implementing it either way would be alright. Especially with unit tests, I think the lowering implementation wouldn't be that difficult. Maybe another reviewer has an opinion? > > About PhaseLowering though, I've found some more interesting things we could do with it, especially with improving vectorization support in the backend. @merykitty have you already started to work on it? I was thinking about prototyping it soon. Just wanted to make sure we're not doing the same work twice :) @jaskarth Please proceed with it, I have a really simple prototype for it but I don't have any plan to proceed further soon. 
Thanks a lot :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2414605470 From kvn at openjdk.org Tue Oct 15 17:36:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Oct 2024 17:36:12 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v6] In-Reply-To: References: Message-ID: <02XI0hQTUSx-TDvEN78_ZYXqES3q9hXXLQ8gqJINUNs=.2220892b-aa87-4247-a749-253288a33996@github.com> On Mon, 14 Oct 2024 13:42:45 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> refine comments + typo > > Thanks for the source code. That's really interesting, running the benchmark multiple times may give different results, and even when there is a difference in the observed throughputs, the 2 compiled methods are exactly the same. So I think we are running into different quirks here, probably due to the fact that this benchmark saturates the memory bandwidth. @merykitty can you run this with regular Java benchmarks (SPECjvm, SPECjbb, Renaissance, DaCapo) to see if they are affected? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2414619344 From kvn at openjdk.org Tue Oct 15 17:55:14 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Oct 2024 17:55:14 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v6] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 13:42:45 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> refine comments + typo > > Thanks for the source code. That's really interesting, running the benchmark multiple times may give different results, and even when there is a difference in the observed throughputs, the 2 compiled methods are exactly the same. 
So I think we are running into different quirks here, probably due to the fact that this benchmark saturates the memory bandwidth. > @merykitty can you run this with regular Java benchmarks (SPECjvm, SPECjbb, Renaissance, DaCapo) to see if they are affected? We will also run our set of benchmarks to make sure there is no regression. If we see a significant regression only in some benchmarks and improvement in others, we can set `LoopAwareSpilling` to false in these changes and address the regression in a following PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2414657865 From duke at openjdk.org Tue Oct 15 18:53:45 2024 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 15 Oct 2024 18:53:45 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn [v2] In-Reply-To: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> Message-ID: > [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) > > The crash occurs in C1 during OSR when copying locks from the interpreter frame to the compiled frame. All loads used an immediate offset regardless of the offset's size, causing a crash when it exceeds the maximum for the instruction (32760). The fix is to check the size before performing the load and to store the offset in a register if needed. > > I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction.
The instruction is only updated when the `offset_ok_for_immed` check fails which would cause the crash anyways > > Confirmed that added test fails before patch and passes after Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: - Add blank line at end of test - Add jasm and update test description ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21473/files - new: https://git.openjdk.org/jdk/pull/21473/files/a6d2f814..51298397 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21473&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21473&range=00-01 Stats: 278 lines in 4 files changed: 235 ins; 43 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21473.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21473/head:pull/21473 PR: https://git.openjdk.org/jdk/pull/21473 From aph at openjdk.org Tue Oct 15 19:35:18 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 15 Oct 2024 19:35:18 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn [v2] In-Reply-To: References: <8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> Message-ID: On Tue, 15 Oct 2024 18:53:45 GMT, Chad Rakoczy wrote: >> [JDK-8335662](https://bugs.openjdk.org/browse/JDK-8335662) >> >> Crash occurs in C1 during OSR when copying locks from interpreter frame to compiled frame. All loads used immediate offset regardless of offset size causing crash when it is over the max size for the instruction (32760). Fix is to check the size before performing the load and storing the offset in a register if needed. >> >> I believe the risk is low because there will be no change to the instruction if the immediate offset fits in the load instruction.
The instruction is only updated when the `offset_ok_for_immed` check fails which would cause the crash anyways >> >> Confirmed that added test fails before patch and passes after > > Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision: > > - Add blank line at end of test > - Add jasm and update test description One thing for you to think about if you are interested in some further work in this area. This is a generic problem. It might be very beneficial to look for every base + immediate offset instruction, see if there is a possibility that there may be an overflow, and insert a `form_address()`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21473#issuecomment-2414846924 From kvn at openjdk.org Tue Oct 15 19:39:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Oct 2024 19:39:11 GMT Subject: RFR: 8341328: Refactor initial Assertion Predicate creation into separate classes In-Reply-To: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> References: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> Message-ID: On Thu, 10 Oct 2024 09:04:21 GMT, Christian Hagedorn wrote: > This PR refactors the initial Assertion Predicate creation (i.e. when initially creating them, not when copy/copy-updating them from existing Template Assertion Predicates). > > The patch includes the following changes: > - `PhaseIdealLoop::add_template_assertion_predicate()`, `add_range_check_elimination_assertion_predicate()`and the preparation code to call it, and `clone_template_assertion_predicate()` have similar code. I tried to share the common bits with new classes: > - `TemplateAssertionPredicateCreator`: Creates a new Template Assertion Predicate either with an UCT (done in Loop Predication) or a Halt node (done in Range Check Elimination).
> - `InitializedAssertionPredicateCreator`: Creates a new Initialized Assertion Predicate with a Halt Node. This is an existing class which provided a method to clone a Template Assertion Predicate expression and create a new Initialized Assertion Predicate with it. Now it's extended to create one without an existing template. > - `AssertionPredicateIfCreator`: Used by both classes above and also by `clone_template_assertion_predicate()` (it clones the Assertion Predicate expression first and then just needs to create the `If`) > - `AssertionPredicateExpressionCreator`: Create a new Assertion Predicate expression, either with an `Opaque4` (for Template Assertion Predicates) or an `OpaqueInitializedAssertionPredicate` (for Initialized Assertion Predicates). > - Some renaming to get more consistency (e.g. use `new_control` instead of `control` or `new_ctrl`) > - Adding new `AssertionPredicateType::FinalIv` which was missed to account for in [JDK-8335393](https://bugs.openjdk.org/browse/JDK-8335393) where a new Initialized Assertion Predicate was added for the final IV in Range Check Elimination for a special case when removing an empty main loop. > > Thanks, > Christian Good refactoring. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21446#pullrequestreview-2370392250 From psandoz at openjdk.org Tue Oct 15 19:43:24 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 15 Oct 2024 19:43:24 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v24] In-Reply-To: References: Message-ID: On Sun, 13 Oct 2024 09:57:00 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . 
UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update adlc changes. The compiler test `test/hotspot/jtreg/compiler/vectorapi/VectorCompareWithZeroTest.java` fails to compile and needs to be updated to use the renamed constants (`UNSIGNED_GT` -> `UGT` and `UNSIGNED_GE` -> `UGE`). This test is only compiled and run on aarch64.
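For readers following the saturating semantics described in the quoted PR description, here is a minimal scalar sketch of what "capped by the bounds of the result type" means for 32-bit integers. It is an illustration only and is not taken from the patch: the class name `SaturatingSketch` and all method names are hypothetical, and the actual scalar APIs added by the PR may be named and implemented differently.

```java
// Hypothetical illustration of saturating vs. wrapping arithmetic on int.
// None of these names come from the PR; they are assumptions for this sketch.
final class SaturatingSketch {

    // Signed saturating add: widen to long, then clamp to the int range.
    static int saturatingAdd(int a, int b) {
        long wide = (long) a + (long) b;
        if (wide > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (wide < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) wide;
    }

    // Unsigned saturating add: the wrapped sum is smaller than an operand
    // (as unsigned) exactly when the addition overflowed; clamp to 0xFFFFFFFF.
    static int saturatingUnsignedAdd(int a, int b) {
        int sum = a + b;
        return Integer.compareUnsigned(sum, a) < 0 ? -1 : sum;
    }

    // Unsigned saturating subtract: clamp to 0 when b > a (as unsigned).
    static int saturatingUnsignedSub(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    // Unsigned max: compare the bit patterns as unsigned values.
    static int unsignedMax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        // Plain int addition wraps around; the saturating version does not.
        System.out.println(Integer.MAX_VALUE + 1);
        System.out.println(saturatingAdd(Integer.MAX_VALUE, 1));
        System.out.println(Integer.toUnsignedString(saturatingUnsignedAdd(-1, 1)));
        System.out.println(saturatingUnsignedSub(1, 2));
    }
}
```

The widening approach keeps the signed case easy to read; a lane-wise vector implementation would typically use compare-and-blend or a dedicated saturating instruction instead.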
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2414859793 From kvn at openjdk.org Tue Oct 15 19:55:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Oct 2024 19:55:13 GMT Subject: RFR: 8339067: Convert Threshold flags (like Tier4MinInvocationThreshold and Tier3MinInvocationThreshold) to double [v2] In-Reply-To: References: Message-ID: On Tue, 15 Oct 2024 16:06:04 GMT, Emanuel Peter wrote: > Instead of changing the `product` flags (is a CSR needed for that?), you could also just cast to `double` at every use site. Would that also work? Yes, we need CSR for these changes if we do as they are now. Have cast or assign to local variable is preferable, I agree. > src/hotspot/share/opto/bytecodeInfo.cpp line 316: > >> 314: int call_site_count = caller_method->scale_count(profile.count()); >> 315: int invoke_count = caller_method->interpreter_invocation_count(); >> 316: assert(invoke_count >= 0, "require invocation count greater than zero"); > > Technically, the comment is now wrong. It is no longer "greater than" but "greater than or equal to zero". Is that intended? Otherwise you should use `>`. Actually it should be `>` because we divide by it in next line. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21354#issuecomment-2414880608 PR Review Comment: https://git.openjdk.org/jdk/pull/21354#discussion_r1801823705 From chagedorn at openjdk.org Tue Oct 15 20:47:12 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Oct 2024 20:47:12 GMT Subject: RFR: 8341328: Refactor initial Assertion Predicate creation into separate classes In-Reply-To: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> References: <1c3c2PH00e9AoQMbW1W7iAKTB2b_GBND3ifxW8Yrf-I=.ddb65430-be26-4742-a6f4-61f70eb47f9e@github.com> Message-ID: <0R2dApC5dSwIjJozNvYCW46NIzArP6ZGLKVrr5Zn4XI=.edcf29b9-0693-47c3-af89-07e9b6d364aa@github.com> On Thu, 10 Oct 2024 09:04:21 GMT, Christian Hagedorn wrote: > This PR refactors the initial Assertion Predicate creation (i.e. when initially creating them, not when copy/copy-updating them from existing Template Assertion Predicates). > > The patch includes the following changes: > - `PhaseIdealLoop::add_template_assertion_predicate()`, `add_range_check_elimination_assertion_predicate()`and the preparation code to call it, and `clone_template_assertion_predicate()` have similar code. I tried to share the common bits with new classes: > - `TemplateAssertionPredicateCreator`: Creates a new Template Assertion Predicate either with an UCT (done in Loop Predication) or a Halt node (done in Range Check Elimination). > - `InitializedAssertionPredicateCreator`: Creates a new Initialized Assertion Predicate with a Halt Node. This is an existing class which provided a method to clone a Template Assertion Predicate expression and create a new Initialized Assertion Predicate with it. Now it's extended to create one without an existing template. 
> - `AssertionPredicateIfCreator`: Used by both classes above and also by `clone_template_assertion_predicate()` (it clones the Assertion Predicate expression first and then just needs to create the `If`) > - `AssertionPredicateExpressionCreator`: Create a new Assertion Predicate expression, either with an `Opaque4` (for Template Assertion Predicates) or an `OpaqueInitializedAssertionPredicate` (for Initialized Assertion Predicates). > - Some renaming to get more consistency (e.g. use `new_control` instead of `control` or `new_ctrl`) > - Adding new `AssertionPredicateType::FinalIv` which was missed to account for in [JDK-8335393](https://bugs.openjdk.org/browse/JDK-8335393) where a new Initialized Assertion Predicate was added for the final IV in Range Check Elimination for a special case when removing an empty main loop. > > Thanks, > Christian Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21446#issuecomment-2415048929 From psandoz at openjdk.org Tue Oct 15 21:00:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 15 Oct 2024 21:00:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 15 Oct 2024 16:03:13 GMT, Paul Sandoz wrote: > > I gave it a quick scan, and I have no further comments. LGTM. > > Thank you, i will kick off an internal test. Tier 1 to 3 tests pass. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2415121395 From psandoz at openjdk.org Tue Oct 15 21:00:25 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 15 Oct 2024 21:00:25 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v23] In-Reply-To: References: <0LkBvvNPq5jmWfOdjItIXGedRDtpiivJM06BAx7vP0I=.c5417544-edef-4623-beaa-08cd7c565361@github.com> Message-ID: On Tue, 15 Oct 2024 16:39:57 GMT, Paul Sandoz wrote: > > Hi @vnkozlov , Can you kindly run this through your test infrastructure. We have two review approvals for Java and x86 backend code. > > I have kicked off some internal tests (FYI @vnkozlov) Tier 1 to 3 tests pass, except for the trivial source compilation error previously mentioned. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2415124207 From psandoz at openjdk.org Tue Oct 15 21:40:20 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 15 Oct 2024 21:40:20 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v24] In-Reply-To: <704HcEgAdeR1380vEK4z0bG0KiJ1kjRVSBCa9EFrt0w=.bee85693-033c-4d85-9f89-3e186ca3c2fb@github.com> References: <704HcEgAdeR1380vEK4z0bG0KiJ1kjRVSBCa9EFrt0w=.bee85693-033c-4d85-9f89-3e186ca3c2fb@github.com> Message-ID: On Tue, 15 Oct 2024 10:19:46 GMT, Emanuel Peter wrote: > Are there any IR rules that verify that the correct C2 nodes are used? Is that a thing you generally do with the VectorAPI, just to make sure things get correctly intrinsified? Not systematically. We have some IR testing for more complex areas, located under `test/hotspot/jtreg/compiler/vectorapi/`. When we started out testing there was no IR testing framework so we relied on classic unit tests running a test N times for C2 to kick in. That is still the case for the majority of tests.
It would be nice to have a better balance, and a way to systematically generate IR tests for the various vector operations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2415212261 From dlong at openjdk.org Wed Oct 16 01:33:30 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 16 Oct 2024 01:33:30 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v6] In-Reply-To: References: Message-ID: <-JXW7rxwUFheUwXdmlnVo_MhlJDct8NlANLLBE4Triw=.1227da53-0ff8-4ab3-ae6a-33f1ed904755@github.com> > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). > > An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. 
Dean Long has updated the pull request incrementally with one additional commit since the last revision: bail out on old methods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21148/files - new: https://git.openjdk.org/jdk/pull/21148/files/2c7fc099..701373f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=04-05 Stats: 24 lines in 7 files changed: 15 ins; 2 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/21148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21148/head:pull/21148 PR: https://git.openjdk.org/jdk/pull/21148 From sviswanathan at openjdk.org Wed Oct 16 01:39:50 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 16 Oct 2024 01:39:50 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v3] In-Reply-To: References: Message-ID: > When Float.floatToFloat16 is vectorized using a 2-element vector width due to dependencies, we incorrectly generate a 4-element vcvtps2ph with memory as destination storing 8 bytes instead of desired 4 bytes. This issue is fixed in this PR by limiting the memory version of match rule to 4-element vector and above. > Also a regression test case is added accordingly. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Run test on all platforms ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21480/files - new: https://git.openjdk.org/jdk/pull/21480/files/ed299327..f2981374 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21480&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21480&range=01-02 Stats: 11 lines in 2 files changed: 5 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21480.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21480/head:pull/21480 PR: https://git.openjdk.org/jdk/pull/21480 From sviswanathan at openjdk.org Wed Oct 16 01:39:50 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 16 Oct 2024 01:39:50 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2] In-Reply-To: References: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> Message-ID: On Tue, 15 Oct 2024 07:02:00 GMT, Emanuel Peter wrote: > Thanks for the updates! It looks good to me now. > > I have one more wish: Could you allow to run the test on all platforms please? `test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java` > > Currently, it only runs on selected platforms, see `@requires`. We should really run this on all platforms and compilers. The IR rules can be limited to platforms and features. That ensures that we do not have similar bugs elsewhere, and our test-coverage would be increased. @eme64 I have attempted to update the test accordingly. Please take a look. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21480#issuecomment-2415546350 From sviswanathan at openjdk.org Wed Oct 16 01:39:51 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 16 Oct 2024 01:39:51 GMT Subject: RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2] In-Reply-To: References: <0QtKOzkzdn22Pf35tlXM-sei7oOsFg9kHxeoUckzm30=.067f913d-e455-4c20-8382-f96b7327cfd4@github.com> Message-ID: <-5pIV32xfg2DMexItjaDQWkkTK4FVIfbB7G73LKFoxA=.1a7c8ada-6641-486f-ba83-6b9f5b5eb7ec@github.com> On Tue, 15 Oct 2024 08:08:59 GMT, Jatin Bhateja wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test case > > src/hotspot/cpu/x86/x86.ad line 3679: > >> 3677: >> 3678: instruct vconvF2HF_mem_reg(memory mem, vec src) %{ >> 3679: predicate(Matcher::vector_length_in_bytes(n->in(3)->in(1)) >= 16); > > You can add an elegant predicate check like following instead of accessing bare inputs. > > n->as_StoreVector()->memory_size() >= 16. We have used bare inputs at many places in the ad file in the predicate. > test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVector.java line 110: > >> 108: } >> 109: >> 110: // Verifying the result > > Since we are using IR framework, we can leverage existing [@Check](https://github.com/openjdk/jdk/blob/521effe017b9b6322036f1851220056a637d6b1c/test/hotspot/jtreg/compiler/lib/ir_framework/Check.java#L32) annotation for verification which works in conjunction with @Test method, it will automatically invoke validation after test method execution. We may need little refactoring for this. The added test follows the verification mechanism used already in the test. I would prefer not to get into refactoring.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1802214527 PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1802214007 From dlong at openjdk.org Wed Oct 16 01:44:42 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 16 Oct 2024 01:44:42 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v7] In-Reply-To: References: Message-ID: