From amitkumar at openjdk.org Mon Sep 2 10:33:23 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 2 Sep 2024 10:33:23 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v3] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 20:11:06 GMT, Coleen Phillimore wrote: >> Coleen Phillimore has updated the pull request incrementally with three additional commits since the last revision: >> >> - Fix jvmci code. >> - Some C2 refactoring. >> - Assembly corrections from Matias and Dean. > > Thanks Chris and Matias for reviewing parts of this. Hi @coleenp, I got this error while build on my Mac-M1, did you see something like this ? : ERROR: Failed to generate link optimization data. This is likely a problem with the newly built JVM/JDK. # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/Users/amitkumar/jdk/src/hotspot/share/oops/klassFlags.hpp:72), pid=57968, tid=10499 # assert(!is_value_based_class()) failed: set once # # JRE version: (24.0) (fastdebug build ) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-adhoc.amitkumar.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64) # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again # # An error report file with more information is saved as: # /Users/amitkumar/jdk/make/hs_err_pid57968.log # # These are the changes with which I am build the JVM. I can reproduce it on my s390x-machine as well. 
diff --git a/src/hotspot/share/runtime/globals.hpp b/src/hotspot/share/runtime/globals.hpp
index d442894798b..fbd1a2b3281 100644
--- a/src/hotspot/share/runtime/globals.hpp
+++ b/src/hotspot/share/runtime/globals.hpp
@@ -170,7 +170,7 @@ const int ObjectAlignmentInBytes = 8;
   product(bool, AlwaysSafeConstructors, false, EXPERIMENTAL,                \
           "Force safe construction, as if all fields are final.")           \
                                                                             \
-  product(bool, UnlockDiagnosticVMOptions, trueInDebug, DIAGNOSTIC,         \
+  product(bool, UnlockDiagnosticVMOptions, true, DIAGNOSTIC,                \
           "Enable normal processing of flags relating to field diagnostics")\
                                                                             \
   product(bool, UnlockExperimentalVMOptions, false, EXPERIMENTAL,           \
@@ -819,7 +819,7 @@ const int ObjectAlignmentInBytes = 8;
   product(bool, RestrictContended, true,                                    \
           "Restrict @Contended to trusted classes")                         \
                                                                             \
-  product(int, DiagnoseSyncOnValueBasedClasses, 0, DIAGNOSTIC,              \
+  product(int, DiagnoseSyncOnValueBasedClasses, 1, DIAGNOSTIC,              \
           "Detect and take action upon identifying synchronization on "     \
           "value based classes. Modes: "                                    \
           "0: off; "                                                        \

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20719#issuecomment-2324389377

From duke at openjdk.org Tue Sep 3 07:40:29 2024
From: duke at openjdk.org (Francesco Nigro)
Date: Tue, 3 Sep 2024 07:40:29 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long)
In-Reply-To:
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID:

On Tue, 27 Aug 2024 17:09:18 GMT, Galder Zamarreño wrote:

>> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>>
>> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>>
>>
>> VLoop::check_preconditions: failed: control flow in loop not allowed
>>
>>
>> The control flow is due to the java implementation for these methods, e.g.
>> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g. >> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add an platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Working on it @galderz in the benchmark did you collected the mispredicts/branches? 
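
For reference, a minimal sketch of the kind of loop the quoted description is about. The class and method names are hypothetical and are not taken from the PR or from `ReductionPerf.java`; the sketch only shows where `Math.max(long, long)` sits inside a loop body. Whether C2 vectorizes such a loop, and how many branches or mispredicts it incurs, depends on the JDK build, the platform and the input data, so nothing is claimed about that here.

```java
import java.util.Arrays;

// Hypothetical illustration only; not code from the PR under review.
final class LongMaxLoop {
    // Element-wise maximum: every iteration calls Math.max(long, long).
    static void maxInto(long[] a, long[] b, long[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.max(a[i], b[i]);
        }
    }

    // Max reduction over a single array, the shape used by reduction benchmarks.
    static long maxReduce(long[] a) {
        long acc = Long.MIN_VALUE;
        for (long v : a) {
            acc = Math.max(acc, v);
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] a = {1, 5, -2};
        long[] b = {4, 2, -3};
        long[] out = new long[a.length];
        maxInto(a, b, out);
        System.out.println(Arrays.toString(out));
        System.out.println(maxReduce(a));
    }
}
```

Branch behaviour in the non-intrinsified version depends on how predictable `a >= b` is, so a benchmark that feeds sorted or constant data may behave differently from one that feeds random data.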
------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2325808756 From thartmann at openjdk.org Tue Sep 3 08:06:29 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Sep 2024 08:06:29 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v5] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 21:51:49 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add parameters and rename generate_klass_flags_guard. The JIT changes look good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20719#pullrequestreview-2276697701 From yzheng at openjdk.org Tue Sep 3 08:47:22 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 3 Sep 2024 08:47:22 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v5] In-Reply-To: References: Message-ID: <4a2yapux5-GcnB8oTuHmp2_rsqfLCsexWN9ifNzEjwg=.7594c3dd-6a33-434f-b57a-2105cb9f5c65@github.com> On Fri, 30 Aug 2024 21:51:49 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add parameters and rename generate_klass_flags_guard. JVMCI changes look good to me! ------------- Marked as reviewed by yzheng (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/20719#pullrequestreview-2276790447 From yzheng at openjdk.org Tue Sep 3 08:47:23 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 3 Sep 2024 08:47:23 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 20:23:54 GMT, Coleen Phillimore wrote: >> I don't think the JVMCI knows about the type KlassFlags - I used the same code that I used for InstanceKlass::_misc_flags._flags (see above this). > > I made the change to refactor the getMiscFlags function, but if you want to add knowledge of the KlassFlags class (and InstanceKlassFlags also), you could do that separately from this PR. I think JVMCI already knows these type via the objArrayKlass import, as it knows about KlassFlags. I will open another PR for refactoring these and other things unrelated to this PR in `HotSpotVMConfig` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1741669853 From epeter at openjdk.org Tue Sep 3 12:15:32 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 12:15:32 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 30 Aug 2024 14:02:46 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding descriptive comments > > src/hotspot/cpu/x86/x86.ad line 10490: > >> 10488: >> 10489: >> 10490: instruct selectFromTwoVec_evex(vec dst, vec src1, vec src2) > > You could rename `dst` -> `mask_and_dst`. That would maybe help the reader to more quickly know that it is an input-mask and output-dst. 
Also, for consistency, I would write out the name `selectFromTwoVector(s)_evex` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741772354 From epeter at openjdk.org Tue Sep 3 12:15:31 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 12:15:31 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <7bghGF2-qbhP1hJA2ljtdA3xSUSqiV0RLaOYm4AcZSQ=.eb3e36b2-5461-4755-ae71-2de89660649f@github.com> On Thu, 29 Aug 2024 05:42:58 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>>
>> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :-
>> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>>
>>
>> Benchmark                                     (size)   Mode  Cnt      Score   Error   Units
>> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762          ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550          ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605          ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004          ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758          ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192          ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459          ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556          ops/ms
>> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830          ops/ms
>> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174          ops/ms
>> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804          ops/ms
>> S...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Adding descriptive comments

Ok, I left a few more comments.

Generally, this looks like a nice feature, thanks for implementing it @jatin-bhateja!

A few issues with code style (camelCase vs snake_case).

I'm also wondering about good naming. Why did we/you choose "select" for this? Why not "shuffle"? Does "select" not often get used as a synonym of "blend", which has different semantics?

Also: I'm a little worried about the semantics change of the RearrangeNode that you did with the changes in `RearrangeNode::Ideal`. It looks a little "hacky", especially in conjunction with the `vector_indexes_needs_massaging` method. Can you give a clear definition of the semantics of `RearrangeNode` and `vector_indexes_needs_massaging`, please?

I also added some control questions for testing.
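
To make the quoted semantics concrete, here is a scalar reference model of a two-vector `selectFrom`, written as a minimal sketch. It is not the Vector API implementation: the class `SelectFromModel`, its method and the `wrap` parameter are hypothetical names. With `wrap == false` it follows the PR description quoted above (indices outside `[0, 2*VLEN)` throw). With `wrap == true` it reduces indices modulo `2*VLEN`, which is only an assumption about what "wrapping semantics" means when it is mentioned later in this thread.

```java
import java.util.Arrays;

// Hypothetical scalar model of two-vector selectFrom; not the real Vector API code.
final class SelectFromModel {
    // idx plays the role of "this" (the index vector); v1 and v2 form a table of 2*VLEN entries.
    static int[] selectFrom(int[] idx, int[] v1, int[] v2, boolean wrap) {
        int vlen = idx.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all three vectors must have the same length");
        }
        int[] r = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            int i = idx[lane];
            if (wrap) {
                // Assumption: wrapping means reduction modulo 2*VLEN.
                i = Math.floorMod(i, 2 * vlen);
            } else if (i < 0 || i >= 2 * vlen) {
                throw new IndexOutOfBoundsException("lane " + lane + ": index " + i);
            }
            // Indices below VLEN select from v1, the rest select from v2.
            r[lane] = (i < vlen) ? v1[i] : v2[i - vlen];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] idx = {0, 5, 2, 7};
        System.out.println(Arrays.toString(selectFrom(idx, v1, v2, false))); // [10, 21, 12, 23]
    }
}
```

A model like this is also one way to phrase the expected result in a test: the assertion quoted from `Byte128VectorTests.java` in this thread performs the same split between `a` and `b` based on whether the index is at least `vector_len`.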
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6446: > 6444: } > 6445: > 6446: void C2_MacroAssembler::select_from_two_vector_evex(BasicType elem_bt, XMMRegister dst, XMMRegister src1, I also wonder if you could use the plural in these cases? You are selecting from two vectors, with the plural "s". Of course it is a bit annoying if you would have to name the IR node `SelectFromTwoVectors`, because we usually name the vector nodes `...Vector`, without the plural "s". src/hotspot/share/opto/library_call.cpp line 749: > 747: return inline_vector_compress_expand(); > 748: case vmIntrinsics::_VectorSelectFromTwoVectorOp: > 749: return inline_vector_select_from_two_vectors(); Interesting, here you use the correct plural "vectors". src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 544: > 542: byte[] vpayload1 = ((ByteVector)v1).vec(); > 543: byte[] vpayload2 = ((ByteVector)v2).vec(); > 544: byte[] vpayload3 = ((ByteVector)v3).vec(); Is there a reason you are not using more descriptive names here instead of `vpayload1`? I also wonder if the `selectFromHelper` should not be named more specifically: `selectFromTwoVector(s)Helper`? src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2595: > 2593: @ForceInline > 2594: final ByteVector selectFromTemplate(ByteVector v1, ByteVector v2) { > 2595: int twovectorlen = length() * 2; `twovectorlen` -> `twoVectorLen` I think in Java we are supposed to use camelCase src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2770: > 2768: > 2769: /** > 2770: * Rearranges the lane elements of two vectors, selecting lanes I have a bit of a name concern here. Why are we calling it "select" and not "rearrange"? Because for a single "from" vector we also call it "rearrange", right? Is "select" not often synonymous to "blend", which works also with two "from" vectors, but with a mask and not indexing for "selection/rearranging"? 
test/jdk/jdk/incubator/vector/Byte128VectorTests.java line 324: > 322: boolean is_exceptional_idx = (int)order[idx] >= vector_len; > 323: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; > 324: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx])); I thought general Java style is camelCase? Is that not followed in the VectorAPI code? test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 1048: > 1046: return SHORT_GENERATOR_SELECT_FROM_TRIPLES.stream().map(List::toArray). > 1047: toArray(Object[][]::new); > 1048: } Just a control question: does this also occasionally generate examples with out-of-bounds indices? Negative out of bounds and positive out of bounds? test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 5812: > 5810: ShortVector bv = ShortVector.fromArray(SPECIES, b, i); > 5811: ShortVector idxv = ShortVector.fromArray(SPECIES, idx, i); > 5812: idxv.selectFrom(av, bv).intoArray(r, i); Would this test catch a bug where the backend would generate vectors that are too long or too short? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2276944129 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741766060 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741773766 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741914524 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741911809 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741919025 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741920940 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741947885 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741949290 From coleenp at openjdk.org Tue Sep 3 12:33:47 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 3 Sep 2024 12:33:47 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v6] In-Reply-To: References: Message-ID: > Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. > > Tested with tier1-7. > > NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: - Remove unused function declaration. - Add parameters and rename generate_klass_flags_guard. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20719/files - new: https://git.openjdk.org/jdk/pull/20719/files/4c3a04dc..79c35f7c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20719&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20719&range=04-05 Stats: 5 lines in 2 files changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20719/head:pull/20719 PR: https://git.openjdk.org/jdk/pull/20719 From coleenp at openjdk.org Tue Sep 3 12:33:48 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 3 Sep 2024 12:33:48 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v3] In-Reply-To: References: Message-ID: <7zEKCE09_TD7dLWKVs0aB_8i7Dbh7T6Gqhx5TT2z628=.76734232-80a1-4d2b-af5b-6850351c8af3@github.com> On Mon, 2 Sep 2024 10:30:25 GMT, Amit Kumar wrote: >> Thanks Chris and Matias for reviewing parts of this. > > Hi @coleenp, > > I got this error while build on my Mac-M1, did you see something like this ? : > > ERROR: Failed to generate link optimization data. This is likely a problem with the newly built JVM/JDK. > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/Users/amitkumar/jdk/src/hotspot/share/oops/klassFlags.hpp:72), pid=57968, tid=10499 > # assert(!is_value_based_class()) failed: set once > # > # JRE version: (24.0) (fastdebug build ) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-adhoc.amitkumar.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64) > # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /Users/amitkumar/jdk/make/hs_err_pid57968.log > # > # > > > These are the changes with which I am build the JVM. I can reproduce it on my s390x-machine as well. 
> > diff --git a/src/hotspot/share/runtime/globals.hpp b/src/hotspot/share/runtime/globals.hpp > index d442894798b..fbd1a2b3281 100644 > --- a/src/hotspot/share/runtime/globals.hpp > +++ b/src/hotspot/share/runtime/globals.hpp > @@ -170,7 +170,7 @@ const int ObjectAlignmentInBytes = 8; > product(bool, AlwaysSafeConstructors, false, EXPERIMENTAL, \ > "Force safe construction, as if all fields are final.") \ > \ > - product(bool, UnlockDiagnosticVMOptions, trueInDebug, DIAGNOSTIC, \ > + product(bool, UnlockDiagnosticVMOptions, true, DIAGNOSTIC, \ > "Enable normal processing of flags relating to field diagnostics")\ > \ > product(bool, UnlockExperimentalVMOptions, false, EXPERIMENTAL, \ > @@ -819,7 +819,7 @@ const int ObjectAlignmentInBytes = 8; > product(bool, RestrictContended, true, \ > "Restrict @Contended to trusted classes") \ > \ > - product(int, DiagnoseSyncOnValueBasedClasses, 0, DIAGNOSTIC, \ > + product(int, DiagnoseSyncOnValueBasedClasses, 1, DIAGNOSTIC, \ > "Detect and take action upon identifying synchronization on " \ > "value based classes. Modes: " ... @offamitkumar Thank you for finding this bug. These flags have asserts that they're only set once, but CDS restores sets the value for this flag. Since it was set when dumping the archive, it resets it, which is okay in this case. I have a fix for this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20719#issuecomment-2326395437 From coleenp at openjdk.org Tue Sep 3 12:33:48 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 3 Sep 2024 12:33:48 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v5] In-Reply-To: References: Message-ID: <5-nmLoyCJm-LswMCIwx8ieuePekAgdTJbq1mCuDCpS8=.6a6ace3b-24e3-4dbb-ab76-f4d02f3802bf@github.com> On Sat, 31 Aug 2024 10:22:29 GMT, ExE Boss wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Add parameters and rename generate_klass_flags_guard. 
>
> src/hotspot/share/opto/library_call.hpp line 161:
>
>> 159:   Node* generate_mods_flags_guard(Node* kls,
>> 160:                                   int modifier_mask, int modifier_bits,
>> 161:                                   RegionNode* region);
>
> This method was removed.
>
> Suggestion:

Thank you for noticing this.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1741975860

From coleenp at openjdk.org Tue Sep 3 12:43:23 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 3 Sep 2024 12:43:23 GMT
Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 3 Sep 2024 08:42:23 GMT, Yudi Zheng wrote:

>> I made the change to refactor the getMiscFlags function, but if you want to add knowledge of the KlassFlags class (and InstanceKlassFlags also), you could do that separately from this PR.
>
> I think JVMCI already knows these type via the objArrayKlass import, as it knows about KlassFlags. I will open another PR for refactoring these and other things unrelated to this PR in `HotSpotVMConfig`

Ok, yes, thanks for opening a new issue for this.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1741993231

From coleenp at openjdk.org Tue Sep 3 12:43:24 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 3 Sep 2024 12:43:24 GMT
Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v3]
In-Reply-To:
References: <-dyvOdrDMU8UERNjLmg8NhFNta6ukiqRXrM1oJvyzc4=.f9f80021-8b7a-425f-807a-89b7dab293dc@github.com>
Message-ID:

On Fri, 30 Aug 2024 22:25:54 GMT, Dean Long wrote:

>> Really, this is better? it adds three parameters. I made this change.
>
> It reduces duplicate code, which is usually good.

Yes, I like it better. Ok!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1741992407 From darcy at openjdk.org Tue Sep 3 22:58:22 2024 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 3 Sep 2024 22:58:22 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 20:26:05 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Add stub initialization and extra tanh tests test/jdk/java/lang/Math/HyperbolicTests.java line 984: > 982: double b1 = 0.02; > 983: double b2 = 5.1; > 984: double b3 = 55 * Math.log(2)/2; // ~19.062 Probably better to use StrictMath.log here or, better use, precompute the value as a constant and document its conceptual origin. 
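
One way to follow this suggestion, as a minimal sketch: the class name `TanhTestConstants` and the constant name `B3` are hypothetical, and the literal is an independent computation of 55 * ln(2) / 2, not the value from the updated test. The comment records a mathematical identity for the value; whether that identity is the reason the test uses this bound is an assumption.

```java
// Hypothetical illustration; names and layout are not taken from HyperbolicTests.java.
final class TanhTestConstants {
    /**
     * 55 * ln(2) / 2, approximately 19.062.
     * At this argument exp(-2 * x) equals 2^-55. Recording the identity next to the
     * literal documents where the number comes from without calling Math.log at test time.
     */
    static final double B3 = 19.061547465398496;

    public static void main(String[] args) {
        // Cross-check the literal against StrictMath, which is reproducible across platforms.
        double recomputed = 55 * StrictMath.log(2) / 2;
        System.out.println("B3         = " + B3);
        System.out.println("recomputed = " + recomputed);
        System.out.println("difference = " + Math.abs(B3 - recomputed));
    }
}
```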
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1742790463

From darcy at openjdk.org Wed Sep 4 00:03:20 2024
From: darcy at openjdk.org (Joe Darcy)
Date: Wed, 4 Sep 2024 00:03:20 GMT
Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 30 Aug 2024 20:26:05 GMT, Srinivas Vamsi Parasa wrote:

>> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm
>>
>> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup
>> -- | -- | -- | --
>> MathBench.tanhDouble | 70900 | 95618 | 1.35x
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add stub initialization and extra tanh tests

test/jdk/java/lang/Math/HyperbolicTests.java line 1009:

> 1007:         for(int i = 0; i < testCases.length; i++) {
> 1008:             double testCase = testCases[i];
> 1009:             failures += testTanhWithReferenceUlpDiff(testCase, StrictMath.tanh(testCase), 2.5);

The allowable worst-case error is 2.5 ulp, although at many arguments FDLIBM has a smaller error.

For a general Math vs StrictMath test with an allowable 2.5 ulp error, without knowing how accurate FDLIBM is for that function and argument, a larger error of approx. 2X the nominal error should be allowed (in case FDLIBM errs in one direction and the Math method errs in the other direction).

If the test is going to use randomness, then its jtreg tags should include

`@key randomness`

and it is preferable to use jdk.test.lib.RandomFactory to get a Random object since that handles printing out a key so the random sequence can be replicated if the test fails.
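
The two points above can be sketched together as follows. This is a standalone approximation, not the test under review: `TanhUlpCheck`, `ulpDiff` and `countFailures` are hypothetical names, the input range is arbitrary, and a plain `java.util.Random` with a printed seed stands in for `jdk.test.lib.RandomFactory`, which lives in the JDK test library and is not available to a self-contained program. The bound passed in is 5 ulp, twice the nominal 2.5 ulp, because both `Math.tanh` and the `StrictMath.tanh` reference may each be off by up to 2.5 ulp in opposite directions.

```java
import java.util.Random;

// Hypothetical standalone sketch; the real test would use jtreg and RandomFactory.
final class TanhUlpCheck {
    // Distance between actual and reference, measured in ulps of the reference value.
    static double ulpDiff(double actual, double reference) {
        if (Double.compare(actual, reference) == 0) {
            return 0.0;
        }
        return Math.abs(actual - reference) / Math.ulp(reference);
    }

    // Counts random arguments where Math.tanh and StrictMath.tanh differ by more than maxUlps.
    static int countFailures(long seed, int samples, double maxUlps) {
        Random rnd = new Random(seed);
        int failures = 0;
        for (int i = 0; i < samples; i++) {
            double x = (rnd.nextDouble() - 0.5) * 40.0; // arbitrary range [-20, 20)
            double diff = ulpDiff(Math.tanh(x), StrictMath.tanh(x));
            if (!(diff <= maxUlps)) {
                failures++;
                System.err.println("x=" + x + " differs by " + diff + " ulp");
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Print the seed so that a failing run can be replayed with the same sequence.
        long seed = args.length > 0 ? Long.parseLong(args[0]) : System.nanoTime();
        System.out.println("seed = " + seed);
        int failures = countFailures(seed, 100_000, 5.0);
        if (failures != 0) {
            throw new AssertionError(failures + " arguments exceeded 5 ulp");
        }
    }
}
```

Measuring in ulps of the reference value is a simplification; near a power of two the ulp of the two results can differ, which is one more reason not to make the bound tighter than the combined error budget.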
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1742826418 From jbhateja at openjdk.org Wed Sep 4 02:00:25 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 4 Sep 2024 02:00:25 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: <8CAXws7Rp6HKERu5hSTOrXi8GRFRdV4I670Nf8NSZlI=.ba6acccb-77e5-46a6-bec2-e0ea97dfe85d@github.com> Message-ID: <2nRoXBr_v8DjlG4wJlWF9OhYMmgpTUDX6VAQnvO3DCY=.596e5e39-c5ba-4d20-b5e0-aa301f7c9d76@github.com> On Tue, 27 Aug 2024 22:23:44 GMT, Srinivas Vamsi Parasa wrote: >> I agree, this is all rather obscure. Ideally the same names that are used in wherever this comes from. >> >> Where does the algorithm come from? What are its accuracy guarantees? >> >> In addition, given the rarity of hyperbolic tangents in Java applications, do we need this? > > @theRealAph, this implementation is based on Intel libm math library and meets the accuracy requirements. The algorithm is provided in the comments. @vamsi-parasa don't hesitate in adding as much and explicit information about the original source from where the algorithm has been picked up, even though the PR explicitly mentions libm. Adding the link to source references is a good practice. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1742887385 From matsaave at openjdk.org Wed Sep 4 15:37:24 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 4 Sep 2024 15:37:24 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v6] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 12:33:47 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. 
> > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - Remove unused function declaration. > - Add parameters and rename generate_klass_flags_guard. Updates look good! ------------- Marked as reviewed by matsaave (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20719#pullrequestreview-2280590336 From coleenp at openjdk.org Wed Sep 4 15:51:32 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 4 Sep 2024 15:51:32 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v6] In-Reply-To: References: Message-ID: <4l8wz-JmihCu8GfhNpa9n9zmFL8kgUohfFIiiFDzRdA=.d6e6bed5-6080-4584-863f-6846b1a65f3c@github.com> On Tue, 3 Sep 2024 12:33:47 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - Remove unused function declaration. > - Add parameters and rename generate_klass_flags_guard. Thank you for reviewing, Dean, Tobias, Amit, Exe-Boss and reviewing the updates also Matias. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20719#issuecomment-2329419116 From coleenp at openjdk.org Wed Sep 4 15:51:33 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 4 Sep 2024 15:51:33 GMT Subject: Integrated: 8339112: Move JVM Klass flags out of AccessFlags In-Reply-To: References: Message-ID: <65Xo_ExETiiaZqM_SqeQ-ZOd2A6tyYUDvq5x15xLQhs=.cd87cc4a-e5c7-451c-bf06-f0ba00ad4326@github.com> On Mon, 26 Aug 2024 23:54:22 GMT, Coleen Phillimore wrote: > Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. 
> > Tested with tier1-7. > > NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. This pull request has now been integrated. Changeset: 0cfd08f5 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/0cfd08f55aa166dc3f027887c886fa0b40a2ca21 Stats: 329 lines in 53 files changed: 163 ins; 51 del; 115 mod 8339112: Move JVM Klass flags out of AccessFlags Reviewed-by: matsaave, cjplummer, dlong, thartmann, yzheng ------------- PR: https://git.openjdk.org/jdk/pull/20719 From duke at openjdk.org Thu Sep 5 19:10:34 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 5 Sep 2024 19:10:34 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v3] In-Reply-To: References: Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: update libm tanh reference test with code review suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/4739ad45..39350a37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=01-02 Stats: 7 lines in 1 file changed: 4 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From duke at openjdk.org Thu Sep 5 19:10:34 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 5 Sep 2024 19:10:34 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 22:55:18 GMT, Joe Darcy wrote: >> Srinivas Vamsi Parasa has updated the 
pull request incrementally with one additional commit since the last revision: >> >> Add stub initialization and extra tanh tests > > test/jdk/java/lang/Math/HyperbolicTests.java line 984: > >> 982: double b1 = 0.02; >> 983: double b2 = 5.1; >> 984: double b3 = 55 * Math.log(2)/2; // ~19.062 > > Probably better to use StrictMath.log here or, better use, precompute the value as a constant and document its conceptual origin. Please see the updated code which uses the precomputed value of `b3`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1746031432 From duke at openjdk.org Thu Sep 5 19:12:55 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 5 Sep 2024 19:12:55 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 00:01:09 GMT, Joe Darcy wrote: > If the test is going to use randomness, then its jtreg tags should include > > `@key randomness` > > and it is preferable to use jdk.test.lib.RandomFactory to get and Random object since that handles printing out a key so the random sequence can be replicated if the test fails. Please see the test updated to use `@key randomness` and` jdk.test.lib.RandomFactory` to get and Random object. > The allowable worst-case error is 2.5 ulp, although at many arguments FDLIBM has a smaller error. > For a general Math vs StrictMath test with an allowable 2.5 ulp error, without knowing how accurate FDLIBM is for that function and argument, a large error of approx. 2X the nominal error should be allowed (in case FDLIBM errors in one direction and the Math method errors in the other direction). > So far the tests haven't failed with error of 2.5ulp. Would it be better to make it 5ulp? Please let me know. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1746034895 From jbhateja at openjdk.org Fri Sep 6 18:13:35 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Sep 2024 18:13:35 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 30 Aug 2024 14:40:35 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding descriptive comments > > src/hotspot/share/opto/vectornode.cpp line 2159: > >> 2157: >> 2158: vmask_type = TypeVect::makemask(elem_bt, num_elem); >> 2159: mask = phase->transform(new VectorMaskCastNode(mask, vmask_type)); > > I would just have two variables, and not overwrite it: `integral_vmask_type` and `vmask_type`. Maybe also `mask` could be split into two variables? I think the variable names are appropriate and in accordance with convention. > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2770: > >> 2768: >> 2769: /** >> 2770: * Rearranges the lane elements of two vectors, selecting lanes > > I have a bit of a name concern here. Why are we calling it "select" and not "rearrange"? Because for a single "from" vector we also call it "rearrange", right? Is "select" not often synonymous with "blend", which also works with two "from" vectors, but with a mask and not indexing for "selection/rearranging"? We already have another flavor of [selectFrom](https://docs.oracle.com/en/java/javase/22/docs/api/jdk.incubator.vector/jdk/incubator/vector/Vector.html#selectFrom(jdk.incubator.vector.Vector)) which permutes a single vector; the new API extends its semantics to two-vector selection, so we kept the nomenclature consistent.
> test/jdk/jdk/incubator/vector/Byte128VectorTests.java line 324: > >> 322: boolean is_exceptional_idx = (int)order[idx] >= vector_len; >> 323: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; >> 324: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx])); > > I thought general Java style is camelCase? Is that not followed in the VectorAPI code? I agree, but this file already uses non-camelCase names (look for uses of 'vector_len'), so I am just preserving the file-level convention. > test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 1048: > >> 1046: return SHORT_GENERATOR_SELECT_FROM_TRIPLES.stream().map(List::toArray). >> 1047: toArray(Object[][]::new); >> 1048: } > > Just a control question: does this also occasionally generate examples with out-of-bounds indices? Negative out of bounds and positive out of bounds? The original API did throw IndexOutOfBoundsException, but we have since moved away from exception-throwing semantics to wrapping semantics. Please find details in the following comment: https://github.com/openjdk/jdk/pull/20508#issuecomment-2306344606 > test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 5812: > >> 5810: ShortVector bv = ShortVector.fromArray(SPECIES, b, i); >> 5811: ShortVector idxv = ShortVector.fromArray(SPECIES, idx, i); >> 5812: idxv.selectFrom(av, bv).intoArray(r, i); > > Would this test catch a bug where the backend would generate vectors that are too long or too short? The existing Vector API inline expansion entry points explicitly pass the lane type and count as intrinsic arguments; these are used to create concrete ideal vector types.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532692 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532456 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532419 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532340 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532307 From jbhateja at openjdk.org Fri Sep 6 18:13:35 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Sep 2024 18:13:35 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 30 Aug 2024 14:57:31 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.cpp line 2183: >> >>> 2181: }; >>> 2182: // Targets emulating unsupported permutation for certain vector types >>> 2183: // may need to message the indexes to match the users intent. >> >> Suggestion: >> >> // may need to massage the indexes to match the users intent. > > This optimization for now seems quite specific to your `SelectFromTwoVectorNode::Ideal` lowering code. Can this conversion not be done there already? > > What is the semantics of `VectorRearrangeNode`? Should its shuffle vector always be bytes, and we now violated that "for a quick second"? Or is it going to be generally the idea to create all sorts of shuffle types and then fix that up? But then why do we need the `vector_indexes_needs_massaging`? > > Can you help me understand the concept/strategy behind this? Ok, IIRC the variable-index permutation instruction on every target expects shape conformance between the data vector and the permute index vector. Rearrange expects indices to be passed through a shuffle; idealization routines automatically inject a VectorLoadShuffle after loading the indexes held in the shuffle's backing storage, i.e. a byte array.
In all the cases apart from byte vector permute, VectorLoadShuffle expands the index byte lanes to match the data vector lane. So we always end up emitting a lane expansion instruction before the permute instruction (scenario 1). Apart from the usual expansions, VectorLoadShuffle may also do additional magic for some targets where it may need to prune / massage the index vector if the target does not support the destination vector type (scenario 2). For our case, the new selectFrom accepts the indices through vectors, which saves redundant expansions, but to leverage existing backend support for scenario 2 we do target-specific pruning. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532612 From jbhateja at openjdk.org Fri Sep 6 18:13:34 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Sep 2024 18:13:34 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v8] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serve as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundsException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API.
> - C2 compiler IR and inline expander changes.
> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms.
> - Optimized x86 backend implementation for AVX512 and legacy target.
> - Function tests covering new API.
>
> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :-
> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>
> Benchmark                                     (size)   Mode  Cnt      Score  Error   Units
> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762         ops/ms
> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550         ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605         ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004         ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758         ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192         ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459         ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556         ops/ms
> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830         ops/ms
> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174         ops/ms
> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804         ops/ms
> SelectFromBenchmark.selectFromIntVector         2048  thrpt    2   5398.2...

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Review resolutions.
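
The two-vector semantics described in this message can be modelled lane by lane in plain Java. This is a scalar sketch only, not the jdk.incubator.vector implementation: `SelectFromModel` and `selectFromTwo` are hypothetical names, int arrays stand in for vectors, and out-of-range indices are wrapped into [0, 2*VLEN) on the assumption that the wrapping semantics mentioned in the review thread, rather than the exception described in the original PR text, is what applies.

```java
import java.util.Arrays;

final class SelectFromModel {
    // Scalar model: lane i of the result is read from the 2*VLEN-entry table
    // formed by v1 followed by v2, at position index[i] wrapped into range.
    static int[] selectFromTwo(int[] index, int[] v1, int[] v2) {
        int vlen = v1.length;
        if (v2.length != vlen || index.length != vlen) {
            throw new IllegalArgumentException("all three arrays must have the same length");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            // Assumed wrapping: floorMod keeps negative indices in [0, 2*vlen).
            int wrapped = Math.floorMod(index[i], 2 * vlen);
            result[i] = wrapped < vlen ? v1[wrapped] : v2[wrapped - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] index = {0, 5, 3, 6};
        // Indices 0 and 3 hit v1; indices 5 and 6 hit v2[1] and v2[2].
        System.out.println(Arrays.toString(selectFromTwo(index, v1, v2))); // prints [10, 21, 13, 22]
    }
}
```

The split at `vlen` mirrors the check quoted from Byte128VectorTests.java earlier in the thread, where an index at or above `vector_len` selects from the second array after subtracting `vector_len`.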
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/8d71f175..d3ee3104 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=06-07 Stats: 115 lines in 18 files changed: 12 ins; 15 del; 88 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From sviswanathan at openjdk.org Fri Sep 6 21:45:15 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 6 Sep 2024 21:45:15 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v3] In-Reply-To: <2nRoXBr_v8DjlG4wJlWF9OhYMmgpTUDX6VAQnvO3DCY=.596e5e39-c5ba-4d20-b5e0-aa301f7c9d76@github.com> References: <8CAXws7Rp6HKERu5hSTOrXi8GRFRdV4I670Nf8NSZlI=.ba6acccb-77e5-46a6-bec2-e0ea97dfe85d@github.com> <2nRoXBr_v8DjlG4wJlWF9OhYMmgpTUDX6VAQnvO3DCY=.596e5e39-c5ba-4d20-b5e0-aa301f7c9d76@github.com> Message-ID: On Wed, 4 Sep 2024 01:57:42 GMT, Jatin Bhateja wrote: >> @theRealAph, this implementation is based on Intel libm math library and meets the accuracy requirements. The algorithm is provided in the comments. > > @vamsi-parasa don't hesitate in adding as much and explicit information about the original source from where the algorithm has been picked up, even though the PR explicitly mentions libm. Adding the link to source references is a good practice. @jatin-bhateja This is based on Intel internal LIBM sources and so there is no public link available. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1747726562 From sviswanathan at openjdk.org Fri Sep 6 21:45:16 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 6 Sep 2024 21:45:16 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v3] In-Reply-To: References: <8CAXws7Rp6HKERu5hSTOrXi8GRFRdV4I670Nf8NSZlI=.ba6acccb-77e5-46a6-bec2-e0ea97dfe85d@github.com> <2nRoXBr_v8DjlG4wJlWF9OhYMmgpTUDX6VAQnvO3DCY=.596e5e39-c5ba-4d20-b5e0-aa301f7c9d76@github.com> Message-ID: On Fri, 6 Sep 2024 21:15:07 GMT, Sandhya Viswanathan wrote: >> @vamsi-parasa don't hesitate in adding as much and explicit information about the original source from where the algorithm has been picked up, even though the PR explicitly mentions libm. Adding the link to source references is a good practice. > > @jatin-bhateja This is based on Intel internal LIBM sources and so there is no public link available. > Do you have a copy of this information? Should it be in the commit? @theRealAph The accuracy of standard (non fast mode) LIBM functions ensures errors of < 1 ulp. LIBM is part of the Intel C++ compiler. The documentation can be found here: https://www.intel.com/content/www/us/en/docs/cpp-compiler/developer-guide-reference/2021-8/programming-tradeoffs-floating-point-applications.html. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1747742612 From galder at openjdk.org Mon Sep 9 05:10:07 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 9 Sep 2024 05:10:07 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Tue, 3 Sep 2024 07:37:33 GMT, Francesco Nigro wrote: >> Working on it > > @galderz in the benchmark did you collect the mispredicts/branches?
@franz1981 No, I hadn't done so until now, but I will be tracking those more closely.

Context: I have been running some reduction JMH benchmarks and I could see a big drop in non AVX-512 performance compared to the unpatched code. E.g.

    @Benchmark
    public long reductionSingleLongMax() {
        long result = 0;
        for (int i = 0; i < size; i++) {
            final long v = 11 * aLong[i];
            result = Math.max(result, v);
        }
        return result;
    }

This is caused by keeping the Max/Min nodes in the IR, which get translated into `cmpq+cmovlq` instructions (via the macro expansion). The code gets unrolled, but there is a dependency chain on the current max value. In the unpatched code the intrinsic does not kick in and the code uses a standard ternary operation, which gets translated into normal control flow. The system is able to handle this better due to branch prediction. @franz1981's comment is precisely about this. I need to enhance the benchmark to control the branchiness of the test (e.g. how often it goes one side or the other of a max/min call) and measure the mispredictions, branches, etc.

FYI: A similar situation can be replicated with reduction benchmarks that use max/min integer, but for the code to fall back to `cmov`, both AVX and SSE have to be turned off.

I also need to see what the performance looks like on a system with AVX-512, and also look at how non-reduction JMH benchmarks behave on systems with/without AVX-512.

Finally, I'm also looking at an experiment to see what would happen if cmovl were implemented with branch+mov instead.

------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2337131179 From rkennke at openjdk.org Mon Sep 9 10:29:55 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 10:29:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental).
> > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: - Fix compiler/c2/irTests/TestPadding.java for +COH - Simplify arrayOopDesc::length_offset_in_bytes and oopDesc::base_offset_in_bytes - Nit in header_size - GC code tweaks - Fix runtime/cds/appcds/loaderConstraints/DynamicLoaderConstraintsTest.java - Fix jdk/tools/jlink/plugins/CDSPluginTest.java - Cleanup markWord bits and comments - x86_64: Fix loadNKlassCompactHeaders - aarch64: Fix loadNKlassCompactHeaders - Use FLAG_SET_ERGO when turning off UseCompactObjectHeaders - ... and 16 more: https://git.openjdk.org/jdk/compare/b45fe174...49126383 ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=06 Stats: 4465 lines in 189 files changed: 3175 ins; 678 del; 612 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Mon Sep 9 11:55:52 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 11:55:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). 
> > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Try to avoid lea in loadNklass (aarch64) - Fix release build error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/49126383..70f492d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=06-07 Stats: 24 lines in 5 files changed: 12 ins; 1 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From tschatzl at openjdk.org Mon Sep 9 12:40:13 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 9 Sep 2024 12:40:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 10:29:55 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. 
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: > > - Fix compiler/c2/irTests/TestPadding.java for +COH > - Simplify arrayOopDesc::length_offset_in_bytes and oopDesc::base_offset_in_bytes > - Nit in header_size > - GC code tweaks > - Fix runtime/cds/appcds/loaderConstraints/DynamicLoaderConstraintsTest.java > - Fix jdk/tools/jlink/plugins/CDSPluginTest.java > - Cleanup markWord bits and comments > - x86_64: Fix loadNKlassCompactHeaders > - aarch64: Fix loadNKlassCompactHeaders > - Use FLAG_SET_ERGO when turning off UseCompactObjectHeaders > - ... and 16 more: https://git.openjdk.org/jdk/compare/b45fe174...49126383 src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 481: > 479: Klass* klass = UseCompactObjectHeaders > 480: ? old_mark.klass() > 481: : old->klass(); To be exact "promotion" only refers to copying to an older generation, so this comment does not cover objects copied within the generation. Suggestion: // NOTE: With compact headers, it is not safe to load the Klass* from old, because // that would access the mark-word, that might change at any time by concurrent // workers. // This mark word would refer to a forwardee, which may not yet have completed // copying. Therefore we must load the Klass* from the mark-word that we already // loaded. This is safe, because we only enter here if not yet forwarded. src/hotspot/share/gc/parallel/mutableSpace.cpp line 225: > 223: // header-based forwarding during promotion. Full GC doesn't > 224: // use the object header for forwarding at all. > 225: p += obj->forwardee()->size(); Better use `!obj->is_self_forwarded()` here. src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 174: > 172: // may not yet have completed copying. Therefore we must load the Klass* from > 173: // the mark-word that we have already loaded. This is safe, because we have checked > 174: // that this is not yet forwarded in the caller.) 
Same adjustment needed as for G1. src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 711: > 709: // 8 - 32-bit VM > 710: // 12 - 64-bit VM, compressed klass > 711: // 16 - 64-bit VM, normal klass The comment needs to be adapted to include the case for compact object headers. src/hotspot/share/oops/arrayOop.hpp line 83: > 81: // The _length field is not declared in C++. It is allocated after the > 82: // declared nonstatic fields in arrayOopDesc if not compressed, otherwise > 83: // it occupies the second half of the _klass field in oopDesc. Needs update. src/hotspot/share/oops/instanceOop.hpp line 36: > 34: class instanceOopDesc : public oopDesc { > 35: public: > 36: // If compressed, the offset of the fields of the instance may not be aligned. Needs fixing (or removal) wrt to compact object headers, or move to the particular case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750046114 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750056160 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750074607 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750080552 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750027009 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750116336 From tschatzl at openjdk.org Mon Sep 9 12:40:14 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 9 Sep 2024 12:40:14 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). 
>> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). 
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding src/hotspot/share/gc/shared/collectedHeap.cpp line 232: > 230: } > 231: > 232: // With compact headers, we can't safely access the class, due Suggestion: // With compact headers, we can't safely access the klass, due Why is this the case? Because we might not have copied the header yet? Is this method actually ever used while the forwarded object is unstable? Given this is used for verification only afaik, we should make an effort to provide that check. src/hotspot/share/gc/shared/gcForwarding.hpp line 34: > 32: > 33: /* > 34: * Implements forwarding for the full-GCs of Serial, Parallel, G1 and Shenandoah in Suggestion: * Implements forwarding for the Full GCs of Serial, Parallel, G1 and Shenandoah in src/hotspot/share/gc/shared/gcForwarding.hpp line 41: > 39: * bits (to indicate 'forwarded' state as usual). > 40: */ > 41: class GCForwarding : public AllStatic { Since this class is only used for Full GCs, it may be useful to include that information in the name, e.g. something like `FullGCForwarding`, to avoid confusion about why it is not used for other GCs too. (Unless this has been discussed and even rejected by me before).
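
The "40 bits, up to 8TB" figure quoted from the PR description can be sanity-checked with a little arithmetic. The sketch below is not the HotSpot GCForwarding code (which is C++); `ForwardingCapacity`, `encode` and `decode` are hypothetical names, and it assumes the forwarding pointer is stored as an offset from the heap base counted in 8-byte heap words, which is what "similar to compressed-oops" suggests.

```java
final class ForwardingCapacity {
    static final int OFFSET_BITS = 40;       // bit budget stated in the PR description
    static final long HEAP_WORD_BYTES = 8L;  // assumption: 8-byte heap words on 64-bit VMs

    // Largest heap size whose word offsets still fit into OFFSET_BITS bits.
    static long maxEncodableHeapBytes() {
        return (1L << OFFSET_BITS) * HEAP_WORD_BYTES;
    }

    // Encode an address as a word offset from the heap base.
    static long encode(long heapBase, long address) {
        long byteOffset = address - heapBase;
        if (byteOffset < 0 || byteOffset % HEAP_WORD_BYTES != 0) {
            throw new IllegalArgumentException("address must be word-aligned and inside the heap");
        }
        long wordOffset = byteOffset / HEAP_WORD_BYTES;
        if (wordOffset >= (1L << OFFSET_BITS)) {
            throw new IllegalArgumentException("offset does not fit into " + OFFSET_BITS + " bits");
        }
        return wordOffset;
    }

    // Recover the address from a word offset.
    static long decode(long heapBase, long encoded) {
        return heapBase + encoded * HEAP_WORD_BYTES;
    }

    public static void main(String[] args) {
        // 2^40 words * 8 bytes = 2^43 bytes = 8 TiB.
        System.out.println((maxEncodableHeapBytes() >> 40) + " TiB"); // prints 8 TiB
    }
}
```

Under these assumptions, any heap larger than 2^43 bytes would contain addresses whose word offset needs a 41st bit, which is consistent with the description falling back to a different mode beyond 8TB.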
src/hotspot/share/oops/compressedKlass.hpp line 43: > 41: > 42: // Tiny-class-pointer mode > 43: static int _tiny_cp; // -1, 0=true, 1=false Suggestion: static int _tiny_cp; // -1 = uninitialized, 0 = true, 1 = false In addition to that, I am not sure if introducing a new term ("tiny") for compact class header related changes (and just here) makes the code more clear; I would have expected a "_compact_" prefix. Also all other members use "k"-klass and spell out "klass pointer", so I would prefer to keep that style. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1749995275 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1749980748 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1749987945 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1749969456 From tschatzl at openjdk.org Mon Sep 9 12:40:18 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 9 Sep 2024 12:40:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> On Mon, 9 Sep 2024 11:55:52 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan: specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>> - Arrays will now store their length at offset 8.
>> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
> > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Try to avoid lea in loadNklass (aarch64) > - Fix release build error src/hotspot/share/oops/klass.hpp line 169: > 167: // contention that may happen when a nearby object is modified. > 168: AccessFlags _access_flags; // Access flags. The class/interface distinction is stored here. > 169: // Some flags created by the JVM, not in the class file itself, Suggestion: markWord _prototype_header; // Used to initialize objects' header with compact headers. Maybe some comment why this is an instance member. src/hotspot/share/oops/objArrayKlass.inline.hpp line 74: > 72: void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) { > 73: // In this assert, we cannot safely access the Klass* with compact headers. > 74: assert (UseCompactObjectHeaders || obj->is_array(), "obj must be array"); If we can't safely access the `Klass*` here, why is the call to `obj->klass()` below safe? src/hotspot/share/oops/oop.cpp line 157: > 155: bool oopDesc::has_klass_gap() { > 156: // Only has a klass gap when compressed class pointers are used. > 157: // Except when using compact headers. Suggestion: // Only has a klass gap when compressed class pointers are used and not // using compact headers. (Not sure if repeating the fairly simple disjunction below makes sense, but there has been a comment before too) src/hotspot/share/oops/oop.cpp line 230: > 228: // disjunct below to fail if the two comparands are computed across such > 229: // a concurrent change. > 230: return Universe::heap()->is_stw_gc_active() && klass->is_objArray_klass() && is_forwarded() && (UseParallelGC || UseG1GC); Is this still true after the recent changes like JDK-8311163? It might be worth waiting for. 
src/hotspot/share/oops/oop.hpp line 103: > 101: static inline void set_klass_gap(HeapWord* mem, int z); > 102: > 103: // size of object header, aligned to platform wordSize Suggestion: // Size of object header, aligned to platform wordSize Pre-existing src/hotspot/share/oops/oop.hpp line 108: > 106: return sizeof(markWord) / HeapWordSize; > 107: } else { > 108: return sizeof(oopDesc) / HeapWordSize; Suggestion: return sizeof(oopDesc) / HeapWordSize; src/hotspot/share/oops/oop.hpp line 134: > 132: inline Klass* forward_safe_klass(markWord m) const; > 133: inline size_t forward_safe_size(); > 134: inline void forward_safe_init_mark(); Given the comment these methods do not seem "safe" to me. Maybe use "raw" or something to better indicate that care must be taken to use them. Maybe the "safe" refers to use them only in "safe" contexts, but in Hotspot code iirc we use something like "raw" or "unsafe". src/hotspot/share/oops/oop.hpp line 295: > 293: // this call returns null for that thread; any other thread has the > 294: // value of the forwarding pointer returned and does not modify "this". > 295: inline oop forward_to_atomic(oop p, markWord compare, atomic_memory_order order = memory_order_conservative); Maybe add an assert in the implementation so that it is not used for self-forwarding. Same for `forward_to`. src/hotspot/share/oops/oop.hpp line 356: > 354: return mark_offset_in_bytes() + sizeof(markWord) / 2; > 355: } else > 356: #endif Maybe instead of trying to calculate some random, meaningless value just use some "random" value directly? I am fine with the existing code, but first stating directly that "any value" works here, this additional code seems to confuse the message. (Fwiw, the method is also used during Universe initialization). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750118470 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750143956 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750145460 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750150640 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750154114 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750153663 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750157781 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750159516 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750163768 From tschatzl at openjdk.org Mon Sep 9 12:45:07 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 9 Sep 2024 12:45:07 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 11:55:52 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan: specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>> - Arrays will now store their length at offset 8.
>> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Try to avoid lea in loadNklass (aarch64)
>  - Fix release build error

Only looked at GC and runtime changes, only very briefly at compiler stuff.

-------------

Changes requested by tschatzl (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2289786482
PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2289800458

From rkennke at openjdk.org Mon Sep 9 12:52:07 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 9 Sep 2024 12:52:07 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 30 Aug 2024 18:10:44 GMT, Albert Mingkun Yang wrote:

>> FWIW, the ParallelGC does something very similar to what you propose, except that it walks bitmaps instead of parsing the space to find the self-forwarded objects. It then has a check inside object_iterate to make sure that it doesn't expose the dead objects (in eden and the from space) to heap dumpers and histogram printers.
>>
>> Because of the code above, the SerialGC clears away the information about what objects are dead in eden and the from space, so heap dumpers and histogram printers will include these dead objects. We might want to fix that as a future RFE.
>
>> If we get a promotion failure in the young gen, we are leaving the dead objects marked as forwarded.
>
> True; need to do sth like `obj->init_mark();` for the non-self-forwarded case. The postcondition is that no forwarded objs in eden/from.
ParallelGC actually doesn't use bitmaps, it pushes all forwarded objs to preserved-marks-table, and uses that to find forwarded objects, which is why we can't remove the preserved-marks table in ParallelGC (IOW, after this patch, the preserved-marks-stuff in Parallel scavenger is *only* used to find forwarded objects. We might want to think about more efficient solutions for this). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750199051 From rkennke at openjdk.org Mon Sep 9 13:02:08 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 13:02:08 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: <3QGPH52NyrDPne5EgoGx2sx9OeGRu9K72onNNwzMr2M=.8a390b3d-2e8a-470e-8bb7-1ba975070c53@github.com> References: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> <3QGPH52NyrDPne5EgoGx2sx9OeGRu9K72onNNwzMr2M=.8a390b3d-2e8a-470e-8bb7-1ba975070c53@github.com> Message-ID: On Fri, 30 Aug 2024 07:42:39 GMT, Thomas Stuefe wrote: >> Yes. This silent setting of UseCompactObjectHeaders ended up hiding why we got CDS failures. I would also suggest that we change this to FLAG_SET_ERGO. > > Seems we run all into the same thoughts :) > > I added > > Suggestion: > > FLAG_SET_DEFAULT(UseCompactObjectHeaders, false); > warning("Compact object headers require a java heap size smaller than %zu (given: %zu). " > "Disabling compact object headers.", max_narrow_heap_size * HeapWordSize, max_heap_size); That %zu is SIZE_FORMAT, right? This should probably use proper_unit_for_byte_size()/byte_size_in_proper_unit(). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750215510 From rkennke at openjdk.org Mon Sep 9 13:31:10 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 13:31:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> References: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> Message-ID: On Thu, 22 Aug 2024 19:50:21 GMT, Albert Mingkun Yang wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix hash shift for 32 bit builds > > src/hotspot/share/gc/shared/gcForwarding.hpp line 36: > >> 34: * Implements forwarding for the full-GCs of Serial, Parallel, G1 and Shenandoah in >> 35: * a way that preserves upper N bits of object mark-words, which contain crucial >> 36: * Klass* information when running with compact headers. The encoding is similar to > > This doc suggests this forwarding is only for compact-header so I wonder if we can check `UseCompactObjectHeaders` directly instead of heap-size in `GCForwarding::initialize`. Right. The original implementation was more complex and then the consensus was to not sprinkle UseCompactHeaders all over the place, but with that new/simpler implementation it makes sense to simply check the UCOH flag. > src/hotspot/share/gc/shared/gcForwarding.hpp line 40: > >> 38: * heap-base, shifts that difference into the right place, and sets the lowest two >> 39: * bits (to indicate 'forwarded' state as usual). >> 40: */ > >> "can use 40 bits for forwardee encoding. That's enough for 8TB of heap." > > I feel this 8T-constraint is significant and should be in the doc. Done. 
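
As a side note on the "40 bits, 8TB" figure discussed above, the arithmetic can be sketched as follows. This is only an illustration under stated assumptions, not the GCForwarding code from the PR: it assumes the 40 bits hold an offset counted in 8-byte heap words, and that the offset sits directly above the two low tag bits. The names `kOffsetShift`, `encode_forwardee` and `decode_forwardee` are hypothetical, and the preserved upper Klass* bits are not modelled.

```cpp
#include <cstdint>

// Figure quoted in the thread: 40 bits are available for the forwardee.
constexpr unsigned kForwardeeBits = 40;
// Assumption: the encoded value is an offset counted in 8-byte heap words.
constexpr uint64_t kHeapWordSize = 8;
// Assumption: the offset is stored directly above the two low tag bits.
constexpr unsigned kOffsetShift = 2;
// The lowest two bits set mark the header as 'forwarded'.
constexpr uint64_t kForwardedTag = 0x3;

// Largest heap (in bytes) whose word offsets fit into kForwardeeBits.
constexpr uint64_t max_encodable_heap_bytes() {
  return (uint64_t{1} << kForwardeeBits) * kHeapWordSize;
}

// Difference to the heap base, in words, shifted into place and tagged.
constexpr uint64_t encode_forwardee(uint64_t heap_base, uint64_t forwardee) {
  return (((forwardee - heap_base) / kHeapWordSize) << kOffsetShift) | kForwardedTag;
}

// Inverse of encode_forwardee for values produced by this sketch
// (upper header bits are assumed to be zero here).
constexpr uint64_t decode_forwardee(uint64_t heap_base, uint64_t encoded) {
  return heap_base + (encoded >> kOffsetShift) * kHeapWordSize;
}

// 2^40 words * 8 bytes per word = 2^43 bytes = 8 TB.
static_assert(max_encodable_heap_bytes() == (uint64_t{1} << 43),
              "40 bits of word offsets cover 8 TB");
```

Under these assumptions, an object 4096 bytes above the heap base is 512 words away, so it encodes to `(512 << 2) | 3`, and decoding returns the original address.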
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750264571 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750265026 From rkennke at openjdk.org Mon Sep 9 14:11:08 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 14:11:08 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: <6I0T4rOjOTj-FZxpspatEo6j1_Num75bCAOBNxsrHI8=.097f731e-7c92-4eac-a379-c2df336cd412@github.com> On Tue, 27 Aug 2024 07:43:07 GMT, Hamlin Li wrote: >> @Hamlin-Li : AFAIK, porting to linux-riscv platform has NOT been started yet. To avoid duplicate work, please let me know if anyone is interested or has been working on it :-) > > Yes, I'm interested in it. Thanks for raising the discussion. :) If anybody is doing it, please send me a patch, or we can do it as a follow-up PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750345203 From rkennke at openjdk.org Mon Sep 9 14:11:10 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 14:11:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 11:38:39 GMT, Hamlin Li wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/oops/oop.inline.hpp line 94: > >> 92: >> 93: void oopDesc::init_mark() { >> 94: if (UseCompactObjectHeaders) { > > Seems only `set_mark(prototype_mark());` is fine for both cases? Right. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750342555 From rkennke at openjdk.org Mon Sep 9 14:35:08 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 14:35:08 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 21:52:58 GMT, Chris Plummer wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 169: > >> 167: } else { >> 168: visitor.doMetadata(klass, true); >> 169: } > > Why is there no `visitor.doMetadata()` call for the compact object header case? There is no dedicated klass field anymore, the Klass* is encoded in the mark, and we would need to extract it. What is the purpose of the visitors? Do they need to see the klass/compressedKlass, or is it sufficient to visit the mark-word (which we already do, but as CInt). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750386024 From stefank at openjdk.org Mon Sep 9 14:50:10 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 9 Sep 2024 14:50:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: <4yTWbD93OXGwYYxEQo56smKa5kl_WiPPcMsXSs0eUoQ=.893f54c8-ed2b-4a7f-bf0a-36553a951f47@github.com> On Fri, 30 Aug 2024 08:06:31 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/cds/filemap.cpp line 2507: > >> 2505: } >> 2506: >> 2507: if (compact_headers() != UseCompactObjectHeaders) { > > (Commenting here, but the comment applies to code a bit above) While debugging CDS, it would have been useful to print the value of UseCompactObjectHeaders. > > Could we change the code to be: > > log_info(cds)("Archive was created with UseCompressedOops = %d, UseCompressedClassPointers = %d, UseCompactObjectHeaders = %d", > compressed_oops(), compressed_class_pointers(), compact_headers()); Resolved. > src/hotspot/share/cds/filemap.cpp line 2508: > >> 2506: >> 2507: if (compact_headers() != UseCompactObjectHeaders) { >> 2508: log_info(cds)("The shared archive file's UseCompactObjectHeaders setting (%s)" > > Printing on the `info` level mimics what we do when there's a mismatch for compressed classes (and oops), but I wonder if that one is intentional or if it is accidentally printing to 'info' instead of 'warning'. @iklam informed me that some of the info levels (including this line) should be converted to warning. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750408043 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750410679 From rkennke at openjdk.org Mon Sep 9 15:04:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 15:04:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <4yTWbD93OXGwYYxEQo56smKa5kl_WiPPcMsXSs0eUoQ=.893f54c8-ed2b-4a7f-bf0a-36553a951f47@github.com> References: <4yTWbD93OXGwYYxEQo56smKa5kl_WiPPcMsXSs0eUoQ=.893f54c8-ed2b-4a7f-bf0a-36553a951f47@github.com> Message-ID: <-2JWx3F8EdyQ0Uf-mI62ImLXgjgIy9PEydjtKHhx12Q=.4d944301-6f1c-4270-953c-ec6c86df946a@github.com> On Mon, 9 Sep 2024 14:47:28 GMT, Stefan Karlsson wrote: >> src/hotspot/share/cds/filemap.cpp line 2508: >> >>> 2506: >>> 2507: if (compact_headers() != UseCompactObjectHeaders) { >>> 2508: log_info(cds)("The shared archive file's UseCompactObjectHeaders setting (%s)" >> >> Printing on the `info` level mimics what we do when there's a mismatch for compressed classes (and oops), but I wonder if that one is intentional or if it is accidentally printing to 'info' instead of 'warning'. > > @iklam informed me that some of the info levels (including this line) should be converted to warning. Yeah that looks inconsistent with other places where we print a warning instead. I'll change it to warning for the UCOH check. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750430001 From stefank at openjdk.org Mon Sep 9 15:04:12 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 9 Sep 2024 15:04:12 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 12:21:19 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/share/oops/oop.hpp line 134: > >> 132: inline Klass* forward_safe_klass(markWord m) const; >> 133: inline size_t forward_safe_size(); >> 134: inline void forward_safe_init_mark(); > > Given the comment these methods do not seem "safe" to me. Maybe use "raw" or something to better indicate that care must be taken to use them. > > Maybe the "safe" refers to use them only in "safe" contexts, but in Hotspot code iirc we use something like "raw" or "unsafe". Restating my earlier comment about this: These functions are mainly used by the GCs. In one of the patches I've cleaned away all usages except for those in Shenandoah. I would prefer to see these completely removed from the oops/ directory and let the GCs decide when and how to perform "safe" reads of these values. > src/hotspot/share/oops/oop.hpp line 356: > >> 354: return mark_offset_in_bytes() + sizeof(markWord) / 2; >> 355: } else >> 356: #endif > > Maybe instead of trying to calculate some random, meaningless value just use some "random" value directly? > I am fine with the existing code, but first stating directly that "any value" works here, this additional code seems to confuse the message. 
(Fwiw, the method is also used during Universe initialization). Just to be clear, the second part of the quoted sentence is important: > could be any value *that is not a valid field offset* ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750428581 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750432186 From tschatzl at openjdk.org Mon Sep 9 15:04:12 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 9 Sep 2024 15:04:12 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 15:00:09 GMT, Stefan Karlsson wrote: > could be any value that is not a valid field offset I understand that that "random value" needs to satisfy this condition. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750433800 From stefank at openjdk.org Mon Sep 9 15:34:09 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 9 Sep 2024 15:34:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 12:49:05 GMT, Roman Kennke wrote: >>> If we get a promotion failure in the young gen, we are leaving the dead objects marked as forwarded. >> >> True; need to do sth like `obj->init_mark();` for the non-self-forwarded case. The postcondition is that no forwarded objs in eden/from. > > ParallelGC actually doesn't use bitmaps, it pushes all forwarded objs to preserved-marks-table, and uses that to find forwarded objects, which is why we can't remove the preserved-marks table in ParallelGC (IOW, after this patch, the preserved-marks-stuff in Parallel scavenger is *only* used to find forwarded objects. We might want to think about more efficient solutions for this). 
(Just to clarify if others are reading this)

Right, what I referred to above was how we found the object to forward, which is done via the bitmaps:

    while (cur_addr < region_end) {
      cur_addr = mark_bitmap()->find_obj_beg(cur_addr, region_end);

If the Parallel Old collector didn't do that, but instead parsed the heap like Serial does, then the Parallel Young collector would also have to fix the from space copies of moved objects when it hits a promotion failure, just like Serial does. This was just meant to point out the differences between the two collectors and why the young GC code is different.

I realize that in earlier comments I called the from-space copy of the objects "dead objects", but they are not dead; they are just the stale objects that are discoverable because of promotion failure keeping the eden and from spaces.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750480983

From stefank at openjdk.org Mon Sep 9 15:34:10 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Mon, 9 Sep 2024 15:34:10 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5]
In-Reply-To:
References: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> <3QGPH52NyrDPne5EgoGx2sx9OeGRu9K72onNNwzMr2M=.8a390b3d-2e8a-470e-8bb7-1ba975070c53@github.com>
Message-ID:

On Mon, 9 Sep 2024 12:59:36 GMT, Roman Kennke wrote:

> That %zu is SIZE_FORMAT, right?

Yes. Reviewers have lately encouraged people to use %zu instead of SIZE_FORMAT.
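
For readers unfamiliar with the format specifier in question: `%zu` is the standard printf conversion for `size_t`. A minimal standalone sketch of printing a byte count with a unit suffix might look like the following. Note that `format_byte_size` is a hypothetical helper written for this illustration; it is not HotSpot's `proper_unit_for_byte_size()`/`byte_size_in_proper_unit()` and may choose units differently.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not HotSpot code): renders a byte count with a
// binary unit suffix, rounding down to a whole number of that unit.
std::string format_byte_size(size_t bytes) {
  static const char* const units[] = {"B", "K", "M", "G", "T"};
  size_t unit = 0;
  // Divide by 1024 until the value is below 1024 or the largest unit is reached.
  while (bytes >= 1024 && unit < 4) {
    bytes /= 1024;
    ++unit;
  }
  char buf[32];
  // %zu is the portable conversion for size_t.
  std::snprintf(buf, sizeof(buf), "%zu%s", bytes, units[unit]);
  return std::string(buf);
}
```

With this sketch, `format_byte_size(512)` yields `"512B"` and `format_byte_size(size_t(3) << 20)` yields `"3M"`.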
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750482486 From cjplummer at openjdk.org Mon Sep 9 16:56:08 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 9 Sep 2024 16:56:08 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: <1VACYSoQRtP9m4BJkCVrdFxueC75Kg4Kp3wjGsAA2Dw=.53563f62-70cf-4d93-8d99-69b737812ba6@github.com> On Mon, 26 Aug 2024 21:30:51 GMT, Chris Plummer wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 85: > >> 83: >> 84: private static Klass getKlass(Mark mark) { >> 85: assert(VM.getVM().isCompactObjectHeadersEnabled()); > > `mark.getKlass()` already does this assert. I don't see any value in this `getKlass()` method. The caller should just call `getMark().getKlass()` rather than `getKlass(getMark())`. I'm not sure why this got marked as resolved. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750600652 From cjplummer at openjdk.org Mon Sep 9 16:56:08 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 9 Sep 2024 16:56:08 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 14:32:49 GMT, Roman Kennke wrote: >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 169: >> >>> 167: } else { >>> 168: visitor.doMetadata(klass, true); >>> 169: } >> >> Why is there no `visitor.doMetadata()` call for the compact object header case? > > There is no dedicated klass field anymore, the Klass* is encoded in the mark, and we would need to extract it. What is the purpose of the visitors? 
Do they need to see the klass/compressedKlass, or is it sufficient to visit the mark-word (which we already do, but as CInt). I've been looking into this. It's a bit hard to follow but I think you do need to do something more here. Can you run ClhsdbInspect.java and send me the output. In specific I need to know if the log includes the following: hsdb> + inspect 0x00000007cff154b8 instance of Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject @ 0x00000007cff154b8 (size = 24) _mark: 1 _metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject firstWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 lastWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 this$0: Oop for java/util/concurrent/locks/ReentrantLock$NonfairSync @ 0x00000007cff15498 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750598648 From rkennke at openjdk.org Mon Sep 9 17:45:47 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 17:45:47 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: Message-ID: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. 
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan: specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays will now store their length at offset 8.
> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders.
Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: - Print as warning when UCOH doesn't match in CDS archive - Improve initialization of mark-word in CDS ArchiveHeapWriter - Simplify getKlass() in SA - Simplify oopDesc::init_mark() - Get rid of forward_safe_* methods - GCForwarding touch-ups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/70f492d3..2884499a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=07-08 Stats: 132 lines in 17 files changed: 26 ins; 73 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From cjplummer at openjdk.org Mon Sep 9 18:37:09 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 9 Sep 2024 18:37:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 16:51:35 GMT, Chris Plummer wrote: >> There is no dedicated klass field anymore, the Klass* is encoded in the mark, and we would need to extract it. What is the purpose of the visitors? Do they need to see the klass/compressedKlass, or is it sufficient to visit the mark-word (which we already do, but as CInt). > > I've been looking into this. It's a bit hard to follow but I think you do need to do something more here. Can you run ClhsdbInspect.java and send me the output. 
Specifically, I need to know if the log includes the following: > > > hsdb> + inspect 0x00000007cff154b8 > instance of Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject @ 0x00000007cff154b8 (size = 24) > _mark: 1 > _metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject > firstWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 > lastWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 > this$0: Oop for java/util/concurrent/locks/ReentrantLock$NonfairSync @ 0x00000007cff15498 I pulled your changes and I see one slight difference in the output. The following line is missing: `_metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject` I realize that there is no `_metadata._compressed_klass` when you have compact headers, and that the Klass* is encoded in the `_mark` word, which now looks something like this in the output: _mark: 16294762323640321 So you can say that the Klass* is embedded in the _mark word, but this isn't of much help to SA users. I think what is expected is that the visitor is passed a MetadataField object that when getValue() is called on it, the Klass mirror is returned. Maybe we need a new CompactKlassField type like we currently have a NarrowKlassField field type, and it will do the decoding of the _mark word into a Klass. The current getKlass() is related to this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750743693 From cjplummer at openjdk.org Mon Sep 9 19:07:10 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 9 Sep 2024 19:07:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 18:34:10 GMT, Chris Plummer wrote: >> I've been looking into this.
It's a bit hard to follow but I think you do need to do something more here. Can you run ClhsdbInspect.java and send me the output. Specifically, I need to know if the log includes the following: >> >> >> hsdb> + inspect 0x00000007cff154b8 >> instance of Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject @ 0x00000007cff154b8 (size = 24) >> _mark: 1 >> _metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject >> firstWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 >> lastWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 >> this$0: Oop for java/util/concurrent/locks/ReentrantLock$NonfairSync @ 0x00000007cff15498 > > I pulled your changes and I see one slight difference in the output. The following line is missing: > > `_metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject` > > I realize that there is no `_metadata._compressed_klass` when you have compact headers, and that the Klass* is encoded in the `_mark` word, which now looks something like this in the output: > > _mark: 16294762323640321 > > So you can say that the Klass* is embedded in the _mark word, but this isn't of much help to SA users. I think what is expected is that the visitor is passed a MetadataField object that when getValue() is called on it, the Klass mirror is returned. Maybe we need a new CompactKlassField type like we currently have a NarrowKlassField field type, and it will do the decoding of the _mark word into a Klass. The current getKlass() is related to this. Thinking about this a bit more, maybe _mark needs to be a MetadataField rather than CInt. This is a kind of odd situation. Basically we have a CInt field that is more than just simple bits used as flags or small integers. It also gets you to the Klass*.
Possibly SA should treat _mark as two separate fields; one that remains a CInt as it currently is and another that treats it as an encoded Klass* like the NarrowKlassField case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750788243 From coleenp at openjdk.org Mon Sep 9 19:55:16 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 19:55:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Mon, 9 Sep 2024 17:45:47 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word.
Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: > > - Print as warning when UCOH doesn't match in CDS archive > - Improve initialization of mark-word in CDS ArchiveHeapWriter > - Simplify getKlass() in SA > - Simplify oopDesc::init_mark() > - Get rid of forward_safe_* methods > - GCForwarding touch-ups I reviewed the oops code so far. src/hotspot/share/oops/compressedKlass.cpp line 116: > 114: _range = end - _base; > 115: > 116: DEBUG_ONLY(assert_is_valid_encoding(addr, len, _base, _shift);) Can you refactor so the aarch64 path runs this same code without duplication? 
src/hotspot/share/oops/klass.hpp line 173: > 171: > 172: markWord _prototype_header; // Used to initialize objects' header > 173: I think you should move this up after ClassLoaderData, as there might be an alignment gap (you can run pahole to check). src/hotspot/share/oops/klass.hpp line 718: > 716: > 717: markWord prototype_header() const { > 718: assert(UseCompactObjectHeaders, "only use with compact object headers"); Should this unconditionally return _prototype_header since it's initialized to markWord::prototype_header(), or would that decrease performance for the non-compact headers case? src/hotspot/share/oops/klass.inline.hpp line 54: > 52: } > 53: > 54: inline void Klass::set_prototype_header(markWord header) { Can you put a comment that this is only used when dumping the archive? Because otherwise the Klass::_prototype_header field should always be initialized to the right thing (either with Klass encoded or as markWord::prototype_header()) and doesn't change. src/hotspot/share/oops/markWord.inline.hpp line 90: > 88: ShouldNotReachHere(); > 89: return markWord(); > 90: #endif Is the ifdef _LP64 necessary, since UseCompactObjectHeaders should always be false for 32 bits? src/hotspot/share/oops/oop.inline.hpp line 90: > 88: } else { > 89: return markWord::prototype(); > 90: } Could this be unconditional since prototype_header is initialized for all Klasses? src/hotspot/share/oops/typeArrayKlass.cpp line 175: > 173: size_t TypeArrayKlass::oop_size(oop obj) const { > 174: // In this assert, we cannot safely access the Klass* with compact headers. > 175: assert(UseCompactObjectHeaders || obj->is_typeArray(),"must be a type array"); Why not? I think I'm missing something. Klass should be in the markWord and that should be ok (?)
------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2290316150 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750529270 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750727211 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750730078 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750736547 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750739441 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750842383 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750721069 From coleenp at openjdk.org Mon Sep 9 19:55:19 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 19:55:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 11:55:52 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. 
>> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Try to avoid lea in loadNklass (aarch64) > - Fix release build error src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 147: > 145: #endif > 146: > 147: return true; This should only be in the compressedKlass.cpp file. 
src/hotspot/share/oops/compressedKlass.cpp line 214: > 212: ss.print("Class space size (%zu) exceeds the maximum possible size (%zu)", > 213: len, max_encoding_range_size()); > 214: vm_exit_during_initialization(ss.base()); Why does this exit and not turn off compressed klass pointers and compact object headers? src/hotspot/share/oops/compressedKlass.cpp line 222: > 220: return; > 221: } > 222: #endif Why not add null pd_initialize to zero to remove this conditional code? src/hotspot/share/oops/compressedKlass.cpp line 224: > 222: #endif > 223: > 224: if (tiny_classpointer_mode()) { I kind of agree with Thomas Schatzl for this. Maybe it should be compact_classpointer_mode(). It's nice to have a new string for grep, but they're not really that tiny. src/hotspot/share/oops/compressedKlass.cpp line 234: > 232: _range = len; > 233: > 234: constexpr int log_cacheline = 6; Is 6 the log of DEFAULT_CACHE_LINE_SIZE? src/hotspot/share/oops/compressedKlass.cpp line 243: > 241: } else { > 242: > 243: // In legacy mode, we try, in order of preference: Can you not use the word 'legacy' here? Maybe in "non-compact object header mode"... src/hotspot/share/oops/compressedKlass.inline.hpp line 100: > 98: check_valid_klass(k, base(), shift()); > 99: // Also assert that k falls into what we know is the valid Klass range. This is usually smaller > 100: // than the encoding range (e.g. encoding range covers 4G, but we only have 1G class space and a 1G is the default CompressedClassSpaceSize but can be larger, right? So the comment isn't quite accurate. Or with tiny class pointers can it only be 1G? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750527537 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750511912 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750513660 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750515923 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750520712 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750524690 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750662637 From coleenp at openjdk.org Mon Sep 9 19:55:20 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 19:55:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> References: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> Message-ID: <4cTKmlYUEtFpr2TURf25gd7-_eSb-uF0cC0BmLl6wd0=.b9f0482d-5439-421b-9a29-7a014fb72558@github.com> On Mon, 9 Sep 2024 10:02:53 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/oops/compressedKlass.hpp line 43: > >> 41: >> 42: // Tiny-class-pointer mode >> 43: static int _tiny_cp; // -1, 0=true, 1=false > > Suggestion: > > static int _tiny_cp; // -1 = uninitialized, 0 = true, 1 = false > > In addition to that, I am not sure if introducing a new term ("tiny") for compact class header related changes (and just here) makes the code more clear; I would have expected a "_compact_" prefix. Also all other members use "k"-klass and spell out "klass pointer", so I would prefer to keep that style. I agree with this. 'cp' reads as ConstantPool for me even though this is a different context. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750531167 From stefank at openjdk.org Mon Sep 9 20:07:13 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 9 Sep 2024 20:07:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: <1vlOROPKL6eDagcb0xz0MGe6vA6vYBVa4UzVsCZ5Q8I=.d5408660-ee94-4118-b809-d13a3dc500b4@github.com> On Mon, 9 Sep 2024 18:15:38 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: >> >> - Print as warning when UCOH doesn't match in CDS archive >> - Improve initialization of mark-word in CDS ArchiveHeapWriter >> - Simplify getKlass() in SA >> - Simplify oopDesc::init_mark() >> - Get rid of forward_safe_* methods >> - GCForwarding touch-ups > > src/hotspot/share/oops/typeArrayKlass.cpp line 175: > >> 173: size_t TypeArrayKlass::oop_size(oop obj) const { >> 174: // In this assert, we cannot safely access the Klass* with compact headers. >> 175: assert(UseCompactObjectHeaders || obj->is_typeArray(),"must be a type array"); > > Why not? I think I'm missing something. Klass should be in the markWord and that should be ok (?) I tracked this down to only (at least in my testing) happen from `size_given_klass` when called from the GC when it is about to copy an object. While that happens another thread can racingly succeed to copy the object and install a forwarding pointer over the old copy. When that happens the klass pointer is broken and the call to oopDesc::is_typeArray() crashes. 
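To make the race described above concrete, here is a toy C++ model. It is only a sketch and not HotSpot code: the function names, the tag value used for "forwarded", and the exact bit positions are assumptions for illustration; the only detail taken from the PR description is that the narrow Klass lives in the upper 22 bits of the mark-word. It shows that once a forwarding pointer is written over the header, the bits that used to hold the Klass no longer decode to the original value.

```cpp
#include <cstdint>

// Toy model only: names and layout are illustrative, not HotSpot's.
constexpr int      kToyKlassBits    = 22;                  // "upper 22 bits" per the PR description
constexpr int      kToyKlassShift   = 64 - kToyKlassBits;  // narrow klass starts at bit 42
constexpr uint64_t kToyLowMask      = 0x3;                 // assumed: two low tag bits
constexpr uint64_t kToyUnlockedTag  = 0x1;                 // assumed tag for a normal header
constexpr uint64_t kToyForwardedTag = 0x3;                 // assumed tag for a forwarded header

// Build a header that carries a narrow klass id in its upper bits.
inline uint64_t toy_make_mark(uint32_t narrow_klass) {
  return (static_cast<uint64_t>(narrow_klass) << kToyKlassShift) | kToyUnlockedTag;
}

// Decode whatever sits in the upper bits, valid or not.
inline uint32_t toy_narrow_klass(uint64_t mark) {
  return static_cast<uint32_t>(mark >> kToyKlassShift);
}

inline bool toy_is_forwarded(uint64_t mark) {
  return (mark & kToyLowMask) == kToyForwardedTag;
}

// A non-full-GC forwarding overwrites the whole header with the new address plus a tag.
inline uint64_t toy_forward_to(uint64_t new_address) {
  return new_address | kToyForwardedTag;
}
```

In this model, a reader that decodes the Klass after another thread has installed the forwarding pointer gets bits of the destination address instead of the class id, which matches the kind of crash seen in `oopDesc::is_typeArray()`.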
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750862842 From coleenp at openjdk.org Mon Sep 9 20:23:11 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 20:23:11 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: <1vlOROPKL6eDagcb0xz0MGe6vA6vYBVa4UzVsCZ5Q8I=.d5408660-ee94-4118-b809-d13a3dc500b4@github.com> References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> <1vlOROPKL6eDagcb0xz0MGe6vA6vYBVa4UzVsCZ5Q8I=.d5408660-ee94-4118-b809-d13a3dc500b4@github.com> Message-ID: On Mon, 9 Sep 2024 20:04:22 GMT, Stefan Karlsson wrote: >> src/hotspot/share/oops/typeArrayKlass.cpp line 175: >> >>> 173: size_t TypeArrayKlass::oop_size(oop obj) const { >>> 174: // In this assert, we cannot safely access the Klass* with compact headers. >>> 175: assert(UseCompactObjectHeaders || obj->is_typeArray(),"must be a type array"); >> >> Why not? I think I'm missing something. Klass should be in the markWord and that should be ok (?) > > I tracked this down to only (at least in my testing) happen from `size_given_klass` when called from the GC when it is about to copy an object. While that happens another thread can racingly succeed to copy the object and install a forwarding pointer over the old copy. When that happens the klass pointer is broken and the call to oopDesc::is_typeArray() crashes. I did miss something. I thought the markWord was never overwritten by the forwarding pointer. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750882259 From rkennke at openjdk.org Tue Sep 10 07:23:13 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 07:23:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> References: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> Message-ID: On Mon, 9 Sep 2024 10:16:24 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/gc/shared/gcForwarding.hpp line 41: > >> 39: * bits (to indicate 'forwarded' state as usual). >> 40: */ >> 41: class GCForwarding : public AllStatic { > > Since this class is only used for Full GCs, it may be useful to include that information, i.e. something like `FullGCForwarding` to avoid confusion why it is not used for other GCs too. > (Unless this has been discussed and even rejected by me before). I agree. In fact, that was my original name. It was suggested that I change it to SlidingForwarding when that was the approach that we were going to take, but with the new implementation, FullGCForwarding makes most sense. I'll change it.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751400378 From rkennke at openjdk.org Tue Sep 10 07:56:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 07:56:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> References: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> Message-ID: On Mon, 9 Sep 2024 10:21:54 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/gc/shared/collectedHeap.cpp line 232: > >> 230: } >> 231: >> 232: // With compact headers, we can't safely access the class, due > > Suggestion: > > // With compact headers, we can't safely access the klass, due > > > This is the case why? Because we might not have copied the header yet? Is this method actually ever used while the forwarded object is unstable? > Given this is used for verification only afaik, we should make an effort to provide that check. With compact headers, we can't safely access the Klass* when the object has been forwarded, because non-full-GC-forwarding temporarily overwrites the mark-word, and thus the Klass*, with the forwarding pointer, and here we have no way to make a distinction between Full-GC and regular GC forwarding. I improved the code to make the check when the object is not forwarded. Not sure if we could/should do more (e.g. pass around is_full argument to make the distinction, or find the - possibly few - places where we might call is_oop() on from-space objects in regular GC and do the check in a forwardee-safe way?). 
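The "check only when not forwarded" approach can be sketched as follows. This is a hedged illustration, not the actual change in `collectedHeap.cpp`: `hdr_check`, `hdr_klass_looks_valid`, the tag value, the shift, and the `kHdrMaxKnownKlass` bound are all hypothetical stand-ins for whatever validity test the real verification code applies.

```cpp
#include <cstdint>

// Illustrative constants; the real layout and validity test live in HotSpot.
constexpr int      kHdrKlassShift    = 42;    // assumed position of the narrow klass bits
constexpr uint64_t kHdrForwardedTag  = 0x3;   // assumed tag for a forwarded header
constexpr uint32_t kHdrMaxKnownKlass = 1000;  // hypothetical upper bound on valid klass ids

inline bool hdr_is_forwarded(uint64_t mark) {
  return (mark & 0x3) == kHdrForwardedTag;
}

// Stand-in for a real "is this a plausible Klass?" test.
inline bool hdr_klass_looks_valid(uint32_t narrow_klass) {
  return narrow_klass != 0 && narrow_klass <= kHdrMaxKnownKlass;
}

// Verify the klass only when the header still holds it. A forwarded header
// carries a pointer in those bits, so the klass check is skipped rather than
// reporting a false failure.
inline bool hdr_check(uint64_t mark) {
  if (hdr_is_forwarded(mark)) {
    return true;  // cannot inspect the klass here; other checks would still apply
  }
  return hdr_klass_looks_valid(static_cast<uint32_t>(mark >> kHdrKlassShift));
}
```

The trade-off is the one raised in the message: forwarded objects get weaker verification unless the caller can tell full-GC forwarding apart from regular forwarding, or loads the Klass from the forwardee.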
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751448814 From rkennke at openjdk.org Tue Sep 10 08:36:13 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 08:36:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 14:58:07 GMT, Stefan Karlsson wrote: >> src/hotspot/share/oops/oop.hpp line 134: >> >>> 132: inline Klass* forward_safe_klass(markWord m) const; >>> 133: inline size_t forward_safe_size(); >>> 134: inline void forward_safe_init_mark(); >> >> Given the comment these methods do not seem "safe" to me. Maybe use "raw" or something to better indicate that care must be taken to use them. >> >> Maybe the "safe" refers to use them only in "safe" contexts, but in Hotspot code iirc we use something like "raw" or "unsafe". > > Restating my earlier comment about this: These functions are mainly used by the GCs. In one of the patches I've cleaned away all usages except for those in Shenandoah. I would prefer to see these completely removed from the oops/ directory and let the GCs decide when and how to perform "safe" reads of these values. I've removed those methods. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751514466 From rkennke at openjdk.org Tue Sep 10 08:40:12 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 08:40:12 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 15:01:10 GMT, Thomas Schatzl wrote: >> Just to be clear, the second part of the quoted sentence is important: >>> could be any value *that is not a valid field offset* > >> could be any value that is not a valid field offset > > I understand that that "random value" needs to satisfy this condition. With compact headers, this value should only be used in C2, and not really as an actual offset. An earlier version of the change had the value in src/hotspot/share/opto/type.hpp instead, and only an assert(!UCOH) in oopDesc::klass_offset_in_bytes(). I think this would be a better solution overall, because it prevents accidental (and wrong) usage of the klass_offset in the runtime. Back then it has been rejected by somebody (don't remember), because it made the C2 diff a little messier, so I kept it like it is now. I would prefer to reinstate it, though. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751522091 From rkennke at openjdk.org Tue Sep 10 08:44:11 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 08:44:11 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 12:12:23 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/share/oops/objArrayKlass.inline.hpp line 74: > >> 72: void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) { >> 73: // In this assert, we cannot safely access the Klass* with compact headers. >> 74: assert (UseCompactObjectHeaders || obj->is_array(), "obj must be array"); > > If we can't safely access the `Klass*` here, why is the call to `obj->klass()` below safe? Good question. This comment and assert can probably be removed (same for the similar comment/assert in TypeArrayKlass::oop_oop_iterate_impl(). Could be a left-over from a time when we had to deal with OM and/or stack-locks in the header. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751527745 From mli at openjdk.org Tue Sep 10 08:54:13 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 10 Sep 2024 08:54:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <6I0T4rOjOTj-FZxpspatEo6j1_Num75bCAOBNxsrHI8=.097f731e-7c92-4eac-a379-c2df336cd412@github.com> References: <6I0T4rOjOTj-FZxpspatEo6j1_Num75bCAOBNxsrHI8=.097f731e-7c92-4eac-a379-c2df336cd412@github.com> Message-ID: On Mon, 9 Sep 2024 14:08:53 GMT, Roman Kennke wrote: >> Yes, I'm interested in it. Thanks for raising the discussion. :) > > If anybody is doing it, please send me a patch, or we can do it as a follow-up PR. Thanks. I'll send it to you if I finish it in time, otherwise I will do it in a separate pr. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751544394 From rkennke at openjdk.org Tue Sep 10 09:31:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 09:31:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Tue, 10 Sep 2024 08:37:43 GMT, Roman Kennke wrote: >>> could be any value that is not a valid field offset >> >> I understand that that "random value" needs to satisfy this condition. > > With compact headers, this value should only be used in C2, and not really as an actual offset. An earlier version of the change had the value in src/hotspot/share/opto/type.hpp instead, and only an assert(!UCOH) in oopDesc::klass_offset_in_bytes(). I think this would be a better solution overall, because it prevents accidental (and wrong) usage of the klass_offset in the runtime. Back then it has been rejected by somebody (don't remember), because it made the C2 diff a little messier, so I kept it like it is now. 
I would prefer to reinstate it, though. > (Fwiw, the method is also used during Universe initialization). Yes, but only in the -UCOH branch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751604467 From stefank at openjdk.org Tue Sep 10 10:05:10 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 10 Sep 2024 10:05:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Tue, 10 Sep 2024 08:41:16 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/objArrayKlass.inline.hpp line 74: >> >>> 72: void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) { >>> 73: // In this assert, we cannot safely access the Klass* with compact headers. >>> 74: assert (UseCompactObjectHeaders || obj->is_array(), "obj must be array"); >> >> If we can't safely access the `Klass*` here, why is the call to `obj->klass()` below safe? > > Good question. This comment and assert can probably be removed (same for the similar comment/assert in TypeArrayKlass::oop_oop_iterate_impl(). Could be a left-over from a time when we had to deal with OM and/or stack-locks in the header. FWIW, I've been running tests with this assert restored (and the one in TypeArrayKlass) without hitting any problems. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751656595 From rkennke at openjdk.org Tue Sep 10 11:29:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 11:29:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> Message-ID: On Tue, 10 Sep 2024 07:53:23 GMT, Roman Kennke wrote: >> src/hotspot/share/gc/shared/collectedHeap.cpp line 232: >> >>> 230: } >>> 231: >>> 232: // With compact headers, we can't safely access the class, due >> >> Suggestion: >> >> // With compact headers, we can't safely access the klass, due >> >> >> This is the case why? Because we might not have copied the header yet? Is this method actually ever used while the forwarded object is unstable? >> Given this is used for verification only afaik, we should make an effort to provide that check. > > With compact headers, we can't safely access the Klass* when the object has been forwarded, because non-full-GC-forwarding temporarily overwrites the mark-word, and thus the Klass*, with the forwarding pointer, and here we have no way to make a distinction between Full-GC and regular GC forwarding. > > I improved the code to make the check when the object is not forwarded. Not sure if we could/should do more (e.g. pass around is_full argument to make the distinction, or find the - possibly few - places where we might call is_oop() on from-space objects in regular GC and do the check in a forwardee-safe way?). Ah, I found it! It seems only the ShenandoahVerifier calls oop_iterate() on from_space objects, which can have a forwarding, which would mess with the object's Klass*. We're lucky because that iterator doesn't visit the Klass*. I see the following ways out: - The caller must ensure that the oop is ok and Klass* is accessible. I could do that in the ShenandoahVerifier. 
It kinda defeats the point, though: we want the verifier to operate on the 'raw' object, not necessarily the forwardee.
- The next easy way out would be to use 'this' instead of obj->klass(). That should make sense, because it should always be the same. Using 'this' in the assert (this->is_array_klass()) is kinda bogus, though. And asserting (this == obj->klass()) would be nice, but would have the same problem as before, where we would need to exclude UCOH for the case where Shenandoah needs it. In fact, this is done already in oopDesc::oop_iterate_backwards(), but also excluding UCOH.
- We could add a hook in the iterator that gives the Klass* for a given oop, which can then be overridden by the actual iterator to do the right thing, e.g. load the Klass* from the forwardee.

WDYT?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751770293

From stuefe at openjdk.org Tue Sep 10 12:07:09 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 10 Sep 2024 12:07:09 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8]
In-Reply-To:
References:
Message-ID:

On Mon, 9 Sep 2024 15:49:57 GMT, Coleen Phillimore wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Try to avoid lea in loadNklass (aarch64)
>> - Fix release build error
>
> src/hotspot/share/oops/compressedKlass.cpp line 214:
>
>> 212: ss.print("Class space size (%zu) exceeds the maximum possible size (%zu)",
>> 213: len, max_encoding_range_size());
>> 214: vm_exit_during_initialization(ss.base());
>
> Why does this exit and not turn off compressed klass pointers and compact object headers?

This is tricky. We are already deep in initialization and have done a couple of decisions based on +UseCompressedClassPointers (e.g. CDS setup). I *think* we could still go with -UseCCP, but I wonder whether this is wise.

Note that this error is not new. In the old code, we simply asserted.
That left us with UB in release builds. I simply made the error explicit in release too.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751819814

From stuefe at openjdk.org Tue Sep 10 12:16:11 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 10 Sep 2024 12:16:11 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8]
In-Reply-To:
References:
Message-ID:

On Mon, 9 Sep 2024 15:50:50 GMT, Coleen Phillimore wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Try to avoid lea in loadNklass (aarch64)
>> - Fix release build error
>
> src/hotspot/share/oops/compressedKlass.cpp line 222:
>
>> 220: return;
>> 221: }
>> 222: #endif
>
> Why not add null pd_initialize to zero to remove this conditional code?

I can do that. Added to backlist (https://wiki.openjdk.org/display/lilliput/JEP-450+Review+Todo)

> src/hotspot/share/oops/compressedKlass.cpp line 224:
>
>> 222: #endif
>> 223:
>> 224: if (tiny_classpointer_mode()) {
>
> I kind of agree with Thomas Schatzl for this. Maybe it should be compact_classpointer_mode(). It's nice to have a new string for grep, but they're not really that tiny.

Yes, makes sense. Added to backlist.

This coding was developed somewhat independently from +COH at the beginning, but now the two parts (tinycp and the rest of COH) depend on each other anyway. I should just use UseCompactObjectHeaders or a flag directly derived from it.

> src/hotspot/share/oops/compressedKlass.cpp line 234:
>
>> 232: _range = len;
>> 233:
>> 234: constexpr int log_cacheline = 6;
>
> Is 6 the log of DEFAULT_CACHE_LINE_SIZE?

64, yes

> src/hotspot/share/oops/compressedKlass.cpp line 243:
>
>> 241: } else {
>> 242:
>> 243: // In legacy mode, we try, in order of preference:
>
> Can you not use the word 'legacy' here? Maybe in "non-compact object header mode"...

okay.
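
As a side note on the `log_cacheline` exchange above: the link between the shift constant and the cache line size could be pinned down at compile time rather than left to a comment. The following is only a minimal sketch under stated assumptions, not code from the PR. `kAssumedCacheLineSize` is a hypothetical stand-in for HotSpot's `DEFAULT_CACHE_LINE_SIZE` (assumed to be 64 here, as the answer above states), and `exact_log2_constexpr` is an illustrative helper, not a HotSpot function.

```cpp
#include <cstddef>

// Hypothetical stand-in for DEFAULT_CACHE_LINE_SIZE. The value 64 is an
// assumption taken from the "64, yes" answer in the thread.
constexpr std::size_t kAssumedCacheLineSize = 64;

// Same constant as in the quoted compressedKlass.cpp snippet.
constexpr int log_cacheline = 6;

// Illustrative helper: returns log2(value) when value is a power of two,
// and -1 otherwise (including for 0).
constexpr int exact_log2_constexpr(std::size_t value) {
  if (value == 0 || (value & (value - 1)) != 0) {
    return -1;
  }
  int result = 0;
  while (value > 1) {
    value >>= 1;
    ++result;
  }
  return result;
}

// Fails to compile if the two constants ever drift apart.
static_assert(exact_log2_constexpr(kAssumedCacheLineSize) == log_cacheline,
              "log_cacheline must be the log2 of the cache line size");
```

With a check of this kind, a platform that uses a different cache line size would produce a build error instead of silently keeping a stale shift value.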
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751828214 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751831035 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751831994 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751833034 From coleenp at openjdk.org Tue Sep 10 12:22:09 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 10 Sep 2024 12:22:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 12:03:59 GMT, Thomas Stuefe wrote: >> src/hotspot/share/oops/compressedKlass.cpp line 214: >> >>> 212: ss.print("Class space size (%zu) exceeds the maximum possible size (%zu)", >>> 213: len, max_encoding_range_size()); >>> 214: vm_exit_during_initialization(ss.base()); >> >> Why does this exit and not turn off compressed klass pointers and compact object headers? > > This is tricky. We are already deep in initialization and have done a couple of decisions based on +UseCompressedClassPointers (e.g. CDS setup). I *think* we could still go with -UseCCP, but I wonder whether this is wise. > > Note that this error is not new. In the old code, we simply asserted. That left us with UB in release builds, which remains unresolved. I simply made the error explicit in release too. Ok, in this case, that's fine if we already asserted. A fatal error is better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751840556 From rkennke at openjdk.org Tue Sep 10 12:42:48 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 12:42:48 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v10] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). 
> > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). 
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: - More touch-ups, fix Shenandoah oop iterator - Remove asserts in XArrayKlass::oop_oop_iterate() - Various touch-ups - Improve is_oop() - Rename GCForwarding -> FullGCForwarding; some touch-ups - Fix comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/2884499a..5da250cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=08-09 Stats: 238 lines in 36 files changed: 74 ins; 65 del; 99 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From stuefe at openjdk.org Tue Sep 10 12:42:49 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 10 Sep 2024 12:42:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <4cTKmlYUEtFpr2TURf25gd7-_eSb-uF0cC0BmLl6wd0=.b9f0482d-5439-421b-9a29-7a014fb72558@github.com> References: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> <4cTKmlYUEtFpr2TURf25gd7-_eSb-uF0cC0BmLl6wd0=.b9f0482d-5439-421b-9a29-7a014fb72558@github.com> Message-ID: On Mon, 9 Sep 2024 16:01:10 GMT, Coleen Phillimore wrote: >> src/hotspot/share/oops/compressedKlass.hpp line 43: >> >>> 41: >>> 42: // Tiny-class-pointer mode >>> 43: static 
int _tiny_cp; // -1, 0=true, 1=false >> >> Suggestion: >> >> static int _tiny_cp; // -1 = uninitialized, 0 = true, 1 = false >> >> In addition to that, I am not sure if introducing a new term ("tiny") for compact class header related changes (and just here) makes the code more clear; I would have expected a "_compact_" prefix. Also all other members use "k"-klass and spell out "klass pointer", so I would prefer to keep that style. > > I agree with this. 'cp' reads as ConstantPool for me even though this is a different context. Okay, I will change that ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751867998 From stuefe at openjdk.org Tue Sep 10 12:42:49 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 10 Sep 2024 12:42:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v10] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Mon, 9 Sep 2024 15:59:43 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: >> >> - More touch-ups, fix Shenandoah oop iterator >> - Remove asserts in XArrayKlass::oop_oop_iterate() >> - Various touch-ups >> - Improve is_oop() >> - Rename GCForwarding -> FullGCForwarding; some touch-ups >> - Fix comment > > src/hotspot/share/oops/compressedKlass.cpp line 116: > >> 114: _range = end - _base; >> 115: >> 116: DEBUG_ONLY(assert_is_valid_encoding(addr, len, _base, _shift);) > > Can you refactor so the aarch64 path runs this same code without duplication? In tinycp mode, aarch64 runs this code though? The aarch64 variant of pd_initialize just returns then. In non-COH mode (preexisting, not touched by this patch) Aarch64 needs its own handling. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751866773 From rkennke at openjdk.org Tue Sep 10 19:11:30 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 19:11:30 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. 
This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix FullGCForwarding initialization ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/5da250cf..6abda7bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=09-10 Stats: 8 lines in 7 files changed: 1 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From stuefe at openjdk.org Tue Sep 10 19:11:30 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 10 Sep 2024 19:11:30 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 17:40:03 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last 
revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/share/oops/compressedKlass.inline.hpp line 100: > >> 98: check_valid_klass(k, base(), shift()); >> 99: // Also assert that k falls into what we know is the valid Klass range. This is usually smaller >> 100: // than the encoding range (e.g. encoding range covers 4G, but we only have 1G class space and a > > 1G is the default CompressedClassSpaceSize but can be larger, right? So the comment isn't quite accurate. Or with tiny class pointers can it only be 1G? The comment was misleading, it referred to the 1g default class space. I recently changed class space (in mainline) to be max. 4GB (minus whatever little CDS needs), and for +COH, this is still true. 22 bit class pointer and 10 bit shift still gives us a max encoding range size of 4GB. I will update the comment. (->backlist) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751872461 From sviswanathan at openjdk.org Tue Sep 10 21:36:09 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 10 Sep 2024 21:36:09 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v3] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 19:10:34 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > update libm tanh reference test with code review suggestions src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp line 810: > 808: x->id() == vmIntrinsics::_dpow || x->id() == vmIntrinsics::_dcos || > 809: x->id() == vmIntrinsics::_dsin || x->id() == vmIntrinsics::_dtan || > 810: x->id() == vmIntrinsics::_dlog10 || x->id() == vmIntrinsics::_dtanh) { 
Need to have the tanh under #ifdef _LP64 as we are generating the stub only for 64 bit.

src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp line 1000:

> 998: if (StubRoutines::dtanh() != nullptr) {
> 999: __ call_runtime_leaf(StubRoutines::dtanh(), getThreadTemp(), result_reg, cc->args());
> 1000: } // TODO: else clause?

You could instead have an assert here that StubRoutines::dtanh() is not null. Then there is no need for the else clause.

src/hotspot/cpu/x86/templateInterpreterGenerator_x86_32.cpp line 376:

> 374: // [ hi(arg) ]
> 375: //
> 376: if (kind == Interpreter::java_lang_math_tanh) {

Need to update the copyright year to 2024 in this file.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1752286133
PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1752289575
PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1752304061

From duke at openjdk.org Wed Sep 11 00:29:30 2024
From: duke at openjdk.org (Srinivas Vamsi Parasa)
Date: Wed, 11 Sep 2024 00:29:30 GMT
Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v4]
In-Reply-To:
References:
Message-ID:

> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm
>
> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup
> -- | -- | -- | --
> MathBench.tanhDouble | 70900 | 95618 | 1.35x

Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:

  c1 and template generator fixes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20657/files
  - new: https://git.openjdk.org/jdk/pull/20657/files/39350a37..4aa52bfd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=02-03

  Stats: 8 lines in 2 files changed: 5 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/20657.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657

PR: https://git.openjdk.org/jdk/pull/20657

From duke at openjdk.org Wed Sep 11 00:29:30 2024
From: duke at openjdk.org (Srinivas Vamsi Parasa)
Date: Wed, 11 Sep 2024 00:29:30 GMT
Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v3]
In-Reply-To:
References:
Message-ID: <0BMvrr-HLplOb73v8G3cdcG063jjNDkkBlCCnH8MH9c=.d7f172bf-750d-4ba4-840f-d2f492cac2c9@github.com>

On Tue, 10 Sep 2024 16:26:38 GMT, Sandhya Viswanathan wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>>
>> update libm tanh reference test with code review suggestions
>
> src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp line 810:
>
>> 808: x->id() == vmIntrinsics::_dpow || x->id() == vmIntrinsics::_dcos ||
>> 809: x->id() == vmIntrinsics::_dsin || x->id() == vmIntrinsics::_dtan ||
>> 810: x->id() == vmIntrinsics::_dlog10 || x->id() == vmIntrinsics::_dtanh) {
>
> Need to have the tanh under #ifdef _LP64 as we are generating the stub only for 64 bit.

Please see the newly added `#ifdef` in the updated code.

> src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp line 1000:
>
>> 998: if (StubRoutines::dtanh() != nullptr) {
>> 999: __ call_runtime_leaf(StubRoutines::dtanh(), getThreadTemp(), result_reg, cc->args());
>> 1000: } // TODO: else clause?
>
> You could instead have an assert here that StubRoutines::dtanh() is not null. Then there is no need for the else clause.

Please see the newly added assert in the updated code.

> src/hotspot/cpu/x86/templateInterpreterGenerator_x86_32.cpp line 376:
>
>> 374: // [ hi(arg) ]
>> 375: //
>> 376: if (kind == Interpreter::java_lang_math_tanh) {
>
> Need to update the copyright year to 2024 in this file.

Please see the year updated to 2024 in the updated code.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1752962967
PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1752962670
PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1752963446

From darcy at openjdk.org Wed Sep 11 02:02:13 2024
From: darcy at openjdk.org (Joe Darcy)
Date: Wed, 11 Sep 2024 02:02:13 GMT
Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 5 Sep 2024 19:10:08 GMT, Srinivas Vamsi Parasa wrote:

>> test/jdk/java/lang/Math/HyperbolicTests.java line 1009:
>>
>>> 1007: for(int i = 0; i < testCases.length; i++) {
>>> 1008: double testCase = testCases[i];
>>> 1009: failures += testTanhWithReferenceUlpDiff(testCase, StrictMath.tanh(testCase), 2.5);
>>
>> The allowable worst-case error is 2.5 ulp, although at many arguments FDLIBM has a smaller error.
>>
>> For a general Math vs StrictMath test with an allowable 2.5 ulp error, without knowing how accurate FDLIBM is for that function and argument, a large error of approx. 2X the nominal error should be allowed (in case FDLIBM errors in one direction and the Math method errors in the other direction).
>>
>> If the test is going to use randomness, then its jtreg tags should include
>>
>> `@key randomness`
>>
>> and it is preferable to use jdk.test.lib.RandomFactory to get a Random object since that handles printing out a key so the random sequence can be replicated if the test fails.
>
>> If the test is going to use randomness, then its jtreg tags should include
>>
>> `@key randomness`
>>
>> and it is preferable to use jdk.test.lib.RandomFactory to get a Random object since that handles printing out a key so the random sequence can be replicated if the test fails.
>
> Please see the test updated to use `@key randomness` and `jdk.test.lib.RandomFactory` to get a Random object.
>
>> The allowable worst-case error is 2.5 ulp, although at many arguments FDLIBM has a smaller error.
>> For a general Math vs StrictMath test with an allowable 2.5 ulp error, without knowing how accurate FDLIBM is for that function and argument, a large error of approx. 2X the nominal error should be allowed (in case FDLIBM errors in one direction and the Math method errors in the other direction).
>>
> So far the tests haven't failed with error of 2.5ulp. Would it be better to make it 5ulp? Please let me know.

So far, this will be the only intrinsic implementation of tanh. Therefore, at the moment it is just checking the consistency of the intrinsic implementation with StrictMath/FDLIBM tanh. If the intrinsic has a ~1 ulp accuracy, it would be expected to often be within 2.5 ulps of FDLIBM tanh.

However, as written the regression test would not necessarily pass against any allowable Math.tanh implementation, which is the usual criterion for java.lang.Math tests that aren't otherwise constrained (such as by being limited to a given subset of platforms).

If there was a correctly rounded tanh to compare against, then this style of testing would be valid.

Are there any plans to intrinsify sinh or cosh?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1753010811

From epeter at openjdk.org Wed Sep 11 08:28:13 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 11 Sep 2024 08:28:13 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]
In-Reply-To:
References:
Message-ID:

On Tue, 10 Sep 2024 19:11:30 GMT, Roman Kennke wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>>
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>>
>> Main changes:
>> - Introduction of the (experimental) flag UseCompactObjectHeaders.
All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. 
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix FullGCForwarding initialization @rkennke Can you please explain the changes in these tests: test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2342983487 From yzheng at openjdk.org Wed Sep 11 13:21:42 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 11 Sep 2024 13:21:42 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses Message-ID: https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class metaspace. 
JVMCI has to respect this and avoids compressing abstract and interface Klasses ------------- Commit messages: - trim trailing whitespace - make JVMCI aware that some klass pointers are not compressible Changes: https://git.openjdk.org/jdk/pull/20949/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20949&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339939 Stats: 63 lines in 7 files changed: 56 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20949.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20949/head:pull/20949 PR: https://git.openjdk.org/jdk/pull/20949 From rkennke at openjdk.org Wed Sep 11 13:37:16 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 11 Sep 2024 13:37:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 08:24:16 GMT, Emanuel Peter wrote: > @rkennke Can you please explain the changes in these tests: > > ``` > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java > ``` > > You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` > > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. > > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? > > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. 
> > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2343693629 From jsjolen at openjdk.org Wed Sep 11 14:00:17 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 11 Sep 2024 14:00:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 19:11:30 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. 
The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. 
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix FullGCForwarding initialization Hi, @caspernorrbin and I are reviewing the Metaspace changes (so anything in the `memory` and `metaspace` folders). We have found minor improvements that can be made and some nits, but the code overall looks OK. We are finishing up a first round of review now, and will have a second one. Thank you for your hard work and your patience with the review process. src/hotspot/share/memory/classLoaderMetaspace.cpp line 87: > 85: klass_alignment_words, > 86: "class arena"); > 87: } As per my comment in the header file, change the code to this: ```c++ if (class_context != nullptr) { // ... Same as in PR } else { _class_space_arena = _non_class_space_arena; } ``` src/hotspot/share/memory/classLoaderMetaspace.cpp line 115: > 113: if (wastage.is_nonempty()) { > 114: non_class_space_arena()->deallocate(wastage); > 115: } This code reads a bit strangely. I understand *what* it tries to do. It tries to give back any wasted memory from either the class space arena *or* the non class space arena to the non class space arena's freelist. I assume that we do this since any wastage is presumably too small to be used by our new 22-bit class pointers. However, this context will be lost on future readers. It should have at least a comment in the `if (wastage.is_nonempty())` clause explaining what we expect should happen and why. For example: ```c++ // Any wasted memory is presumably too small for any class. // Therefore, give it back to the non-class space arena's free list. ``` src/hotspot/share/memory/classLoaderMetaspace.cpp line 118: > 116: #ifdef ASSERT > 117: if (result.is_nonempty()) { > 118: const bool in_class_arena = class_space_arena() != nullptr ?
class_space_arena()->contains(result) : false; Unnecessary nullptr check if you take my suggestion, or you should switch to `have_class_space_arena`. src/hotspot/share/memory/classLoaderMetaspace.cpp line 165: > 163: MetaBlock bl(ptr, word_size); > 164: // If the block would be reusable for a Klass, add to class arena, otherwise to > 165: // then non-class arena. Nit: spelling, "the" src/hotspot/share/memory/classLoaderMetaspace.hpp line 81: > 79: metaspace::MetaspaceArena* class_space_arena() const { return _class_space_arena; } > 80: > 81: bool have_class_space_arena() const { return _class_space_arena != nullptr; } This is unnecessary. Instead of having this and having to remember to check for nullness each time, just change the `_class_space_arena` to point to the same arena as the `_non_class_space_arena` does when we run with `-XX:-UseCompressedClassPointers` src/hotspot/share/memory/metaspace.cpp line 656: > 654: // Adjust size of the compressed class space. > 655: > 656: const size_t res_align = reserve_alignment(); Can you change the name to `root_chunk_size`? src/hotspot/share/memory/metaspace.hpp line 112: > 110: static size_t max_allocation_word_size(); > 111: > 112: // Minimum allocation alignment, in bytes. All MetaData shall be aligned correclty Nit: Spelling, "correctly" src/hotspot/share/memory/metaspace/metablock.hpp line 48: > 46: > 47: MetaWord* base() const { return _base; } > 48: const MetaWord* end() const { return _base + _word_size; } `assert(is_nonempty())` src/hotspot/share/memory/metaspace/metablock.hpp line 51: > 49: size_t word_size() const { return _word_size; } > 50: bool is_empty() const { return _base == nullptr; } > 51: bool is_nonempty() const { return _base != nullptr; } Can `_base == nullptr` but `_word_size != 0`? 
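One way to rule out the inconsistent state asked about above is to check the invariant where the block is constructed. What follows is only a minimal sketch under stated assumptions: `MetaBlockSketch` is a simplified stand-in and not the class from the PR, the `MetaWord` typedef is hypothetical, and plain `assert` is used in place of HotSpot's own assert macro.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for HotSpot's MetaWord (one machine word).
typedef uintptr_t MetaWord;

// Simplified sketch of a block descriptor; not the MetaBlock from the PR.
class MetaBlockSketch {
  MetaWord* _base;
  size_t _word_size;

public:
  // An empty block has no base and no size.
  MetaBlockSketch() : _base(nullptr), _word_size(0) {}

  // Invariant: a null base and a zero size always go together.
  MetaBlockSketch(MetaWord* p, size_t word_size)
    : _base(p), _word_size(word_size) {
    assert((p == nullptr) == (word_size == 0));
  }

  MetaWord* base() const { return _base; }
  size_t word_size() const { return _word_size; }
  bool is_empty() const { return _base == nullptr; }
  bool is_nonempty() const { return _base != nullptr; }

  // Asking for the end of an empty block is treated as a caller error.
  const MetaWord* end() const {
    assert(is_nonempty());
    return _base + _word_size;
  }
};
```

With the check in the constructor, `is_empty()` only needs to look at `_base`, and `end()` can assert `is_nonempty()` as suggested in the previous comment.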
src/hotspot/share/memory/metaspace/metablock.hpp line 52: > 50: bool is_empty() const { return _base == nullptr; } > 51: bool is_nonempty() const { return _base != nullptr; } > 52: void reset() { _base = nullptr; _word_size = 0; } Is this function really necessary? According to my IDE it's only used in tests and even then the `MetaBlock` isn't used afterwards (so it has no effect). src/hotspot/share/memory/metaspace/metaspaceArena.hpp line 44: > 42: class FreeBlocks; > 43: > 44: struct ArenaStats; Nit: Sort? src/hotspot/share/memory/metaspace/metaspaceArena.hpp line 84: > 82: // between threads and needs to be synchronized in CLMS. > 83: > 84: const size_t _allocation_alignment_words; Nit: Document this? All other members are documented. ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2296528491 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754335269 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754398993 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754343513 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754459464 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754330432 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754619023 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754508321 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754142822 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754142098 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754153662 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754192464 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754197251 From dnsimon at openjdk.org Wed Sep 11 14:01:06 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 11 Sep 2024 14:01:06 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract 
and interface Klasses In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:09:07 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class metaspace. JVMCI has to respect this and avoids compressing abstract and interface Klasses Marked as reviewed by dnsimon (Reviewer). src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstant.java line 29: > 27: /** > 28: * Marker interface for hotspot specific constants. > 29: */ Let's take this opportunity to improve this javadoc: /** * A value in a space managed by Hotspot (e.g. heap or metaspace). * Some of these values can be referenced with a compressed pointer (32 bits) * instead of a full word-sized pointer. */ ------------- PR Review: https://git.openjdk.org/jdk/pull/20949#pullrequestreview-2297174735 PR Review Comment: https://git.openjdk.org/jdk/pull/20949#discussion_r1754641618 From epeter at openjdk.org Wed Sep 11 14:17:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Sep 2024 14:17:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:34:28 GMT, Roman Kennke wrote: > > @rkennke Can you please explain the changes in these tests: > > ``` > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java > > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java > > ``` > > > > You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` > > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance
impact. > > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? > > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. > > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). > > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible. > > I will re-evaluate those tests, and add comments or remove the restrictions. If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ;) My suggestion is this: go over the examples, check which ones are now ok. Those that are not ok, add a comment, and file a bug: I can then analyze those cases later, and see how to write other tests or improve auto-vectorization. 
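The offset mismatch described in the quoted reply (12 for byte[], 16 for long[]) can be reproduced with a little arithmetic. The sketch below is a simplified model and not HotSpot code: `align_up` and `array_base_offset` are hypothetical helpers, and the model assumes that the first element follows the 4-byte length field, aligned up to the element size.

```cpp
#include <cassert>
#include <cstddef>

// Round 'value' up to the next multiple of 'alignment' (must be a power of two).
constexpr size_t align_up(size_t value, size_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Assumed size of the array length field in bytes.
constexpr size_t kLengthFieldSize = 4;

// Simplified model: elements start right after the length field,
// aligned up to the element size.
constexpr size_t array_base_offset(size_t length_offset, size_t element_size) {
  return align_up(length_offset + kLengthFieldSize, element_size);
}

// Compact headers: the length is stored at offset 8 (from the PR description).
static_assert(array_base_offset(8, 1) == 12, "byte[] starts at 12");
static_assert(array_base_offset(8, 8) == 16, "long[] starts at 16");

// Assumption: legacy layout with compressed class pointers, length at offset 12.
static_assert(array_base_offset(12, 1) == 16, "byte[] starts at 16");
static_assert(array_base_offset(12, 8) == 16, "long[] starts at 16");
```

In this model a byte[] payload starting at offset 12 is not 8-byte aligned, while a long[] payload at 16 is, which is consistent with the remark that vectorization may trigger differently for mixed byte[]/long[] accesses.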
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2343797957 From rcastanedalo at openjdk.org Wed Sep 11 14:17:17 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 11 Sep 2024 14:17:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 19:11:30 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix FullGCForwarding initialization src/hotspot/share/memory/metaspace/binList.hpp line 202: > 200: b_last = b; > 201: } > 202: if (UseNewCode)printf("\n"); I guess this line is a leftover to be removed? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754702742 From gdub at openjdk.org Wed Sep 11 14:38:05 2024 From: gdub at openjdk.org (Gilles Duboscq) Date: Wed, 11 Sep 2024 14:38:05 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses In-Reply-To: References: Message-ID: <5Hae284Qb3b8eW5zJUliUCw9HqdUBZV3wkZ6tCzTnpg=.342e236a-d7aa-4783-bc41-f4754efd48a7@github.com> On Wed, 11 Sep 2024 13:09:07 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class metaspace. 
JVMCI has to respect this and avoids compressing abstract and interface Klasses src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstant.java line 40: > 38: * Determines if this constant is compressible. > 39: */ > 40: boolean isCompressible(); It might be worth adding a note about the fact that even if this returns true, `compress()` might still throw `IllegalArgumentException` if `isCompressed()` is also true. Or it might be a bit more intuitive to ask for the invariant that `isCompressible()` should return `false` if `isCompressed()` is true, and reword the javadoc below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20949#discussion_r1754773712 From rcastanedalo at openjdk.org Wed Sep 11 14:50:17 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 11 Sep 2024 14:50:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 19:11:30 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix FullGCForwarding initialization src/hotspot/share/opto/machnode.cpp line 390: > 388: t = t->make_ptr(); > 389: } > 390: if (t->isa_narrowklass() && CompressedKlassPointers::shift() == 0) { Does this change have any effect? `UseCompressedClassPointers` should be implied by `t->isa_narrowklass()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754813751 From stuefe at openjdk.org Wed Sep 11 16:17:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 11 Sep 2024 16:17:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 12:47:30 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/classLoaderMetaspace.cpp line 115: > >> 113: if (wastage.is_nonempty()) { >> 114: non_class_space_arena()->deallocate(wastage); >> 115: } > > This code reads a bit strangely. I understand *what* it tries to do. It tries to give back any wasted memory from either the class space arena *or* the non class space arena to the non class space arena's freelist. I assume that we do this since any wastage is presumably too small to be used by our new 22-bit class pointers. However, this context will be lost on future readers. It should have at least a comment in the `if (wastage.is_nonempty())` clause explaining what we expect should happen and why. For example: > > ```c++ > // Any wasted memory is presumably too small for any class. > // Therefore, give it back to the non-class space arena's free list. Yes.
Some background: - wastage can only occur for larger Klass* alignments (aka class space arena alignment property), so only for +COH (note to self, maybe assert) - wastage is, by definition, not aligned to the required Klass* alignment, so it cannot be reused. Yes, it's probably also too small. Yes, I will write a better comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1755111131 From stuefe at openjdk.org Wed Sep 11 16:17:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 11 Sep 2024 16:17:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 14:15:12 GMT, Roberto Castañeda Lozano wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/metaspace/binList.hpp line 202: > >> 200: b_last = b; >> 201: } >> 202: if (UseNewCode)printf("\n"); > > I guess this line is a leftover to be removed?
> > Yep thanks for spotting So that was causing the empty lines in my logs (facepalm) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1755116656 From sviswanathan at openjdk.org Wed Sep 11 17:24:11 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 11 Sep 2024 17:24:11 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 01:59:54 GMT, Joe Darcy wrote: >>> If the test is going to use randomness, then its jtreg tags should include >>> >>> `@key randomness` >>> >>> and it is preferable to use jdk.test.lib.RandomFactory to get and Random object since that handles printing out a key so the random sequence can be replicated if the test fails. >> >> Please see the test updated to use `@key randomness` and` jdk.test.lib.RandomFactory` to get and Random object. >> >>> The allowable worst-case error is 2.5 ulp, although at many arguments FDLIBM has a smaller error. >>> For a general Math vs StrictMath test with an allowable 2.5 ulp error, without knowing how accurate FDLIBM is for that function and argument, a large error of approx. 2X the nominal error should be allowed (in case FDLIBM errors in one direction and the Math method errors in the other direction). >>> >> So far the tests haven't failed with error of 2.5ulp. Would it be better to make it 5ulp? Please let me know. > > So far, this will be the only intrinsic implementation of tanh. Therefore, at the moment it is just checking the consistency of the intrinsic implementation with StrictMath/FDLIBM tanh. If the intrinsic has a ~1 ulp accuracy, it would be expected to often be within 2.5 ulps of FDLIBM tanh. However, as written the regression test would not necessarily pass against any allowable Math.tanh implementation, which is the usual criteria for java.lang.Math tests that aren't otherwise constrained (such as by being limited to a given subset of platforms). 
> > If there was a correctly rounded tanh to compare against, then this style of testing would be valid. > > Are there any plan to intrinsify sinh or cosh? I think instead of random we should generate offline additional correctly rounded fixed test points to cater to new algorithm using high precision arithmetic library and then simply extend the HyperbolicTests.java with these new fixed test points using existing ulp testing mechanism in the test. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1755203926 From rkennke at openjdk.org Wed Sep 11 17:31:54 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 11 Sep 2024 17:31:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v12] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. 
Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... 
Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Make is_oop() MT-safe - Re-enable some vectorization tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/6abda7bc..b6c11f74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=10-11 Stats: 32 lines in 6 files changed: 7 ins; 8 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Wed Sep 11 17:38:57 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 11 Sep 2024 17:38:57 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: Message-ID: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... 
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Revert accidental change of UCOH default ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/b6c11f74..9e008ac1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=11-12 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From duke at openjdk.org Wed Sep 11 18:02:06 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 11 Sep 2024 18:02:06 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 17:21:36 GMT, Sandhya Viswanathan wrote: >> So far, this will be the only intrinsic implementation of tanh. Therefore, at the moment it is just checking the consistency of the intrinsic implementation with StrictMath/FDLIBM tanh. If the intrinsic has a ~1 ulp accuracy, it would be expected to often be within 2.5 ulps of FDLIBM tanh. However, as written the regression test would not necessarily pass against any allowable Math.tanh implementation, which is the usual criteria for java.lang.Math tests that aren't otherwise constrained (such as by being limited to a given subset of platforms). >> >> If there was a correctly rounded tanh to compare against, then this style of testing would be valid. >> >> Are there any plan to intrinsify sinh or cosh? > > I think instead of random we should generate offline additional correctly rounded fixed test points to cater to new algorithm using high precision arithmetic library and then simply extend the HyperbolicTests.java with these new fixed test points using existing ulp testing mechanism in the test. 
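The ulp-based comparison that this tanh thread keeps referring to (a computed value checked against a reference within a bound such as 2.5 ulp) can be sketched as follows. This is an illustration only, not the mechanism used in HyperbolicTests.java: `ulp_of`, `error_in_ulps` and `within_ulps` are hypothetical helper names, and the sketch assumes finite inputs below the largest representable double.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Size of one ulp at 'x': distance from |x| to the next larger double.
// Assumes 'x' is finite and smaller in magnitude than the largest double.
inline double ulp_of(double x) {
  const double ax = std::fabs(x);
  return std::nextafter(ax, std::numeric_limits<double>::infinity()) - ax;
}

// Distance between 'actual' and 'reference', measured in ulps of the reference.
inline double error_in_ulps(double actual, double reference) {
  return std::fabs(actual - reference) / ulp_of(reference);
}

// True if 'actual' lies within 'max_ulps' ulps of 'reference'.
inline bool within_ulps(double actual, double reference, double max_ulps) {
  return error_in_ulps(actual, reference) <= max_ulps;
}
```

With precomputed, correctly rounded reference values, a fixed test point would then reduce to a single `within_ulps(computed, reference, bound)` check.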
Thank you Sandhya(@sviswa7) for the suggestion! Will update the existing HyperbolicTests.java with new fixed point tests with quad precision reference values. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1755258108 From coleenp at openjdk.org Wed Sep 11 21:18:12 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 11 Sep 2024 21:18:12 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: On Wed, 11 Sep 2024 17:38:57 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. 
Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Revert accidental change of UCOH default I was starting to understand the concerns with having prototype_header in Klass. It seems like it would simplify encoding the klass for object allocation. My recent change https://bugs.openjdk.org/browse/JDK-8338526 breaks this. You need to pass a parameter to Klass() to tell whether to encode the klass pointer or not, and pass this to Klass() constructor. 
diff --git a/src/hotspot/share/oops/instanceKlass.cpp b/src/hotspot/share/oops/instanceKlass.cpp
index fd198f54fc9..7aa4bd24948 100644
--- a/src/hotspot/share/oops/instanceKlass.cpp
+++ b/src/hotspot/share/oops/instanceKlass.cpp
@@ -511,7 +511,7 @@ InstanceKlass::InstanceKlass() {
 }
 
 InstanceKlass::InstanceKlass(const ClassFileParser& parser, KlassKind kind, ReferenceType reference_type) :
-  Klass(kind),
+  Klass(kind, (!parser.is_interface() && !parser.is_abstract())),
   _nest_members(nullptr),
   _nest_host(nullptr),
   _permitted_subclasses(nullptr),

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2344715540

From rcastanedalo at openjdk.org Thu Sep 12 10:20:15 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 12 Sep 2024 10:20:15 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13]
In-Reply-To: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>
References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>
Message-ID:

On Wed, 11 Sep 2024 17:38:57 GMT, Roman Kennke wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>>
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>>
>> Main changes:
>> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Revert accidental change of UCOH default src/hotspot/share/opto/lcm.cpp line 272: > 270: const TypePtr* tptr; > 271: if ((UseCompressedOops || UseCompressedClassPointers) && > 272: (CompressedOops::shift() == 0 || CompressedKlassPointers::shift() == 0)) { Could you explain this change? It seems like it may affect C2's implicit null check analysis even for `-XX:-UseCompactObjectHeaders`. In particular, for the following configurations, the changed condition evaluates to true before the change and false after it, regardless of whether `UseCompactObjectHeaders` is enabled: (!UseCompressedOops, UseCompressedClassPointers, CompressedKlassPointers::shift() != 0) ( UseCompressedOops, !UseCompressedClassPointers, CompressedOops::shift() != 0) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1756570168 From rcastanedalo at openjdk.org Thu Sep 12 11:49:15 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 12 Sep 2024 11:49:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: On Wed, 11 Sep 2024 17:38:57 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. 
The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. 
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Revert accidental change of UCOH default src/hotspot/share/cds/filemap.cpp line 2457: > 2455: compressed_oops(), compressed_class_pointers()); > 2456: if (compressed_oops() != UseCompressedOops || compressed_class_pointers() != UseCompressedClassPointers) { > 2457: log_info(cds)("Unable to use shared archive.\nThe saved state of UseCompressedOops and UseCompressedClassPointers is " The promotion of this CDS log line from `info` to `warning` triggers false failures in the `test/hotspot/jtreg/compiler/intrinsics/bmi` tests when running them with `-XX:-UseCompressedClassPointers`. These tests expect the standard output of different JVM runs to be identical, but the timestamps in the log messages tend to differ. I suggest adjusting the test configuration so that log timestamps are simply omitted, as in [this patch](https://github.com/robcasloz/jdk/commit/48f6e90ef6e0a71b55df536ed04a8b72130b5ea9) (feel free to merge it as-is or with any further changes you may find necessary). 
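To illustrate why an identical-output comparison breaks once a timestamped log line reaches standard output, here is a minimal Java sketch. It assumes the line carries a leading uptime decoration of the form `[0.013s]`; the sample lines, the class name `LogTimestampSketch`, and its helpers are hypothetical. Note that this regex-based stripping is only an illustration of the problem, whereas the patch linked above adjusts the test configuration so that timestamps are not emitted in the first place.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration: two JVM runs print the same warning with
// different uptime stamps, so a byte-for-byte comparison of stdout fails.
final class LogTimestampSketch {

    // Removes a leading uptime decoration such as "[0.013s]" (assumed format).
    static String stripUptime(String line) {
        return line.replaceFirst("^\\[\\d+[.,]\\d+s\\]", "");
    }

    // Applies stripUptime to every line of a captured output.
    static List<String> normalize(List<String> lines) {
        return lines.stream()
                    .map(LogTimestampSketch::stripUptime)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample output of two runs; only the uptime differs.
        List<String> runA = List.of(
            "[0.013s][warning][cds] Unable to use shared archive.",
            "result: 42");
        List<String> runB = List.of(
            "[0.017s][warning][cds] Unable to use shared archive.",
            "result: 42");

        System.out.println("raw outputs equal:        " + runA.equals(runB));
        System.out.println("normalized outputs equal: "
            + normalize(runA).equals(normalize(runB)));
    }
}
```

Omitting the timestamps at the source is the simpler choice for the tests, since it avoids any post-processing of the captured output.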
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1756699774 From rkennke at openjdk.org Thu Sep 12 13:16:14 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 12 Sep 2024 13:16:14 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:34:28 GMT, Roman Kennke wrote: >> @rkennke Can you please explain the changes in these tests: >> >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java >> >> >> You added these IR rule restriction: >> `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` >> >> This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. >> >> I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? >> >> Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. >> >> Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). 
> >> @rkennke Can you please explain the changes in these tests: >> >> ``` >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java >> ``` >> >> You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` >> >> This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. >> >> I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? >> >> Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. >> >> Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). > > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. 
It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible.
>
> I will re-evaluate those tests, and add comments or remove the restrictions.

> > > @rkennke Can you please explain the changes in these tests:
> > > ```
> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
> > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
> > > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
> > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
> > > ```
> > >
> > > You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},`
> > > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.
> > > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?
> > > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction.
> > > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).
> >
> > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively).
That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible. > > I will re-evaluate those tests, and add comments or remove the restrictions. > > If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ;) > > My suggestion is this: go over the examples, check which ones are now ok. Those that are not ok, add a comment, and file a bug: I can then analyze those cases later, and see how to write other tests or improve auto-vectorization. Indeed, I could re-enable all tests in: test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java but unfortunately not those others: > > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java > > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java I think the issue with all of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at 8-byte-aligned offset. I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it. 
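The alignment arithmetic behind this observation can be sketched in a few lines of Java. The base offsets 12 (byte[]) and 16 (long[]) are the values mentioned earlier in this message; the class and method names are hypothetical, and the 8-byte check is a simplified model of the constraint, not the actual SuperWord logic.

```java
// Hypothetical model: which array index is the first whose element sits at
// an 8-byte-aligned offset from the start of the array object?
final class ArrayAlignmentSketch {

    // Offset of element `index` from the start of the array object.
    static long elementOffset(int baseOffset, int elemSize, int index) {
        return (long) baseOffset + (long) elemSize * index;
    }

    // True if the element at `index` starts at a multiple of 8 bytes.
    static boolean isEightByteAligned(int baseOffset, int elemSize, int index) {
        return elementOffset(baseOffset, elemSize, index) % 8 == 0;
    }

    // First index in [0, 8) that is 8-byte aligned, or -1 if there is none.
    static int firstAlignedIndex(int baseOffset, int elemSize) {
        for (int i = 0; i < 8; i++) {
            if (isEightByteAligned(baseOffset, elemSize, i)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // byte[] with base offset 12: index 0 is not 8-byte aligned.
        System.out.println("byte[] @12, first aligned index: " + firstAlignedIndex(12, 1));
        // byte[] with base offset 16: index 0 is already aligned.
        System.out.println("byte[] @16, first aligned index: " + firstAlignedIndex(16, 1));
        // long[] with base offset 16: every element is aligned.
        System.out.println("long[] @16, first aligned index: " + firstAlignedIndex(16, 8));
    }
}
```

Under this model a loop over a byte[] that starts at index 0 begins at offset 12, which is not a multiple of 8, while the same loop with a base offset of 16 starts aligned.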
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2346250313

From epeter at openjdk.org Thu Sep 12 13:23:14 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 12 Sep 2024 13:23:14 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]
In-Reply-To:
References:
Message-ID:

On Thu, 12 Sep 2024 13:13:01 GMT, Roman Kennke wrote:

> > > > @rkennke Can you please explain the changes in these tests:
> > > > ```
> > > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
> > > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
> > > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
> > > > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
> > > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
> > > > ```
> > > >
> > > > You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},`
> > > > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.
> > > > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?
> > > > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction.
> > > > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).
> > > > > > > > > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible. > > > I will re-evaluate those tests, and add comments or remove the restrictions. > > > > > > If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ;) > > My suggestion is this: go over the examples, check which ones are now ok. Those that are not ok, add a comment, and file a bug: I can then analyze those cases later, and see how to write other tests or improve auto-vectorization. 
> > Indeed, I could re-enable all tests in: > > ``` > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java > ``` > > but unfortunately not those others: > > ``` > > > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java > > > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java > ``` > > I think the issue with all of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at 8-byte-aligned offset. > > I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it. Excellent, that is what I hoped for! Thanks for filing the bug, I'll look into it once this is integrated. You should probably mark it as "blocked by", not "related to" ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2346266568 From stuefe at openjdk.org Thu Sep 12 15:41:22 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 12 Sep 2024 15:41:22 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: On Thu, 12 Sep 2024 10:17:47 GMT, Roberto Casta?eda Lozano wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert accidental change of UCOH default > > src/hotspot/share/opto/lcm.cpp line 272: > >> 270: const TypePtr* tptr; >> 271: if ((UseCompressedOops || UseCompressedClassPointers) && >> 272: (CompressedOops::shift() == 0 || CompressedKlassPointers::shift() == 0)) { > > Could you explain this change? It seems like it may affect C2's implicit null check analysis even for `-XX:-UseCompactObjectHeaders`. 
In particular, for the following configurations, the changed condition evaluates to true before the change and false after it, regardless of whether `UseCompactObjectHeaders` is enabled:
>
> (!UseCompressedOops, UseCompressedClassPointers, CompressedKlassPointers::shift() != 0)
> ( UseCompressedOops, !UseCompressedClassPointers, CompressedOops::shift() != 0)

Hi @robcasloz

The `CompressedKlassPointers` utility class is not usable anymore with `-UseCompressedClassPointers`. One change is that if `UseCompressedClassPointers` is off, `CompressedKlassPointers` stays uninitialized. And that makes more sense than to rely on the static initialization values of `CompressedOops::_shift`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1757126946

From stuefe at openjdk.org Thu Sep 12 15:46:17 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 12 Sep 2024 15:46:17 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13]
In-Reply-To:
References:
Message-ID: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com>

On Wed, 11 Sep 2024 14:47:07 GMT, Roberto Castañeda Lozano wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Revert accidental change of UCOH default
>
> src/hotspot/share/opto/machnode.cpp line 390:
>
>> 388: t = t->make_ptr();
>> 389: }
>> 390: if (t->isa_narrowklass() && CompressedKlassPointers::shift() == 0) {
>
> Does this change have any effect? `UseCompressedClassPointers` should be implied by `t->isa_narrowklass()`.

I don't remember if this change was a reaction to an error or if I just guarded `CompressedKlassPointers::shift()` with +UseCCP because that is the prerequisite now. Probably the latter. I can remove this. Probably should assert then for +UseCCP.
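The lcm.cpp question from earlier in this thread (true before the change, false after it, for the two listed configurations) can be modeled as a small Java truth table. The `before` predicate mirrors the condition quoted from lcm.cpp line 271-272. The `after` predicate is an assumption made for illustration: a form that consults each shift only when its own flag is enabled, which is consistent with the two configurations described but is not taken from the patch itself. The sketch further assumes that the shift of a disabled encoding reads as 0, and all names are hypothetical.

```java
// Hypothetical model of the implicit-null-check guard discussed for lcm.cpp.
final class NullCheckConditionSketch {

    // Condition as quoted in the review comment.
    static boolean before(boolean useOops, boolean useCCP, int oopShift, int klassShift) {
        return (useOops || useCCP) && (oopShift == 0 || klassShift == 0);
    }

    // ASSUMPTION: each shift is only consulted when its own flag is on.
    static boolean after(boolean useOops, boolean useCCP, int oopShift, int klassShift) {
        return (useOops && oopShift == 0) || (useCCP && klassShift == 0);
    }

    static void show(String name, boolean useOops, boolean useCCP, int oopShift, int klassShift) {
        System.out.println(name
            + ": before=" + before(useOops, useCCP, oopShift, klassShift)
            + ", after=" + after(useOops, useCCP, oopShift, klassShift));
    }

    public static void main(String[] args) {
        // (!UseCompressedOops, UseCompressedClassPointers, klass shift != 0);
        // the unused oop shift is assumed to read as 0.
        show("config 1", false, true, 0, 3);
        // (UseCompressedOops, !UseCompressedClassPointers, oop shift != 0);
        // the unused klass shift is assumed to read as 0.
        show("config 2", true, false, 3, 0);
        // Both encodings enabled with a zero oop shift: both forms agree.
        show("config 3", true, true, 0, 3);
    }
}
```

In this model the two forms differ only when the shift that happens to be 0 belongs to the disabled encoding, which matches the point about not relying on static initialization values.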
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1757135035

From stuefe at openjdk.org Thu Sep 12 16:08:15 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 12 Sep 2024 16:08:15 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8]
In-Reply-To:
References:
Message-ID:

On Mon, 9 Sep 2024 15:58:29 GMT, Coleen Phillimore wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Try to avoid lea in loadNklass (aarch64)
>> - Fix release build error
>
> src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 147:
>
>> 145: #endif
>> 146:
>> 147: return true;
>
> This should only be in the compressedKlass.cpp file.

Okay. I will remove the whole `CompressedKlassPointers::pd_initialize` logic. We only need it for one architecture (aarch64) and one case (+UseCCP -UseCOH), so maybe it's not worth fanning out across all platforms, including Zero. Instead, I will add a short `ifdef` section to `CompressedKlassPointers::initialize`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1757169570

From coleenp at openjdk.org Thu Sep 12 17:37:15 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 12 Sep 2024 17:37:15 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8]
In-Reply-To:
References:
Message-ID:

On Thu, 12 Sep 2024 16:04:45 GMT, Thomas Stuefe wrote:

>> src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 147:
>>
>>> 145: #endif
>>> 146:
>>> 147: return true;
>>
>> This should only be in the compressedKlass.cpp file.
>
> Okay. I will remove the whole `CompressedKlassPointers::pd_initialize` logic. We only need it for one architecture (aarch64) and one case (+UseCCP -UseCOH), so maybe it's not worth fanning out across all platforms, including Zero. Instead, I will add a short `ifdef` section to `CompressedKlassPointers::initialize`.
Yes, looking at this further, it does seem like a small amount of conditional compilation that sets all the same values that are set in the architecture independent version. It seems best to move it there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1757300544 From sviswanathan at openjdk.org Thu Sep 12 23:17:13 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 12 Sep 2024 23:17:13 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes Message-ID: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. Summary of changes is as follows: 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
2) Intrinsic for wrapIndexes and selectFrom to generate efficient code

For the following source:

    public void test() {
        var index = ByteVector.fromArray(bspecies128, shuffles[1], 0);
        for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) {
            var inpvect = ByteVector.fromArray(bspecies128, byteinp, j);
            index.selectFrom(inpvect).intoArray(byteres, j);
        }
    }

The code generated for inner main now looks as follows:

    ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96
    0x00007f40d02274d0: movslq %ebx,%r13
    0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1
    0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1
    0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1)
    0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1
    0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1
    0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1)
    0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1
    0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1
    0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1)
    0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1
    0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1
    0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1)
    0x00007f40d022751f: add $0x40,%ebx
    0x00007f40d0227522: cmp %r8d,%ebx
    0x00007f40d0227525: jl 0x00007f40d02274d0

Best Regards,
Sandhya

-------------

Commit messages:
 - Merge branch 'master' of https://git.openjdk.java.net/jdk into rearrangewrap
 - Some cleanup
 - Some small fixes
 - Initial feedback
 - Optionally partial wrap shuffles during construction
 - Wrap shuffle on rearrange

Changes: https://git.openjdk.org/jdk/pull/20634/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8340079
Stats: 686 lines in 47 files changed: 548 ins; 30 del; 108 mod
Patch: https://git.openjdk.org/jdk/pull/20634.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20634/head:pull/20634

PR: https://git.openjdk.org/jdk/pull/20634

From psandoz at openjdk.org Thu Sep 12 23:17:13
2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 12 Sep 2024 23:17:13 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan wrote: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya API shapes are good! I see you intrinsified `selectFrom` which, IIUC, optimally generates C2 nodes that are functionally equivalent to the Java expression `v.rearrange(this.toShuffle())`. That way we can better generate an optimal set of instructions? Do you know what deficiencies there that blocks us from compiling the expression down to the same set of instructions as the intrinsic? Not suggesting we do that here, just for future reference. Adding link to UTF-8 decoding use case for convenience and reminder: https://github.com/AugustNagro/utf8.java/blob/master/src/main/java/com/augustnagro/utf8/Utf8.java. 
I think this is good enough to promote out of draft and create a CSR for the API changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2305377165 PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2305412450 PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2346848993 From sviswanathan at openjdk.org Thu Sep 12 23:17:14 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 12 Sep 2024 23:17:14 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> Message-ID: On Thu, 22 Aug 2024 18:21:50 GMT, Paul Sandoz wrote: > API shapes are good! > > I see you intrinsified `selectFrom` which, IIUC, optimally generates C2 nodes that are functionally equivalent to the Java expression `v.rearrange(this.toShuffle())`. That way we can better generate an optimal set of instructions? > > Do you know what deficiencies there that blocks us from compiling the expression down to the same set of instructions as the intrinsic? Not suggesting we do that here, just for future reference. Yes, I intrinsified it to generate an optimal set of instructions. In the expression `v.rearrange(this.toShuffle())` we first do a partial wrap as part of `this.toShuffle()` and then a full wrap as part of `rearrange`. In the intrinsic I am only doing the full wrap. Without the intrinsic, if for whatever reason `this.toShuffle()` is not moved out of the loop by the JIT, we incur the additional overhead of the partial wrap in the hot code path.
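To make the checkIndexes versus wrapIndexes distinction discussed above concrete, here is a minimal scalar sketch. It is not the Vector API implementation: the class and method names are hypothetical, plain arrays stand in for vectors, and it assumes the lane count is a power of two so that a full wrap reduces to `index & (length - 1)`. The partial wrap performed by `toShuffle()` is not modelled.

```java
import java.util.Arrays;

// Hypothetical scalar model; names are illustrative and not part of jdk.incubator.vector.
class WrapVsCheckSketch {

    // Models the old behaviour: any source index outside [0, src.length) is rejected.
    static byte[] selectChecked(byte[] src, byte[] idx) {
        byte[] out = new byte[idx.length];
        for (int i = 0; i < idx.length; i++) {
            if (idx[i] < 0 || idx[i] >= src.length) {
                throw new IndexOutOfBoundsException("source index " + idx[i] + " at lane " + i);
            }
            out[i] = src[idx[i]];
        }
        return out;
    }

    // Models a full wrap: every index is forced into [0, src.length) by masking.
    // Assumption: src.length is a power of two.
    static byte[] selectWrapped(byte[] src, byte[] idx) {
        int mask = src.length - 1;
        byte[] out = new byte[idx.length];
        for (int i = 0; i < idx.length; i++) {
            out[i] = src[idx[i] & mask];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] src = {10, 11, 12, 13};
        byte[] idx = {3, 4, -1, 0};   // 4 and -1 are out of range for four lanes
        System.out.println(Arrays.toString(selectWrapped(src, idx)));
        try {
            selectChecked(src, idx);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("checked variant threw: " + e.getMessage());
        }
    }
}
```

The wrapped variant has no data-dependent branch or exception path in its loop body, which is the property that lets a compiler map the operation onto a single shuffle instruction per vector.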
I saw this happening when the following is run as part of a JMH benchmark instead of being called in a loop from a standalone Java program:

    var index = ByteVector.fromArray(bspecies128, shuffles[1], 0);
    for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) {
        var inpvect = ByteVector.fromArray(bspecies128, byteinp, j);
        index.selectFrom(inpvect).intoArray(byteres, j);
    }

The performance difference observed in this case between using the intrinsic and not using it is about 20%. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2305521441 From rcastanedalo at openjdk.org Fri Sep 13 06:46:15 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 13 Sep 2024 06:46:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com> References: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com> Message-ID: On Thu, 12 Sep 2024 15:42:59 GMT, Thomas Stuefe wrote: >> src/hotspot/share/opto/machnode.cpp line 390: >> >>> 388: t = t->make_ptr(); >>> 389: } >>> 390: if (t->isa_narrowklass() && CompressedKlassPointers::shift() == 0) { >> >> Does this change have any effect? `UseCompressedClassPointers` should be implied by `t->isa_narrowklass()`. > > I don't remember if this change was a reaction to an error or if I just guarded `CompressedKlassPointers::shift()` with +UseCCP because that is the prerequisite now. Probably the latter. I can remove this. Probably should assert then for +UseCCP. I see, thanks. In that case, I would suggest removing the explicit `UseCompressedClassPointers` test, since it should be implied by `t->isa_narrowklass()`. `check_init()` within `CompressedKlassPointers::shift()` would already fail for the unexpected case where `t->isa_narrowklass() && !UseCompressedClassPointers`, no?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758270661 From rcastanedalo at openjdk.org Fri Sep 13 07:49:15 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 13 Sep 2024 07:49:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: On Thu, 12 Sep 2024 15:38:18 GMT, Thomas Stuefe wrote: >> src/hotspot/share/opto/lcm.cpp line 272: >> >>> 270: const TypePtr* tptr; >>> 271: if ((UseCompressedOops || UseCompressedClassPointers) && >>> 272: (CompressedOops::shift() == 0 || CompressedKlassPointers::shift() == 0)) { >> >> Could you explain this change? It seems like it may affect C2's implicit null check analysis even for `-XX:-UseCompactObjectHeaders`. In particular, for the following configurations, the changed condition evaluates to true before the change and false after it, regardless of whether `UseCompactObjectHeaders` is enabled: >> >> (!UseCompressedOops, UseCompressedClassPointers, CompressedKlassPointers::shift() != 0) >> ( UseCompressedOops, !UseCompressedClassPointers, CompressedOops::shift() != 0) > > Hi @robcasloz > > The `CompressedKlassPointers` utility class is not usable anymore with `-UseCompressedClassPointers`. One change is that if `UseCompressedClassPointers` is off, `CompressedKlassPointers` stays uninitialized. And that makes more sense than to rely on the static initialization values of `CompressedOops::_shift`. Thanks for the explanation. I wonder if the test is necessary at all, or one could simply use `base->get_ptr_type()` unconditionally, which defaults to `base->bottom_type()->isa_ptr()` anyway for non-compressed pointers. But this simplification would be in any case out of the scope of this changeset.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758356268 From rcastanedalo at openjdk.org Fri Sep 13 07:57:15 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 13 Sep 2024 07:57:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: On Thu, 12 Sep 2024 11:46:35 GMT, Roberto Castañeda Lozano wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert accidental change of UCOH default > > src/hotspot/share/cds/filemap.cpp line 2457: > >> 2455: compressed_oops(), compressed_class_pointers()); >> 2456: if (compressed_oops() != UseCompressedOops || compressed_class_pointers() != UseCompressedClassPointers) { >> 2457: log_info(cds)("Unable to use shared archive.\nThe saved state of UseCompressedOops and UseCompressedClassPointers is " > > The promotion of this CDS log line from `info` to `warning` triggers false failures in the `test/hotspot/jtreg/compiler/intrinsics/bmi` tests when running them with `-XX:-UseCompressedClassPointers`. These tests expect the standard output of different JVM runs to be identical, but the timestamps in the log messages tend to differ. I suggest adjusting the test configuration so that log timestamps are simply omitted, as in [this patch](https://github.com/robcasloz/jdk/commit/48f6e90ef6e0a71b55df536ed04a8b72130b5ea9) (feel free to merge it as-is or with any further changes you may find necessary). This comment has been marked as "resolved" without any apparent action being taken, is that intentional?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758369787 From rkennke at openjdk.org Fri Sep 13 08:21:54 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 13 Sep 2024 08:21:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: Message-ID: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... 
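As a rough check of the 8TB figure quoted above, the sketch below works through the arithmetic. It assumes, by analogy with compressed oops, that the 40-bit forwarding value counts 8-byte heap words rather than bytes; the description only says the encoding is "similar to compressed-oops", so the alignment and the helper name are assumptions of this sketch.

```java
// Hypothetical helper; the 8-byte granule is an assumption, not taken from the PR.
class ForwardingRangeSketch {

    // Number of heap bytes addressable with `bits` bits when each value
    // denotes an offset in units of `granuleBytes`.
    static long maxEncodableHeapBytes(int bits, int granuleBytes) {
        return (1L << bits) * granuleBytes;
    }

    public static void main(String[] args) {
        long bytes = maxEncodableHeapBytes(40, 8);
        long tib = bytes / (1L << 40);
        System.out.println(bytes + " bytes = " + tib + " TiB");
    }
}
```

Under that assumption, 2^40 word offsets times 8 bytes per word gives 2^43 bytes, which matches the stated limit.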
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Hide log timestamps in test to prevent false failures ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/9e008ac1..69f1ef1d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=12-13 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Fri Sep 13 08:21:55 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 13 Sep 2024 08:21:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: <99QfaesSJzBLGXsBKOdiSwjAdt18pwNMh62Pyhr-6bk=.b27f001b-e3e3-4826-9542-698eef2a9ee3@github.com> On Fri, 13 Sep 2024 07:54:30 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/share/cds/filemap.cpp line 2457: >> >>> 2455: compressed_oops(), compressed_class_pointers()); >>> 2456: if (compressed_oops() != UseCompressedOops || compressed_class_pointers() != UseCompressedClassPointers) { >>> 2457: log_info(cds)("Unable to use shared archive.\nThe saved state of UseCompressedOops and UseCompressedClassPointers is " >> >> The promotion of this CDS log line from `info` to `warning` triggers false failures in the `test/hotspot/jtreg/compiler/intrinsics/bmi` tests when running them with `-XX:-UseCompressedClassPointers`. These tests expect the standard output of different JVM runs to be identical, but the timestamps in the log messages tend to differ.
I suggest adjusting the test configuration so that log timestamps are simply omitted, as in [this patch](https://github.com/robcasloz/jdk/commit/48f6e90ef6e0a71b55df536ed04a8b72130b5ea9) (feel free to merge it as-is or with any further changes you may find necessary). > > This comment has been marked as "resolved" without any apparent action being taken, is that intentional? I have merged your patch locally but forgot to push it. Sorry. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758407575 From aturbanov at openjdk.org Fri Sep 13 09:30:18 2024 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Fri, 13 Sep 2024 09:30:18 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v4] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 00:29:30 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > c1 and template generator fixes test/jdk/java/lang/Math/HyperbolicTests.java line 1011: > 1009: } > 1010: > 1011: for(int i = 0; i < testCases.length; i++) { Suggestion: for (int i = 0; i < testCases.length; i++) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1758522740 From stuefe at openjdk.org Fri Sep 13 09:30:24 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Sep 2024 09:30:24 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 12:19:32 GMT, Coleen Phillimore wrote: >> This is tricky. We are already deep in initialization and have done a couple of decisions based on +UseCompressedClassPointers (e.g. CDS setup). 
I *think* we could still go with -UseCCP, but I wonder whether this is wise. >> >> Note that this error is not new. In the old code, we simply asserted. That left us with UB in release builds, which remains unresolved. I simply made the error explicit in release too. > > Ok, in this case, that's fine if we already asserted. A fatal error is better. Actually, a lot of the old code had dusty side corners that were UB. Making narrowKlass smaller than 32bit exposed a lot of them, and a lot of the changes in and around CompressedKlassPointers are about cleanly making explicit what before had been implicit or just broken (e.g. a clear distinction between encoding range and Klass range, and a clear handling of narrowKlass bit width as a runtime value). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758522844 From stuefe at openjdk.org Fri Sep 13 09:38:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Sep 2024 09:38:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 12:13:58 GMT, Thomas Stuefe wrote: >> src/hotspot/share/oops/compressedKlass.cpp line 243: >> >>> 241: } else { >>> 242: >>> 243: // In legacy mode, we try, in order of preference: >> >> Can you not use the word 'legacy' here? Maybe in "non-compact object header mode"... > > okay. I removed all traces of "legacy" and "tiny", reverting to "standard" or "non-coh" vs "coh". 
I would prefer to use the shorthand "coh" in some places since "compact object header mode" is a mouthful and gives me RSI :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758533732 From stefank at openjdk.org Fri Sep 13 09:44:19 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 09:44:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> References: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> Message-ID: On Fri, 13 Sep 2024 08:21:54 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. 
Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Hide log timestamps in test to prevent false failures I went over the oops/ directory and added a few cleanup requests and comments. src/hotspot/share/oops/instanceOop.hpp line 43: > 41: } else { > 42: return sizeof(instanceOopDesc); > 43: } This entire function can be removed. It returns the same value as oopDesc::base_offset_in_bytes(), but in a slightly different way. src/hotspot/share/oops/markWord.hpp line 171: > 169: return mask_bits(value(), lock_mask_in_place | self_fwd_mask_in_place) >= static_cast(marked_value); > 170: } > 171: Suggestion to retain code layout. 
Suggestion:

src/hotspot/share/oops/markWord.inline.hpp line 29:

> 27:
> 28: #include "oops/markWord.hpp"
> 29: #include "oops/compressedOops.inline.hpp"

Suggestion:

    #include "oops/compressedOops.inline.hpp"
    #include "oops/markWord.hpp"

src/hotspot/share/oops/objArrayKlass.cpp line 146:

> 144:
> 145: size_t ObjArrayKlass::oop_size(oop obj) const {
> 146: // In this assert, we cannot safely access the Klass* with compact headers.

I would like a comment stating that this assert is turned off because size_given_klass calls oop_size on an object that might be concurrently forwarded.

src/hotspot/share/oops/oop.cpp line 158:

> 156: // Only has a klass gap when compressed class pointers are used and not
> 157: // using compact headers.
> 158: return UseCompressedClassPointers && !UseCompactObjectHeaders;

This comment can just be removed.

src/hotspot/share/oops/oop.hpp line 340:

> 338: // field offset. Use an offset halfway into the markWord, as the markWord is never
> 339: // partially loaded from C2.
> 340: return 4;

I asked around to see what people felt about dropping references to mark_offset_in_bytes(), which we know is 0. There was a request to strive to use mark_offset_in_bytes() for clarity.

Suggestion:

    return mark_offset_in_bytes() + 4;

src/hotspot/share/oops/oop.hpp line 349:

> 347: static int klass_gap_offset_in_bytes() {
> 348: assert(has_klass_gap(), "only applicable to compressed klass pointers");
> 349: assert(!UseCompactObjectHeaders, "don't use klass_gap_offset_in_bytes() with compact headers");

This assert is implied by `has_klass_gap()`. I don't see the need to repeat it here.

src/hotspot/share/oops/oop.hpp line 363:

> 361: return sizeof(markWord) + sizeof(Klass*);
> 362: }
> 363: }

Not a strong request for this PR, but there are many places that calculate almost the same thing, and it might be good to limit the number of places we do similar calculations.
I'm wondering if it wouldn't be better for readability to structure the code as follows:

    static int header_size_in_bytes() {
      if (UseCompactObjectHeaders) {
        return sizeof(markWord);
      } else if (UseCompressedClassPointers) {
        return sizeof(markWord) + sizeof(narrowKlass);
      } else {
        return sizeof(markWord) + sizeof(Klass*);
      }
    }

    // Size of object header, aligned to platform wordSize
    static int header_size() {
      return align_up(header_size_in_bytes(), HeapWordSize) / HeapWordSize;
    }
    ...
    static int base_offset_in_bytes() {
      return header_size_in_bytes();
    }

src/hotspot/share/oops/oop.inline.hpp line 161:

> 159:
> 160: void oopDesc::set_klass_gap(HeapWord* mem, int v) {
> 161: assert(!UseCompactObjectHeaders, "don't set Klass* gap with compact headers");

We might want to consider just simplifying the function to:

    void oopDesc::set_klass_gap(HeapWord* mem, int v) {
      assert(has_klass_gap(), "precondition");
      *(int*)(((char*)mem) + klass_gap_offset_in_bytes()) = v;
    }

src/hotspot/share/oops/oop.inline.hpp line 295:

> 293: // Used by scavengers
> 294: void oopDesc::forward_to(oop p) {
> 295: assert(cast_from_oop(p) != this,

Do we really need the cast here?
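For readers who want to see the numbers behind the `header_size_in_bytes()` restructuring suggested above, the following Java model evaluates the three cases. It is only a sketch: the byte sizes are assumptions for a 64-bit VM (8-byte markWord, 4-byte narrowKlass, 8-byte Klass*, 8-byte heap word), and the class and method names are hypothetical.

```java
// Hypothetical model of the suggested C++ helpers; sizes assume a 64-bit VM.
class HeaderSizeSketch {
    static final int MARK_WORD_BYTES = 8;     // assumed sizeof(markWord)
    static final int NARROW_KLASS_BYTES = 4;  // assumed sizeof(narrowKlass)
    static final int KLASS_PTR_BYTES = 8;     // assumed sizeof(Klass*)
    static final int HEAP_WORD_BYTES = 8;     // assumed HeapWordSize

    static int headerSizeInBytes(boolean compactHeaders, boolean compressedClassPointers) {
        if (compactHeaders) {
            return MARK_WORD_BYTES;                       // Klass bits live in the mark word
        } else if (compressedClassPointers) {
            return MARK_WORD_BYTES + NARROW_KLASS_BYTES;
        } else {
            return MARK_WORD_BYTES + KLASS_PTR_BYTES;
        }
    }

    // Header size rounded up to whole heap words.
    static int headerSizeInWords(boolean compactHeaders, boolean compressedClassPointers) {
        int bytes = headerSizeInBytes(compactHeaders, compressedClassPointers);
        return (bytes + HEAP_WORD_BYTES - 1) / HEAP_WORD_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("compact headers:      " + headerSizeInBytes(true, true) + " bytes");
        System.out.println("compressed class ptr: " + headerSizeInBytes(false, true) + " bytes");
        System.out.println("uncompressed:         " + headerSizeInBytes(false, false) + " bytes");
    }
}
```

With these assumed sizes the model yields 8, 12 and 16 bytes, consistent with the PR description's statement that the instance base offset can now be 8 instead of 12 or 16.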
------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2302542279 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758503206 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758482703 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758505713 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758479437 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758478106 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758472909 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758474349 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758528515 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758538380 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758540055 From stefank at openjdk.org Fri Sep 13 09:44:20 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 09:44:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 12:17:17 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/share/oops/oop.cpp line 230: > >> 228: // disjunct below to fail if the two comparands are computed across such >> 229: // a concurrent change. >> 230: return Universe::heap()->is_stw_gc_active() && klass->is_objArray_klass() && is_forwarded() && (UseParallelGC || UseG1GC); > > Is this still true after the recent changes like JDK-8311163? It might be worth waiting for. 
That bug doesn't fix all cases where the length field is modified. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758477168 From tschatzl at openjdk.org Fri Sep 13 11:15:15 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 13 Sep 2024 11:15:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Fri, 13 Sep 2024 09:00:32 GMT, Stefan Karlsson wrote: >> src/hotspot/share/oops/oop.cpp line 230: >> >>> 228: // disjunct below to fail if the two comparands are computed across such >>> 229: // a concurrent change. >>> 230: return Universe::heap()->is_stw_gc_active() && klass->is_objArray_klass() && is_forwarded() && (UseParallelGC || UseG1GC); >> >> Is this still true after the recent changes like JDK-8311163? It might be worth waiting for. > > That bug doesn't fix all cases where the length field is modified. Which ones are remaining? JDK-8337709 implemented the same change for G1 GC before JDK-8311163. The full collectors/g1 marking do not modify the length fields but have multiple separate queues which is a different issue. It will also be handled by the new `PartialArrayTaskStepper`, but should be of no concern here. If I am not missing some case, this whole method is unnecessary now.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758672296 From stuefe at openjdk.org Fri Sep 13 12:51:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Sep 2024 12:51:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Tue, 10 Sep 2024 12:35:42 GMT, Thomas Stuefe wrote: >> src/hotspot/share/oops/compressedKlass.cpp line 116: >> >>> 114: _range = end - _base; >>> 115: >>> 116: DEBUG_ONLY(assert_is_valid_encoding(addr, len, _base, _shift);) >> >> Can you refactor so the aarch64 path runs this same code without duplication? > > In tinycp mode, aarch64 runs this code though? The aarch64 variant of pd_initialize just returns then. In non-COH mode (preexisting, not touched by this patch) Aarch64 needs its own handling. I refactored: now we should have no duplication (once my patch hits Roman's PR branch). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758800913 From stefank at openjdk.org Fri Sep 13 12:51:18 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 12:51:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Fri, 13 Sep 2024 11:10:58 GMT, Thomas Schatzl wrote: >> That bug doesn't fix all cases where the length field is modified. > > Which ones are remaining? JDK-8337709 implemented the same change for G1 GC before JDK-8311163. > > The full collectors/g1 marking do not modify the length fields but have multiple separate queues which is a different issue. It will also be handled by the new `PartialArrayTaskStepper`, but should be of no concern here.
> > If I am not missing some case, this whole method is unnecessary now. If you've already fixed this for GC then I agree that we could remove this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758805418 From stefank at openjdk.org Fri Sep 13 12:51:18 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 12:51:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Fri, 13 Sep 2024 12:47:09 GMT, Stefan Karlsson wrote: >> Which ones are remaining? JDK-8337709 implemented the same change for G1 GC before JDK-8311163. >> >> The full collectors/g1 marking do not modify the length fields but have multiple separate queues which is a different issue. It will also be handled by the new `PartialArrayTaskStepper`, but should be of no concern here. >> >> If I am not missing some case, this whole method is unnecessary now. > > If you've already fixed this for GC then I agree that we could remove this. This seems like something that should be done as a separate patch that gets pushed before this PR. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758808115 From rkennke at openjdk.org Fri Sep 13 12:56:17 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 13 Sep 2024 12:56:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> Message-ID: On Fri, 13 Sep 2024 09:39:23 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Hide log timestamps in test to prevent false failures > > src/hotspot/share/oops/oop.inline.hpp line 295: > >> 293: // Used by scavengers >> 294: void oopDesc::forward_to(oop p) { >> 295: assert(cast_from_oop(p) != this, > > Do we really need the cast here? Yes, otherwise compiler complains about ambiguous != operator. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758815451 From rkennke at openjdk.org Fri Sep 13 13:03:16 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 13 Sep 2024 13:03:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> Message-ID: On Fri, 13 Sep 2024 09:31:39 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Hide log timestamps in test to prevent false failures > > src/hotspot/share/oops/oop.hpp line 363: > >> 361: return sizeof(markWord) + sizeof(Klass*); >> 362: } >> 363: } > > Not a strong request for this PR, but there are many places that calculates almost the same thing, and it might be good to limit the number of places we do similar calculations. 
> > I'm wondering if it wouldn't be better for readability to structure the code as follows: > > static int header_size_in_bytes() { > if (UseCompactObjectHeaders) { > return sizeof(markWord); > } else if (UseCompressedClassPointers) { > return sizeof(markWord) + sizeof(narrowKlass); > } else { > return sizeof(markWord) + sizeof(Klass*); > } > } > > // Size of object header, aligned to platform wordSize > static int header_size() { > return align_up(header_size_in_bytes(), HeapWordSize) / HeapWordSize; > } > ... > static int base_offset_in_bytes() { > return header_size_in_bytes(); > } Ok. I filed: https://bugs.openjdk.org/browse/JDK-8340118 for now, let's see if I can sort this out before integrating this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758825458 From rkennke at openjdk.org Fri Sep 13 13:11:45 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 13 Sep 2024 13:11:45 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... 
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Various touch-ups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/69f1ef1d..990926f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=13-14 Stats: 25 lines in 8 files changed: 3 ins; 17 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From stefank at openjdk.org Fri Sep 13 13:18:16 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 13:18:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> Message-ID: On Fri, 13 Sep 2024 12:53:29 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/oop.inline.hpp line 295: >> >>> 293: // Used by scavengers >>> 294: void oopDesc::forward_to(oop p) { >>> 295: assert(cast_from_oop(p) != this, >> >> Do we really need the cast here? > > Yes, otherwise compiler complains about ambiguous != operator. OK, we shouldn't need to. It seems like I can silence the compiler by tweaking oopsHierarchy.hpp. I'll deal with that as a follow-up. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758853099 From tschatzl at openjdk.org Fri Sep 13 13:51:16 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 13 Sep 2024 13:51:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Fri, 13 Sep 2024 12:48:53 GMT, Stefan Karlsson wrote: >> If you've already fixed this for GC then I agree that we could remove this. > > This seems like something that should be done as a separate patch that gets pushed before this PR. Will remove in JDK-8340119. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758906485 From epeter at openjdk.org Fri Sep 13 14:48:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Sep 2024 14:48:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 6 Sep 2024 18:08:04 GMT, Jatin Bhateja wrote: >> test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 1048: >> >>> 1046: return SHORT_GENERATOR_SELECT_FROM_TRIPLES.stream().map(List::toArray). >>> 1047: toArray(Object[][]::new); >>> 1048: } >> >> Just a control question: does this also occasionally generate examples with out-of-bounds indices? Negative out of bounds and positive out of bounds? > > Original API did throw IndexOutOfBoundsException, but later on we have moved away from exception throwing semantics to wrapping semantics. > Please find details at following comment > https://github.com/openjdk/jdk/pull/20508#issuecomment-2306344606 And do we test that the wrapping works correctly? 
>> test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 5812: >> >>> 5810: ShortVector bv = ShortVector.fromArray(SPECIES, b, i); >>> 5811: ShortVector idxv = ShortVector.fromArray(SPECIES, idx, i); >>> 5812: idxv.selectFrom(av, bv).intoArray(r, i); >> >> Would this test catch a bug where the backend would generate vectors that are too long or too short? > > Existing vectorAPI inline expansion entry points explicitly pass lane type and count as intrinsic arguments, this is used to create concrete ideal vector types. That does not answer my question. If the backend operations you implemented would have the wrong vector-length: do we have any tests that would catch that? Often that requires not just going "up" with a loop but also "counting down" with the loop iv. Do you know what I mean? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1758999902 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1759002531 From epeter at openjdk.org Fri Sep 13 14:52:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Sep 2024 14:52:11 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: <7bghGF2-qbhP1hJA2ljtdA3xSUSqiV0RLaOYm4AcZSQ=.eb3e36b2-5461-4755-ae71-2de89660649f@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7bghGF2-qbhP1hJA2ljtdA3xSUSqiV0RLaOYm4AcZSQ=.eb3e36b2-5461-4755-ae71-2de89660649f@github.com> Message-ID: On Tue, 3 Sep 2024 11:45:53 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding descriptive comments > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 544: > >> 542: byte[] vpayload1 = ((ByteVector)v1).vec(); >> 543: byte[] vpayload2 = ((ByteVector)v2).vec(); >> 544: byte[] vpayload3 = ((ByteVector)v3).vec(); > > Is there a reason you are not using more descriptive 
names here instead of `vpayload1`? > I also wonder if the `selectFromHelper` should not be named more specifically: `selectFromTwoVector(s)Helper`? You only gave me a thumbs up and no change - but comment resolved. Is that intentional? Makes me feel like you are ignoring my comments, and that discourages me from reviewing in the future. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1759008094 From epeter at openjdk.org Fri Sep 13 14:56:10 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Sep 2024 14:56:10 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v8] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 6 Sep 2024 18:13:34 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review resolutions. Can you please **define** somewhere what it means to `prune indexes`? It does not help me much more than the previous "massaging indexes" you had before I asked you to change it. > Also: I'm a little worried about the semantics change of the RearrangeNode that you did with the changes in RearrangeNode::Ideal. It looks a little "hacky", especially in conjunction with the vector_indexes_needs_massaging method. Can you give a clear definition of the semantics of RearrangeNode and vector_indexes_needs_massaging, please? You have also not responded to this yet. It seems to me that before your proposed change, `RearrangeNode` had a clear and easy semantic, and now you somehow "hack it" to work with your `vector_indexes_needs_pruning`. 
Can you explain please why this makes sense and add a comment to `RearrangeNode` what its semantics is? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2349148857 From psandoz at openjdk.org Fri Sep 13 16:46:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 13 Sep 2024 16:46:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 6 Sep 2024 18:08:09 GMT, Jatin Bhateja wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2770: >> >>> 2768: >>> 2769: /** >>> 2770: * Rearranges the lane elements of two vectors, selecting lanes >> >> I have a bit of a name concern here. Why are we calling it "select" and not "rearrange"? Because for a single "from" vector we also call it "rearrange", right? Is "select" not often synonymous to "blend", which works also with two "from" vectors, but with a mask and not indexing for "selection/rearranging"? > > We already have another flavor of [selectFrom](https://docs.oracle.com/en/java/javase/22/docs/api/jdk.incubator.vector/jdk/incubator/vector/Vector.html#selectFrom(jdk.incubator.vector.Vector)) which permutes a single vector; the new API extends its semantics to two-vector selection, so we kept the nomenclature consistent. Select operates only on vectors where the `this` vector represents the indexes to *select* elements from the other vectors. Rearrange operates on vectors and a shuffle argument that *rearranges* elements from the other vectors. The former behavior can be specified in terms of the latter behavior, and ideally the equivalent expressions should result in ~same generated sequence of instructions. However, we are not there yet and need to further optimize shuffles to make that happen. But, we can optimize `selectFrom` with the dependent change to wrap indexes instead of throwing when out of bounds. 
(Separately there is an annoying issue with select, that we should not address in this PR. Using a Float/Double Vector for indexes is awkward.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1759182233 From qamai at openjdk.org Fri Sep 13 17:23:05 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 13 Sep 2024 17:23:05 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan wrote: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya Given `rearrange` with 1 vector gets wrapping indices semantics. I think we should stop normalizing indices when converting a `Vector` into a `VectorShuffle` (currently we wrap all out-of-bound elements to `[-VLEN, 0)`). Then the rearrange with 2 vectors will also wrap similarly (all indices are `& (VLEN * 2 - 1)`, then indices `[0, VLEN)` maps to the first vector and indices `[VLEN, 2 * VLEN)` map to the second vector). What do you think? 
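A minimal scalar sketch of the two-vector wrapping described above, assuming VLEN is a power of two as it is for Vector API species. The class and method names are hypothetical and this is not the JDK implementation; it only models the index arithmetic (`& (VLEN * 2 - 1)`, then `[0, VLEN)` selects from the first vector and `[VLEN, 2 * VLEN)` from the second).

```java
import java.util.Arrays;

public class TwoVectorSelectSketch {
    // Hypothetical scalar model of a two-vector select with wrapping indices.
    // Assumes v1 and v2 have the same length and that length is a power of two.
    static int[] selectFromTwo(int[] idx, int[] v1, int[] v2) {
        int vlen = v1.length;
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            // Wrap into [0, 2 * vlen); the mask also handles negative indices.
            int wrapped = idx[i] & (2 * vlen - 1);
            // Low half indexes the first vector, high half the second.
            result[i] = wrapped < vlen ? v1[wrapped] : v2[wrapped - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] idx = {0, 5, -1, 8}; // -1 wraps to 7, 8 wraps to 0
        System.out.println(Arrays.toString(selectFromTwo(idx, v1, v2)));
    }
}
```

With the mask applied first, no index can be out of bounds, which is what removes the need for an exception path in this model.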
------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2349535266 From jbhateja at openjdk.org Fri Sep 13 17:31:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 17:31:24 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan wrote: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya src/hotspot/share/opto/vectorIntrinsics.cpp line 2206: > 2204: const Type * byte_bt = Type::get_const_basic_type(T_BYTE); > 2205: const TypeVect * byte_vt = TypeVect::make(byte_bt, num_elem); > 2206: Node* byte_shuffle = gvn().transform(VectorCastNode::make(cast_vopc, v1, T_BYTE, num_elem)); We can be optimal here and prevent down casting and subsequent load shuffles in applicable scenarios, e.g. indexes held in integral vectors. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1758203424 From jbhateja at openjdk.org Fri Sep 13 17:41:08 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 17:41:08 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 13 Sep 2024 14:45:29 GMT, Emanuel Peter wrote: >> Existing vectorAPI inline expansion entry points explicitly pass lane type and count as intrinsic arguments, this is used to create concrete ideal vector types. > > That does not answer my question. If the backend operations you implemented would have the wrong vector-length: do we have any tests that would catch that? Often that requires not just going "up" with a loop but also "counting down" with the loop iv. Do you know what I mean? The patch includes tests for all the species (combinations of vector type and size); each vector kernel is validated against an equivalent scalar implementation, so the scenario you are referring to is implicitly handled through the tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1759246223 From sviswanathan at openjdk.org Fri Sep 13 18:20:07 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 13 Sep 2024 18:20:07 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 17:20:40 GMT, Quan Anh Mai wrote: > Given `rearrange` with 1 vector gets wrapping indices semantics. I think we should stop normalizing indices when converting a `Vector` into a `VectorShuffle` (currently we wrap all out-of-bound elements to `[-VLEN, 0)`). 
Then the rearrange with 2 vectors will also wrap similarly (all indices are `& (VLEN * 2 - 1)`, then indices `[0, VLEN)` maps to the first vector and indices `[VLEN, 2 * VLEN)` map to the second vector). We will normalize the indices when we invoke `VectorShuffle::toVector` which I think is much less used than `Vector::toShuffle`. What do you think? The guidance from Paul Sandoz and John Rose is to keep the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom APIs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2349763832 From sviswanathan at openjdk.org Fri Sep 13 18:27:06 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 13 Sep 2024 18:27:06 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 05:30:36 GMT, Jatin Bhateja wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
>> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2206: > >> 2204: const Type * byte_bt = Type::get_const_basic_type(T_BYTE); >> 2205: const TypeVect * byte_vt = TypeVect::make(byte_bt, num_elem); >> 2206: Node* byte_shuffle = gvn().transform(VectorCastNode::make(cast_vopc, v1, T_BYTE, num_elem)); > > We can be optimal here and prevent down casting and subsequent load shuffles in applicable scenarios, e.g. indexes held in integral vectors. @jatin-bhateja If you could expand on this comment with specific cases it will be helpful. 
The loadShuffle generation is needed for platform specific handling of shuffles and cannot be optimized out here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759296031 From jbhateja at openjdk.org Fri Sep 13 18:30:08 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 18:30:08 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v8] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 13 Sep 2024 14:53:18 GMT, Emanuel Peter wrote: > Can you please **define** somewhere what it means to `prune indexes`? It does not help me much more than the previous "massaging indexes" you had before I asked you to change it. > > > Also: I'm a little worried about the semantics change of the RearrangeNode that you did with the changes in RearrangeNode::Ideal. It looks a little "hacky", especially in conjunction with the vector_indexes_needs_massaging method. Can you give a clear definition of the semantics of RearrangeNode and vector_indexes_needs_massaging, please? > > You have also not responded to this yet. It seems to me that before your proposed change, `RearrangeNode` had a clear and easy semantic, and now you somehow "hack it" to work with your `vector_indexes_needs_pruning`. Can you explain please why this makes sense and add a comment to `RearrangeNode` what its semantics is? In case target does not directly support two vector selection instruction we lower the IR to its constituents, this is better than intrinsification failure as it saves costly vector boxing penalties. 
Think in terms of the desired compiler IR and not the rearrange API semantics: a VectorRearrange IR node always expects shape conformance between the vector to be permuted and the index vector. Since shuffle indices are held in byte-array-based backing storage, the compiler injects VectorLoadShuffle nodes to upcast the byte vector lanes holding the indexes to match the input vector lanes. Since the selectFrom API already passes the indexes through a vector, we can save emitting redundant toShuffle() + toVector() operations in all cases apart from some target-specific scenarios, e.g. AVX2 targets do not support the direct short vector permute instruction "VPERMW", hence we need to [massage the index vector](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L8771) to emulate the desired permutation using a byte permute instruction. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2349801299 From jbhateja at openjdk.org Fri Sep 13 18:40:53 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 18:40:53 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v9] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). 
Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Documentation suggestions from Paul. 
-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20508/files
  - new: https://git.openjdk.org/jdk/pull/20508/files/d3ee3104..1c00f417

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=07-08

  Stats: 36 lines in 1 file changed: 23 ins; 2 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/20508.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508

PR: https://git.openjdk.org/jdk/pull/20508

From jbhateja at openjdk.org Fri Sep 13 19:09:04 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 13 Sep 2024 19:09:04 GMT
Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes
In-Reply-To:
References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com>
Message-ID:

On Fri, 13 Sep 2024 18:24:04 GMT, Sandhya Viswanathan wrote:

>> src/hotspot/share/opto/vectorIntrinsics.cpp line 2206:
>>
>>> 2204: const Type * byte_bt = Type::get_const_basic_type(T_BYTE);
>>> 2205: const TypeVect * byte_vt = TypeVect::make(byte_bt, num_elem);
>>> 2206: Node* byte_shuffle = gvn().transform(VectorCastNode::make(cast_vopc, v1, T_BYTE, num_elem));
>>
>> We can be optimal here and prevent down casting and subsequent load shuffles in applicable scenarios, e.g. indexes held in integral vectors.
>
> @jatin-bhateja If you could expand on this comment with specific cases it will be helpful. The loadShuffle generation is needed for platform specific handling of shuffles and cannot be optimized out here.

Hi @sviswa7, I was suggesting emitting toShuffle() + toVector() only if it's needed under a target specific hook, since indexes are anyway passed through a vector. Please let me know if you find the explanation below too constraining.
https://github.com/openjdk/jdk/pull/20508#issuecomment-2349801299

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759345567

From sviswanathan at openjdk.org Fri Sep 13 19:17:04 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 13 Sep 2024 19:17:04 GMT
Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes
In-Reply-To:
References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com>
Message-ID: <2483R4bBJDN4UpBlRJSQVE2KjdctYIy0j__kzRGRDHc=.71baed04-9185-4111-a3ce-ce32a40cb570@github.com>

On Fri, 13 Sep 2024 19:04:12 GMT, Jatin Bhateja wrote:

>> @jatin-bhateja If you could expand on this comment with specific cases it will be helpful. The loadShuffle generation is needed for platform specific handling of shuffles and cannot be optimized out here.
>
> Hi @sviswa7, I was suggesting emitting toShuffle() + toVector() only if it's needed under a target specific hook, since indexes are anyway passed through a vector. Please let me know if you find the explanation below too constraining.
> https://github.com/openjdk/jdk/pull/20508#issuecomment-2349801299

I think VectorLoadShuffle removal optimizations should be a separate PR and well thought out. So far the contract has been that rearrange always gets the shuffle through VectorLoadShuffle and I would like to keep that contract in this PR.
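
For readers following the wrapIndexes change that this PR (8340079) is about, here is a minimal scalar sketch of what "wrap instead of check" could mean for a single-vector rearrange. It is an illustration only and not code from the PR: the class `WrapIndexSketch` and its methods are hypothetical names, plain arrays stand in for vectors, and the wrap is assumed to be `index & (VLEN - 1)` for a power-of-two lane count.

```java
import java.util.Arrays;

public class WrapIndexSketch {
    // Hypothetical helper: wrap an arbitrary index into [0, vlen).
    // Assumes vlen is a power of two, so masking is enough.
    public static int wrapIndex(int index, int vlen) {
        return index & (vlen - 1);
    }

    // Scalar model of a rearrange that wraps each source index
    // instead of throwing on an out-of-range one.
    public static byte[] rearrangeWrapped(byte[] src, int[] indexes) {
        int vlen = src.length;
        byte[] result = new byte[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            result[lane] = src[wrapIndex(indexes[lane], vlen)];
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] src = {10, 11, 12, 13};
        int[] indexes = {3, 4, -1, 9};   // 4, -1 and 9 are out of range for VLEN = 4
        // prints [13, 10, 13, 11]
        System.out.println(Arrays.toString(rearrangeWrapped(src, indexes)));
    }
}
```

Because no lane can fail, the model has no exceptional path inside the loop, which is roughly why the wrapped form is expected to compile to tighter code than the checked form.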
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759361459

From psandoz at openjdk.org Fri Sep 13 19:48:04 2024
From: psandoz at openjdk.org (Paul Sandoz)
Date: Fri, 13 Sep 2024 19:48:04 GMT
Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes
In-Reply-To:
References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com>
Message-ID:

On Fri, 13 Sep 2024 18:17:21 GMT, Sandhya Viswanathan wrote:

>
> The guidance from Paul Sandoz and John Rose is to keep the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom apis.

Yes, we are trying to take smaller incremental steps. Once we are done with this work we can step back and discuss/review what to do about shuffles.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2350039460

From psandoz at openjdk.org Fri Sep 13 20:09:05 2024
From: psandoz at openjdk.org (Paul Sandoz)
Date: Fri, 13 Sep 2024 20:09:05 GMT
Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes
In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com>
References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com>
Message-ID:

On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan wrote:

> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code.
>
> Summary of changes is as follows:
> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes.
> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2439: > 2437: (v1, s_, m_) -> v1.uOp((i, a) -> { > 2438: int ei = s_.laneSource(i); > 2439: return ei < 0 || !m_.laneIsSet(i) ? 0 : v1.lane(ei); The `ei < 0` test is redundant. 
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2637: > 2635: * > 2636: * For each lane {@code N} of the shuffle, and for each lane > 2637: * source index {@code I=s.wrapIndex(s.laneSource(N))} in the shuffle, The pseudo code below starting at line 2644 needs adjusting to: Vector r = this.rearrange(s); return broadcast(0).blend(r, m); src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2755: > 2753: * > 2754: * The result is the same as the expression > 2755: * {@code v.rearrange(this.toShuffle().wrapIndexes())}. Since we also adjusted `rearrange` the existing expression is fine, recommend no change here and to the mask accepting version. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759431093 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759428672 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759418829 From sviswanathan at openjdk.org Fri Sep 13 22:30:36 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 13 Sep 2024 22:30:36 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2] In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20634/files - new: https://git.openjdk.org/jdk/pull/20634/files/694aceb5..428f2289 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=00-01 Stats: 14 lines in 8 files changed: 0 ins; 4 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/20634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20634/head:pull/20634 PR: 
https://git.openjdk.org/jdk/pull/20634 From duke at openjdk.org Fri Sep 13 22:31:01 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 13 Sep 2024 22:31:01 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v5] In-Reply-To: References: Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: quad precision tanh tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/4aa52bfd..d4ddc313 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=03-04 Stats: 859 lines in 1 file changed: 859 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From duke at openjdk.org Fri Sep 13 22:33:12 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 13 Sep 2024 22:33:12 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: Message-ID: <0cm-1lCXQVJaCJbfp7evFyNvkLFxwUyfSukdy40aVxY=.7129df0b-d9a0-4a3b-b7f6-53f767b01ef6@github.com> On Wed, 11 Sep 2024 01:59:54 GMT, Joe Darcy wrote: >>> If the test is going to use randomness, then its jtreg tags should include >>> >>> `@key randomness` >>> >>> and it is preferable to use jdk.test.lib.RandomFactory to get and Random object since that handles printing out a key so the random sequence can be replicated if the test fails. >> >> Please see the test updated to use `@key randomness` and` jdk.test.lib.RandomFactory` to get and Random object. 
>>
>>> The allowable worst-case error is 2.5 ulp, although at many arguments FDLIBM has a smaller error.
>>> For a general Math vs StrictMath test with an allowable 2.5 ulp error, without knowing how accurate FDLIBM is for that function and argument, a large error of approx. 2X the nominal error should be allowed (in case FDLIBM errs in one direction and the Math method errs in the other direction).
>>>
>> So far the tests haven't failed with error of 2.5ulp. Would it be better to make it 5ulp? Please let me know.
>
> So far, this will be the only intrinsic implementation of tanh. Therefore, at the moment it is just checking the consistency of the intrinsic implementation with StrictMath/FDLIBM tanh. If the intrinsic has a ~1 ulp accuracy, it would be expected to often be within 2.5 ulps of FDLIBM tanh. However, as written the regression test would not necessarily pass against any allowable Math.tanh implementation, which is the usual criterion for java.lang.Math tests that aren't otherwise constrained (such as by being limited to a given subset of platforms).
>
> If there was a correctly rounded tanh to compare against, then this style of testing would be valid.
>
> Are there any plans to intrinsify sinh or cosh?

Hi Joe (@jddarcy),

As suggested by Sandhya (@sviswa7), I added ~750 fixed point tests for tanh in `TanhTests.java` using the quad precision tanh implementation in libquadmath library from gcc.

Please let me know if this looks good.
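
To make the ulp discussion above concrete, the following is a small, hedged sketch of how a difference between `Math.tanh` and `StrictMath.tanh` could be expressed in ulps. It is not taken from `HyperbolicTests.java`; the class `UlpErrorSketch` and the helper `ulpError` are hypothetical names, and the sample inputs are arbitrary. It measures only the distance between two implementations, which, as noted in the quoted review, can legitimately reach about twice the nominal 2.5 ulp bound when the two err in opposite directions.

```java
public class UlpErrorSketch {
    // Hypothetical helper: distance of `actual` from `expected`,
    // measured in units of ulp(expected).
    public static double ulpError(double actual, double expected) {
        if (actual == expected) {
            return 0.0;   // also covers the 0.0 / 0.0 case
        }
        return Math.abs(actual - expected) / Math.ulp(expected);
    }

    public static void main(String[] args) {
        // Arbitrary sample points; a real test would use curated inputs.
        double[] inputs = {-20.0, -1.0, -0x1.0p-30, 0.0, 0.5, 1.0, 20.0};
        double worst = 0.0;
        for (double x : inputs) {
            worst = Math.max(worst, ulpError(Math.tanh(x), StrictMath.tanh(x)));
        }
        System.out.println("worst observed Math vs StrictMath difference (ulps): " + worst);
    }
}
```

Comparing against reference values computed at higher precision, as done with libquadmath above, avoids the doubled tolerance because the reference itself contributes essentially no error.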
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1759579101

From sviswanathan at openjdk.org Fri Sep 13 22:33:18 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 13 Sep 2024 22:33:18 GMT
Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes
In-Reply-To:
References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com>
Message-ID:

On Fri, 13 Sep 2024 19:45:11 GMT, Paul Sandoz wrote:

>>> Given `rearrange` with 1 vector gets wrapping indices semantics. I think we should stop normalizing indices when converting a `Vector` into a `VectorShuffle` (currently we wrap all out-of-bound elements to `[-VLEN, 0)`). Then the rearrange with 2 vectors will also wrap similarly (all indices are `& (VLEN * 2 - 1)`, then indices `[0, VLEN)` maps to the first vector and indices `[VLEN, 2 * VLEN)` map to the second vector). We will normalize the indices when we invoke `VectorShuffle::toVector` which I think is much less used than `Vector::toShuffle`. What do you think?
>>
>> The guidance from Paul Sandoz and John Rose is to keep the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom apis.
>
>>
>> The guidance from Paul Sandoz and John Rose is to keep the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom apis.
>
> Yes, we are trying to take smaller incremental steps. Once we are done with this work we can step back and discuss/review what to do about shuffles.

@PaulSandoz Thanks a lot for the review. I have addressed your review comments.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2350535307 From duke at openjdk.org Fri Sep 13 22:36:45 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 13 Sep 2024 22:36:45 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v6] In-Reply-To: References: Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Update test/jdk/java/lang/Math/HyperbolicTests.java Co-authored-by: Andrey Turbanov ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/d4ddc313..ca3314c5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From sviswanathan at openjdk.org Fri Sep 13 23:13:20 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 13 Sep 2024 23:13:20 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: <0cm-1lCXQVJaCJbfp7evFyNvkLFxwUyfSukdy40aVxY=.7129df0b-d9a0-4a3b-b7f6-53f767b01ef6@github.com> References: <0cm-1lCXQVJaCJbfp7evFyNvkLFxwUyfSukdy40aVxY=.7129df0b-d9a0-4a3b-b7f6-53f767b01ef6@github.com> Message-ID: On Fri, 13 Sep 2024 22:30:25 GMT, Srinivas Vamsi Parasa wrote: >> So far, this will be the only intrinsic implementation of tanh. Therefore, at the moment it is just checking the consistency of the intrinsic implementation with StrictMath/FDLIBM tanh. 
If the intrinsic has a ~1 ulp accuracy, it would be expected to often be within 2.5 ulps of FDLIBM tanh. However, as written the regression test would not necessarily pass against any allowable Math.tanh implementation, which is the usual criteria for java.lang.Math tests that aren't otherwise constrained (such as by being limited to a given subset of platforms). >> >> If there was a correctly rounded tanh to compare against, then this style of testing would be valid. >> >> Are there any plan to intrinsify sinh or cosh? > > Hi Joe (@jddarcy), > > As suggested by Sandhya (@sviswa7), I added ~750 fixed point tests for tanh in `TanhTests.java` using the quad precision tanh implementation in libquadmath library from gcc. > > Please let me know if this looks good. @vamsi-parasa In my thoughts the best way to do this is add the additional tests points to HyperbolicTests.java itself in the testcases array of testTanh() method. We should remove all the other changes from HyperbolicTests.java. Also no need for separate TanhTests.java file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1759602199 From jbhateja at openjdk.org Sat Sep 14 09:11:06 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 14 Sep 2024 09:11:06 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2] In-Reply-To: <2483R4bBJDN4UpBlRJSQVE2KjdctYIy0j__kzRGRDHc=.71baed04-9185-4111-a3ce-ce32a40cb570@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> <2483R4bBJDN4UpBlRJSQVE2KjdctYIy0j__kzRGRDHc=.71baed04-9185-4111-a3ce-ce32a40cb570@github.com> Message-ID: On Fri, 13 Sep 2024 19:14:29 GMT, Sandhya Viswanathan wrote: >> Hi @sviswa7, I was suggesting emitting toShuffle() + toVector() only if it's needed under a target specific hook, since indexes are anyways passed though vector. 
Please let me know if you find the explanation below too constraining.
>> https://github.com/openjdk/jdk/pull/20508#issuecomment-2349801299
>
> I think VectorLoadShuffle removal optimizations should be a separate PR and well thought out. So far the contract has been that rearrange always gets the shuffle through VectorLoadShuffle and I would like to keep that contract in this PR.

Hi @sviswa7, @PaulSandoz, I will modify PR#20508 accordingly to honor the contract at IR level and address VectorLoadShuffle optimization for both flavors of selectFrom API in a follow up PR.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759701508

From stuefe at openjdk.org Sun Sep 15 06:17:14 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sun, 15 Sep 2024 06:17:14 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13]
In-Reply-To:
References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>
Message-ID:

On Wed, 11 Sep 2024 21:15:21 GMT, Coleen Phillimore wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Revert accidental change of UCOH default
>
> I was starting to understand the concerns with having prototype_header in Klass. It seems like it would simplify encoding the klass for object allocation. My recent change https://bugs.openjdk.org/browse/JDK-8338526 breaks this. You need to pass a parameter to Klass() to tell whether to encode the klass pointer or not, and pass this to Klass() constructor.
>
> diff --git a/src/hotspot/share/oops/instanceKlass.cpp b/src/hotspot/share/oops/instanceKlass.cpp
> index fd198f54fc9..7aa4bd24948 100644
> --- a/src/hotspot/share/oops/instanceKlass.cpp
> +++ b/src/hotspot/share/oops/instanceKlass.cpp
> @@ -511,7 +511,7 @@ InstanceKlass::InstanceKlass() {
>  }
>
>  InstanceKlass::InstanceKlass(const ClassFileParser& parser, KlassKind kind, ReferenceType reference_type) :
> -  Klass(kind),
> +  Klass(kind, (!parser.is_interface() && !parser.is_abstract())),
>    _nest_members(nullptr),
>    _nest_host(nullptr),
>    _permitted_subclasses(nullptr),

@coleenp

> I was starting to understand the concerns with having prototype_header in Klass. It seems like it would simplify encoding the klass for object allocation. My recent change https://bugs.openjdk.org/browse/JDK-8338526 breaks this. You need to pass a parameter to Klass() to tell whether to encode the klass pointer or not, and pass this to Klass() constructor.
>

I solved this differently (Roman will merge this into his PR).

static markWord make_prototype(const Klass* kls) {
  markWord prototype = markWord::prototype();
#ifdef _LP64
  if (UseCompactObjectHeaders) {
    // With compact object headers, the narrow Klass ID is part of the mark word.
    // We therefore seed the mark word with the narrow Klass ID.
    // Note that only those Klass that can be instantiated have a narrow Klass ID.
    // For those who don't, we leave the klass bits empty and assert if someone
    // tries to use those.
    const narrowKlass nk = CompressedKlassPointers::is_encodable(kls) ?
        CompressedKlassPointers::encode(const_cast<Klass*>(kls)) : 0;
    prototype = prototype.set_narrow_klass(nk);
  }
#endif
  return prototype;
}

inline bool CompressedKlassPointers::is_encodable(const void* address) {
  check_init(_base);
  // An address can only be encoded if:
  //
  // 1) the address lies within the klass range.
  // 2) It is suitably aligned to 2^encoding_shift. This only really matters for
  //    +UseCompactObjectHeaders, since the encoding shift can be large (max 10 bits -> 1KB).
  return is_aligned(address, klass_alignment_in_bytes()) &&
      address >= _klass_range_start && address < _klass_range_end;
}

So, we put an nKlass into the prototype if we can. We can, if the Klass address is encodable. It is encodable if it lives in the encoded Klass range and is correctly aligned.

No need to pass this information via another channel: it's right there, in the Klass address. This works even before Klass is initialized.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2351399143

From stuefe at openjdk.org Sun Sep 15 06:17:15 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sun, 15 Sep 2024 06:17:15 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]
In-Reply-To:
References:
Message-ID:

On Wed, 11 Sep 2024 11:25:41 GMT, Johan Sjölen wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix FullGCForwarding initialization
>
> src/hotspot/share/memory/metaspace/metablock.hpp line 51:
>
>> 49: size_t word_size() const { return _word_size; }
>> 50: bool is_empty() const { return _base == nullptr; }
>> 51: bool is_nonempty() const { return _base != nullptr; }
>
> Can `_base == nullptr` but `_word_size != 0`?

No

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1759973362

From epeter at openjdk.org Sun Sep 15 07:19:15 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Sun, 15 Sep 2024 07:19:15 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v8]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID:

On Fri, 13 Sep 2024 18:27:07 GMT, Jatin Bhateja wrote:

> > Can you please **define** somewhere what it means to `prune indexes`?
It does not help me much more than the previous "massaging indexes" you had before I asked you to change it. > > > Also: I'm a little worried about the semantics change of the RearrangeNode that you did with the changes in RearrangeNode::Ideal. It looks a little "hacky", especially in conjunction with the vector_indexes_needs_massaging method. Can you give a clear definition of the semantics of RearrangeNode and vector_indexes_needs_massaging, please? > > > > > > You have also not responded to this yet. It seems to me that before your proposed change, `RearrangeNode` had a clear and easy semantic, and now you somehow "hack it" to work with your `vector_indexes_needs_pruning`. Can you explain please why this makes sense and add a comment to `RearrangeNode` what its semantics is? > > In case target does not directly support two vector selection instruction we lower the IR to its constituents, this is better than intrinsification failure as it saves costly vector boxing penalties. > > Consider in terms of desired compiler IR and not rearrange API semantics, VectorRearrange IR node generally expects shape conformance b/w vector to be permuted and index vector, since shuffle indices are held in byte array based backing storage hence compiler injects VectorLoadShuffle nodes to upcast the byte vector lanes holding indexes to match the input vector lane. Since selectFrom API already passes the indexes through vector hence we can save emitting redundant toShuffle() + toVector() operations in some cases apart from some target specific scenarios e.g. AVX2 targets [do not support direct short vector permute]instruction "VPERMW", hence we need to [massage the index vector](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L8771) to emulate desired permutation using byte permute instruction. 
> > VectorLoadShuffle abstraction hides target specific index massaging which is why adding a target specific hook like Matcher::vector_indexes_needs_pruning compiler to selectively emit VectorLoadShuffle. I still do not see a **definition of the semantics of RearrangeNode**: what inputs does it accept and what does it do with them? Can you put this explanation as comment in the code, please? It sounds like this is what the `massaging` / `pruning` is: `emulate desired permutation using byte permute instruction.` You should find an accordingly more suiting name for the method name. Maybe it is something like `must_emulate_permutation_with...`. Or maybe it is rather a `supported` kind of question? I leave that up to you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2351426263 From jbhateja at openjdk.org Mon Sep 16 02:58:41 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 16 Sep 2024 02:58:41 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). 
Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. 
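
As a reading aid for the two-vector selectFrom semantics described in the PR text above, here is a scalar sketch using plain arrays. It is an illustration, not the Vector API implementation: the class `TwoVectorSelectSketch` is a hypothetical name, and the mapping of indexes `[0, VLEN)` to `v1` and `[VLEN, 2*VLEN)` to `v2` is an assumption based on the "table" wording in the description. Out-of-range indexes throw, matching the behavior stated for this revision of the PR.

```java
public class TwoVectorSelectSketch {
    // Scalar model: `indexes` plays the role of "this" vector,
    // v1 and v2 together form a lookup table of 2 * VLEN entries.
    public static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = v1.length;
        int[] result = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            int idx = indexes[lane];
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException(
                        "index " + idx + " outside [0, " + (2 * vlen) + ")");
            }
            // Assumed mapping: low half selects from v1, high half from v2.
            result[lane] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {1, 2, 3, 4};
        int[] v2 = {5, 6, 7, 8};
        int[] indexes = {0, 4, 7, 3};
        // prints [1, 5, 8, 4]
        System.out.println(java.util.Arrays.toString(selectFrom(indexes, v1, v2)));
    }
}
```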
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/1c00f417..7c80bfce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=08-09 Stats: 321 lines in 51 files changed: 57 ins; 97 del; 167 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Mon Sep 16 03:02:08 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 16 Sep 2024 03:02:08 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v8] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <4IqtmftuGBNSj8_1HsI3x9eKBSf4QhpoKELYs1EanLE=.15ae8f1b-f586-403a-88d6-9193bba90fb2@github.com> On Sun, 15 Sep 2024 07:16:17 GMT, Emanuel Peter wrote: > > > Can you please **define** somewhere what it means to `prune indexes`? It does not help me much more than the previous "massaging indexes" you had before I asked you to change it. > > > > Also: I'm a little worried about the semantics change of the RearrangeNode that you did with the changes in RearrangeNode::Ideal. It looks a little "hacky", especially in conjunction with the vector_indexes_needs_massaging method. Can you give a clear definition of the semantics of RearrangeNode and vector_indexes_needs_massaging, please? > > > > > > > > > You have also not responded to this yet. It seems to me that before your proposed change, `RearrangeNode` had a clear and easy semantic, and now you somehow "hack it" to work with your `vector_indexes_needs_pruning`. Can you explain please why this makes sense and add a comment to `RearrangeNode` what its semantics is? 
> > > > > > In case target does not directly support two vector selection instruction we lower the IR to its constituents, this is better than intrinsification failure as it saves costly vector boxing penalties. > > Consider in terms of desired compiler IR and not rearrange API semantics, VectorRearrange IR node generally expects shape conformance b/w vector to be permuted and index vector, since shuffle indices are held in byte array based backing storage hence compiler injects VectorLoadShuffle nodes to upcast the byte vector lanes holding indexes to match the input vector lane. Since selectFrom API already passes the indexes through vector hence we can save emitting redundant toShuffle() + toVector() operations in some cases apart from some target specific scenarios e.g. AVX2 targets [do not support direct short vector permute]instruction "VPERMW", hence we need to [massage the index vector](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L8771) to emulate desired permutation using byte permute instruction. > > VectorLoadShuffle abstraction hides target specific index massaging which is why adding a target specific hook like Matcher::vector_indexes_needs_pruning compiler to selectively emit VectorLoadShuffle. > > I still do not see a **definition of the semantics of RearrangeNode**: what inputs does it accept and what does it do with them? > > Can you put this explanation as comment in the code, please? > > It sounds like this is what the `massaging` / `pruning` is: `emulate desired permutation using byte permute instruction.` You should find an accordingly more suiting name for the method name. Maybe it is something like `must_emulate_permutation_with...`. Or maybe it is rather a `supported` kind of question? I leave that up to you. 
Hi @eme64, As per the discussion on [PR #20634](https://github.com/openjdk/jdk/pull/20634#discussion_r1759701508), we plan to suppress the VectorLoadShuffle bypassing optimization for now and address it as a follow-up optimization for both flavors of the selectFrom API. I have addressed your comments. Kindly verify. Best Regards, Jatin ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2351944720 From rcastanedalo at openjdk.org Mon Sep 16 06:56:18 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 16 Sep 2024 06:56:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> On Fri, 13 Sep 2024 13:11:45 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects.
In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Various touch-ups src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2576: > 2574: } else { > 2575: lea(dst, Address(obj, index, Address::lsl(scale))); > 2576: ldr(dst, Address(dst, offset)); Do you have a reproducer (or, better yet, a test case) that exercises this case? I ran Oracle's internal CI tiers 1-5 and could never hit it. Could this happen for x64 as well? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1760617744 From epeter at openjdk.org Mon Sep 16 07:51:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 07:51:15 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> Message-ID: <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com> On Mon, 16 Sep 2024 02:58:41 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. Looks better, but I still have a few comments. src/hotspot/share/opto/vectorIntrinsics.cpp line 2739: > 2737: return true; > 2738: } > 2739: @jatin-bhateja You still have 3x `unbox failed v1` here. I already commented on this earlier, and you resolved it and gave it a thumbs up. Can you please fix it now? src/hotspot/share/opto/vectornode.cpp line 2116: > 2114: const TypeVect* index_vect_type = index_vec->bottom_type()->is_vect(); > 2115: BasicType index_elem_bt = index_vect_type->element_basic_type(); > 2116: assert(!is_floating_point_type(index_elem_bt), ""); Why not verify this also in the constructor of `SelectFromTwoVectorNode`? Can you maybe explicitly verify what it must be rather than **not** be?
src/hotspot/share/opto/vectornode.cpp line 2122: > 2120: // index format by subsequent VectorLoadShuffle. > 2121: int cast_vopc = VectorCastNode::opcode(0, index_elem_bt, true); > 2122: Node* index_byte_vec = phase->transform(VectorCastNode::make(cast_vopc, index_vec, T_BYTE, num_elem)); This cast assumes that the indices cannot have more than 8 bits. This would allow vector lengths of up to 256. This is fine for intel. But as far as I know ARM has in principle longer vectors - up to 2048 bytes. Should we maybe add some assert here to make sure we never badly truncate the index? src/hotspot/share/opto/vectornode.cpp line 2138: > 2136: > 2137: // Load indexes from byte vector and appropriatly massage them to target specific > 2138: // permutation index format. I would replace `massage` -> `transform` everywhere. src/hotspot/share/opto/vectornode.hpp line 1625: > 1623: Node* Ideal(PhaseGVN* phase, bool can_reshape); > 1624: virtual int Opcode() const; > 1625: }; `index` -> `indexes` because this is a vector, right? Otherwise I'll assume it is a scalar. Can you do some pseudo-code, that says how exactly the indices are interpreted? What if they are out of bounds? Does it wrap? Or assume they are in bounds? Undefined behaviour? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2305905569 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760651336 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760674461 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760678107 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760680772 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760665944 From epeter at openjdk.org Mon Sep 16 07:51:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 07:51:16 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 30 Aug 2024 14:44:05 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding descriptive comments > > src/hotspot/share/opto/vectornode.cpp line 2104: > >> 2102: // MASK) >> 2103: // This shall prevent an intrinsification failure and associated argument >> 2104: // boxing penalties. > > A quick comment about how the mask is computed could be nice. > `MASK = INDEX < num_elem` @jatin-bhateja very nice, thanks! > src/hotspot/share/opto/vectornode.cpp line 2148: > >> 2146: >> 2147: BoolTest::mask pred = BoolTest::lt; >> 2148: ConINode* pred_node = (ConINode*)phase->makecon(TypeInt::make(pred)); > > Would `as_ConI()` be a better alternative to the `(ConINode*)` cast? Please at least add a comment why you are not following my suggestion. I feel like the work I put in to review is not being respected when comments are just silently resolved without any action or comment. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760673419 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760656304 From epeter at openjdk.org Mon Sep 16 07:51:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 07:51:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Mon, 16 Sep 2024 07:27:06 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.cpp line 2148: >> >>> 2146: >>> 2147: BoolTest::mask pred = BoolTest::lt; >>> 2148: ConINode* pred_node = (ConINode*)phase->makecon(TypeInt::make(pred)); >> >> Would `as_ConI()` be a better alternative to the `(ConINode*)` cast? > > Please at least add a comment why you are not following my suggestion. I feel like the work I put in to review is not being respected when comments are just silently resolved without any action or comment. I really do think that `as_ConI()` would be the right thing here. In product it is just a cast, and in debug at least we have an assert. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760657072 From epeter at openjdk.org Mon Sep 16 07:51:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 07:51:18 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com> Message-ID: <4VKGFHuL8RSSll0Pnqgg5DeesBdXys8JOZT64yGUBG8=.58b88db6-58c0-49ea-b01c-d2d814a93cae@github.com> On Mon, 16 Sep 2024 07:35:46 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. > > src/hotspot/share/opto/vectornode.hpp line 1625: > >> 1623: Node* Ideal(PhaseGVN* phase, bool can_reshape); >> 1624: virtual int Opcode() const; >> 1625: }; > > `index` -> `indexes` because this is a vector, right? Otherwise I'll assume it is a scalar. > Can you do some pseudo-code, that says how exactly the indices are interpreted? What if they are out of bounds? Does it wrap? Or assume they are in bounds? Undefined behaviour? For me, good comments here would be immensely valuable, because they help with other C2 optimizations.
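To make the pseudo-code request above concrete, here is a minimal scalar sketch in plain Java of how the indexes could be interpreted. It follows only the Java-level semantics quoted in the PR description (valid index range [0, 2*VLEN), an exception otherwise); whether the IR node itself wraps, throws, or assumes in-bounds indexes is exactly the open question in this thread. The class name `SelectFromSketch` and the `int[]` lane representation are assumptions made for illustration and are not part of the patch.

```java
import java.util.Arrays;

public class SelectFromSketch {
    // Scalar reference for a two-vector selectFrom over int lanes (illustrative only).
    // v1 and v2 together act as one lookup table of 2 * VLEN entries.
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = v1.length;
        if (v2.length != vlen || indexes.length != vlen) {
            throw new IllegalArgumentException("all three vectors must have the same lane count");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = indexes[i];
            // Per the quoted PR description: indexes outside [0, 2*VLEN) are rejected.
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("index " + idx + " not in [0, " + (2 * vlen) + ")");
            }
            // Indexes below VLEN select from v1, the rest select from v2.
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] indexes = {0, 7, 3, 4};
        System.out.println(Arrays.toString(selectFrom(indexes, v1, v2))); // [10, 23, 13, 20]
    }
}
```

The `idx < vlen` test plays the same role as the `MASK = INDEX < num_elem` predicate mentioned in the lowering discussion: it decides per lane which of the two source vectors supplies the value.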
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760667297 From epeter at openjdk.org Mon Sep 16 07:51:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 07:51:18 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <_U2DgK6DAW3ZJhozsMhwHzggUFpj5fnHdLJOoYFcNJA=.1875811f-458f-4834-bb94-339a8ff7360d@github.com> On Fri, 13 Sep 2024 17:38:29 GMT, Jatin Bhateja wrote: >> That does not answer my question. If the backend operations you implemented would have the wrong vector-length: do we have any tests that would catch that? Often that requires not just going "up" with a loop but also "counting down" with the loop iv. Do you know what I mean? > > The patch includes tests for all the species (combinations of vector type and size); each vector kernel is validated against an equivalent scalar implementation, so the scenario you are referring to is implicitly handled through the tests. Ok, just so that I can relax, can you please point me to this test that would implicitly verify that the backend has chosen the correct vector size? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760671393 From rcastanedalo at openjdk.org Mon Sep 16 08:07:18 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 16 Sep 2024 08:07:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 13:11:45 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). 
>> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Various touch-ups > * Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version. An alternative that seems promising is to hide the object header klass pointer extraction and make it part of the `LoadNKlass` node semantics, as illustrated in this example: ![alternative-modeling](https://github.com/user-attachments/assets/06243966-3065-4969-a2dd-d05133b36366) `LoadNKlass` nodes can then be expanded into more primitive operations (load and shift for compact headers, load with `klass_offset_in_bytes()` for original headers) within C2's back-end or even during code emission as sketched [here](https://github.com/robcasloz/jdk/commit/6cb4219f101e3be982264071c2cb1d0af1c6d754). @rkennke is this similar to what you tried out ("Expanding it as a macro")? 
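As a rough illustration of the "load and shift" expansion for compact headers mentioned above, the following plain-Java sketch extracts a narrow klass value from the upper bits of a 64-bit mark word. The 22-bit width comes from the PR description ("upper 22 bits of the mark-word"); the 64-bit word size, the resulting shift of 42, and all names (`CompactHeaderSketch`, `narrowKlassFromMark`, `withNarrowKlass`) are assumptions made for illustration, not HotSpot's actual definitions.

```java
public class CompactHeaderSketch {
    // Width of the narrow klass field, taken from the PR description.
    static final int NARROW_KLASS_BITS = 22;
    // Assumes a 64-bit mark word with the klass bits at the very top: 64 - 22 = 42.
    static final int KLASS_SHIFT = Long.SIZE - NARROW_KLASS_BITS;

    // "Load and shift": given the loaded mark word, an unsigned shift yields the narrow klass.
    static int narrowKlassFromMark(long markWord) {
        return (int) (markWord >>> KLASS_SHIFT);
    }

    // Helper for the example only: install a narrow klass while keeping the low 42 bits.
    static long withNarrowKlass(long markWord, int narrowKlass) {
        long lowMask = (1L << KLASS_SHIFT) - 1;
        return (markWord & lowMask) | ((long) narrowKlass << KLASS_SHIFT);
    }

    public static void main(String[] args) {
        long mark = withNarrowKlass(0x1L, 0x2ABCDE);
        System.out.println(Integer.toHexString(narrowKlassFromMark(mark))); // 2abcde
    }
}
```

The unsigned shift (`>>>`) matters in this sketch: the top klass bit doubles as the sign bit of the Java `long`, so an arithmetic shift would smear it into the result.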
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2352253326 From rkennke at openjdk.org Mon Sep 16 12:38:00 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 16 Sep 2024 12:38:00 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v16] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. 
This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 53 commits: - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java - Fix loop on aarch64 - clarify obscure assert in metasapce setup - Rework compressedklass encoding - remove stray debug output - Fixes post 8338526 - Merge commit '597788850041e7272a23714c05ba546ee6080856' into JDK-8305895-v4 - Various touch-ups - Hide log timestamps in test to prevent false failures - Revert accidental change of UCOH default - ... 
and 43 more: https://git.openjdk.org/jdk/compare/59778885...49c87547 ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=15 Stats: 4605 lines in 190 files changed: 3252 ins; 724 del; 629 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Mon Sep 16 13:28:00 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 16 Sep 2024 13:28:00 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v17] In-Reply-To: References: Message-ID: <_5gI7i33xrOgXMTI_04oX9UDGwhVTtSNWoSiNfM3FOM=.b24979b3-dcde-401f-b2d8-9b201d303f57@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. 
Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 54 commits: - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java - Fix loop on aarch64 - clarify obscure assert in metasapce setup - Rework compressedklass encoding - remove stray debug output - Fixes post 8338526 - Merge commit '597788850041e7272a23714c05ba546ee6080856' into JDK-8305895-v4 - Various touch-ups - Hide log timestamps in test to prevent false failures - ... 
and 44 more: https://git.openjdk.org/jdk/compare/54595188...2125cd81 ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=16 Stats: 4598 lines in 190 files changed: 3245 ins; 719 del; 634 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Mon Sep 16 13:31:17 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 16 Sep 2024 13:31:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 13:13:01 GMT, Roman Kennke wrote: >>> @rkennke Can you please explain the changes in these tests: >>> >>> ``` >>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >>> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java >>> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java >>> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java >>> ``` >>> >>> You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` >>> >>> This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. >>> >>> I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? >>> >>> Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. >>> >>> Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). 
>> >> IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible. >> >> I will re-evaluate those tests, and add comments or remove the restrictions. > >> > > @rkennke Can you please explain the changes in these tests: >> > > ``` >> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >> > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java >> > > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java >> > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java >> > > ``` >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` >> > > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. >> > > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? 
>> > > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. >> > > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). >> > >> > >> > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible. >> > I will re-evaluate those tests, and add comments or remove the restrictions. >> >> If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ... > `LoadNKlass` nodes can then be expanded into more primitive operations (load and shift for compact headers, load with `klass_offset_in_bytes()` for original headers) within C2's back-end or even during code emission as sketched [here](https://github.com/robcasloz/jdk/commit/6cb4219f101e3be982264071c2cb1d0af1c6d754). @rkennke is this similar to what you tried out ("Expanding it as a macro")? No, this is not what I tried. 
I tried to completely expand LoadNKlass, and replace it with the lower nodes that load and shift the mark-word right there, in ideal graph. But your approach is saner: there is so much implicit knowledge about Load(N)Klass, and even klass_offset_in_bytes(), all over the place, it would be very hard to get this right without breaking something. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2352926265 From aboldtch at openjdk.org Mon Sep 16 16:21:20 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 16 Sep 2024 16:21:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v17] In-Reply-To: <_5gI7i33xrOgXMTI_04oX9UDGwhVTtSNWoSiNfM3FOM=.b24979b3-dcde-401f-b2d8-9b201d303f57@github.com> References: <_5gI7i33xrOgXMTI_04oX9UDGwhVTtSNWoSiNfM3FOM=.b24979b3-dcde-401f-b2d8-9b201d303f57@github.com> Message-ID: On Mon, 16 Sep 2024 13:28:00 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. 
>> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 54 commits:
>
> - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4
> - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java
> - Fix loop on aarch64
> - clarify obscure assert in metasapce setup
> - Rework compressedklass encoding
> - remove stray debug output
> - Fixes post 8338526
> - Merge commit '597788850041e7272a23714c05ba546ee6080856' into JDK-8305895-v4
> - Various touch-ups
> - Hide log timestamps in test to prevent false failures
> - ... and 44 more: https://git.openjdk.org/jdk/compare/54595188...2125cd81

src/hotspot/cpu/aarch64/aarch64.ad line 6459:

> 6457: format %{ "ldrw $dst, $mem\t# compressed class ptr" %}
> 6458: ins_encode %{
> 6459: __ load_nklass_compact_c2($dst$$Register, $mem$$base$$Register, $mem$$index$$Register, $mem$$scale, $mem$$disp);

I wonder if something along the lines of this is required here.

Suggestion:

    Address addr = mem2address($mem->opcode(), $mem$$base$$Register, $mem$$index, $mem$$scale, $mem$$disp);
    __ load_nklass_compact_c2($dst$$Register, __ adjust_compact_object_header_address_c2(addr, rscratch1));

With `adjust_compact_object_header_address_c2` being:

```C++
Address C2_MacroAssembler::adjust_compact_object_header_address_c2(Address addr, Register tmp) {
  // The incoming address is pointing into obj-start + klass_offset_in_bytes. We need to extract
  // obj-start, so that we can load from the object's mark-word instead. Usually the address
  // comes as obj-start in addr.base() and klass_offset_in_bytes in addr.offset().
  if (addr.getMode() != Address::base_plus_offset) {
    lea(tmp, addr);
    addr = Address(tmp, -oopDesc::klass_offset_in_bytes());
  } else {
    addr = Address(addr.base(), addr.offset() - oopDesc::klass_offset_in_bytes());
  }
  return legitimize_address(addr, 8, tmp);
}
```

Maybe it is the case that we never get the case where `$mem->opcode()` is not an `lsl` variant, nor that the offset is too far away for an immediate fixed by `legitimize_address`.
But it seems like this would at least make those cases correct, while avoiding the `lea` in the common case. Maybe someone with better experience in aarch64 macroassembler+ad files and C2 can give an opinion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1761455581 From sgibbons at openjdk.org Mon Sep 16 16:36:11 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 16 Sep 2024 16:36:11 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v6] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 22:36:45 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Update test/jdk/java/lang/Math/HyperbolicTests.java > > Co-authored-by: Andrey Turbanov I hand-verified the code. ------------- Marked as reviewed by sgibbons (Committer). PR Review: https://git.openjdk.org/jdk/pull/20657#pullrequestreview-2307159316 From psandoz at openjdk.org Mon Sep 16 16:47:11 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 16 Sep 2024 16:47:11 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 22:30:36 GMT, Sandhya Viswanathan wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. 
This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. >> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Java changes are good (I created a CSR). The approach in HotSpot looks good to me, but need HotSpot reviewers. 
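For readers following the wrapIndexes versus checkIndexes discussion above, here is a minimal scalar sketch of the difference, using plain Java arrays rather than the Vector API. The class and method names (`WrapIndexSketch`, `checkIndex`, `wrapIndex`, `selectFromWrapped`) are hypothetical and exist only for illustration; they are not the JDK implementation. The sketch assumes the vector length is a power of two and that the index array has the same length as the source array.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the jdk.incubator.vector implementation.
final class WrapIndexSketch {

    /** Checked variant: any index outside [0, vlen) is rejected with an exception. */
    static int checkIndex(int index, int vlen) {
        if (index < 0 || index >= vlen) {
            throw new IndexOutOfBoundsException(
                    "index " + index + " out of range for vlen " + vlen);
        }
        return index;
    }

    /** Wrapped variant: mask the index into [0, vlen). Assumes vlen is a power of two. */
    static int wrapIndex(int index, int vlen) {
        return index & (vlen - 1);
    }

    /** Scalar model of a single-vector selectFrom that wraps instead of checking. */
    static byte[] selectFromWrapped(byte[] indexes, byte[] src) {
        int vlen = src.length;
        byte[] res = new byte[vlen];
        for (int i = 0; i < vlen; i++) {
            // The byte index is promoted to int before masking, so negative
            // values also land in [0, vlen).
            res[i] = src[wrapIndex(indexes[i], vlen)];
        }
        return res;
    }

    public static void main(String[] args) {
        byte[] src = {10, 20, 30, 40};
        byte[] idx = {0, 3, 4, -1};
        System.out.println(Arrays.toString(selectFromWrapped(idx, src)));
    }
}
```

The wrapped variant has no branch and no exceptional path per lane, which is presumably why it maps more directly onto a single shuffle instruction such as the `vpshufb` in the quoted disassembly.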
------------- Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20634#pullrequestreview-2307180561 From sviswanathan at openjdk.org Mon Sep 16 17:02:09 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 16 Sep 2024 17:02:09 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: <0QUwAu8wCrqU-BSbINCiBATZje4xib3rLEZKgG9mHhE=.fed2bc28-b4c3-417d-b4d6-3b5ce1e34c67@github.com> On Fri, 13 Sep 2024 19:45:11 GMT, Paul Sandoz wrote: >>> Given `rearrange` with 1 vector gets wrapping indices semantics. I think we should stop normalizing indices when converting a `Vector` into a `VectorShuffle` (currently we wrap all out-of-bound elements to `[-VLEN, 0)`). Then the rearrange with 2 vectors will also wrap similarly (all indices are `& (VLEN * 2 - 1)`, then indices `[0, VLEN)` maps to the first vector and indices `[VLEN, 2 * VLEN)` map to the second vector). We will normalize the indices when we invoke `VectorShuffle::toVector` which I think is much less used than `Vector::toShuffle`. What do you think? >> >> The guidance from Paul Sandoz and John Rose is to keep the the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom apis. > >> >> The guidance from Paul Sandoz and John Rose is to keep the the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom apis. > > Yes, we are trying to take smaller incremental steps. Once the we are done with this work we can step back and discuss/review what to do about shuffles. @PaulSandoz Thanks a lot for the review and the CSR. I will look forward to Hotspot review and CSR progress/approval. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2353449454 From duke at openjdk.org Mon Sep 16 18:01:49 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 16 Sep 2024 18:01:49 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v7] In-Reply-To: References: Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with three additional commits since the last revision: - Update HyperbolicTests.java Remove the path to random library - update copyright year and remove unused random from HyperbolicTests - remove tanh tests in seprate file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/ca3314c5..e908eb44 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=05-06 Stats: 1640 lines in 2 files changed: 735 ins; 892 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From duke at openjdk.org Mon Sep 16 18:22:56 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 16 Sep 2024 18:22:56 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v8] In-Reply-To: References: Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - update tanh additional tests - Merge branch 'master' of https://git.openjdk.java.net/jdk into onetanh - Update HyperbolicTests.java Remove the path to random library - update copyright year and remove unused random from HyperbolicTests - remove tanh tests in seprate file - Update test/jdk/java/lang/Math/HyperbolicTests.java Co-authored-by: Andrey Turbanov - quad precision tanh tests - c1 and template generator fixes - update libm tanh reference test with code review suggestions - Add stub initialization and extra tanh tests - ... and 2 more: https://git.openjdk.org/jdk/compare/7110935c...3664be15 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/e908eb44..3664be15 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=06-07 Stats: 111889 lines in 2917 files changed: 66503 ins; 28786 del; 16600 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From duke at openjdk.org Mon Sep 16 18:22:56 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 16 Sep 2024 18:22:56 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: <0cm-1lCXQVJaCJbfp7evFyNvkLFxwUyfSukdy40aVxY=.7129df0b-d9a0-4a3b-b7f6-53f767b01ef6@github.com> Message-ID: On Fri, 13 Sep 2024 23:10:19 GMT, Sandhya Viswanathan wrote: >> Hi Joe (@jddarcy), >> >> As suggested by Sandhya (@sviswa7), I added ~750 fixed point tests for tanh in `TanhTests.java` using the quad precision tanh implementation in libquadmath library from gcc. >> >> Please let me know if this looks good. 
> > @vamsi-parasa In my thoughts the best way to do this is add the additional tests points to HyperbolicTests.java itself in the testcases array of testTanh() method. We should remove all the other changes from HyperbolicTests.java. Also no need for separate TanhTests.java file. Hi Sandhya(@sviswa7), please see the updated code in `HyperbolicTests.java` which removes the previous random based tests with the new fixed point tests. Also removed the `TanhTests.java`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1761642227 From psandoz at openjdk.org Mon Sep 16 18:47:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 16 Sep 2024 18:47:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> Message-ID: On Mon, 16 Sep 2024 02:58:41 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. 
>> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 561: > 559: for (int i = 0; i < vlen; i++) { > 560: int index = ((int)vecPayload1[i]); > 561: res[i] = index >= vlen ? vecPayload3[index & (vlen - 1)] : vecPayload2[index]; This is incorrect as the index could be negative. You need to wrap in the range `[0, 2 * vlen - 1]` before the comparison and selection. 
    int index = ((int)vecPayload1[i]) & ((vlen << 1) - 1);
    res[i] = index < vlen ? vecPayload2[index] : vecPayload3[index - vlen];

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2974:

> 2972: final $abstractvectortype$ selectFromTemplate(Class> indexVecClass,
> 2973: $abstractvectortype$ v1, $abstractvectortype$ v2) {
> 2974: int twoVectorLen = length() * 2;

We should assert that the length is a power of two.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1761663646
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1761667602

From psandoz at openjdk.org Mon Sep 16 21:21:11 2024
From: psandoz at openjdk.org (Paul Sandoz)
Date: Mon, 16 Sep 2024 21:21:11 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v10]
In-Reply-To: <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com>
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com>
Message-ID:

On Mon, 16 Sep 2024 02:58:41 GMT, Jatin Bhateja wrote:

>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs.
>>
>>
>> Declaration:-
>> Vector.selectFrom(Vector v1, Vector v2)
>>
>>
>> Semantics:-
>> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown.
>> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. 
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2970: > 2968: > 2969: > 2970: /*package-private*/ I think we can simplify with: /*package-private*/ @ForceInline final $abstractvectortype$ selectFromTemplate(Class> indexVecClass, $abstractvectortype$ v1, $abstractvectortype$ v2) { int twoVectorLenMask = (length() << 1) - 1; #if[FP] Vector<$Boxbitstype$> wrapped_indexes = this.convert(VectorOperators.{#if[intOrFloat]?F2I:D2L}, 0) .lanewise(VectorOperators.AND, twoVectorLenMask); return VectorSupport.selectFromTwoVectorOp(getClass(), indexVecClass , $type$.class, $bitstype$.class, length(), wrapped_indexes, v1, v2, (vec1, vec2, vec3) -> selectFromTwoVectorHelper(vec1, vec2, vec3) ); #else[FP] $abstractvectortype$ wrapped_indexes = this.lanewise(VectorOperators.AND, twoVectorLenMask); return VectorSupport.selectFromTwoVectorOp(getClass(), indexVecClass, $type$.class, $type$.class, length(), wrapped_indexes, v1, v2, (vec1, vec2, vec3) -> selectFromTwoVectorHelper(vec1, vec2, vec3) ); #end[FP] } (Note that's without the assert - see separate comment). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1761977004 From duke at openjdk.org Mon Sep 16 23:01:08 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 16 Sep 2024 23:01:08 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v9] In-Reply-To: References: Message-ID: <1TAqO7DOjjkXpbdTmsDbByq9kxPnaX1Ev57KnKWakjQ=.e0c25c34-fc49-445a-8cd2-7dd0fae64e80@github.com> > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 13 additional commits since the last revision: - Merge branch 'master' of https://git.openjdk.java.net/jdk into onetanh - update tanh additional tests - Merge branch 'master' of https://git.openjdk.java.net/jdk into onetanh - Update HyperbolicTests.java Remove the path to random library - update copyright year and remove unused random from HyperbolicTests - remove tanh tests in seprate file - Update test/jdk/java/lang/Math/HyperbolicTests.java Co-authored-by: Andrey Turbanov - quad precision tanh tests - c1 and template generator fixes - update libm tanh reference test with code review suggestions - ... and 3 more: https://git.openjdk.org/jdk/compare/e48dd57e...b438555e ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/3664be15..b438555e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=07-08 Stats: 344 lines in 9 files changed: 329 ins; 2 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From duke at openjdk.org Tue Sep 17 00:41:20 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 17 Sep 2024 00:41:20 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v10] In-Reply-To: References: Message-ID: <2GF3hOTuSrN9ejN_aaNWV6g5zfBdVJ93kiPjIiPUKQE=.c61b7a6c-edd9-4edd-b866-6d4969591c8a@github.com> > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: add call to the additional tests ------------- Changes: - all: 
https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/b438555e..1ee4c1fe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=08-09 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From duke at openjdk.org Tue Sep 17 04:41:20 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 17 Sep 2024 04:41:20 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v11] In-Reply-To: References: Message-ID: <4eB1-LVi9xoS-lksSPbpyf39ebjZvsaqKQrP0XSMOTE=.48731919-0181-4aeb-97f6-ea2f22ac3410@github.com> > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: remove -ve tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/1ee4c1fe..aa163896 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=09-10 Stats: 727 lines in 1 file changed: 0 ins; 364 del; 363 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From jbhateja at openjdk.org Tue Sep 17 07:10:54 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 07:10:54 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v11] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> 
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. 
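The two-vector selection described in the semantics above can be modelled with plain arrays. The following is a minimal scalar sketch, not the code in the patch: the class and method names (`SelectFromTwoVectorsSketch`, `selectFrom`) are hypothetical, and it uses `int` lanes only. Note the assumption it makes: it wraps indices into `[0, 2*VLEN)` with a mask, as suggested during review, rather than throwing for out-of-range indices as the description above states. Which behaviour the final API adopts is not settled in this thread.

```java
import java.util.Arrays;

// Hypothetical scalar model of the two-vector selectFrom; illustration only.
final class SelectFromTwoVectorsSketch {

    /**
     * For each lane i, picks v1[index] when the wrapped index is below vlen,
     * otherwise v2[index - vlen]. Indices are wrapped into [0, 2 * vlen).
     */
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = v1.length;
        // The mask-based wrap is only valid for a power-of-two length.
        if (Integer.bitCount(vlen) != 1 || v2.length != vlen || indexes.length != vlen) {
            throw new IllegalArgumentException(
                    "all inputs must share one power-of-two length");
        }
        int mask = (vlen << 1) - 1;
        int[] res = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int index = indexes[i] & mask; // wrap first, so negatives are handled too
            res[i] = index < vlen ? v1[index] : v2[index - vlen];
        }
        return res;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] idx = {0, 5, -1, 3};
        System.out.println(Arrays.toString(selectFrom(idx, v1, v2)));
    }
}
```

Wrapping before the comparison matters: comparing a raw negative index against `vlen` would send it to the first table and then index out of bounds.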
> > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: - Jcheck clearance - Review comments resolution. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/7c80bfce..29530047 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=09-10 Stats: 402 lines in 41 files changed: 98 ins; 98 del; 206 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Tue Sep 17 07:10:54 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 07:10:54 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com> Message-ID: On Mon, 16 Sep 2024 07:45:51 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. > > src/hotspot/share/opto/vectornode.cpp line 2122: > >> 2120: // index format by subsequent VectorLoadShuffle. >> 2121: int cast_vopc = VectorCastNode::opcode(0, index_elem_bt, true); >> 2122: Node* index_byte_vec = phase->transform(VectorCastNode::make(cast_vopc, index_vec, T_BYTE, num_elem)); > > This cast assumes that the indices cannot have more than 8 bits. This would allow vector lengths of up to 256. This is fine for intel. But as far as I know ARM has in principle longer vectors - up to 2048 bytes. 
Should we maybe add some assert here to make sure we never badly truncate the index?

Shuffle overall is on our todo list; it's a known limitation which we tried lifting once. Yes, you read it correctly: it will be a limitation for AArch64 SVE once 2048-bit vector systems are available. IIRC the current max vector size on any available AArch64 system is 256 bits; with Neoverse V2 they shrank the vector size back to 16 bytes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504446

From jbhateja at openjdk.org  Tue Sep 17 07:10:54 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 17 Sep 2024 07:10:54 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v7]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID: <8QUaed-UNR5ura5MXAeccEXQgaSOUaM_JCHvrUUeCVE=.d895b3db-6e3c-4351-9147-81eb303536f9@github.com>

On Mon, 16 Sep 2024 07:27:44 GMT, Emanuel Peter wrote:

>> Please at least add a comment why you are not following my suggestion. I feel like the work I put in to review is not being respected when comments are just silently resolved without any action or comment.
>
> I really do think that `as_ConI()` would be the right thing here. In product it is just a cast, and in debug at least we have an assert.
DONE **It just got overlooked @eme64, we respect reviewer suggestions and value the time you invest in polishing our patches, thanks again :-)** ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504618 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504671 From jbhateja at openjdk.org Tue Sep 17 07:10:55 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 07:10:55 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> Message-ID: On Mon, 16 Sep 2024 18:35:42 GMT, Paul Sandoz wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 561: > >> 559: for (int i = 0; i < vlen; i++) { >> 560: int index = ((int)vecPayload1[i]); >> 561: res[i] = index >= vlen ? vecPayload3[index & (vlen - 1)] : vecPayload2[index]; > > This is incorrect as the index could be negative. You need to wrap in the range `[0, 2 * vlen - 1]` before the comparison and selection. > > int index = ((int)vecPayload1[i]) & ((vlen << 1) - 1)); > res[i] = index < vlen ? vecPayload2[index] : vecPayload3[index - vlen]; Hi @PaulSandoz , we already pass wrapped indexes to this helper routine called from fallback implementation. 
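The exchange above hinges on where the index gets wrapped. As a minimal sketch (hypothetical class and method names, `int[]` arrays standing in for the vector payloads), the reviewer's suggested form wraps each raw index into `[0, 2 * vlen - 1]` first and only then compares and selects. In the actual patch, per the reply above, the wrapping is said to happen before the helper is called, so this sketch folds both steps into one place purely for illustration. It assumes `vlen` is a power of two.

```java
import java.util.Arrays;

// Hypothetical sketch of wrap-then-select, following the review suggestion.
public class WrappedSelectSketch {

    static int[] selectFromWrapped(int[] rawIndex, int[] v1, int[] v2) {
        int vlen = rawIndex.length; // assumed to be a power of two
        int[] res = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            // Mask into [0, 2*vlen - 1]; this also maps negative indices into range.
            int index = rawIndex[i] & ((vlen << 1) - 1);
            // Subtract vlen for the second table rather than masking a second time.
            res[i] = index < vlen ? v1[index] : v2[index - vlen];
        }
        return res;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] rawIndex = {-1, 4, 9, 2};
        // prints [23, 20, 11, 12]
        System.out.println(Arrays.toString(selectFromWrapped(rawIndex, v1, v2)));
    }
}
```

Without the initial mask, a negative index such as -1 would pass the `index < vlen` test and then fault on the array access, which is the case the reviewer was pointing at.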
> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2974:
>
>> 2972:     final $abstractvectortype$ selectFromTemplate(Class> indexVecClass,
>> 2973:                                                   $abstractvectortype$ v1, $abstractvectortype$ v2) {
>> 2974:         int twoVectorLen = length() * 2;
>
> We should assert that the length is a power of two.

The API only accepts vector parameters, and there is no means through the public-facing API to create a vector of NPOT sizes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504366
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504318

From jbhateja at openjdk.org  Tue Sep 17 07:10:55 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 17 Sep 2024 07:10:55 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v7]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID:

On Fri, 13 Sep 2024 14:43:42 GMT, Emanuel Peter wrote:

>> Original API did throw IndexOutOfBoundsException, but later on we have moved away from exception throwing semantics to wrapping semantics.
>> Please find details at following comment
>> https://github.com/openjdk/jdk/pull/20508#issuecomment-2306344606
>
> And do we test that the wrapping works correctly?

The Vector API jtreg framework is based on TestNG. The custom data providers associated with the various test methods generate ranges of values that lie beyond the valid index range, which should exercise the wrapping scenarios.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504894 From jbhateja at openjdk.org Tue Sep 17 07:10:55 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 07:10:55 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: <_U2DgK6DAW3ZJhozsMhwHzggUFpj5fnHdLJOoYFcNJA=.1875811f-458f-4834-bb94-339a8ff7360d@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <_U2DgK6DAW3ZJhozsMhwHzggUFpj5fnHdLJOoYFcNJA=.1875811f-458f-4834-bb94-339a8ff7360d@github.com> Message-ID: On Mon, 16 Sep 2024 07:40:33 GMT, Emanuel Peter wrote: >> Patch includes tests for all the species (combination of vector type and sizes), each vector kernel is validated against equivalent scalar implementation, scenario which you are referring is implicitly handled though tests. > > Ok, just so that I can relax, can you please point me to this test that would implicitly verify that the backend has chosen the correct vector size? Each test method validates the intrinsic code against equivalent scalar implementation, it should catch if backend emits instruction with incorrect vector size. https://github.com/openjdk/jdk/pull/20508/files#diff-95c582657bf90bef3530e67cb143865d070fd2e8e4538849e3dce6061b0d5f2dR4863 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504831 From rkennke at openjdk.org Tue Sep 17 09:35:02 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 17 Sep 2024 09:35:02 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v18] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. 
> > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). 
> - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits: - fix CompressedClassPointersEncodingScheme yet again for linux aarch64 - Fixes post-8340184 - Merge upstream up to and including 8340184 - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java - Fix loop on aarch64 - clarify obscure assert in metasapce setup - Rework compressedklass encoding - remove stray debug output - Fixes post 8338526 - ... and 47 more: https://git.openjdk.org/jdk/compare/7849f252...28a26aed ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=17 Stats: 4518 lines in 190 files changed: 3180 ins; 718 del; 620 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From stuefe at openjdk.org Tue Sep 17 10:02:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:02:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 12:25:37 GMT, Johan Sj?len wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/classLoaderMetaspace.hpp line 81: > >> 79: metaspace::MetaspaceArena* class_space_arena() const { return _class_space_arena; } >> 80: >> 81: bool have_class_space_arena() 
const { return _class_space_arena != nullptr; }
>
> This is unnecessary. Instead of having this and having to remember to check for nullness each time, just change the `_class_space_arena` to point to the same arena as the `_non_class_space_arena` does when we run with `-XX:-UseCompressedClassPointers`

I'd prefer not to.

This logic (when -UCCP class space arena is NULL, with the implicit assumption that both are different entities) has been in there forever, and changing that is out of scope for and unrelated to this PR. I am not sure what will break if I change this but don't want to risk test errors at this point (one example, reporting would have to be adapted to recognize that both arenas are the same, and there are plenty of tests that would also need to be fixed).

This can be done in a follow-up RFE if necessary.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762917467

From stuefe at openjdk.org  Tue Sep 17 10:05:20 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 17 Sep 2024 10:05:20 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]
In-Reply-To:
References:
Message-ID:

On Wed, 11 Sep 2024 13:05:10 GMT, Johan Sjölen wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix FullGCForwarding initialization
>
> src/hotspot/share/memory/classLoaderMetaspace.cpp line 165:
>
>> 163:   MetaBlock bl(ptr, word_size);
>> 164:   // If the block would be reusable for a Klass, add to class arena, otherwise to
>> 165:   // then non-class arena.
> > Nit: spelling, "the" Okay ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762928041 From stuefe at openjdk.org Tue Sep 17 10:16:24 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:16:24 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:50:59 GMT, Johan Sj?len wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/metaspace.cpp line 656: > >> 654: // Adjust size of the compressed class space. >> 655: >> 656: const size_t res_align = reserve_alignment(); > > Can you change the name to `root_chunk_size`? It feels wrong, since this is a deeply hidden implementation detail.\ I will remove this temporary variable, which will also make the diff smaller. > src/hotspot/share/memory/metaspace.hpp line 112: > >> 110: static size_t max_allocation_word_size(); >> 111: >> 112: // Minimum allocation alignment, in bytes. 
All MetaData shall be aligned correclty
>
> Nit: Spelling, "correctly"

Fixed

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762968742
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762972938

From stuefe at openjdk.org  Tue Sep 17 10:23:19 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 17 Sep 2024 10:23:19 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]
In-Reply-To:
References:
Message-ID:

On Wed, 11 Sep 2024 11:25:56 GMT, Johan Sjölen wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix FullGCForwarding initialization
>
> src/hotspot/share/memory/metaspace/metablock.hpp line 48:
>
>> 46:
>> 47:   MetaWord* base() const { return _base; }
>> 48:   const MetaWord* end() const { return _base + _word_size; }
>
> `assert(is_nonempty())`

Raises the question of why here and not in other accessors? Note that the only path via which end() is called already asserts for non-empty-ness (MetaspaceArena::contains).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762985723

From jsjolen at openjdk.org  Tue Sep 17 10:31:19 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Tue, 17 Sep 2024 10:31:19 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]
In-Reply-To:
References:
Message-ID:

On Tue, 17 Sep 2024 09:59:49 GMT, Thomas Stuefe wrote:

>> src/hotspot/share/memory/classLoaderMetaspace.hpp line 81:
>>
>>> 79:   metaspace::MetaspaceArena* class_space_arena() const       { return _class_space_arena; }
>>> 80:
>>> 81:   bool have_class_space_arena() const { return _class_space_arena != nullptr; }
>>
>> This is unnecessary.
Instead of having this and having to remember to check for nullness each time, just change the `_class_space_arena` to point to the same arena as the `_non_class_space_arena` does when we run with `-XX:-UseCompressedClassPointers` > > I'd prefer not to. > > This logic (when -UCCP class space arena is NULL, with the implicit assumption that both are different entities) has been in there forever, and changing that is out of scope for and unrelated to this PR. I am not sure what will break if I change this but don't want to risk test errors at this point (one example, reporting would have to be adapted to recognize that both arenas are the same, and there are plenty of tests that would also need to be fixd). > > This can be done in a follow-up RFE if necessary. OK, that's fine. >> src/hotspot/share/memory/metaspace.cpp line 656: >> >>> 654: // Adjust size of the compressed class space. >>> 655: >>> 656: const size_t res_align = reserve_alignment(); >> >> Can you change the name to `root_chunk_size`? > > It feels wrong, since this is a deeply hidden implementation detail.\ > > I will remove this temporary variable, which will also make the diff smaller. Sounds OK, I wanted the name change to indicate that "hey, deep impl detail where we use this to mean something else". 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762993568 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762994772 From stuefe at openjdk.org Tue Sep 17 10:31:20 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:31:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: <32_SIVHDWyZyYSvbV1jUHc631MTKUP2Thh_M9Q71jrc=.351aed23-599d-4a53-9cc0-0e9c85ecdf03@github.com> On Wed, 11 Sep 2024 11:29:38 GMT, Johan Sj?len wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/metaspace/metablock.hpp line 52: > >> 50: bool is_empty() const { return _base == nullptr; } >> 51: bool is_nonempty() const { return _base != nullptr; } >> 52: void reset() { _base = nullptr; _word_size = 0; } > > Is this function really necessary? According to my IDE it's only used in tests and even then the `MetaBlock` isn't used afterwards (so it has no effect). see test_clms.cpp, test_random function, used in two places there. > src/hotspot/share/memory/metaspace/metaspaceArena.hpp line 84: > >> 82: // between threads and needs to be synchronized in CLMS. >> 83: >> 84: const size_t _allocation_alignment_words; > > Nit: Document this? All other members are documented. 
ok ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762993378 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762995731 From stuefe at openjdk.org Tue Sep 17 10:31:23 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:31:23 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v18] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 11:40:24 GMT, Johan Sj?len wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits: >> >> - fix CompressedClassPointersEncodingScheme yet again for linux aarch64 >> - Fixes post-8340184 >> - Merge upstream up to and including 8340184 >> - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 >> - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java >> - Fix loop on aarch64 >> - clarify obscure assert in metasapce setup >> - Rework compressedklass encoding >> - remove stray debug output >> - Fixes post 8338526 >> - ... and 47 more: https://git.openjdk.org/jdk/compare/7849f252...28a26aed > > src/hotspot/share/memory/metaspace/metaspaceArena.hpp line 44: > >> 42: class FreeBlocks; >> 43: >> 44: struct ArenaStats; > > Nit: Sort? ok ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762994972 From stuefe at openjdk.org Tue Sep 17 10:45:14 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:45:14 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:57:31 GMT, Doug Simon wrote: >> https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class metaspace. 
JVMCI has to respect this and avoids compressing abstract and interface Klasses > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstant.java line 29: > >> 27: /** >> 28: * Marker interface for hotspot specific constants. >> 29: */ > > Let's take this opportunity to improve this javadoc: > > /** > * A value in a space managed by Hotspot (e.g. heap or metaspace). > * Some of these values can be referenced with a compressed pointer (32 bits) > * instead of a full word-sized pointer. > */ drive-by comment, 32-bit is an implementation detail. The width of a narrowKlass will be adjustable with the upcoming JEP450. Referring to 32 bit may be obsolete soon. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20949#discussion_r1763011967 From jsjolen at openjdk.org Tue Sep 17 10:47:25 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 17 Sep 2024 10:47:25 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v18] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 09:35:02 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 57 commits: > > - fix CompressedClassPointersEncodingScheme yet again for linux aarch64 > - Fixes post-8340184 > - Merge upstream up to and including 8340184 > - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 > - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java > - Fix loop on aarch64 > - clarify obscure assert in metasapce setup > - Rework compressedklass encoding > - remove stray debug output > - Fixes post 8338526 > - ... and 47 more: https://git.openjdk.org/jdk/compare/7849f252...28a26aed Hi, We've gone through the rest of the Metaspace code and looked at the tests. It looks OK to us. Would like to see some style cleanups in the tests, but that can wait as a follow up. test/hotspot/gtest/metaspace/test_clms.cpp line 193: > 191: > 192: { > 193: // Nonclass arena allocation. The style in this source file isn't really up to scratch, especially *these* lines. Anyway, it's in the tests, so I'm OK with this being fixed in a follow up RFE. ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2309360771 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1763005291 From rkennke at openjdk.org Tue Sep 17 12:52:03 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 17 Sep 2024 12:52:03 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. 
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. 
Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - CompressedKlassPointers::is_encodable shall be callable with -UseCCP - Johan review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/28a26aed..612d3045 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=17-18 Stats: 39 lines in 7 files changed: 22 ins; 8 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From psandoz at openjdk.org Tue Sep 17 17:03:14 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 17 Sep 2024 17:03:14 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> Message-ID: On Tue, 17 Sep 2024 07:02:15 GMT, Jatin Bhateja wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 561: >> >>> 559: for (int i = 0; i < vlen; i++) { >>> 560: int index = ((int)vecPayload1[i]); >>> 561: res[i] = index >= vlen ? vecPayload3[index & (vlen - 1)] : vecPayload2[index]; >> >> This is incorrect as the index could be negative. You need to wrap in the range `[0, 2 * vlen - 1]` before the comparison and selection. >> >> int index = ((int)vecPayload1[i]) & ((vlen << 1) - 1)); >> res[i] = index < vlen ? vecPayload2[index] : vecPayload3[index - vlen]; > > Hi @PaulSandoz , we already pass wrapped indexes to this helper routine called from fallback implementation. Opps yes, the masking was throwing me off. 
Can you please add a comment and/or rename the parameters e.g., so `v1` is renamed to `wrappedIndex`? Also i would recommend not doing the masking, it is very misleading and instead do the subtraction. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1763582764 From psandoz at openjdk.org Tue Sep 17 17:07:16 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 17 Sep 2024 17:07:16 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> Message-ID: On Tue, 17 Sep 2024 07:02:12 GMT, Jatin Bhateja wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2974: >> >>> 2972: final $abstractvectortype$ selectFromTemplate(Class> indexVecClass, >>> 2973: $abstractvectortype$ v1, $abstractvectortype$ v2) { >>> 2974: int twoVectorLen = length() * 2; >> >> We should assert that the length is a power of two. > > API only accepts vector parameters and there is no means though public facing API to create a vector of NPOT sizes. https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java#L842C58-L843C27 You missed the first bit of the sentence linked to "With the possible exception of the {@linkplain VectorShape#S_Max_BIT maximum shape}". In generally the specification avoids assuming POT where it is not explicitly stated (i.e., the constant shapes). In this case we align with the specification of `VectorShuffle::wrapIndex`. We don't need to implement NPOT but we need a reminder in the implementation where we make that assumption. 
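To illustrate why a reminder about the power-of-two assumption matters: wrapping by bit mask only agrees with a true modular wrap when the range is a power of two. The sketch below uses hypothetical helper names, and `Math.floorMod` serves only as a generic reference wrap for comparison; it makes no claim about how `VectorShuffle::wrapIndex` is implemented.

```java
// Hypothetical sketch: mask-based wrapping is only valid for power-of-two ranges.
public class WrapIndexAssumption {

    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Mask-based wrap; refuses ranges for which the mask trick would be wrong.
    static int wrapByMask(int index, int range) {
        if (!isPowerOfTwo(range)) {
            throw new IllegalArgumentException("mask wrapping assumes a power-of-two range: " + range);
        }
        return index & (range - 1);
    }

    // Generic modular wrap into [0, range), valid for any positive range.
    static int wrapByFloorMod(int index, int range) {
        return Math.floorMod(index, range);
    }

    public static void main(String[] args) {
        // Power-of-two range (two vectors of 4 lanes): both approaches agree.
        System.out.println(wrapByMask(-1, 8) + " " + wrapByFloorMod(-1, 8)); // prints 7 7

        // Non-power-of-two range (two vectors of 3 lanes): a naive mask disagrees.
        int naive = 6 & (6 - 1);
        System.out.println(naive + " " + wrapByFloorMod(6, 6)); // prints 4 0
    }
}
```

An explicit check or comment at the point where the mask is applied keeps the assumption visible if a non-power-of-two maximum shape ever has to be supported.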
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1763587293 From psandoz at openjdk.org Tue Sep 17 18:24:05 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 17 Sep 2024 18:24:05 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> Message-ID: On Thu, 22 Aug 2024 18:43:56 GMT, Paul Sandoz wrote: > Adding link to UTF-8 decoding use case for convenience and reminder: https://github.com/AugustNagro/utf8.java/blob/master/src/main/java/com/augustnagro/utf8/Utf8.java. Another related link to base 64 decoding https://github.com/simdutf/SimdBase64/ ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2356611024 From sviswanathan at openjdk.org Tue Sep 17 18:42:05 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 17 Sep 2024 18:42:05 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> Message-ID: On Tue, 17 Sep 2024 18:21:43 GMT, Paul Sandoz wrote: > > Adding link to UTF-8 decoding use case for convenience and reminder: https://github.com/AugustNagro/utf8.java/blob/master/src/main/java/com/augustnagro/utf8/Utf8.java. > > Another related link to base 64 decoding https://github.com/simdutf/SimdBase64/ Thanks Paul! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2356639277 From jbhateja at openjdk.org Wed Sep 18 07:21:52 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Sep 2024 07:21:52 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v12] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. 
> > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Incorporating review and documentation suggestions. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/29530047..31a58642 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=10-11 Stats: 96 lines in 8 files changed: 25 ins; 0 del; 71 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Wed Sep 18 07:21:52 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Sep 2024 07:21:52 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7bghGF2-qbhP1hJA2ljtdA3xSUSqiV0RLaOYm4AcZSQ=.eb3e36b2-5461-4755-ae71-2de89660649f@github.com> Message-ID: On Fri, 13 Sep 2024 14:49:01 GMT, Emanuel Peter wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 544: >> >>> 542: byte[] vpayload1 = ((ByteVector)v1).vec(); >>> 543: byte[] vpayload2 = ((ByteVector)v2).vec(); >>> 544: byte[] vpayload3 = ((ByteVector)v3).vec(); >> >> Is there a reason you are not using more descriptive names here instead of `vpayload1`? >> I also wonder if the `selectFromHelper` should not be named more specifically: `selectFromTwoVector(s)Helper`? > > You only gave me a thumbs up and no change - but comment resolved. Is that intentional? Makes me feel like you are ignoring my comments, and that discourages me from reviewing in the future. Routine was renamed as per you suggestion and first vector argument also appropriately renamed to wrappedIndex. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1764527888 From yzheng at openjdk.org Wed Sep 18 09:52:06 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 18 Sep 2024 09:52:06 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 10:42:12 GMT, Thomas Stuefe wrote: >> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstant.java line 29: >> >>> 27: /** >>> 28: * Marker interface for hotspot specific constants. >>> 29: */ >> >> Let's take this opportunity to improve this javadoc: >> >> /** >> * A value in a space managed by Hotspot (e.g. heap or metaspace). >> * Some of these values can be referenced with a compressed pointer (32 bits) >> * instead of a full word-sized pointer. >> */ > > drive-by comment, 32-bit is an implementation detail. The width of a narrowKlass will be adjustable with the upcoming JEP450. Referring to 32 bit may be obsolete soon. Thanks for the note! By adjustable you mean it can go beyond 32 bits? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20949#discussion_r1764760056 From stuefe at openjdk.org Wed Sep 18 10:12:09 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 18 Sep 2024 10:12:09 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses In-Reply-To: References: Message-ID: <7TkgOs_GnhgNa7qogZoIsm3QPfzxx6yMXUSCE8wpFm0=.84f010ef-20fe-4799-863d-c84b84e1b570@github.com> On Wed, 18 Sep 2024 09:49:48 GMT, Yudi Zheng wrote: >> drive-by comment, 32-bit is an implementation detail. The width of a narrowKlass will be adjustable with the upcoming JEP450. Referring to 32 bit may be obsolete soon. > > Thanks for the note! By adjustable you mean it can go beyond 32 bits? No, it will be smaller. Lilliput 1 will probably ship with 22 bit narrowKlass, and we may reduce this further. 
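As a rough illustration of why the bit width is an implementation detail, the sketch below models a narrow class pointer as an offset from a base, scaled by a shift and limited to a configurable number of bits. Everything here is an assumption for illustration only: the class `NarrowKlassSketch`, its fields, and the base and shift values used in `main` are hypothetical and are not taken from HotSpot or JVMCI.

```java
// Hypothetical model of a narrow class-pointer encoding; names and values are illustrative only.
final class NarrowKlassSketch {
    private final long base;  // assumed start of the encoding range
    private final int shift;  // assumed alignment shift
    private final int bits;   // width of the narrow value (e.g. 32, or 22 as mentioned above)

    NarrowKlassSketch(long base, int shift, int bits) {
        this.base = base;
        this.shift = shift;
        this.bits = bits;
    }

    // An address is encodable if it is aligned to the shift and its scaled
    // offset from the base fits into the configured number of bits.
    boolean isEncodable(long address) {
        long offset = address - base;
        if (offset < 0) {
            return false;
        }
        boolean aligned = (offset & ((1L << shift) - 1)) == 0;
        return aligned && (offset >>> shift) < (1L << bits);
    }

    long encode(long address) {
        if (!isEncodable(address)) {
            throw new IllegalArgumentException("not encodable: 0x" + Long.toHexString(address));
        }
        return (address - base) >>> shift;
    }

    long decode(long narrow) {
        return base + (narrow << shift);
    }

    public static void main(String[] args) {
        // Base and shift are made-up values; only the 22-bit width comes from the thread.
        NarrowKlassSketch scheme = new NarrowKlassSketch(0x800000000L, 10, 22);
        long address = 0x800000000L + (3705L << 10);
        long narrow = scheme.encode(address);
        System.out.println(narrow + " -> 0x" + Long.toHexString(scheme.decode(narrow)));
    }
}
```

Code written against such an abstraction only needs to know that a value is "compressed", not how many bits it occupies, which is the point of the javadoc remark.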
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20949#discussion_r1764786815 From rkennke at openjdk.org Wed Sep 18 12:11:31 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 18 Sep 2024 12:11:31 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> References: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> Message-ID: On Mon, 16 Sep 2024 06:53:42 GMT, Roberto Castañeda Lozano wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Various touch-ups > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2576: > >> 2574: } else { >> 2575: lea(dst, Address(obj, index, Address::lsl(scale))); >> 2576: ldr(dst, Address(dst, offset)); > > Do you have a reproducer (or, better yet, a test case) that exercises this case? I ran Oracle's internal CI tiers 1-5 and could never hit it. Could this happen for x64 as well? AFAIK, this happens only when using compressed oops with a heap-base in r27. When running with that setting, we would get addresses like r27[nklass] or r27[nklass]+offset, both with scale=8. You would need large heaps, perhaps >4GB, to get this coops setting. The problem with aarch64 is that we can't have an address like r27[nklass]+offset, that's why we need to lea the r27[nklass] part first. Yes, this also happens on x86, but x86 supports rX[nklass]+offset addressing.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1764937842 From epeter at openjdk.org Wed Sep 18 12:18:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 12:18:11 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v12] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Wed, 18 Sep 2024 07:21:52 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Incorporating review and documentation suggestions. Generally, from a C2 point of view this looks good now. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2312501448 From epeter at openjdk.org Wed Sep 18 12:18:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 12:18:12 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com> Message-ID: On Tue, 17 Sep 2024 07:02:20 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/vectornode.cpp line 2122: >> >>> 2120: // index format by subsequent VectorLoadShuffle. >>> 2121: int cast_vopc = VectorCastNode::opcode(0, index_elem_bt, true); >>> 2122: Node* index_byte_vec = phase->transform(VectorCastNode::make(cast_vopc, index_vec, T_BYTE, num_elem)); >> >> This cast assumes that the indices cannot have more than 8 bits. This would allow vector lengths of up to 256. This is fine for intel. But as far as I know ARM has in principle longer vectors - up to 2048 bytes. Should we maybe add some assert here to make sure we never badly truncate the index? > > Shuffle overhaul is on our todo list, its a know limitation which we tried lifting once, yes you read it correctly, its a limitation for AARCH64 SVE once a 2048 bits vector systems are available, IIRC current max vector size on any available AARCH64 system is 256 bits, with Neoverse V2 they shrink the vector size back to 16 bytes. Are there any asserts that would catch this? 
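The kind of guard being asked about could look roughly like the following. This is a sketch in Java rather than the C2 code under review, and it assumes that the narrowed index is interpreted as an unsigned 8-bit value and that two-vector indexes range over [0, 2 * numElem); the class and method names are hypothetical.

```java
// Hypothetical check that two-vector indexes survive narrowing to byte lanes.
final class ByteIndexNarrowingSketch {

    // The largest two-vector index is 2 * numElem - 1; it must fit in 8 bits
    // (treated as unsigned here, which is an assumption of this sketch).
    static boolean fitsInByteLane(int numElem) {
        return 2 * numElem - 1 <= 0xFF;
    }

    // Narrows int indexes to bytes, refusing to truncate silently.
    static byte[] narrow(int[] indexes, int numElem) {
        if (!fitsInByteLane(numElem)) {
            throw new IllegalStateException("indexes for " + numElem + " lanes do not fit in a byte");
        }
        byte[] out = new byte[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            out[i] = (byte) indexes[i];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fitsInByteLane(64));   // 512-bit vector of bytes: true
        System.out.println(fitsInByteLane(256));  // 2048-bit vector of bytes: false
    }
}
```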
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1764943566 From rkennke at openjdk.org Wed Sep 18 12:25:50 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 18 Sep 2024 12:25:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v20] In-Reply-To: References: Message-ID: <1o2b4fxBhqrlRqkNwKqZD1mgRNfTM16_NHZweEbd9SI=.1f68868b-1b98-4f78-9d37-2a805ffc932b@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 60 commits: - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 - CompressedKlassPointers::is_encodable shall be callable with -UseCCP - Johan review feedback - fix CompressedClassPointersEncodingScheme yet again for linux aarch64 - Fixes post-8340184 - Merge upstream up to and including 8340184 - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java - Fix loop on aarch64 - clarify obscure assert in metasapce setup - ... 
and 50 more: https://git.openjdk.org/jdk/compare/19b2cee4...bb641621 ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=19 Stats: 4525 lines in 190 files changed: 3194 ins; 718 del; 613 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From yzheng at openjdk.org Wed Sep 18 12:25:51 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 18 Sep 2024 12:25:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 12:52:03 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. 
Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - CompressedKlassPointers::is_encodable shall be callable with -UseCCP > - Johan review feedback Could you please cherry pick https://github.com/mur47x111/jdk/commit/c45ebc2a89d0b25a3dd8cc46386e37a635ff9af2 for the JVMCI support? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2358324621 From epeter at openjdk.org Wed Sep 18 12:26:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 12:26:12 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 22:30:36 GMT, Sandhya Viswanathan wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
>> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments I'm a bit confused by the name `shuffleWrapIndexes` and `inline_vector_shuffle_wrap_indexes`. Are you **shuffling wrap-indexes**? I don't know what that would even mean. I think you should name it `wrapShuffleIndexes`. Or is there any naming convention in the VectorAPI that prevents this? 
------------- PR Review: https://git.openjdk.org/jdk/pull/20634#pullrequestreview-2312528484 From rkennke at openjdk.org Wed Sep 18 12:38:21 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 18 Sep 2024 12:38:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 19:04:13 GMT, Chris Plummer wrote: >> I pulled your changes and I see one slight difference in the output. The following line is missing: >> >> `_metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject` >> >> I realize that there is no `_metadata._compressed_klass` when you have compact headers, and that the Klass* is encoded in the `_mark` word, which now looks something like this in the output: >> >> _mark: 16294762323640321 >> >> So you can say that the Klass* is embedded in the _mark word, but this isn't of much help to SA users. I think what is expected is that the visitor is passed a MetadataField object that when getValue() is called on it, the Klass mirror is returned. Maybe we need a new CompactKlassField type like we currently have a NarrowKlassField field type, and it will do the decoding of the _mark word into a Klass. The current getKlass() is related to this. > > Thinking about this a bit more, maybe _mark needs to be a MetadataField rather than CInt. This is a kind of odd situation. Basically we have a CInt field that is more than just simple bits used as flags or small integers. It also gets you to the Klass*. Possibly SA should treat _mark as two separate fields; one that remains a CInt as it currently is and another that treats it as an encoded Klass* like the NarrowKlassField case. Do you think this needs to be addressed before integration? And if so, could you help with implementation? Or could we do it after integration? Then please file a follow-up issue.
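For readers wondering what "decoding the _mark word" would involve, here is a small sketch that splits the `_mark` value quoted above into a narrow class value and the remaining low bits. It assumes the narrow Klass occupies the upper 22 bits of a 64-bit mark word, as the PR description states, so the shift of 42 is derived from that assumption; the class and method names are hypothetical and this is not SA code.

```java
// Hypothetical split of a compact-header mark word; layout is assumed, not taken from SA.
final class MarkWordKlassSketch {
    static final int KLASS_BITS = 22;                       // "upper 22 bits" per the PR text
    static final int KLASS_SHIFT = Long.SIZE - KLASS_BITS;  // 42 under that assumption

    // Narrow class value stored in the upper bits of the mark word.
    static long narrowKlass(long mark) {
        return mark >>> KLASS_SHIFT;
    }

    // Everything below the class bits (lock bits, age, hash, and so on).
    static long lowBits(long mark) {
        return mark & ((1L << KLASS_SHIFT) - 1);
    }

    public static void main(String[] args) {
        long mark = 16294762323640321L; // the _mark value from the SA output quoted above
        System.out.println(narrowKlass(mark)); // prints 3705
        System.out.println(lowBits(mark));     // prints 1
    }
}
```

Under this assumed layout the quoted value decomposes as 3705 * 2^42 + 1, which suggests why a single CInt view of `_mark` hides the class information from SA users. Turning the narrow value into an actual Klass address would additionally need the encoding base and shift, which are not shown here.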
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1764976086 From rkennke at openjdk.org Wed Sep 18 12:59:25 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 18 Sep 2024 12:59:25 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Mon, 9 Sep 2024 18:30:21 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: >> >> - Print as warning when UCOH doesn't match in CDS archive >> - Improve initialization of mark-word in CDS ArchiveHeapWriter >> - Simplify getKlass() in SA >> - Simplify oopDesc::init_mark() >> - Get rid of forward_safe_* methods >> - GCForwarding touch-ups > > src/hotspot/share/oops/markWord.inline.hpp line 90: > >> 88: ShouldNotReachHere(); >> 89: return markWord(); >> 90: #endif > > Is the ifdef _LP64 necessary, since UseCompactObjectHeaders should always be false for 32 bits? Kindof. The problem is that klass_shift is larger than 31, and shifting with it would thus be UB and generate a compiler warning. I opted to simply not compile any of that code in 32bit builds. We could also define klass_shift differently on 32bit. Long-term (maybe with Lilliput2/4-byte-headers?) it would be nice to consolidate the header layout between 32 and 64 bit builds and not make any distinction anywhere. E.g. define markWord (or objectHeader?) in a single way, and use that to extract all the relevant stuff. It's not totally unlikely that we deprecate 32-bit builds before that can happen, though. > src/hotspot/share/oops/oop.inline.hpp line 90: > >> 88: } else { >> 89: return markWord::prototype(); >> 90: } > > Could this be unconditional since prototoype_header is initialized for all Klasses? 
yes, but there is ongoing effort (at Oracle) to get rid of ```Klass::_prototype_header``` altogether. Let's wait for that and see how it looks then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765003983 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765006669 From rkennke at openjdk.org Wed Sep 18 13:23:44 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 18 Sep 2024 13:23:44 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... 
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: JVMCI support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/bb641621..9ad2e62f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=19-20 Stats: 22 lines in 6 files changed: 16 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From stuefe at openjdk.org Wed Sep 18 14:00:25 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 18 Sep 2024 14:00:25 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 12:27:14 GMT, Johan Sj?len wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/classLoaderMetaspace.cpp line 87: > >> 85: klass_alignment_words, >> 86: "class arena"); >> 87: } > > As per my comment in the header file, change the code to this: > > ```c++ > if (class_context != nullptr) { > // ... Same as in PR > } else { > _class_space_arena = _non_class_space_arena; > } Rather not, see reasoning under https://github.com/openjdk/jdk/pull/20677/files#r1754330432 > src/hotspot/share/memory/classLoaderMetaspace.cpp line 118: > >> 116: #ifdef ASSERT >> 117: if (result.is_nonempty()) { >> 118: const bool in_class_arena = class_space_arena() != nullptr ? class_space_arena()->contains(result) : false; > > Unnecessary nullptr check if you take my suggestion, or you should switch to `have_class_space_arena`. 
See reasoning under https://github.com/openjdk/jdk/pull/20677/files#r1754335269 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765113297 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765113850 From sviswanathan at openjdk.org Wed Sep 18 15:30:10 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 18 Sep 2024 15:30:10 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Wed, 18 Sep 2024 12:23:48 GMT, Emanuel Peter wrote: > I'm a bit confused by the name `shuffleWrapIndexes` and `inline_vector_shuffle_wrap_indexes`. > > Are you **shuffling wrap-indexes**? I don't know what that would even mean. I think you should name it `wrapShuffleIndexes`. Or is there any naming convention in the VectorAPI that prevents this? Agree, wrapShuffleIndexes makes more sense. I will make the change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2358784233 From cjplummer at openjdk.org Wed Sep 18 16:41:20 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 18 Sep 2024 16:41:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 12:35:28 GMT, Roman Kennke wrote: >> Thinking about this a bit more, maybe _mark needs to be a MetadataField rather than CInt. This is a kind of odd situation. Basically we have a CInt field that is more than just simple bits used as flags or small integers. It also gets you to the Klass*. Possibly SA should treat _mark is two separate fields; one that remains a CInt as it currently is and another that treats it as an encoded Klass* like the NarrowKlassField case. > > Do you think this needs to be addressed before integration? 
And if so, could you help with implementation? Or could we do it after integration? Then please file a follow-up issue. Ok. I filed [JDK-8340396](https://bugs.openjdk.org/browse/JDK-8340396). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765387764 From duke at openjdk.org Wed Sep 18 16:58:17 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Wed, 18 Sep 2024 16:58:17 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option Message-ID: Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based host method compilations but also prevents the loading of the libjvmci compiler. While this works as expected for host method compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. Expected behavior: With `-XX:+UseGraalJIT`, both host method compilations and Truffle compilations should utilize the libjvmci compiler, if available. With `-XX:+EnableJVMCI`, host method compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. 
------------- Commit messages: - JDK-8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option Changes: https://git.openjdk.org/jdk/pull/21069/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21069&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340398 Stats: 9 lines in 2 files changed: 6 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21069.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21069/head:pull/21069 PR: https://git.openjdk.org/jdk/pull/21069 From sviswanathan at openjdk.org Wed Sep 18 17:00:30 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 18 Sep 2024 17:00:30 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v3] In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Change method name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20634/files - new: https://git.openjdk.org/jdk/pull/20634/files/428f2289..87e103ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=01-02 Stats: 45 lines in 37 files changed: 0 ins; 0 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/20634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20634/head:pull/20634 PR: 
https://git.openjdk.org/jdk/pull/20634 From coleenp at openjdk.org Thu Sep 19 00:04:45 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 19 Sep 2024 00:04:45 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:23:44 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. 
This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > JVMCI support src/hotspot/share/oops/compressedKlass.cpp line 242: > 240: } else { > 241: > 242: // Traditional (non-compact) header mode) Extra ) src/hotspot/share/oops/compressedKlass.hpp line 175: > 173: // 5b) if CDS=off: Calls initialize() - here, we have more freedom and, if we want, can choose an encoding > 174: // base that differs from the reservation base from step (4). That allows us, e.g., to later use > 175: // zero-based encoding. Not for this but is there really any benefit for zero based encoding for klass ids? 
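For readers unfamiliar with the term used in the question above: a narrow klass value is conventionally decoded as base + (narrow << shift), and "zero-based" encoding means the base is 0, so the addition drops out and only the shift remains. The sketch below only illustrates that arithmetic. It is not HotSpot's code, the class and method names are hypothetical, and the base and shift values are made up for the example.

```java
public class KlassEncodingSketch {
    // Hypothetical decode: widen the narrow value as unsigned, shift, add base.
    public static long decode(long base, int shift, int narrowKlass) {
        long unsigned = narrowKlass & 0xFFFFFFFFL;
        return base + (unsigned << shift);
    }

    // Hypothetical encode: inverse of decode for addresses at or above base
    // that are aligned to (1 << shift).
    public static int encode(long base, int shift, long address) {
        return (int) ((address - base) >>> shift);
    }

    public static void main(String[] args) {
        int shift = 3;                 // made-up shift for illustration
        long nonZeroBase = 0x800000000L; // made-up encoding base

        // With a zero base, decoding is a plain shift.
        System.out.println(Long.toHexString(decode(0L, shift, 5)));
        // With a non-zero base, an extra addition is needed.
        System.out.println(Long.toHexString(decode(nonZeroBase, shift, 5)));
    }
}
```

Whether saving that addition matters in practice for klass ids is exactly what the review comment asks, and the sketch does not try to answer it.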
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765888065 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765889975 From coleenp at openjdk.org Thu Sep 19 00:04:46 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 19 Sep 2024 00:04:46 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Wed, 18 Sep 2024 12:56:16 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/oop.inline.hpp line 90: >> >>> 88: } else { >>> 89: return markWord::prototype(); >>> 90: } >> >> Could this be unconditional since prototoype_header is initialized for all Klasses? > > yes, but there is ongoing effort (at Oracle) to get rid of ```Klass::_prototype_header``` altogether. Let's wait for that and see how it looks then. Yes, I saw that patch. I'm not sure I like the idea of cpu dependent code also doing the encoding. There were some C2 changes related to it that I didn't understand if that scheme required them. I don't see the down side to having the prototype header pre-encoded in the markWord. Seems simpler. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765893566 From stefank at openjdk.org Thu Sep 19 05:06:51 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 19 Sep 2024 05:06:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Wed, 18 Sep 2024 23:59:39 GMT, Coleen Phillimore wrote: >> yes, but there is ongoing effort (at Oracle) to get rid of ```Klass::_prototype_header``` altogether. Let's wait for that and see how it looks then. > > Yes, I saw that patch. 
I'm not sure I like the idea of cpu dependent code also doing the encoding. There were some C2 changes related to it that I didn't understand if that scheme required them. I don't see the down side to having the prototype header pre-encoded in the markWord. Seems simpler. We already have a cpu dependent code for both C1 and the interpreter. Adding cpu dependent code to C2 doesn't make it significantly worse. My latest patch also refactors the code so that C1, interpreter, and C2 all calls into shared functions in the macro assembler. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766163092 From stefank at openjdk.org Thu Sep 19 05:53:48 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 19 Sep 2024 05:53:48 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:23:44 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. 
>> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > JVMCI support src/hotspot/share/gc/parallel/psParallelCompact.cpp line 787: > 785: // The gap is always equal to min-fill-size, so nothing to do. 
> 786: return; > 787: } Reading the comment, it is not obvious that this is correct if you set MinObjectAlignment to something larger than the default value: void PSParallelCompact::fill_dense_prefix_end(SpaceId id) { // Comparing two sizes to decide if filling is required: // // The size of the filler (min-obj-size) is 2 heap words with the default // MinObjAlignment, since both markword and klass take 1 heap word. // // The size of the gap (if any) right before dense-prefix-end is // MinObjAlignment. // // Need to fill in the gap only if it's smaller than min-obj-size, and the // filler obj will extend to next region. // Note: If min-fill-size decreases to 1, this whole method becomes redundant. if (UseCompactObjectHeaders) { // The gap is always equal to min-fill-size, so nothing to do. return; } assert(CollectedHeap::min_fill_size() >= 2, "inv"); src/hotspot/share/oops/compressedKlass.cpp line 231: > 229: // The reason is that we want to avoid, if possible, shifts larger than > 230: // a cacheline size. > 231: _base = addr; Why is this important? 
src/hotspot/share/oops/compressedKlass.hpp line 261: > 259: } > 260: > 261: }; Missing blank line before `#endif` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766185665 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766192688 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766193355 From stefank at openjdk.org Thu Sep 19 05:53:49 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 19 Sep 2024 05:53:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: <-UEFgAIQjGBginN0JRoyuwwJLmDssUEQGE_tCP-tRkw=.01ef3f37-01fa-4931-b4f3-571d21252bbd@github.com> On Thu, 19 Sep 2024 05:35:34 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> JVMCI support > > src/hotspot/share/gc/parallel/psParallelCompact.cpp line 787: > >> 785: // The gap is always equal to min-fill-size, so nothing to do. >> 786: return; >> 787: } > > Reading the comment, it is not obvious that this is correct if you set MinObjectAlignment to something larger than the default value: > > void PSParallelCompact::fill_dense_prefix_end(SpaceId id) { > // Comparing two sizes to decide if filling is required: > // > // The size of the filler (min-obj-size) is 2 heap words with the default > // MinObjAlignment, since both markword and klass take 1 heap word. > // > // The size of the gap (if any) right before dense-prefix-end is > // MinObjAlignment. > // > // Need to fill in the gap only if it's smaller than min-obj-size, and the > // filler obj will extend to next region. > > // Note: If min-fill-size decreases to 1, this whole method becomes redundant. > if (UseCompactObjectHeaders) { > // The gap is always equal to min-fill-size, so nothing to do. 
> return; > } > assert(CollectedHeap::min_fill_size() >= 2, "inv"); Style note: The added code is inserted between a comment and the code that the comment refers to. It would be nice to tidy this up. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766186545 From jbhateja at openjdk.org Thu Sep 19 07:10:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Sep 2024 07:10:38 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v3] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: <-SvKZpGY6NbQyh2PnmV5--a8f4oKdSq3VQKV2siSawg=.c812df74-12d4-428b-a7f9-5b1945cdae39@github.com> On Wed, 18 Sep 2024 17:00:30 GMT, Sandhya Viswanathan wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
>> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Change method name src/hotspot/share/opto/vectorIntrinsics.cpp line 772: > 770: > 771: if (elem_klass == nullptr || shuffle_klass == nullptr || shuffle->is_top() || vlen == nullptr) { > 772: return false; // dead code Why dead code in comment ? this is a failed intrinsification condition. 
src/hotspot/share/opto/vectorIntrinsics.cpp line 776: > 774: if (!vlen->is_con() || shuffle_klass->const_oop() == nullptr) { > 775: return false; // not enough info for intrinsification > 776: } Why don't you combine it with the above conditions, to be consistent with the other inline expanders? src/hotspot/share/opto/vectorIntrinsics.cpp line 790: > 788: // Shuffles use byte array based backing storage > 789: BasicType shuffle_bt = T_BYTE; > 790: No need for a new line between 789 and 791. src/hotspot/share/opto/vectorIntrinsics.cpp line 793: > 791: if (!arch_supports_vector(Op_AndV, num_elem, shuffle_bt, VecMaskNotUsed) || > 792: !arch_supports_vector(Op_Replicate, num_elem, shuffle_bt, VecMaskNotUsed)) { > 793: return false; You should emit a proper intrinsification failure message here. src/hotspot/share/opto/vectorIntrinsics.cpp line 805: > 803: const TypeVect* vt = TypeVect::make(shuffle_bt, num_elem); > 804: const Type* shuffle_type_bt = Type::get_const_basic_type(shuffle_bt); > 805: No need for a blank line here. src/hotspot/share/opto/vectorIntrinsics.cpp line 808: > 806: Node* mod_mask = gvn().makecon(TypeInt::make(num_elem-1)); > 807: Node* bcast_mod_mask = gvn().transform(VectorNode::scalar2vector(mod_mask, num_elem, shuffle_type_bt)); > 808: Remove redundant new line. 
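The quoted intrinsic code builds a mask of `num_elem-1` and checks for `Op_AndV` support, which suggests the wrap is done by AND-ing each shuffle index with the vector length minus one. A minimal scalar sketch of that idea follows, assuming a power-of-two vector length. The class and helper names are hypothetical, and this is plain Java rather than the Vector API or the actual JDK implementation.

```java
public class WrapIndexSketch {
    // Hypothetical helper: wrap a shuffle index into [0, vlen - 1] by masking.
    // Only valid when vlen is a positive power of two.
    public static int wrapIndex(int index, int vlen) {
        if (vlen <= 0 || (vlen & (vlen - 1)) != 0) {
            throw new IllegalArgumentException("vlen must be a positive power of two");
        }
        return index & (vlen - 1);
    }

    public static void main(String[] args) {
        int vlen = 16; // e.g. a 128-bit byte vector has 16 lanes
        for (int index : new int[] {0, 15, 16, 17, -1, -16}) {
            int wrapped = wrapIndex(index, vlen);
            // For power-of-two lengths, masking agrees with floorMod,
            // including for negative indexes.
            if (wrapped != Math.floorMod(index, vlen)) {
                throw new AssertionError("mismatch for index " + index);
            }
            System.out.println(index + " -> " + wrapped);
        }
    }
}
```

The point of the design is that a single AND never throws, whereas a bounds check introduces a branch and an exception path for every shuffle.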
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766272449 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766273205 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766273880 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766274718 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766275107 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766275345 From jbhateja at openjdk.org Thu Sep 19 07:32:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Sep 2024 07:32:38 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v3] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: <2pPJKmEHM24iStw8Xv2IQ08Xrp7Ag3P2_9yEzsS4nOw=.49f3554d-0917-40ff-8824-d8719c3d271f@github.com> On Wed, 18 Sep 2024 17:00:30 GMT, Sandhya Viswanathan wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
>> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Change method name Hi @sviswa7 , some comments, overall patch looks good to me. Best Regards, Jatin src/hotspot/share/opto/vectorIntrinsics.cpp line 2120: > 2118: > 2119: if (vector_klass == nullptr || elem_klass == nullptr || vlen == nullptr) { > 2120: return false; // dead code Why dead code in comments ? 
src/hotspot/share/opto/vectorIntrinsics.cpp line 2129: > 2127: NodeClassNames[argument(2)->Opcode()], > 2128: NodeClassNames[argument(3)->Opcode()]); > 2129: return false; // not enough info for intrinsification Please combine this with the above condition, to be consistent with the other inline expanders. src/hotspot/share/opto/vectorIntrinsics.cpp line 2141: > 2139: } > 2140: BasicType elem_bt = elem_type->basic_type(); > 2141: Remove new line. src/hotspot/share/opto/vectorIntrinsics.cpp line 2144: > 2142: int num_elem = vlen->get_con(); > 2143: if ((num_elem < 4) || !is_power_of_2(num_elem)) { > 2144: log_if_needed(" ** vlen < 4 or not power of two=%d", num_elem); Will num_elem < 4 not be handled by L2149, since we have an implementation limitation on supporting less than 32-bit shuffle / masks? src/hotspot/share/opto/vectorIntrinsics.cpp line 2171: > 2169: use_predicate = false; > 2170: if(!is_masked_op || > 2171: (!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, VecMaskNotUsed) || Suggestion: (!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, VecMaskUseLoad) || src/hotspot/share/opto/vectorIntrinsics.cpp line 2188: > 2186: > 2187: if (v1 == nullptr || v2 == nullptr) { > 2188: return false; // operand unboxing failed To be consistent with the other expanders, please emit a proper error for the unboxing failure, as on the following line. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L426 src/hotspot/share/opto/vectorIntrinsics.cpp line 2197: > 2195: mask = unbox_vector(argument(6), mbox_type, elem_bt, num_elem); > 2196: if (mask == nullptr) { > 2197: log_if_needed(" ** not supported: op=selectFrom vlen=%d etype=%s is_masked_op=1", The error should be an unboxing failure here. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20634#pullrequestreview-2314643808 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766277056 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766277739 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766278169 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766297640 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766292679 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766303620 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766304688 From mli at openjdk.org Thu Sep 19 10:32:50 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 19 Sep 2024 10:32:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:23:44 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. 
>> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > JVMCI support src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2529: > 2527: } > 2528: __ decode_klass_not_null(result); > 2529: } else { Could this if/else block be replaced with a simple call of load_klass(...)? 
src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 3522: > 3520: { > 3521: __ movptr(result, Address(obj, oopDesc::klass_offset_in_bytes())); > 3522: } Could this if/else block be replaced with a simple call of load_klass(...)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766587136 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766582255 From rcastanedalo at openjdk.org Thu Sep 19 11:02:50 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 19 Sep 2024 11:02:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 08:04:43 GMT, Roberto Castañeda Lozano wrote: > I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version. What do you think about this @rkennke? Do you agree on an alternative modeling of klass loading in C2 (without any reliance on `oopDesc::klass_offset_in_bytes()`) being a pre-condition for a future, non-experimental version of compact headers? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2360673405 From yzheng at openjdk.org Thu Sep 19 11:12:49 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 19 Sep 2024 11:12:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Thu, 19 Sep 2024 05:03:42 GMT, Stefan Karlsson wrote: >> Yes, I saw that patch.
I'm not sure I like the idea of cpu dependent code also doing the encoding. There were some C2 changes related to it that I didn't understand if that scheme required them. I don't see the down side to having the prototype header pre-encoded in the markWord. Seems simpler. > > We already have a cpu dependent code for both C1 and the interpreter. Adding cpu dependent code to C2 doesn't make it significantly worse. My latest patch also refactors the code so that C1, interpreter, and C2 all calls into shared functions in the macro assembler. Could you please point me to the C2 change? Is it going to be integrated in this PR? We in Graal have not yet adopted `Klass::_prototype_header` and will hold if you decide to get rid of it ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766642585 From stuefe at openjdk.org Thu Sep 19 11:39:50 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 19 Sep 2024 11:39:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 23:49:34 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> JVMCI support > > src/hotspot/share/oops/compressedKlass.cpp line 242: > >> 240: } else { >> 241: >> 242: // Traditional (non-compact) header mode) > > Extra ) Will fix ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766676702 From rkennke at openjdk.org Thu Sep 19 11:52:34 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 19 Sep 2024 11:52:34 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v22] In-Reply-To: References: Message-ID: <0mWQW50x4UNwdsRE94w3rZVGnppxQeR9fbe4eUrAGtM=.cca89805-ca82-4605-bc11-4f9ac53d2b90@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). 
> > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). 
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Simplify LIR_Assembler::emit_load_klass() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/9ad2e62f..b25a4b69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=20-21 Stats: 28 lines in 2 files changed: 0 ins; 26 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Thu Sep 19 11:52:34 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 19 Sep 2024 11:52:34 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 11:00:20 GMT, Roberto Castañeda Lozano wrote: > > I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version. > > What do you think about this @rkennke?
Do you agree on an alternative modeling of klass loading in C2 (without any reliance on `oopDesc::klass_offset_in_bytes()`) being a pre-condition for a future, non-experimental version of compact headers? Yes, that sounds like a good improvement! It'd also clean up C2 considerably - right now there are many places in C2 that rely on klass_offset_in_bytes(). Getting rid of them all would be great, but also seems like a major effort. Could you file an issue to track that future work? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2360756796 From rkennke at openjdk.org Thu Sep 19 11:52:37 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 19 Sep 2024 11:52:37 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 10:29:11 GMT, Hamlin Li wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> JVMCI support > > src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2529: > >> 2527: } >> 2528: __ decode_klass_not_null(result); >> 2529: } else { > > Could this if/else block be replaced with a simple call of load_klass(...)? Yes, will do. > src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 3522: > >> 3520: { >> 3521: __ movptr(result, Address(obj, oopDesc::klass_offset_in_bytes())); >> 3522: } > > Could this if/else block be replaced with a simple call of load_klass(...)? Yes, will do. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766689169 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766689004 From stuefe at openjdk.org Thu Sep 19 11:52:38 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 19 Sep 2024 11:52:38 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 05:44:42 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> JVMCI support > > src/hotspot/share/oops/compressedKlass.cpp line 231: > >> 229: // The reason is that we want to avoid, if possible, shifts larger than >> 230: // a cacheline size. >> 231: _base = addr; > > Why is this important? It lessens the cache effects of Klass hyperaligning. > src/hotspot/share/oops/compressedKlass.hpp line 261: > >> 259: } >> 260: >> 261: }; > > Missing blank line before `#endif` Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766684016 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766684491 From stuefe at openjdk.org Thu Sep 19 11:52:39 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 19 Sep 2024 11:52:39 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 11:43:12 GMT, Thomas Stuefe wrote: >> src/hotspot/share/oops/compressedKlass.cpp line 231: >> >>> 229: // The reason is that we want to avoid, if possible, shifts larger than >>> 230: // a cacheline size. >>> 231: _base = addr; >> >> Why is this important? > > It lessens the cache effects of Klass hyperaligning. Note that if we go with my KLUT proposal for post-Lilliput (the GC oop iteration improvements), this will not matter anymore and can be simplified to a fixed shift of 10. 
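For readers following the shift discussion, here is a minimal sketch of how a narrow Klass id could be decoded from an encoding base and a shift, and how much address range a given bit count and shift can cover. It is not HotSpot code: the class and method names are hypothetical, and the 22-bit id width and the shift of 10 are taken from the PR description and the comment above purely as illustrative inputs.

```java
public class NarrowKlassDecodeSketch {
    // Hypothetical helper (not a HotSpot API): decode a narrow Klass id
    // into an address as base + (id << shift).
    static long decode(long base, int shift, long narrowKlass) {
        return base + (narrowKlass << shift);
    }

    // Size in bytes of the address range reachable with `bits` id bits
    // and the given shift.
    static long encodingRange(int bits, int shift) {
        return 1L << (bits + shift);
    }

    public static void main(String[] args) {
        int shift = 10;               // fixed shift mentioned in the thread
        long narrowKlass = 3;         // arbitrary example id
        long base = 0x8_0000_0000L;   // arbitrary example base (assumption)

        // Zero-based encoding: the decode is just a shift, no base to add.
        System.out.println(Long.toHexString(decode(0L, shift, narrowKlass)));
        // Non-zero base: the base must be materialized and added.
        System.out.println(Long.toHexString(decode(base, shift, narrowKlass)));
        // 22 id bits with a shift of 10 span 2^32 bytes, i.e. 4 GiB.
        System.out.println(encodingRange(22, shift));
    }
}
```

With a zero base the addition disappears, which is the instruction and code-size saving discussed for zero-based encoding elsewhere in this thread.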
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766688756 From stuefe at openjdk.org Thu Sep 19 11:52:40 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 19 Sep 2024 11:52:40 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 23:53:28 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> JVMCI support > > src/hotspot/share/oops/compressedKlass.hpp line 175: > >> 173: // 5b) if CDS=off: Calls initialize() - here, we have more freedom and, if we want, can choose an encoding >> 174: // base that differs from the reservation base from step (4). That allows us, e.g., to later use >> 175: // zero-based encoding. > > Not for this but is there really any benefit for zero based encoding for klass ids? Yes, I think so. I think the SAP JIT people investigated this when doing the PPC ports. You save at least two instructions, and possibly more, per decode op. You save code size too, since you don't need to materialize the 64-bit base immediate. On x64 in particular, this can easily mean 11 fewer bytes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766681110 From stuefe at openjdk.org Thu Sep 19 11:52:42 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 19 Sep 2024 11:52:42 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v18] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 10:36:58 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 57 commits: >> >> - fix CompressedClassPointersEncodingScheme yet again for linux aarch64 >> - Fixes post-8340184 >> - Merge upstream up to and including 8340184 >> - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 >> - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java >> - Fix loop on aarch64 >> - clarify obscure assert in metasapce setup >> - Rework compressedklass encoding >> - remove stray debug output >> - Fixes post 8338526 >> - ... and 47 more: https://git.openjdk.org/jdk/compare/7849f252...28a26aed > > test/hotspot/gtest/metaspace/test_clms.cpp line 193: > >> 191: >> 192: { >> 193: // Nonclass arena allocation. > > The style in this source file isn't really up to scratch, especially *these* lines. Anyway, it's in the tests, so I'm OK with this being fixed in a follow up RFE. Okay, will fix ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766686807 From rkennke at openjdk.org Thu Sep 19 11:57:52 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 19 Sep 2024 11:57:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Thu, 19 Sep 2024 05:03:42 GMT, Stefan Karlsson wrote: >> Yes, I saw that patch. I'm not sure I like the idea of cpu dependent code also doing the encoding. There were some C2 changes related to it that I didn't understand if that scheme required them. I don't see the down side to having the prototype header pre-encoded in the markWord. Seems simpler. > > We already have a cpu dependent code for both C1 and the interpreter. Adding cpu dependent code to C2 doesn't make it significantly worse. My latest patch also refactors the code so that C1, interpreter, and C2 all calls into shared functions in the macro assembler. 
We haven't decided whether or not we will get rid of `Klass::_prototype_header` before integrating this PR. @stefank could point you to a WIP branch, if that's helpful. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766697849 From rkennke at openjdk.org Thu Sep 19 12:08:46 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 19 Sep 2024 12:08:46 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v23] In-Reply-To: References: Message-ID: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word.
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... 
Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4 - review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/b25a4b69..0d8a9236 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=21-22 Stats: 10 lines in 3 files changed: 1 ins; 4 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From coleenp at openjdk.org Thu Sep 19 12:38:48 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 19 Sep 2024 12:38:48 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 11:47:21 GMT, Thomas Stuefe wrote: >> It lessens the cache effects of Klass hyperaligning. > > Note that if we go with my KLUT proposal for post-Lilliput (the GC oop iteration improvements), this will not matter anymore and can be simplified to a fixed shift of 10. Yes, please, not having this code would be really nice. This is difficult code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766753081 From rcastanedalo at openjdk.org Thu Sep 19 13:12:49 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 19 Sep 2024 13:12:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: <0gatRiYQ3frDnMftpb_WaDolUwcYvBFh5hAp6jY0dzQ=.21d6518e-7217-477e-954f-69fd52eb713e@github.com> On Thu, 19 Sep 2024 11:42:04 GMT, Roman Kennke wrote: > > > I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version. > > > > > > What do you think about this @rkennke? Do you agree on an alternative modeling of klass loading in C2 (without any reliance on `oopDesc::klass_offset_in_bytes()`) being a pre-condition for a future, non-experimental version of compact headers? > > Yes, that sounds like a good improvement! It'd also clean up C2 considerably - right now there are many places in C2 that rely on klass_offset_in_bytes(). Getting rid of them all would be great, but also seems like a major effort. Could you file an issue to track that future work? Done: https://bugs.openjdk.org/browse/JDK-8340453.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2360945827 From stefank at openjdk.org Thu Sep 19 13:12:50 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 19 Sep 2024 13:12:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 12:35:30 GMT, Coleen Phillimore wrote: >> Note that if we go with my KLUT proposal for post-Lilliput (the GC oop iteration improvements), this will not matter anymore and can be simplified to a fixed shift of 10. > > Yes, please, not having this code would be really nice. This is difficult code. Do you see any effects of this in anything other than specially crafted microbenchmarks? I wonder if it would be good enough to hard-code this to be 10 for the first integration of Lilliput.
I will do some benchmarks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766848371 From stefank at openjdk.org Thu Sep 19 14:25:52 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 19 Sep 2024 14:25:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com> On Thu, 19 Sep 2024 11:54:50 GMT, Roman Kennke wrote: >> We already have a cpu dependent code for both C1 and the interpreter. Adding cpu dependent code to C2 doesn't make it significantly worse. My latest patch also refactors the code so that C1, interpreter, and C2 all calls into shared functions in the macro assembler. > > We haven't decided whether or not we will get rid of `Klass::_prototype_header` before integrating this PR. @stefank could point you to a WIP branch, if that's helpful. This is my current work-in-progress code: https://github.com/stefank/jdk/compare/pull/20677...stefank:jdk:lilliput_remove_prototype_header_wip_2 I've made some large rewrites and am currently running it through functional testing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766934571 From mli at openjdk.org Thu Sep 19 15:03:53 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 19 Sep 2024 15:03:53 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v23] In-Reply-To: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> References: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> Message-ID: On Thu, 19 Sep 2024 12:08:46 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). 
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4 > - review feedback In both aarch64.ad and x86_64.ad, might `MachUEPNode::format` need a corresponding change? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2361266175 From rcastanedalo at openjdk.org Thu Sep 19 17:23:50 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 19 Sep 2024 17:23:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> Message-ID: On Wed, 18 Sep 2024 12:08:46 GMT, Roman Kennke wrote: >> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2576: >> >>> 2574: } else { >>> 2575: lea(dst, Address(obj, index, Address::lsl(scale))); >>> 2576: ldr(dst, Address(dst, offset)); >> >> Do you have a reproducer (or, better yet, a test case) that exercises this case? I ran Oracle's internal CI tiers 1-5 and could never hit it. Could this happen for x64 as well? > > AFAIK, this happens only when using compressed oops with a heap-base in r27. When running with that setting, we would get addresses like r27[nklass] or r27[nklass]+offset, both with scale=8. You would need large heaps, perhaps >4GB, to get this coops setting.
The problem with aarch64 is that we can't have an address like r27[nklass]+offset, that's why we need to lea the r27[nklass] part first. > Yes, this also happens on x86, but x86 supports rX[nklass]+offset addressing. Thanks @rkennke, I tried running test tiers 1-3 using different compressed OOPs configurations but could not reach this code, unfortunately. Could you provide a reproducer? The reason I am particularly interested is because I'd like to find whether there could be any problematic interaction with C2's implicit null check optimization. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1767315114 From duke at openjdk.org Thu Sep 19 21:15:11 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 19 Sep 2024 21:15:11 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v12] In-Reply-To: References: Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: fix is_intrinsic_supported to work properly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/aa163896..5da2754a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=10-11 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From sviswanathan at openjdk.org Thu Sep 19 21:43:01 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 19 Sep 2024 21:43:01 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API 
methods to perform wrapIndexes instead of checkIndexes [v4] In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. > 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu 
%xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Implement review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20634/files - new: https://git.openjdk.org/jdk/pull/20634/files/87e103ee..f8e67fb3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=02-03 Stats: 27 lines in 1 file changed: 9 ins; 8 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/20634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20634/head:pull/20634 PR: https://git.openjdk.org/jdk/pull/20634 From sviswanathan at openjdk.org Thu Sep 19 21:43:02 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 19 Sep 2024 21:43:02 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v3] In-Reply-To: <2pPJKmEHM24iStw8Xv2IQ08Xrp7Ag3P2_9yEzsS4nOw=.49f3554d-0917-40ff-8824-d8719c3d271f@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> <2pPJKmEHM24iStw8Xv2IQ08Xrp7Ag3P2_9yEzsS4nOw=.49f3554d-0917-40ff-8824-d8719c3d271f@github.com> Message-ID: On Thu, 19 Sep 2024 07:29:11 GMT, Jatin Bhateja wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> Change method name > > Hi @sviswa7 , some comments, overall patch looks good to me. > > Best Regards, > Jatin Thanks a lot @jatin-bhateja. I have implemented your review comments. 
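For readers following this thread, the difference between checking and wrapping shuffle indexes can be sketched in scalar Java. This is only an illustration, not the JDK implementation: the class and method names (`WrapIndexSketch`, `checkIndex`, `wrapIndex`, `selectFromWrapped`) are hypothetical, the vector length is assumed to be a power of two, and the old check is simplified to "reject anything outside [0, vlen)".

```java
import java.util.Arrays;

// Hypothetical scalar sketch; none of these names exist in the JDK.
class WrapIndexSketch {
    // Checking behaviour (simplified): an out-of-range index throws.
    static int checkIndex(int index, int vlen) {
        if (index < 0 || index >= vlen) {
            throw new IndexOutOfBoundsException(
                "shuffle index " + index + " out of range for vlen " + vlen);
        }
        return index;
    }

    // Wrapping behaviour: fold any index into [0, vlen).
    // Assumes vlen is a power of two, so a bit mask is sufficient.
    static int wrapIndex(int index, int vlen) {
        return index & (vlen - 1);
    }

    // Scalar emulation of index.selectFrom(src) with wrapped indexes:
    // lane i of the result is src[wrap(index[i])].
    static byte[] selectFromWrapped(byte[] index, byte[] src) {
        int vlen = src.length;
        byte[] res = new byte[vlen];
        for (int i = 0; i < vlen; i++) {
            res[i] = src[wrapIndex(index[i], vlen)];
        }
        return res;
    }

    public static void main(String[] args) {
        byte[] src = {10, 11, 12, 13};
        byte[] index = {0, 5, -1, 2};   // 5 and -1 are out of range for vlen = 4
        // Wrapping never throws: 5 -> 1, -1 -> 3.
        System.out.println(Arrays.toString(selectFromWrapped(index, src)));
    }
}
```

Because the wrapped form has no exceptional path, a loop built on it contains no branch to an exception handler, which is presumably what lets the compiler emit the straight-line `vpshufb` sequence shown in the PR description.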
> src/hotspot/share/opto/vectorIntrinsics.cpp line 772: > >> 770: >> 771: if (elem_klass == nullptr || shuffle_klass == nullptr || shuffle->is_top() || vlen == nullptr) { >> 772: return false; // dead code > > Why dead code in comment ? this is a failed intrinsification condition. Modified comment. > src/hotspot/share/opto/vectorIntrinsics.cpp line 776: > >> 774: if (!vlen->is_con() || shuffle_klass->const_oop() == nullptr) { >> 775: return false; // not enough info for intrinsification >> 776: } > > Why don't you club it with above conditions to be consistent with other inline expanders ? Done > src/hotspot/share/opto/vectorIntrinsics.cpp line 2120: > >> 2118: >> 2119: if (vector_klass == nullptr || elem_klass == nullptr || vlen == nullptr) { >> 2120: return false; // dead code > > Why dead code in comments ? Modified comment. > src/hotspot/share/opto/vectorIntrinsics.cpp line 2129: > >> 2127: NodeClassNames[argument(2)->Opcode()], >> 2128: NodeClassNames[argument(3)->Opcode()]); >> 2129: return false; // not enough info for intrinsification > > Please club this with above condition to be consistent with other inline expanders. done > src/hotspot/share/opto/vectorIntrinsics.cpp line 2144: > >> 2142: int num_elem = vlen->get_con(); >> 2143: if ((num_elem < 4) || !is_power_of_2(num_elem)) { >> 2144: log_if_needed(" ** vlen < 4 or not power of two=%d", num_elem); > > Will num_elem < 4 not be handled by L2149 since we have an implementation limitation to support less than 32-bit shuffle / masks. Yes that should handle it. > src/hotspot/share/opto/vectorIntrinsics.cpp line 2171: > >> 2169: use_predicate = false; >> 2170: if(!is_masked_op || >> 2171: (!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, VecMaskNotUsed) || > > Suggestion: > > (!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, VecMaskUseLoad) || Here it should be VecMaskNotUsed as this case is using blend to emulate masking. The VecMaskUseLoad case is checked at line 2168.
> src/hotspot/share/opto/vectorIntrinsics.cpp line 2188: > >> 2186: >> 2187: if (v1 == nullptr || v2 == nullptr) { >> 2188: return false; // operand unboxing failed > > To be consistent with other expanders please emit proper error for unboxing failure like on following line. > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L426 done > src/hotspot/share/opto/vectorIntrinsics.cpp line 2197: > >> 2195: mask = unbox_vector(argument(6), mbox_type, elem_bt, num_elem); >> 2196: if (mask == nullptr) { >> 2197: log_if_needed(" ** not supported: op=selectFrom vlen=%d etype=%s is_masked_op=1", > > Error should an unboxing failure here. done ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2362249672 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767601917 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767602096 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767605028 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767605213 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767607670 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767610833 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767615559 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767617255 From sviswanathan at openjdk.org Thu Sep 19 21:45:36 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 19 Sep 2024 21:45:36 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: <_Q0HCE6Lc7LZY8Sc5XzQvLHg_WdeCDOAGZgMOeEWK4M=.d28c8b11-ee52-4551-92b8-357c04a4d5ef@github.com> On Wed, 18 Sep 2024 12:23:48 GMT, Emanuel Peter wrote: >> Sandhya Viswanathan has 
updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > I'm a bit confused by the name `shuffleWrapIndexes` and `inline_vector_shuffle_wrap_indexes`. > > Are you **shuffling wrap-indexes**? I don't know what that would even mean. I think you should name it `wrapShuffleIndexes`. Or is there any naming convention in the VectorAPI that prevents this? Thanks a lot @eme64 for the review. I have implemented your review comment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2362253398 From sviswanathan at openjdk.org Fri Sep 20 00:12:48 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 20 Sep 2024 00:12:48 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v12] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 21:15:11 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > fix is_intrinsic_supported to work properly The PR looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20657#pullrequestreview-2316948493 From rkennke at openjdk.org Fri Sep 20 12:33:50 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 20 Sep 2024 12:33:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> Message-ID: On Thu, 19 Sep 2024 17:20:36 GMT, Roberto Castañeda Lozano wrote: >> AFAIK, this happens only when using compressed oops with a heap-base in r27.
When running with that setting, we would get addresses like r27[nklass] or r27[nklass]+offset, both with scale=8. You would need large heaps, perhaps >4GB, to get this coops setting. The problem with aarch64 is that we can't have an address like r27[nklass]+offset, that's why we need to lea the r27[nklass] part first. >> Yes, this also happens on x86, but x86 supports rX[nklass]+offset addressing. > > Thanks @rkennke, I tried running test tiers 1-3 using different compressed OOPs configurations but could not reach this code, unfortunately. Could you provide a reproducer? The reason I am particularly interested is because I'd like to find whether there could be any problematic interaction with C2's implicit null check optimization. I tried to reproduce for a few hours now using a custom testcase, with no success. I am pretty sure that this can happen, that is why I added this code. Originally I had an assert there asserting that index is not used. I do remember that this happens very rarely, and I don't remember the exact condition. Looking at the possible operands in opclass memory, I think this can only happen when we load an nKlass from an address of the form [rX, rY], i.e. the address in rX indexed by rY. This is an odd thing to happen for loadNKlass, I think, because rY should always be klass_offset_in_bytes. Maybe this is possible when we get odd address merges where we get a PhiNode as the offset/index? I don't know. I agree, this *might* lead to surprising problems with implicit null-checking, if it is expected that the first instruction in loadNKlass provokes the SIGSEGV. A way around this would be to declare an opclass that is a subset of 'memory' that excludes all operands with index, and match on that. I think this would force the lea as a separate instruction and ensure that we never see such a thing in loadNKlass. However, I would not feel very confident to do that without a reproducer. Let me dig a little further. 
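As a reading aid for the addressing discussion above, the following sketch compares the two ways of forming the effective address. It is plain arithmetic under stated assumptions, not HotSpot code: the class and method names are hypothetical, the base, index and offset values in `main` are made up, and only the scale of 8 comes from the thread.

```java
// Hypothetical illustration of base + scaled index + offset addressing.
class KlassAddressSketch {
    static final int SCALE_SHIFT = 3; // scale=8, as mentioned in the thread

    // x86-style: a single addressing mode covers base + index*scale + offset.
    static long oneStep(long heapBase, long index, long offset) {
        return heapBase + (index << SCALE_SHIFT) + offset;
    }

    // aarch64-style: base + scaled index is materialized first (the "lea"),
    // and the load then uses that temporary plus an immediate offset.
    static long twoStep(long heapBase, long index, long offset) {
        long tmp = heapBase + (index << SCALE_SHIFT); // separate lea instruction
        return tmp + offset;                          // load from [tmp, #offset]
    }

    public static void main(String[] args) {
        long heapBase = 0x8_0000_0000L; // made-up heap base (the r27 value)
        long index = 0x1234L;           // made-up index register value
        long offset = 8;                // made-up field offset
        long a = oneStep(heapBase, index, offset);
        long b = twoStep(heapBase, index, offset);
        System.out.println("same effective address: " + (a == b));
    }
}
```

Both forms reach the same address; the difference relevant to the implicit null check question is that in the two-step form the memory access is no longer the first instruction emitted for the load.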
For reference, here is my unsuccessful reproducer: https://gist.github.com/rkennke/8a57610d74fcde07a9390f268ec35738 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768538965 From haosun at openjdk.org Fri Sep 20 14:24:43 2024 From: haosun at openjdk.org (Hao Sun) Date: Fri, 20 Sep 2024 14:24:43 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v3] In-Reply-To: References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> <07DqhAfjMD9qfeno10HOAuNBeiIul86acqTMpE6YtaY=.2569accb-c0ab-470f-b348-5894831be5d5@github.com> Message-ID: On Mon, 19 Aug 2024 08:58:53 GMT, Andrew Dinn wrote: >> Hi Andrew, I find that we need following add-on change for riscv: >> >> >> diff --git a/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp b/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp >> index dc89e489b24..bed24e442e8 100644 >> --- a/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp >> +++ b/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp >> @@ -66,6 +66,12 @@ >> >> #define __ masm-> >> >> +#ifdef PRODUCT >> +#define BLOCK_COMMENT(str) /* nothing */ >> +#else >> +#define BLOCK_COMMENT(str) __ block_comment(str) >> +#endif >> + >> const int StackAlignmentInSlots = StackAlignmentInBytes / VMRegImpl::stack_slot_size; >> >> class RegisterSaver { >> @@ -2742,7 +2748,7 @@ static void jfr_epilogue(MacroAssembler* masm) { >> // For c2: c_rarg0 is junk, call to runtime to write a checkpoint. >> // It returns a jobject handle to the event writer. >> // The handle is dereferenced and the return value is the event writer oop. >> -static RuntimeStub* SharedRuntime::generate_jfr_write_checkpoint() { >> +RuntimeStub* SharedRuntime::generate_jfr_write_checkpoint() { >> enum layout { >> fp_off, >> fp_off2, >> @@ -2780,7 +2786,7 @@ static RuntimeStub* SharedRuntime::generate_jfr_write_checkpoint() { >> } >> >> // For c2: call to return a leased buffer. 
>> -static RuntimeStub* SharedRuntime::generate_jfr_return_lease() { >> +RuntimeStub* SharedRuntime::generate_jfr_return_lease() { >> enum layout { >> fp_off, >> fp_off2, > > @RealFYang Thanks! Hi @adinn , I encountered Client build failure on AArch64 after this commit. Could you help take a look at it when you have spare time? Thanks. Here shows the configuration ==================================================== The existing configuration has been successfully updated in /tmp/test123/build-release using configure arguments '--with-debug-level=release --with-version-opt=git-fe80618bf3f --with-jvm-variants=client'. Configuration summary: * Name: /tmp/test123/build-release * Debug level: release * HS debug level: product * JVM variants: client * JVM features: client: 'cds compiler1 epsilongc g1gc jfr jni-check jvmti management parallelgc serialgc services shenandoahgc vm-structs zgc' * OpenJDK target: OS: linux, CPU architecture: aarch64, address length: 64 * Version string: 24-internal-git-fe80618bf3f (24-internal) * Source date: 1726841146 (2024-09-20T14:05:46Z) Tools summary: * Boot JDK: openjdk version "22.0.2" 2024-07-16 OpenJDK Runtime Environment (build 22.0.2+9-70) OpenJDK 64-Bit Server VM (build 22.0.2+9-70, mixed mode, sharing) (at /usr/lib/jvm/jdk-22.0.2) * Toolchain: gcc (GNU Compiler Collection) * C Compiler: Version 13.2.0 (at /usr/bin/gcc) * C++ Compiler: Version 13.2.0 (at /usr/bin/g++) Build performance summary: * Build jobs: 72 * Memory limit: 587068 MB And here shows the error msg: === Output from failing command(s) repeated here === * For target hotspot_variant-client_libjvm_objs_sharedRuntime_aarch64.o: /tmp/test123/jdk-src/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp: In static member function ‘static RuntimeStub* SharedRuntime::generate_throw_exception(const char*, address)’: /tmp/test123/jdk-src/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp:2809:3: error: ‘TraceTime’ was not declared in this scope; did you mean ‘traceid’?
2809 | TraceTime timer(timer_msg, TRACETIME_LOG(Info, startuptime)); | ^~~~~~~~~ | traceid * All command lines available in /tmp/test123/build-release/make-support/failure-logs. === End of repeated output === ------------- PR Comment: https://git.openjdk.org/jdk/pull/20566#issuecomment-2363857168 From rkennke at openjdk.org Fri Sep 20 15:29:52 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 20 Sep 2024 15:29:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> Message-ID: On Fri, 20 Sep 2024 12:31:18 GMT, Roman Kennke wrote: >> Thanks @rkennke, I tried running test tiers 1-3 using different compressed OOPs configurations but could not reach this code, unfortunately. Could you provide a reproducer? The reason I am particularly interested is because I'd like to find whether there could be any problematic interaction with C2's implicit null check optimization. > > I tried to reproduce for a few hours now using a custom testcase, with no success. > I am pretty sure that this can happen, that is why I added this code. Originally I had an assert there asserting that index is not used. I do remember that this happens very rarely, and I don't remember the exact condition. Looking at the possible operands in opclass memory, I think this can only happen when we load an nKlass from an address of the form [rX, rY], i.e. the address in rX indexed by rY. This is an odd thing to happen for loadNKlass, I think, because rY should always be klass_offset_in_bytes. Maybe this is possible when we get odd address merges where we get a PhiNode as the offset/index? I don't know. > I agree, this *might* lead to surprising problems with implicit null-checking, if it is expected that the first instruction in loadNKlass provokes the SIGSEGV. 
A way around this would be to declare an opclass that is a subset of 'memory' that excludes all operands with index, and match on that. I think this would force the lea as a separate instruction and ensure that we never see such a thing in loadNKlass. However, I would not feel very confident to do that without a reproducer. Let me dig a little further. > > For reference, here is my unsuccessful reproducer: https://gist.github.com/rkennke/8a57610d74fcde07a9390f268ec35738 Something like this is what I have in mind. It seems to pass tier1 tests. I still haven't managed to reproduce the path that requires an index register, though. https://github.com/rkennke/jdk/commit/2c4a7877e4ef94017c8155578d8cfc9342441656 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768816377 From jbhateja at openjdk.org Fri Sep 20 17:04:41 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 20 Sep 2024 17:04:41 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v4] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Thu, 19 Sep 2024 21:43:01 GMT, Sandhya Viswanathan wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
>> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Implement review comments Thanks @sviswa7 , LGTM ------------- Marked as reviewed by jbhateja (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20634#pullrequestreview-2318802240 From matsaave at openjdk.org Fri Sep 20 17:21:51 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 20 Sep 2024 17:21:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v23] In-Reply-To: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> References: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> Message-ID: On Thu, 19 Sep 2024 12:08:46 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4 > - review feedback CDS changes look good! Have two style comments but otherwise this makes sense ------------- Marked as reviewed by matsaave (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2318793061 From matsaave at openjdk.org Fri Sep 20 17:21:53 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 20 Sep 2024 17:21:53 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding src/hotspot/share/cds/archiveBuilder.cpp line 677: > 675: // Allocate space for the future InstanceKlass with proper alignment > 676: const size_t alignment = > 677: #ifdef _LP64 I think the text alignment here is a bit confusing. Should 678 and 682 be at the same indentation? src/hotspot/share/cds/archiveUtils.cpp line 348: > 346: old_tag = (int)(intptr_t)nextPtr(); > 347: // do_int(&old_tag); > 348: assert(tag == old_tag, "tag doesn't match (%d, expected %d)", old_tag, tag); Is this assert message change a leftover from debugging or is it meant to be this way? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768946883 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768923643 From coleenp at openjdk.org Fri Sep 20 18:19:51 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 18:19:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v23] In-Reply-To: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> References: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> Message-ID: On Thu, 19 Sep 2024 12:08:46 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4 > - review feedback I mostly reviewed the metaspace changes and suggest upstreaming the MetaBlock refactoring ahead of the rest of this patch. Only one comment about the interpreter code (affecting 4 locations). src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3636: > 3634: } else { > 3635: __ sub(r3, r3, sizeof(oopDesc)); > 3636: } This looks like something that could be buggy if we're not careful. We had a pass where we cleaned up sizeof(oopDesc) once. Can this be in oopDesc as (this is not header_size() anymore?) some function with the right name? 
src/hotspot/cpu/x86/templateTable_x86.cpp line 4121: > 4119: __ movptr(Address(rax, rdx, Address::times_8, sizeof(oopDesc) - 1*oopSize), rcx); > 4120: NOT_LP64(__ movptr(Address(rax, rdx, Address::times_8, sizeof(oopDesc) - 2*oopSize), rcx)); > 4121: } For this and above, I'd rather oopDesc encapsulate the header_size for UseCompactObjectHeaders condition in C++ code, and never see sizeof(oopDesc). src/hotspot/share/memory/metaspace.cpp line 799: > 797: > 798: // Set up compressed class pointer encoding. > 799: // In CDS=off mode, we give the JVM some leeway to choose a favorable base/shift combination. I don't know why this comment is here. Seems out of place. src/hotspot/share/memory/metaspace/freeBlocks.cpp line 57: > 55: } > 56: } > 57: return p; This answers my prior question. The waste is added back to the block list for non-class-arenas as well. src/hotspot/share/memory/metaspace/metablock.hpp line 74: > 72: #define METABLOCKFORMATARGS(__block__) p2i((__block__).base()), (__block__).word_size() > 73: > 74: } // namespace metaspace I am wondering if some of these metaspace changes, that is, the addition of MetaBlock could be upstreamed ahead of the CompactObjectHeaders. Some is refactoring so that you can use the wastage to allocate into class-arena but a lot of this seems neutral to compact object headers, and would reduce this patch and allow different people to focus on just this. src/hotspot/share/memory/metaspace/metaspaceArena.cpp line 470: > 468: > 469: // Returns true if the given block is contained in this arena > 470: // Returns true if the given block is contained in this arena Here's the same comment twice. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2318539468 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768775590 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768781956 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768979540 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769008437 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769012842 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769015008 From coleenp at openjdk.org Fri Sep 20 18:19:52 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 18:19:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: <5Mh0IteE4Z7zDseCNMmYKOtyCMTQe0iuAp70kJf8pS0=.8215bcce-6387-46fb-97a0-d1e6a9498b61@github.com> On Wed, 18 Sep 2024 13:57:29 GMT, Thomas Stuefe wrote: >> src/hotspot/share/memory/classLoaderMetaspace.cpp line 87: >> >>> 85: klass_alignment_words, >>> 86: "class arena"); >>> 87: } >> >> As per my comment in the header file, change the code to this: >> >> ```c++ >> if (class_context != nullptr) { >> // ... Same as in PR >> } else { >> _class_space_arena = _non_class_space_arena; >> } > > Rather not, see reasoning under https://github.com/openjdk/jdk/pull/20677/files#r1754330432 Yes, I'd rather _class_space_arena be nullptr if not used. >> src/hotspot/share/memory/classLoaderMetaspace.cpp line 115: >> >>> 113: if (wastage.is_nonempty()) { >>> 114: non_class_space_arena()->deallocate(wastage); >>> 115: } >> >> This code reads a bit strangely. I understand *what* it tries to do. It tries to give back any wasted memory from either the class space arena *or* the non class space arena to the non class space arena's freelist. I assume that we do this since any wastage is presumably too small to be used by our new 22-bit class pointers. 
However, this context will be lost on future readers. It should have at least a comment in the `if (wastage.is_nonempty())` clause explaining what we expect should happen and why. For example: >> >> ```c++ >> // Any wasted memory is presumably too small for any class. >> // Therefore, give it back to the non-class space arena's free list. > > Yes. Some background: > > - wastage can only occur for larger Klass* alignments (aka class space arena alignment property), so only for +COH (note to self, maybe assert) > - wastage is, by definition, not aligned to the required Klass* alignment, so it cannot be reused. Yes, its probably also too small > > Yes, I will write a better comment. Yes, this definitely needs a comment why since this is how we allocate small chunks of wasted because of hyper-aligning Klasses in class space. Line 111 is somewhat surprising though. I didn't expect there to be wastage from allocating to non-class-metaspace. The unnerving bit of this is that CompressedKlassPointers::is_encodable() is true for memory allocated here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768897591 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768966812 From coleenp at openjdk.org Fri Sep 20 18:19:53 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 18:19:53 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: <5Mh0IteE4Z7zDseCNMmYKOtyCMTQe0iuAp70kJf8pS0=.8215bcce-6387-46fb-97a0-d1e6a9498b61@github.com> References: <5Mh0IteE4Z7zDseCNMmYKOtyCMTQe0iuAp70kJf8pS0=.8215bcce-6387-46fb-97a0-d1e6a9498b61@github.com> Message-ID: On Fri, 20 Sep 2024 17:34:09 GMT, Coleen Phillimore wrote: >> Yes. 
Some background: >> >> - wastage can only occur for larger Klass* alignments (aka class space arena alignment property), so only for +COH (note to self, maybe assert) >> - wastage is, by definition, not aligned to the required Klass* alignment, so it cannot be reused. Yes, its probably also too small >> >> Yes, I will write a better comment. > > Yes, this definitely needs a comment why since this is how we allocate small chunks of wasted because of hyper-aligning Klasses in class space. Line 111 is somewhat surprising though. I didn't expect there to be wastage from allocating to non-class-metaspace. > > The unnerving bit of this is that CompressedKlassPointers::is_encodable() is true for memory allocated here. I think this should also assert or be conditionalized on UseCompactObjectHeaders. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768972448 From coleenp at openjdk.org Fri Sep 20 19:02:49 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 19:02:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Wed, 18 Sep 2024 12:54:34 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/markWord.inline.hpp line 90: >> >>> 88: ShouldNotReachHere(); >>> 89: return markWord(); >>> 90: #endif >> >> Is the ifdef _LP64 necessary, since UseCompactObjectHeaders should always be false for 32 bits? > > Kind of. The problem is that klass_shift is larger than 31, and shifting with it would thus be UB and generate a compiler warning. I opted to simply not compile any of that code in 32bit builds. We could also define klass_shift differently on 32bit. > Long-term (maybe with Lilliput2/4-byte-headers?) it would be nice to consolidate the header layout between 32 and 64 bit builds and not make any distinction anywhere. E.g. define markWord (or objectHeader?)
in a single way, and use that to extract all the relevant stuff. It's not totally unlikely that we deprecate 32-bit builds before that can happen, though. Ok. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769069007 From coleenp at openjdk.org Fri Sep 20 19:09:50 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 19:09:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com> References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com> Message-ID: On Thu, 19 Sep 2024 14:22:51 GMT, Stefan Karlsson wrote: >> We haven't decided whether or not we will get rid of ```Klass::_prototype_header``` before integrating this PR. @stefank could point you to a WIP branch, if that's helpful. > > This is my current work-in-progress code: > https://github.com/stefank/jdk/compare/pull/20677...stefank:jdk:lilliput_remove_prototype_header_wip_2 > > I've made some large rewrites and I'm currently running it through functional testing. The refactoring is better in this last version with encode_and_store_compact_object_header, although some comments around the c2 version would be good. Still don't know what the c2 version does. Someone else should review that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769075714 From shade at openjdk.org Sat Sep 21 06:07:06 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Sat, 21 Sep 2024 06:07:06 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized Message-ID: See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer.
We have found this gap by inspecting the code, while chasing a production bug. In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. Current patch makes a seqcst write, which is stronger than strictly necessary. I think it is okay to be extra paranoid on rarely-executed class initialization path. Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. Additional testing: - [x] Linux x86_64 server fastdebug, `all` - [x] Linux AArch64 server fastdebug, `all` - [x] GHA to test platform buildability + adhoc platform cross-compilation ------------- Commit messages: - Initial version Changes: https://git.openjdk.org/jdk/pull/21110/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21110&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338379 Stats: 27 lines in 17 files changed: 12 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/21110.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21110/head:pull/21110 PR: https://git.openjdk.org/jdk/pull/21110 From dnsimon at openjdk.org Sun Sep 22 10:47:35 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Sun, 22 Sep 2024 10:47:35 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 16:53:40 GMT, Tomáš
Zezula wrote: > Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. > > Expected behavior: > > With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. > With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. src/hotspot/share/jvmci/jvmci_globals.cpp line 84: > 82: if (EnableJVMCI) { > 83: if (FLAG_IS_DEFAULT(UseJVMCINativeLibrary) && !UseJVMCINativeLibrary) { > 84: char path[JVM_MAXPATHLEN]; This check for enabling `UseJVMCINativeLibrary` should really be: if (UseJVMCICompiler) { if (FLAG_IS_DEFAULT(UseJVMCINativeLibrary) && !UseJVMCINativeLibrary) { - char path[JVM_MAXPATHLEN]; - if (os::dll_locate_lib(path, sizeof(path), Arguments::get_dll_dir(), JVMCI_SHARED_LIBRARY_NAME)) { + if (JVMCI::shared_library_exists()) { // If a JVMCI native library is present, but I will address that as part of [JDK-8340576](https://bugs.openjdk.org/browse/JDK-8340576). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21069#discussion_r1770520057 From dnsimon at openjdk.org Sun Sep 22 11:49:35 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Sun, 22 Sep 2024 11:49:35 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 16:53:40 GMT, Tomáš Zezula wrote: > Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler.
While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. > > Expected behavior: > > With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. > With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. src/hotspot/share/jvmci/jvmci_globals.cpp line 82: > 80: CHECK_NOT_SET(LibJVMCICompilerThreadHidden, UseJVMCICompiler) > 81: > 82: if (EnableJVMCI) { This needs to be `EnableJVMCI || UseJVMCICompiler` (since deriving `EnableJVMCI` from `UseJVMCICompiler` is only done [below](https://github.com/openjdk/jdk/blob/ab06a878f888827026424530781f0af414a8a611/src/hotspot/share/jvmci/jvmci_globals.cpp#L96)). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21069#discussion_r1770532660 From stuefe at openjdk.org Sun Sep 22 12:01:51 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 22 Sep 2024 12:01:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 16:56:58 GMT, Matias Saavedra Silva wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/cds/archiveUtils.cpp line 348: > >> 346: old_tag = (int)(intptr_t)nextPtr(); >> 347: // do_int(&old_tag); >> 348: assert(tag == old_tag, "tag doesn't match (%d, expected %d)", old_tag, tag); > > Is this assert message change a leftover from debugging or is it meant to be this way? Its a leftover, but otoh it does not hurt. 
I found myself re-adding it several times to analyze CDS issues during development, so I decided to just leave it in. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1770536320 From dholmes at openjdk.org Mon Sep 23 01:51:41 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 23 Sep 2024 01:51:41 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 14:02:51 GMT, Aleksey Shipilev wrote: > See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. > > In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. Current patch makes a seqcst write, which is stronger than strictly necessary. I think it is okay to be extra paranoid on rarely-executed class initialization path. > > Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). > > I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` > - [x] GHA to test platform buildability + adhoc platform cross-compilation Seems far more extensive than what was discussed. 
Code that takes the lock-free path to check `is_initialized` is what I thought we agreed needed the acquire/release, not every read of the state variable. This code will be executed a lot and in 99.99% of cases the memory barriers are not needed. src/hotspot/share/oops/instanceKlass.cpp line 4099: > 4097: #endif > 4098: assert(_init_thread == nullptr, "should be cleared before state change"); > 4099: Atomic::release_store_fence(&_init_state, state); Why not just a release_store ?? Why do we need the trailing fence? ------------- PR Review: https://git.openjdk.org/jdk/pull/21110#pullrequestreview-2321028771 PR Review Comment: https://git.openjdk.org/jdk/pull/21110#discussion_r1770709316 From shade at openjdk.org Mon Sep 23 07:17:50 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Sep 2024 07:17:50 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: > See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. > > In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. Current patch makes a seqcst write, which is stronger than strictly necessary. I think it is okay to be extra paranoid on rarely-executed class initialization path. > > Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops).
> > I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` > - [x] GHA to test platform buildability + adhoc platform cross-compilation Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Relax to just a release ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21110/files - new: https://git.openjdk.org/jdk/pull/21110/files/66dc20b6..179d8aa1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21110&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21110&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21110.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21110/head:pull/21110 PR: https://git.openjdk.org/jdk/pull/21110 From shade at openjdk.org Mon Sep 23 07:17:50 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Sep 2024 07:17:50 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: <7TWC-8mou61HHIklbJ9Ox2XpSNQBD-QfBKXWyDhd3C8=.431c5d1b-448b-4676-8a0d-f23906480546@github.com> On Mon, 23 Sep 2024 01:46:12 GMT, David Holmes wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Relax to just a release > > src/hotspot/share/oops/instanceKlass.cpp line 4099: > >> 4097: #endif >> 4098: assert(_init_thread == nullptr, "should be cleared before state change"); >> 4099: Atomic::release_store_fence(&_init_state, state); > > Why not just a release_store ?? Why do we need the trailing fence? Says in PR: "Current patch makes a seqcst write, which is stronger than strictly necessary. 
I think it is okay to be extra paranoid on rarely-executed class initialization path." But I can turn it into just a weaker release, sure. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21110#discussion_r1770871023 From duke at openjdk.org Mon Sep 23 07:26:36 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Mon, 23 Sep 2024 07:26:36 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option In-Reply-To: References: Message-ID: On Sun, 22 Sep 2024 11:46:49 GMT, Doug Simon wrote: >> Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. >> >> Expected behavior: >> >> With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. >> With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. > > src/hotspot/share/jvmci/jvmci_globals.cpp line 82: > >> 80: CHECK_NOT_SET(LibJVMCICompilerThreadHidden, UseJVMCICompiler) >> 81: >> 82: if (EnableJVMCI) { > > This needs to be `EnableJVMCI || UseJVMCICompiler` (since deriving `EnableJVMCI` from `UseJVMCICompiler` is only done [below](https://github.com/openjdk/jdk/blob/ab06a878f888827026424530781f0af414a8a611/src/hotspot/share/jvmci/jvmci_globals.cpp#L96)). I see, `FLAG_SET_DEFAULT(EnableJVMCI, true)` on [line 99](https://github.com/openjdk/jdk/blob/78f576192e815f957db93f5f8cb3763a35474381/src/hotspot/share/jvmci/jvmci_globals.cpp#L99).
Maybe moving this block if (!FLAG_IS_DEFAULT(EnableJVMCI) && !EnableJVMCI) { jio_fprintf(defaultStream::error_stream(), "Improperly specified VM option UseJVMCICompiler: EnableJVMCI cannot be disabled\n"); return false; } FLAG_SET_DEFAULT(EnableJVMCI, true); in front of my change makes it more readable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21069#discussion_r1770881033 From duke at openjdk.org Mon Sep 23 07:31:10 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Mon, 23 Sep 2024 07:31:10 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v2] In-Reply-To: References: Message-ID: > Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. > > Expected behavior: > > With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. > With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: JDK-8340398: Fixed EnableJVMCI handling.
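For readers following the thread, the ordering discussed above (derive `EnableJVMCI` from `UseJVMCICompiler` first, then run the `UseJVMCINativeLibrary` check) can be sketched roughly as below. This is only a simplified model under stated assumptions, not the code in `jvmci_globals.cpp`: the `JvmciFlags` struct, its field names, and `resolve_jvmci_flags` are hypothetical stand-ins for the HotSpot flag macros, and the sketch assumes that the truncated "If a JVMCI native library is present" branch turns the native library on.

```cpp
#include <cassert>

// Hypothetical model of the flags discussed in the review; not HotSpot code.
struct JvmciFlags {
  bool enable_jvmci;                   // models EnableJVMCI
  bool enable_jvmci_is_default;        // models FLAG_IS_DEFAULT(EnableJVMCI)
  bool use_jvmci_compiler;             // models UseJVMCICompiler
  bool use_native_library;             // models UseJVMCINativeLibrary
  bool use_native_library_is_default;  // models FLAG_IS_DEFAULT(UseJVMCINativeLibrary)
};

// Returns false when the combination is rejected, true otherwise.
// shared_library_exists stands in for the library lookup (assumption).
bool resolve_jvmci_flags(JvmciFlags& f, bool shared_library_exists) {
  // Step 1: UseJVMCICompiler implies EnableJVMCI, unless the user
  // explicitly disabled EnableJVMCI, which is an error.
  if (f.use_jvmci_compiler) {
    if (!f.enable_jvmci_is_default && !f.enable_jvmci) {
      return false;  // "EnableJVMCI cannot be disabled"
    }
    f.enable_jvmci = true;
  }
  // Step 2: with EnableJVMCI settled, a plain EnableJVMCI guard is enough;
  // no need to spell it as EnableJVMCI || UseJVMCICompiler.
  if (f.enable_jvmci) {
    if (f.use_native_library_is_default && !f.use_native_library &&
        shared_library_exists) {
      f.use_native_library = true;  // assumed effect of the elided branch
    }
  }
  return true;
}
```

With step 1 running first, `-XX:+UseJVMCICompiler` alone and `-XX:+EnableJVMCI` alone both reach the native library check, which seems to be the point of moving the block.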
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21069/files - new: https://git.openjdk.org/jdk/pull/21069/files/78f57619..b7550463 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21069&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21069&range=00-01 Stats: 15 lines in 1 file changed: 9 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21069.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21069/head:pull/21069 PR: https://git.openjdk.org/jdk/pull/21069 From shade at openjdk.org Mon Sep 23 07:33:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Sep 2024 07:33:35 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 01:48:47 GMT, David Holmes wrote: > Seems far more extensive than what was discussed. Code that takes the lock-free path to check `is_initialized` is what I thought we agreed needed the acquire/release, not every read of the state variable. This code will be executed a lot and in 99.99% of cases the memory barriers are not needed. This just extends the architectural parts of the patch we agreed with @coleenp for the fix. Which parts do you think are excessive? The acquires in `instanceKlass.hpp`? It would be hard to track which of those are used without a lock, I think. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2367428015 From dnsimon at openjdk.org Mon Sep 23 07:39:37 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 23 Sep 2024 07:39:37 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v2] In-Reply-To: References: Message-ID: <2Z_U9xcRXvMmqxA2GKHRb_1j1J-50ULmfK1VGxs-Hqo=.b1af5258-2c74-4a23-8ff9-7b91c3f103f5@github.com> On Mon, 23 Sep 2024 07:21:32 GMT, Tomáš
Zezula wrote: >> src/hotspot/share/jvmci/jvmci_globals.cpp line 82: >> >>> 80: CHECK_NOT_SET(LibJVMCICompilerThreadHidden, UseJVMCICompiler) >>> 81: >>> 82: if (EnableJVMCI) { >> >> This needs to be `EnableJVMCI || UseJVMCICompiler` (since deriving `EnableJVMCI` from `UseJVMCICompiler` is only done [below](https://github.com/openjdk/jdk/blob/ab06a878f888827026424530781f0af414a8a611/src/hotspot/share/jvmci/jvmci_globals.cpp#L96)). > > I see, `FLAG_SET_DEFAULT(EnableJVMCI, true)` on [line 99](https://github.com/openjdk/jdk/blob/78f576192e815f957db93f5f8cb3763a35474381/src/hotspot/share/jvmci/jvmci_globals.cpp#L99). > Maybe moving this block > > if (!FLAG_IS_DEFAULT(EnableJVMCI) && !EnableJVMCI) { > jio_fprintf(defaultStream::error_stream(), > "Improperly specified VM option UseJVMCICompiler: EnableJVMCI cannot be disabled\n"); > return false; > } > FLAG_SET_DEFAULT(EnableJVMCI, true); > > in front of my change makes it more readable. It's not obvious to me how that's clearer than just expanding the guard on line 82 to be `EnableJVMCI || UseJVMCICompiler`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21069#discussion_r1770895929 From dnsimon at openjdk.org Mon Sep 23 07:39:37 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 23 Sep 2024 07:39:37 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v2] In-Reply-To: <2Z_U9xcRXvMmqxA2GKHRb_1j1J-50ULmfK1VGxs-Hqo=.b1af5258-2c74-4a23-8ff9-7b91c3f103f5@github.com> References: <2Z_U9xcRXvMmqxA2GKHRb_1j1J-50ULmfK1VGxs-Hqo=.b1af5258-2c74-4a23-8ff9-7b91c3f103f5@github.com> Message-ID: On Mon, 23 Sep 2024 07:34:32 GMT, Doug Simon wrote: >> I see, `FLAG_SET_DEFAULT(EnableJVMCI, true)` on [line 99](https://github.com/openjdk/jdk/blob/78f576192e815f957db93f5f8cb3763a35474381/src/hotspot/share/jvmci/jvmci_globals.cpp#L99). 
>> Maybe moving this block >> >> if (!FLAG_IS_DEFAULT(EnableJVMCI) && !EnableJVMCI) { >> jio_fprintf(defaultStream::error_stream(), >> "Improperly specified VM option UseJVMCICompiler: EnableJVMCI cannot be disabled\n"); >> return false; >> } >> FLAG_SET_DEFAULT(EnableJVMCI, true); >> >> in front of my change makes it more readable. > > It's not obvious to me how that's clearer than just expanding the guard on line 82 to be `EnableJVMCI || UseJVMCICompiler`. Now that I see your change and understand what you meant, it is better - thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21069#discussion_r1770898087 From dnsimon at openjdk.org Mon Sep 23 08:14:39 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 23 Sep 2024 08:14:39 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:31:10 GMT, Tomáš Zezula wrote: >> Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. >> >> Expected behavior: >> >> With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. >> With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. > > Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8340398: Fixed EnableJVMCI handling. Marked as reviewed by dnsimon (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/21069#pullrequestreview-2321435416 From dholmes at openjdk.org Mon Sep 23 09:23:36 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 23 Sep 2024 09:23:36 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: <7TWC-8mou61HHIklbJ9Ox2XpSNQBD-QfBKXWyDhd3C8=.431c5d1b-448b-4676-8a0d-f23906480546@github.com> References: <7TWC-8mou61HHIklbJ9Ox2XpSNQBD-QfBKXWyDhd3C8=.431c5d1b-448b-4676-8a0d-f23906480546@github.com> Message-ID: On Mon, 23 Sep 2024 07:14:52 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/oops/instanceKlass.cpp line 4099: >> >>> 4097: #endif >>> 4098: assert(_init_thread == nullptr, "should be cleared before state change"); >>> 4099: Atomic::release_store_fence(&_init_state, state); >> >> Why not just a release_store ?? Why do we need the trailing fence? > > Says in PR: "Current patch makes a seqcst write, which is stronger than strictly necessary. I think it is okay to be extra paranoid on rarely-executed class initialization path." But I can turn it into just a weaker release, sure. I thought a seqcst write would be `fence(); store; fence()`? Anyway, I don't like "paranoid" when it comes to memory barriers, because that says to me "hey we don't understand what is going on here so we're just going to do the heaviest barrier we can 'just in case'." ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21110#discussion_r1771041123 From dholmes at openjdk.org Mon Sep 23 09:26:37 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 23 Sep 2024 09:26:37 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer.
We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. Current patch makes a seqcst write, which is stronger than strictly necessary. I think it is okay to be extra paranoid on rarely-executed class initialization path. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release The problem is we have completely different code paths that look at the different states of a class (loaded, linked, initialized, in-error) and those actions use different locks. This issue was, I thought, only about the lock-free fast-paths checking the "is initialized" state not anything else. These extra barriers could be completely redundant for "is loaded" or "is linked" or "is in error" checks. 
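As an illustration of the two shapes being weighed in this exchange, here is a minimal sketch using standard C++ atomics instead of HotSpot's `Atomic::` API. Everything in it is an assumption made for illustration: `FakeKlass`, `_payload`, and `is_loaded_relaxed` are hypothetical names, and the real `InstanceKlass` carries far more state. The release store publishes the initialized state, the acquire load is the lock-free "is initialized" fast path, and the relaxed load shows the narrower alternative for state checks that are already covered by a lock.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of the class init states; not HotSpot code.
enum ClassState {
  allocated, loaded, linked, being_initialized,
  fully_initialized, initialization_error
};

class FakeKlass {
  int _payload = 0;                         // stands in for data set up during class init
  std::atomic<int> _init_state{allocated};  // the "witness" field

 public:
  void set_loaded() { _init_state.store(loaded, std::memory_order_release); }

  // Writer: finish all initialization work first, then publish the state
  // with a release store (no trailing fence).
  void initialize(int payload) {
    _payload = payload;
    _init_state.store(fully_initialized, std::memory_order_release);
  }

  // Lock-free fast path: the acquire load pairs with the release store, so a
  // thread that observes fully_initialized also observes _payload.
  bool is_initialized() const {
    return _init_state.load(std::memory_order_acquire) == fully_initialized;
  }

  // Narrower alternative: a relaxed read for checks that are protected by
  // other synchronization. It gives no ordering guarantee by itself.
  bool is_loaded_relaxed() const {
    return _init_state.load(std::memory_order_relaxed) >= loaded;
  }

  int payload() const { return _payload; }
};
```

A reader thread that spins on `is_initialized()` and then calls `payload()` is guaranteed to see the value passed to `initialize()`. A reader that only consulted `is_loaded_relaxed()` would have no such guarantee, which is the risk of downstream code inferring "initialized" from a weaker check.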
------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2367663009 From coleenp at openjdk.org Mon Sep 23 12:43:36 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 23 Sep 2024 12:43:36 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. Current patch makes a seqcst write, which is stronger than strictly necessary. I think it is okay to be extra paranoid on rarely-executed class initialization path. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release I like this patch. src/hotspot/share/oops/instanceKlass.hpp line 517: > 515: bool is_in_error_state() const { return init_state() == initialization_error; } > 516: bool is_reentrant_initialization(Thread *thread) { return thread == _init_thread; } > 517: ClassState init_state() const { return Atomic::load_acquire(&_init_state); } This is the code that I want the most with this patch. If we're reading this field outside a lock, we need the acquire. Let's not make it more complicated than that. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21110#pullrequestreview-2322066095 PR Review Comment: https://git.openjdk.org/jdk/pull/21110#discussion_r1771328239 From shade at openjdk.org Mon Sep 23 13:14:38 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Sep 2024 13:14:38 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 09:23:29 GMT, David Holmes wrote: > The problem is we have completely different code paths that look at the different states of a class (loaded, linked, initialized, in-error) and those actions use different locks. This issue was, I thought, only about the lock-free fast-paths checking the "is initialized" state not anything else. These extra barriers could be completely redundant for "is loaded" or "is linked" or "is in error" checks. Right. I chose this code shape to make sure we cover _all_ paths that poll `init_state` to extra safety. We could, in principle, only protect `is_initialized()` path with the acquire. 
But I think we then start to depend on downstream code not doing "smart" things bypassing that check, for example polling `is_loaded() { _init_state >= loaded }` to implicitly (and too optimistically) check for `init_state >= fully_initialized`, or even doing `init_state() > being_initialized` somewhere. I would not discount the possibility that something somewhere would depend on pre-fully-initialized states to publish the intermediate class state. Looking around, I see some interesting uses in `InstanceKlass::methods_do`, `ClassLoaderData::methods_do`, `ClassLoaderData::loaded_classes_do`, `LoaderConstraintTable::find_constrained_klass`, ... It feels much safer to be extra paranoid here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2368197595 From duke at openjdk.org Mon Sep 23 16:26:45 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 23 Sep 2024 16:26:45 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v12] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 21:15:11 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > fix is_intrinsic_supported to work properly Hello Vladimir (@vnkozlov), Could you please run the tests for this PR and let us know? We're hoping to integrate this PR soon. 
Thanks, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/20657#issuecomment-2368777178 From sviswanathan at openjdk.org Mon Sep 23 18:27:45 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 23 Sep 2024 18:27:45 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v12] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Wed, 18 Sep 2024 07:21:52 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Incorporating review and documentation suggestions. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2600: > 2598: assert ((vlen & (vlen -1)) == 0); > 2599: int twoVectorLenMask = (vlen << 1) - 1; > 2600: ByteVector wrapped_indexes = this.lanewise(VectorOperators.AND, twoVectorLenMask); This assert and the following AND forcing power of two vector length seems out of place in Java code. You could move the wrapping within the selectFromTwoVectorOp on similar lines as the PR #20634. 
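
As a rough scalar model of the index wrapping under review (an assumption-laden sketch in C++, not the Vector API implementation; `select_from_two` is a hypothetical name), the mask `(vlen << 1) - 1` only wraps indices correctly into `[0, 2*VLEN)` when `vlen` is a power of two, which is why the quoted code asserts it:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of a two-vector select: wrapped index w in [0, vlen) picks v1[w],
// w in [vlen, 2*vlen) picks v2[w - vlen]. Hypothetical helper for illustration.
std::vector<int> select_from_two(const std::vector<int>& idx,
                                 const std::vector<int>& v1,
                                 const std::vector<int>& v2) {
    const std::size_t vlen = v1.size();
    assert(v2.size() == vlen && idx.size() == vlen);
    // The AND-mask trick is only a valid "mod 2*vlen" for power-of-two lengths.
    assert(vlen != 0 && (vlen & (vlen - 1)) == 0);
    const std::size_t two_vector_len_mask = (vlen << 1) - 1;

    std::vector<int> out(vlen);
    for (std::size_t i = 0; i < vlen; ++i) {
        // Conversion to an unsigned type is modular, so negative indices wrap too.
        const std::size_t w = static_cast<std::size_t>(idx[i]) & two_vector_len_mask;
        out[i] = (w < vlen) ? v1[w] : v2[w - vlen];
    }
    return out;
}
```

For a vector length that is not a power of two, the mask would have to be replaced by a true modulo or a range check; the sketch deliberately does not cover that case.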
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1771898190 From kvn at openjdk.org Mon Sep 23 19:16:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 23 Sep 2024 19:16:39 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v12] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 21:15:11 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > fix is_intrinsic_supported to work properly Looks good. I have only one nitpick. I will start testing. src/hotspot/share/c1/c1_Compiler.cpp line 170: > 168: case vmIntrinsics::_dcos: > 169: case vmIntrinsics::_dtan: > 170: #if defined(X86) Use `#ifdef AMD64` for x64 only ------------- PR Review: https://git.openjdk.org/jdk/pull/20657#pullrequestreview-2323102058 PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1771949759 From duke at openjdk.org Mon Sep 23 19:24:51 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 23 Sep 2024 19:24:51 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v13] In-Reply-To: References: Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: change ifdef from x86 to AMD64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/5da2754a..4dc2e36a Webrevs: - full: 
https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=11-12 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From duke at openjdk.org Mon Sep 23 19:24:51 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 23 Sep 2024 19:24:51 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v12] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 19:14:10 GMT, Vladimir Kozlov wrote: > Looks good. I have only one nitpick. I will start testing. Thank you Vladimir! > src/hotspot/share/c1/c1_Compiler.cpp line 170: > >> 168: case vmIntrinsics::_dcos: >> 169: case vmIntrinsics::_dtan: >> 170: #if defined(X86) > > Use `#ifdef AMD64` for x64 only Thanks Vladimir! Please see the code updated with `#ifdef AMD64`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20657#issuecomment-2369168165 PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1771961469 From kvn at openjdk.org Tue Sep 24 01:04:38 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 24 Sep 2024 01:04:38 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v13] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 19:24:51 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > change ifdef from x86 to AMD64 My testing passed. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20657#pullrequestreview-2323769774

From duke at openjdk.org Tue Sep 24 01:32:35 2024
From: duke at openjdk.org (Todd V. Jonker)
Date: Tue, 24 Sep 2024 01:32:35 GMT
Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v2]
In-Reply-To:
References:
Message-ID: <3Q2rjX8usrpKiTZVeUXq7Ty8lFiFsDi03fmMnNA9IQ0=.7539b42e-03be-4b5c-abcf-51314d85b087@github.com>

On Mon, 23 Sep 2024 07:31:10 GMT, Tomáš Zezula wrote:

>> Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users.
>>
>> Expected behavior:
>>
>> With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available.
>> With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available.
>
> Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision:
>
> JDK-8340398: Fixed EnableJVMCI handling.

If I'm reading things correctly, the doc-string for `UseJVMCINativeLibrary` in `jvmci_globals.hpp` needs updating. That [currently states](https://github.com/openjdk/jdk/blob/78f576192e815f957db93f5f8cb3763a35474381/src/hotspot/share/jvmci/jvmci_globals.hpp#L140-L144) "Defaults to true if EnableJVMCIProduct is true and a JVMCI native library is available" but it looks like it defaults to true if `EnableJVMCI` is true, regardless of the `EnableJVMCIProduct` setting.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21069#issuecomment-2369922039 From duke at openjdk.org Tue Sep 24 03:39:36 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 24 Sep 2024 03:39:36 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v13] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 01:01:54 GMT, Vladimir Kozlov wrote: > My testing passed. Thank You Vladimir! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20657#issuecomment-2370044599 From duke at openjdk.org Tue Sep 24 03:39:37 2024 From: duke at openjdk.org (duke) Date: Tue, 24 Sep 2024 03:39:37 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v13] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 19:24:51 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > change ifdef from x86 to AMD64 @vamsi-parasa Your change (at version 4dc2e36a8a2897d0a39d30a5580b18fbd9e5baf5) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20657#issuecomment-2370047322 From dholmes at openjdk.org Tue Sep 24 05:54:41 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 24 Sep 2024 05:54:41 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: <2oh0M5OXUp35ibGjTLXkJFyDEjF1zt3816WdKpB98tQ=.feabfe80-c6b8-430d-a0b9-30cb4149680a@github.com> On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. 
We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. Current patch makes a seqcst write, which is stronger than strictly necessary. I think it is okay to be extra paranoid on rarely-executed class initialization path. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release Well I don't like "paranoid" code when it comes to concurrency for the reason I already gave. I think part of the problem here is that so many different locks are involved in the different stages of class loading, linking and initialization, that it can be unclear when you've zoomed in exactly which lock should be part of the code path you're dealing with (e.g the loader constraint table code is protected by the SD lock so the checking of the `is_loaded` state is not lock-free). 
But this code is functionally correct so the only potential harm here (other than complicating code understanding) is to performance, which we will just have to keep an eye on. FYI I'm away for the next couple of days. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21110#pullrequestreview-2324129700 From dnsimon at openjdk.org Tue Sep 24 06:54:37 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 24 Sep 2024 06:54:37 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v2] In-Reply-To: <3Q2rjX8usrpKiTZVeUXq7Ty8lFiFsDi03fmMnNA9IQ0=.7539b42e-03be-4b5c-abcf-51314d85b087@github.com> References: <3Q2rjX8usrpKiTZVeUXq7Ty8lFiFsDi03fmMnNA9IQ0=.7539b42e-03be-4b5c-abcf-51314d85b087@github.com> Message-ID: On Tue, 24 Sep 2024 01:29:55 GMT, Todd V. Jonker wrote: > If I'm reading things correctly, the doc-string for `UseJVMCINativeLibrary` in `jvmci_globals.hpp` needs updating. That [currently states](https://github.com/openjdk/jdk/blob/78f576192e815f957db93f5f8cb3763a35474381/src/hotspot/share/jvmci/jvmci_globals.hpp#L140-L144) "Defaults to true if EnableJVMCIProduct is true and a JVMCI native library is available" but looks like it default to true if `EnableJVMCI` is true, regardless of the `EnableJVMCIProduct` setting. That is correct and I'm making that fix [here](https://github.com/openjdk/jdk/pull/21120/files#diff-cba70430948d75c7d40424fbbc704e7d7c571d6862502e210630369d8800ec62L143). However, it wouldn't hurt to also fix it here. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21069#issuecomment-2370342202 From jbhateja at openjdk.org Tue Sep 24 07:10:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 24 Sep 2024 07:10:24 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. 
> > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/31a58642..42ca80c5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=11-12 Stats: 225 lines in 41 files changed: 25 ins; 82 del; 118 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From duke at openjdk.org Tue Sep 24 07:15:57 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Tue, 24 Sep 2024 07:15:57 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v3] In-Reply-To: References: Message-ID: > Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. > > Expected behavior: > > With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. > With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. Tom?? Zezula has updated the pull request incrementally with one additional commit since the last revision: JDK-8340398: Fixed UseJVMCINativeLibrary doc string. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21069/files - new: https://git.openjdk.org/jdk/pull/21069/files/b7550463..28dbd932 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21069&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21069&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21069.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21069/head:pull/21069 PR: https://git.openjdk.org/jdk/pull/21069 From duke at openjdk.org Tue Sep 24 07:15:57 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Tue, 24 Sep 2024 07:15:57 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v2] In-Reply-To: References: <3Q2rjX8usrpKiTZVeUXq7Ty8lFiFsDi03fmMnNA9IQ0=.7539b42e-03be-4b5c-abcf-51314d85b087@github.com> Message-ID: On Tue, 24 Sep 2024 06:51:52 GMT, Doug Simon wrote: > If I'm reading things correctly, the doc-string for `UseJVMCINativeLibrary` in `jvmci_globals.hpp` needs updating. Fixed in https://github.com/openjdk/jdk/pull/21069/commits/28dbd9329a4ad67a39e3ba19767aea2209313382 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21069#issuecomment-2370379478 From dnsimon at openjdk.org Tue Sep 24 07:31:46 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 24 Sep 2024 07:31:46 GMT Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v3] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 07:15:57 GMT, Tom?? Zezula wrote: >> Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. 
>> >> Expected behavior: >> >> With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. >> With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. > > Tom?? Zezula has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8340398: Fixed UseJVMCINativeLibrary doc string. Marked as reviewed by dnsimon (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21069#pullrequestreview-2324314959 From rcastanedalo at openjdk.org Tue Sep 24 09:01:53 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 24 Sep 2024 09:01:53 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> Message-ID: On Fri, 20 Sep 2024 15:26:36 GMT, Roman Kennke wrote: >> I tried to reproduce for a few hours now using a custom testcase, with no success. >> I am pretty sure that this can happen, that is why I added this code. Originally I had an assert there asserting that index is not used. I do remember that this happens very rarely, and I don't remember the exact condition. Looking at the possible operands in opclass memory, I think this can only happen when we load an nKlass from an address of the form [rX, rY], i.e. the address in rX indexed by rY. This is an odd thing to happen for loadNKlass, I think, because rY should always be klass_offset_in_bytes. Maybe this is possible when we get odd address merges where we get a PhiNode as the offset/index? I don't know. >> I agree, this *might* lead to surprising problems with implicit null-checking, if it is expected that the first instruction in loadNKlass provokes the SIGSEGV. 
A way around this would be to declare an opclass that is a subset of 'memory' that excludes all operands with index, and match on that. I think this would force the lea as a separate instruction and ensure that we never see such a thing in loadNKlass. However, I would not feel very confident to do that without a reproducer. Let me dig a little further. >> >> For reference, here is my unsuccessful reproducer: https://gist.github.com/rkennke/8a57610d74fcde07a9390f268ec35738 > > Something like this is what I have in mind. It seems to pass tier1 tests. I still haven't managed to reproduce the path that requires an index register, though. > https://github.com/rkennke/jdk/commit/2c4a7877e4ef94017c8155578d8cfc9342441656 Thanks for the update! If there is a path requiring an index register, I would agree on limiting the memory opclass to exclude indices as you suggest. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1772945253 From adinn at openjdk.org Tue Sep 24 09:53:44 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 24 Sep 2024 09:53:44 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v3] In-Reply-To: References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> <07DqhAfjMD9qfeno10HOAuNBeiIul86acqTMpE6YtaY=.2569accb-c0ab-470f-b348-5894831be5d5@github.com> Message-ID: On Fri, 20 Sep 2024 14:22:12 GMT, Hao Sun wrote: >> @RealFYang Thanks! > > Hi @adinn , I encountered Client build failure on AArch64 after this commit. Could you help take a look at it when you have spare time? Thanks. > > Here shows the configuration > > ==================================================== > The existing configuration has been successfully updated in > /tmp/test123/build-release > using configure arguments '--with-debug-level=release --with-version-opt=git-fe80618bf3f --with-jvm-variants=client'. 
>
> Configuration summary:
> * Name: /tmp/test123/build-release
> * Debug level: release
> * HS debug level: product
> * JVM variants: client
> * JVM features: client: 'cds compiler1 epsilongc g1gc jfr jni-check jvmti management parallelgc serialgc services shenandoahgc vm-structs zgc'
> * OpenJDK target: OS: linux, CPU architecture: aarch64, address length: 64
> * Version string: 24-internal-git-fe80618bf3f (24-internal)
> * Source date: 1726841146 (2024-09-20T14:05:46Z)
>
> Tools summary:
> * Boot JDK: openjdk version "22.0.2" 2024-07-16 OpenJDK Runtime Environment (build 22.0.2+9-70) OpenJDK 64-Bit Server VM (build 22.0.2+9-70, mixed mode, sharing) (at /usr/lib/jvm/jdk-22.0.2)
> * Toolchain: gcc (GNU Compiler Collection)
> * C Compiler: Version 13.2.0 (at /usr/bin/gcc)
> * C++ Compiler: Version 13.2.0 (at /usr/bin/g++)
>
> Build performance summary:
> * Build jobs: 72
> * Memory limit: 587068 MB
>
>
> And here shows the error msg:
>
> === Output from failing command(s) repeated here ===
> * For target hotspot_variant-client_libjvm_objs_sharedRuntime_aarch64.o:
> /tmp/test123/jdk-src/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp: In static member function ‘static RuntimeStub* SharedRuntime::generate_throw_exception(const char*, address)’:
> /tmp/test123/jdk-src/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp:2809:3: error: ‘TraceTime’ was not declared in this scope; did you mean ‘traceid’?
> 2809 | TraceTime timer(timer_msg, TRACETIME_LOG(Info, startuptime));
> | ^~~~~~~~~
> | traceid
>
> * All command lines available in /tmp/test123/build-release/make-support/failure-logs.
> === End of repeated output ===

@shqking Thanks for reporting this problem. I reproduced the same failure on aarch64 and have noticed that the problem also arises on arm. File `sharedRuntime_aarch64.cpp` needs to include the header that declares class TraceTime as does `sharedRuntime_arm.cpp`. I raised JDK-8340793 and will push a patch to fix it.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20566#issuecomment-2370806327

From duke at openjdk.org Tue Sep 24 10:18:43 2024
From: duke at openjdk.org (duke)
Date: Tue, 24 Sep 2024 10:18:43 GMT
Subject: RFR: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 24 Sep 2024 07:15:57 GMT, Tomáš Zezula wrote:

>> Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users.
>>
>> Expected behavior:
>>
>> With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available.
>> With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available.
>
> Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision:
>
> JDK-8340398: Fixed UseJVMCINativeLibrary doc string.

@tzezula Your change (at version 28dbd9329a4ad67a39e3ba19767aea2209313382) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21069#issuecomment-2370858957

From duke at openjdk.org Tue Sep 24 10:22:43 2024
From: duke at openjdk.org (Tomáš Zezula)
Date: Tue, 24 Sep 2024 10:22:43 GMT
Subject: Integrated: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option
In-Reply-To:
References:
Message-ID:

On Wed, 18 Sep 2024 16:53:40 GMT, Tomáš
Zezula wrote: > Disabling the JVMCI compiler with `-XX:-UseJVMCICompiler` not only deactivates JVMCI-based CompileBroker compilations but also prevents the loading of the libjvmci compiler. While this works as expected for CompileBroker compilations, it poses issues for the Truffle compiler. When `-XX:-UseJVMCICompiler` is used, Truffle falls back to the jargraal compiler, if available. This behavior may be confusing for Truffle users. > > Expected behavior: > > With `-XX:+UseGraalJIT`, both CompileBroker compilations and Truffle compilations should utilize the libjvmci compiler, if available. > With `-XX:+EnableJVMCI`, CompileBroker compilations should use the C2 compiler, while only Truffle compilations should leverage the libjvmci compiler, if available. This pull request has now been integrated. Changeset: 4cd8c75a Author: Tomas Zezula Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/4cd8c75a55163be33917b1fba9f360ea816f3aa9 Stats: 21 lines in 3 files changed: 12 ins; 3 del; 6 mod 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/21069 From rkennke at openjdk.org Tue Sep 24 11:42:30 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 24 Sep 2024 11:42:30 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v24] In-Reply-To: References: Message-ID: <7N9vxRKxAK2GCBNlnU5E0Bj0sGV6_T-2QX9fKCCxlWg=.bdee038b-cee3-4c52-825c-d381d3616092@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. 
The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header.
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Improve matching of loadNKlassCompactHeaders on aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/0d8a9236..2c4a7877 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=22-23 Stats: 17 lines in 3 files changed: 5 ins; 5 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From dnsimon at openjdk.org Tue Sep 24 11:55:44 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 24 Sep 2024 11:55:44 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent Message-ID: This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. 
------------- Commit messages: - Merge branch 'master' into JDK-8340576 - fix incorrect code in oopMap.inline.hpp - since UseJVMCICompiler implies EnableJVMCI, remove the latter from conjunctive tests of both - disentangle EnableJVMCI and UseJVMCICompiler Changes: https://git.openjdk.org/jdk/pull/21120/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21120&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340576 Stats: 15 lines in 8 files changed: 2 ins; 4 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/21120.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21120/head:pull/21120 PR: https://git.openjdk.org/jdk/pull/21120 From duke at openjdk.org Tue Sep 24 11:55:44 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Tue, 24 Sep 2024 11:55:44 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: Message-ID: On Sun, 22 Sep 2024 11:51:20 GMT, Doug Simon wrote: > This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. Marked as reviewed by tzezula at github.com (no known OpenJDK username). Looks good to me. ------------- PR Review: https://git.openjdk.org/jdk/pull/21120#pullrequestreview-2321351214 PR Comment: https://git.openjdk.org/jdk/pull/21120#issuecomment-2367447737 From dnsimon at openjdk.org Tue Sep 24 11:55:45 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 24 Sep 2024 11:55:45 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 17:05:21 GMT, Tom Rodriguez wrote: >> This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations.
>
> src/hotspot/share/compiler/oopMap.inline.hpp line 69:
>
>> 67:
>> 68: #ifndef COMPILER2
>> 69: COMPILER1_PRESENT(ShouldNotReachHere();)
>
> As I noted in a private comment, I think this logic is simply wrong for JVMCI but since we never build JVMCI without C2 we never encounter it. Derived oops should only be encountered if C2 is available or if EnableJVMCI is true. I don't really understand the `COMPILER1_PRESENT` guard here either. It seems like it should be more like:
>
> #ifndef COMPILER2
> #if INCLUDE_JVMCI
>   if (!EnableJVMCI)
> #endif
>     ShouldNotReachHere();
> #endif // !COMPILER2

Done.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21120#discussion_r1772737069 From yzheng at openjdk.org Tue Sep 24 11:55:46 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 24 Sep 2024 11:55:46 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: Message-ID: <5qxqeWQofN__IQVFSrm3Jt6NFgbcxPFmbLTr_ivQmX0=.878e9c33-fe64-49b6-9704-500bc12caa7f@github.com> On Sun, 22 Sep 2024 11:51:20 GMT, Doug Simon wrote: > This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. src/hotspot/share/jvmci/jvmci_globals.cpp line 83: > 81: CHECK_NOT_SET(LibJVMCICompilerThreadHidden, UseJVMCICompiler) > 82: > 83: if ((UseJVMCICompiler || EnableJVMCI) && Doesn't `UseJVMCICompiler` require `EnableJVMCI`?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21120#discussion_r1770567413 From duke at openjdk.org Tue Sep 24 11:55:46 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Tue, 24 Sep 2024 11:55:46 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: <5qxqeWQofN__IQVFSrm3Jt6NFgbcxPFmbLTr_ivQmX0=.878e9c33-fe64-49b6-9704-500bc12caa7f@github.com> Message-ID: On Sun, 22 Sep 2024 15:20:22 GMT, Doug Simon wrote: >> src/hotspot/share/jvmci/jvmci_globals.cpp line 83: >> >>> 81: CHECK_NOT_SET(LibJVMCICompilerThreadHidden, UseJVMCICompiler) >>> 82: >>> 83: if ((UseJVMCICompiler || EnableJVMCI) && >> >> Doesn't `UseJVMCICompiler` require `EnableJVMCI`? > > No. This will conflict with my change https://github.com/openjdk/jdk/pull/21069/commits/b75504633bc0f4fcecc1c08552e556b26d9ffbb9. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21120#discussion_r1770895396 From never at openjdk.org Tue Sep 24 11:55:45 2024 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 24 Sep 2024 11:55:45 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: Message-ID: On Sun, 22 Sep 2024 11:51:20 GMT, Doug Simon wrote: > This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. src/hotspot/share/compiler/oopMap.inline.hpp line 69: > 67: > 68: #ifndef COMPILER2 > 69: COMPILER1_PRESENT(ShouldNotReachHere();) As I noted in a private comment, I think this logic is simply wrong for JVMCI but since we never build JVMCI without C2 we never encounter it. Derived oops should only be encountered if C2 is available or if EnableJVMCI is true. I don't really understand the `COMPILER1_PRESENT` guard here either.
It seems like it should be more like:

#ifndef COMPILER2
#if INCLUDE_JVMCI
  if (!EnableJVMCI)
#endif
    ShouldNotReachHere();
#endif // !COMPILER2

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21120#discussion_r1771805501 From dnsimon at openjdk.org Tue Sep 24 11:55:46 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 24 Sep 2024 11:55:46 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: <5qxqeWQofN__IQVFSrm3Jt6NFgbcxPFmbLTr_ivQmX0=.878e9c33-fe64-49b6-9704-500bc12caa7f@github.com> Message-ID: On Mon, 23 Sep 2024 07:34:03 GMT, Tomáš Zezula wrote: >> No. > > This will conflict with my change https://github.com/openjdk/jdk/pull/21069/commits/b75504633bc0f4fcecc1c08552e556b26d9ffbb9. No problem - I'll resolve it once your PR is merged. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21120#discussion_r1770898600 From dnsimon at openjdk.org Tue Sep 24 11:55:46 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 24 Sep 2024 11:55:46 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: <5qxqeWQofN__IQVFSrm3Jt6NFgbcxPFmbLTr_ivQmX0=.878e9c33-fe64-49b6-9704-500bc12caa7f@github.com> References: <5qxqeWQofN__IQVFSrm3Jt6NFgbcxPFmbLTr_ivQmX0=.878e9c33-fe64-49b6-9704-500bc12caa7f@github.com> Message-ID: On Sun, 22 Sep 2024 14:28:46 GMT, Yudi Zheng wrote: >> This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. > > src/hotspot/share/jvmci/jvmci_globals.cpp line 83: > >> 81: CHECK_NOT_SET(LibJVMCICompilerThreadHidden, UseJVMCICompiler) >> 82: >> 83: if ((UseJVMCICompiler || EnableJVMCI) && > > Doesn't `UseJVMCICompiler` require `EnableJVMCI`? No.
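
The exchange above turns on the difference between one flag *requiring* another to be set by the user and one flag *implying* the other. The commit message for this PR ("since UseJVMCICompiler implies EnableJVMCI, remove the latter from conjunctive tests of both") points at the second reading. The following is only a sketch of that logic under that assumption: the `Flags` struct and the `apply_implication` helper are hypothetical and are not HotSpot's flag machinery.

```cpp
// Hypothetical model of two boolean VM flags; not HotSpot code.
struct Flags {
  bool use_jvmci_compiler;
  bool enable_jvmci;
};

// Assumed ergonomics: selecting the JVMCI compiler also turns JVMCI on,
// so the user does not have to set enable_jvmci explicitly.
Flags apply_implication(Flags f) {
  if (f.use_jvmci_compiler) {
    f.enable_jvmci = true;
  }
  return f;
}

// Once the implication holds, a conjunctive test of both flags
// gives the same answer as testing use_jvmci_compiler alone.
bool conjunction_is_redundant(Flags f) {
  f = apply_implication(f);
  return (f.use_jvmci_compiler && f.enable_jvmci) == f.use_jvmci_compiler;
}

// Likewise, a disjunctive test of both flags gives the same answer
// as testing enable_jvmci alone.
bool disjunction_is_redundant(Flags f) {
  f = apply_implication(f);
  return (f.use_jvmci_compiler || f.enable_jvmci) == f.enable_jvmci;
}
```

Both helper predicates hold for all four combinations of the two inputs, which is why a test written against both flags can be reduced to a test of a single flag once the implication has been applied.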
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21120#discussion_r1770577425 From jbhateja at openjdk.org Tue Sep 24 15:09:41 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 24 Sep 2024 15:09:41 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v13] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 19:24:51 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > change ifdef from x86 to AMD64 Marked as reviewed by jbhateja (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20657#pullrequestreview-2325597060 From duke at openjdk.org Tue Sep 24 15:14:47 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 24 Sep 2024 15:14:47 GMT Subject: Integrated: 8338694: x86_64 intrinsic for tanh using libm In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 00:25:03 GMT, Srinivas Vamsi Parasa wrote: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x This pull request has now been integrated. 
Changeset: 212e3293 Author: vamsi-parasa URL: https://git.openjdk.org/jdk/commit/212e32931cafe446d94219d6c3ffd92261984dff Stats: 980 lines in 26 files changed: 970 ins; 0 del; 10 mod 8338694: x86_64 intrinsic for tanh using libm Reviewed-by: kvn, jbhateja, sgibbons, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/20657 From coleenp at openjdk.org Tue Sep 24 15:40:55 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 24 Sep 2024 15:40:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v23] In-Reply-To: References: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> Message-ID: On Fri, 20 Sep 2024 18:11:43 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4 >> - review feedback > > src/hotspot/share/memory/metaspace/metablock.hpp line 74: > >> 72: #define METABLOCKFORMATARGS(__block__) p2i((__block__).base()), (__block__).word_size() >> 73: >> 74: } // namespace metaspace > > I am wondering if some of these metaspace changes, that is, the addition of MetaBlock could be upstreamed ahead of the CompactObjectHeaders. Some is refactoring so that you can use the wastage to allocate into class-arena but a lot of this seems neutral to compact object headers, and would reduce this patch and allow different people to focus on just this. For the record, I am fine with these metaspace changes going in with this PR if the timing for that is better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1773607587 From duke at openjdk.org Tue Sep 24 22:02:34 2024 From: duke at openjdk.org (Todd V. 
Jonker) Date: Tue, 24 Sep 2024 22:02:34 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: Message-ID: On Sun, 22 Sep 2024 11:51:20 GMT, Doug Simon wrote: > This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. Studying these recent changes led me back to #14851 which added jtreg properties: - `jdk.hasLibgraal`: the libgraal shared library file is present - `vm.libgraal.enabled`: libgraal is used as JIT compiler The latter now feels misleading, since libgraal can be "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. (I'm here b/c we're assembling a distro doing exactly that.) Would it make sense to rename the latter, to reduce ambiguity in the tests? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21120#issuecomment-2372462365 From dnsimon at openjdk.org Tue Sep 24 22:12:34 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 24 Sep 2024 22:12:34 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 22:00:02 GMT, Todd V. Jonker wrote: > Would it make sense to rename the latter, to reduce ambiguity in the tests? Sounds reasonable to me. Maybe `vm.libgraal.jit`? The good news is that there are no current tests using this predicate as far as I can see. Want to take the lead on this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21120#issuecomment-2372474408 From duke at openjdk.org Tue Sep 24 22:21:34 2024 From: duke at openjdk.org (Todd V.
Jonker) Date: Tue, 24 Sep 2024 22:21:34 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: <5qxqeWQofN__IQVFSrm3Jt6NFgbcxPFmbLTr_ivQmX0=.878e9c33-fe64-49b6-9704-500bc12caa7f@github.com> Message-ID: On Mon, 23 Sep 2024 07:37:01 GMT, Doug Simon wrote: >> This will conflict with my change https://github.com/openjdk/jdk/pull/21069/commits/b75504633bc0f4fcecc1c08552e556b26d9ffbb9. > > No problem - I'll resolve it once your PR is merged. Based just off the flag docs, I think it's easy to think that `+EnableJVMCI` is a prerequisite to enabling the other JVMCI flags, along the lines of `+UnlockExperimentalVMOptions`. (That was for sure my newbie reading.) Perhaps its docs ([here](https://github.com/openjdk/jdk/blob/0b8c9f6d2397dcb480dc5ae109607d86f2b15619/src/hotspot/share/jvmci/jvmci_globals.hpp#L47-L48)) should be updated to say "Defaults to true if UseJVMCICompiler is true"? TBH I can't wrap my head around what `+EnableJVMCI` _means_; these options have a few chains of "this enables that" making it hard to grok the interactions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21120#discussion_r1774158571 From duke at openjdk.org Tue Sep 24 22:25:35 2024 From: duke at openjdk.org (Todd V. Jonker) Date: Tue, 24 Sep 2024 22:25:35 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 22:09:42 GMT, Doug Simon wrote: > > Would it make sense to rename the latter, to reduce ambiguity in the tests? > > Sounds reasonable to me. Maybe `vm.libgraal.jit`? The good news is that there are no current tests using this predicate as far as I can see. > > Want to take the lead on this? I like that alternative, and yes I'll work up the patch. I find myself needing to fix several jtreg tests to handle this kind of configuration, so it's relevant to my goals.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21120#issuecomment-2372492772 From dnsimon at openjdk.org Tue Sep 24 23:05:09 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 24 Sep 2024 23:05:09 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread Message-ID: [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in `-Xcomp` or `-Xbatch` mode. However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. 
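
The "allow Java upcall" mode described above is limited to a scope, which in C++ is commonly expressed as an RAII guard that changes a per-thread setting on entry and restores the previous value on exit. The following is a minimal standalone sketch of that general pattern, not the code from this PR: `FakeCompilerThread`, its `can_call_java` field, `CanCallJavaScope` and `run_with_upcalls_allowed` are hypothetical names invented for illustration.

```cpp
// Hypothetical stand-in for a compiler thread; not a HotSpot type.
struct FakeCompilerThread {
  bool can_call_java = false;  // upcalls are disallowed by default
};

// RAII guard: sets the flag on construction and restores the
// previous value on destruction, so nested scopes unwind correctly.
class CanCallJavaScope {
 public:
  CanCallJavaScope(FakeCompilerThread* thread, bool new_state)
      : _thread(thread), _previous(thread->can_call_java) {
    _thread->can_call_java = new_state;
  }
  ~CanCallJavaScope() { _thread->can_call_java = _previous; }

  // A guard owns one pending restore, so it must not be copied.
  CanCallJavaScope(const CanCallJavaScope&) = delete;
  CanCallJavaScope& operator=(const CanCallJavaScope&) = delete;

 private:
  FakeCompilerThread* _thread;
  bool _previous;
};

// Returns the flag value observed while the scope is active.
bool run_with_upcalls_allowed(FakeCompilerThread* thread) {
  CanCallJavaScope scope(thread, true);
  return thread->can_call_java;
}
```

Saving the previous value, rather than unconditionally resetting the flag to false, is what lets such scopes nest: an inner scope that disables upcalls hands control back to an outer scope with upcalls still enabled.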
------------- Commit messages: - added CompilerThreadCanCallJavaScope Changes: https://git.openjdk.org/jdk/pull/21171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21171&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340733 Stats: 116 lines in 5 files changed: 103 ins; 4 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/21171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21171/head:pull/21171 PR: https://git.openjdk.org/jdk/pull/21171 From dnsimon at openjdk.org Wed Sep 25 06:02:13 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 25 Sep 2024 06:02:13 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v2] In-Reply-To: References: Message-ID: <4IDkstmCYal4JfjNPQoRETzmvf97QVOkNFaewE6bacU=.d6a690b6-b0c1-4d16-ad76-8b219351e68a@github.com> > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in `-Xcomp` or `-Xbatch` mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: added CompilerThreadCanCallJavaScope ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21171/files - new: https://git.openjdk.org/jdk/pull/21171/files/258492da..c3e23c0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21171&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21171&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21171/head:pull/21171 PR: https://git.openjdk.org/jdk/pull/21171 From dnsimon at openjdk.org Wed Sep 25 06:05:15 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 25 Sep 2024 06:05:15 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3] In-Reply-To: References: Message-ID: > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in `-Xcomp` or `-Xbatch` mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21171/files - new: https://git.openjdk.org/jdk/pull/21171/files/c3e23c0e..882cec4c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21171&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21171&range=01-02 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21171/head:pull/21171 PR: https://git.openjdk.org/jdk/pull/21171 From dnsimon at openjdk.org Wed Sep 25 10:32:49 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 25 Sep 2024 10:32:49 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent [v2] In-Reply-To: References: Message-ID: > This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: clarified doc for EnableJVMCI and UseJVMCINativeLibrary ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21120/files - new: https://git.openjdk.org/jdk/pull/21120/files/fcc3ece0..e26e68d9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21120&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21120&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21120.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21120/head:pull/21120 PR: https://git.openjdk.org/jdk/pull/21120 From dnsimon at openjdk.org Wed Sep 25 10:32:49 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 25 Sep 2024 10:32:49 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent [v2] In-Reply-To: References: <5qxqeWQofN__IQVFSrm3Jt6NFgbcxPFmbLTr_ivQmX0=.878e9c33-fe64-49b6-9704-500bc12caa7f@github.com> Message-ID: On Tue, 24 Sep 2024 22:17:46 GMT, Todd V. Jonker wrote: >> No problem - I'll resolve it once your PR is merged. > > Based just off the flag docs, I think it's easy to think that `+EnableJVMCI` is a prerequisite to enabling the other JVMCI flags, along the lines of `+UnlockExperimentalVMOptions`. (That was for sure my newbie reading.) > > Perhaps its docs ([here](https://github.com/openjdk/jdk/blob/0b8c9f6d2397dcb480dc5ae109607d86f2b15619/src/hotspot/share/jvmci/jvmci_globals.hpp#L47-L48)) should be updated to say "Defaults to true if UseJVMCICompiler is true"? > > TBH I can't wrap my head around what `+EnableJVMCI` _means_; these options have a few chains of "this enables that" making it hard to grok the interactions. I've pushed https://github.com/openjdk/jdk/pull/21120/commits/e26e68d9a70ee27b4e71da86ecb42dca11e9a24f to try to clarify this a bit. I know it's a little confusing and apologize. When JVMCI is no longer experimental, this should become much clearer.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21120#discussion_r1774982914 From rkennke at openjdk.org Wed Sep 25 12:34:36 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 25 Sep 2024 12:34:36 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v25] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'.
This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Enforce lightweight locking on 32-bit platforms ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/2c4a7877..cd69da86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=23-24 Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Wed Sep 25 12:53:17 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 25 Sep 2024 12:53:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). 
> > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Allow LM_MONITOR on 32-bit platforms ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/cd69da86..4904d433 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=24-25 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From coleenp at openjdk.org Wed Sep 25 13:12:38 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 25 Sep 2024 13:12:38 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: <8X-gxUxvDE0dkl9VGwDNd3aCa06ABV6Kr7uE1vQLuYE=.8b5de297-045c-4096-96b8-b27c91acc10e@github.com> On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. 
Current patch makes a seqcst write, which is stronger than strictly necessary. I think it is okay to be extra paranoid on rarely-executed class initialization path. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release I was looking through and we set the "loaded" state under the Compile_lock (because of dependencies in add_to_hierarchy), we set the "linked", "being_initialized", "fully_initialized" and "initialization_error" under the init_lock object (which I want to change again) with a notify for the latter two. Using a load_acquire to examine the state (and release_store to write) seems like the right thing to do because there isn't just one lock so we should assume reading this state is lock free. It looks like the C2 code optimizes away the clinit_barrier when possible so we can watch for any performance difference but I'd still rather have safety. 
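For readers following along, the release/acquire pairing discussed here can be sketched in portable C++. This is only an illustration built on standard `std::atomic`, not the HotSpot code from the patch; `ToyKlass`, `InitState`, `initialize` and `try_read` are hypothetical names invented for the sketch, and the "set once" assertion is an assumption about how such a state field would be used.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a class init-state field; not the real InstanceKlass.
enum class InitState : unsigned char { allocated, being_initialized, fully_initialized };

struct ToyKlass {
  int static_field = 0;                                   // plain data set up by the initializer
  std::atomic<InitState> init_state{InitState::allocated};
};

// Initializing thread: write the data first, then publish the state with release
// semantics so the data write cannot be reordered after the state write.
void initialize(ToyKlass& k, int value) {
  assert(k.init_state.load(std::memory_order_relaxed) != InitState::fully_initialized
         && "set once");
  k.init_state.store(InitState::being_initialized, std::memory_order_relaxed);
  k.static_field = value;
  k.init_state.store(InitState::fully_initialized, std::memory_order_release);
}

// Lock-free reader: the acquire load pairs with the release store above, so a
// reader that observes fully_initialized also observes the write to static_field.
bool try_read(const ToyKlass& k, int& out) {
  if (k.init_state.load(std::memory_order_acquire) != InitState::fully_initialized) {
    return false;  // not initialized yet; the caller would take the slow path
  }
  out = k.static_field;
  return true;
}
```

A reader that gets `false` back would fall into the slow path (in the VM, the class initialization barrier); only the fast-path check needs the acquire load.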
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2374046227

From rcastanedalo at openjdk.org Wed Sep 25 13:54:59 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 25 Sep 2024 13:54:59 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]
In-Reply-To:
References: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com>
Message-ID: <2adTLZAwTvFTVNGeR5e9Cef5uNqpsz2haeobLIDZiNI=.cb2bbf0d-5c1b-4583-b4bd-898e0c5cdbb7@github.com>

On Fri, 13 Sep 2024 06:43:34 GMT, Roberto Castañeda Lozano wrote:

>> I don't remember if this change was a reaction to an error or if I just guarded `CompressedKlassPointers::shift()` with +UseCCP because that is the prerequisite now. Probably the latter. I can remove this. Probably should assert then for +UseCCP.
>
> I see, thanks. In that case, I would suggest removing the explicit `UseCompressedClassPointers` test, since it should be implied by `t->isa_narrowklass()`. `check_init()` within `CompressedKlassPointers::shift()` would already fail for the unexpected case where `t->isa_narrowklass() && !UseCompressedClassPointers`, no?

I think it would be good to remove the explicit `UseCompressedClassPointers` test as argued above (i.e. revert this change), unless there is any other reason to keep it that I am missing out?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1775277784

From rcastanedalo at openjdk.org Wed Sep 25 14:19:54 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 25 Sep 2024 14:19:54 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]
In-Reply-To:
References:
Message-ID:

On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). 
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow LM_MONITOR on 32-bit platforms src/hotspot/share/opto/memnode.cpp line 2256: > 2254: if (!UseCompactObjectHeaders && alloc != nullptr) { > 2255: return TypeX::make(markWord::prototype().value()); > 2256: } Suggestion: make these four lines conditional on `!UseCompactObjectHeaders`, like so: if (!UseCompactObjectHeaders) { Node* alloc = is_new_object_mark_load(); if (alloc != nullptr) { return TypeX::make(markWord::prototype().value()); } } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1775322670 From shade at openjdk.org Wed Sep 25 16:43:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 25 Sep 2024 16:43:35 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. Current patch makes a seqcst write, which is stronger than strictly necessary. 
I think it is okay to be extra paranoid on rarely-executed class initialization path. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release I am running performance tests with it, and expect no difference given JITed code normally knows that classes are initialized at JIT compilation time. The impact on interpreter paths is likely not be visible as well. If you can run your set of benchmarks, please do as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2374598301 From coleenp at openjdk.org Wed Sep 25 17:25:38 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 25 Sep 2024 17:25:38 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 16:41:02 GMT, Aleksey Shipilev wrote: > If you can run your set of benchmarks, please do as well. Ok. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2374715067 From duke at openjdk.org Wed Sep 25 21:41:05 2024 From: duke at openjdk.org (Todd V. 
Jonker) Date: Wed, 25 Sep 2024 21:41:05 GMT Subject: RFR: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled Message-ID: This disambiguates the situation where libgraal is "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. Per discussion at https://github.com/openjdk/jdk/pull/21120#issuecomment-2372462365 Grep shows that the property and related method are not used in this codebase. Tier1 tests on linux-x86_64-server-release pass cleanly. ------------- Commit messages: - Rename jtreg property vm.libgraal.enabled to vm.libgraal.jit. Changes: https://git.openjdk.org/jdk/pull/21190/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21190&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340974 Stats: 4 lines in 2 files changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21190.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21190/head:pull/21190 PR: https://git.openjdk.org/jdk/pull/21190 From duke at openjdk.org Wed Sep 25 21:41:37 2024 From: duke at openjdk.org (Todd V. Jonker) Date: Wed, 25 Sep 2024 21:41:37 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: Message-ID: <3cVQSC_0_gThdjkUu-UsN5TC1BUtq5ehsKu0BXBlE8U=.1e51b5e5-1493-4f6f-a827-c573fd5916df@github.com> On Tue, 24 Sep 2024 22:09:42 GMT, Doug Simon wrote: >> Studying these recent changes led me back to #14851 which added jtreg propeties: >> >> - `jdk.hasLibgraal`: the libgraal shared library file is present >> - `vm.libgraal.enabled`: libgraal is used as JIT compiler >> >> The latter now feels misleading, since libgraal can be "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. (I'm here b/c we're assembling a distro doing exactly that.) >> >> Would it make sense to rename the latter, to reduce ambiguity in the tests? > >> Would it make sense to rename the latter, to reduce ambiguity in the tests? > > Sounds reasonable to me. 
Maybe `vm.libgraal.jit`? The good news is that there are no current tests using this predicate as far as I can see. > > Want to take the lead on this? @dougxc I cut an issue https://bugs.openjdk.org/projects/JDK/issues/JDK-8340974 and posted a PR https://github.com/openjdk/jdk/pull/21190 This is my first JDK issue and fix; apologies if I'm getting the process wrong. I wasn't sure if I should tag you on either (or how). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21120#issuecomment-2375311109 From dnsimon at openjdk.org Thu Sep 26 07:26:35 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 26 Sep 2024 07:26:35 GMT Subject: RFR: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled In-Reply-To: References: Message-ID: <7jMxI7fnbc9aen7sk5qrH4MHt7Pf7eu0lSQaDeav8To=.465b1523-eb84-4e31-9346-38c3165583e8@github.com> On Wed, 25 Sep 2024 19:49:28 GMT, Todd V. Jonker wrote: > This disambiguates the situation where libgraal is "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. > > Per discussion at https://github.com/openjdk/jdk/pull/21120#issuecomment-2372462365 > > Grep shows that the property and related method are not used in this codebase. Tier1 tests on linux-x86_64-server-release pass cleanly. test/jtreg-ext/requires/VMProps.java line 562: > 560: * @return true if libgraal is used as JIT compiler. > 561: */ > 562: protected String isLibgraalJit() { I slightly prefer `isLibgraalJIT` as this acronym is (most) capitalized in the code base. You should also rename `isLibgraalEnabled` to `isLibgraalJIT` in `test/lib/jdk/test/whitebox/code/Compiler.java` for consistency. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21190#discussion_r1776521228 From rcastanedalo at openjdk.org Thu Sep 26 09:07:56 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 26 Sep 2024 09:07:56 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow LM_MONITOR on 32-bit platforms src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5692: > 5690: > 5691: void MacroAssembler::load_klass(Register dst, Register src, Register tmp) { > 5692: BLOCK_COMMENT("load_klass"); I am not sure that the complexity of `MacroAssembler::load_klass` and the two `MacroAssembler::cmp_klass` functions warrant adding block comments, but if you prefer to leave them in, could you use opening and closing comments, as in the other functions in this file (e.g. `MacroAssembler::_verify_oop`)? In that case, please update the comment in the two `MacroAssembler::cmp_klass` functions with a more descriptive name than `cmp_klass 1` and `cmp_klass 2`. 
src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5726: > 5724: #ifdef _LP64 > 5725: if (UseCompactObjectHeaders) { > 5726: load_nklass_compact(tmp, obj); Suggestion: assert here that `tmp != noreg`, just like in `MacroAssembler::cmp_klass(Register src, Register dst, Register tmp1, Register tmp2)` below. Perhaps also assert that the input registers are different. src/hotspot/cpu/x86/macroAssembler_x86.hpp line 379: > 377: // Uses tmp1 and tmp2 as temporary registers. > 378: void cmp_klass(Register src, Register dst, Register tmp1, Register tmp2); > 379: The naming of these two functions could be made clearer and more consistent with their documentation. Please consider renaming the four-argument `cmp_klass` function to `cmp_klasses_from_objects` or similar. The notion of "source" and "destination" in the parameter names is unclear, I suggest to just call them `obj`, `obj1`, `obj2` etc. Please also make sure that the parameter names are consistent in the declaration and definition (e.g. `dst` vs `obj`). src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008: > 4006: #ifdef COMPILER2 > 4007: if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) { > 4008: generate_string_indexof(StubRoutines::_string_indexof_array); This stub routine should be re-enabled if `UseCompactObjectHeaders` is to become non-experimental and enabled by default in the future. Is there a RFE for this task? src/hotspot/share/opto/memnode.cpp line 1976: > 1974: // The field is Klass::_prototype_header. Return its (constant) value. > 1975: assert(this->Opcode() == Op_LoadX, "must load a proper type from _prototype_header"); > 1976: return TypeX::make(klass->prototype_header()); This code is dead, because by the time we call `load_array_final_field` from `LoadNode::Value` (its only caller) we know that if `UseCompactObjectHeaders`, then `tkls->offset() != in_bytes(Klass::prototype_header_offset()` (or else we would have returned from line 2161). 
Please remove it, or replace it with an assertion if you prefer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776676785 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776628929 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776644021 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776663594 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776621766 From rcastanedalo at openjdk.org Thu Sep 26 09:54:57 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 26 Sep 2024 09:54:57 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: <4sBfv1qLQjGZnrCuHBPuWp1PNkIDFLBjxMo3z_RR0Mo=.38e699ce-30bc-42fe-86b6-988df6700c82@github.com> On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Allow LM_MONITOR on 32-bit platforms

src/hotspot/cpu/x86/x86_64.ad line 4388:

> 4386: effect(KILL cr);
> 4387: ins_cost(125); // XXX
> 4388: format %{ "movl $dst, $mem\t# compressed klass ptr" %}

For consistency with the aarch64 back-end:

Suggestion:

    format %{ "load_nklass_compact $dst, $mem\t# compressed klass ptr" %}

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776747538

From rkennke at openjdk.org Thu Sep 26 11:41:53 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 26 Sep 2024 11:41:53 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]
In-Reply-To:
References:
Message-ID:

On Thu, 26 Sep 2024 08:55:44 GMT, Roberto Castañeda Lozano wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Allow LM_MONITOR on 32-bit platforms
>
> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008:
>
>> 4006: #ifdef COMPILER2
>> 4007: if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) {
>> 4008: generate_string_indexof(StubRoutines::_string_indexof_array);
>
> This stub routine should be re-enabled if `UseCompactObjectHeaders` is to become non-experimental and enabled by default in the future. Is there a RFE for this task?

This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I believe we can perhaps remove that assert and the !UCOH clause. I checked a couple of tests that tripped that assert, and they seem to work fine, and I also checked the code in `LibraryCallKit::inline_string_indexOfI` and `generate_string_indexof_stubs()` and could not find anything obvious that requires the base offset to be >=16. I am not sure why that assert is there.
I am now running tier1-4 with that change: https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0 If you know (or find) any reason why we need that assert, please let me know. Otherwise I'd remove it, if you don't have objections. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776888460 From rcastanedalo at openjdk.org Thu Sep 26 12:16:52 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 26 Sep 2024 12:16:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow LM_MONITOR on 32-bit platforms src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2570: > 2568: // we get the heapBase in obj, and the narrowOop+klass_offset_in_bytes/sizeof(narrowOop) in index. > 2569: // When that happens, we need to lea the address into a single register, and subtract the > 2570: // klass_offset_in_bytes, to get the address of the mark-word. Parts of this comment are obsolete after commit 2c4a7877, please update the comment. 
src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 882:

> 880: void store_klass(Register dst, Register src);
> 881: void cmp_klass(Register oop, Register trial_klass, Register tmp);
> 882: void cmp_klass(Register src, Register dst, Register tmp1, Register tmp2);

Same suggestion as for the analogous x86 functions: consider renaming the four-argument `cmp_klass` function to `cmp_klasses_from_objects` or similar, and the `src` and `dst` parameters to `oop1` and `oop2` or similar if there is no notion of "source" and "destination".

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776927247
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776942226

From duke at openjdk.org Thu Sep 26 12:40:35 2024
From: duke at openjdk.org (Tomáš Zezula)
Date: Thu, 26 Sep 2024 12:40:35 GMT
Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 25 Sep 2024 06:05:15 GMT, Doug Simon wrote:

>> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in `-Xcomp` or `-Xbatch` mode.
>>
>> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java).
>>
>> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic.
>
> Doug Simon has updated the pull request incrementally with one additional commit since the last revision:
>
>   rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava

src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/CompilerThreadCanCallJavaScope.java line 81:

> 79:
> 80: if (vm != null) {
> 81: vm.updateCompilerThreadCanCallJava(!state);

This is not correct. The scope has to capture the original `_can_call_java` value in the constructor and restore it here in `close`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21171#discussion_r1776989493

From rcastanedalo at openjdk.org Thu Sep 26 13:07:55 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 26 Sep 2024 13:07:55 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]
In-Reply-To:
References:
Message-ID:

On Thu, 26 Sep 2024 11:39:02 GMT, Roman Kennke wrote:

>> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008:
>>
>>> 4006: #ifdef COMPILER2
>>> 4007: if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) {
>>> 4008: generate_string_indexof(StubRoutines::_string_indexof_array);
>>
>> This stub routine should be re-enabled if `UseCompactObjectHeaders` is to become non-experimental and enabled by default in the future. Is there a RFE for this task?
>
> This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I believe we can perhaps remove that assert and the !UCOH clause. I checked a couple of tests that tripped that assert, and they seem to work fine, and I also checked the code in `LibraryCallKit::inline_string_indexOfI` and `generate_string_indexof_stubs()` and could not find anything obvious that requires the base offset to be >=16. I am not sure why that assert is there.
I am now running tier1-4 with that change: https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0 > > If you know (or find) any reason why we need that assert, please let me know. Otherwise I'd remove it, if you don't have objections. I am not familiar with the `indexOf` implementation, but here is a relevant comment that motivates the assertion: https://github.com/openjdk/jdk/pull/16753#discussion_r1592774634. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777033220 From coleenp at openjdk.org Thu Sep 26 13:10:40 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 26 Sep 2024 13:10:40 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release My benchmarking showed only the normal jitter and no regressions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2376912839 From rkennke at openjdk.org Thu Sep 26 14:00:58 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 26 Sep 2024 14:00:58 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> On Thu, 26 Sep 2024 13:04:57 GMT, Roberto Castañeda Lozano wrote: >> This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I believe we can perhaps remove that assert and the !UCOH clause. I checked a couple of tests that tripped that assert, and they seem to work fine, and I also checked the code in `LibraryCallKit::inline_string_indexOfI` and `generate_string_indexof_stubs()` and could not find anything obvious that requires the base offset to be >=16. I am not sure why that assert is there. I am now running tier1-4 with that change: https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0 >> >> If you know (or find) any reason why we need that assert, please let me know. Otherwise I'd remove it, if you don't have objections. > > I am not familiar with the `indexOf` implementation, but here is a relevant comment that motivates the assertion: https://github.com/openjdk/jdk/pull/16753#discussion_r1592774634. Ok, this is indeed relevant and helpful. This could segfault if we happen to read from the very first object on the heap.
I can solve this by allowing to copy only 8 bytes onto the stack: https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 Does this look correct to you? Or better to do it as a follow-up? (It passes a couple of indexOf tests, will run tier1-4 on it). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777134871 From rkennke at openjdk.org Thu Sep 26 14:04:43 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 26 Sep 2024 14:04:43 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v27] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word.
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... 
Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - @robcasloz review comments - Improve CollectedHeap::is_oop() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/4904d433..d48f55d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=25-26 Stats: 86 lines in 10 files changed: 20 ins; 21 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From shade at openjdk.org Thu Sep 26 14:39:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 26 Sep 2024 14:39:37 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). 
>> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release Our performance tests show no effect as well. So I guess we are fine. I would like platform maintainers to look at relevant parts: @RealFYang, @TheRealMDoerr, @RealLucy, @offamitkumar? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2377161569 From rcastanedalo at openjdk.org Thu Sep 26 16:02:55 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 26 Sep 2024 16:02:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Thu, 26 Sep 2024 13:58:02 GMT, Roman Kennke wrote: > Does this look correct to you? Or better to do it as a follow-up? I do not feel confident enough to review this part. If you want to include https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 in this changeset, I would prefer that the original author of JDK-8320448 or at least someone from Intel reviews it, otherwise I think it is fine to leave it as a follow-up enhancement.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777370316 From rkennke at openjdk.org Thu Sep 26 16:18:58 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 26 Sep 2024 16:18:58 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Thu, 26 Sep 2024 15:59:50 GMT, Roberto Castañeda Lozano wrote: >> Ok, this is indeed relevant and helpful. This could segfault if we happen to read from the very first object on the heap. I can solve this by allowing to copy only 8 bytes onto the stack: https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 >> >> Does this look correct to you? Or better to do it as a follow-up? >> (It passes a couple of indexOf tests, will run tier1-4 on it). > >> Does this look correct to you? Or better to do it as a follow-up? > > I do not feel confident enough to review this part. If you want to include https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 in this changeset, I would prefer that the original author of JDK-8320448 or at least someone from Intel reviews it, otherwise I think it is fine to leave it as a follow-up enhancement. @sviswa7 or @asgibbons WDYT about including https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 as part of compact object headers implementation? Otherwise we would have to disable indexOf intrinsic when running with compact headers, because of the assumption that array headers are >= 16 bytes, which is no longer true with compact headers.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777396409 From never at openjdk.org Thu Sep 26 17:24:35 2024 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 26 Sep 2024 17:24:35 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent [v2] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 10:32:49 GMT, Doug Simon wrote: >> This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > clarified doc for EnableJVMCI and UseJVMCINativeLibrary This is a nice cleanup. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21120#pullrequestreview-2331928013 From sgibbons at openjdk.org Thu Sep 26 17:27:50 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 26 Sep 2024 17:27:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Thu, 26 Sep 2024 16:15:39 GMT, Roman Kennke wrote: >>> Does this look correct to you? Or better to do it as a follow-up? >> >> I do not feel confident enough to review this part. If you want to include https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 in this changeset, I would prefer that the original author of JDK-8320448 or at least someone from Intel reviews it, otherwise I think it is fine to leave it as a follow-up enhancement. > > @sviswa7 or @asgibbons WDYT about including https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 as part of compact object headers implementation? 
Otherwise we would have to disable indexOf intrinsic when running with compact headers, because of the assumption that array headers are >= 16 bytes, which is no longer true with compact headers. @rkennke I reviewed [rkennke@ 097c2af](https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1) and the code looks good to me. I would prefer this approach instead of not generating the IndexOf intrinsic. Should the controlling `if` be conditioned on `UseCompactObjectHeaders` instead of `arrayOopDesc::base_offset_in_bytes`? I can see benefits to either - which provides more clarity? I like the assert as it makes the intention clear (thanks!). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777485078 From duke at openjdk.org Thu Sep 26 17:49:50 2024 From: duke at openjdk.org (Todd V. Jonker) Date: Thu, 26 Sep 2024 17:49:50 GMT Subject: RFR: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled [v2] In-Reply-To: References: Message-ID: > This disambiguates the situation where libgraal is "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. > > Per discussion at https://github.com/openjdk/jdk/pull/21120#issuecomment-2372462365 > > Grep shows that the property and related method are not used in this codebase. Tier1 tests on linux-x86_64-server-release pass cleanly. Todd V. 
Jonker has updated the pull request incrementally with one additional commit since the last revision: Rename Compiler.isLibgraalEnabled to isLibgraalJIT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21190/files - new: https://git.openjdk.org/jdk/pull/21190/files/386150d4..59a8ed3c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21190&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21190&range=00-01 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21190.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21190/head:pull/21190 PR: https://git.openjdk.org/jdk/pull/21190 From duke at openjdk.org Thu Sep 26 17:49:50 2024 From: duke at openjdk.org (Todd V. Jonker) Date: Thu, 26 Sep 2024 17:49:50 GMT Subject: RFR: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled [v2] In-Reply-To: <7jMxI7fnbc9aen7sk5qrH4MHt7Pf7eu0lSQaDeav8To=.465b1523-eb84-4e31-9346-38c3165583e8@github.com> References: <7jMxI7fnbc9aen7sk5qrH4MHt7Pf7eu0lSQaDeav8To=.465b1523-eb84-4e31-9346-38c3165583e8@github.com> Message-ID: On Thu, 26 Sep 2024 07:23:35 GMT, Doug Simon wrote: >> Todd V. Jonker has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename Compiler.isLibgraalEnabled to isLibgraalJIT > > test/jtreg-ext/requires/VMProps.java line 562: > >> 560: * @return true if libgraal is used as JIT compiler. >> 561: */ >> 562: protected String isLibgraalJit() { > > I slightly prefer `isLibgraalJIT` as this acronym is (most) capitalized in the code base. > > You should also rename `isLibgraalEnabled` to `isLibgraalJIT` in `test/lib/jdk/test/whitebox/code/Compiler.java` for consistency. Renamed both. I was being conservative by limiting scope of change to the relevant interface, and felt that the existing name was reasonable in the `Compiler` scope. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21190#discussion_r1777513248 From dnsimon at openjdk.org Thu Sep 26 19:29:35 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 26 Sep 2024 19:29:35 GMT Subject: RFR: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled [v2] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 17:49:50 GMT, Todd V. Jonker wrote: >> This disambiguates the situation where libgraal is "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. >> >> Per discussion at https://github.com/openjdk/jdk/pull/21120#issuecomment-2372462365 >> >> Grep shows that the property and related method are not used in this codebase. Tier1 tests on linux-x86_64-server-release pass cleanly. > > Todd V. Jonker has updated the pull request incrementally with one additional commit since the last revision: > > Rename Compiler.isLibgraalEnabled to isLibgraalJIT LGTM. ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21190#pullrequestreview-2332175624 From dnsimon at openjdk.org Thu Sep 26 19:39:39 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 26 Sep 2024 19:39:39 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent [v2] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 10:32:49 GMT, Doug Simon wrote: >> This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > clarified doc for EnableJVMCI and UseJVMCINativeLibrary Thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21120#issuecomment-2377780416 From dnsimon at openjdk.org Thu Sep 26 19:39:40 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 26 Sep 2024 19:39:40 GMT Subject: Integrated: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: Message-ID: On Sun, 22 Sep 2024 11:51:20 GMT, Doug Simon wrote: > This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. This pull request has now been integrated. Changeset: 5d062e24 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/5d062e248ec4be7b35f85c341e76aa6d8d6d8b2b Stats: 18 lines in 9 files changed: 2 ins; 4 del; 12 mod 8340576: Some JVMCI flags are inconsistent Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/21120 From duke at openjdk.org Thu Sep 26 20:48:36 2024 From: duke at openjdk.org (duke) Date: Thu, 26 Sep 2024 20:48:36 GMT Subject: RFR: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled [v2] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 17:49:50 GMT, Todd V. Jonker wrote: >> This disambiguates the situation where libgraal is "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. >> >> Per discussion at https://github.com/openjdk/jdk/pull/21120#issuecomment-2372462365 >> >> Grep shows that the property and related method are not used in this codebase. Tier1 tests on linux-x86_64-server-release pass cleanly. > > Todd V. Jonker has updated the pull request incrementally with one additional commit since the last revision: > > Rename Compiler.isLibgraalEnabled to isLibgraalJIT @openjdk[bot] Your change (at version 59a8ed3c876e6084f95cd15dffa1403b263b8048) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21190#issuecomment-2377905075 From phh at openjdk.org Thu Sep 26 21:40:39 2024 From: phh at openjdk.org (Paul Hohensee) Date: Thu, 26 Sep 2024 21:40:39 GMT Subject: RFR: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled [v2] In-Reply-To: References: Message-ID: <6ZSR79931Yt_5OPH5-_HS4g4RQzOqU1cBebC3kVvKe0=.ee58ee43-a69c-431e-99f2-a200de999364@github.com> On Thu, 26 Sep 2024 17:49:50 GMT, Todd V. Jonker wrote: >> This disambiguates the situation where libgraal is "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. >> >> Per discussion at https://github.com/openjdk/jdk/pull/21120#issuecomment-2372462365 >> >> Grep shows that the property and related method are not used in this codebase. Tier1 tests on linux-x86_64-server-release pass cleanly. > > Todd V. Jonker has updated the pull request incrementally with one additional commit since the last revision: > > Rename Compiler.isLibgraalEnabled to isLibgraalJIT Marked as reviewed by phh (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21190#pullrequestreview-2332385804 From duke at openjdk.org Thu Sep 26 21:40:40 2024 From: duke at openjdk.org (Todd V. Jonker) Date: Thu, 26 Sep 2024 21:40:40 GMT Subject: Integrated: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 19:49:28 GMT, Todd V. Jonker wrote: > This disambiguates the situation where libgraal is "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. > > Per discussion at https://github.com/openjdk/jdk/pull/21120#issuecomment-2372462365 > > Grep shows that the property and related method are not used in this codebase. Tier1 tests on linux-x86_64-server-release pass cleanly. This pull request has now been integrated. Changeset: 2349bb7a Author: Todd V. 
Jonker Committer: Paul Hohensee URL: https://git.openjdk.org/jdk/commit/2349bb7ace0c40c0f19dee81b4a86bed0e855043 Stats: 7 lines in 3 files changed: 0 ins; 1 del; 6 mod 8340974: Ambiguous name of jtreg property vm.libgraal.enabled Reviewed-by: dnsimon, phh ------------- PR: https://git.openjdk.org/jdk/pull/21190 From fyang at openjdk.org Fri Sep 27 05:37:35 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 27 Sep 2024 05:37:35 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release Hi, Thanks for the ping. The RISC-V part of the change looks fine. No obvious change observed in the specjbb numbers. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21110#pullrequestreview-2332782685 From rkennke at openjdk.org Fri Sep 27 08:27:57 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 27 Sep 2024 08:27:57 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Thu, 26 Sep 2024 17:25:06 GMT, Scott Gibbons wrote: >> @sviswa7 or @asgibbons WDYT about including https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 as part of compact object headers implementation? Otherwise we would have to disable indexOf intrinsic when running with compact headers, because of the assumption that array headers are >= 16 bytes, which is no longer true with compact headers. > > @rkennke I reviewed [rkennke@ 097c2af](https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1) and the code looks good to me. I would prefer this approach instead of not generating the IndexOf intrinsic. > > Should the controlling `if` be conditioned on `UseCompactObjectHeaders` instead of `arrayOopDesc::base_offset_in_bytes`? I can see benefits to either - which provides more clarity? I like the assert as it makes the intention clear (thanks!). I like to have the functional connection: if - for whatever reason - the array base offset is smaller than 16, we need to deal with that.
The reason for this happens to be `UseCompactObjectHeaders`, but that may not be clear to the reader of the code. I could add an `assert(UseCompactObjectHeaders` in that branch to make that connection clear. Also consider that `UseCompactObjectHeaders` is intended to go away at some point. I wonder if having 2 or 3 branches ahead of the main-loop (which probably doesn't do much, because haystack is <=32 bytes) is a useful approach, or if there may be a better way to get the bytes on the stack? I don't know enough about the implementation to make that judgement. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778230714 From rkennke at openjdk.org Fri Sep 27 09:41:17 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 27 Sep 2024 09:41:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v28] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Disable TestSplitPacks::test4a, failing on aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/d48f55d6..059b1573 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=26-27 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From shade at openjdk.org Fri Sep 27 09:57:06 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 09:57:06 GMT Subject: RFR: 8336103: Clean up confusing Method::is_initializer [v4] In-Reply-To: References: Message-ID: > All around Hotspot, we have calls to `method->is_initializer()`. That method tests for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor, not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. At this point, I think the best way to prevent future accidents is to remove the confusing `is_initializer`. > > The behavioral changes have been handled by already integrated PRs, see the links in JBS. The changes left here are not (supposed to be) changing the behavior. Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains one commit: Fix ------------- Changes: https://git.openjdk.org/jdk/pull/20120/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20120&range=03 Stats: 24 lines in 7 files changed: 4 ins; 13 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20120.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20120/head:pull/20120 PR: https://git.openjdk.org/jdk/pull/20120 From amitkumar at openjdk.org Fri Sep 27 10:32:35 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 27 Sep 2024 10:32:35 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release I can't see any regression on s390x as well. @RealLucy maybe a quick look ? ------------- Marked as reviewed by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/21110#pullrequestreview-2333378085 From coleenp at openjdk.org Fri Sep 27 12:58:36 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 27 Sep 2024 12:58:36 GMT Subject: RFR: 8336103: Clean up confusing Method::is_initializer [v4] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 09:57:06 GMT, Aleksey Shipilev wrote: >> All around Hotspot, we have calls to `method->is_initializer()`. That method tests for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor, not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. At this point, I think the best way to prevent future accidents is to remove the confusing `is_initializer`. >> >> The behavioral changes have been handled by already integrated PRs, see the links in JBS. The changes left here are not (supposed to be) changing the behavior. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > Fix This looks good. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20120#pullrequestreview-2333676170 From mdoerr at openjdk.org Fri Sep 27 13:52:38 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 27 Sep 2024 13:52:38 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release Which benchmarks did you use? Is there any micro benchmark for class initialization? Is this one interesting? 
https://github.com/clojure/test.benchmark ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2379333144 From shade at openjdk.org Fri Sep 27 13:58:40 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 13:58:40 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3] In-Reply-To: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> Message-ID: On Fri, 19 Jul 2024 15:52:14 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Amend the test case for guaranteing it works under different compilation regimes We keep seeing `Reference.clear` native call on hot paths in services in JDK 17+. I would like to get this PR moving again. 
Please take a look :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2379346593 From shade at openjdk.org Fri Sep 27 14:01:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 14:01:37 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 13:49:41 GMT, Martin Doerr wrote: > Which benchmarks did you use? Is there any micro benchmark for class initialization? Is this one interesting? https://github.com/clojure/test.benchmark Our usual corpus of industry-standard benchmarks, like Dacapo, SPECjbb, etc. I don't think we have a microbenchmark that targets class loading specifically. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2379351833 From galder at openjdk.org Fri Sep 27 14:18:41 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 27 Sep 2024 14:18:41 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v2] In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: <-IW4I9MWB3up_N8BClv2TvHy2lUuvDk7bGohxIPv5IU=.b2f0e1b6-3ef8-4f97-9331-a7c5ba1046d1@github.com> > This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. > > Currently vectorization does not kick in for loops containing either of these calls because of the following error: > > > VLoop::check_preconditions: failed: control flow in loop not allowed > > > The control flow is due to the java implementation for these methods, e.g. > > > public static long max(long a, long b) { > return (a >= b) ? a : b; > } > > > This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. 
> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. > E.g. > > > SuperWord::transform_loop: > Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined > 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) > > > Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java > 1 1 0 0 > ============================== > TEST SUCCESS > > long min 1155 > long max 1173 > > > After the patch, on darwin/aarch64 (M1): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java > 1 1 0 0 > ============================== > TEST SUCCESS > > long min 1042 > long max 1042 > > > This patch does not add an platform-specific backend implementations for the MaxL/MinL nodes. > Therefore, it still relies on the macro expansion to transform those into CMoveL. > > I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier1 2500 2500 0 0 >>> jtreg:test/jdk:tier1 ... 
Galder Zamarreño has updated the pull request incrementally with 17 additional commits since the last revision: - Remove previous benchmark effort - Multiply array value in reduction for vectorization to kick in - Renamed benchmark methods - Add min/max benchmark that includes loops and reductions - Skip single array benchmarks - Add an intermediate % that is more representative of real life - Fix compilation error - Fix min case to distribute numbers as per probability - Distribute values targetting a branch percentage * Use a random increment algorithm, to create an array of values such that min/max branch percentage matches. - Fix format of assembly for the movl to movq switch - ... and 7 more: https://git.openjdk.org/jdk/compare/3dd72b89...28778c84 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20098/files - new: https://git.openjdk.org/jdk/pull/20098/files/3dd72b89..28778c84 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=00-01 Stats: 562 lines in 5 files changed: 418 ins; 132 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/20098.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20098/head:pull/20098 PR: https://git.openjdk.org/jdk/pull/20098 From galder at openjdk.org Fri Sep 27 14:18:41 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 27 Sep 2024 14:18:41 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v2] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Wed, 17 Jul 2024 22:48:04 GMT, Jasmine Karthikeyan wrote: >>> The C2 changes look nice! I just added one comment here about style. It would also be good to add some IR tests checking that the intrinsic is creating `MaxL`/`MinL` nodes before macro expansion, and a microbenchmark to compare results. >> >> Thanks for the review.
+1 to the IR tests, I'll work on those.
>>
>> Re: microbenchmark - what do you have exactly in mind? For vectorization performance there is `ReductionPerf` though it's not a microbenchmark per se. Do you want a microbenchmark for the performance of vectorized max/min long? For non-vectorization performance there is `MathBench`.
>>
>> I would not expect performance differences in `MathBench` because the backend is still the same and this change really benefits vectorization. I've run the min/max long tests on darwin/aarch64 and linux/x64 and indeed I see no difference:
>>
>> linux/x64
>>
>> Benchmark          (seed)   Mode  Cnt        Score        Error   Units
>> MathBench.maxLong       0  thrpt    8  1464197.164 ±  27044.205  ops/ms # base
>> MathBench.minLong       0  thrpt    8  1469917.328 ±  25397.401  ops/ms # base
>> MathBench.maxLong       0  thrpt    8  1469615.250 ±  17950.429  ops/ms # patched
>> MathBench.minLong       0  thrpt    8  1456290.514 ±  44455.727  ops/ms # patched
>>
>>
>> darwin/aarch64
>>
>> Benchmark          (seed)   Mode  Cnt        Score        Error   Units
>> MathBench.maxLong       0  thrpt    8  1739341.447 ± 210983.444  ops/ms # base
>> MathBench.minLong       0  thrpt    8  1659547.649 ± 260554.159  ops/ms # base
>> MathBench.maxLong       0  thrpt    8  1660449.074 ± 254534.725  ops/ms # patched
>> MathBench.minLong       0  thrpt    8  1729728.021 ±  16327.575  ops/ms # patched
>
>> Do you want a microbenchmark for the performance of vectorized max/min long?
>
> Yeah, I think a simple benchmark that tests for long min/max vectorization and reduction would be good. I worry that checking performance manually like in `ReductionPerf` can lead to harder to interpret results than with a microbenchmark, especially with vm warmup. Thanks for looking into this!

Following the advice from @jaskarth I've worked on a JMH benchmark for this intrinsic. The benchmarks are pretty straightforward, but the way the data is distributed in the arrays has been designed such that the branch percentage can be controlled.
The code uses a random increment/decrement algorithm to distribute the data for testing max. To test min the values are negated. Controlling the branching is an important factor, because the IR/assembly C2 emits can vary depending on the branching characteristics.

First, the non-AVX512 results (only posting max results for brevity, same seen with min):

Benchmark                         (probability)  (size)   Mode  Cnt    Score   Error   Units
MinMaxLoopBench.longReductionMax             50   10000  thrpt    8  107.609 ± 0.149  ops/ms (non-AVX512, base)
MinMaxLoopBench.longReductionMax             80   10000  thrpt    8  107.627 ± 0.150  ops/ms (non-AVX512, base)
MinMaxLoopBench.longReductionMax            100   10000  thrpt    8  238.799 ± 5.028  ops/ms (non-AVX512, base)
MinMaxLoopBench.longReductionMax             50   10000  thrpt    8  107.575 ± 0.088  ops/ms (non-AVX512, patch)
MinMaxLoopBench.longReductionMax             80   10000  thrpt    8  107.594 ± 0.072  ops/ms (non-AVX512, patch)
MinMaxLoopBench.longReductionMax            100   10000  thrpt    8  107.514 ± 0.067  ops/ms (non-AVX512, patch)

The only situation where this PR is a regression compared to current code is when one side of the branch is always taken. Why is this? At 50% and 80%, the base code uses cmovlq, but for 100% it uses cmp+jump (the branch is for the uncommon trap). The intrinsic the patch adds means that a MinL/MaxL node is always used, and through the macro expansion that always transforms to cmovlq.

Next, the AVX-512 results (note that the results were taken on different machines, so the non-AVX-512 and AVX-512 numbers cannot be compared):

Benchmark                         (probability)  (size)   Mode  Cnt    Score   Error   Units
MinMaxLoopBench.longReductionMax             50   10000  thrpt    8  492.327 ± 0.106  ops/ms (AVX512, base)
MinMaxLoopBench.longReductionMax             80   10000  thrpt    8  492.515 ± 0.044  ops/ms (AVX512, base)
MinMaxLoopBench.longReductionMax            100   10000  thrpt    8  232.861 ± 5.859  ops/ms (AVX512, base)
MinMaxLoopBench.longReductionMax             50   10000  thrpt    8  492.563 ± 0.452  ops/ms (AVX512, patch)
MinMaxLoopBench.longReductionMax             80   10000  thrpt    8  492.478 ± 0.105  ops/ms (AVX512, patch)
MinMaxLoopBench.longReductionMax            100   10000  thrpt    8  492.365 ± 0.220  ops/ms (AVX512, patch)

Here we see the same thing as in non-AVX512 systems but the other way around. For the base JDK, at 50-80% the CmpL+Bool gets converted into a CMoveL, and via `CMoveNode::Ideal_minmax` it gets converted to MinL/MaxL nodes, so it behaves just like the patched version. At 100% base adds a cmp+jump (for the uncommon trap branch) and, because of the control flow, vectorization is not applied. The patched version behaves the same way regardless of the branch probability.

For completeness, here are the numbers from `longLoopMax`, which tests vectorization of min/max without reduction on AVX-512. The pattern is the same:

Benchmark                    (probability)  (size)   Mode  Cnt   Score   Error   Units
MinMaxLoopBench.longLoopMax             50   10000  thrpt    8  66.959 ± 0.426  ops/ms (AVX512, base)
MinMaxLoopBench.longLoopMax             80   10000  thrpt    8  66.783 ± 0.342  ops/ms (AVX512, base)
MinMaxLoopBench.longLoopMax            100   10000  thrpt    8  55.923 ± 0.390  ops/ms (AVX512, base)
MinMaxLoopBench.longLoopMax             50   10000  thrpt    8  67.044 ± 0.535  ops/ms (AVX512, patch)
MinMaxLoopBench.longLoopMax             80   10000  thrpt    8  66.600 ± 0.176  ops/ms (AVX512, patch)
MinMaxLoopBench.longLoopMax            100   10000  thrpt    8  66.672 ± 0.205  ops/ms (AVX512, patch)

Finally, note that the reduction benchmarks only use one array to compute the value. Coming up with a random increment algorithm such that the combination of multiple array values would be higher/lower than the previous one was quite complex.
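The random increment/decrement distribution described above could look roughly like the following. This is a minimal sketch and not the benchmark code from the PR: the class `BranchControlledData`, its methods, the step range and the seed handling are all hypothetical, chosen only to show how a target branch percentage can be built into the data for a single-array max reduction.

```java
import java.util.Random;

public class BranchControlledData {
    // Hypothetical helper: fills an array so that, in a running-max reduction,
    // roughly `probability` percent of the elements after the first one
    // exceed the maximum seen so far.
    static long[] maxData(int size, int probability, long seed) {
        long[] data = new long[size];
        if (size == 0) {
            return data;
        }
        Random random = new Random(seed);
        long max = 0;
        data[0] = max;
        for (int i = 1; i < size; i++) {
            long step = 1 + random.nextInt(10); // always >= 1
            if (random.nextInt(100) < probability) {
                max += step;           // random increment: a new maximum
                data[i] = max;
            } else {
                data[i] = max - step;  // random decrement: stays below the maximum
            }
        }
        return data;
    }

    // For testing min, the values are negated (as described in the message above).
    static long[] minData(int size, int probability, long seed) {
        long[] data = maxData(size, probability, seed);
        for (int i = 0; i < data.length; i++) {
            data[i] = -data[i];
        }
        return data;
    }

    // Counts how often a scalar max reduction would take the "new maximum" side.
    static int countNewMax(long[] data) {
        int count = 0;
        long max = data[0];
        for (int i = 1; i < data.length; i++) {
            if (data[i] > max) {
                max = data[i];
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int probability : new int[] {50, 80, 100}) {
            long[] data = maxData(10_000, probability, 42L);
            System.out.println(probability + "% requested, new-max count: " + countNewMax(data));
        }
    }
}
```

With a probability of 100 every element after the first is a new maximum, which corresponds to the always-taken case in the tables above; intermediate percentages are only matched approximately, since each element is drawn independently.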
------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2379386872 From galder at openjdk.org Fri Sep 27 14:21:57 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 27 Sep 2024 14:21:57 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3] In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: > This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. > > Currently vectorization does not kick in for loops containing either of these calls because of the following error: > > > VLoop::check_preconditions: failed: control flow in loop not allowed > > > The control flow is due to the java implementation for these methods, e.g. > > > public static long max(long a, long b) { > return (a >= b) ? a : b; > } > > > This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. > By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. > E.g. > > > SuperWord::transform_loop: > Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined > 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) > > > Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. 
Before the patch, on darwin/aarch64 (M1): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java > 1 1 0 0 > ============================== > TEST SUCCESS > > long min 1155 > long max 1173 > > > After the patch, on darwin/aarch64 (M1): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java > 1 1 0 0 > ============================== > TEST SUCCESS > > long min 1042 > long max 1042 > > > This patch does not add an platform-specific backend implementations for the MaxL/MinL nodes. > Therefore, it still relies on the macro expansion to transform those into CMoveL. > > I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier1 2500 2500 0 0 >>> jtreg:test/jdk:tier1 ... Galder Zamarreño has updated the pull request incrementally with three additional commits since the last revision: - Revert "Implement cmovL as a jump+mov branch" This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44. - Revert "Switch movl to movq" This reverts commit a64fcdab7d6c63125c8dfd427ae8a56ff5fa2bb7. - Revert "Fix format of assembly for the movl to movq switch" This reverts commit 13ed87295cff50ff6ef30f909f6dcb35d15af047.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20098/files - new: https://git.openjdk.org/jdk/pull/20098/files/28778c84..16ae2a33 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=01-02 Stats: 9 lines in 1 file changed: 0 ins; 6 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20098.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20098/head:pull/20098 PR: https://git.openjdk.org/jdk/pull/20098 From galder at openjdk.org Fri Sep 27 14:24:39 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 27 Sep 2024 14:24:39 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Fri, 27 Sep 2024 14:21:57 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the java implementation for these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g.
>> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add an platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Galder Zamarreño has updated the pull request incrementally with three additional commits since the last revision: > > - Revert "Implement cmovL as a jump+mov branch" > > This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44. > - Revert "Switch movl to movq" > > This reverts commit a64fcdab7d6c63125c8dfd427ae8a56ff5fa2bb7. > - Revert "Fix format of assembly for the movl to movq switch" > > This reverts commit 13ed87295cff50ff6ef30f909f6dcb35d15af047.
Reverted the ad changes, those are not related to this PR. While exploring the differences in performance between base and the patched version, I wondered why the cmov version was slower than the branch one. As part of that investigation I played around with modifying the ad file to make cmov emit a branch instead. The details of this can be found in https://bugs.openjdk.org/browse/JDK-8340206 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2379401009 From rcastanedalo at openjdk.org Fri Sep 27 14:35:54 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 27 Sep 2024 14:35:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v28] In-Reply-To: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com> References: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com> Message-ID: On Thu, 12 Sep 2024 15:42:59 GMT, Thomas Stuefe wrote: >> src/hotspot/share/opto/machnode.cpp line 390: >> >>> 388: t = t->make_ptr(); >>> 389: } >>> 390: if (t->isa_narrowklass() && CompressedKlassPointers::shift() == 0) { >> >> Does this change have any effect? `UseCompressedClassPointers` should be implied by `t->isa_narrowklass()`. > > I don't remember if this change was a reaction to an error or if I just guarded `CompressedKlassPointers::shift()` with +UseCCP because that is the prerequisite now. Probably the latter. I can remove this. Probably should assert then for +UseCCP. @tstuefe @rkennke what do you think about this suggestion? If there is a known case where `t->isa_narrowklass() && !UseCompressedClassPointers` holds, it should be investigated because it might be a symptom of a larger problem. If there is no such a case, I think the explicit `UseCompressedClassPointers` test should be removed to avoid confusion. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778724120 From galder at openjdk.org Fri Sep 27 14:40:37 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 27 Sep 2024 14:40:37 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Fri, 27 Sep 2024 14:21:57 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the java implementation for these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g. >> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after.
Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add an platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Galder Zamarreño has updated the pull request incrementally with three additional commits since the last revision: > > - Revert "Implement cmovL as a jump+mov branch" > > This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44. > - Revert "Switch movl to movq" > > This reverts commit a64fcdab7d6c63125c8dfd427ae8a56ff5fa2bb7. > - Revert "Fix format of assembly for the movl to movq switch" > > This reverts commit 13ed87295cff50ff6ef30f909f6dcb35d15af047. The numbers in https://github.com/openjdk/jdk/pull/20098#issuecomment-2379386872 might have been obtained with the ad changes included in them. I'll re-run the benchmarks (in about 2 weeks' time) and post the results.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2379434675 From sgibbons at openjdk.org Fri Sep 27 14:47:51 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 27 Sep 2024 14:47:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Fri, 27 Sep 2024 08:24:50 GMT, Roman Kennke wrote: >> @rkennke I reviewed [rkennke@ 097c2af](https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1) and the code looks good to me. I would prefer this approach instead of not generating the IndexOf intrinsic. >> >> Should the controlling `if` be conditioned on `UseCompactObjectHeaders` instead of `arrayOopDesc::base_offset_in_bytes`? I can see benefits to either - which provides more clarity? I like the assert as it makes the intention clear (thanks!). > > I like to have the functional connection: if - for whatever reason - the array base offset is smaller than 16, we need to deal with that. The reason for this happens to be `UseCompactObjectHeaders`, but that may not be clear to the reader of the code. I could add an `assert(UseCompactObjectHeaders` in that branch to make that connection clear. Also consider that `UseCompactObjectHeaders` is intended to go away at some point. > > I wonder if having 2 or 3 branches ahead of the main-loop (which probably doesn't do much, because haystack is <=32 bytes) is a useful approach, or if there may be a better way to get the bytes on the stack? I don't know enough about the implementation to make that judgement. I believe the code in the patch is good enough as-is, especially if `UseCompactObjectHeaders` is slated to go away. 
The existing `if` will prevent the < 16 byte header code from being emitted, which is the desired behavior - i.e., if the header size is >= 16, there will be no code emitted to the intrinsic for that block. So there will not be an additional branch for the code when it is executed. I'm good with a comment tying `UseCompactObjectHeaders` to the condition. The comment can be removed when the flag is removed. "Ship it" :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778739517 From mdoerr at openjdk.org Fri Sep 27 16:17:37 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 27 Sep 2024 16:17:37 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release I've run some of these benchmarks on PPC64le and couldn't spot a regression, but the results are not very stable and I guess that they are not very sensitive to class initialization. I really wonder about the acquire barrier in `LIR_Assembler::emit_alloc_obj`. The interesting fields of the class are already read by `LIRGenerator::new_instance` during compile time. How can an acquire barrier after the execution help? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2379632980 From rkennke at openjdk.org Fri Sep 27 16:25:54 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 27 Sep 2024 16:25:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Fri, 27 Sep 2024 14:44:35 GMT, Scott Gibbons wrote: >> I like to have the functional connection: if - for whatever reason - the array base offset is smaller than 16, we need to deal with that. The reason for this happens to be `UseCompactObjectHeaders`, but that may not be clear to the reader of the code. I could add an `assert(UseCompactObjectHeaders` in that branch to make that connection clear. Also consider that `UseCompactObjectHeaders` is intended to go away at some point. >> >> I wonder if having 2 or 3 branches ahead of the main-loop (which probably doesn't do much, because haystack is <=32 bytes) is a useful approach, or if there may be a better way to get the bytes on the stack? I don't know enough about the implementation to make that judgement. 
>
> I believe the code in the patch is good enough as-is, especially if `UseCompactObjectHeaders` is slated to go away. The existing `if` will prevent the < 16 byte header code from being emitted, which is the desired behavior - i.e., if the header size is >= 16, there will be no code emitted to the intrinsic for that block. So there will not be an additional branch for the code when it is executed.
>
> I'm good with a comment tying `UseCompactObjectHeaders` to the condition. The comment can be removed when the flag is removed. "Ship it" :-)

Wait a second, I've probably not been clear. `UseCompactObjectHeaders` is slated to become *on by default* and then slated to go away. That means that array base offsets <= 16 bytes will become the default. The generated code will be something like:

    if (haystack_len <= 8) {
      // Copy 8 bytes onto stack
    } else if (haystack_len <= 16) {
      // Copy 16 bytes onto stack
    } else {
      // Copy 32 bytes onto stack
    }

So that is 2 branches in this prologue code instead of originally 1.

However, I just noticed that what I proposed is not enough. Consider what happens when haystack_len is 17. This would take the last case and copy 32 bytes. But we only have 17+8=25 bytes that we can guarantee to be available for copying. If this happens to be the array at the very beginning of the heap (very rare/unlikely), this would segfault. I think I need to mull over it some more to come up with a correct fix.
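
The arithmetic behind the haystack_len = 17 case can be checked with a small C++ sketch. It assumes, as in the message above, that the prologue copies 8, 16 or 32 bytes and that only haystack_len plus the array base offset bytes are guaranteed readable. `copy_size`, `guaranteed_bytes` and `copy_is_safe` are hypothetical helper names, not HotSpot functions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bytes the three-way prologue would copy
// for a haystack of the given length (this path handles at most 32 bytes).
inline std::size_t copy_size(std::size_t haystack_len) {
  assert(haystack_len <= 32);
  if (haystack_len <= 8) return 8;
  if (haystack_len <= 16) return 16;
  return 32;
}

// Assumption taken from the message: the bytes guaranteed to be readable
// are the haystack itself plus the array header in front of it.
inline std::size_t guaranteed_bytes(std::size_t haystack_len, std::size_t base_offset) {
  return haystack_len + base_offset;
}

// True when the fixed-size copy stays within the guaranteed bytes.
inline bool copy_is_safe(std::size_t haystack_len, std::size_t base_offset) {
  return copy_size(haystack_len) <= guaranteed_bytes(haystack_len, base_offset);
}
```

Under these assumptions, with a base offset of 8 the check fails for lengths 17 through 23, while a base offset of 16 passes for every length up to 32.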
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778874906 From yzheng at openjdk.org Fri Sep 27 16:34:55 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 27 Sep 2024 16:34:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com> References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com> Message-ID: On Thu, 19 Sep 2024 14:22:51 GMT, Stefan Karlsson wrote: >> We haven't decided whether or not we will git rid of ```Klass::_prototype_header``` before intergrating this PR, or not. @stefank could point you to a WIP branch, if that's helpful. > > This is my current work-in-progress code: > https://github.com/stefank/jdk/compare/pull/20677...stefank:jdk:lilliput_remove_prototype_header_wip_2 > > I've made some large rewrites and I'm currently running it through functional testing. If @stefank 's patch does not go in this PR, could you please export `Klass::_prototype_header` to JVMCI? Thanks! 
diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
index 9d1b8a1cb9f..e462025074f 100644
--- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
+++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
@@ -278,6 +278,7 @@
   nonstatic_field(Klass, _bitmap, uintx) \
   nonstatic_field(Klass, _hash_slot, uint8_t) \
   nonstatic_field(Klass, _misc_flags._flags, u1) \
+  nonstatic_field(Klass, _prototype_header, markWord) \
   \
   nonstatic_field(LocalVariableTableElement, start_bci, u2) \
   nonstatic_field(LocalVariableTableElement, length, u2) \

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778884055

From galder at openjdk.org Fri Sep 27 16:53:47 2024
From: galder at openjdk.org (Galder Zamarreño)
Date: Fri, 27 Sep 2024 16:53:47 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3]
In-Reply-To:
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID: <2F2rvBSpHjpMXu40xa2hUUqWQYJJihO7mvXD73OCqKQ=.4cf78e1b-6d7a-4188-888d-f901fcf338cc@github.com>

On Fri, 27 Sep 2024 14:21:57 GMT, Galder Zamarreño wrote:

>> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>>
>> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>>
>>
>> VLoop::check_preconditions: failed: control flow in loop not allowed
>>
>>
>> The control flow is due to the java implementation for these methods, e.g.
>>
>>
>> public static long max(long a, long b) {
>>     return (a >= b) ? a : b;
>> }
>>
>>
>> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively.
>> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
>> E.g.
>> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add an platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Galder Zamarre?o has updated the pull request incrementally with three additional commits since the last revision: > > - Revert "Implement cmovL as a jump+mov branch" > > This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44. > - Revert "Switch movl to movq" > > This reverts commit a64fcdab7d6c63125c8dfd427ae8a56ff5fa2bb7. > - Revert "Fix format of assembly for the movl to movq switch" > > This reverts commit 13ed87295cff50ff6ef30f909f6dcb35d15af047. 
Failure seems related to the patch, I'll look at it when I re-execute the benchmarks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2379688866

From aph at openjdk.org Fri Sep 27 17:44:36 2024
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 27 Sep 2024 17:44:36 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3]
In-Reply-To:
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID:

On Fri, 27 Sep 2024 14:15:04 GMT, Galder Zamarreño wrote:

> The only situation where this PR is a regression compared to current code is when the one of the branch side is always taken.

Bear in mind that's quite common. It's not very unusual to clip a range with something equivalent to `x = min(max(x, lowest), highest)`. What does benchmarking that look like, when all the `x` are within that range?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2379768983

From kvn at openjdk.org Fri Sep 27 17:47:37 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 27 Sep 2024 17:47:37 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3]
In-Reply-To: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com>
References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com>
Message-ID:

On Fri, 19 Jul 2024 15:52:14 GMT, Aleksey Shipilev wrote:

>> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks.
>> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Amend the test case for guaranteing it works under different compilation regimes Is ZGC affected by this? I see only G1 and Shenandoah changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2379772205 From shade at openjdk.org Fri Sep 27 18:14:36 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 18:14:36 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 16:14:29 GMT, Martin Doerr wrote: > I really wonder about the acquire barrier in `LIR_Assembler::emit_alloc_obj`. The interesting fields of the class are already read by `LIRGenerator::new_instance` during compile time. How can an acquire barrier after the execution help? At least, it doesn't help the allocation itself. Well, that's the thing: if compiler _does not know_ the class is initialized, it emits the runtime check for class initialization. 
Here, in `LIRGenerator::new_instance` we enter with `init_check = true` (`!klass->is_initialized()`): https://github.com/openjdk/jdk/blob/65200a9589e46956a2194b20c4c90d003351a539/src/hotspot/share/c1/c1_LIRGenerator.cpp#L670-L671 In generated code, we come to `init_check` block, check at runtime if class is fully initialized, and proceed to the rest of allocation path if so: https://github.com/openjdk/jdk/blob/f554c3ffce7599fdb535b03db4a6ea96870b3c2d/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp#L2275-L2277 If we were the only thread, it would not have been a problem: on first entry we would have called the stub, initialized the class and completed the allocation there. Next time around we would have passed `init_check == fully_initialized`, and proceeded without calling a stub. But the caveat we are handling in this PR is that if _some other thread_ might have completed the class initialization, we need to make sure _this thread_ sees the class state consistently. For example, if its Java constructor reads class statics written in ``. The initializing thread would do release-store for `init_check = fully_initialized`. On this reader side, we need a related acquire-load in the runtime check. Since runtime check does not run often -- most of the time compilers know the class is definitely initialized, the change does not affect performance all that much, if at all. Makes sense? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2379809426 From shade at openjdk.org Fri Sep 27 19:00:39 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 19:00:39 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3] In-Reply-To: References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> Message-ID: On Fri, 27 Sep 2024 17:44:38 GMT, Vladimir Kozlov wrote: > Is ZGC affected by this? I see only G1 and Shenandoah changes. Good question. ZGC expands the GC barriers late. 
This is why the IR test configuration that tests ZGC shows the same result as with other collectors: no additional fluff in IR. I would not expect we need anything else in late expansion for ZGC for Reference.clear, but maybe I am tired and cannot see it. Can you confirm this is fine, @fisk? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2379881102 From mdoerr at openjdk.org Fri Sep 27 19:02:38 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 27 Sep 2024 19:02:38 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. 
>>
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `all`
>>  - [x] Linux AArch64 server fastdebug, `all`
>>  - [x] GHA to test platform buildability + adhoc platform cross-compilation
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
>   Relax to just a release

Thanks for the explanation. This makes sense. Nevertheless, the aforementioned `membar_storestore()` follows the allocation immediately and it includes an acquire barrier for the current thread, too. So, the extra acquire is redundant. At least for the C1 code and probably at more places. This is not so obvious, so we may be able to live with what you have as long as performance is ok. Otherwise, we could still do a follow-up.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2379883568

From shade at openjdk.org Fri Sep 27 19:11:35 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 27 Sep 2024 19:11:35 GMT
Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 27 Sep 2024 18:59:36 GMT, Martin Doerr wrote:

> Nevertheless, the aforementioned `membar_storestore()` follows the allocation immediately and it includes an acquire barrier for the current thread, too.

OK, I see what you are getting at. But isn't that barrier still too late? See:

    Thread 1 (in "new A()"):
      IK::init_state = fully_initialized
      [stalls]

    Thread 2: (also in "new A()"):
      membar_storestore(); // <---- nothing to cumulate with yet
      // not seeing result fully, no barriers!

    Thread 1:
      [resumes]
      membar_storestore(); // <---- too late!
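
The release/acquire pairing being debated here can be sketched in portable C++ with `std::atomic`. This is an illustration under assumptions, not HotSpot code: `FakeKlass`, `initialize` and `try_read` are hypothetical names, and the sketch models only the pairing of a release store with an acquire load, not the PPC64 `storestore` cumulativity argument.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a class whose static state is set up by its initializer.
struct FakeKlass {
  int static_field = 0;            // plain data written during "initialization"
  std::atomic<int> init_state{0};  // 0 = not initialized, 1 = fully initialized
};

// Initializing thread: write the statics first, then publish the state
// with a release store so the earlier write cannot be reordered after it.
inline void initialize(FakeKlass& k) {
  assert(k.init_state.load(std::memory_order_relaxed) == 0);  // set once
  k.static_field = 42;
  k.init_state.store(1, std::memory_order_release);
}

// Reader: the acquire load pairs with the release store above. Once it
// observes 1, the write to static_field is guaranteed to be visible.
inline bool try_read(const FakeKlass& k, int& out) {
  if (k.init_state.load(std::memory_order_acquire) != 1) {
    return false;  // a real runtime check would take the slow path here
  }
  out = k.static_field;
  return true;
}
```

In this model, dropping the acquire on the reader side would allow the read of `static_field` to be ordered before the state check on weakly ordered hardware, which is the gap the PR description sets out to close.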
------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2379895063 From kvn at openjdk.org Fri Sep 27 19:20:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 27 Sep 2024 19:20:39 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3] In-Reply-To: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> Message-ID: On Fri, 19 Jul 2024 15:52:14 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Amend the test case for guaranteing it works under different compilation regimes There is coming JEP for later G1 barriers expansion similar to ZGC. Will you still need this intrinsic after it? I assume Shenandoah will follow G1 later. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2379910959 From mdoerr at openjdk.org Fri Sep 27 19:25:35 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 27 Sep 2024 19:25:35 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release The point is that compiled `` is only a TLAB pointer bump. It doesn't read anything from the class. We only need an acquire barrier anywhere between `` and ``. 
The latter will always see ` result fully` because `membar_storestore();` acts as acquire barrier (PPC64 specifically). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2379917187 From shade at openjdk.org Fri Sep 27 22:41:43 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 22:41:43 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release All right, granted. 
We can make an argument that a release store to `IK::init_state` can be matched with cumulative barrier like `storestore` at the end of C1-compiled allocation code. That said, it looks quite fragile, since: a) it depends on cumulative properties of low-level hardware primitives, and b) it likely only holds true for C1, as C2 normally coalesces header-protecting `storestore` with the final field `storestore`. Given that we expect no perf problems on this seemingly rare path, I prefer not to go into exploiting those specifics, unless you feel strongly otherwise :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2380238637 From kbarrett at openjdk.org Fri Sep 27 23:56:37 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 27 Sep 2024 23:56:37 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3] In-Reply-To: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> Message-ID: On Fri, 19 Jul 2024 15:52:14 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. 
>>
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `all`
>>  - [x] Linux AArch64 server fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
>   Amend the test case for guaranteing it works under different compilation regimes

Changes requested by kbarrett (Reviewer).

src/java.base/share/classes/java/lang/ref/Reference.java line 420:

> 418: /* Implementation of clear(), also used by enqueue(). A simple
> 419: * assignment of the referent field won't do for some garbage
> 420: * collectors.

Description of clear0 is rendered stale by this change. The first sentence is no longer true, since it's now clearImpl that has that role. The second sentence probably ought to also be moved into the description of clearImpl.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20139#pullrequestreview-2334816850
PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1779311136

From pminborg at openjdk.org Mon Sep 30 06:15:38 2024
From: pminborg at openjdk.org (Per Minborg)
Date: Mon, 30 Sep 2024 06:15:38 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3]
In-Reply-To:
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID:

On Fri, 27 Sep 2024 17:42:25 GMT, Andrew Haley wrote:

> > The only situation where this PR is a regression compared to current code is when the one of the branch side is always taken.
>
> Bear in mind that's quite common. It's not very unusual to clip a range with something equivalent to `x = min(max(x, lowest), highest)`. What does benchmarking that look like, when all the `x` are within that range?

In fact, the new `Math::clamp` methods do just this.
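
The clipping idiom quoted above can be sketched as follows, in C++ for illustration only (the PR itself concerns Java's `Math.max`/`Math.min`; `clip` and `clip_all` are hypothetical names). When every input already lies in range, both comparisons resolve the same way each time, which is the predictable-branch case Andrew asks about.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clip x into [lowest, highest] using the min(max(...)) idiom.
inline std::int64_t clip(std::int64_t x, std::int64_t lowest, std::int64_t highest) {
  assert(lowest <= highest);
  return std::min(std::max(x, lowest), highest);
}

// Clip every element in place and return how many values were changed.
inline std::size_t clip_all(std::vector<std::int64_t>& values,
                            std::int64_t lowest, std::int64_t highest) {
  std::size_t changed = 0;
  for (std::int64_t& v : values) {
    const std::int64_t clipped = clip(v, lowest, highest);
    if (clipped != v) {
      v = clipped;
      ++changed;
    }
  }
  return changed;
}
```

A benchmark of the always-in-range case would feed `clip_all` data for which it returns 0; whether a conditional move or a predicted branch wins there is exactly what the thread leaves open.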
------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2382197333 From mdoerr at openjdk.org Mon Sep 30 09:32:35 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 30 Sep 2024 09:32:35 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release I've done a bit of research and it seems like the C2 clinit barrier is only used very rarely in a corner case while the C1 parts are not so infrequently used. 
Peak performance doesn't seem to be affected. So, I don't see any reason for optimizing C2, either. The shared code LGTM.

The more frequently used parts are in platform specific code, so it might make sense to optimize the PPC64 parts. Also note that the "isync trick" is a faster acquire barrier than "lwsync". What do you think about this?

diff --git a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
index 61f654c9cfa..684c06614a9 100644
--- a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
+++ b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
@@ -2274,7 +2274,7 @@ void LIR_Assembler::emit_alloc_obj(LIR_OpAllocObj* op) {
     }
     __ lbz(op->tmp1()->as_register(),
            in_bytes(InstanceKlass::init_state_offset()), op->klass()->as_register());
-    __ lwsync(); // acquire
+    // acquire barrier included in membar_storestore() which follows the allocation immediately.
     __ cmpwi(CCR0, op->tmp1()->as_register(), InstanceKlass::fully_initialized);
     __ bc_far_optimized(Assembler::bcondCRbiIs0, __ bi0(CCR0, Assembler::equal), *op->stub()->entry());
   }
diff --git a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
index e73e617b8ca..bf2b2540e35 100644
--- a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
+++ b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
@@ -2410,7 +2410,7 @@ void MacroAssembler::verify_secondary_supers_table(Register r_sub_klass,
 void MacroAssembler::clinit_barrier(Register klass, Register thread, Label* L_fast_path, Label* L_slow_path) {
   assert(L_fast_path != nullptr || L_slow_path != nullptr, "at least one is required");
 
-  Label L_fallthrough;
+  Label L_check_thread, L_fallthrough;
   if (L_fast_path == nullptr) {
     L_fast_path = &L_fallthrough;
   } else if (L_slow_path == nullptr) {
@@ -2419,11 +2419,14 @@ void MacroAssembler::clinit_barrier(Register klass, Register thread, Label* L_fa
 
   // Fast path check: class is fully initialized
   lbz(R0, in_bytes(InstanceKlass::init_state_offset()), klass);
-  lwsync(); // acquire
+  // acquire by cmp-branch-isync if fully_initialized
   cmpwi(CCR0, R0, InstanceKlass::fully_initialized);
-  beq(CCR0, *L_fast_path);
+  bne(CCR0, L_check_thread);
+  isync();
+  b(*L_fast_path);
 
   // Fast path check: current thread is initializer thread
+  bind(L_check_thread);
   ld(R0, in_bytes(InstanceKlass::init_thread_offset()), klass);
   cmpd(CCR0, thread, R0);
   if (L_slow_path == &L_fallthrough) {

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2382609010

From thartmann at openjdk.org Mon Sep 30 10:38:37 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 30 Sep 2024 10:38:37 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3]
In-Reply-To:
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID:

On Fri, 27 Sep 2024 14:21:57 GMT, Galder Zamarreño wrote:

>> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>>
>> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>>
>>
>> VLoop::check_preconditions: failed: control flow in loop not allowed
>>
>>
>> The control flow is due to the java implementation for these methods, e.g.
>>
>>
>> public static long max(long a, long b) {
>>     return (a >= b) ? a : b;
>> }
>>
>>
>> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively.
>> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
>> E.g.
>>
>>
>> SuperWord::transform_loop:
>> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined
>> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>>
>>
>> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
>>
>>
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>>                                                          1     1     0     0
>> ==============================
>> TEST SUCCESS
>>
>> long min 1155
>> long max 1173
>>
>>
>> After the patch, on darwin/aarch64 (M1):
>>
>>
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>>                                                          1     1     0     0
>> ==============================
>> TEST SUCCESS
>>
>> long min 1042
>> long max 1042
>>
>>
>> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
>> Therefore, it still relies on the macro expansion to transform those into CMoveL.
>>
>> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>>
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST TOTAL PA...
>
> Galder Zamarreño has updated the pull request incrementally with three additional commits since the last revision:
>
>  - Revert "Implement cmovL as a jump+mov branch"
>
>    This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44.
>  - Revert "Switch movl to movq"
>
>    This reverts commit a64fcdab7d6c63125c8dfd427ae8a56ff5fa2bb7.
>  - Revert "Fix format of assembly for the movl to movq switch"
>
>    This reverts commit 13ed87295cff50ff6ef30f909f6dcb35d15af047.
You've probably seen this but the new test is failing IR verification: Failed IR Rules (4) of Methods (4) ---------------------------------- 1) Method "private static double compiler.intrinsics.math.TestMinMaxInlining.testDoubleMax(double,double)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MAX_D#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MaxD.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 2) Method "private static double compiler.intrinsics.math.TestMinMaxInlining.testDoubleMin(double,double)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_D#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MinD.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 3) Method "private static float compiler.intrinsics.math.TestMinMaxInlining.testFloatMax(float,float)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MAX_F#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MaxF.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 
4) Method "private static float compiler.intrinsics.math.TestMinMaxInlining.testFloatMin(float,float)" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_F#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(MinF.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 = 1 [given]
           - No nodes matched!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2382746375

From rcastanedalo at openjdk.org Mon Sep 30 12:40:54 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 30 Sep 2024 12:40:54 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]
In-Reply-To:
References:
Message-ID:

On Thu, 12 Sep 2024 13:20:14 GMT, Emanuel Peter wrote:

> Indeed, I could re-enable all tests in:
>
> ```
> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
> ```
>
> but unfortunately not those others:
>
> ```
> > > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
> > > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
> ```
>
> I think the issue with all of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at 8-byte-aligned offset.
>
> I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it.
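
The alignment remark quoted above can be made concrete with a little arithmetic. The following is only a sketch: `AlignmentSketch` and `firstAlignedIndex` are hypothetical names, and the base offsets 16 and 12 are illustrative assumptions for an array payload start without and with compact headers, not values taken from this thread.

```java
class AlignmentSketch {
    // Smallest element index whose byte offset (baseOffset + index * elemSize)
    // is a multiple of 8, or -1 if no index can ever be 8-byte aligned.
    // Offsets repeat modulo 8 after at most 8 steps, so 8 candidates suffice.
    static int firstAlignedIndex(int baseOffset, int elemSize) {
        for (int i = 0; i < 8; i++) {
            if ((baseOffset + i * elemSize) % 8 == 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed base offsets: 16 (payload already aligned) and 12 (payload misaligned by 4).
        System.out.println("byte[], base 16: " + firstAlignedIndex(16, 1));
        System.out.println("byte[], base 12: " + firstAlignedIndex(12, 1));
        System.out.println("int[],  base 12: " + firstAlignedIndex(12, 4));
    }
}
```

Under these assumptions, a loop that starts at index 0 begins on an 8-byte boundary only when the base offset itself is a multiple of 8, which is one way to read why some of the tests above lose vectorization.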
@rkennke A test run of the current changeset in our internal CI system revealed that the following tests fail (because of missing vectorization) when using `-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -XX:UseSSE=N` with `N <= 3` on an Intel Xeon Platinum 8358 machine: - test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java - test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java - test/hotspot/jtreg/compiler/vectorization/runner/LoopCombinedOpTest.java Here are the failure details: test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java: 1) Method "public static void compiler.c2.irTests.TestVectorizationNotRun.test(byte[],long[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 
test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java: 1) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteByte1(byte[],byte[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 2) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteByte2(byte[],byte[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 
3) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong1(byte[],long[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 4) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong2(byte[],long[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 
5) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong3(byte[],long[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 6) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong5(byte[],long[],int,int)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 
test/hotspot/jtreg/compiler/vectorization/runner/LoopCombinedOpTest.java:

1) Method "public int[] compiler.vectorization.runner.LoopCombinedOpTest.multipleOpsWith2DifferentTypesAndComplexExpression()" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "sse2", "true"}, counts={"_#STORE_VECTOR#_", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!

2) Method "public int[] compiler.vectorization.runner.LoopCombinedOpTest.multipleOpsWith2DifferentTypesAndInvariant()" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "sse2", "true"}, counts={"_#STORE_VECTOR#_", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2383072505

From eosterlund at openjdk.org Mon Sep 30 15:11:40 2024
From: eosterlund at openjdk.org (Erik Österlund)
Date: Mon, 30 Sep 2024 15:11:40 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3]
In-Reply-To:
References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com>
Message-ID:

On Fri, 27 Sep 2024 18:57:51 GMT, Aleksey Shipilev wrote:

> > Is ZGC affected by this? I see only G1 and Shenandoah changes.
> > Good question. > > ZGC expands the GC barriers late. This is why the IR test configuration that tests ZGC shows the same result as with other collectors: no additional fluff in IR. I would not expect we need anything else in late expansion for ZGC for Reference.clear, but maybe I am tired and cannot see it. Can you confirm this is fine, @fisk? ZGC needs some changes. Without doing anything, we propagate the AS_NO_KEEPALIVE decorator to the corresponding ZBarrierNoKeepalive bit being set in the barrier data of the StorePNode. However, we don't really do anything special with that information and we will in practice end up keeping the referent alive when clearing it with generational ZGC. The point of introducing the native implementation in the first place, was to make sure our GCs don't keep the referent alive when clearing it, as the user intention is clearly to not keep it alive. I think we need a new ZBarrierSetRuntime::no_keepalive_store_barrier_on_oop_field_without_healing(oop* p) and to make that the selected slow path function when ZBarrierNoKeepalive is set on a StorePNode. Its implementation would call ZBarrier::no_keep_alive_store_barrier_on_heap_oop_field. This should do the trick. Please let me know if you need further assistance. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2383479242 From shade at openjdk.org Mon Sep 30 16:36:20 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 16:36:20 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v4] In-Reply-To: References: Message-ID: <836Da9CyLC9qQtoFe9YVHau-ftjmXFr87Xw2wy2DIxc=.19870ac5-3c52-4632-8093-ea47938642de@github.com> > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. 
`ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks.
>
> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores.
>
> Additional testing:
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:

 - Attempt at implementing ZGC AArch64 parts
 - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear
 - Amend the test case for guaranteing it works under different compilation regimes
 - More precise barriers
 - Tests work
 - More touchups
 - Fixing the conditions, fixing the tests
 - Crude prototype, still failing the tests

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20139/files
  - new: https://git.openjdk.org/jdk/pull/20139/files/437f2329..cba0a8e9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=02-03

  Stats: 258786 lines in 3084 files changed: 211178 ins; 30411 del; 17197 mod
  Patch: https://git.openjdk.org/jdk/pull/20139.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139

PR: https://git.openjdk.org/jdk/pull/20139

From shade at openjdk.org Mon Sep 30 16:36:20 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 30 Sep 2024 16:36:20 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3]
In-Reply-To:
References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com>
Message-ID:

On Mon, 30 Sep 2024 15:08:53 GMT, Erik Österlund wrote:

> I think we need a new
ZBarrierSetRuntime::no_keepalive_store_barrier_on_oop_field_without_healing(oop* p) and to make that the selected slow path function when ZBarrierNoKeepalive is set on a StorePNode. Its implementation would call ZBarrier::no_keep_alive_store_barrier_on_heap_oop_field. This should do the trick. Thanks! See new commits: is that the shape you were thinking of? Once we get AArch64 parts right, I'll copy-paste that to other arches. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2383669579 From shade at openjdk.org Mon Sep 30 16:50:13 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 16:50:13 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v5] In-Reply-To: References: Message-ID: <4wz9wweAYNZCLw5c5fpldlEBgVyCh9io9iUih_hFVKM=.2d4a7679-0652-4f2c-8304-99bd84367519@github.com> > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. 
> > Additional testing: > - [ ] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: - Fix other arches - Tighten up comments in Reference javadoc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20139/files - new: https://git.openjdk.org/jdk/pull/20139/files/cba0a8e9..8ba681a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=03-04 Stats: 16 lines in 4 files changed: 7 ins; 4 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139 From shade at openjdk.org Mon Sep 30 16:50:13 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 16:50:13 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3] In-Reply-To: References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> Message-ID: On Fri, 27 Sep 2024 23:51:13 GMT, Kim Barrett wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Amend the test case for guaranteing it works under different compilation regimes > > src/java.base/share/classes/java/lang/ref/Reference.java line 420: > >> 418: /* Implementation of clear(), also used by enqueue(). A simple >> 419: * assignment of the referent field won't do for some garbage >> 420: * collectors. > > Description of clear0 is rendered stale by this change. The first sentence is no longer true, since it's now > clearImpl that has that role. The second sentence probably ought to also be moved into the description of > clearImpl. Thanks! I tightened up comments a bit, take another look? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1781452602 From shade at openjdk.org Mon Sep 30 16:59:16 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 16:59:16 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v6] In-Reply-To: References: Message-ID: > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. > > Additional testing: > - [ ] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Also dispatch to slow-path on other arches ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20139/files - new: https://git.openjdk.org/jdk/pull/20139/files/8ba681a4..4fe4a911 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=04-05 Stats: 6 lines in 3 files changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139 From shade at openjdk.org Mon Sep 30 17:11:57 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 17:11:57 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v3] In-Reply-To: References: Message-ID: > See the bug 
for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. > > In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. > > Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). > > I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` > - [x] GHA to test platform buildability + adhoc platform cross-compilation Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'master' into JDK-8338379-class-init-checks - Pick up PPC64 patch from Martin - Relax to just a release - Initial version ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21110/files - new: https://git.openjdk.org/jdk/pull/21110/files/179d8aa1..a7895d94 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21110&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21110&range=01-02 Stats: 20318 lines in 465 files changed: 14509 ins; 3375 del; 2434 mod Patch: https://git.openjdk.org/jdk/pull/21110.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21110/head:pull/21110 PR: https://git.openjdk.org/jdk/pull/21110 From shade at openjdk.org Mon Sep 30 17:11:57 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 17:11:57 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 09:30:23 GMT, Martin Doerr wrote: > The more frequently used parts are in platform specific code, so it might make sense to optimize the PPC64 parts. Also note that the "isync trick" is a faster acquire barrier than "lwsync". What do you think about this? I don't mind, and what you say as maintainer of PPC64 code goes :) I merged the patch in this PR, thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2383736192 From rkennke at openjdk.org Mon Sep 30 17:50:54 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 30 Sep 2024 17:50:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Fri, 27 Sep 2024 16:23:15 GMT, Roman Kennke wrote: >> I believe the code in the patch is good enough as-is, especially if `UseCompactObjectHeaders` is slated to go away. 
The existing `if` will prevent the < 16 byte header code from being emitted, which is the desired behavior - i.e., if the header size is >= 16, there will be no code emitted to the intrinsic for that block. So there will not be an additional branch for the code when it is executed.
>>
>> I'm good with a comment tying `UseCompactObjectHeaders` to the condition. The comment can be removed when the flag is removed. "Ship it" :-)
>
> Wait a second, I've probably not been clear. `UseCompactObjectHeaders` is slated to become *on by default* and then slated to go away. That means that array base offsets <= 16 bytes will become the default. The generated code will be something like:
>
>
> if (haystack_len <= 8) {
>   // Copy 8 bytes onto stack
> } else if (haystack_len <= 16) {
>   // Copy 16 bytes onto stack
> } else {
>   // Copy 32 bytes onto stack
> }
>
>
> So that is 2 branches in this prologue code instead of originally 1.
>
> However, I just noticed that what I proposed is not enough. Consider what happens when haystack_len is 17. This would take the last case and copy 32 bytes. But we only have 17+8=25 bytes that we can guarantee to be available for copying. If this happens to be the array at the very beginning of the heap (very rare/unlikely), this would segfault.
>
> I think I need to mull over it some more to come up with a correct fix.

I changed the header<16 version to be a small loop:
https://github.com/rkennke/jdk/commit/bcba264ea5c15581647933db1163ca1dae39b6c5

The idea is the same as before, except it's made as a small loop with a maximum of 4 iterations (backward-branches), and it copies 8 bytes at a time, such that 1. it may copy up to 7 bytes that precede the array and 2. doesn't run over the end of the array (which would potentially crash).

I am not sure if using XMM_TMP1 and XMM_TMP2 there is ok, or if it would encode better to use one of the regular registers?
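
To illustrate the two properties claimed for the small loop, here is a plain-Java model. It is a sketch under assumptions, not the code from the linked commit: `BoundedCopySketch`, `copyToStack`, the backward walk from the array end, the 32-byte buffer and the 8-byte header in front of the array are all hypothetical choices made for illustration.

```java
class BoundedCopySketch {
    static final int CHUNK = 8;       // bytes copied per iteration
    static final int STACK_SIZE = 32; // assumed size of the on-stack buffer

    // Copies the len array bytes that start at memory[base] to the end of
    // stack, walking backwards from the array end in 8-byte chunks.
    // Expects 1 <= len <= 32 and at least 7 readable bytes in front of base.
    // Returns how many bytes in front of the array were read.
    static int copyToStack(byte[] memory, int base, int len, byte[] stack) {
        int src = base + len;   // one past the last array byte
        int dst = stack.length; // the buffer is filled from its end
        while (src > base) {    // at most 4 iterations for len <= 32
            src -= CHUNK;
            dst -= CHUNK;
            System.arraycopy(memory, src, stack, dst, CHUNK);
        }
        return base - src;      // between 0 and 7
    }

    public static void main(String[] args) {
        int header = 8;         // assumed minimum header size in bytes
        int len = 17;           // the problematic length from the discussion
        byte[] memory = new byte[header + len];
        for (int i = 0; i < len; i++) {
            memory[header + i] = (byte) (i + 1);
        }
        byte[] stack = new byte[STACK_SIZE];
        int overread = copyToStack(memory, header, len, stack);
        System.out.println("bytes read in front of the array: " + overread);
    }
}
```

In this model the simulated memory ends exactly at the last array byte, so any read past the end would throw. The only extra bytes touched are the few in front of the array, which belong to the header.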
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1781535745 From mdoerr at openjdk.org Mon Sep 30 20:52:37 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 30 Sep 2024 20:52:37 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v3] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 17:11:57 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
> The pull request contains four additional commits since the last revision:
>
>  - Merge branch 'master' into JDK-8338379-class-init-checks
>  - Pick up PPC64 patch from Martin
>  - Relax to just a release
>  - Initial version

Marked as reviewed by mdoerr (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/21110#pullrequestreview-2338538916

From sviswanathan at openjdk.org  Mon Sep 30 21:02:42 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 30 Sep 2024 21:02:42 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v13]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID:

On Tue, 24 Sep 2024 07:10:24 GMT, Jatin Bhateja wrote:

>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs.
>>
>>
>> Declaration:-
>>     Vector.selectFrom(Vector v1, Vector v2)
>>
>>
>> Semantics:-
>>     Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown.
>>
>> Summary of changes:
>> - Java side implementation of new selectFrom API.
>> - C2 compiler IR and inline expander changes.
>> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms.
>> - Optimized x86 backend implementation for AVX512 and legacy target.
>> - Function tests covering new API.
>>
>> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :-
>> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>>
>>
>> Benchmark                                     (size)   Mode  Cnt      Score    Error   Units
>> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762           ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550           ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605           ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004           ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758           ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192           ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459           ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556           ops/ms
>> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830           ops/ms
>> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174           ops/ms
>> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804           ops/ms
>> S...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments.

src/hotspot/share/opto/vectorIntrinsics.cpp line 2698:

> 2696:   int cast_vopc = VectorCastNode::opcode(-1, elem_bt, true);
> 2697:   if (is_floating_point_type(elem_bt)) {
> 2698:     index_elem_bt = elem_bt == T_FLOAT ? T_INT : T_LONG;

index_elem_bt is already assigned at line 2676 and 2678 so this line could be removed.
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 551:

> 549:         return ((ByteVector)src1).vectorFactory(res);
> 550:     }
> 551:

This could instead be:
src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2);
Or even simplified to:
src1.rearrange(this.toShuffle(), src2);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1777839817
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1779722306

From psandoz at openjdk.org  Mon Sep 30 21:30:42 2024
From: psandoz at openjdk.org (Paul Sandoz)
Date: Mon, 30 Sep 2024 21:30:42 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v13]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID:

On Sat, 28 Sep 2024 17:37:10 GMT, Sandhya Viswanathan wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments.
>
> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 551:
>
>> 549:         return ((ByteVector)src1).vectorFactory(res);
>> 550:     }
>> 551:
>
> This could instead be:
> src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2);
> Or even simplified to:
> src1.rearrange(this.toShuffle(), src2);

I think you have to do the masking before conversion - `vec.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle()` is not the same as `vec.toShuffle()` for all inputs.
jshell> IntVector indexes = IntVector.fromArray(IntVector.SPECIES_256, new int[] {0, 1, 8, 9, 16, 17, 24, 25}, 0);
indexes ==> [0, 1, 8, 9, 16, 17, 24, 25]

jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1)
$19 ==> [0, 1, 8, 9, 0, 1, 8, 9]

jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1).toShuffle()
$20 ==> Shuffle[0, 1, -8, -7, 0, 1, -8, -7]

jshell> indexes.toShuffle()
$21 ==> Shuffle[0, 1, -8, -7, -8, -7, -8, -7]

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781753872

From sviswanathan at openjdk.org  Mon Sep 30 22:42:41 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 30 Sep 2024 22:42:41 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v13]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID:

On Tue, 24 Sep 2024 07:10:24 GMT, Jatin Bhateja wrote:

> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments.

src/hotspot/cpu/x86/x86.ad line 10490:

> 10488: %{
> 10489:   match(Set index (SelectFromTwoVector (Binary index src1) src2));
> 10490:   effect(TEMP index);

Just curious, why do we need TEMP index effect?
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781742786

From sviswanathan at openjdk.org  Mon Sep 30 22:42:42 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 30 Sep 2024 22:42:42 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v13]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID:

On Mon, 30 Sep 2024 21:28:22 GMT, Paul Sandoz wrote:

>> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 551:
>>
>>> 549:         return ((ByteVector)src1).vectorFactory(res);
>>> 550:     }
>>> 551:
>>
>> This could instead be:
>> src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2);
>> Or even simplified to:
>> src1.rearrange(this.toShuffle(), src2);
>
> I think you have to do the masking before conversion - `vec.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle()` is not the same as `vec.toShuffle()` for all inputs.
>
>
> jshell> IntVector indexes = IntVector.fromArray(IntVector.SPECIES_256, new int[] {0, 1, 8, 9, 16, 17, 24, 25}, 0);
> indexes ==> [0, 1, 8, 9, 16, 17, 24, 25]
>
> jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1)
> $19 ==> [0, 1, 8, 9, 0, 1, 8, 9]
>
> jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1).toShuffle()
> $20 ==> Shuffle[0, 1, -8, -7, 0, 1, -8, -7]
>
> jshell> indexes.toShuffle()
> $21 ==> Shuffle[0, 1, -8, -7, -8, -7, -8, -7]

Thanks for the example. Yes, you have a point there.
So we would need to do:
src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781859166

From sviswanathan at openjdk.org  Mon Sep 30 23:17:43 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 30 Sep 2024 23:17:43 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v13]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID:

On Tue, 24 Sep 2024 07:10:24 GMT, Jatin Bhateja wrote:

> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments.

src/hotspot/share/opto/vectorIntrinsics.cpp line 2689:

> 2687:         !arch_supports_vector(cast_vopc, num_elem, T_BYTE, VecMaskNotUsed) ||
> 2688:         !arch_supports_vector(Op_VectorLoadShuffle, num_elem, index_elem_bt, VecMaskNotUsed) ||
> 2689:         !arch_supports_vector(Op_Replicate, num_elem, T_BYTE, VecMaskNotUsed)) {

Where SelectFromTwoVector is not supported, the alternate implementation is as part of SelectFromTwoVectorNode::Ideal() instead of right here. A comment both here as well as in the Ideal() implementation is needed to keep these checks in sync.
src/hotspot/share/opto/vectornode.cpp line 2120:

> 2118:   // are held in a byte vector which are later transformed to target specific permutation
> 2119:   // index format by subsequent VectorLoadShuffle.
> 2120:   int cast_vopc = VectorCastNode::opcode(0, index_elem_bt, true);

Good to use -1 when we are not sending an actual opcode:
int cast_vopc = VectorCastNode::opcode(-1, index_elem_bt, true);

src/hotspot/share/opto/vectornode.cpp line 2126:

> 2124:   Node* bcast_lane_cnt_m1_vec = phase->transform(VectorNode::scalar2vector(lane_cnt_m1, num_elem, Type::get_const_basic_type(T_BYTE), false));
> 2125:
> 2126:   // Compute the blend mask for merging two indipendently permututed vectors

Typo indipendently -> independently

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781867326
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781873682
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781888912

From sviswanathan at openjdk.org  Mon Sep 30 23:17:43 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 30 Sep 2024 23:17:43 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v13]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID: <6NYy2NcP98xm3QRYdBWaAkkrvTdquMhhWnm-svxQjwE=.955f6dc8-c74c-472b-8c32-10228bb68d99@github.com>

On Mon, 30 Sep 2024 22:51:57 GMT, Sandhya Viswanathan wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments.
>
> src/hotspot/share/opto/vectorIntrinsics.cpp line 2689:
>
>> 2687:         !arch_supports_vector(cast_vopc, num_elem, T_BYTE, VecMaskNotUsed) ||
>> 2688:         !arch_supports_vector(Op_VectorLoadShuffle, num_elem, index_elem_bt, VecMaskNotUsed) ||
>> 2689:         !arch_supports_vector(Op_Replicate, num_elem, T_BYTE, VecMaskNotUsed)) {
>
> Where SelectFromTwoVector is not supported, the alternate implementation is as part of SelectFromTwoVectorNode::Ideal() instead of right here. A comment both here as well as in the Ideal() implementation is needed to keep these checks in sync.

We need to add VectorMaskCast here in the checks.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781886783
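
For readers following the index-masking discussion earlier in this thread, here is a rough scalar sketch of the two-vector table lookup with indexes wrapped by `2 * VLENGTH - 1` before selection. It is only a model under stated assumptions, not the code in the PR: the class name `SelectFromTwoVectorsModel` and its `selectFrom` helper are hypothetical, plain `int[]` arrays stand in for vectors, and the lane count is assumed to be a power of two so that the AND acts as a wrap.

```java
import java.util.Arrays;

// Hypothetical scalar model of a two-vector selectFrom; not the JDK implementation.
class SelectFromTwoVectorsModel {

    // For each lane, wrap the index into [0, 2*VLEN) with an AND mask, then
    // pick from v1 when the wrapped index is below VLEN and from v2 otherwise.
    // Assumption: all arrays share one power-of-two length (VLEN).
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = indexes.length;
        if (v1.length != vlen || v2.length != vlen || Integer.bitCount(vlen) != 1) {
            throw new IllegalArgumentException("inputs must share one power-of-two length");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int wrapped = indexes[i] & (2 * vlen - 1);
            result[i] = wrapped < vlen ? v1[wrapped] : v2[wrapped - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        // Same index values as the jshell session quoted in the thread (VLEN = 8).
        int[] indexes = {0, 1, 8, 9, 16, 17, 24, 25};
        int[] v1 = {10, 11, 12, 13, 14, 15, 16, 17};
        int[] v2 = {20, 21, 22, 23, 24, 25, 26, 27};
        // Wrapped indexes are [0, 1, 8, 9, 0, 1, 8, 9].
        System.out.println(Arrays.toString(selectFrom(indexes, v1, v2)));
        // prints [10, 11, 20, 21, 10, 11, 20, 21]
    }
}
```

With the mask applied first, indexes 16, 17, 24 and 25 wrap back to 0, 1, 8 and 9, which matches the `$19` line of the jshell output; without it they would all land in the second table, as the `$21` shuffle shows.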