From ksakata at openjdk.org Tue Aug 1 10:48:53 2023
From: ksakata at openjdk.org (Koichi Sakata)
Date: Tue, 1 Aug 2023 10:48:53 GMT
Subject: RFR: 8312420: Integrate Graal's blender micro benchmark [v2]
In-Reply-To: <-1W6PRns_akk9mk3yUsfSNQJFSXfljIHYJbmjmAk9SE=.17a2f0e4-d6e3-4d54-9e27-a830b095de8c@github.com>
References: <5sj7hpmmChUitKVYH-je8xq3AAA_GkjcFXJl6uGnGQc=.59e78f75-12a8-4b7b-8817-627f65149718@github.com>
 <-1W6PRns_akk9mk3yUsfSNQJFSXfljIHYJbmjmAk9SE=.17a2f0e4-d6e3-4d54-9e27-a830b095de8c@github.com>
Message-ID: 

On Tue, 25 Jul 2023 17:03:07 GMT, Joshua Cao wrote:

>> We would like to integrate Graal's blender micro benchmark from https://www.graalvm.org/22.1/examples/java-performance-examples/. We have been using this benchmark to test our partial escape analysis work (https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2023-July/066670.html). This test can exist independently of the project.
>> 
>> example command to run test:
>> 
>>     make run-test TEST=micro:org.openjdk.bench.vm.compiler.pea.Blender MICRO="FORK=1;OPTIONS=-prof gc -gc true"
>> 
>> example output (not complete):
>> 
>>     Benchmark                               (iteration)  Mode  Cnt          Score  Error   Units
>>     Blender.initialize                                1  avgt       227997775.000          ns/op
>>     Blender.initialize:·gc.alloc.rate                 1  avgt             167.192         MB/sec
>>     Blender.initialize:·gc.alloc.rate.norm            1  avgt        40000081.600           B/op
>>     Blender.initialize:·gc.count                      1  avgt               4.000         counts
>>     Blender.initialize:·gc.time                       1  avgt              65.000             ms
>>     Blender.initialize                                2  avgt       226255767.800          ns/op
>>     Blender.initialize:·gc.alloc.rate                 2  avgt             168.466         MB/sec
>>     Blender.initialize:·gc.alloc.rate.norm            2  avgt        40000081.600           B/op
>>     Blender.initialize:·gc.count                      2  avgt               4.000         counts
>>     Blender.initialize:·gc.time                       2  avgt              58.000             ms
>>     Blender.initialize                                3  avgt       225596324.600          ns/op
>>     Blender.initialize:·gc.alloc.rate                 3  avgt             168.960         MB/sec
>>     Blender.initialize:·gc.alloc.rate.norm            3  avgt        40000081.600           B/op
>>     Blender.initialize:·gc.count                      3  avgt               4.000         counts
>>     Blender.initialize:·gc.time                       3  avgt              55.000             ms
>>     Blender.initialize                                4  avgt       224856811.000          ns/op
>>     Blender.initialize:·gc.alloc.rate                 4  avgt             169.520         MB/sec
>>     Blender.initialize:·gc.alloc.rate.norm            4  avgt        40000081.600           B/op
>>     Blender.initialize:·gc.count                      4  avgt ...
> 
> Joshua Cao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   change Amazon license to Oracle

Marked as reviewed by ksakata (Committer).

I'll sponsor you.

-------------

PR Review: https://git.openjdk.org/jdk/pull/14941#pullrequestreview-1556701088
PR Comment: https://git.openjdk.org/jdk/pull/14941#issuecomment-1660056063

From duke at openjdk.org Tue Aug 1 10:51:03 2023
From: duke at openjdk.org (Joshua Cao)
Date: Tue, 1 Aug 2023 10:51:03 GMT
Subject: Integrated: 8312420: Integrate Graal's blender micro benchmark
In-Reply-To: <5sj7hpmmChUitKVYH-je8xq3AAA_GkjcFXJl6uGnGQc=.59e78f75-12a8-4b7b-8817-627f65149718@github.com>
References: <5sj7hpmmChUitKVYH-je8xq3AAA_GkjcFXJl6uGnGQc=.59e78f75-12a8-4b7b-8817-627f65149718@github.com>
Message-ID: 

On Wed, 19 Jul 2023 22:31:40 GMT, Joshua Cao wrote:

> We would like to integrate Graal's blender micro benchmark from https://www.graalvm.org/22.1/examples/java-performance-examples/. We have been using this benchmark to test our partial escape analysis work (https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2023-July/066670.html). This test can exist independently of the project.
>
> ...

This pull request has now been integrated.
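The Blender source itself is not part of this thread. As a rough, hypothetical illustration of the code shape that partial escape analysis (PEA) targets, the Java sketch below allocates temporary `Color` records on every iteration, but they escape to a static field only on a rarely taken branch. An optimizer that tracks partial escapes could, in principle, keep those objects virtual on the common path and materialize one only where it escapes. The class, record, and field names are invented for this example, and nothing here shows what C2 or Graal actually does with it.

```java
/**
 * Hypothetical illustration of a "partial escape": the temporary objects
 * created in the loop escape (to a static field) only on a rare branch.
 * This is not the Blender benchmark; all names are invented for the sketch.
 */
final class PartialEscapeSketch {
    /** Small immutable value type; each operation allocates a new instance. */
    record Color(double r, double g, double b) {
        Color add(Color o) { return new Color(r + o.r, g + o.g, b + o.b); }
        Color scale(double f) { return new Color(r * f, g * f, b * f); }
    }

    static final Color BASE = new Color(1.0, 1.0, 1.0);

    // The only escape point: written on the rare path.
    static Color lastPublished;

    /** Blends each color with BASE and sums the channels of the result. */
    static double blendAll(Color[] colors, int publishEvery) {
        double total = 0.0;
        for (int i = 0; i < colors.length; i++) {
            // Two short-lived allocations per iteration (add, then scale).
            Color mixed = colors[i].add(BASE).scale(0.5);
            if (publishEvery > 0 && i % publishEvery == 0) {
                lastPublished = mixed; // the object escapes only here
            }
            total += mixed.r() + mixed.g() + mixed.b();
        }
        return total;
    }

    public static void main(String[] args) {
        Color[] colors = { new Color(1.0, 1.0, 1.0), new Color(0.0, 0.0, 0.0) };
        System.out.println(blendAll(colors, 0)); // prints 4.5
    }
}
```

Passing `publishEvery = 0` exercises only the common path, where no temporary ever escapes.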
Changeset: e36960ec
Author:    Joshua Cao
Committer: Koichi Sakata
URL:       https://git.openjdk.org/jdk/commit/e36960ec6d543b48a7739e249c4a18883b2723f8
Stats:     104 lines in 1 file changed: 104 ins; 0 del; 0 mod

8312420: Integrate Graal's blender micro benchmark

Reviewed-by: dnsimon, thartmann, ksakata

-------------

PR: https://git.openjdk.org/jdk/pull/14941

From vkempik at openjdk.org Tue Aug 1 11:48:50 2023
From: vkempik at openjdk.org (Vladimir Kempik)
Date: Tue, 1 Aug 2023 11:48:50 GMT
Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint
In-Reply-To: 
References: 
Message-ID: <7tSk1XE5GNwLWxTCIP8UKD0KlOHzA0ftpWADEyYle8Y=.a6c7d403-365a-451f-a9b2-30d3f41cf33b@github.com>

On Mon, 24 Jul 2023 08:22:52 GMT, Ilya Gavrilin wrote:

> Please review these changes adding RISC-V double rounding intrinsics.
> 
> On RISC-V, intrinsics for rounding doubles with a given rounding mode (like Math.ceil/floor/rint) were missing. RISC-V has no dedicated instruction for such a conversion, so a two-step conversion is used: double -> long int -> double (using fcvt.l.d and fcvt.d.l).
> 
> We also need to supply a rounding mode to the fcvt.x.x instructions.
> 
> Rounding mode selection for ceil (floor and rint are similar), according to the Math.ceil requirements:
> 
>> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475).
> 
> For double -> long int we choose the rup mode (round towards +inf) to get an integer that is greater than or equal to the input value.
> For long int -> double we choose the rdn mode (round towards -inf) to get the smallest (closest to -inf) representation of the integer obtained from the first conversion.
> 
> For inf, NaN, or values greater than 2^63, the input value is returned unchanged (a double greater than 2^63 is guaranteed to be an integer).
> Additionally, when storing the result we copy the sign from the input value (needed because ceil must return -0.0 for inputs in (-1.0, 0.0)).
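The two-step rounding scheme described above can be modelled in plain Java, which makes it easy to check the edge cases against `Math.ceil` and `Math.floor`. This is a sketch only, not the RISC-V intrinsic. Java cannot select fcvt rounding modes, so the round-up (ceil) or round-down (floor) behaviour of the double -> long step is emulated by truncating and then adjusting by one. The pass-through check uses the magnitude `|x| >= 2^63`, an assumption that also covers large negative inputs, and the final `Math.copySign` mirrors the sign copy that yields -0.0. `TwoStepRounding` is a hypothetical name.

```java
/**
 * Plain-Java model of the double -> long -> double rounding strategy
 * described in the RISC-V review (illustration only; not HotSpot code).
 */
final class TwoStepRounding {
    private static final double TWO_POW_63 = 0x1p63;

    static double ceil(double x) {
        // NaN, infinities and huge values (already integral) pass through.
        if (Double.isNaN(x) || Math.abs(x) >= TWO_POW_63) {
            return x;
        }
        long t = (long) x;   // Java's cast truncates toward zero
        if (x > t) {
            t++;             // emulate rounding up for the double -> long step
        }
        // Here |t| <= 2^53, or t equals x exactly, so (double) t is exact and
        // the rounding mode of the long -> double step does not matter.
        return Math.copySign((double) t, x); // ceil(-0.5) must be -0.0
    }

    static double floor(double x) {
        if (Double.isNaN(x) || Math.abs(x) >= TWO_POW_63) {
            return x;
        }
        long t = (long) x;
        if (x < t) {
            t--;             // emulate rounding down for the double -> long step
        }
        return Math.copySign((double) t, x); // floor(-0.0) must be -0.0
    }

    public static void main(String[] args) {
        System.out.println(ceil(-0.5));  // prints -0.0
        System.out.println(floor(-0.5)); // prints -1.0
    }
}
```

Comparing with `Double.compare` rather than `==` matters here, because it distinguishes `-0.0` from `0.0` and treats NaN as equal to itself.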
>
> We have observed a significant improvement on HiFive and T-Head boards.
>
> Testing: tier1, tier2 and hotspot:tier3 on HiFive
>
> Performance results on HiFive (FpRoundingBenchmark.testceil/floor/rint):
>
> Without intrinsic:
>
>     Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
>     FpRoundingBenchmark.testceil         1024  thrpt   25  39.297 ± 0.037  ops/ms
>     FpRoundingBenchmark.testfloor        1024  thrpt   25  39.398 ± 0.018  ops/ms
>     FpRoundingBenchmark.testrint         1024  thrpt   25  36.388 ± 0.844  ops/ms
>
> With intrinsic:
>
>     Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
>     FpRoundingBenchmark.testceil         1024  thrpt   25  80.560 ± 0.053  ops/ms
>     FpRoundingBenchmark.testfloor        1024  thrpt   25  80.541 ± 0.081  ops/ms
>     FpRoundingBenchmark.testrint         1024  thrpt   25  80.603 ± 0.071  ops/ms

Hello, could someone with C2 knowledge please take a look?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14991#issuecomment-1660148250

From shade at openjdk.org Tue Aug 1 12:26:56 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 1 Aug 2023 12:26:56 GMT
Subject: RFR: 8313248: C2: setScopedValueCache intrinsic exposes nullptr pre-values to store barriers
Message-ID: 

See the bug for investigation breadcrumbs. The root cause for the failures seen with Shenandoah seems to be as follows.

The setter (`setScopedValueCache`) intrinsic passes a `val_type` of `_gvn.type(arr)`, which is `narrowoop: java/lang/Object *[int:32] (java/lang/Cloneable,java/io/Serializable):NotNull:exact *`, derived from `argument(0)`, and thus implies non-nullity.

So when Shenandoah's SATB barrier loads the `pre_val`, it folds the null check, assuming from `val_type` that `pre_val` is not null. This passes `nullptr` to the SATB queues or the slow path, and we crash either in queue filtering or in barrier code that does not expect nullptrs on SATB paths. The getter (`scopedValueCache`) constructs `objects_type` explicitly to imply the value can be null.
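The failure mode can be made concrete with a toy Java model of a SATB pre-write barrier. This is a hypothetical, greatly simplified sketch, not HotSpot code: `enqueue` stands in for the SATB queue, which must never receive `null`, and `storeAssumingNonNullPreValue` models the barrier after the pre-value null check has been folded away because the stored value's non-null type was reused for the pre-value. Concurrent-marking state and other barrier details are deliberately omitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a SATB pre-write barrier; all names are hypothetical. */
final class SatbBarrierSketch {
    // Stand-in for the SATB queue: it must only ever see non-null pre-values.
    static final List<Object> satbQueue = new ArrayList<>();

    static void enqueue(Object preValue) {
        if (preValue == null) {
            throw new IllegalStateException("null pre-value reached the SATB queue");
        }
        satbQueue.add(preValue);
    }

    /** Correct barrier: the null check is on the OLD value loaded from the slot. */
    static void store(Object[] slots, int i, Object newValue) {
        Object preValue = slots[i]; // the "getter" hidden inside the store
        if (preValue != null) {
            enqueue(preValue);
        }
        slots[i] = newValue;
    }

    /**
     * Models the bug: the pre-value null check was folded because newValue is
     * known to be non-null, which says nothing about the slot's old contents.
     */
    static void storeAssumingNonNullPreValue(Object[] slots, int i, Object newValue) {
        enqueue(slots[i]);
        slots[i] = newValue;
    }
}
```

A lazily initialized slot, like the scoped value cache described above, starts out `null`, which is exactly the case the folded check gets wrong.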
I think we should do the same for the setter, since the SATB barrier inside it effectively hides a "getter" (the load of the pre-value).

Arguably, it is a landmine that GC barriers assume `val_type` is the type of both the stored value and the pre-value read from memory: the non-null-ness derived for the stored value gets used to reason about the non-null-ness of the pre-value. We can explore solutions to that generic problem after we plug this leak. Other `access_store_at` uses in C2 intrinsics seem to operate only on thread fields that are not null, so they are not susceptible to this problem. `scopedValueCache` is a notable exception: a lazily initialized thread OopHandle accessed from C2.

I think G1 SATB barriers have the same problem, but I have not tried very hard to reproduce the failure there. (It would, AFAIU, require writing a test that triggers G1 concurrent marks, not just young GCs.)

Attn @theRealAph ;)

Additional testing:
 - [x] Linux x86_64 fastdebug, 10+ iterations of `java/lang/ScopedValue/StressStackOverflow.java` with Shenandoah
 - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` with Shenandoah
 - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` with G1
 - [ ] Linux AArch64 fastdebug, `tier1 tier2 tier3`

-------------

Commit messages:
 - Proper fix
 - Trying to pin more
 - Reverts
 - Debugging

Changes: https://git.openjdk.org/jdk/pull/15105/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15105&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8313248
  Stats: 20 lines in 2 files changed: 10 ins; 8 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/15105.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15105/head:pull/15105

PR: https://git.openjdk.org/jdk/pull/15105

From jvernee at openjdk.org Tue Aug 1 13:04:40 2023
From: jvernee at openjdk.org (Jorn Vernee)
Date: Tue, 1 Aug 2023 13:04:40 GMT
Subject: RFR: 8313406: nep_invoker_blob can be simplified more
In-Reply-To: 
References: 
Message-ID: 
<8o90m_870sJwVWPZBRch9CR-aPOnaZzGP6DxjBexXcI=.294edec1-2355-465a-91d1-b384e9ff104f@github.com>

On Mon, 31 Jul 2023 12:22:00 GMT, Yasumasa Suenaga wrote:

> In FFM, a native function is called via `nep_invoker_blob`. For a function with two arguments, the stub looks like this:
>
>     Decoding RuntimeStub - nep_invoker_blob 0x00007fcae394cd10
>     --------------------------------------------------------------------------------
>       0x00007fcae394cd80: pushq %rbp
>       0x00007fcae394cd81: movq %rsp, %rbp
>       0x00007fcae394cd84: subq $0, %rsp
>      ;; { argument shuffle
>       0x00007fcae394cd88: movq %r8, %rax
>       0x00007fcae394cd8b: movq %rsi, %r10
>       0x00007fcae394cd8e: movq %rcx, %rsi
>       0x00007fcae394cd91: movq %rdx, %rdi
>      ;; } argument shuffle
>       0x00007fcae394cd94: callq *%r10
>       0x00007fcae394cd97: leave
>       0x00007fcae394cd98: retq
>
> `subq $0, %rsp` reserves shadow space on the stack (zero bytes here), and `movq %r8, %rax` loads `%rax` with the number of vector-register arguments, which only variadic functions need. Neither is necessary in some cases, so they should be omitted when not needed:
>
>     Decoding RuntimeStub - nep_invoker_blob 0x00007fd8778e2810
>     --------------------------------------------------------------------------------
>       0x00007fd8778e2880: pushq %rbp
>       0x00007fd8778e2881: movq %rsp, %rbp
>      ;; { argument shuffle
>       0x00007fd8778e2884: movq %rsi, %r10
>       0x00007fd8778e2887: movq %rcx, %rsi
>       0x00007fd8778e288a: movq %rdx, %rdi
>      ;; } argument shuffle
>       0x00007fd8778e288d: callq *%r10
>       0x00007fd8778e2890: leave
>       0x00007fd8778e2891: retq
>
> All java/foreign jtreg tests pass.
>
> These stubs can be observed with the [ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/examples/cpumodel) using `-XX:+UnlockDiagnosticVMOptions -XX:+PrintStubCode` and the hsdis library. This testcase links the code with `Linker.Option.isTrivial()`.
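For context, the sketch below shows a typical FFM downcall, the kind of call that goes through the native invoker stub discussed here. It is a minimal sketch that assumes JDK 22 or later (where the FFM API is final, so no `--enable-preview` is needed) and a 64-bit platform where `size_t` is carried as `JAVA_LONG`. It binds the C library's `strlen` through the default lookup and passes no linker options. Running it with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintStubCode` and hsdis may show the generated stub, though its exact shape depends on the JDK version and platform.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

/** Minimal FFM downcall sketch (JDK 22+): calls the C library's strlen. */
final class StrlenDowncall {
    // size_t strlen(const char *s); assumes a 64-bit size_t (JAVA_LONG).
    private static final MethodHandle STRLEN;

    static {
        Linker linker = Linker.nativeLinker();
        MemorySegment strlenAddr = linker.defaultLookup().find("strlen").orElseThrow();
        STRLEN = linker.downcallHandle(
                strlenAddr,
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
    }

    static long strlen(String s) {
        try (Arena arena = Arena.ofConfined()) {
            // Copies s into native memory as a NUL-terminated UTF-8 string.
            MemorySegment cString = arena.allocateFrom(s);
            return (long) STRLEN.invokeExact(cString);
        } catch (Throwable t) {
            throw new RuntimeException("strlen downcall failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(strlen("hello")); // prints 5
    }
}
```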
>
> After this change, FFM performance on [another ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/benchmarks/funccall) was improved:
>
> before:
>
>     Benchmark                           Mode  Cnt          Score          Error  Units
>     FuncCallComparison.invokeFFMRDTSC  thrpt    3  106664071.816 ± 14396524.718  ops/s
>     FuncCallComparison.rdtsc           thrpt    3  108024079.738 ± 13223921.011  ops/s
>
> after:
>
>     Benchmark                           Mode  Cnt          Score          Error  Units
>     FuncCallComparison.invokeFFMRDTSC  thrpt    3  107622971.525 ± 12249767.134  ops/s
>     FuncCallComparison.rdtsc           thrpt    3  107695741.608 ± 23983281.346  ops/s
>
> Environment:
> * CPU: AMD Ryzen 3 3300X
> * OS: Fedora 38 x86_64 (Kernel 6.3.8-200.fc38.x86_64)
> * Hyper-V 4vCPU, 8GB mem

All green

-------------

Marked as reviewed by jvernee (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15089#pullrequestreview-1556945814

From jvernee at openjdk.org Tue Aug 1 15:46:21 2023
From: jvernee at openjdk.org (Jorn Vernee)
Date: Tue, 1 Aug 2023 15:46:21 GMT
Subject: RFR: 8312522: Implementation of Foreign Function & Memory API
Message-ID: 

This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs:

1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encodings to all the encodings found in the `java.nio.charset.StandardCharsets` class.
2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method that inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size.
3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts.
4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable-length arrays or variable-size matrices.
5. https://github.com/openjdk/panama-foreign/pull/841 Remove the now-redundant `MemorySegment::segmentOffset` API. Clients can simply use the difference between the base addresses of two memory segments.
6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray` by renaming the methods that both allocate and initialize memory segments to `allocateFrom` (see the original PR for the problematic case).
7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions.
8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed.
9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on unsupported platforms. All tests are turned on by default, instead of being skipped when the linker is not present.
10. https://github.com/openjdk/panama-foreign/pull/851 Minor code tweaks to ensure the JIT can constant-fold through native access checks if the accessing class is statically known (see the commit/original PR for changes).
11. https://github.com/openjdk/panama-foreign/pull/853 Remove all the `@PreviewFeature` annotations from the API. Update all `@since` tags in the Javadoc to `@since 22` (per [JEP 12](https://bugs.openjdk.org/browse/JDK-8195734)). Update tests and benchmarks to no longer build and run using `--enable-preview` (or the `@enablePreview` jtreg tag).

I want to call out in particular that this patch finalizes the FFM API (by moving it out of preview), and requires all JDK implementations to implement it. Most ports already have full FFM API support. The ones that are missing are: s390 ([currently under review](https://github.com/openjdk/jdk/pull/14801)), windows-x86 (deprecated for removal), and linux-x86 & arm32 (which can both be implemented using the fallback linker [1](https://github.com/openjdk/panama-foreign/pull/770) [2](https://mail.openjdk.org/pipermail/porters-dev/2023-March/000753.html)).

-------------

Commit messages:
 - use immutable map for fallback linker canonical layouts
 - 8313265: Move the FFM API out of preview
 - 8313005: Ensure native access check can fold away
 - 8312981: Make the linker API required
 - 8312615: Ensure jdk_foreign tests pass on linux-x86
 - 8312186: TestStringEncodingFails for UTF-32
 - 8312059: Clarify the documention for variadic functions
 - 8311533: SegmentAllocator::allocateArray call can be ambiguous
 - 8310893: VarHandleTestExact fails
 - 8310820: Remove MemorySegment::segmentOffset
 - ... and 4 more: https://git.openjdk.org/jdk/compare/6fca2898...b0a1abaf

Changes: https://git.openjdk.org/jdk/pull/15103/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8312522
  Stats: 2444 lines in 224 files changed: 1084 ins; 721 del; 639 mod
  Patch: https://git.openjdk.org/jdk/pull/15103.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103

PR: https://git.openjdk.org/jdk/pull/15103

From jvernee at openjdk.org Tue Aug 1 15:46:21 2023
From: jvernee at openjdk.org (Jorn Vernee)
Date: Tue, 1 Aug 2023 15:46:21 GMT
Subject: RFR: 8312522: Implementation of Foreign Function & Memory API
In-Reply-To: 
References: 
Message-ID: 

On Tue, 1 Aug 2023 10:29:06 GMT, Jorn Vernee wrote:

> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. ...

Open build question: the `jdk_foreign` tests will fail if a JDK does not have a full linker port and is not built with `--enable-fallback-linker`. If the API is to become required, as proposed by this PR, should the build also require the fallback linker by default on platforms that do not have a full linker port?

FWIW, the GHA linux-x86 build currently fails tests, since it is not built with `--enable-fallback-linker`.
I think this could be easily [resolved](https://github.com/JornVernee/jdk/commit/6577c3725f79375c1df8c4af70925d8ac8dec9a2), but attempting this currently [fails](https://github.com/JornVernee/jdk/actions/runs/5726155097/job/15516044665) when trying to install GCC:

    Package gcc-10-multilib is not available, but is referred to by another package.
    This may mean that the package is missing, has been obsoleted, or
    is only available from another source

This error occurs even on a plain master branch, so it doesn't seem to be caused by this PR.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15103#issuecomment-1660052156
PR Comment: https://git.openjdk.org/jdk/pull/15103#issuecomment-1660353141

From duke at openjdk.org Tue Aug 1 15:46:21 2023
From: duke at openjdk.org (ExE Boss)
Date: Tue, 1 Aug 2023 15:46:21 GMT
Subject: RFR: 8312522: Implementation of Foreign Function & Memory API
In-Reply-To: 
References: 
Message-ID: 

On Tue, 1 Aug 2023 10:29:06 GMT, Jorn Vernee wrote:

> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. ...

src/java.base/share/classes/jdk/internal/foreign/abi/fallback/FallbackLinker.java line 299:

> 297:     @Override
> 298:     public Map<String, MemoryLayout> canonicalLayouts() {
> 299:         return CANONICAL_LAYOUTS;

`CANONICAL_LAYOUTS` is set to a `HashMap`, which is not unmodifiable.
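The mutability concern can be illustrated with plain collections. In this sketch (hypothetical names; the map values are illustrative sizes, not the real canonical layouts), returning the `HashMap` directly lets callers modify it, `Collections.unmodifiableMap` returns a read-only view that still reflects later changes to the backing map, and `Map.copyOf` (like `Map.of`) produces an immutable snapshot.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Compares exposing a HashMap, an unmodifiable view, and an immutable copy. */
final class CanonicalMapSketch {
    /** Hypothetical mutable table, standing in for a HashMap-backed field. */
    static Map<String, Integer> mutableTable() {
        Map<String, Integer> m = new HashMap<>();
        m.put("short", 2); // illustrative sizes in bytes
        m.put("int", 4);
        return m;
    }

    /** Returns true if the map rejects modification. */
    static boolean rejectsPut(Map<String, Integer> m) {
        try {
            m.put("char", 1);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> backing = mutableTable();
        Map<String, Integer> view = Collections.unmodifiableMap(backing);
        Map<String, Integer> copy = Map.copyOf(backing);

        System.out.println(rejectsPut(view));    // true
        System.out.println(rejectsPut(copy));    // true
        System.out.println(rejectsPut(backing)); // false: a HashMap is mutable
        // The view reflects the change made through the backing map; the copy does not.
        System.out.println(view.containsKey("char") + " " + copy.containsKey("char")); // true false
    }
}
```

One consideration: a map built immutably up front cannot change behind callers' backs, whereas an unmodifiable view is only as stable as the map it wraps.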
Suggestion:

        return Collections.unmodifiableMap(CANONICAL_LAYOUTS);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1280480570

From jvernee at openjdk.org Tue Aug 1 15:46:21 2023
From: jvernee at openjdk.org (Jorn Vernee)
Date: Tue, 1 Aug 2023 15:46:21 GMT
Subject: RFR: 8312522: Implementation of Foreign Function & Memory API
In-Reply-To: 
References: 
Message-ID: 

On Tue, 1 Aug 2023 11:17:57 GMT, ExE Boss wrote:

>> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. ...
> 
> src/java.base/share/classes/jdk/internal/foreign/abi/fallback/FallbackLinker.java line 299:
> 
>> 297:     @Override
>> 298:     public Map<String, MemoryLayout> canonicalLayouts() {
>> 299:         return CANONICAL_LAYOUTS;
> 
> `CANONICAL_LAYOUTS` is set to a `HashMap`, which is not unmodifiable.
> Suggestion:
> 
>         return Collections.unmodifiableMap(CANONICAL_LAYOUTS);

Good catch. I think the right fix is to update FallbackLinker, though. The other ports already use `Map.of`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1280665030

From shade at openjdk.org Tue Aug 1 16:10:12 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 1 Aug 2023 16:10:12 GMT
Subject: RFR: 8313402: C1: Incorrect LoadIndexed value numbering [v2]
In-Reply-To: 
References: 
Message-ID: 

> See the bug for more investigation.
> 
> This manifests in current tests if you run them with C1.
> The root cause for the failure is that we fold two `LoadIndexed` nodes when one of them reads a `char` from a `byte[]` via the `_getCharStringU` intrinsic and the other reads a `byte` normally. So the "char"-reading load gets folded with the "byte"-reading load, effectively reading the wrong thing. The new regression test shows it: it would read "42" instead of the full char.
> 
>     $ build/macosx-aarch64-server-fastdebug/images/jdk/bin/java -XX:TieredStopAtLevel=1 -XX:CICompilerCount=1 -XX:+PrintCompilation -XX:+PrintIR0 -XX:+PrintValueNumbering Test8313402.java
> 
>     . 8 0 i152 a141[i110](i144) (B) [rc]
>     ...
>     . 7 0 i180 a162[i110] (C)
>     ...
>     Value Numbering: LoadIndexed i180 equal to i152 (size 47, entries 27, nesting-diff 0)
> 
> GVN hash discriminates on `type()->tag()`, but that `ValueType` maps to the same `T_INT` for both `char` and `byte`! Instead of hashing on that, let's hash on the original element type.
> 
> Testing:
>  - [x] New regression test fails without the fix, passes after the fix
>  - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3`
>  - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3`

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8313402-c1-gvn-loadindexed - Initial fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15091/files - new: https://git.openjdk.org/jdk/pull/15091/files/02827351..d9c210b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15091&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15091&range=00-01 Stats: 4522 lines in 92 files changed: 2213 ins; 1029 del; 1280 mod Patch: https://git.openjdk.org/jdk/pull/15091.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15091/head:pull/15091 PR: https://git.openjdk.org/jdk/pull/15091 From shade at openjdk.org Tue Aug 1 16:10:27 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 1 Aug 2023 16:10:27 GMT Subject: RFR: 8313248: C2: setScopedValueCache intrinsic exposes nullptr pre-values to store barriers [v2] In-Reply-To: References: Message-ID: > See the bug for investigation breadcrumbs. The root cause for failures seen with Shenandoah seem to be as follows. > > The setter (`setScopedValueCache`) intrinsic passes `val_type` of `_gvn.type(arr)`, which is `narrowoop: java/lang/Object *[int:32] (java/lang/Cloneable,java/io/Serializable):NotNull:exact *`, derived from the `argument(0)`, and thus implies non-nullity. > > So when Shenandoah's SATB barrier loads the `pre_val`, it folds the null-check, assuming the `pre_val` is not null, due to `val_type`. This passes `nullptr` to SATB queues or slowpath, and we crash in either queue filtering or barrier code that does not expect nullptrs on SATB paths. The getter (`scopedValueCache`) constructs the `objects_type` explicitly to imply the value can be null. I think we should do the same for setter, since it can hide the "getter" from SATB barrier inside of it. > > Arguably, it is a landmine that GC barriers assume the `val_type` is the type of both stored value and the pre-value read from memory. 
So the non-null-ness derived for the stored value gets used to reason about the non-null-ness of the pre-value. We can explore the solutions to that generic problem after we plug this leak. Other `access_store_at` uses in C2 intrinsics seem to only operate on thread fields that are not null, so they are not susceptible to this problem. `scopedValueCache` is a notable exception: a lazily initialized thread OopHandle accessed from C2. > > I think G1 SATB barriers have the same problem, but I have not tried to reproduce the failure very hard there. (It would, AFAIU, require writing a test that does G1 concurrent marks, not just young GCs.) > > Attn @theRealAph ;) > > Additional testing: > - [x] Linux x86_64 fastdebug, 10+ iterations of `java/lang/ScopedValue/StressStackOverflow.java` with Shenandoah > - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` with Shenandoah > - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` with G1 > - [ ] Linux AArch64 fastdebug, `tier1 tier2 tier3` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
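The following sketch models the SATB pre-barrier problem described above in plain Java. It is an illustration under stated assumptions, not HotSpot code: `SatbModel`, its fields, and the use of `ArrayDeque` (which rejects nulls) as a stand-in for SATB queue filtering that does not expect `nullptr` are all hypothetical. The point is that the pre-value comes from memory, so a lazily initialized field can still hold null even when the value being stored is known to be non-null.

```java
import java.util.ArrayDeque;
import java.util.Objects;

/** Hypothetical model of an SATB pre-write barrier; not HotSpot code. */
final class SatbModel {
    // ArrayDeque throws NullPointerException on null, standing in for SATB code that cannot handle nullptr.
    static final ArrayDeque<Object> satbQueue = new ArrayDeque<>();
    static boolean markingActive = true;
    static Object[] cache; // lazily initialized field, like the thread's scopedValueCache

    /** Correct barrier: the pre-value is whatever is in memory, which may be null. */
    static void storeWithBarrier(Object[] newValue) {
        if (markingActive) {
            Object[] preVal = cache;
            if (preVal != null) { // null pre-values must be filtered out
                satbQueue.add(preVal);
            }
        }
        cache = newValue;
    }

    /** Buggy barrier: non-null-ness of newValue was wrongly applied to the pre-value. */
    static void storeWithFoldedNullCheck(Object[] newValue) {
        Objects.requireNonNull(newValue); // the stored value really is non-null...
        if (markingActive) {
            Object[] preVal = cache;
            satbQueue.add(preVal); // ...but the null check on preVal was folded away
        }
        cache = newValue;
    }
}
```

The first store into an empty `cache` is exactly the case the folded check gets wrong: the new array is non-null, but the old contents are not.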
The pull request contains five additional commits since the last revision: - Merge branch 'master' into JDK-8313248-shenandoah-nullcheck - Proper fix - Trying to pin more - Reverts - Debugging ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15105/files - new: https://git.openjdk.org/jdk/pull/15105/files/baf5a197..ff3ad44b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15105&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15105&range=00-01 Stats: 289 lines in 9 files changed: 128 ins; 6 del; 155 mod Patch: https://git.openjdk.org/jdk/pull/15105.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15105/head:pull/15105 PR: https://git.openjdk.org/jdk/pull/15105 From phh at openjdk.org Tue Aug 1 16:15:49 2023 From: phh at openjdk.org (Paul Hohensee) Date: Tue, 1 Aug 2023 16:15:49 GMT Subject: RFR: 8313402: C1: Incorrect LoadIndexed value numbering [v2] In-Reply-To: References: Message-ID: On Tue, 1 Aug 2023 16:10:12 GMT, Aleksey Shipilev wrote: >> See the bug for more investigation. >> >> This manifests in current tests, if you run them with C1. The root cause for the failure is that we fold two `LoadIndexed` nodes, when one of them reads `char` from `byte[]` via `_getCharStringU` intrinsic, and another one reads `byte` normally. So we can fold the "char"-reading load with "byte" reading load, effectively reading the wrong thing. New regression test shows it: it would read "42" instead of full char. >> >> >> $ build/macosx-aarch64-server-fastdebug/images/jdk/bin/java -XX:TieredStopAtLevel=1 -XX:CICompilerCount=1 -XX:+PrintCompilation -XX:+PrintIR0 -XX:+PrintValueNumbering Test8313402.java >> >> . 8 0 i152 a141[i110](i144) (B) [rc] >> ... >> . 7 0 i180 a162[i110] (C) >> ... >> Value Numbering: LoadIndexed i180 equal to i152 (size 47, entries 27, nesting-diff 0) >> ``` >> >> GVN hash discriminates on `type()->tag()`, but that `ValueType` maps to the same `T_INT` for both `char` and `byte`! 
Instead of hashing on that, let's hash on the original element type instead. >> >> Testing: >> - [x] New regression test fails without the fix, passes after the fix >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` >> - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8313402-c1-gvn-loadindexed > - Initial fix Looks like the right thing to do. ------------- Marked as reviewed by phh (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15091#pullrequestreview-1557373569 From duke at openjdk.org Tue Aug 1 17:42:54 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 1 Aug 2023 17:42:54 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v13] In-Reply-To: References: Message-ID: <2OFyLMeiiFJdZDD-BKUcaW4lfaeis9VG4jALBQlYbOc=.f3ccd800-05e4-4dec-8980-ddf3392e1cc9@github.com> On Sun, 30 Jul 2023 08:24:20 GMT, Andrew Haley wrote: >> src/java.base/share/classes/java/util/Arrays.java line 100: >> >>> 98: else if (elemType == float.class) DualPivotQuicksort.sort((float[]) array, 0, fromIndex, toIndex); >>> 99: else if (elemType == double.class) DualPivotQuicksort.sort((double[]) array, 0, fromIndex, toIndex); >>> 100: else throw new UnsupportedOperationException("arraySort intrinsic not supported for this type: " + elemType.toString()); >> >> I'm curious if there is a performance difference using switch pattern on element type that would generate an `invokedynamic typeSwitch` over the primitive array types e.g.: >> >> Suggestion: >> >> switch (array) { >> case int[] arr -> DualPivotQuicksort.sort(arr, 0, fromIndex, toIndex); >> case long[] arr -> DualPivotQuicksort.sort(arr, 0, fromIndex, toIndex); >> 
case float[] arr -> DualPivotQuicksort.sort(arr, 0, fromIndex, toIndex); >> case double[] arr -> DualPivotQuicksort.sort(arr, 0, fromIndex, toIndex); >> default -> throw new UnsupportedOperationException( >> "arraySort intrinsic not supported for this type: " + elemType); >> } > > What is the reasoning behind this new public API? It doesn't follow the usual Java convention, which is to have overloads for each type. And it doesn't seem to provide anything not already provided by `Arrays.sort()`. Hi Andrew, the reason for the public API is to make AVX512 sort available to other data structures like MemorySegment (including the ones backed by native heap). The API of the arraySort() AVX512 intrinsic is similar to the public API of ArraysSupport.vectorizedMismatch(), which is used by MemorySegment.mismatch(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1280953141 From duke at openjdk.org Tue Aug 1 17:42:57 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 1 Aug 2023 17:42:57 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v6] In-Reply-To: References: Message-ID: On Tue, 6 Jun 2023 19:07:12 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> fix license in one file > > test/micro/org/openjdk/bench/java/util/ArraysSort.java line 104: > >> 102: @Benchmark >> 103: public void floatSort() throws Throwable { >> 104: floats_sorted = floats_unsorted.clone(); > > We can move clone out of benchmarking methods into per-invocation setup. This suggestion was incorporated. Thank you!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1280954741 From duke at openjdk.org Tue Aug 1 17:54:05 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 1 Aug 2023 17:54:05 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>
> **Arrays.sort performance data using JMH benchmarks**
>
> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
> | --- | --- | --- | --- | --- |
> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x |
> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x |
> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** |
> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** |
> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x |
> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x |
> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** |
> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** |
> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x |
> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x |
> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** |
> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** |
> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x |
> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x |
> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** |
> | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** |

Srinivas Vamsi Parasa has updated the pull request incrementally
with one additional commit since the last revision: Update src/java.base/share/classes/java/util/Arrays.java Co-authored-by: David Schlosnagle ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/240fde18..17b51270 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=12-13 Stats: 8 lines in 1 file changed: 3 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From duke at openjdk.org Tue Aug 1 17:54:06 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 1 Aug 2023 17:54:06 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v13] In-Reply-To: References: Message-ID: On Tue, 25 Jul 2023 20:30:31 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>>
>> **Arrays.sort performance data using JMH benchmarks**
>>
>> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
>> | --- | --- | --- | --- | --- |
>> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x |
>> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x |
>> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** |
>> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** |
>> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x |
>> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x |
>> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** |
>> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** |
>> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x |
>> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x |
>> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** |
>> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** |
>> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x |
>> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x |
>> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** |
>> | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** |
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
> add special cases to float and double arrays

@schlosna Thanks David for suggesting a more elegant way by using switch()!
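A self-contained version of David's type-pattern `switch` suggestion is sketched below. `DualPivotQuicksort` is package-private in `java.util`, so this hypothetical `SortDispatch` helper delegates to the public `Arrays.sort` overloads instead. It needs Java 21 (pattern matching for `switch`), and it says nothing about the performance question raised in the review.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating the type-pattern switch from the review. */
final class SortDispatch {
    static void sort(Object array, int fromIndex, int toIndex) {
        switch (array) {
            // Each case binds the array with its static primitive-array type.
            case int[] arr -> Arrays.sort(arr, fromIndex, toIndex);
            case long[] arr -> Arrays.sort(arr, fromIndex, toIndex);
            case float[] arr -> Arrays.sort(arr, fromIndex, toIndex);
            case double[] arr -> Arrays.sort(arr, fromIndex, toIndex);
            default -> throw new UnsupportedOperationException(
                    "sort not supported for this type: " + array.getClass().getComponentType());
        }
    }
}
```

Because there is no `case null`, a `null` argument makes the `switch` throw `NullPointerException`, which matches how the `Arrays.sort` overloads treat a null array.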
------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1660810552 From duke at openjdk.org Tue Aug 1 17:54:07 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 1 Aug 2023 17:54:07 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v6] In-Reply-To: References: Message-ID: On Tue, 6 Jun 2023 19:02:45 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> fix license in one file > > test/micro/org/openjdk/bench/java/util/ArraysSort.java line 85: > >> 83: ints_unsorted[i] = rnd.nextInt(); >> 84: longs_unsorted[i] = rnd.nextLong(); >> 85: floats_unsorted[i] = rnd.nextFloat(); > > Can you also introduce NaN, Infinity, +0.0, -0.0 in input floating-point arrays? This suggestion was also incorporated. Thank you! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1280963197 From duke at openjdk.org Tue Aug 1 18:51:45 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Tue, 1 Aug 2023 18:51:45 GMT Subject: RFR: 8312596: Null pointer access in Compile::TracePhase::~TracePhase after JDK-8311976 [v4] In-Reply-To: References: Message-ID: <5c6Q9_81hLki7rSqyQy8KPIvTY9bp28i15nPawvDh5c=.4d051bc7-b030-4bb4-8de8-79951b3698d7@github.com> On Fri, 28 Jul 2023 15:04:36 GMT, Aleksey Shipilev wrote: >> Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test config to use vm.compiler2.enabled >> >> Signed-off-by: Ashutosh Mehra > > Marked as reviewed by shade (Reviewer). @shipilev, can you please sponsor this PR?
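Jatin's suggestion earlier in this thread to seed the float inputs with NaN, Infinity, +0.0 and -0.0 pairs naturally with an ordering check. This is a minimal sketch with hypothetical helper names, not the benchmark's actual change. It relies on the documented `Arrays.sort(float[])` ordering (-0.0f before 0.0f, NaN after every other value), which is the same total order `Float.compare` defines.

```java
import java.util.Arrays;

/** Hypothetical helpers for exercising special float values in sort inputs. */
final class SpecialFloats {
    /** Returns a copy of base with NaN, +/-Infinity, +0.0 and -0.0 appended. */
    static float[] withSpecials(float[] base) {
        float[] out = Arrays.copyOf(base, base.length + 5);
        out[base.length] = Float.NaN;
        out[base.length + 1] = Float.POSITIVE_INFINITY;
        out[base.length + 2] = Float.NEGATIVE_INFINITY;
        out[base.length + 3] = 0.0f;
        out[base.length + 4] = -0.0f;
        return out;
    }

    /** True if a is non-decreasing under Float.compare (so -0.0 < 0.0 and NaN is largest). */
    static boolean isSortedByFloatCompare(float[] a) {
        for (int i = 1; i < a.length; i++) {
            if (Float.compare(a[i - 1], a[i]) > 0) {
                return false;
            }
        }
        return true;
    }
}
```

A plain `<` comparison would not catch a misplaced -0.0f or NaN, which is why the check uses `Float.compare`.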
------------- PR Comment: https://git.openjdk.org/jdk/pull/15002#issuecomment-1660893950 From shade at openjdk.org Tue Aug 1 19:30:02 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 1 Aug 2023 19:30:02 GMT Subject: RFR: 8312596: Null pointer access in Compile::TracePhase::~TracePhase after JDK-8311976 [v5] In-Reply-To: References: Message-ID: On Fri, 28 Jul 2023 16:41:18 GMT, Ashutosh Mehra wrote: >> Please review this PR to fix a potential null pointer access in using `_compile`. >> Updated the code to unconditionally initialize `_compile` and added an assert (similar to C1's `PhaseTraceTime` constructor) for it to be non-null. > > Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: > > Add -XX:-TieredCompilation in test config > > Signed-off-by: Ashutosh Mehra Yes! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15002#issuecomment-1660947570 From duke at openjdk.org Tue Aug 1 19:30:04 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Tue, 1 Aug 2023 19:30:04 GMT Subject: Integrated: 8312596: Null pointer access in Compile::TracePhase::~TracePhase after JDK-8311976 In-Reply-To: References: Message-ID: On Mon, 24 Jul 2023 18:16:19 GMT, Ashutosh Mehra wrote: > Please review this PR to fix a potential null pointer access in using `_compile`. > Updated the code to unconditionally initialize `_compile` and added an assert (similar to C1's `PhaseTraceTime` constructor) for it to be non-null. This pull request has now been integrated. 
Changeset: 7ba8c69a Author: Ashutosh Mehra Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/7ba8c69a2cb094f124234fef5a0f7ac98993c1a4 Stats: 44 lines in 2 files changed: 41 ins; 1 del; 2 mod 8312596: Null pointer access in Compile::TracePhase::~TracePhase after JDK-8311976 Reviewed-by: chagedorn, dlong, shade ------------- PR: https://git.openjdk.org/jdk/pull/15002 From jvernee at openjdk.org Wed Aug 2 02:18:00 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 2 Aug 2023 02:18:00 GMT Subject: RFR: 8313406: nep_invoker_blob can be simplified more In-Reply-To: References: Message-ID: On Mon, 31 Jul 2023 12:22:00 GMT, Yasumasa Suenaga wrote:
> In FFM, a native function is called via `nep_invoker_blob`. If the function has two arguments, the stub looks like this:
>
> Decoding RuntimeStub - nep_invoker_blob 0x00007fcae394cd10
> --------------------------------------------------------------------------------
> 0x00007fcae394cd80: pushq %rbp
> 0x00007fcae394cd81: movq %rsp, %rbp
> 0x00007fcae394cd84: subq $0, %rsp
> ;; { argument shuffle
> 0x00007fcae394cd88: movq %r8, %rax
> 0x00007fcae394cd8b: movq %rsi, %r10
> 0x00007fcae394cd8e: movq %rcx, %rsi
> 0x00007fcae394cd91: movq %rdx, %rdi
> ;; } argument shuffle
> 0x00007fcae394cd94: callq *%r10
> 0x00007fcae394cd97: leave
> 0x00007fcae394cd98: retq
>
> `subq $0, %rsp` allocates shadow space on the stack, and `movq %r8, %rax` passes the number of arguments for a variadic function. So they are not necessary in some cases.
They should be removed if they are not needed, as follows:
>
> Decoding RuntimeStub - nep_invoker_blob 0x00007fd8778e2810
> --------------------------------------------------------------------------------
> 0x00007fd8778e2880: pushq %rbp
> 0x00007fd8778e2881: movq %rsp, %rbp
> ;; { argument shuffle
> 0x00007fd8778e2884: movq %rsi, %r10
> 0x00007fd8778e2887: movq %rcx, %rsi
> 0x00007fd8778e288a: movq %rdx, %rdi
> ;; } argument shuffle
> 0x00007fd8778e288d: callq *%r10
> 0x00007fd8778e2890: leave
> 0x00007fd8778e2891: retq
>
> All java/foreign jtreg tests pass.
>
> These stubs can be seen in the [ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/examples/cpumodel) with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintStubCode` and the hsdis library. This testcase links the code with `Linker.Option.isTrivial()`.
>
> After this change, FFM performance on [another ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/benchmarks/funccall) improved:
>
> before:
>
> Benchmark                           Mode  Cnt          Score          Error  Units
> FuncCallComparison.invokeFFMRDTSC  thrpt    3  106664071.816 ± 14396524.718  ops/s
> FuncCallComparison.rdtsc           thrpt    3  108024079.738 ± 13223921.011  ops/s
>
> after:
>
> Benchmark                           Mode  Cnt          Score          Error  Units
> FuncCallComparison.invokeFFMRDTSC  thrpt    3  107622971.525 ± 12249767.134  ops/s
> FuncCallComparison.rdtsc           thrpt    3  107695741.608 ± 23983281.346  ops/s
>
> Environment:
> * CPU: AMD Ryzen 3 3300X
> * OS: Fedora 38 x86_64 (Kernel 6.3.8-200.fc38.x86_64)
> * Hyper-V 4vCPU, 8GB mem

FWIW, if you want to look into reducing the generated code further, I think we can potentially reduce the amount of shuffling between registers that's needed by reordering the arguments on the Java side so that each VMStorage corresponding to an argument of the leaf method handle is the same as the register for that argument in the Java calling convention.
I think the right place to do this is in DowncallLinker where we are creating the NativeEntryPoint. The way I think it should work: 1. compute the Java calling convention's argument registers for the leaf method type. 2. compute a re-ordered VMStorage[] for the arguments, and a re-ordered method type, such that the VMStorage/type for a particular argument index matches the register for the same index used in the Java calling convention as much as possible. 3. use these 2 to create the native entry point + native method handle 4. apply the same reordering to the created native method handle (using MethodHandles::permuteArguments) so that the resulting method handle has the original argument order/method type. Pushing this shuffling to the Java side will allow the JIT to reduce data motion, and this should result in reduced shuffling being needed overall I think. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15089#issuecomment-1661382669 From thartmann at openjdk.org Wed Aug 2 05:19:57 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Aug 2023 05:19:57 GMT Subject: [jdk21] RFR: 8313023: Return value corrupted when using CCS + isTrivial (mainline) In-Reply-To: References: Message-ID: On Mon, 31 Jul 2023 08:13:55 GMT, Jorn Vernee wrote: > Hi all, > > This pull request contains a backport of commit [6fca2898](https://github.com/openjdk/jdk/commit/6fca28988794b52a6aa974bed1ed6f4f07e0994b) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Jorn Vernee on 31 Jul 2023 and was reviewed by Maurizio Cimadamore and Vladimir Ivanov. > > Thanks! Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
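Step 4 of Jorn's plan (restoring the original argument order with `MethodHandles::permuteArguments`) can be illustrated with a small standalone handle. In this sketch `reorderedTarget` is a hypothetical stand-in for a native method handle whose parameters were reordered; nothing here models `VMStorage`, registers, or `DowncallLinker`. Each entry of the reorder array names the incoming argument that feeds the corresponding target parameter.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

/** Hypothetical illustration of undoing an argument reordering with permuteArguments. */
final class PermuteSketch {
    // Stand-in for a handle whose parameters were reordered to (long, int).
    static long reorderedTarget(long b, int a) {
        return a * 1000L + b;
    }

    /** Returns a handle with the original (int, long)long type that calls reorderedTarget. */
    static MethodHandle restoreOriginalOrder() throws ReflectiveOperationException {
        MethodHandle target = MethodHandles.lookup().findStatic(
                PermuteSketch.class, "reorderedTarget",
                MethodType.methodType(long.class, long.class, int.class));
        MethodType originalType = MethodType.methodType(long.class, int.class, long.class);
        // Target parameter 0 (long b) takes incoming argument 1; target parameter 1 (int a) takes incoming argument 0.
        return MethodHandles.permuteArguments(target, originalType, 1, 0);
    }
}
```

Callers keep the original `(int, long)` signature while the target sees `(long, int)`; per the comment above, expressing the shuffle this way on the Java side is what would give the JIT room to reduce data motion.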
PR Review: https://git.openjdk.org/jdk21/pull/150#pullrequestreview-1558210925 From thartmann at openjdk.org Wed Aug 2 05:21:41 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Aug 2023 05:21:41 GMT Subject: RFR: JDK-8312617: SIGSEGV in ConnectionGraph::verify_ram_nodes [v3] In-Reply-To: References: Message-ID: On Fri, 28 Jul 2023 15:52:10 GMT, Cesar Soares Lucas wrote: >> - Return early from `verify_ram_nodes` if compilation is already failing. >> - Add back check for `failing()` after `eliminate_macro_nodes()`. >> - Print additional diagnostic information when an unexpected user of RAM is encountered. >> >> Tested with tier1-3 on Linux x64. > > Cesar Soares Lucas has updated the pull request incrementally with three additional commits since the last revision: > > - Update src/hotspot/share/opto/compile.cpp > > Co-authored-by: Tobias Hartmann > - Update src/hotspot/share/opto/escape.cpp > > Co-authored-by: Tobias Hartmann > - Update src/hotspot/share/opto/escape.cpp > > Co-authored-by: Tobias Hartmann All tests passed. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15048#pullrequestreview-1558212470 From thartmann at openjdk.org Wed Aug 2 07:34:59 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Aug 2023 07:34:59 GMT Subject: RFR: 8313402: C1: Incorrect LoadIndexed value numbering [v2] In-Reply-To: References: Message-ID: On Tue, 1 Aug 2023 16:10:12 GMT, Aleksey Shipilev wrote: >> See the bug for more investigation. >> >> This manifests in current tests, if you run them with C1. The root cause for the failure is that we fold two `LoadIndexed` nodes, when one of them reads `char` from `byte[]` via `_getCharStringU` intrinsic, and another one reads `byte` normally. So we can fold the "char"-reading load with "byte" reading load, effectively reading the wrong thing. New regression test shows it: it would read "42" instead of full char. 
>> >> >> $ build/macosx-aarch64-server-fastdebug/images/jdk/bin/java -XX:TieredStopAtLevel=1 -XX:CICompilerCount=1 -XX:+PrintCompilation -XX:+PrintIR0 -XX:+PrintValueNumbering Test8313402.java >> >> . 8 0 i152 a141[i110](i144) (B) [rc] >> ... >> . 7 0 i180 a162[i110] (C) >> ... >> Value Numbering: LoadIndexed i180 equal to i152 (size 47, entries 27, nesting-diff 0) >> ``` >> >> GVN hash discriminates on `type()->tag()`, but that `ValueType` maps to the same `T_INT` for both `char` and `byte`! Instead of hashing on that, let's hash on the original element type instead. >> >> Testing: >> - [x] New regression test fails without the fix, passes after the fix >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` >> - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8313402-c1-gvn-loadindexed > - Initial fix Nice analysis, the fix looks good to me! ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15091#pullrequestreview-1558380239 From shade at openjdk.org Wed Aug 2 08:00:13 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Aug 2023 08:00:13 GMT Subject: RFR: 8313248: C2: setScopedValueCache intrinsic exposes nullptr pre-values to store barriers [v2] In-Reply-To: References: Message-ID: On Wed, 2 Aug 2023 07:47:09 GMT, Tobias Hartmann wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8313248-shenandoah-nullcheck >> - Proper fix >> - Trying to pin more >> - Reverts >> - Debugging > > src/hotspot/share/opto/library_call.cpp line 3591: > >> 3589: const Type* LibraryCallKit::scopedValueCache_type() { >> 3590: ciKlass *objects_klass = ciObjArrayKlass::make(env()->Object_klass()); >> 3591: const TypeOopPtr *etype = TypeOopPtr::make_from_klass(env()->Object_klass()); > > Suggestion: > > ciKlass* objects_klass = ciObjArrayKlass::make(env()->Object_klass()); > const TypeOopPtr* etype = TypeOopPtr::make_from_klass(env()->Object_klass()); Done so, thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15105#discussion_r1281534065 From shade at openjdk.org Wed Aug 2 08:00:12 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Aug 2023 08:00:12 GMT Subject: RFR: 8313248: C2: setScopedValueCache intrinsic exposes nullptr pre-values to store barriers [v3] In-Reply-To: References: Message-ID: > See the bug for investigation breadcrumbs. The root cause for failures seen with Shenandoah seems to be as follows. > > The setter (`setScopedValueCache`) intrinsic passes `val_type` of `_gvn.type(arr)`, which is `narrowoop: java/lang/Object *[int:32] (java/lang/Cloneable,java/io/Serializable):NotNull:exact *`, derived from the `argument(0)`, and thus implies non-nullity. > > So when Shenandoah's SATB barrier loads the `pre_val`, it folds the null-check, assuming the `pre_val` is not null, due to `val_type`. This passes `nullptr` to SATB queues or slowpath, and we crash in either queue filtering or barrier code that does not expect nullptrs on SATB paths. The getter (`scopedValueCache`) constructs the `objects_type` explicitly to imply the value can be null. I think we should do the same for the setter, since it can hide the "getter" from the SATB barrier inside of it.
> > Arguably, it is a landmine that GC barriers assume the `val_type` is the type of both the stored value and the pre-value read from memory. So the non-null-ness derived for the stored value gets used to reason about the non-null-ness of the pre-value. We can explore the solutions to that generic problem after we plug this leak. Other `access_store_at` uses in C2 intrinsics seem to only operate on thread fields that are not null, so they are not susceptible to this problem. `scopedValueCache` is a notable exception: a lazily initialized thread OopHandle accessed from C2. > > I think G1 SATB barriers have the same problem, but I have not tried to reproduce the failure very hard there. (It would, AFAIU, require writing a test that does G1 concurrent marks, not just young GCs.) > > Attn @theRealAph ;) > > Additional testing: > - [x] Linux x86_64 fastdebug, 10+ iterations of `java/lang/ScopedValue/StressStackOverflow.java` with Shenandoah > - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` with Shenandoah > - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` with G1 > - [ ] Linux AArch64 fastdebug, `tier1 tier2 tier3` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Move the stars ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15105/files - new: https://git.openjdk.org/jdk/pull/15105/files/ff3ad44b..a2452082 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15105&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15105&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15105.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15105/head:pull/15105 PR: https://git.openjdk.org/jdk/pull/15105 From shade at openjdk.org Wed Aug 2 08:31:52 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Aug 2023 08:31:52 GMT Subject: RFR: 8313402: C1: Incorrect LoadIndexed value numbering [v2] In-Reply-To: References: Message-ID:
On Wed, 2 Aug 2023 07:31:47 GMT, Tobias Hartmann wrote: > Nice analysis, the fix looks good to me! Thanks Tobias! I wonder if it makes sense to wait for more reviewers, or whether the fix is simple enough to go in now? Not sure if other compiler folks are on vacation or not :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/15091#issuecomment-1661744041 From pli at openjdk.org Wed Aug 2 08:51:19 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 2 Aug 2023 08:51:19 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v2] In-Reply-To: References: Message-ID: > This patch removes `@requires vm.flagless` annotations from HotSpot jtreg tests in `compiler/vectorization/runner`. All jtreg cases in this folder are invoked by the test driver `VectorizationTestRunner.java`, which checks both correctness and vectorizability (IR) for each test method. We added the flagless requirement earlier because extra flags may interfere with compiler control in the test driver during the correctness check. But `flagless` has the side effect that tests run with extra flags are skipped. So we propose to get rid of it now. > > To accommodate the removal of `@requires vm.flagless`, a few checks are added in the test driver to skip the correctness check if extra flags make the compiler control not work. This patch also moves the previously hard-coded flag `-XX:-OptimizeFill` in the test driver to conditions in IR checks. > > Tested various compiler-control-related VM flags on x86 and AArch64.
Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: Re-work correctness check to allow "-Xbatch" ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15011/files - new: https://git.openjdk.org/jdk/pull/15011/files/014d7511..ac509680 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15011&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15011&range=00-01 Stats: 68 lines in 23 files changed: 6 ins; 22 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/15011.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15011/head:pull/15011 PR: https://git.openjdk.org/jdk/pull/15011 From pli at openjdk.org Wed Aug 2 08:55:55 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 2 Aug 2023 08:55:55 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests In-Reply-To: References: Message-ID: On Mon, 31 Jul 2023 19:55:19 GMT, Vladimir Kozlov wrote: >> `LoopArrayIndexComputeTest.java` fails on x64 with `-XX:UseAVX=0 -XX:UseSSE=3` and with `-XX:UseAVX=0 -XX:UseSSE=2`: >> >> >> Failed IR Rules (1) of Methods (1) >> ---------------------------------- >> 1) Method "public byte[] compiler.vectorization.runner.LoopArrayIndexComputeTest.byteArrayWithDependencePos()" - [Failed IR rules: 1]: >> * @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={DEFAULT}, applyIfCPUFeatureOr={"asimd", "true", "sse2", "true"}, applyIf={"AlignVector", "false"}, applyIfCPUFeature={}, counts={"_#STORE_VECTOR#_", ">0"}, failOn={}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" >> > Phase "PrintIdeal": >> - counts: Graph contains wrong number of nodes: >> * Constraint 1: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" >> - Failed comparison: [found] 0 > 0 [given] >> - No nodes matched! 
>> Hi @TobiHartmann , >> >> > `LoopArrayIndexComputeTest.java` fails on x64 with `-XX:UseAVX=0 -XX:UseSSE=3` and with `-XX:UseAVX=0 -XX:UseSSE=2`: >> >> I just tried this on multiple x86 machines we have but didn't reproduce the failure. Could you share more info (CPU features, etc.) about your test machine so I can find out why the test method is not vectorized? > > The test has a byte multiply operation, but it only checks for the presence of SSE2: > `` > @IR(applyIfCPUFeatureOr = {"asimd", "true", "sse2", "true"}, > ... > res[i] *= bytes[i + 3]; > `` > But we don't support it with SSE < 4 - see `Matcher::match_rule_supported()`. > > `vm.flagless` prevented running it before with such a flag combination. > > I am not sure how your testing passed. Hi @vnkozlov , I re-worked the correctness check in my latest commit. This time we get reference results from C1 execution, so we can allow "-Xbatch" and remove the time-based check now. But we have to add another check for tiered compilation, which the C1 execution needs. The failed case is also fixed in my latest commit. Could you help re-review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15011#issuecomment-1661805054 From thartmann at openjdk.org Wed Aug 2 08:00:13 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Aug 2023 08:00:13 GMT Subject: RFR: 8313248: C2: setScopedValueCache intrinsic exposes nullptr pre-values to store barriers [v2] In-Reply-To: References: Message-ID: On Tue, 1 Aug 2023 16:10:27 GMT, Aleksey Shipilev wrote: >> See the bug for investigation breadcrumbs. The root cause for failures seen with Shenandoah seems to be as follows. >> >> The setter (`setScopedValueCache`) intrinsic passes `val_type` of `_gvn.type(arr)`, which is `narrowoop: java/lang/Object *[int:32] (java/lang/Cloneable,java/io/Serializable):NotNull:exact *`, derived from the `argument(0)`, and thus implies non-nullity.
>> >> So when Shenandoah's SATB barrier loads the `pre_val`, it folds the null-check, assuming the `pre_val` is not null, due to `val_type`. This passes `nullptr` to SATB queues or slowpath, and we crash in either queue filtering or barrier code that does not expect nullptrs on SATB paths. The getter (`scopedValueCache`) constructs the `objects_type` explicitly to imply the value can be null. I think we should do the same for the setter, since it can hide the "getter" from the SATB barrier inside of it. >> >> Arguably, it is a landmine that GC barriers assume the `val_type` is the type of both stored value and the pre-value read from memory. So the non-null-ness derived for stored value gets used to reason for non-null-ness for pre-value. We can explore the solutions to that generic problem after we plug this leak. Other `access_store_at` uses in C2 intrinsics seem to only operate on thread fields that are not null, so they are not susceptible to this problem. `scopedValueCache` is a notable exception of lazily initialized thread OopHandle accessed from C2. >> >> I think G1 SATB barriers have the same problem, but I have not tried to reproduce the failure very hard there. (It would, AFAIU, require writing the test which does G1 concurrent marks, not just young GCs.) >> >> Attn @theRealAph ;) >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, 10+ iterations of `java/lang/ScopedValue/StressStackOverflow.java` with Shenandoah >> - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` with Shenandoah >> - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` with G1 >> - [ ] Linux AArch64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into JDK-8313248-shenandoah-nullcheck > - Proper fix > - Trying to pin more > - Reverts > - Debugging Looks good to me. src/hotspot/share/opto/library_call.cpp line 3591: > 3589: const Type* LibraryCallKit::scopedValueCache_type() { > 3590: ciKlass *objects_klass = ciObjArrayKlass::make(env()->Object_klass()); > 3591: const TypeOopPtr *etype = TypeOopPtr::make_from_klass(env()->Object_klass()); Suggestion: ciKlass* objects_klass = ciObjArrayKlass::make(env()->Object_klass()); const TypeOopPtr* etype = TypeOopPtr::make_from_klass(env()->Object_klass()); ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15105#pullrequestreview-1558405243 PR Review Comment: https://git.openjdk.org/jdk/pull/15105#discussion_r1281526194
From qamai at openjdk.org Wed Aug 2 09:38:53 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 2 Aug 2023 09:38:53 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set In-Reply-To: References: Message-ID: On Fri, 26 May 2023 10:32:00 GMT, Tobias Hotz wrote: > This patch adds peephole rules to remove TEST instructions that operate on the result and right after a AND, XOR or OR instruction. > This pattern can emerge if the result of the and is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instruction mentioned above also set the flag register to the same value.
> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to the same values for TEST, AND, XOR and OR, so this should be safe.
> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed:
> Results on Intel Core i5-8250U CPU
> Before this patch:
>
> Benchmark                                              Mode  Cnt    Score   Error  Units
> TestRemovalPeephole.benchmarkAndTestFusableInt         avgt    8  182.353 ± 1.751  ns/op
> TestRemovalPeephole.benchmarkAndTestFusableIntSingle   avgt    8    1.110 ± 0.002  ns/op
> TestRemovalPeephole.benchmarkAndTestFusableLong        avgt    8  212.836 ± 0.310  ns/op
> TestRemovalPeephole.benchmarkAndTestFusableLongSingle  avgt    8    2.072 ± 0.002  ns/op
> TestRemovalPeephole.benchmarkOrTestFusableInt          avgt    8   72.052 ± 0.215  ns/op
> TestRemovalPeephole.benchmarkOrTestFusableIntSingle    avgt    8    1.406 ± 0.002  ns/op
> TestRemovalPeephole.benchmarkOrTestFusableLong         avgt    8  113.396 ± 0.666  ns/op
> TestRemovalPeephole.benchmarkOrTestFusableLongSingle   avgt    8    1.183 ± 0.001  ns/op
> TestRemovalPeephole.benchmarkXorTestFusableInt         avgt    8   88.683 ± 2.034  ns/op
> TestRemovalPeephole.benchmarkXorTestFusableIntSingle   avgt    8    1.406 ± 0.002  ns/op
> TestRemovalPeephole.benchmarkXorTestFusableLong        avgt    8  113.271 ± 0.602  ns/op
> TestRemovalPeephole.benchmarkXorTestFusableLongSingle  avgt    8    1.183 ± 0.001  ns/op
>
> After this patch:
>
> Benchmark                                              Mode  Cnt    Score   Error  Units  Change
> TestRemovalPeephole.benchmarkAndTestFusableInt         avgt    8  141.615 ± 4.747  ns/op  ~29% faster
> TestRemovalPeephole.benchmarkAndTestFusableIntSingle   avgt    8    1.110 ± 0.002  ns/op  (unchanged)
> TestRemovalPeephole.benchmarkAndTestFusableLong        avgt    8  213.249 ± 1.094  ns/op  (unchanged)
> TestRemovalPeephole.benchmarkAndTestFusableLongSingle  avgt    8    2.074 ± 0.011...

In general I think this approach is very elegant.
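The redundancy argument in the patch description (AND, OR and XOR leave ZF and SF exactly as a following `test r, r` would, and clear CF and OF just as TEST does) can be checked with a small Java model. This is a hypothetical sketch, not HotSpot code: `FlagsModel` and its methods are illustrative names, only ZF, SF, CF and OF are modelled (PF and AF are ignored), and `add` is included to show why an instruction whose flags depend on its operands, not only on its result, is not covered by the same argument.

```java
// Hypothetical model of the x86 status flags relevant to this peephole.
// Only ZF, SF, CF and OF are modelled; PF and AF are ignored.
final class FlagsModel {
    record Flags(boolean zf, boolean sf, boolean cf, boolean of) {}

    // AND, OR, XOR and TEST set ZF/SF from the result and clear CF/OF.
    static Flags logical(int result) {
        return new Flags(result == 0, result < 0, false, false);
    }

    static Flags and(int a, int b)  { return logical(a & b); }
    static Flags or(int a, int b)   { return logical(a | b); }
    static Flags xor(int a, int b)  { return logical(a ^ b); }
    static Flags test(int a, int b) { return logical(a & b); } // result is discarded

    // ADD derives CF and OF from the operands, not only from the result.
    static Flags add(int a, int b) {
        int r = a + b;
        boolean cf = Integer.compareUnsigned(r, a) < 0; // unsigned carry out
        boolean of = ((a ^ r) & (b ^ r)) < 0;           // signed overflow
        return new Flags(r == 0, r < 0, cf, of);
    }

    // A following `test r, r` is redundant iff it would recompute identical flags.
    static boolean testIsRedundant(Flags afterOp, int r) {
        return afterOp.equals(test(r, r));
    }

    public static void main(String[] args) {
        int a = 0x0F0F_0000, b = 0x00FF_FF00;
        System.out.println(testIsRedundant(and(a, b), a & b)); // true
        int x = Integer.MAX_VALUE, y = 1;
        System.out.println(testIsRedundant(add(x, y), x + y)); // false: ADD set OF
    }
}
```

In the ADD case the overflow sets OF, while a subsequent TEST on the same result would clear it, so eliding that TEST would change the observable flags.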
A point I want to address is the definition of setting a flag: a `test` can be elided after an `and` because the `and` sets the flag according to **the result**, but instructions can also set flags based on other factors, such as `add` setting OF based on the operation, `bsr` setting ZF based on the input, etc. As a result, I think instructions that do not set the flags based on the result should be marked separately, and anything else should be marked as clobbering. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1661873540 From qamai at openjdk.org Wed Aug 2 09:41:53 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 2 Aug 2023 09:41:53 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set In-Reply-To: References: Message-ID: On Fri, 26 May 2023 10:32:00 GMT, Tobias Hotz wrote:
> This patch adds peephole rules to remove TEST instructions that operate on the result and right after a AND, XOR or OR instruction.
> This pattern can emerge if the result of the and is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instruction mentioned above also set the flag register to the same value.
> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to the same values for TEST, AND, XOR and OR, so this should be safe.
> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed:
> Results on Intel Core i5-8250U CPU
> Before this patch:
>
> Benchmark                                              Mode  Cnt    Score   Error  Units
> TestRemovalPeephole.benchmarkAndTestFusableInt         avgt    8  182.353 ± 1.751  ns/op
> TestRemovalPeephole.benchmarkAndTestFusableIntSingle   avgt    8    1.110 ± 0.002  ns/op
> TestRemovalPeephole.benchmarkAndTestFusableLong        avgt    8  212.836 ± 0.310  ns/op
> TestRemovalPeephole.benchmarkAndTestFusableLongSingle  avgt    8    2.072 ± 0.002  ns/op
> TestRemovalPeephole.benchmarkOrTestFusableInt          avgt    8   72.052 ± 0.215  ns/op
> TestRemovalPeephole.benchmarkOrTestFusableIntSingle    avgt    8    1.406 ± 0.002  ns/op
> TestRemovalPeephole.benchmarkOrTestFusableLong         avgt    8  113.396 ± 0.666  ns/op
> TestRemovalPeephole.benchmarkOrTestFusableLongSingle   avgt    8    1.183 ± 0.001  ns/op
> TestRemovalPeephole.benchmarkXorTestFusableInt         avgt    8   88.683 ± 2.034  ns/op
> TestRemovalPeephole.benchmarkXorTestFusableIntSingle   avgt    8    1.406 ± 0.002  ns/op
> TestRemovalPeephole.benchmarkXorTestFusableLong        avgt    8  113.271 ± 0.602  ns/op
> TestRemovalPeephole.benchmarkXorTestFusableLongSingle  avgt    8    1.183 ± 0.001  ns/op
>
> After this patch:
>
> Benchmark                                              Mode  Cnt    Score   Error  Units  Change
> TestRemovalPeephole.benchmarkAndTestFusableInt         avgt    8  141.615 ± 4.747  ns/op  ~29% faster
> TestRemovalPeephole.benchmarkAndTestFusableIntSingle   avgt    8    1.110 ± 0.002  ns/op  (unchanged)
> TestRemovalPeephole.benchmarkAndTestFusableLong        avgt    8  213.249 ± 1.094  ns/op  (unchanged)
> TestRemovalPeephole.benchmarkAndTestFusableLongSingle  avgt    8    2.074 ± 0.011...

IR tests can match machine nodes so please do so, ideally for both branching and conditional moving. Thanks a lot. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1661878683 From luhenry at openjdk.org Wed Aug 2 09:56:40 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 2 Aug 2023 09:56:40 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint In-Reply-To: References: Message-ID: On Mon, 24 Jul 2023 08:22:52 GMT, Ilya Gavrilin wrote:
> Please review these changes to the risc-v double rounding intrinsics.
>
> On risc-v, intrinsics for rounding doubles with a mode (like Math.ceil/floor/rint) were missing. On risc-v we don't have a special instruction for such a conversion, so a two-step conversion was used: double -> long int -> double (using fcvt.l.d, fcvt.d.l).
>
> Also, we need to provide a rounding mode to the fcvt.x.x instructions.
>
> Rounding mode selection on ceil (similar for floor and rint), according to the Math.ceil requirements:
>
>> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475).
>
> For double -> long int we choose the rup (round towards +inf) mode to get the integer that is greater than or equal to the input value.
> For long int -> double we choose the rdn (round towards -inf) mode to get the smallest (closest to -inf) representation of the integer that we got after conversion.
>
> For cases when we get inf, NaN, or a value whose magnitude exceeds 2^63, we return the input value (such a double is guaranteed to be an integer).
> Also, when we store the result we copy the sign from the input value (needed because for inputs in (-1.0, 0.0) ceil must return -0.0).
>
> We have observed significant improvement on hifive and thead boards.
>
> testing: tier1, tier2 and hotspot:tier3 on hifive
>
> Performance results on hifive (FpRoundingBenchmark.testceil/floor/rint):
>
> Without intrinsic:
>
> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.testceil         1024  thrpt   25  39.297 ± 0.037  ops/ms
> FpRoundingBenchmark.testfloor        1024  thrpt   25  39.398 ± 0.018  ops/ms
> FpRoundingBenchmark.testrint         1024  thrpt   25  36.388 ± 0.844  ops/ms
>
> With intrinsic:
>
> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.testceil         1024  thrpt   25  80.560 ± 0.053  ops/ms
> FpRoundingBenchmark.testfloor        1024  thrpt   25  80.541 ± 0.081  ops/ms
> FpRoundingBenchmark.testrint         1024  thrpt   25  80.603 ± 0.071  ops/ms

Marked as reviewed by luhenry (Committer).
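The ceil strategy described in this PR (convert with round-up, treat the two saturated results as special, convert back, copy the sign) can be modelled in plain Java and checked against `Math.ceil`. This is a sketch under stated assumptions, not the RISC-V intrinsic: `fcvtLdRup` is a hypothetical helper that emulates `fcvt.l.d` with the `rup` rounding mode, including the saturating results the RISC-V spec defines for NaN and out-of-range inputs, and Java's own `(double)` cast stands in for `fcvt.d.l`.

```java
// Java model of the ceil control flow described above; not the RISC-V code itself.
final class CeilViaConversion {
    private static final double TWO_POW_63 = 0x1p63;

    // Hypothetical emulation of `fcvt.l.d rd, rs, rup`: round toward +inf and
    // saturate like the hardware (NaN and too-large -> Long.MAX_VALUE,
    // too-small -> Long.MIN_VALUE).
    static long fcvtLdRup(double x) {
        if (Double.isNaN(x) || x >= TWO_POW_63) return Long.MAX_VALUE;
        if (x < -TWO_POW_63) return Long.MIN_VALUE;
        long t = (long) x;           // truncates toward zero; x is in range here
        return (t < x) ? t + 1 : t;  // bump up if a positive fraction was dropped
    }

    static double ceil(double x) {
        long l = fcvtLdRup(x);
        if (l == Long.MIN_VALUE || l == Long.MAX_VALUE) {
            return x; // NaN, +/-Inf, or |x| >= 2^63: return the input unchanged
        }
        // The long -> double step is exact here (l either equals an integral x
        // or has magnitude <= 2^52), so the rounding mode does not matter in this
        // model. copySign keeps -0.0 for inputs in (-1.0, -0.0].
        return Math.copySign((double) l, x);
    }

    public static void main(String[] args) {
        double[] inputs = {1.5, -1.5, -0.5, -0.0, 2.0, Double.NaN,
                           Double.POSITIVE_INFINITY, -0x1p63, 1e300};
        for (double v : inputs) {
            System.out.println(v + " -> " + ceil(v) + " (Math.ceil: " + Math.ceil(v) + ")");
        }
    }
}
```

In this model the conversion back to double is always exact, so the choice of `rdn` for that step does not affect the result; a `floor` variant would only swap round-up for round-down inside the emulated `fcvt.l.d`.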
------------- PR Review: https://git.openjdk.org/jdk/pull/14991#pullrequestreview-1558642043 From luhenry at openjdk.org Wed Aug 2 09:56:42 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 2 Aug 2023 09:56:42 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint In-Reply-To: References: Message-ID: On Wed, 26 Jul 2023 09:56:03 GMT, Ilya Gavrilin wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4283: >> >>> 4281: // generating constant (tmp2) >>> 4282: // tmp2 = 100...0000 >>> 4283: addi(mask, zr, 1); >> >> There are other ways to [implement these functions with less instructions](https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYTStJg1AB9U8lJL6yAngGVG6AMKpaAVxYM9DgDJ4GmADl3ACNMYhAAVgBmUgAHVAVCWwZnNw89eMSbAV9/IJZQ8OiLTCtshiECJmICVPdPLhKy5MrqglzAkLDImIUqmrr0xr62jvzCnoBKC1RXYmR2DgBSACYov2Q3LABqJajHPvxBADoEPewljQBBVfWGTdcdvccWJgIEU/PLm%2BuCAE9YpgsFRtolgP50NtkAhqttgqgXHsAELfb7oWbBejfMyYOjbCDo1yYzDbVSTVEAdhR12220JxO2r2AuyiABFtlQmMEFPiycjUTTtngQRAmSzHPiuHJvN5xc9tqZWUjvMYALJXAIAFWMrIAkgBxUyTcmC2lLKnfWlWoWCABskmMBCF/NNVvp9G2/gA7sZVC6fldrbSmAoWNsAG4uN50EkQVYrKjIcMEY60Y6Q1YRDSkXYrCKNbbEVyxePbEC5lZ7VnEUsQPCTMsVqi1snkqLUwNBkNhyMGGweuMrBNJlPoVO5rM5zMFosloeN%2BNV5vziDe30N8vxmsr%2BttjuWoMAegAVFDcbQIABaDQNqtV7bXnOGSF%2BbbARhhMTbL2YMAcWi0NsYhekwfw8jC4Yku8JIKKwmACgAnKCeDgkBPKzMQNqxK4Ka0seh4HtaqgsuyaCxGBKEMPia6qDmfLtoRuwUqyAqdsQmAEHMVF%2Bgx1zmixfHXO68HXKYVC0AimEEhiHp8nxFqCsJjJMMy96ctyvJ7qxtLCviYryhAUoynK%2BwKjqypqhq2p6oaxjGox5odkGN 
oEPajrOrxnbWkpNH%2BoxwahhGUb9rG8aJsmqbphO2YTjO6AMKWm5DlW24rPi9YLkOy5pRArb%2Bs53ZBX2Mb4mFI7puOmYxdOObEPFiUVkutY0RuFapel9mef52wnhyEkkFeN4kfej5AQwL5Ue%2B/jEF%2BP5/gBQG0CBYHbBBUEIDBcGIchqEhtsGFYThxx4QRrq0sR95kRRqGrpgPq0aSWmuvx2mFhxXGkn5zGsWiMkiVcpjELM434kpck3ApnZKfpbIclyPK5U9AY6SKMMSoZ0qys88qKhZ6pajqBpGiaXlMU5QZ%2BK5DpOng%2BVBj5d2%2Bn5Z1oT2wUlYOw4RWmGZ5tVeYziwLANYubLtXWrVhS2nXk9ahW9tGA5lRFY6AVVU4C7VQsi8lbLZdRjNkplKzi7uzOk7SvVA6442DbebIjTFz42m%2BH4zYBc3/oBwGgeBTCQds0GgltglIWCVF7QdfjYbh2z4d1F2w1d4cG/ddFI9aL2CWx73ENxX0CVcHDTLQnARLwngcFopCoJwEoKBhCy5lEPCkAQmjF9MADWkTZqXHCSLwLASBo2aV9XtccLwCggNm7dV8XpBwLASBoCwsQxmQFAQGvG/0OExDhval7IJshjAKQWDhngCwAGp4HdADygKV63NC0AQYQzxAwQd6QwR%2BGqH8TgrcAHMGIH8R%2BwRtCYGsCA3ga82CCEfgwWgwCF6X0wMEVwwBHBiFoDPbgvAsCvCMOIDB%2BB2LWDwJBQh1dMCqFgThRYrdKalD/rQPAwQZoQOcFgP%2BBBiB4GHkQ6Y4kVIKHvk/F%2B8CZCCBEGIdgUg5HyCUGoP%2BuhGgGCMCAUwxhzCcO5PAaYqBYjlEIZeR%2BUReCoEgsQIRWAZ6QGmJYWB5R7DjUGJ4KI2YfB%2BE6AUboXAVhxASEkAQXiMhhPKGMLoRQmhuJaP0WoLh6ggB8Qk6hAhWg1FiYE8IwSLDJMicMZJe SJgRBcY3JRJcy4Vz/pPbYAAlXUQhHCXlvoWI%2BkhgDIChNo5kEBBE2y7g2CAuBCAkGblwSYvB55aGNKQDaTAsDhAgN3Xu%2BhOCD1IOPGxnBp6zzbh3RZ/cVj1IwZPOZJzph2MSHYSQQA). Also, given you're not checking for NaN (and neither is it done for other architectures), I assume that this is done before this is called? > > Hi, thanks for your review. > > - About NaN, INF and other special values: > > According to RISC-V ISA paragraph 8.7 for fcvt.l instruction (Table 8.4) we have some special return values: > 1) if we exceed minimum input value, or got -INF it returns (-2^63) > 2) if we exceed maximum input value, or got +INF or NaN it returns (2^63-1) > > (Also, if we exceed maximum/minimum input values on input we have already integer value, > because according to IEEE754 double-precision f.p. 
format all doubles more +/- 2^52 are already integer values) > > So we need to check if we got -2^63 or 2^63-1 after double->long int conversion, > comment lines from [src/hotspot/cpu/riscv/macroAssembler_riscv.cpp:4288](https://github.com/openjdk/jdk/pull/14991/files#diff-7a5c3ed05b6f3f06ed1c59f5fc2a14ec566a6a5bd1d09606115767daa99115bdR4288-R4291) describes how do we change result (converted_dbl -> converted_dbl_masked) and constant (mask) to check this values. I have tried to check for NaN and +-inf with just one conditional branch. > If we got -2^63 or 2^63-1 we return input value, therefore NaN -> NaN; +/- INF -> +/- INF; double that is already integer stays same as required by the ceil/floor/rint function descriptions. > > - About case when we can use less instructions: > > Of course, we can use a bit less instructions, but the main goal during intrinsic writing was minimizing count of expensive instructions (so instead of flt.d was used integer instructions on converted_dbl etc.) > We already have some cases when less expensive instructions were chosen instead of reducing their number. For example: https://bugs.openjdk.org/browse/JDK-8297359 Generally speaking, I'm a bit concerned we are over-optimizing for HiFive Unmatched (or equivalent "small" boards). I understand this is what we have today, and that there is no great way to project performance otherwise (maybe [llvm-mca](https://www.llvm.org/docs/CommandGuide/llvm-mca.html) but I haven't played with it), I just wish there were already broadly available high-performance RISC-V board (soon...) Happy to go with that approach for now, and we can revisit the specific instruction sequence when we have more powerful boards to benchmark with. 
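Ilya's reply mentions detecting both saturated `fcvt.l.d` results (-2^63 and 2^63-1) with a single conditional branch, but the referenced assembler lines are not reproduced in this thread. The Java sketch below shows one possible way to fold the two comparisons into one; it is an assumption for illustration and the PR's actual instruction sequence may differ. `SaturationCheck` and `isSaturated` are hypothetical names, and `MASK` mirrors the `100...0000` constant mentioned in the quoted code comment.

```java
// One possible single-compare test for the two saturated fcvt.l.d results.
// Hypothetical sketch; the PR's actual instruction sequence may differ.
final class SaturationCheck {
    static final long MASK = 1L << 63; // 100...000 == Long.MIN_VALUE

    // True iff l is Long.MIN_VALUE or Long.MAX_VALUE.
    static boolean isSaturated(long l) {
        // MAX + 1 wraps to MIN; MIN + 1 differs from MIN only in bit 0,
        // so clearing bit 0 maps both special values onto MASK.
        long masked = (l + 1) & ~1L;
        return masked == MASK;
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, Long.MAX_VALUE, 0L, -1L, 42L,
                          Long.MIN_VALUE + 1, Long.MAX_VALUE - 1};
        for (long l : samples) {
            System.out.println(l + " saturated? " + isSaturated(l));
        }
    }
}
```

The trick costs one add, one and, and one compare-and-branch, which matches the goal of avoiding floating-point comparisons such as `flt.d`; whether it is cheaper than alternatives on a given core would need measuring.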
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1281677022 From vkempik at openjdk.org Wed Aug 2 10:26:51 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Wed, 2 Aug 2023 10:26:51 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint In-Reply-To: References: Message-ID: On Wed, 2 Aug 2023 09:52:55 GMT, Ludovic Henry wrote: >> Hi, thanks for your review. >> >> - About NaN, INF and other special values: >> >> According to RISC-V ISA paragraph 8.7 for fcvt.l instruction (Table 8.4) we have some special return values: >> 1) if we exceed minimum input value, or got -INF it returns (-2^63) >> 2) if we exceed maximum input value, or got +INF or NaN it returns (2^63-1) >> >> (Also, if we exceed maximum/minimum input values on input we have already integer value, >> because according to IEEE754 double-precision f.p. format all doubles more +/- 2^52 are already integer values) >> >> So we need to check if we got -2^63 or 2^63-1 after double->long int conversion, >> comment lines from [src/hotspot/cpu/riscv/macroAssembler_riscv.cpp:4288](https://github.com/openjdk/jdk/pull/14991/files#diff-7a5c3ed05b6f3f06ed1c59f5fc2a14ec566a6a5bd1d09606115767daa99115bdR4288-R4291) describes how do we change result (converted_dbl -> converted_dbl_masked) and constant (mask) to check this values. I have tried to check for NaN and +-inf with just one conditional branch. >> If we got -2^63 or 2^63-1 we return input value, therefore NaN -> NaN; +/- INF -> +/- INF; double that is already integer stays same as required by the ceil/floor/rint function descriptions. >> >> - About case when we can use less instructions: >> >> Of course, we can use a bit less instructions, but the main goal during intrinsic writing was minimizing count of expensive instructions (so instead of flt.d was used integer instructions on converted_dbl etc.) 
>> We already have some cases when less expensive instructions were chosen instead of reducing their number. For example: https://bugs.openjdk.org/browse/JDK-8297359 > > Generally speaking, I'm a bit concerned we are over-optimizing for HiFive Unmatched (or equivalent "small" boards). I understand this is what we have today, and that there is no great way to project performance otherwise (maybe [llvm-mca](https://www.llvm.org/docs/CommandGuide/llvm-mca.html) but I haven't played with it), I just wish there were already broadly available high-performance RISC-V board (soon...) > > Happy to go with that approach for now, and we can revisit the specific instruction sequence when we have more powerful boards to benchmark with.
It should also be available on the public market, so that anyone can reproduce the results claimed in a PR. So far that is only the HiFive U74 and the T-Head C910, hence we mostly mention results from these two (but we test on a few more). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1281712339 From thartmann at openjdk.org Wed Aug 2 10:27:49 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Aug 2023 10:27:49 GMT Subject: RFR: 8313402: C1: Incorrect LoadIndexed value numbering [v2] In-Reply-To: References: Message-ID: On Tue, 1 Aug 2023 16:10:12 GMT, Aleksey Shipilev wrote: >> See the bug for more investigation. >> >> This manifests in current tests, if you run them with C1. The root cause for the failure is that we fold two `LoadIndexed` nodes, when one of them reads `char` from `byte[]` via `_getCharStringU` intrinsic, and another one reads `byte` normally. So we can fold the "char"-reading load with "byte" reading load, effectively reading the wrong thing. New regression test shows it: it would read "42" instead of full char.
>> >> >> $ build/macosx-aarch64-server-fastdebug/images/jdk/bin/java -XX:TieredStopAtLevel=1 -XX:CICompilerCount=1 -XX:+PrintCompilation -XX:+PrintIR0 -XX:+PrintValueNumbering Test8313402.java >> >> . 8 0 i152 a141[i110](i144) (B) [rc] >> ... >> . 7 0 i180 a162[i110] (C) >> ... >> Value Numbering: LoadIndexed i180 equal to i152 (size 47, entries 27, nesting-diff 0) >> ``` >> >> GVN hash discriminates on `type()->tag()`, but that `ValueType` maps to the same `T_INT` for both `char` and `byte`! Instead of hashing on that, let's hash on the original element type instead. >> >> Testing: >> - [x] New regression test fails without the fix, passes after the fix >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` >> - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8313402-c1-gvn-loadindexed > - Initial fix I think this is good to go but I'll run some quick sanity testing and report back. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15091#issuecomment-1661953963 From mcimadamore at openjdk.org Wed Aug 2 10:42:50 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 2 Aug 2023 10:42:50 GMT Subject: [jdk21] RFR: 8313023: Return value corrupted when using CCS + isTrivial (mainline) In-Reply-To: References: Message-ID: On Mon, 31 Jul 2023 08:13:55 GMT, Jorn Vernee wrote: > Hi all, > > This pull request contains a backport of commit [6fca2898](https://github.com/openjdk/jdk/commit/6fca28988794b52a6aa974bed1ed6f4f07e0994b) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Jorn Vernee on 31 Jul 2023 and was reviewed by Maurizio Cimadamore and Vladimir Ivanov. > > Thanks! 
Looks good (already approved in jdk22 and panama repo) ------------- Marked as reviewed by mcimadamore (Reviewer). PR Review: https://git.openjdk.org/jdk21/pull/150#pullrequestreview-1558722084 From jvernee at openjdk.org Wed Aug 2 11:02:00 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 2 Aug 2023 11:02:00 GMT Subject: [jdk21] Integrated: 8313023: Return value corrupted when using CCS + isTrivial (mainline) In-Reply-To: References: Message-ID: On Mon, 31 Jul 2023 08:13:55 GMT, Jorn Vernee wrote: > Hi all, > > This pull request contains a backport of commit [6fca2898](https://github.com/openjdk/jdk/commit/6fca28988794b52a6aa974bed1ed6f4f07e0994b) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Jorn Vernee on 31 Jul 2023 and was reviewed by Maurizio Cimadamore and Vladimir Ivanov. > > Thanks! This pull request has now been integrated. Changeset: 20ca0465 Author: Jorn Vernee URL: https://git.openjdk.org/jdk21/commit/20ca0465b59b601d669ff17f19eec5df782f1d27 Stats: 39 lines in 5 files changed: 12 ins; 0 del; 27 mod 8313023: Return value corrupted when using CCS + isTrivial (mainline) Reviewed-by: thartmann, mcimadamore Backport-of: 6fca28988794b52a6aa974bed1ed6f4f07e0994b ------------- PR: https://git.openjdk.org/jdk21/pull/150 From yzheng at openjdk.org Wed Aug 2 11:05:12 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 2 Aug 2023 11:05:12 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v11] In-Reply-To: References: Message-ID: <_xhY05iLfIuABX0G_7UkyrAz6iCnJzZPzWNe09_ryjI=.372ae759-fc49-496d-b1c1-c10ad065987b@github.com> On Wed, 21 Oct 2020 23:42:33 GMT, Fei Yang wrote: >> Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com >> >> This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. 
>> Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52 >> >> Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic. >> For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size. >> "digestLength" is also used in the EOR loop before keccak to differentiate different SHA3 variants. >> >> We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. >> Patch passed jtreg tier1-3 tests with QEMU system emulator. >> Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make sure that there's no regression. >> >> We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java >> We measured the performance benefit with an aarch64 cycle-accurate simulator. >> Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. >> >> For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on real hardware. But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is detected. > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Add if (isJDK16OrHigher()) check for SHA3 in CheckGraalIntrinsics.java

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3473:

> 3471: __ bcax(v24, __ T16B, v24, v8, v31);
> 3472:
> 3473: __ ld1r(v31, __ T2D, __ post(rscratch1, 8));

is it intentional to load 16 bytes and post-increment by 8?
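For reference, `ld1r` is AArch64's "load one single-element structure and replicate to all lanes" instruction: with a `2D` arrangement it reads a single 8-byte element and copies it into both 64-bit lanes, and the post-index immediate equals the number of bytes actually read. The hypothetical Java model below (`Ld1rModel` is an illustrative name, and the memory contents are made-up sample values) shows why `T2D` pairs with a post-increment of 8 rather than 16.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical Java model of AArch64 `ld1r {vt.2d}, [xn], #8`:
// load ONE 64-bit element, replicate it into both lanes, then post-increment xn.
final class Ld1rModel {
    long xn; // models the base register (rscratch1 in the excerpt)

    Ld1rModel(long startAddress) { this.xn = startAddress; }

    long[] ld1r2D(ByteBuffer memory) {
        long element = memory.getLong((int) xn); // reads 8 bytes, not 16
        xn += Long.BYTES;                        // post-increment == element size
        return new long[] {element, element};    // both D lanes get the same value
    }

    public static void main(String[] args) {
        ByteBuffer mem = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        mem.putLong(0, 0x1L).putLong(8, 0x8082L); // two sample 64-bit values
        Ld1rModel m = new Ld1rModel(0);
        long[] v1 = m.ld1r2D(mem);
        long[] v2 = m.ld1r2D(mem);
        System.out.printf("%x %x / %x %x, xn=%d%n", v1[0], v1[1], v2[0], v2[1], m.xn);
    }
}
```

Successive calls walk through memory one 8-byte element at a time, which is the access pattern a table of 64-bit constants consumed once per round would need.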
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/207#discussion_r1281749663 From thartmann at openjdk.org Wed Aug 2 11:09:47 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Aug 2023 11:09:47 GMT Subject: RFR: 8313402: C1: Incorrect LoadIndexed value numbering [v2] In-Reply-To: References: Message-ID: On Tue, 1 Aug 2023 16:10:12 GMT, Aleksey Shipilev wrote: >> See the bug for more investigation. >> >> This manifests in current tests, if you run them with C1. The root cause for the failure is that we fold two `LoadIndexed` nodes, when one of them reads `char` from `byte[]` via `_getCharStringU` intrinsic, and another one reads `byte` normally. So we can fold the "char"-reading load with "byte" reading load, effectively reading the wrong thing. New regression test shows it: it would read "42" instead of full char. >> >> >> $ build/macosx-aarch64-server-fastdebug/images/jdk/bin/java -XX:TieredStopAtLevel=1 -XX:CICompilerCount=1 -XX:+PrintCompilation -XX:+PrintIR0 -XX:+PrintValueNumbering Test8313402.java >> >> . 8 0 i152 a141[i110](i144) (B) [rc] >> ... >> . 7 0 i180 a162[i110] (C) >> ... >> Value Numbering: LoadIndexed i180 equal to i152 (size 47, entries 27, nesting-diff 0) >> ``` >> >> GVN hash discriminates on `type()->tag()`, but that `ValueType` maps to the same `T_INT` for both `char` and `byte`! Instead of hashing on that, let's hash on the original element type instead. >> >> Testing: >> - [x] New regression test fails without the fix, passes after the fix >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` >> - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8313402-c1-gvn-loadindexed > - Initial fix Testing looks good. Ship it! 
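The value-numbering collision described in this fix can be reproduced with a tiny model: if the hash key for `LoadIndexed(array, index)` uses only the widened value tag, a `byte` load and a `char` load from the same array slot look identical. This is a simplified, hypothetical Java sketch (`LoadIndexedGvn`, `Key` and `number` are illustrative names, not C1's data structures); it only contrasts keying on the value tag with keying on the element type, as the fix does.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of value numbering for LoadIndexed(array, index).
// Not C1's actual data structures.
final class LoadIndexedGvn {
    enum BasicType { T_BYTE, T_CHAR, T_SHORT, T_INT }

    // All element types modelled here are int or narrower, and loads of them
    // produce an int-typed value, so the value tag is always T_INT.
    static BasicType valueTag(BasicType elementType) {
        return BasicType.T_INT;
    }

    record Key(int arrayId, int indexId, BasicType discriminator) {}

    private final Map<Key, String> table = new HashMap<>();
    private final boolean hashOnElementType;

    LoadIndexedGvn(boolean hashOnElementType) { this.hashOnElementType = hashOnElementType; }

    // Returns the name of an existing equivalent load, or records this one and returns its name.
    String number(String name, int arrayId, int indexId, BasicType elementType) {
        BasicType d = hashOnElementType ? elementType : valueTag(elementType);
        return table.computeIfAbsent(new Key(arrayId, indexId, d), k -> name);
    }

    public static void main(String[] args) {
        for (boolean fixed : new boolean[] {false, true}) {
            LoadIndexedGvn gvn = new LoadIndexedGvn(fixed);
            gvn.number("byteLoad", 1, 2, BasicType.T_BYTE);
            String r = gvn.number("charLoad", 1, 2, BasicType.T_CHAR);
            System.out.println((fixed ? "element type: " : "value tag:    ") + "charLoad -> " + r);
        }
    }
}
```

With tag-based keys the char load is folded into the byte load; with element-type keys it gets its own entry, while two byte loads from the same slot still merge.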
------------- PR Comment: https://git.openjdk.org/jdk/pull/15091#issuecomment-1662010690 From shade at openjdk.org Wed Aug 2 11:24:02 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Aug 2023 11:24:02 GMT Subject: RFR: 8313402: C1: Incorrect LoadIndexed value numbering [v2] In-Reply-To: References: Message-ID: On Tue, 1 Aug 2023 16:10:12 GMT, Aleksey Shipilev wrote: >> See the bug for more investigation. >> >> This manifests in current tests, if you run them with C1. The root cause for the failure is that we fold two `LoadIndexed` nodes, when one of them reads `char` from `byte[]` via `_getCharStringU` intrinsic, and another one reads `byte` normally. So we can fold the "char"-reading load with "byte" reading load, effectively reading the wrong thing. New regression test shows it: it would read "42" instead of full char. >> >> >> $ build/macosx-aarch64-server-fastdebug/images/jdk/bin/java -XX:TieredStopAtLevel=1 -XX:CICompilerCount=1 -XX:+PrintCompilation -XX:+PrintIR0 -XX:+PrintValueNumbering Test8313402.java >> >> . 8 0 i152 a141[i110](i144) (B) [rc] >> ... >> . 7 0 i180 a162[i110] (C) >> ... >> Value Numbering: LoadIndexed i180 equal to i152 (size 47, entries 27, nesting-diff 0) >> ``` >> >> GVN hash discriminates on `type()->tag()`, but that `ValueType` maps to the same `T_INT` for both `char` and `byte`! Instead of hashing on that, let's hash on the original element type instead. >> >> Testing: >> - [x] New regression test fails without the fix, passes after the fix >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` >> - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8313402-c1-gvn-loadindexed > - Initial fix Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15091#issuecomment-1662027584 From shade at openjdk.org Wed Aug 2 11:24:05 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Aug 2023 11:24:05 GMT Subject: Integrated: 8313402: C1: Incorrect LoadIndexed value numbering In-Reply-To: References: Message-ID: On Mon, 31 Jul 2023 16:47:56 GMT, Aleksey Shipilev wrote: > See the bug for more investigation. > > This manifests in current tests, if you run them with C1. The root cause for the failure is that we fold two `LoadIndexed` nodes, when one of them reads `char` from `byte[]` via `_getCharStringU` intrinsic, and another one reads `byte` normally. So we can fold the "char"-reading load with "byte" reading load, effectively reading the wrong thing. New regression test shows it: it would read "42" instead of full char. > > > $ build/macosx-aarch64-server-fastdebug/images/jdk/bin/java -XX:TieredStopAtLevel=1 -XX:CICompilerCount=1 -XX:+PrintCompilation -XX:+PrintIR0 -XX:+PrintValueNumbering Test8313402.java > > . 8 0 i152 a141[i110](i144) (B) [rc] > ... > . 7 0 i180 a162[i110] (C) > ... > Value Numbering: LoadIndexed i180 equal to i152 (size 47, entries 27, nesting-diff 0) > ``` > > GVN hash discriminates on `type()->tag()`, but that `ValueType` maps to the same `T_INT` for both `char` and `byte`! Instead of hashing on that, let's hash on the original element type instead. > > Testing: > - [x] New regression test fails without the fix, passes after the fix > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` > - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` This pull request has now been integrated. 
Changeset: 46fbedb2 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/46fbedb2be98a9b8aba042fa9f90c3b25c312cd6 Stats: 59 lines in 2 files changed: 58 ins; 0 del; 1 mod 8313402: C1: Incorrect LoadIndexed value numbering Reviewed-by: phh, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/15091 From duke at openjdk.org Wed Aug 2 12:37:13 2023 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 2 Aug 2023 12:37:13 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v11] In-Reply-To: <_xhY05iLfIuABX0G_7UkyrAz6iCnJzZPzWNe09_ryjI=.372ae759-fc49-496d-b1c1-c10ad065987b@github.com> References: <_xhY05iLfIuABX0G_7UkyrAz6iCnJzZPzWNe09_ryjI=.372ae759-fc49-496d-b1c1-c10ad065987b@github.com> Message-ID: On Wed, 2 Aug 2023 11:02:07 GMT, Yudi Zheng wrote: >> Fei Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> Add if (isJDK16OrHigher()) check for SHA3 in CheckGraalIntrinsics.java > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3473: > >> 3471: __ bcax(v24, __ T16B, v24, v8, v31); >> 3472: >> 3473: __ ld1r(v31, __ T2D, __ post(rscratch1, 8)); > > is it intentional to load 16 bytes and post-increment by 8? Actually, with the ld1r instruction the post increment should be the same as the size of the memory accessed. So T2D requires 8 as it reads 8 bytes(and duplicates it into both halves of the SIMD register). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/207#discussion_r1281840355 From ysuenaga at openjdk.org Wed Aug 2 12:38:51 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Wed, 2 Aug 2023 12:38:51 GMT Subject: RFR: 8313406: nep_invoker_blob can be simplified more In-Reply-To: References: Message-ID: On Wed, 2 Aug 2023 02:12:43 GMT, Jorn Vernee wrote: >> In FFM, native function would be called via `nep_invoker_blob`. 
If the function has two arguments, it looks like the following: >> >> >> Decoding RuntimeStub - nep_invoker_blob 0x00007fcae394cd10 >> -------------------------------------------------------------------------------- >> 0x00007fcae394cd80: pushq %rbp >> 0x00007fcae394cd81: movq %rsp, %rbp >> 0x00007fcae394cd84: subq $0, %rsp >> ;; { argument shuffle >> 0x00007fcae394cd88: movq %r8, %rax >> 0x00007fcae394cd8b: movq %rsi, %r10 >> 0x00007fcae394cd8e: movq %rcx, %rsi >> 0x00007fcae394cd91: movq %rdx, %rdi >> ;; } argument shuffle >> 0x00007fcae394cd94: callq *%r10 >> 0x00007fcae394cd97: leave >> 0x00007fcae394cd98: retq >> >> >> `subq $0, %rsp` is for shadow space on the stack, and `movq %r8, %rax` is the number of args for a variadic function, so neither is necessary in some cases. They should be removed when they are not needed, as follows: >> >> >> Decoding RuntimeStub - nep_invoker_blob 0x00007fd8778e2810 >> -------------------------------------------------------------------------------- >> 0x00007fd8778e2880: pushq %rbp >> 0x00007fd8778e2881: movq %rsp, %rbp >> ;; { argument shuffle >> 0x00007fd8778e2884: movq %rsi, %r10 >> 0x00007fd8778e2887: movq %rcx, %rsi >> 0x00007fd8778e288a: movq %rdx, %rdi >> ;; } argument shuffle >> 0x00007fd8778e288d: callq *%r10 >> 0x00007fd8778e2890: leave >> 0x00007fd8778e2891: retq >> >> >> All java/foreign jtreg tests pass. >> >> These stubs can be seen in the [ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/examples/cpumodel) with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintStubCode` and the hsdis library. This testcase links the code with `Linker.Option.isTrivial()`. >> >> After this change, FFM performance on [another ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/benchmarks/funccall) improved: >> >> before: >> >> Benchmark Mode Cnt Score Error Units >> FuncCallComparison.invokeFFMRDTSC thrpt 3 106664071.816 ± 
14396524.718 ops/s >> FuncCallComparison.rdtsc thrpt 3 108024079.738 ± 13223921.011 ops/s >> >> >> after: >> >> Benchmark Mode Cnt Score Error Units >> FuncCallComparison.invokeFFMRDTSC thrpt 3 107622971.525 ± 12249767.134 ops/s >> FuncCallComparison.rdtsc thrpt 3 107695741.608 ± 23983281.346 ops/s >> >> >> Environment: >> * CPU: AMD Ry... > > FWIW, if you want to look into reducing the generated code further, I think we can potentially reduce the amount of shuffling between registers that's needed by reordering the arguments on the Java side so that each VMStorage corresponding to an argument of the leaf method handle is the same as the register for that argument in the Java calling convention. > > I think the right place to do this is in DowncallLinker where we are creating the NativeEntryPoint. The way I think it should work: > 1. compute the Java calling convention's argument registers for the leaf method type. > 2. compute a re-ordered VMStorage[] for the arguments, and a re-ordered method type, such that the VMStorage/type for a particular argument index matches the register for the same index used in the Java calling convention as much as possible. > 3. use the re-ordered VMStorage[] + MethodType to create the native entry point + native method handle > 4. apply the same reordering in reverse to the arguments of the created native method handle (using MethodHandles::permuteArguments) so that the resulting method handle has the original argument order/method type. > > Pushing this shuffling to the Java side will allow the JIT to reduce data motion, and this should result in reduced shuffling being needed overall I think. @JornVernee Thanks for your review! I will integrate this when I get a second reviewer.
> I think we can potentially reduce the amount of shuffling between registers that's needed by reordering the arguments on the Java side so that each VMStorage corresponding to an argument of the leaf method handle is the same as the register for that argument in the Java calling convention. That would be great! I guess you are suggesting that `ArgumentShuffle` in HotSpot should move into `DowncallLinker`, right? To be honest, I don't fully understand this yet, and I don't have a testbed other than Linux x64, so it is difficult for me to work on this now. Again, this idea is great. I'd like to call native functions via FFM with less overhead, so I'm happy to help if I can. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15089#issuecomment-1662132113 From fjiang at openjdk.org Wed Aug 2 13:41:50 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 2 Aug 2023 13:41:50 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint In-Reply-To: References: Message-ID: On Mon, 24 Jul 2023 08:22:52 GMT, Ilya Gavrilin wrote: > Please review this changes into risc-v double rounding intrinsic. > > On risc-v intrinsics for rounding doubles with mode (like Math.ceil/floor/rint) were missing. On risc-v we don`t have special instruction for such conversion, so two times conversion was used: double -> long int -> double (using fcvt.l.d, fcvt.d.l). > > Also, we should provide some rounding mode to fcvt.x.x instruction. > > Rounding mode selection on ceil (similar for floor and rint): according to Math.ceil requirements: > >> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475). > > For double -> long int we choose rup (round towards +inf) mode to get the integer that more than or equal to the input value.
> For long int -> double we choose rdn (rounds towards -inf) mode to get the smallest (closest to -inf) representation of integer that we got after conversion. > > For cases when we got inf, nan, or value more than 2^63 return input value (double value which more than 2^63 is guaranteed integer). > As well when we store result we copy sign from input value (need for cases when for (-1.0, 0.0) ceil need to return -0.0). > > We have observed significant improvement on hifive and thead boards. > > testing: tier1, tier2 and hotspot:tier3 on hifive > > Performance results on hifive (FpRoundingBenchmark.testceil/floor/rint): > > Without intrinsic: > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.testceil 1024 thrpt 25 39.297 ± 0.037 ops/ms > FpRoundingBenchmark.testfloor 1024 thrpt 25 39.398 ± 0.018 ops/ms > FpRoundingBenchmark.testrint 1024 thrpt 25 36.388 ± 0.844 ops/ms > > With intrinsic: > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.testceil 1024 thrpt 25 80.560 ± 0.053 ops/ms > FpRoundingBenchmark.testfloor 1024 thrpt 25 80.541 ± 0.081 ops/ms > FpRoundingBenchmark.testrint 1024 thrpt 25 80.603 ± 0.071 ops/ms Thanks for the work! With one comment: src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4286: > 4284: slli(mask, mask, 63); > 4285: // conversion from double to long > 4286: fcvt_l_d(converted_dbl, src, rm_direct); How about using `fclass` [1] to check the special cases of input, then we can just do `fcvt.l.d` and `fcvt.d.l` for normal inputs? We can check the result of `fclass`. If the input contains NaN/infinity/+0/-0, we could return the value without conversion. 1.
https://github.com/riscv/riscv-isa-manual/blob/3a6edf7ebf6af9e6ad92ace865c0069090870c20/src/f-st-ext.adoc?plain=1#L487-L500 ------------- PR Review: https://git.openjdk.org/jdk/pull/14991#pullrequestreview-1559023432 PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1281913449 From yzheng at openjdk.org Wed Aug 2 14:10:13 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 2 Aug 2023 14:10:13 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v11] In-Reply-To: References: <_xhY05iLfIuABX0G_7UkyrAz6iCnJzZPzWNe09_ryjI=.372ae759-fc49-496d-b1c1-c10ad065987b@github.com> Message-ID: On Wed, 2 Aug 2023 12:33:43 GMT, Ferenc Rakoczi wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3473: >> >>> 3471: __ bcax(v24, __ T16B, v24, v8, v31); >>> 3472: >>> 3473: __ ld1r(v31, __ T2D, __ post(rscratch1, 8)); >> >> is it intentional to load 16 bytes and post-increment by 8? > > Actually, with the ld1r instruction the post increment should be the same as the size of the memory accessed. So T2D requires 8 as it reads 8 bytes(and duplicates it into both halves of the SIMD register). Thanks for the clarification! 
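Ferenc's explanation of `ld1r` can be checked with a few lines of Java. This is only a software model built on assumptions: memory is a little-endian `ByteBuffer`, a one-element `int[]` stands in for the base register `rscratch1`, and `ld1r2d` is a hypothetical name. The model reads one 8-byte element and replicates it into both 64-bit lanes of the T2D arrangement. The post-index then advances the base by the 8 bytes actually read, not by the 16-byte register width.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Software model of AArch64 "ld1r {vt.2d}, [xn], #8"; not an assembler API.
public class Ld1rSketch {
    /**
     * Loads one 64-bit element at base[0], replicates it into both lanes,
     * and post-increments base[0] by the number of bytes read (8).
     */
    static long[] ld1r2d(ByteBuffer mem, int[] base) {
        long element = mem.getLong(base[0]);    // reads exactly Long.BYTES = 8 bytes
        base[0] += Long.BYTES;                  // post-index by the access size
        return new long[] { element, element }; // T2D: same value in both 64-bit lanes
    }

    public static void main(String[] args) {
        // Sample table: the first three Keccak-f[1600] round constants.
        ByteBuffer table = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
        table.putLong(0, 0x1L).putLong(8, 0x8082L).putLong(16, 0x800000000000808AL);
        int[] base = {0};
        for (int round = 0; round < 3; round++) {
            long[] v = ld1r2d(table, base);
            System.out.println(Arrays.toString(v) + " next base=" + base[0]);
        }
    }
}
```

If the base advanced by 16 instead of 8, the loop would skip every other constant in the table.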
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/207#discussion_r1281961206 From cslucas at openjdk.org Wed Aug 2 14:21:46 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 2 Aug 2023 14:21:46 GMT Subject: RFR: JDK-8312617: SIGSEGV in ConnectionGraph::verify_ram_nodes [v3] In-Reply-To: References: Message-ID: <59yooRWHPPrV5pryal5XycsKHDV2rbtsqtlGrQQ5iDo=.58929f4c-6b6b-4beb-87b2-4c413bd60e69@github.com> On Wed, 2 Aug 2023 05:18:34 GMT, Tobias Hartmann wrote: >> Cesar Soares Lucas has updated the pull request incrementally with three additional commits since the last revision: >> >> - Update src/hotspot/share/opto/compile.cpp >> >> Co-authored-by: Tobias Hartmann >> - Update src/hotspot/share/opto/escape.cpp >> >> Co-authored-by: Tobias Hartmann >> - Update src/hotspot/share/opto/escape.cpp >> >> Co-authored-by: Tobias Hartmann > > All tests passed. Thank you for testing/reviewing @TobiHartmann / @vnkozlov ------------- PR Comment: https://git.openjdk.org/jdk/pull/15048#issuecomment-1662300283 From jvernee at openjdk.org Wed Aug 2 14:23:56 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 2 Aug 2023 14:23:56 GMT Subject: RFR: 8313406: nep_invoker_blob can be simplified more In-Reply-To: References: Message-ID: On Wed, 2 Aug 2023 12:35:50 GMT, Yasumasa Suenaga wrote: > I guess you suggested that ArgumentShuffle in HotSpot moves into DowncallLinker, right? No, ArgumentShuffle should stay inside HotSpot. We can not do all the shuffling on the Java side. We can only eliminate some of the register moves that are needed by re-ordering the arguments on the Java side. 
For instance, if you look at the comment in `assembler_x86.hpp`, where we define `j_rarg*` Register constants, you'll see this:
// |-------------------------------------------------------|
// | c_rarg0  c_rarg1  c_rarg2  c_rarg3  c_rarg4  c_rarg5  |
// |-------------------------------------------------------|
// | rcx      rdx      r8       r9       rdi*     rsi*     | windows (* not a c_rarg)
// | rdi      rsi      rdx      rcx      r8       r9       | solaris/linux
// |-------------------------------------------------------|
// | j_rarg5  j_rarg0  j_rarg1  j_rarg2  j_rarg3  j_rarg4  |
// |-------------------------------------------------------|
i.e. all the registers in the Java calling convention are 'off by one' compared to the native calling convention. This makes sense for JNI since we need to prepend the JNIEnv* to the start of the argument list, but it doesn't make sense for Panama. Let's say we have a native function taking five `long`s. On Linux/x64 the VMStorage[] for the arguments (the one we use when creating the NativeEntryPoint inside DowncallLinker) would be: [rdi, rsi, rdx, rcx, r8, r9] i.e. the first argument we pass on the Java side gets moved (by the downcall stub) into `rdi`, the second into `rsi`, etc. This doesn't match the incoming registers of the Java calling convention, where the first argument is passed in `rsi`, the second is passed in `rdx`, etc. (i.e. off-by-one). We can simply re-arrange the entries in the `VMStorage[]` to match the order of registers in the Java calling convention: [rsi, rdx, rcx, r8, r9, rdi] i.e. the argument that should go into `rdi` is passed in the fifth position instead. Since now the registers for each argument match the Java calling convention, the downcall stub doesn't need to do any shuffling! (I'm being very hand-wavey here. Figuring out how to correctly do the re-ordering is the hard part of this).
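The reordering idea can be sketched with plain `java.lang.invoke`. This is a hypothetical illustration, not `DowncallLinker` code. Register names are strings standing in for `VMStorage`, and the two lists mirror the Linux rows of the register table. `computeOrder` and `stub` are invented names, and `stub` only reports which register each value would occupy. `computeOrder` first places every argument whose native register already matches a Java-convention position, then fills the remaining positions with the leftover arguments. `MethodHandles.permuteArguments` then restores the caller-visible argument order.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Java-side argument reordering; strings stand in for VMStorage.
public class ArgReorderSketch {
    static final List<String> C_REGS = List.of("rdi", "rsi", "rdx", "rcx", "r8", "r9");
    static final List<String> J_REGS = List.of("rsi", "rdx", "rcx", "r8", "r9", "rdi");

    /** order[p] = index of the original argument that should travel in Java position p. */
    static int[] computeOrder(List<String> nativeRegs, List<String> javaRegs) {
        int n = nativeRegs.size();
        int[] order = new int[n];
        Arrays.fill(order, -1);
        boolean[] placed = new boolean[n];
        // Pass 1: an argument whose native register is the Java register at p goes to p.
        for (int p = 0; p < n; p++) {
            int i = nativeRegs.indexOf(javaRegs.get(p));
            if (i >= 0) { order[p] = i; placed[i] = true; }
        }
        // Pass 2: remaining positions take the leftover arguments in their original order.
        int next = 0;
        for (int p = 0; p < n; p++) {
            if (order[p] >= 0) continue;
            while (placed[next]) next++;
            order[p] = next;
            placed[next] = true;
        }
        return order;
    }

    /** Stand-in for the native entry point: reports the register each value lands in. */
    static String stub(String[] regs, long a0, long a1, long a2, long a3, long a4, long a5) {
        long[] vals = {a0, a1, a2, a3, a4, a5};
        Map<String, Long> byReg = new HashMap<>();
        for (int p = 0; p < vals.length; p++) byReg.put(regs[p], vals[p]);
        StringBuilder sb = new StringBuilder();
        for (String r : C_REGS) sb.append(r).append('=').append(byReg.get(r)).append(' ');
        return sb.toString().trim();
    }

    static String demo() {
        int[] order = computeOrder(C_REGS, J_REGS);
        String[] reordered = new String[order.length];
        for (int p = 0; p < order.length; p++) reordered[p] = C_REGS.get(order[p]);
        try {
            MethodType longs6 = MethodType.methodType(String.class,
                    long.class, long.class, long.class, long.class, long.class, long.class);
            MethodHandle target = MethodHandles.lookup().findStatic(ArgReorderSketch.class,
                    "stub", longs6.insertParameterTypes(0, String[].class));
            // Bind the reordered storage; parameter p of 'bound' is Java position p.
            MethodHandle bound = MethodHandles.insertArguments(target, 0, (Object) reordered);
            // Outgoing position p receives incoming argument order[p]: original order restored.
            MethodHandle original = MethodHandles.permuteArguments(bound, longs6, order);
            return (String) original.invokeExact(10L, 11L, 12L, 13L, 14L, 15L);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(computeOrder(C_REGS, J_REGS))); // [1, 2, 3, 4, 5, 0]
        System.out.println(demo()); // rdi=10 rsi=11 rdx=12 rcx=13 r8=14 r9=15
    }
}
```

With six `long` arguments the reordered storage equals the Java register list, so no moves would remain. With five arguments, `computeOrder(C_REGS.subList(0, 5), J_REGS.subList(0, 5))` gives `[1, 2, 3, 4, 0]`: only the value arriving in `r9` would still need to move to `rdi`, which is the "as much as possible" case.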
Ok, but now the arguments we pass to the downcall stub are going to go in the wrong registers as well :( So, to compensate for that, we have to also re-order the incoming argument values on the Java side so that each argument will correspond to its original register again. To do that, we just need to pass the first argument in the fifth position as well, and shift arguments 1-4 forward by one spot. Make sense? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15089#issuecomment-1662302891 From cslucas at openjdk.org Wed Aug 2 14:29:58 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 2 Aug 2023 14:29:58 GMT Subject: Integrated: JDK-8312617: SIGSEGV in ConnectionGraph::verify_ram_nodes In-Reply-To: References: Message-ID: On Wed, 26 Jul 2023 22:26:05 GMT, Cesar Soares Lucas wrote: > - Return early from `verify_ram_nodes` if compilation is already failing. > - Add back check for `failing()` after `eliminate_macro_nodes()`. > - Print additional diagnostic information when an unexpected user of RAM is encountered. > > Tested with tier1-3 on Linux x64. This pull request has now been integrated. Changeset: 64467923 Author: Cesar Soares Lucas Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/6446792327c629dbd1dfc1edfb547065f6fce651 Stats: 31 lines in 3 files changed: 22 ins; 0 del; 9 mod 8312617: SIGSEGV in ConnectionGraph::verify_ram_nodes Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/15048 From duke at openjdk.org Wed Aug 2 15:14:51 2023 From: duke at openjdk.org (Tobias Hotz) Date: Wed, 2 Aug 2023 15:14:51 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set In-Reply-To: References: Message-ID: On Fri, 26 May 2023 10:32:00 GMT, Tobias Hotz wrote: > This patch adds peephole rules to remove TEST instructions that operate on the result and right after an AND, XOR or OR instruction.
> This pattern can emerge if the result of the and is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instruction mentioned above also sets the flag register to the same value. > According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to the same values for TEST, AND, XOR and OR, so this should be safe. > By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: > Results on Intel Core i5-8250U CPU > Before this patch: > > Benchmark Mode Cnt Score Error Units > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op > TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op > TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > > After this patch: > > Benchmark Mode Cnt Score Error Units Change > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 
0.002 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.074 ± 0.011... First of all, thanks for the feedback. It already handles the case when flags are set depending on the operation. Right now, there are 3 cases that can be expressed: - The flag is set based on the result of the operation. That means OF is set if an overflow occurred during the operation, ZF is set if the result is zero, etc. This is expressed through the `KILL CR` effect in combination with the flag `Flag_sets_xxx_flag` - The flag is zeroed by the instruction. An example of this is the OF and CF flags in the case of `and`/`test`. This is expressed through the `KILL CR` effect in combination with the flag `Flag_clears_xxx_flag` - The state of the flag is unknown/undefined. An example here is the PF flag in the case of the `andn` instruction. This is also the case for all flags of the `bsr` instruction. This is expressed through the `KILL CR` effect and no matching flag that specifies sets or clears. I mainly used https://www.felixcloutier.com/x86/index.html as a reference (which is based on the official Intel developer manual) to specify all these flags. Adding support for additional states such as "Sets flag based on source operand" would take up even more space in the flags int, and as there is only room for 2 additional flags, this would not work, as flags is a juint. Also, I think the cases where we would be able to remove a TEST instruction based on that would be very rare, because 1) not many commonly used instructions set flags based on their input, and 2) the instructions that do set flags based on their input set only a few flags, leaving only simple zero checks as valid. Regarding IR tests: yes, it seems like that would make sense. I will look into that.
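The rule under discussion can be illustrated with a hypothetical Java pass. This is not the `.ad` peephole syntax, and `Insn` and `removeRedundantTests` are invented names. The pass drops a `test r, r` only when the instruction immediately before it is an `and`, `or`, or `xor` that wrote `r`. Per the references cited in the PR, those instructions leave the flags as `test r, r` would: ZF, SF, and PF come from the result, and OF and CF are cleared. `add` is excluded because its OF and CF describe the arithmetic itself, not just the value of the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical model of the TEST-removal peephole; not HotSpot's peephole framework.
public class TestPeepholeSketch {
    record Insn(String op, String dst, String src) {
        @Override public String toString() { return op + " " + dst + (src == null ? "" : ", " + src); }
    }

    // These set ZF/SF/PF from the result and clear OF/CF, exactly like "test r, r".
    static final Set<String> FLAGS_LIKE_TEST = Set.of("and", "or", "xor");

    static List<Insn> removeRedundantTests(List<Insn> code) {
        List<Insn> out = new ArrayList<>();
        for (Insn insn : code) {
            boolean selfTest = insn.op().equals("test") && Objects.equals(insn.dst(), insn.src());
            if (selfTest && !out.isEmpty()) {
                Insn prev = out.get(out.size() - 1);
                if (FLAGS_LIKE_TEST.contains(prev.op()) && prev.dst().equals(insn.dst())) {
                    continue; // the flags already hold what TEST would compute
                }
            }
            out.add(insn);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Insn> code = List.of(
                new Insn("and", "eax", "ebx"),
                new Insn("test", "eax", "eax"), // redundant: removed
                new Insn("je", "L_zero", null),
                new Insn("add", "ecx", "edx"),
                new Insn("test", "ecx", "ecx"), // kept: ADD may leave OF set, TEST clears it
                new Insn("jle", "L_le", null));
        removeRedundantTests(code).forEach(System.out::println);
    }
}
```

The `jle` case shows why `add` must be excluded. `jle` reads OF (ZF=1 or SF≠OF), so dropping the `test` after an overflowing `add` would change the branch outcome.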
------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1662391908 From qamai at openjdk.org Wed Aug 2 15:43:53 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 2 Aug 2023 15:43:53 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set In-Reply-To: References: Message-ID: <3D36E2LVr-YbbyQ_w8NNfRUJhKlZl0IHhyJ3Ztg3UKg=.e23d2654-fc02-44dc-92e4-5d19d18dcab4@github.com> On Fri, 26 May 2023 10:32:00 GMT, Tobias Hotz wrote: > This patch adds peephole rules to remove TEST instructions that operate on the result and right after an AND, XOR or OR instruction. > This pattern can emerge if the result of the and is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instruction mentioned above also sets the flag register to the same value. > According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to the same values for TEST, AND, XOR and OR, so this should be safe. > By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: > Results on Intel Core i5-8250U CPU > Before this patch: > > Benchmark Mode Cnt Score Error Units > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op > TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 
0.666 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op > TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > > After this patch: > > Benchmark Mode Cnt Score Error Units Change > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.074 ± 0.011... I see. Personally, I think OF should be marked as clobbered for `add`, because overflow is a property of the addition itself, whereas `ZF` and `SF` are derived from its result. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1662445247 From sviswanathan at openjdk.org Wed Aug 2 18:49:54 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 2 Aug 2023 18:49:54 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v13] In-Reply-To: References: Message-ID: On Tue, 1 Aug 2023 17:47:52 GMT, Srinivas Vamsi Parasa wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> add special cases to float and double arrays > > @schlosna Thanks David for suggesting a more elegant way by using switch()! @vamsi-parasa With a fastdebug build I see the following error: Internal Error (jdk/src/hotspot/share/opto/escape.cpp:1196), pid=3543536, tid=3543559 fatal error: EA unexpected CallLeaf arraysort_stub Please take a look.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1662773022 From sviswanathan at openjdk.org Wed Aug 2 18:49:55 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 2 Aug 2023 18:49:55 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: References: Message-ID: On Tue, 1 Aug 2023 17:54:05 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~13x improvement for 32-bit datatypes (int, float) and upto 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. >> >> **Arrays.sort performance data using JMH benchmarks** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | >> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | >> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | >> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | >> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | >> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | >> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | >> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | >> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | >> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | >> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | >> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | >> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | >> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | >> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | >> | ArraysSort.longSort | 100000 | 
4448.171 | 696.773 | **6.4x** | > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Update src/java.base/share/classes/java/util/Arrays.java > > Co-authored-by: David Schlosnagle src/java.base/share/classes/java/util/Arrays.java line 95: > 93: */ > 94: @IntrinsicCandidate > 95: public static void arraySort(Class elemType, Object array, long offset, int fromIndex, int toIndex) { Does this method need to be public? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1282206397 From duke at openjdk.org Wed Aug 2 20:48:51 2023 From: duke at openjdk.org (Ilya Gavrilin) Date: Wed, 2 Aug 2023 20:48:51 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint In-Reply-To: References: Message-ID: On Wed, 2 Aug 2023 13:32:36 GMT, Feilong Jiang wrote: >> Please review this changes into risc-v double rounding intrinsic. >> >> On risc-v intrinsics for rounding doubles with mode (like Math.ceil/floor/rint) were missing. On risc-v we don`t have special instruction for such conversion, so two times conversion was used: double -> long int -> double (using fcvt.l.d, fcvt.d.l). >> >> Also, we should provide some rounding mode to fcvt.x.x instruction. >> >> Rounding mode selection on ceil (similar for floor and rint): according to Math.ceil requirements: >> >>> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475). >> >> For double -> long int we choose rup (round towards +inf) mode to get the integer that more than or equal to the input value. >> For long int -> double we choose rdn (rounds towards -inf) mode to get the smallest (closest to -inf) representation of integer that we got after conversion. >> >> For cases when we got inf, nan, or value more than 2^63 return input value (double value which more than 2^63 is guaranteed integer). 
>> As well when we store result we copy sign from input value (need for cases when for (-1.0, 0.0) ceil need to return -0.0). >> >> We have observed significant improvement on hifive and thead boards. >> >> testing: tier1, tier2 and hotspot:tier3 on hifive >> >> Performance results on hifive (FpRoundingBenchmark.testceil/floor/rint): >> >> Without intrinsic: >> >> Benchmark (TESTSIZE) Mode Cnt Score Error Units >> FpRoundingBenchmark.testceil 1024 thrpt 25 39.297 ± 0.037 ops/ms >> FpRoundingBenchmark.testfloor 1024 thrpt 25 39.398 ± 0.018 ops/ms >> FpRoundingBenchmark.testrint 1024 thrpt 25 36.388 ± 0.844 ops/ms >> >> With intrinsic: >> >> Benchmark (TESTSIZE) Mode Cnt Score Error Units >> FpRoundingBenchmark.testceil 1024 thrpt 25 80.560 ± 0.053 ops/ms >> FpRoundingBenchmark.testfloor 1024 thrpt 25 80.541 ± 0.081 ops/ms >> FpRoundingBenchmark.testrint 1024 thrpt 25 80.603 ± 0.071 ops/ms > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4286: > >> 4284: slli(mask, mask, 63); >> 4285: // conversion from double to long >> 4286: fcvt_l_d(converted_dbl, src, rm_direct); > > How about using `fclass` [1] to check the special cases of input, then we can just do `fcvt.l.d` and `fcvt.d.l` for normal inputs? We can check the result of `fclass`. If the input contains NaN/infinity/+0/-0, we could return the value without conversion. > > 1. https://github.com/riscv/riscv-isa-manual/blob/3a6edf7ebf6af9e6ad92ace865c0069090870c20/src/f-st-ext.adoc?plain=1#L487-L500 Hi, thanks for your review. We could indeed use `fclass` to check for NaN/+(-)INF/+(-)0.0, but we would still need to check whether the value is outside the range `-2^63` to `2^63 - 1`. So we would have to keep the check of the converted value and add a branch on the result of `fclass`, which would add an extra branch for regular values.
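The conversion scheme in this PR can be checked against `java.lang.Math` with a pure-Java model. Java casts always truncate and cannot select an `fcvt` rounding mode. This hypothetical sketch (`RiscvRoundingSketch` is an invented name) therefore emulates round-up, round-down, and round-to-nearest-even by adjusting the truncated value. NaN inputs, and inputs with magnitude of at least 2^63 (including infinities), are returned unchanged, standing in for the out-of-range check discussed above. `Math.copySign` restores the sign, so that, for example, `ceil(-0.5)` yields `-0.0`.

```java
// Hypothetical pure-Java model of the double -> long -> double rounding scheme.
public class RiscvRoundingSketch {
    private static final double TWO_POW_63 = 0x1p63;

    // NaN, infinities and |x| >= 2^63 (already integral) are returned unchanged,
    // standing in for the check on a saturated fcvt.l.d result.
    private static boolean passThrough(double x) {
        return Double.isNaN(x) || Math.abs(x) >= TWO_POW_63;
    }

    static double ceil(double x) {
        if (passThrough(x)) return x;
        long l = (long) x;               // Java truncates toward zero
        if ((double) l < x) l++;         // emulate round-up (rup)
        return Math.copySign((double) l, x);
    }

    static double floor(double x) {
        if (passThrough(x)) return x;
        long l = (long) x;
        if ((double) l > x) l--;         // emulate round-down (rdn)
        return Math.copySign((double) l, x);
    }

    static double rint(double x) {
        if (passThrough(x)) return x;
        long l = (long) x;
        double frac = x - (double) l;    // exact: l and x share a sign and differ by < 1
        double a = Math.abs(frac);
        if (a > 0.5 || (a == 0.5 && (l & 1) != 0)) {
            l += (frac > 0) ? 1 : -1;    // emulate round-to-nearest-even (rne)
        }
        return Math.copySign((double) l, x);
    }

    public static void main(String[] args) {
        double[] xs = {-0.5, 0.5, -1.5, 2.5, 3.5, -0.0, 1e300, Double.NaN};
        for (double x : xs) {
            System.out.println(x + ": ceil=" + ceil(x) + " floor=" + floor(x) + " rint=" + rint(x));
        }
    }
}
```

The sign copy matters only for zero results. Every other result already has the sign of its input, which is why a single `copySign` at the end covers the `(-1.0, 0.0)` case mentioned in the PR.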
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1282402592 From sviswanathan at openjdk.org Wed Aug 2 23:41:01 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 2 Aug 2023 23:41:01 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: References: Message-ID: On Tue, 1 Aug 2023 17:54:05 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~13x improvement for 32-bit datatypes (int, float) and upto 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. >> >> **Arrays.sort performance data using JMH benchmarks** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | >> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | >> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | >> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | >> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | >> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | >> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | >> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | >> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | >> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | >> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | >> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | >> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | >> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | >> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | >> | ArraysSort.longSort | 100000 
| 4448.171 | 696.773 | **6.4x** | > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Update src/java.base/share/classes/java/util/Arrays.java > > Co-authored-by: David Schlosnagle Also need to handle arraySort in file: share/gc/shenandoah/c2/shenandoahSupport.cpp, function: ShenandoahBarrierC2Support::verify around line 3000. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1663099103 From sviswanathan at openjdk.org Thu Aug 3 00:18:54 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 3 Aug 2023 00:18:54 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: References: Message-ID: On Tue, 1 Aug 2023 17:54:05 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~13x improvement for 32-bit datatypes (int, float) and upto 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. 
>> >> **Arrays.sort performance data using JMH benchmarks** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | >> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | >> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | >> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | >> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | >> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | >> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | >> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | >> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | >> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | >> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | >> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | >> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | >> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | >> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | >> | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Update src/java.base/share/classes/java/util/Arrays.java > > Co-authored-by: David Schlosnagle src/hotspot/share/runtime/vmStructs.cpp line 535: > 533: static_field(StubRoutines, _arraysort_long, address) \ > 534: static_field(StubRoutines, _arraysort_float, address) \ > 535: static_field(StubRoutines, _arraysort_double, address) \ Should this be in hotspot/share/jvmci/vmStructs_jvmci.cpp instead? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1282520387 From fjiang at openjdk.org Thu Aug 3 01:11:42 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 3 Aug 2023 01:11:42 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint In-Reply-To: References: Message-ID: <19WAdFX2lqYf-Hka7-pw2r5Um6sjztL1JVx8lfFyKu4=.12997dbc-506b-4470-ba2c-a5275c587ff2@github.com> On Mon, 24 Jul 2023 08:22:52 GMT, Ilya Gavrilin wrote: > Please review these changes to the RISC-V double rounding intrinsics. > > On RISC-V, intrinsics for rounding doubles with a rounding mode (like Math.ceil/floor/rint) were missing. RISC-V has no dedicated instruction for such a conversion, so a two-step conversion is used: double -> long int -> double (using fcvt.l.d, fcvt.d.l). > > We also have to provide a rounding mode to each fcvt.x.x instruction. > > Rounding mode selection for ceil (similar for floor and rint), according to the Math.ceil requirements: > >> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475). > > For double -> long int we choose the rup (round towards +inf) mode to get an integer that is greater than or equal to the input value. > For long int -> double we choose the rdn (round towards -inf) mode to get the smallest (closest to -inf) representation of the integer obtained from the first conversion. > > For inf, NaN, or values larger than 2^63 in magnitude, the input value is returned as is (a double of that magnitude is guaranteed to be an integer). > Also, when storing the result we copy the sign from the input value (needed because ceil must return -0.0 for inputs in (-1.0, 0.0)). > > We have observed significant improvements on HiFive and T-Head boards.
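The double -> long -> double scheme above can be modeled in plain Java as a rough sketch. Java's `(long)` cast always truncates toward zero, so this sketch assumes that adjusting the truncated value by one is an acceptable stand-in for the rup/rdn modes of `fcvt.l.d`. `RoundViaLong` and its methods are hypothetical names (not JDK or HotSpot code), and only ceil and floor are covered:

```java
// Hypothetical Java model of the double -> long -> double rounding scheme;
// not the HotSpot RISC-V assembler code, and the names are illustrative only.
final class RoundViaLong {
    // 2^63: any double at least this large in magnitude is already an
    // integer and (apart from -2^63) does not fit in a long.
    private static final double TWO_POW_63 = 0x1.0p63;

    private RoundViaLong() {}

    /** Models Math.ceil: convert rounding up (like rup), then convert back. */
    static double ceil(double x) {
        return roundViaLong(x, true);
    }

    /** Models Math.floor: the same round trip, rounding down (like rdn). */
    static double floor(double x) {
        return roundViaLong(x, false);
    }

    private static double roundViaLong(double x, boolean roundUp) {
        // NaN, infinities and magnitudes >= 2^63 take the "bad value" path
        // and are returned unchanged.
        if (Double.isNaN(x) || Double.isInfinite(x) || Math.abs(x) >= TWO_POW_63) {
            return x;
        }
        // The cast truncates toward zero; nudge the result by one to emulate
        // a directed rounding mode (only possible when x is not integral).
        long n = (long) x;
        if (roundUp && n < x) {
            n++;
        } else if (!roundUp && n > x) {
            n--;
        }
        // n is an integer that a double represents exactly, so converting it
        // back does not round again. Copying the input's sign makes
        // ceil(-0.5) return -0.0 instead of 0.0.
        return Math.copySign((double) n, x);
    }
}
```

For example, `RoundViaLong.ceil(-0.5)` returns `-0.0`, matching `Math.ceil(-0.5)`; without the `copySign` step it would return `0.0`.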
> > testing: tier1, tier2 and hotspot:tier3 on HiFive > > Performance results on HiFive (FpRoundingBenchmark.testceil/floor/rint): > > Without intrinsic: > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.testceil 1024 thrpt 25 39.297 ± 0.037 ops/ms > FpRoundingBenchmark.testfloor 1024 thrpt 25 39.398 ± 0.018 ops/ms > FpRoundingBenchmark.testrint 1024 thrpt 25 36.388 ± 0.844 ops/ms > > With intrinsic: > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.testceil 1024 thrpt 25 80.560 ± 0.053 ops/ms > FpRoundingBenchmark.testfloor 1024 thrpt 25 80.541 ± 0.081 ops/ms > FpRoundingBenchmark.testrint 1024 thrpt 25 80.603 ± 0.071 ops/ms src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4302: > 4300: // if got conversion overflow return src > 4301: bind(bad_val); > 4302: fsgnj_d(dst, src, src); We can use `fmv_d(dst, src)` here for better readability. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1282541489 From lmesnik at openjdk.org Thu Aug 3 02:49:47 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 3 Aug 2023 02:49:47 GMT Subject: RFR: 8307462: [REDO] VmObjectAlloc is not generated by intrinsics methods which allocate objects Message-ID: The fix adds posting of VmObjectAlloc events by Unsafe.allocateInstance(Class cls). The previous attempt to post the event directly from 'LibraryCallKit::inline_unsafe_allocate()' caused a performance regression even when the JVMTI event was not enabled: some optimizations were disabled just because of the possible usage and escaping of the newly allocated object. So the event is now posted by returning to the interpreter when events are enabled. I verified that the performance (run locally only) of org.renaissance.jdk.streams.JmhScrabble.runOperation doesn't change when events are not enabled. There might be other intrinsics like 'LibraryCallKit::inline_unsafe_newArray()' where the VM allocates memory. I'm going to file a separate issue to find and fix them.
Many thanks to Tobias H. for the proposed solution. Tested with all tiers. ------------- Commit messages: - fixed comments and problemlist - fixed - 8307462: [REDO] VmObjectAlloc is not generated by intrinsics methods which allocate objects Changes: https://git.openjdk.org/jdk/pull/15110/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15110&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307462 Stats: 39 lines in 6 files changed: 35 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15110.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15110/head:pull/15110 PR: https://git.openjdk.org/jdk/pull/15110 From sviswanathan at openjdk.org Thu Aug 3 04:08:36 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 3 Aug 2023 04:08:36 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: References: Message-ID: On Tue, 1 Aug 2023 17:54:05 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>> >> **Arrays.sort performance data using JMH benchmarks** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | >> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | >> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | >> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | >> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | >> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | >> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | >> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | >> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | >> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | >> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | >> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | >> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | >> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | >> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | >> | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Update src/java.base/share/classes/java/util/Arrays.java > > Co-authored-by: David Schlosnagle test/micro/org/openjdk/bench/java/util/ArraysSort.java line 55: > 53: @State(Scope.Thread) > 54: @Warmup(iterations = 3, time=60) > 55: @Measurement(iterations = 3, time=120) Warmup/measurement time could be reduced in the jmh micro to 2s/5s. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1282618111 From ysuenaga at openjdk.org Thu Aug 3 04:10:29 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Thu, 3 Aug 2023 04:10:29 GMT Subject: RFR: 8313406: nep_invoker_blob can be simplified more In-Reply-To: References: Message-ID: On Mon, 31 Jul 2023 12:22:00 GMT, Yasumasa Suenaga wrote: > In FFM, a native function is called via `nep_invoker_blob`. If the function has two arguments, the stub looks like the following: > > > Decoding RuntimeStub - nep_invoker_blob 0x00007fcae394cd10 > -------------------------------------------------------------------------------- > 0x00007fcae394cd80: pushq %rbp > 0x00007fcae394cd81: movq %rsp, %rbp > 0x00007fcae394cd84: subq $0, %rsp > ;; { argument shuffle > 0x00007fcae394cd88: movq %r8, %rax > 0x00007fcae394cd8b: movq %rsi, %r10 > 0x00007fcae394cd8e: movq %rcx, %rsi > 0x00007fcae394cd91: movq %rdx, %rdi > ;; } argument shuffle > 0x00007fcae394cd94: callq *%r10 > 0x00007fcae394cd97: leave > 0x00007fcae394cd98: retq > > > `subq $0, %rsp` reserves shadow space on the stack, and `movq %r8, %rax` sets up %rax, which only variadic functions need, so both are unnecessary in some cases. When they are not needed, they should be removed as follows: > > > Decoding RuntimeStub - nep_invoker_blob 0x00007fd8778e2810 > -------------------------------------------------------------------------------- > 0x00007fd8778e2880: pushq %rbp > 0x00007fd8778e2881: movq %rsp, %rbp > ;; { argument shuffle > 0x00007fd8778e2884: movq %rsi, %r10 > 0x00007fd8778e2887: movq %rcx, %rsi > 0x00007fd8778e288a: movq %rdx, %rdi > ;; } argument shuffle > 0x00007fd8778e288d: callq *%r10 > 0x00007fd8778e2890: leave > 0x00007fd8778e2891: retq > > > All java/foreign jtreg tests passed.
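As a rough illustration of the proposed simplification, the hypothetical Java model below emits the stub's instruction sequence as strings and omits the two instructions when they are not needed. `StubPrologueSketch`, its parameters, and the fixed `%r8` source register (taken from the two-argument dump above) are assumptions made for illustration; HotSpot's real stub generator is C++ and is not shown here:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the stub-emission decision; not HotSpot code.
final class StubPrologueSketch {
    private StubPrologueSketch() {}

    static List<String> emit(int shadowSpaceBytes, boolean variadic, List<String> argumentShuffle) {
        List<String> code = new ArrayList<>();
        code.add("pushq %rbp");
        code.add("movq %rsp, %rbp");
        // Reserve shadow space only when the ABI requires some (for example
        // 32 bytes on Windows x64); with 0 bytes, "subq $0, %rsp" is a no-op.
        if (shadowSpaceBytes > 0) {
            code.add("subq $" + shadowSpaceBytes + ", %rsp");
        }
        // Only variadic callees need %rax set up (under the SysV x86-64 ABI,
        // %al carries an upper bound on the number of vector registers used).
        if (variadic) {
            code.add("movq %r8, %rax");
        }
        code.addAll(argumentShuffle);
        code.add("callq *%r10");
        code.add("leave");
        code.add("retq");
        return code;
    }
}
```

Called with `emit(0, false, shuffle)` and the three shuffle moves from the dump, it yields the trimmed eight-instruction sequence shown in the second listing above.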
> > This stub code can be seen in the [ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/examples/cpumodel) with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintStubCode` and the hsdis library. This testcase links the code with `Linker.Option.isTrivial()`. > > After this change, FFM performance on [another ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/benchmarks/funccall) improved: > > before: > > Benchmark Mode Cnt Score Error Units > FuncCallComparison.invokeFFMRDTSC thrpt 3 106664071.816 ± 14396524.718 ops/s > FuncCallComparison.rdtsc thrpt 3 108024079.738 ± 13223921.011 ops/s > > > after: > > Benchmark Mode Cnt Score Error Units > FuncCallComparison.invokeFFMRDTSC thrpt 3 107622971.525 ± 12249767.134 ops/s > FuncCallComparison.rdtsc thrpt 3 107695741.608 ± 23983281.346 ops/s > > > Environment: > * CPU: AMD Ryzen 3 3300X > * OS: Fedora 38 x86_64 (Kernel 6.3.8-200.fc38.x86_64) > * Hyper-V 4vCPU, 8GB mem Ideally, it would be best to eliminate the shuffling completely, but I think that is impossible. We have to use `MethodHandles::permuteArguments` to apply reordering to `NativeMethodHandle` as you said, so shuffling would remain somewhere even if we could eliminate it from the NEP stub. Thus this topic would be lower priority if my guess is correct. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15089#issuecomment-1663260805 From jvernee at openjdk.org Thu Aug 3 05:12:29 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 3 Aug 2023 05:12:29 GMT Subject: RFR: 8313406: nep_invoker_blob can be simplified more In-Reply-To: References: Message-ID: <6T79GckPQ7lIgiaw0wYNjhLWMnI0NyAQJysLBOT4yX0=.c97c8155-de39-440a-9ae7-7ca706fd4830@github.com> On Thu, 3 Aug 2023 04:07:27 GMT, Yasumasa Suenaga wrote: > so shuffling would remain somewhere even if we could eliminate it from the NEP stub.
Since things on the Java side are visible to the JIT, it should be able to avoid the extra data motion. > Thus this topic would be lower priority if my guess is correct. Yes, it is lower priority. It's relatively complex to solve, and also CPUs, in my experience, don't generally care that much about the shuffling. They probably just change their internal register allocation table instead of doing the actual moves. Also, it will ultimately help more to implement C2 intrinsics for native calls, as that avoids going through the downcall stub altogether. I have an old POC for that which I will dust off. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15089#issuecomment-1663298793 From dnsimon at openjdk.org Thu Aug 3 07:43:34 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 3 Aug 2023 07:43:34 GMT Subject: RFR: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers. [v2] In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 07:12:08 GMT, Yudi Zheng wrote: >> This PR exports `vmIntrinsic::is_intrinsic_available`, `Compiler::is_intrinsic_supported`, and `C2Compiler::is_intrinsic_supported` results to JVMCI compilers. This allows JVMCI compilers to comply with `-XX:DisableIntrinsic`, `-XX:ControlIntrinsic`, `-XX:-UseXXXIntrinsic`, and is essential for running tests that depend on these flags, e.g., `java/lang/Float/Binary16ConversionNaN`, which returns a different result in the interpreter with `-XX:DisableIntrinsic=_float16ToFloat,_floatToFloat16`. >> This PR also attempts to fix some of the `is_intrinsic_available` results. Please see the inlined comments. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > update is_intrinsic_supported for _dcopySign,_fcopySign. The JVMCI changes look good to me, but someone else still needs to review the C1, C2 and shared assembler changes. ------------- Marked as reviewed by dnsimon (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15133#pullrequestreview-1560433644 From dlong at openjdk.org Thu Aug 3 08:02:37 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 3 Aug 2023 08:02:37 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set In-Reply-To: References: Message-ID: On Fri, 26 May 2023 10:32:00 GMT, Tobias Hotz wrote: > This patch adds peephole rules to remove TEST instructions that operate on the result right after an AND, XOR or OR instruction. > This pattern can emerge if the result of the AND is compared against two values where one of the values is zero. The matcher cannot know that the preceding instruction has already set the flags register to the same value. > According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to the same values by TEST, AND, XOR and OR, so this should be safe. > By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: > Results on Intel Core i5-8250U CPU > Before this patch: > > Benchmark Mode Cnt Score Error Units > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op > TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ±
2.034 ns/op > TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > > After this patch: > > Benchmark Mode Cnt Score Error Units Change > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.074 ± 0.011... I don't see where the peephole rule is checking that the test instruction is testing the same register that was the destination of the earlier instruction that set the flags. Also, is there an example of an instruction annotated with the new flag information that does NOT set the required flags? If not, then I don't see why we need to track individual flags. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1663476312 From chagedorn at openjdk.org Thu Aug 3 08:07:26 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Aug 2023 08:07:26 GMT Subject: RFR: 8305636: Expand and clean up predicate classes and move them into separate files [v3] In-Reply-To: References: Message-ID: > This is the third clean-up PR towards fixing issues with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). This patch does not change anything in the way the old Assertion Predicates work. > > After collecting and moving the predicate code in the last clean-up PR https://github.com/openjdk/jdk/pull/14017 to the classes `Predicates/ParsePredicates`, I'm now completely moving the code to separate `predicates.cpp/hpp` files.
Since this description is also moved to the new files, I've committed the description update separately to better reflect these changes. > > Changes include: > - Moved `Predicates/ParsePredicates` classes to new files `predicates.cpp/hpp`. > - Turning the `Predicates` utility class into a real class to represent all predicates: > - Contains three `PredicateBlock` fields for each Predicate Block (see description of `Predicate Block`). > - The `PredicateBlock` class offers methods to query the presence of predicates and to access them (e.g. get the Parse Predicate projection). > - In the process, the `ParsePredicates` could be removed as the Parse Predicates are now covered by the `PredicateBlock` class. > - New `AssertionPredicatesWithHalt` class to skip over Assertion Predicates (will be further cleaned up later with the complete fix in JDK-8288981). > - Updated predicate description and moved to `predicates.hpp`. > - While testing a prototype fix of JDK-8288981, I've came to the conclusion that we should not move all Assertion Predicates to a separate block below the Parse and Hoisted Predicates, because it prevented further application of Loop Predication due to pins of data nodes to these Assertion Predicates while the Hoisted Predicates needed them above the Assertion Predicates (i.e. dominance problems leading to bad graph assertions). I've removed that part of the description that gave a heads-up about that change. > - Small clean-ups such as variable renaming or code move. > > Not included: > - Refactoring predicate traversal to clone/copy/initialize predicates for loop unswitching, pre/main/post, loop peeling etc. (this is only done in the actual fix in JDK-8288981 which requires some updates anyways - so this refactoring is not done here (yet)). > > Testing: Tier1-7, hs-precheckin-comp, hs-comp-stress > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge branch 'master' into JDK-8305636 - 8308682: Enhance AES performance Reviewed-by: adinn, sviswanathan, dlong, kvn - 8308682: Enhance AES performance - Update src/hotspot/share/opto/predicates.hpp Co-authored-by: Tobias Hartmann - Renaming Hoisted Predicate -> Hoisted Check Predicate in description and comments as discussed offline with Tobias, fixing additional typos in description - 8305636: Expand and clean up predicate classes and move them into separate files - Update description ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14814/files - new: https://git.openjdk.org/jdk/pull/14814/files/020cff3c..b8d7759d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14814&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14814&range=01-02 Stats: 90638 lines in 1161 files changed: 27823 ins; 59504 del; 3311 mod Patch: https://git.openjdk.org/jdk/pull/14814.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14814/head:pull/14814 PR: https://git.openjdk.org/jdk/pull/14814 From chagedorn at openjdk.org Thu Aug 3 08:21:57 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Aug 2023 08:21:57 GMT Subject: RFR: 8305636: Expand and clean up predicate classes and move them into separate files [v4] In-Reply-To: References: Message-ID: > This is the third clean-up PR towards fixing issues with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). This patch does not change anything in the way the old Assertion Predicates work. > > After collecting and moving the predicate code in the last clean-up PR https://github.com/openjdk/jdk/pull/14017 to the classes `Predicates/ParsePredicates`, I'm now completely moving the code to separate `predicates.cpp/hpp` files. 
By doing so, I also updated the predicate description and updated some names. Since this description is also moved to the new files, I've committed the description update separately to better reflect these changes. > > Changes include: > - Moved `Predicates/ParsePredicates` classes to new files `predicates.cpp/hpp`. > - Turning the `Predicates` utility class into a real class to represent all predicates: > - Contains three `PredicateBlock` fields, one for each Predicate Block (see description of `Predicate Block`). > - The `PredicateBlock` class offers methods to query the presence of predicates and to access them (e.g. get the Parse Predicate projection). > - In the process, the `ParsePredicates` could be removed as the Parse Predicates are now covered by the `PredicateBlock` class. > - New `AssertionPredicatesWithHalt` class to skip over Assertion Predicates (will be further cleaned up later with the complete fix in JDK-8288981). > - Updated predicate description and moved to `predicates.hpp`. > - While testing a prototype fix of JDK-8288981, I've come to the conclusion that we should not move all Assertion Predicates to a separate block below the Parse and Hoisted Predicates, because it prevented further application of Loop Predication due to pins of data nodes to these Assertion Predicates while the Hoisted Predicates needed them above the Assertion Predicates (i.e. dominance problems leading to bad graph assertions). I've removed that part of the description that gave a heads-up about that change. > - Small clean-ups such as variable renaming or code move. > > Not included: > - Refactoring predicate traversal to clone/copy/initialize predicates for loop unswitching, pre/main/post, loop peeling etc. (this is only done in the actual fix in JDK-8288981 which requires some updates anyway - so this refactoring is not done here (yet)).
> > Testing: Tier1-7, hs-precheckin-comp, hs-comp-stress > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into JDK-8305636 - Update src/hotspot/share/opto/predicates.hpp Co-authored-by: Tobias Hartmann - Renaming Hoisted Predicate -> Hoisted Check Predicate in description and comments as discussed offline with Tobias, fixing additional typos in description - 8305636: Expand and clean up predicate classes and move them into separate files - Update description ------------- Changes: https://git.openjdk.org/jdk/pull/14814/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14814&range=03 Stats: 1182 lines in 10 files changed: 566 ins; 470 del; 146 mod Patch: https://git.openjdk.org/jdk/pull/14814.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14814/head:pull/14814 PR: https://git.openjdk.org/jdk/pull/14814 From never at openjdk.org Thu Aug 3 08:34:50 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 3 Aug 2023 08:34:50 GMT Subject: RFR: 8313421: [JVMCI] avoid locking class loader in CompilerToVM.lookupType In-Reply-To: References: Message-ID: <9XH2Qt_pWvg2D4UXp0Yg0DlhgA_l8tiL97Xj7A1RSpU=.8f52c42a-841a-4557-9c9b-9aabfd5ffc12@github.com> On Wed, 2 Aug 2023 20:33:49 GMT, Doug Simon wrote: > This PR removes the need to lock the system class loader when converting Class instances for boot and platform classes to ResolvedJavaType objects. Not only is the system class loader a suboptimal loader for resolving these classes but locking it can cause deadlock in some JDK tests (e.g. `test/jdk/java/lang/System/LoggerFinder/`) when run with `-Xcomp`. For example, a thread that holds the system class loader lock and triggers a blocking compilation will deadlock with the compiler thread servicing the compilation if the compilation requires calling `CompilerToVM.lookupType` (which most compilations do). 
This looks like a nice cleanup. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15128#pullrequestreview-1559962501 From dnsimon at openjdk.org Thu Aug 3 08:34:49 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 3 Aug 2023 08:34:49 GMT Subject: RFR: 8313421: [JVMCI] avoid locking class loader in CompilerToVM.lookupType Message-ID: This PR removes the need to lock the system class loader when converting Class instances for boot and platform classes to ResolvedJavaType objects. Not only is the system class loader a suboptimal loader for resolving these classes but locking it can cause deadlock in some JDK tests (e.g. `test/jdk/java/lang/System/LoggerFinder/`) when run with `-Xcomp`. For example, a thread that holds the system class loader lock and triggers a blocking compilation will deadlock with the compiler thread servicing the compilation if the compilation requires calling `CompilerToVM.lookupType` (which most compilations do). ------------- Commit messages: - avoid locking class loader in CompilerToVM.lookupType (JDK-8313421) Changes: https://git.openjdk.org/jdk/pull/15128/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15128&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8313421 Stats: 93 lines in 8 files changed: 41 ins; 19 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/15128.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15128/head:pull/15128 PR: https://git.openjdk.org/jdk/pull/15128 From mbaesken at openjdk.org Thu Aug 3 08:50:41 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 3 Aug 2023 08:50:41 GMT Subject: RFR: JDK-8313632: ciEnv::dump_replay_data use fclose Message-ID: It seems we don't call fclose at the end of ciEnv::dump_replay_data.
This would be better done as documented in the fdopen example here: https://www.ibm.com/docs/en/i/7.3?topic=functions-fdopen-associates-stream-file-descriptor I also added close calls in case fdopen fails; should we use them too? ------------- Commit messages: - JDK-8313632 Changes: https://git.openjdk.org/jdk/pull/15135/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15135&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8313632 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15135.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15135/head:pull/15135 PR: https://git.openjdk.org/jdk/pull/15135 From thartmann at openjdk.org Thu Aug 3 09:15:31 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 3 Aug 2023 09:15:31 GMT Subject: RFR: JDK-8313632: ciEnv::dump_replay_data use fclose In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 08:43:03 GMT, Matthias Baesken wrote: > It seems we don't call fclose at the end of ciEnv::dump_replay_data. > This would be better done as documented in the fdopen example here: > https://www.ibm.com/docs/en/i/7.3?topic=functions-fdopen-associates-stream-file-descriptor > > I also added close calls in case fdopen fails; should we use them too? Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15135#pullrequestreview-1560605517 From shade at openjdk.org Thu Aug 3 09:48:30 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 3 Aug 2023 09:48:30 GMT Subject: RFR: 8313248: C2: setScopedValueCache intrinsic exposes nullptr pre-values to store barriers [v3] In-Reply-To: References: Message-ID: On Wed, 2 Aug 2023 08:00:12 GMT, Aleksey Shipilev wrote: >> See the bug for investigation breadcrumbs. The root cause for the failures seen with Shenandoah seems to be as follows.
>> >> The setter (`setScopedValueCache`) intrinsic passes a `val_type` of `_gvn.type(arr)`, which is `narrowoop: java/lang/Object *[int:32] (java/lang/Cloneable,java/io/Serializable):NotNull:exact *`, derived from `argument(0)`, and thus implies non-nullity. >> >> So when Shenandoah's SATB barrier loads the `pre_val`, it folds the null-check, assuming the `pre_val` is not null, due to `val_type`. This passes `nullptr` to SATB queues or the slowpath, and we crash in either queue filtering or barrier code that does not expect nullptrs on SATB paths. The getter (`scopedValueCache`) constructs the `objects_type` explicitly to imply the value can be null. I think we should do the same for the setter, since it can hide the "getter" from the SATB barrier inside of it. >> >> Arguably, it is a landmine that GC barriers assume the `val_type` is the type of both the stored value and the pre-value read from memory. So the non-null-ness derived for the stored value gets used to reason about non-null-ness of the pre-value. We can explore solutions to that generic problem after we plug this leak. Other `access_store_at` uses in C2 intrinsics seem to only operate on thread fields that are not null, so they are not susceptible to this problem. `scopedValueCache` is a notable exception: a lazily initialized thread OopHandle accessed from C2. >> >> I think G1 SATB barriers have the same problem, but I have not tried very hard to reproduce the failure there. (It would, AFAIU, require writing a test which does G1 concurrent marks, not just young GCs.)
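The failure mode described above can be reduced to a small, hypothetical Java model: a pre-write barrier that must filter out null pre-values before enqueuing them, plus a flag that stands in for the compiler wrongly deriving the pre-value's non-nullness from the stored value's type. `SatbPreBarrierSketch` is an illustrative name, not HotSpot code, and the `ArrayDeque` is only a stand-in for an SATB queue, chosen because it rejects null elements:

```java
import java.util.ArrayDeque;

// Hypothetical model of an SATB pre-write barrier; not HotSpot code.
final class SatbPreBarrierSketch {
    static final class Holder {
        Object field;
    }

    // Stand-in for an SATB queue: ArrayDeque.addLast(null) throws
    // NullPointerException, loosely mirroring code that cannot handle nulls.
    final ArrayDeque<Object> satbQueue = new ArrayDeque<>();

    /**
     * Stores newVal into h.field behind a pre-barrier. preValAssumedNonNull
     * models the compiler taking the non-null type of the stored value as
     * the type of the previous value as well.
     */
    void store(Holder h, Object newVal, boolean preValAssumedNonNull) {
        Object preVal = h.field; // load the previous value
        // The barrier must skip null pre-values; if the compiler "knows" the
        // pre-value is non-null, this check is folded away.
        if (preValAssumedNonNull || preVal != null) {
            satbQueue.addLast(preVal);
        }
        h.field = newVal;
    }
}
```

With `preValAssumedNonNull` set to `true`, storing into a field that still holds null throws `NullPointerException` at the enqueue. Giving the pre-value a nullable type, as the getter already does, keeps the check in place.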
>> >> Attn @theRealAph ;) >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, 10+ iterations of `java/lang/ScopedValue/StressStackOverflow.java` with Shenandoah >> - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` with Shenandoah >> - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` with G1 >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move the stars Thanks! Any other reviewers for this one? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15105#issuecomment-1663657449 From chagedorn at openjdk.org Thu Aug 3 10:25:32 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Aug 2023 10:25:32 GMT Subject: RFR: 8312909: C1 should not inline through interface calls with non-subtype receiver In-Reply-To: <9srO-AzEErczEhwmXkbjL_21BwV8uF_xoEG1t67x0OI=.37f04313-b6cd-437f-9c17-d8e1a654b155@github.com> References: <9srO-AzEErczEhwmXkbjL_21BwV8uF_xoEG1t67x0OI=.37f04313-b6cd-437f-9c17-d8e1a654b155@github.com> Message-ID: On Tue, 25 Jul 2023 12:47:00 GMT, Tobias Hartmann wrote: > This is a problem with C1 compiling an interface call with an invalid receiver (see `TestInvokeinterfaceWithBadReceiverHelper`): > ``` > ldc String "42"; > invokeinterface InterfaceMethod MyInterface.get:"()Ljava/lang/String;", 1; > > > `String` does not implement `MyInterface` but Class Hierarchy Analysis determined that there is only one implementor of MyInterface: > > class MyClass implements MyInterface { > @Stable > String field = "42"; > > public String get() { > return field; > } > } > > C1 emits a receiver subtype check (that will obviously fail at runtime and trigger an `IncompatibleClassChangeError`) and proceeds with inlining the `MyClass::get` method on the `String` receiver. 
It then tries to fold a stable field load by loading its value at compile time, which asserts/fails because the `String` receiver does not have such a field. The fix is to bail out from inlining when we can statically determine that the receiver subtype check will always fail at runtime. > > Thanks, > Tobias Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15018#pullrequestreview-1560774657 From shade at openjdk.org Thu Aug 3 10:51:39 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 3 Aug 2023 10:51:39 GMT Subject: RFR: 8313676: Amend TestLoadIndexedMismatch test to target intrinsic directly Message-ID: See the bug for the reasons. Basically, we want to target the intrinsic directly, to avoid the dependence on the JDK code shape. Additional testing: - [x] mainline: test is still sensitive to JDK-8313402 fix - [x] 17u: test is _now_ sensitive to JDK-8313402 fix ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/15136/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15136&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8313676 Stats: 7 lines in 2 files changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15136.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15136/head:pull/15136 PR: https://git.openjdk.org/jdk/pull/15136 From rkennke at openjdk.org Thu Aug 3 10:57:32 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 3 Aug 2023 10:57:32 GMT Subject: RFR: 8313248: C2: setScopedValueCache intrinsic exposes nullptr pre-values to store barriers [v3] In-Reply-To: References: Message-ID: On Wed, 2 Aug 2023 08:00:12 GMT, Aleksey Shipilev wrote: >> See the bug for investigation breadcrumbs. The root cause for failures seen with Shenandoah seems to be as follows. 
>> >> The setter (`setScopedValueCache`) intrinsic passes `val_type` of `_gvn.type(arr)`, which is `narrowoop: java/lang/Object *[int:32] (java/lang/Cloneable,java/io/Serializable):NotNull:exact *`, derived from the `argument(0)`, and thus implies non-nullity. >> >> So when Shenandoah's SATB barrier loads the `pre_val`, it folds the null-check, assuming the `pre_val` is not null, due to `val_type`. This passes `nullptr` to SATB queues or slowpath, and we crash in either queue filtering or barrier code that does not expect nullptrs on SATB paths. The getter (`scopedValueCache`) constructs the `objects_type` explicitly to imply the value can be null. I think we should do the same for the setter, since it can hide the "getter" from SATB barrier inside of it. >> >> Arguably, it is a landmine that GC barriers assume the `val_type` is the type of both stored value and the pre-value read from memory. So the non-null-ness derived for the stored value gets used to reason about the non-null-ness of the pre-value. We can explore the solutions to that generic problem after we plug this leak. Other `access_store_at` uses in C2 intrinsics seem to only operate on thread fields that are not null, so they are not susceptible to this problem. `scopedValueCache` is a notable exception of lazily initialized thread OopHandle accessed from C2. >> >> I think G1 SATB barriers have the same problem, but I have not tried to reproduce the failure very hard there. (It would, AFAIU, require writing a test that does G1 concurrent marks, not just young GCs.) 
>> >> Attn @theRealAph ;) >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, 10+ iterations of `java/lang/ScopedValue/StressStackOverflow.java` with Shenandoah >> - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` with Shenandoah >> - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` with G1 >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move the stars Looks good. Thank you! ------------- Marked as reviewed by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15105#pullrequestreview-1560830390 From thartmann at openjdk.org Thu Aug 3 11:05:42 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 3 Aug 2023 11:05:42 GMT Subject: RFR: 8312909: C1 should not inline through interface calls with non-subtype receiver In-Reply-To: <9srO-AzEErczEhwmXkbjL_21BwV8uF_xoEG1t67x0OI=.37f04313-b6cd-437f-9c17-d8e1a654b155@github.com> References: <9srO-AzEErczEhwmXkbjL_21BwV8uF_xoEG1t67x0OI=.37f04313-b6cd-437f-9c17-d8e1a654b155@github.com> Message-ID: On Tue, 25 Jul 2023 12:47:00 GMT, Tobias Hartmann wrote: > This is a problem with C1 compiling an interface call with an invalid receiver (see `TestInvokeinterfaceWithBadReceiverHelper`): > ``` > ldc String "42"; > invokeinterface InterfaceMethod MyInterface.get:"()Ljava/lang/String;", 1; > > > `String` does not implement `MyInterface` but Class Hierarchy Analysis determined that there is only one implementor of MyInterface: > > class MyClass implements MyInterface { > @Stable > String field = "42"; > > public String get() { > return field; > } > } > > C1 emits a receiver subtype check (that will obviously fail at runtime and trigger an `IncompatibleClassChangeError`) and proceeds with inlining the `MyClass::get` method on the `String` receiver. 
It then tries to fold a stable field load by loading its value at compile time, which asserts/fails because the `String` receiver does not have such a field. The fix is to bail out from inlining when we can statically determine that the receiver subtype check will always fail at runtime. > > Thanks, > Tobias Thanks for the review, Christian! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15018#issuecomment-1663777353 From thartmann at openjdk.org Thu Aug 3 11:05:44 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 3 Aug 2023 11:05:44 GMT Subject: Integrated: 8312909: C1 should not inline through interface calls with non-subtype receiver In-Reply-To: <9srO-AzEErczEhwmXkbjL_21BwV8uF_xoEG1t67x0OI=.37f04313-b6cd-437f-9c17-d8e1a654b155@github.com> References: <9srO-AzEErczEhwmXkbjL_21BwV8uF_xoEG1t67x0OI=.37f04313-b6cd-437f-9c17-d8e1a654b155@github.com> Message-ID: On Tue, 25 Jul 2023 12:47:00 GMT, Tobias Hartmann wrote: > This is a problem with C1 compiling an interface call with an invalid receiver (see `TestInvokeinterfaceWithBadReceiverHelper`): > ``` > ldc String "42"; > invokeinterface InterfaceMethod MyInterface.get:"()Ljava/lang/String;", 1; > > > `String` does not implement `MyInterface` but Class Hierarchy Analysis determined that there is only one implementor of MyInterface: > > class MyClass implements MyInterface { > @Stable > String field = "42"; > > public String get() { > return field; > } > } > > C1 emits a receiver subtype check (that will obviously fail at runtime and trigger an `IncompatibleClassChangeError`) and proceeds with inlining the `MyClass::get` method on the `String` receiver. It then tries to fold a stable field load by loading its value at compile time, which asserts/fails because the `String` receiver does not have such a field. The fix is to bail out from inlining when we can statically determine that the receiver subtype check will always fail at runtime. 
> > Thanks, > Tobias This pull request has now been integrated. Changeset: ab1c212a Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/ab1c212ac1097ae6e1122ef1aba47ca51eca11f2 Stats: 109 lines in 3 files changed: 105 ins; 0 del; 4 mod 8312909: C1 should not inline through interface calls with non-subtype receiver Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/15018 From lucy at openjdk.org Thu Aug 3 11:34:31 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 3 Aug 2023 11:34:31 GMT Subject: RFR: JDK-8313632: ciEnv::dump_replay_data use fclose In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 08:43:03 GMT, Matthias Baesken wrote: > It seems we are missing a call to fclose at the end of ciEnv::dump_replay_data. > This should be done as documented in the fdopen example here: > https://www.ibm.com/docs/en/i/7.3?topic=functions-fdopen-associates-stream-file-descriptor > > I also added close calls in case fdopen fails; should we use them too? Looks good to me. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15135#pullrequestreview-1560888969 From thartmann at openjdk.org Thu Aug 3 11:51:32 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 3 Aug 2023 11:51:32 GMT Subject: RFR: 8313676: Amend TestLoadIndexedMismatch test to target intrinsic directly In-Reply-To: References: Message-ID: <01p5iFJcTz_B4gamxmk43IVLcynkttMDFXAxqEdEme0=.58a40c42-2c9c-4c00-a643-9fb96254c5a1@github.com> On Thu, 3 Aug 2023 10:43:34 GMT, Aleksey Shipilev wrote: > See the bug for the reasons. Basically, we want to target the intrinsic directly, to avoid the dependence on the JDK code shape. > > Additional testing: > - [x] mainline: test is still sensitive to JDK-8313402 fix > - [x] 17u: test is _now_ sensitive to JDK-8313402 fix Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15136#pullrequestreview-1560913736 From mbaesken at openjdk.org Thu Aug 3 11:54:29 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 3 Aug 2023 11:54:29 GMT Subject: RFR: JDK-8313632: ciEnv::dump_replay_data use fclose In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 08:43:03 GMT, Matthias Baesken wrote: > It seems we are missing a call to fclose at the end of ciEnv::dump_replay_data. > This should be done as documented in the fdopen example here: > https://www.ibm.com/docs/en/i/7.3?topic=functions-fdopen-associates-stream-file-descriptor > > I also added close calls in case fdopen fails; should we use them too? Hi Tobias and Lutz, thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15135#issuecomment-1663844507 From mbaesken at openjdk.org Thu Aug 3 12:05:38 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 3 Aug 2023 12:05:38 GMT Subject: RFR: JDK-8313632: ciEnv::dump_replay_data use fclose In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 08:43:03 GMT, Matthias Baesken wrote: > It seems we are missing a call to fclose at the end of ciEnv::dump_replay_data. > This should be done as documented in the fdopen example here: > https://www.ibm.com/docs/en/i/7.3?topic=functions-fdopen-associates-stream-file-descriptor > > I also added close calls in case fdopen fails; should we use them too? GHA errors are unrelated; GHA seems to be partly broken currently. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15135#issuecomment-1663856772 From mbaesken at openjdk.org Thu Aug 3 12:05:38 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 3 Aug 2023 12:05:38 GMT Subject: Integrated: JDK-8313632: ciEnv::dump_replay_data use fclose In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 08:43:03 GMT, Matthias Baesken wrote: > It seems we are missing a call to fclose at the end of ciEnv::dump_replay_data.
> This should be done as documented in the fdopen example here: > https://www.ibm.com/docs/en/i/7.3?topic=functions-fdopen-associates-stream-file-descriptor > > I also added close calls in case fdopen fails; should we use them too? This pull request has now been integrated. Changeset: 0f2fce71 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/0f2fce71680355412896b2cb2d96cc85f69324e7 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod 8313632: ciEnv::dump_replay_data use fclose Reviewed-by: thartmann, lucy ------------- PR: https://git.openjdk.org/jdk/pull/15135 From rehn at openjdk.org Thu Aug 3 12:55:41 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 3 Aug 2023 12:55:41 GMT Subject: RFR: 8295795: hsdis does not build with binutils 2.39+ Message-ID: Hi, please consider. This works with 2.30, 2.34, 2.38, 2.39, 2.40, 2.41 and current master head. (tested on x64 and some RV) There are 4 changes in binutils we work around. - zstd compressed debug sections - libsframe added - init_disassemble_info() change - libbfd.a is only present in .lib directory in newer binutils builds (in older builds it is in both directories) (I think the issue is that we never run make install, and thus depend on internal artifact placement) Specific to RV, there is a bug in binutils causing the standard extensions not to be added to the disassembler if we pass in NULL. This is nowhere near perfect, but at least we can build hsdis with any contemporary binutils. To do better, I think we need to build and install binutils to check the version and then use that version to figure out what options to use when re-building and re-installing binutils for hsdis. I asked tool-chain people about our issues; they said you can't do that, i.e. have source dependencies on many binutils versions. As RV is new and has new instructions added to it frequently, we really need to be able to build with bleeding-edge binutils. 
(capstone RV is not actively worked on, and llvm has many more dependencies) ------------- Commit messages: - Binutils fix Changes: https://git.openjdk.org/jdk/pull/15138/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15138&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295795 Stats: 42 lines in 2 files changed: 34 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/15138.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15138/head:pull/15138 PR: https://git.openjdk.org/jdk/pull/15138 From rehn at openjdk.org Thu Aug 3 15:41:31 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 3 Aug 2023 15:41:31 GMT Subject: RFR: 8295795: hsdis does not build with binutils 2.39+ In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 12:48:50 GMT, Robbin Ehn wrote: > Hi, please consider. > > This works with 2.30, 2.34, 2.38, 2.39, 2.40, 2.41 and current master head. (tested on x64 and some RV) > > There are 4 changes in binutils we work around. > - zstd compressed debug sections > - libsframe added > - init_disassemble_info() change > - libbfd.a is only present in .lib directory in newer binutils builds (in older builds it is in both directories) (I think the issue is that we never run make install, and thus depend on internal artifact placement) > > Specific to RV, there is a bug in binutils causing the standard extensions not to be added to the disassembler if we pass in NULL. > > This is nowhere near perfect, but at least we can build hsdis with any contemporary binutils. > > To do better, I think we need to build and install binutils to check the version and then use that version to figure out what options to use when re-building and re-installing binutils for hsdis. > > I asked tool-chain people about our issues; they said you can't do that, i.e. have source dependencies on many binutils versions. > > As RV is new and has new instructions added to it frequently, we really need to be able to build with bleeding-edge binutils. 
(capstone RV is not actively worked on, and llvm has many more dependencies) I noticed that this does not work if you use a prebuilt (make install) layout. I need to fix more. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15138#issuecomment-1664213288 From epeter at openjdk.org Thu Aug 3 16:52:32 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Aug 2023 16:52:32 GMT Subject: RFR: 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE In-Reply-To: References: <-McZdVKFHZcQcCJhosf7KVw34o6ZcAHr0hqGH7QIqsw=.eafa3a9e-f521-4097-aa6c-b00d5302b63d@github.com> Message-ID: On Mon, 31 Jul 2023 07:54:16 GMT, Pengfei Li wrote: >> Emanuel is on vacation until Aug 9. Can this wait or should we problem list? > >> Emanuel is on vacation until Aug 9. Can this wait or should we problem list? > > Thanks for the info. It's ok for us to wait, since very few people are using 512-bit SVE today. @pfustc @TobiHartmann I just saw this in my emails, so I'll give a quick response: We had this running on Aarch64 machines with `asimd` but without `sve`. Why do you think that this even passed with my 32 byte assumption (256 bit)? You say it should only have 128 bit. What is the `max_pre` for? Is it necessary? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15010#issuecomment-1664315717 From rehn at openjdk.org Thu Aug 3 17:35:07 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 3 Aug 2023 17:35:07 GMT Subject: RFR: 8295795: hsdis does not build with binutils 2.39+ [v2] In-Reply-To: References: Message-ID: > Hi, please consider. > > This works with 2.30, 2.34, 2.38, 2.39, 2.40, 2.41 and current master head. (tested on x64 and some RV) > > There are 4 changes in binutils we work around. 
> - zstd compressed debug sections > - libsframe added > - init_disassemble_info() change > - libbfd.a is only present in .lib directory in newer binutils builds (in older builds it is in both directories) (I think the issue is that we never run make install, and thus depend on internal artifact placement) > > Specific to RV, there is a bug in binutils causing the standard extensions not to be added to the disassembler if we pass in NULL. > > This is nowhere near perfect, but at least we can build hsdis with any contemporary binutils. > > To do better, I think we need to build and install binutils to check the version and then use that version to figure out what options to use when re-building and re-installing binutils for hsdis. > > I asked tool-chain people about our issues; they said you can't do that, i.e. have source dependencies on many binutils versions. > > As RV is new and has new instructions added to it frequently, we really need to be able to build with bleeding-edge binutils. (capstone RV is not actively worked on, and llvm has many more dependencies) Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Added parameter name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15138/files - new: https://git.openjdk.org/jdk/pull/15138/files/2d23bab6..a8ce2d37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15138&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15138&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15138.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15138/head:pull/15138 PR: https://git.openjdk.org/jdk/pull/15138 From thartmann at openjdk.org Thu Aug 3 18:03:46 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 3 Aug 2023 18:03:46 GMT Subject: RFR: 8313712: [BACKOUT] 8313632: ciEnv::dump_replay_data use fclose Message-ID: Clean backout of 
[JDK-8313712](https://bugs.openjdk.org/browse/JDK-8313712). Thanks, Tobias ------------- Commit messages: - 8313712: [BACKOUT] 8313632: ciEnv::dump_replay_data use fclose Changes: https://git.openjdk.org/jdk/pull/15144/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15144&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8313712 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15144.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15144/head:pull/15144 PR: https://git.openjdk.org/jdk/pull/15144 From mikael at openjdk.org Thu Aug 3 18:07:36 2023 From: mikael at openjdk.org (Mikael Vidstedt) Date: Thu, 3 Aug 2023 18:07:36 GMT Subject: RFR: 8313712: [BACKOUT] 8313632: ciEnv::dump_replay_data use fclose In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 17:56:43 GMT, Tobias Hartmann wrote: > Clean backout of [JDK-8313712](https://bugs.openjdk.org/browse/JDK-8313712). > > Thanks, > Tobias Marked as reviewed by mikael (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15144#pullrequestreview-1561611390 From thartmann at openjdk.org Thu Aug 3 18:07:37 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 3 Aug 2023 18:07:37 GMT Subject: RFR: 8313712: [BACKOUT] 8313632: ciEnv::dump_replay_data use fclose In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 17:56:43 GMT, Tobias Hartmann wrote: > Clean backout of [JDK-8313712](https://bugs.openjdk.org/browse/JDK-8313712). > > Thanks, > Tobias Thanks for the quick review, Mikael. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15144#issuecomment-1664410730 From thartmann at openjdk.org Thu Aug 3 18:11:38 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 3 Aug 2023 18:11:38 GMT Subject: Integrated: 8313712: [BACKOUT] 8313632: ciEnv::dump_replay_data use fclose In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 17:56:43 GMT, Tobias Hartmann wrote: > Clean backout of [JDK-8313712](https://bugs.openjdk.org/browse/JDK-8313712). > > Thanks, > Tobias This pull request has now been integrated. Changeset: 45771479 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/4577147993c2f87e6ba298a664acad5decc968f0 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod 8313712: [BACKOUT] 8313632: ciEnv::dump_replay_data use fclose Reviewed-by: mikael ------------- PR: https://git.openjdk.org/jdk/pull/15144 From kvn at openjdk.org Thu Aug 3 19:29:30 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 3 Aug 2023 19:29:30 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v2] In-Reply-To: References: Message-ID: On Wed, 2 Aug 2023 08:51:19 GMT, Pengfei Li wrote: >> This patch removes `@requires vm.flagless` annotations from HotSpot jtreg tests in `compiler/vectorization/runner`. All jtreg cases in this folder are invoked by the test driver `VectorizationTestRunner.java`, which checks both correctness and vectorizability (IR) for each test method. We added the flagless requirement before because extra flags may mess with the compiler control in the test driver for the correctness check. But `flagless` has the side effect that tests run with extra flags are skipped. So we propose to get rid of it now. >> >> To adapt to the removal of `@requires vm.flagless`, a few checks are added in the test driver to skip the correctness check if extra flags prevent the compiler control from working. 
This patch also moves the previously hard-coded flag `-XX:-OptimizeFill` in the test driver to conditions in IR checks. >> >> Tested various compiler-control-related VM flags on x86 and AArch64. > > Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: > > Re-work correctness check to allow "-Xbatch" Thank you for addressing the test issue. About your change to allow -Xbatch, let me clarify: if you exclude `-Xcomp` mode (which I agree with) by checking that the `UseInterpreter` flag is `true`, then a method can always be executed in the Interpreter to get the reference result (even with -XX:CompileThreshold=100) by calling the method once first (we do that in other tests). You don't need to limit the test to `Tiered` mode when the Interpreter is available, and the original test code should work. >For -Xbatch we need to check BackgroundCompilation. As we lock the compilation before running the test method in the interpreter to get the reference result, the VM will hang if both background compilation is disabled and the compilation is locked by WhiteBox. You don't need to call `WB.lockCompilation()` if you exclude `-Xcomp` mode. There will be no compilation requests for the called method when you call it the first time, because the compilation threshold will not be reached; it is guaranteed that the method will be executed in the Interpreter. And you have the assert to verify that. Note, this works with and without BackgroundCompilation enabled. If you call a method once, it will be executed in the Interpreter if `-Xcomp` is excluded. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15011#issuecomment-1664513886 From aph at openjdk.org Thu Aug 3 22:08:32 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 3 Aug 2023 22:08:32 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v13] In-Reply-To: <2OFyLMeiiFJdZDD-BKUcaW4lfaeis9VG4jALBQlYbOc=.f3ccd800-05e4-4dec-8980-ddf3392e1cc9@github.com> References: <2OFyLMeiiFJdZDD-BKUcaW4lfaeis9VG4jALBQlYbOc=.f3ccd800-05e4-4dec-8980-ddf3392e1cc9@github.com> Message-ID: On Tue, 1 Aug 2023 17:38:06 GMT, Srinivas Vamsi Parasa wrote: >> What is the reasoning behind this new public API? It doesn't follow the usual Java convention, which is to have overloads for each type. And it doesn't seem to provide anything not already provided by `Arrays.sort()`. > > Hi Andrew, the reason for the public API is to make AVX512 sort available to other data structures like MemorySegment (including the ones backed by native heap). The API of the arraySort() AVX512 intrinsic is similar to the public API of ArraysSupport.vectorizedMismatch() which is used by MemorySegment.mismatch(). There's no need to make this method public, and it should not be. There is no need to have it in the Java API. `ArraysSupport.vectorizedMismatch()` is in an internal JDK class, not part of the Java API. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1283764321 From pli at openjdk.org Fri Aug 4 03:21:34 2023 From: pli at openjdk.org (Pengfei Li) Date: Fri, 4 Aug 2023 03:21:34 GMT Subject: RFR: 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE In-Reply-To: References: <-McZdVKFHZcQcCJhosf7KVw34o6ZcAHr0hqGH7QIqsw=.eafa3a9e-f521-4097-aa6c-b00d5302b63d@github.com> Message-ID: On Thu, 3 Aug 2023 16:50:02 GMT, Emanuel Peter wrote: > We had this running on Aarch64 machines with asimd but without sve. 
Why do you think that this even passed with my 32 byte assumption (256 bit)? You say it should only have 128 bit. Assuming NEON has a larger vector size (256 bit, which is wrong) won't result in any failure on NEON-only machines. But it results in running fewer IR checks on 256-bit SVE. Let's take the IR condition change below as an example.
- applyIfAnd = {"AlignVector", "false", "MaxVectorSize", ">= 8", "MaxVectorSize", "<= 16"},
+ applyIfAnd = {"AlignVector", "false", "MaxVectorSize", ">= 8"},
Before this patch, the existence of vector IRs won't be checked on 256-bit SVE as we have `MaxVectorSize <= 16`. After this patch, it will be checked. The main reason for the failures on 512-bit SVE is the lack of an `sve == false` check, so the IR tests will run on machines with vector length > 256 bits. > What is the max_pre for? Is it necessary? It just adds a prefix to make the comment more precise, as SVE uses scalable vectors and the vector length ranges from 128 bits to 2048 bits. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15010#issuecomment-1664914590 From pli at openjdk.org Fri Aug 4 06:34:47 2023 From: pli at openjdk.org (Pengfei Li) Date: Fri, 4 Aug 2023 06:34:47 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v2] In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 19:27:11 GMT, Vladimir Kozlov wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Re-work correctness check to allow "-Xbatch" > > Thank you for addressing the test issue. > > About your change to allow -Xbatch, let me clarify: if you exclude `-Xcomp` mode (which I agree with) by checking that the `UseInterpreter` flag is `true`, then a method can always be executed in the Interpreter to get the reference result (even with -XX:CompileThreshold=100) by calling the method once first (we do that in other tests). 
> You don't need to limit the test to `Tiered` mode when the Interpreter is available, and the original test code should work. > >>For -Xbatch we need to check BackgroundCompilation. As we lock the compilation before running the test method in the interpreter to get the reference result, the VM will hang if both background compilation is disabled and the compilation is locked by WhiteBox. > > You don't need to call `WB.lockCompilation()` if you exclude `-Xcomp` mode. There will be no compilation requests for the called method when you call it the first time, because the compilation threshold will not be reached; it is guaranteed that the method will be executed in the Interpreter. And you have the assert to verify that. > > Note, this works with and without BackgroundCompilation enabled. If you call a method once, it will be executed in the Interpreter if `-Xcomp` is excluded. Hi @vnkozlov, thanks for your reply. But there are still problems. > About your change to allow -Xbatch, let me clarify: if you exclude -Xcomp mode (which I agree with) by checking that the UseInterpreter flag is true, then a method can always be executed in the Interpreter to get the reference result (even with -XX:CompileThreshold=100) by calling the method once first (we do that in other tests). > You don't need to call WB.lockCompilation() if you exclude -Xcomp mode. There will be no compilation requests for the called method when you call it the first time, because the compilation threshold will not be reached; it is guaranteed that the method will be executed in the Interpreter. And you have the assert to verify that. These tests are a bit different because we test loops. If the loop iteration count reaches some threshold, the loop will be *OSR compiled* even if the test method is called only once. I just did an experiment according to your suggestion. After removing `WB.lockCompilation()` and updating the loop iteration count to 100,000, I got an assertion failure showing that the test method is NOT running in the interpreter. 
STDERR:
java.lang.AssertionError
    at compiler.vectorization.runner.VectorizationTestRunner.runTestOnMethod(VectorizationTestRunner.java:131)
    at compiler.vectorization.runner.VectorizationTestRunner.run(VectorizationTestRunner.java:73)
    at compiler.vectorization.runner.VectorizationTestRunner.main(VectorizationTestRunner.java:215)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138)
    at java.base/java.lang.Thread.run(Thread.java:1570)
A solution to this may be to add one more check that `CICompileOSR` is OFF, if we still want to use interpreted execution for the reference result. Now the question is: which verification approach do you think is better, "C2 vs. interpreted" or "C2 vs. C1"? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15011#issuecomment-1665082893 From lmesnik at openjdk.org Fri Aug 4 06:44:05 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 4 Aug 2023 06:44:05 GMT Subject: RFR: 8307462: [REDO] VmObjectAlloc is not generated by intrinsics methods which allocate objects [v2] In-Reply-To: References: Message-ID: > The fix adds posting of VmObjectAlloc events for Unsafe.allocateInstance(Class cls). The previous attempt to post the event directly from 'LibraryCallKit::inline_unsafe_allocate()' caused a performance regression even when the JVMTI event was not enabled. Some optimizations were disabled just because of the possible usage and escape of the newly allocated object. > So event posting is done by returning to the interpreter if events are enabled. > > I verified that the performance (run locally only) of > org.renaissance.jdk.streams.JmhScrabble.runOperation > doesn't change if events are not enabled. > > There might be other intrinsics like 'LibraryCallKit::inline_unsafe_newArray()' where the VM allocates memory. 
> I'm going to file a separate issue to find and fix them. > > Many thanks to Tobias H. for the proposed solution. > > Testing with all tiers. Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: combined jvmti code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15110/files - new: https://git.openjdk.org/jdk/pull/15110/files/07499679..5095114f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15110&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15110&range=00-01 Stats: 38 lines in 1 file changed: 17 ins; 21 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15110.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15110/head:pull/15110 PR: https://git.openjdk.org/jdk/pull/15110 From dholmes at openjdk.org Fri Aug 4 06:54:38 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 4 Aug 2023 06:54:38 GMT Subject: RFR: JDK-8313632: ciEnv::dump_replay_data use fclose In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 08:43:03 GMT, Matthias Baesken wrote: > It seems we are missing a call to fclose at the end of ciEnv::dump_replay_data. > This should be done as documented in the fdopen example here: > https://www.ibm.com/docs/en/i/7.3?topic=functions-fdopen-associates-stream-file-descriptor > > I also added close calls in case fdopen fails; should we use them too? src/hotspot/share/ci/ciEnv.cpp line 1711: > 1709: dump_replay_data(&replay_data_stream); > 1710: tty->print_cr("# Compiler replay data is saved as: %s", buffer); > 1711: fclose(replay_data_file); Why are you doing this when the fileStream will close it in the destructor? 
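The ownership question raised here can be illustrated with a minimal sketch. It assumes a hypothetical `OwningFileStream` class standing in for a stream that closes its `FILE*` in the destructor (it is not HotSpot's `fileStream`), plus the POSIX calls `open`, `fdopen` and `close`; `write_replay` is likewise a made-up name. It shows the two rules at stake: if `fdopen` fails, the descriptor is still open and must be released with `close`; once `fdopen` succeeds, the `FILE*` owns the descriptor, so the single `fclose` in the destructor releases both, and an additional explicit `fclose` on the same stream would be a double close.

```cpp
#include <cassert>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical stand-in for a stream that may own a FILE* (not HotSpot's fileStream).
class OwningFileStream {
 public:
  OwningFileStream(std::FILE* file, bool need_close)
      : file_(file), need_close_(need_close) {
    assert(file != nullptr);  // caller must pass an open stream
  }
  ~OwningFileStream() {
    if (need_close_) {
      std::fclose(file_);  // flushes and closes the FILE* and its descriptor
    }
  }
  OwningFileStream(const OwningFileStream&) = delete;
  OwningFileStream& operator=(const OwningFileStream&) = delete;

  bool print(const char* s) { return std::fputs(s, file_) != EOF; }

 private:
  std::FILE* file_;
  bool need_close_;
};

// Hypothetical helper: writes text to path via a descriptor wrapped by fdopen().
bool write_replay(const char* path, const char* text) {
  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if (fd < 0) {
    return false;
  }
  std::FILE* f = fdopen(fd, "w");
  if (f == nullptr) {
    close(fd);  // fdopen failed, so the descriptor is still ours to close
    return false;
  }
  OwningFileStream out(f, /*need_close=*/true);
  bool ok = out.print(text);
  // No explicit fclose(f) here: ~OwningFileStream closes it when `out` goes
  // out of scope, and closing the same FILE* twice is undefined behavior.
  return ok;
}
```

With this shape there is exactly one owner of the stream on every path, so each path performs exactly one close: `close(fd)` on the failure path and the destructor's `fclose` otherwise.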
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15135#discussion_r1284048435 From dholmes at openjdk.org Fri Aug 4 06:54:39 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 4 Aug 2023 06:54:39 GMT Subject: RFR: JDK-8313632: ciEnv::dump_replay_data use fclose In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 06:49:41 GMT, David Holmes wrote: >> It seems we do not call fclose at the end of ciEnv::dump_replay_data. >> This should be done as documented in the fdopen example here: >> https://www.ibm.com/docs/en/i/7.3?topic=functions-fdopen-associates-stream-file-descriptor >> >> I also added close calls in case fdopen fails; should we use them too? > > src/hotspot/share/ci/ciEnv.cpp line 1711: > >> 1709: dump_replay_data(&replay_data_stream); >> 1710: tty->print_cr("# Compiler replay data is saved as: %s", buffer); >> 1711: fclose(replay_data_file); > > Why are you doing this when the fileStream will close it in the destructor? Never mind, I see the follow-up issue to fix this again. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15135#discussion_r1284049850 From duke at openjdk.org Fri Aug 4 09:39:12 2023 From: duke at openjdk.org (Tobias Hotz) Date: Fri, 4 Aug 2023 09:39:12 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v2] In-Reply-To: References: Message-ID: > This patch adds peephole rules to remove TEST instructions that operate on the result of, and immediately follow, an AND, XOR or OR instruction. > This pattern can emerge if the result of the AND is compared against two values, one of which is zero. The matcher has no way of knowing that the preceding instruction already set the flags register to the same values.
> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test, TEST, AND, XOR and OR set the flags to the same values, so this should be safe. > By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: > Results on Intel Core i5-8250U CPU > Before this patch: > > Benchmark Mode Cnt Score Error Units > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op > TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op > TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > > After this patch: > > Benchmark Mode Cnt Score Error Units Change > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.074 ± 0.011...
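For illustration, here is a hypothetical Java method (not taken from the PR's `TestRemovalPeephole` benchmark) with the shape described above: the AND result is checked against zero and also compared with another value. The assumption, not verified against actual C2 output, is that because `r` has two uses C2 cannot fold the zero check into a single `test a, mask`, so it emits a separate AND followed by a TEST of its result, which is the redundant instruction the peephole targets.

```java
/**
 * Hypothetical example: the AND result has two uses, a compare against zero
 * and a compare against another value. Instruction comments are assumptions
 * about C2 codegen, not observed output.
 */
final class AndPatternExample {

    static int classify(int a, int mask) {
        int r = a & mask;   // assumed: materialized as a separate AND because r has two uses
        if (r == 0) {       // assumed: emitted as TEST r, r, repeating flags the AND already set
            return 0;
        }
        if (r == mask) {    // second comparison: CMP r, mask
            return 2;
        }
        return 1;
    }

    public static void main(String[] args) {
        System.out.println(classify(0b1010, 0b0101)); // 0: no mask bits set
        System.out.println(classify(0b1111, 0b0101)); // 2: all mask bits set
        System.out.println(classify(0b0100, 0b0101)); // 1: some mask bits set
    }
}
```

To look at the generated code, the method would need to be called in a hot loop so that C2 compiles it, and then inspected with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly` (which requires the hsdis disassembler).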
Tobias Hotz has updated the pull request incrementally with two additional commits since the last revision: - Add IR test Currently, the peephole only works for branches, not conditional moves. - Add assert to verify that the machProj and the test operate on the same register. Also fix compilation on macos ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14172/files - new: https://git.openjdk.org/jdk/pull/14172/files/c434ade8..71737a77 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14172&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14172&range=00-01 Stats: 157 lines in 5 files changed: 155 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14172.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14172/head:pull/14172 PR: https://git.openjdk.org/jdk/pull/14172 From duke at openjdk.org Fri Aug 4 09:53:57 2023 From: duke at openjdk.org (Tobias Hotz) Date: Fri, 4 Aug 2023 09:53:57 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v3] In-Reply-To: References: Message-ID: > This patch adds peephole rules to remove TEST instructions that operate on the result of, and immediately follow, an AND, XOR or OR instruction. > This pattern can emerge if the result of the AND is compared against two values, one of which is zero. The matcher has no way of knowing that the preceding instruction already set the flags register to the same values. > According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test, TEST, AND, XOR and OR set the flags to the same values, so this should be safe.
> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: > Results on Intel Core i5-8250U CPU > Before this patch: > > Benchmark Mode Cnt Score Error Units > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op > TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op > TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > > After this patch: > > Benchmark Mode Cnt Score Error Units Change > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.074 ± 0.011... Tobias Hotz has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 19 additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into testPeephole - Use LF instead of CRLF - Add IR test Currently, the peephole only works for branches, not conditional moves. - Add assert to verify that the machProj and the test operate on the same register. Also fix compilation on macos - Use a new approach by telling the peephole which rules set and clear which flags By using this approach, the peephole rule is much more general and can cover more cases. This also means we can remove test instructions after add instructions if only specific flags are required. - Merge remote-tracking branch 'upstream/master' into testPeephole - Remove the old peepreplace empty block - we didn't use them - Add more benchmark cases - Add new benchmarks Also fix an error in the xor long peep definition - Merge remote-tracking branch 'upstream/master' into testPeephole - ... and 9 more: https://git.openjdk.org/jdk/compare/aad05427...18c6f790 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14172/files - new: https://git.openjdk.org/jdk/pull/14172/files/71737a77..18c6f790 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14172&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14172&range=01-02 Stats: 48273 lines in 1039 files changed: 27104 ins; 15761 del; 5408 mod Patch: https://git.openjdk.org/jdk/pull/14172.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14172/head:pull/14172 PR: https://git.openjdk.org/jdk/pull/14172 From duke at openjdk.org Fri Aug 4 09:53:57 2023 From: duke at openjdk.org (Tobias Hotz) Date: Fri, 4 Aug 2023 09:53:57 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v2] In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 09:39:12 GMT, Tobias Hotz wrote: >> This patch adds peephole rules to remove TEST instructions that operate on the result of, and immediately follow, an AND, XOR or OR instruction.
>> This pattern can emerge if the result of the AND is compared against two values, one of which is zero. The matcher has no way of knowing that the preceding instruction already set the flags register to the same values. >> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test, TEST, AND, XOR and OR set the flags to the same values, so this should be safe. >> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: >> Results on Intel Core i5-8250U CPU >> Before this patch: >> >> Benchmark Mode Cnt Score Error Units >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> >> After this patch: >> >> Benchmark Mode Cnt Score Error Units Change >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ±
0.002 ns/op (unchanged) >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) >> TestRemovalPeephole.bench... > > Tobias Hotz has updated the pull request incrementally with two additional commits since the last revision: > > - Add IR test > > Currently, the peephole only works for branches, not conditional moves. > - Add assert to verify that the machProj and the test operate on the same register. > > Also fix compilation on macos The peephole only applies if the instruction that would be emitted immediately before the test is the input of the test. If that is the case, the test must operate on the same register as the result of that instruction. We then check the flags of the nodes. We need all of this information because specific operations require specific flags. For example, if the instruction following the test checks whether the value is greater (than zero), the sign and zero flags need to reflect the result and the overflow flag needs to be cleared. Not all instructions (such as add) satisfy this requirement, so in that case we need to emit the test; but we could omit it if we only check for zero, as that only requires ZF, which add sets from its result exactly as test would. I also noticed another problem while writing the IR tests: TEST instructions before conditional moves are currently not removed. This is because the matcher thinks it needs to load a zero into a register when it emits a setb instruction. This is not the case, but the matcher does not know this and 1) emits a pointless register clear and 2) still has the loadConI0 in its graph, which causes the test peephole to bail out because its input is not the preceding instruction. I think removing the loadConI0 in this case is a topic for another PR, though. Fixing it would make the peephole work for setb instructions as well, which would be a huge win.
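The reasoning above (which flags a jump reads, and whether the producing instruction leaves them exactly as TEST would) can be modeled in a few lines of Java. This is a hypothetical sketch: `FlagEffect`, `canRemoveTest` and the flag tables are illustrative names, not the flag descriptors the PR adds to the `.ad` files. The flag effects follow the Intel instruction reference (AF is not modeled). The last two methods show concretely why ADD followed by a signed "greater" jump still needs the TEST.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of when TEST r, r is redundant after the instruction that produced r. */
final class TestRemovalDecision {

    enum Flag { ZF, SF, PF, OF, CF }

    /** Flags an instruction sets from its result, and flags it clears. */
    record FlagEffect(Set<Flag> fromResult, Set<Flag> cleared) {}

    // TEST r, r: ZF/SF/PF reflect r; OF and CF are cleared.
    static final FlagEffect TEST =
            new FlagEffect(EnumSet.of(Flag.ZF, Flag.SF, Flag.PF), EnumSet.of(Flag.OF, Flag.CF));
    // AND/OR/XOR: same effect as TEST applied to their result.
    static final FlagEffect AND =
            new FlagEffect(EnumSet.of(Flag.ZF, Flag.SF, Flag.PF), EnumSet.of(Flag.OF, Flag.CF));
    // ADD: ZF/SF/PF reflect the result, but OF/CF report overflow/carry instead of being cleared.
    static final FlagEffect ADD =
            new FlagEffect(EnumSet.of(Flag.ZF, Flag.SF, Flag.PF), EnumSet.noneOf(Flag.class));

    // Flags read by jumps after a compare against zero.
    static final Set<Flag> JE = EnumSet.of(Flag.ZF);                   // equal: ZF == 1
    static final Set<Flag> JG = EnumSet.of(Flag.ZF, Flag.SF, Flag.OF); // signed greater: ZF == 0 && SF == OF

    /** True if every flag the jump reads is already in the state TEST r, r would leave. */
    static boolean canRemoveTest(FlagEffect producer, Set<Flag> jumpReads) {
        for (Flag f : jumpReads) {
            boolean ok = TEST.fromResult().contains(f)
                    ? producer.fromResult().contains(f)
                    : producer.cleared().contains(f);
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    /** JG evaluated on the flags left by ADD a, b. */
    static boolean jgAfterAdd(int a, int b) {
        int sum = a + b;                           // wraps, like the hardware add
        boolean sf = sum < 0;
        boolean of = ((a ^ sum) & (b ^ sum)) < 0;  // signed overflow: result sign differs from both operands
        return sum != 0 && sf == of;
    }

    /** JG evaluated on the flags left by TEST sum, sum (OF cleared), i.e. sum > 0. */
    static boolean jgAfterTest(int sum) {
        boolean sf = sum < 0;
        boolean of = false;
        return sum != 0 && sf == of;
    }

    public static void main(String[] args) {
        System.out.println(canRemoveTest(AND, JG)); // true
        System.out.println(canRemoveTest(ADD, JE)); // true
        System.out.println(canRemoveTest(ADD, JG)); // false
        int sum = Integer.MAX_VALUE + 1;            // wraps to Integer.MIN_VALUE
        System.out.println(jgAfterAdd(Integer.MAX_VALUE, 1) + " " + jgAfterTest(sum)); // true false
    }
}
```

In the overflow case the wrapped Java result is negative, so `sum > 0` must be false. The flags ADD leaves behind would still take the JG branch, which is why the test can only be dropped after ADD when the jump reads ZF alone.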
![idealgraphvisualizer64_c1r2gDJIi7](https://github.com/openjdk/jdk/assets/20151702/937d779e-1090-40e0-9128-dc172199250f) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1665334251 From shade at openjdk.org Fri Aug 4 09:55:41 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 4 Aug 2023 09:55:41 GMT Subject: RFR: 8313248: C2: setScopedValueCache intrinsic exposes nullptr pre-values to store barriers [v3] In-Reply-To: References: Message-ID: <3Zfol7mpLnSpR_mYklHyUmSDhZdcwCsyRD62860dx4Y=.d5a6ad84-3a94-4cd7-b7d7-97ed494bcb48@github.com> On Wed, 2 Aug 2023 08:00:12 GMT, Aleksey Shipilev wrote: >> See the bug for investigation breadcrumbs. The root cause of the failures seen with Shenandoah seems to be as follows. >> >> The setter (`setScopedValueCache`) intrinsic passes `val_type` of `_gvn.type(arr)`, which is `narrowoop: java/lang/Object *[int:32] (java/lang/Cloneable,java/io/Serializable):NotNull:exact *`, derived from the `argument(0)`, and thus implies non-nullity. >> >> So when Shenandoah's SATB barrier loads the `pre_val`, it folds the null-check, assuming the `pre_val` is not null, due to `val_type`. This passes `nullptr` to SATB queues or slowpath, and we crash in either queue filtering or barrier code that does not expect nullptrs on SATB paths. The getter (`scopedValueCache`) constructs the `objects_type` explicitly to imply the value can be null. I think we should do the same for setter, since it can hide the "getter" from SATB barrier inside of it. >> >> Arguably, it is a landmine that GC barriers assume the `val_type` is the type of both stored value and the pre-value read from memory. So the non-null-ness derived for stored value gets used to reason for non-null-ness for pre-value. We can explore the solutions to that generic problem after we plug this leak. Other `access_store_at` uses in C2 intrinsics seem to only operate on thread fields that are not null, so they are not susceptible to this problem.
`scopedValueCache` is a notable exception: a lazily initialized thread OopHandle accessed from C2. >> >> I think G1 SATB barriers have the same problem, but I have not tried to reproduce the failure very hard there. (It would, AFAIU, require writing the test which does G1 concurrent marks, not just young GCs.) >> >> Attn @theRealAph ;) >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, 10+ iterations of `java/lang/ScopedValue/StressStackOverflow.java` with Shenandoah >> - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` with Shenandoah >> - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` with G1 >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move the stars All right, thanks for the reviews! I am integrating to unbreak Shenandoah/G1 with Loom. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15105#issuecomment-1665343915 From shade at openjdk.org Fri Aug 4 09:55:42 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 4 Aug 2023 09:55:42 GMT Subject: Integrated: 8313248: C2: setScopedValueCache intrinsic exposes nullptr pre-values to store barriers In-Reply-To: References: Message-ID: On Tue, 1 Aug 2023 12:18:22 GMT, Aleksey Shipilev wrote: > See the bug for investigation breadcrumbs. The root cause of the failures seen with Shenandoah seems to be as follows. > > The setter (`setScopedValueCache`) intrinsic passes `val_type` of `_gvn.type(arr)`, which is `narrowoop: java/lang/Object *[int:32] (java/lang/Cloneable,java/io/Serializable):NotNull:exact *`, derived from the `argument(0)`, and thus implies non-nullity. > > So when Shenandoah's SATB barrier loads the `pre_val`, it folds the null-check, assuming the `pre_val` is not null, due to `val_type`. This passes `nullptr` to SATB queues or slowpath, and we crash in either queue filtering or barrier code that does not expect nullptrs on SATB paths.
The getter (`scopedValueCache`) constructs the `objects_type` explicitly to imply the value can be null. I think we should do the same for setter, since it can hide the "getter" from SATB barrier inside of it. > > Arguably, it is a landmine that GC barriers assume the `val_type` is the type of both stored value and the pre-value read from memory. So the non-null-ness derived for stored value gets used to reason for non-null-ness for pre-value. We can explore the solutions to that generic problem after we plug this leak. Other `access_store_at` uses in C2 intrinsics seem to only operate on thread fields that are not null, so they are not susceptible to this problem. `scopedValueCache` is a notable exception: a lazily initialized thread OopHandle accessed from C2. > > I think G1 SATB barriers have the same problem, but I have not tried to reproduce the failure very hard there. (It would, AFAIU, require writing the test which does G1 concurrent marks, not just young GCs.) > > Attn @theRealAph ;) > > Additional testing: > - [x] Linux x86_64 fastdebug, 10+ iterations of `java/lang/ScopedValue/StressStackOverflow.java` with Shenandoah > - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` with Shenandoah > - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` with G1 > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` This pull request has now been integrated.
Changeset: e8a37b90 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/e8a37b90db8dca4dc3653970b2d66d2faf8ef452 Stats: 22 lines in 2 files changed: 10 ins; 8 del; 4 mod 8313248: C2: setScopedValueCache intrinsic exposes nullptr pre-values to store barriers Reviewed-by: thartmann, rkennke ------------- PR: https://git.openjdk.org/jdk/pull/15105 From shade at openjdk.org Fri Aug 4 14:42:30 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 4 Aug 2023 14:42:30 GMT Subject: RFR: 8313676: Amend TestLoadIndexedMismatch test to target intrinsic directly In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 10:43:34 GMT, Aleksey Shipilev wrote: > See the bug for the reasons. Basically, we want to target the intrinsic directly, to avoid the dependence on the JDK code shape. > > Additional testing: > - [x] mainline: test is still sensitive to JDK-8313402 fix > - [x] 17u: test is _now_ sensitive to JDK-8313402 fix Thanks! Any more reviewers, or is this trivial enough to go in? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15136#issuecomment-1665723262 From duke at openjdk.org Fri Aug 4 15:58:02 2023 From: duke at openjdk.org (Tobias Hotz) Date: Fri, 4 Aug 2023 15:58:02 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v4] In-Reply-To: References: Message-ID: > This patch adds peephole rules to remove TEST instructions that operate on the result of, and immediately follow, an AND, XOR or OR instruction. > This pattern can emerge if the result of the AND is compared against two values, one of which is zero. The matcher has no way of knowing that the preceding instruction already set the flags register to the same values.
> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test, TEST, AND, XOR and OR set the flags to the same values, so this should be safe. > By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: > Results on Intel Core i5-8250U CPU > Before this patch: > > Benchmark Mode Cnt Score Error Units > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op > TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op > TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > > After this patch: > > Benchmark Mode Cnt Score Error Units Change > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.074 ± 0.011...
Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: Exchange and for or in the tests: and will get matched to a test_reg_reg, so it was pointless ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14172/files - new: https://git.openjdk.org/jdk/pull/14172/files/18c6f790..af934150 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14172&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14172&range=02-03 Stats: 19 lines in 1 file changed: 0 ins; 1 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/14172.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14172/head:pull/14172 PR: https://git.openjdk.org/jdk/pull/14172 From duke at openjdk.org Fri Aug 4 16:27:50 2023 From: duke at openjdk.org (Ilya Gavrilin) Date: Fri, 4 Aug 2023 16:27:50 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v2] In-Reply-To: References: Message-ID: > Please review these changes to the RISC-V double rounding intrinsics. > > On RISC-V, intrinsics for rounding doubles with a rounding mode (like Math.ceil/floor/rint) were missing. RISC-V has no dedicated instruction for such a conversion, so a two-step conversion is used: double -> long int -> double (using fcvt.l.d, fcvt.d.l). > > We also have to pass a rounding mode to the fcvt.x.x instructions. > > Rounding mode selection for ceil (similar for floor and rint), according to the Math.ceil requirements: > >> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475). > > For double -> long int we choose the rup (round towards +inf) mode to get an integer that is greater than or equal to the input value. > For long int -> double we choose the rdn (round towards -inf) mode to get the smallest (closest to -inf) representation of the integer obtained by the first conversion.
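As a rough illustration of the two-step approach described here, together with the special-case handling described next, the following Java sketch computes ceil via a double -> long -> double round trip. It is not the RISC-V intrinsic: Java's `(long)` cast always truncates toward zero, so the rup step of `fcvt.l.d` is emulated with an explicit fix-up. In this sketch the long -> double step is always exact (see the comments), so Java's default conversion is enough there. The class name `CeilViaLong` is made up.

```java
/**
 * Hypothetical Java sketch of ceil(x) via a double -> long -> double round trip.
 * The (long) cast truncates toward zero, so rounding toward +inf is emulated.
 */
final class CeilViaLong {

    static double ceil(double x) {
        // NaN, infinities and |x| >= 2^63 are returned unchanged; every finite
        // double of that magnitude is already an integer.
        if (Double.isNaN(x) || Math.abs(x) >= 0x1p63) {
            return x;
        }
        long t = (long) x;   // truncate toward zero; in range because |x| < 2^63
        if (t < x) {         // x had a positive fractional part
            t++;             // emulate rounding toward +inf
        }
        // t is exactly representable here: either |x| < 2^52 (so |t| <= 2^52)
        // or x was already an integer and t == x, so converting back is exact.
        // copySign restores -0.0 for -0.0 itself and for inputs in (-1.0, 0.0).
        return Math.copySign((double) t, x);
    }

    public static void main(String[] args) {
        double[] inputs = {1.5, -1.5, -0.5, -0.0, 2.0, 0x1p62 + 1024, Double.NaN, Double.NEGATIVE_INFINITY};
        for (double x : inputs) {
            System.out.println(x + " -> " + ceil(x) + " (Math.ceil: " + Math.ceil(x) + ")");
        }
    }
}
```

The sign copy only changes the result when it is zero, because ceil of a positive input is at least 1.0 and ceil of a negative input is never positive.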
> > For inf, NaN, or a value larger than 2^63 in magnitude, we return the input value (a finite double that large is guaranteed to be an integer). > When storing the result, we also copy the sign from the input value (needed because ceil must return -0.0 for inputs in (-1.0, 0.0)). > > We have observed a significant improvement on hifive and thead boards. > > testing: tier1, tier2 and hotspot:tier3 on hifive > > Performance results on hifive (FpRoundingBenchmark.testceil/floor/rint): > > Without intrinsic: > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.testceil 1024 thrpt 25 39.297 ± 0.037 ops/ms > FpRoundingBenchmark.testfloor 1024 thrpt 25 39.398 ± 0.018 ops/ms > FpRoundingBenchmark.testrint 1024 thrpt 25 36.388 ± 0.844 ops/ms > > With intrinsic: > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.testceil 1024 thrpt 25 80.560 ± 0.053 ops/ms > FpRoundingBenchmark.testfloor 1024 thrpt 25 80.541 ± 0.081 ops/ms > FpRoundingBenchmark.testrint 1024 thrpt 25 80.603 ±
0.071 ops/ms Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: Change fsgnj_d(dst, src, src) to fmv_d(dst, src) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14991/files - new: https://git.openjdk.org/jdk/pull/14991/files/ba609de3..1c43b040 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14991.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14991/head:pull/14991 PR: https://git.openjdk.org/jdk/pull/14991 From qamai at openjdk.org Fri Aug 4 17:26:32 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 4 Aug 2023 17:26:32 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v2] In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 09:45:40 GMT, Tobias Hotz wrote: >> Tobias Hotz has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add IR test >> >> Currently, the peephole only works for branches, not conditional moves. >> - Add assert to verify that the machProj and the test operate on the same register. >> >> Also fix compilation on macos > > The peephole only applies if the instruction that would be emitted immediately before the test is the input of the test. If that is the case, the test must operate on the same register as the result of that instruction. We then check the flags of the nodes. > We need all of this information because specific operations require specific flags. For example, if the instruction following the test checks whether the value is greater (than zero), the sign and zero flags need to reflect the result and the overflow flag needs to be cleared.
Not all instructions (such as add) satisfy this requirement, so in that case we need to emit the test; but we could omit it if we only check for zero, as that only requires ZF, which add sets from its result exactly as test would. > > > I also noticed another problem while writing the IR tests: TEST instructions before conditional moves are currently not removed. This is because the matcher thinks it needs to load a zero into a register when it emits a setb instruction. This is not the case, but the matcher does not know this and 1) emits a pointless register clear and 2) still has the loadConI0 in its graph, which causes the test peephole to bail out because its input is not the preceding instruction. > I think removing the loadConI0 in this case is a topic for another PR, though. Fixing it would make the peephole work for setb instructions as well, which would be a huge win. > > ![idealgraphvisualizer64_c1r2gDJIi7](https://github.com/openjdk/jdk/assets/20151702/937d779e-1090-40e0-9128-dc172199250f) @ichttt The register clear via `xor` is mandatory since `setb` only sets the lowest byte. We could be more clever and do more complex transformations, such as relaxing the consecutiveness requirement, reordering nodes or converting a `xor` into a `mov`, but I think that can be done in another patch; this patch covers the most prominent occurrences of redundant `test`. Thanks a lot.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1665949667 From duke at openjdk.org Fri Aug 4 18:05:24 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 4 Aug 2023 18:05:24 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v13] In-Reply-To: References: <2OFyLMeiiFJdZDD-BKUcaW4lfaeis9VG4jALBQlYbOc=.f3ccd800-05e4-4dec-8980-ddf3392e1cc9@github.com> Message-ID: On Thu, 3 Aug 2023 22:06:04 GMT, Andrew Haley wrote: >> Hi Andrew, the reason for the public API is to make AVX512 sort available to other data structures like MemorySegment (including the ones backed by native heap). The API of the arraySort() AVX512 intrinsic is similar to the public API of ArraysSupport.vectorizedMismatch() which is used by MemorySegment.mismatch(). > > There's no need to make this method public, and it should not be. There is no need to have it in the Java API. `ArraysSupport.vectorizedMismatch()` is in an internal JDK class, not part of the Java API. Sure Andrew. Will make this method private as suggested. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1284707308 From duke at openjdk.org Fri Aug 4 18:27:57 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 4 Aug 2023 18:27:57 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v15] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides up to an order-of-magnitude speedup for Arrays.sort() on int, long, float and double arrays. > > This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
> > **Arrays.sort performance data using JMH benchmarks** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | > | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | > | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | > | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | > | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | > | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | > | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | > | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | > | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | > | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | > | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | > | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | > | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | > | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | > | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | > | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: fix arraySort API and fastdebug issue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/17b51270..a2e14d45 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=13-14 Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From duke at openjdk.org Fri Aug 4 18:27:58 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 4 Aug 2023 18:27:58 GMT Subject: RFR: 8309130: x86_64 AVX512 
intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: References: Message-ID: On Wed, 2 Aug 2023 17:15:45 GMT, Sandhya Viswanathan wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/java.base/share/classes/java/util/Arrays.java >> >> Co-authored-by: David Schlosnagle > > src/java.base/share/classes/java/util/Arrays.java line 95: > >> 93: */ >> 94: @IntrinsicCandidate >> 95: public static void arraySort(Class elemType, Object array, long offset, int fromIndex, int toIndex) { > > Does this method need to be public? Method signature was changed to private. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1284723201 From duke at openjdk.org Fri Aug 4 18:30:37 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 4 Aug 2023 18:30:37 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v6] In-Reply-To: <4TFxrQA6h60f8RJZBtehi8_qEanj0xZqveUOqjX3Feo=.1c9d0e49-67aa-4bf5-898d-d79e933e5cef@github.com> References: <4TFxrQA6h60f8RJZBtehi8_qEanj0xZqveUOqjX3Feo=.1c9d0e49-67aa-4bf5-898d-d79e933e5cef@github.com> Message-ID: On Tue, 6 Jun 2023 19:18:40 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> fix license in one file > > src/java.base/share/classes/java/util/Arrays.java line 82: > >> 80: >> 81: @IntrinsicCandidate >> 82: private static void arraySort(int[] array, int fromIndex, int toIndex) { > > A minor styling comment: We can use same all small caps naming convention as used for System.arraycopy. Thanks for the suggestion. For now, we will stick with arraySort as this is a private method. 
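The shape the reviewers settle on above can be sketched in plain Java: a private intrinsic-candidate method that is reached only through a public entry point, and whose Java body stays a correct fallback. This is a minimal, hypothetical illustration. `SortFacade`, `sort` and `rangeCheck` are invented names. `@IntrinsicCandidate` appears only in a comment because `jdk.internal.vm.annotation` is not exported to user code. The insertion-sort fallback stands in for whatever Java code the JDK actually runs when no intrinsic is available.

```java
import java.util.Arrays;

// Hypothetical sketch: public API validates, private "intrinsic candidate" sorts.
final class SortFacade {
    private SortFacade() {}

    // Public entry point: checks the range, then delegates to the private method.
    public static void sort(int[] a, int fromIndex, int toIndex) {
        rangeCheck(a.length, fromIndex, toIndex);
        arraySort(a, fromIndex, toIndex);
    }

    // In the JDK this would be annotated @IntrinsicCandidate, so the JIT could
    // substitute a vectorized stub. Otherwise this portable insertion sort runs.
    private static void arraySort(int[] a, int fromIndex, int toIndex) {
        for (int i = fromIndex + 1; i < toIndex; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= fromIndex && a[j] > key) {
                a[j + 1] = a[j]; // shift larger elements one slot right
                j--;
            }
            a[j + 1] = key;
        }
    }

    private static void rangeCheck(int length, int fromIndex, int toIndex) {
        if (fromIndex > toIndex) {
            throw new IllegalArgumentException("fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")");
        }
        if (fromIndex < 0) {
            throw new ArrayIndexOutOfBoundsException(fromIndex);
        }
        if (toIndex > length) {
            throw new ArrayIndexOutOfBoundsException(toIndex);
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1, 7};
        sort(data, 1, 4); // sorts only indices 1..3
        System.out.println(Arrays.toString(data)); // [5, 1, 3, 9, 7]
    }
}
```

Because the private method's Java body remains a correct implementation, a JIT substitution on capable CPUs cannot change the public contract. Callers only ever see `sort`.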
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1284726224 From duke at openjdk.org Fri Aug 4 18:42:06 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 4 Aug 2023 18:42:06 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v16] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows upto ~13x improvement for 32-bit datatypes (int, float) and upto 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. > > **Arrays.sort performance data using JMH benchmarks** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | > | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | > | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | > | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | > | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | > | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | > | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | > | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | > | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | > | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | > | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | > | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | > | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | > | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | > | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | > | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | Srinivas Vamsi Parasa has updated the pull request incrementally 
with one additional commit since the last revision: moved stubroutines definitions to vmStructs_jvmci.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/a2e14d45..7065f1cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=14-15 Stats: 8 lines in 2 files changed: 4 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From duke at openjdk.org Fri Aug 4 18:42:06 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 4 Aug 2023 18:42:06 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 00:14:29 GMT, Sandhya Viswanathan wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/java.base/share/classes/java/util/Arrays.java >> >> Co-authored-by: David Schlosnagle > > src/hotspot/share/runtime/vmStructs.cpp line 535: > >> 533: static_field(StubRoutines, _arraysort_long, address) \ >> 534: static_field(StubRoutines, _arraysort_float, address) \ >> 535: static_field(StubRoutines, _arraysort_double, address) \ > > Should this be in hotspot/share/jvmci/vmStructs_jvmci.cpp instead? That's true. Moved it to hotspot/share/jvmci/vmStructs_jvmci.cpp. Thanks for catching this! 
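The Speedup column in the quoted JMH tables is the baseline time divided by the AVX512 time, rounded to one decimal place. The sketch below recomputes a few rows from the numbers as posted. `SpeedupCheck` and `Row` are hypothetical names, and the sketch only checks the arithmetic, not the measurements themselves.

```java
import java.util.Locale;

// Recomputes the "Speedup" column from the quoted us/op numbers.
final class SpeedupCheck {
    record Row(String label, double baselineUsPerOp, double avx512UsPerOp, String reported) {}

    // speedup = baseline time / AVX512 time (lower us/op is better)
    static double speedup(double baselineUsPerOp, double avx512UsPerOp) {
        return baselineUsPerOp / avx512UsPerOp;
    }

    // One decimal place, matching the table's "12.9x" style.
    static String format(double s) {
        return String.format(Locale.ROOT, "%.1fx", s);
    }

    public static void main(String[] args) {
        Row[] rows = {
            new Row("intSort 10000", 310.97, 24.082, "12.9x"),
            new Row("floatSort 10000", 336.472, 26.706, "12.6x"),
            new Row("doubleSort 100000", 4721.17, 579.819, "8.1x"),
            new Row("longSort 100000", 4448.171, 696.773, "6.4x"),
        };
        for (Row r : rows) {
            String computed = format(speedup(r.baselineUsPerOp(), r.avx512UsPerOp()));
            System.out.println(r.label() + ": computed " + computed + ", reported " + r.reported());
        }
    }
}
```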
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1284732789 From duke at openjdk.org Fri Aug 4 19:36:35 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 4 Aug 2023 19:36:35 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 04:06:09 GMT, Sandhya Viswanathan wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/java.base/share/classes/java/util/Arrays.java >> >> Co-authored-by: David Schlosnagle > > test/micro/org/openjdk/bench/java/util/ArraysSort.java line 55: > >> 53: @State(Scope.Thread) >> 54: @Warmup(iterations = 3, time=60) >> 55: @Measurement(iterations = 3, time=120) > > Warmup/measurement time could be reduced in the jmh micro to 2s/5s. A warmup/measurement time of 2s/5s works well for array sizes <= 10,000, but it does not give sufficient warmup time for sizes >= 100,000. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1284782800 From lmesnik at openjdk.org Fri Aug 4 19:45:56 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 4 Aug 2023 19:45:56 GMT Subject: RFR: 8307462: [REDO] VmObjectAlloc is not generated by intrinsics methods which allocate objects [v3] In-Reply-To: References: Message-ID: <9_aaKm7wC_9cQH2qHKbb5myvx7roacZJVzWRLZ8NdQM=.6dc482e0-c215-4a1b-b84a-d819f0ee3979@github.com> > The fix adds posting of VmObjectAlloc events by Unsafe.allocateInstance(Class cls). The previous attempt to post the event directly from 'LibraryCallKit::inline_unsafe_allocate()' caused a performance regression even when the JVMTI event was not enabled: some optimizations were disabled just because of the possible use and escape of the newly allocated object. > So the event is now posted by returning to the interpreter if events are enabled.
> > I verified that the performance (run locally only) of > org.renaissance.jdk.streams.JmhScrabble.runOperation > doesn't change if events are not enabled. > > There might be other intrinsics like 'LibraryCallKit::inline_unsafe_newArray()' where the VM allocates memory. I'm going to file a separate issue to find and fix them. > > Many thanks to Tobias H. for the proposed solution. > > Tested with all tiers. Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: The too many deopts check should be first. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15110/files - new: https://git.openjdk.org/jdk/pull/15110/files/5095114f..64871b91 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15110&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15110&range=01-02 Stats: 13 lines in 1 file changed: 7 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15110.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15110/head:pull/15110 PR: https://git.openjdk.org/jdk/pull/15110 From kvn at openjdk.org Fri Aug 4 20:13:32 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 4 Aug 2023 20:13:32 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v2] In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 06:31:46 GMT, Pengfei Li wrote: > A solution to this may be adding one more check that `CICompileOSR` is OFF if we still want to use interpreted execution for the reference result. I would suggest using `WB.setBooleanVMFlag("CICompileOSR", false);`. But it is a debug flag, which can be set only in a debug VM. There may be other product flags you can temporarily set to avoid compilation without locking. > > Now the question is, which verification approach do you think is better? "C2 vs. interpreted" or "C2 vs. C1"? We usually use the interpreter as the gold standard.
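The "interpreter as gold standard" approach can be illustrated with a minimal sketch. It captures a reference result on the first call, when the method has most likely not been JIT-compiled yet, then calls the same kernel many times and compares. `ReferenceCheck`, `kernel` and `verify` are hypothetical names, and this is not the jtreg vectorization test harness. Nothing here guarantees that the first call is interpreted: a long-running loop can still be OSR-compiled, which is the concern behind the `CICompileOSR` discussion above.

```java
import java.util.Arrays;

// Hypothetical "reference vs. compiled" check; it cannot force interpretation.
final class ReferenceCheck {
    // Example kernel: a simple loop of the kind C2 may auto-vectorize.
    static int[] kernel(int[] a, int[] b) {
        int[] c = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] * 3 + b[i];
        }
        return c;
    }

    static boolean verify(int[] a, int[] b, int iterations) {
        int[] reference = kernel(a, b); // first call: most likely interpreted
        for (int i = 0; i < iterations; i++) {
            // Later calls may run JIT-compiled (possibly vectorized) code.
            if (!Arrays.equals(reference, kernel(a, b))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] a = new int[1024];
        int[] b = new int[1024];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
            b[i] = -i;
        }
        System.out.println(verify(a, b, 20_000)); // true
    }
}
```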
------------- PR Comment: https://git.openjdk.org/jdk/pull/15011#issuecomment-1666121398 From sspitsyn at openjdk.org Fri Aug 4 20:39:30 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 4 Aug 2023 20:39:30 GMT Subject: RFR: 8307462: [REDO] VmObjectAlloc is not generated by intrinsics methods which allocate objects [v3] In-Reply-To: <9_aaKm7wC_9cQH2qHKbb5myvx7roacZJVzWRLZ8NdQM=.6dc482e0-c215-4a1b-b84a-d819f0ee3979@github.com> References: <9_aaKm7wC_9cQH2qHKbb5myvx7roacZJVzWRLZ8NdQM=.6dc482e0-c215-4a1b-b84a-d819f0ee3979@github.com> Message-ID: On Fri, 4 Aug 2023 19:45:56 GMT, Leonid Mesnik wrote: >> The fix adds posting VmObjectAlloc events by Unsafe.allocateInstance(Class cls). The previous attempt to post event directly from 'LibraryCallKit::inline_unsafe_allocate()' cause performance regression even if jvmti event is not enabled. Some optimizations have been disabled just because possible usage and escaping of newly allocated object. >> So event posting is doing by returning to interpreter if events are enabled. >> >> I verified that that performance (run locally only) of >> org.renaissance.jdk.streams.JmhScrabble.runOperation >> doesn't change if events are not enabled. >> >> There might be other intrinsics like 'LibraryCallKit::inline_unsafe_newArray()' where VM allocate memory. I'm going to file separate issue to find and fix them. >> >> Many thanks to Tobias H. for proposed solution. >> >> Testing with all tiers. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > The too many deopts check should be first. This looks okay to me. It needs to be reviewed by someone from the compiler team. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15110#pullrequestreview-1563532886 From duke at openjdk.org Fri Aug 4 22:34:50 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 4 Aug 2023 22:34:50 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v17] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows upto ~13x improvement for 32-bit datatypes (int, float) and upto 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. > > **Arrays.sort performance data using JMH benchmarks** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | > | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | > | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | > | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | > | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | > | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | > | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | > | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | > | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | > | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | > | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | > | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | > | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | > | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | > | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | > | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | Srinivas Vamsi Parasa has updated the pull request incrementally with one 
additional commit since the last revision: Update avx512 sort, benchmarks, shenandoahSupport ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/7065f1cf..37f3c527 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=15-16 Stats: 525 lines in 8 files changed: 52 ins; 469 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From duke at openjdk.org Fri Aug 4 22:34:50 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 4 Aug 2023 22:34:50 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: References: Message-ID: On Wed, 2 Aug 2023 23:38:12 GMT, Sandhya Viswanathan wrote: > Also need to handle arraySort in file: share/gc/shenandoah/c2/shenandoahSupport.cpp, function: ShenandoahBarrierC2Support::verify around line 3000. Updated the code in ShenandoahBarrierC2Support as suggested. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1666237899 From duke at openjdk.org Fri Aug 4 22:34:51 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 4 Aug 2023 22:34:51 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 22:28:48 GMT, Srinivas Vamsi Parasa wrote: >> Also need to handle arraySort in file: share/gc/shenandoah/c2/shenandoahSupport.cpp, function: ShenandoahBarrierC2Support::verify around line 3000. > >> Also need to handle arraySort in file: share/gc/shenandoah/c2/shenandoahSupport.cpp, function: ShenandoahBarrierC2Support::verify around line 3000. > > Updated the code in ShenandoahBarrierC2Support as suggested. 
> @vamsi-parasa With fastdebug build I see the following error: Internal Error (jdk/src/hotspot/share/opto/escape.cpp:1196), pid=3543536, tid=3543559 fatal error: EA unexpected CallLeaf arraysort_stub > > Please take a look. This was fixed as well. >> test/micro/org/openjdk/bench/java/util/ArraysSort.java line 55: >> >>> 53: @State(Scope.Thread) >>> 54: @Warmup(iterations = 3, time=60) >>> 55: @Measurement(iterations = 3, time=120) >> >> Warmup/measurement time could be reduced in the jmh micro to 2s/5s. > > Warmup/measurement time of 2s/5s works well for array size <= 10,000 is not giving sufficient time to warmup for size>=100,000. Updated the benchmark with different warmup times depending on the size of the array. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1666238364 PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1284886655 From duke at openjdk.org Fri Aug 4 22:54:03 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 4 Aug 2023 22:54:03 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v18] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows upto ~13x improvement for 32-bit datatypes (int, float) and upto 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. 
> > **Arrays.sort performance data using JMH benchmarks** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | > | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | > | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | > | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | > | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | > | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | > | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | > | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | > | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | > | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | > | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | > | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | > | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | > | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | > | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | > | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: More avx512 sort cleanups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/37f3c527..e0ffc81d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=16-17 Stats: 258 lines in 2 files changed: 0 ins; 258 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From duke at openjdk.org Fri Aug 4 23:19:54 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 4 Aug 2023 23:19:54 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics 
for Arrays.sort methods (int, long, float and double arrays) [v19] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows upto ~13x improvement for 32-bit datatypes (int, float) and upto 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. > > **Arrays.sort performance data using JMH benchmarks** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | > | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | > | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | > | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | > | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | > | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | > | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | > | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | > | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | > | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | > | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | > | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | > | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | > | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | > | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | > | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Change name from libavx512_x86_64 to libx86_64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/e0ffc81d..13f4aaf4 Webrevs: - full: 
https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=17-18 Stats: 14 lines in 8 files changed: 0 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From dlong at openjdk.org Sat Aug 5 00:21:33 2023 From: dlong at openjdk.org (Dean Long) Date: Sat, 5 Aug 2023 00:21:33 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v4] In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 15:58:02 GMT, Tobias Hotz wrote: >> This patch adds peephole rules to remove TEST instructions that operate on the result of, and immediately follow, an AND, XOR or OR instruction. >> This pattern can emerge if the result of the AND is compared against two values where one of the values is zero. The matcher cannot know that the instruction mentioned above has already set the flags register to the same values. >> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test, TEST, AND, XOR and OR set the flags to the same values, so this should be safe. >> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: >> Results on Intel Core i5-8250U CPU >> Before this patch: >> >> Benchmark Mode Cnt Score Error Units >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> >> After this patch: >> >> Benchmark Mode Cnt Score Error Units Change >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) >> TestRemovalPeephole.bench... > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Exchange and for or in the tests > > and will get matched to a test_reg_reg, so it was pointless Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14172#pullrequestreview-1563679993 From dlong at openjdk.org Sat Aug 5 00:21:34 2023 From: dlong at openjdk.org (Dean Long) Date: Sat, 5 Aug 2023 00:21:34 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v2] In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 09:45:40 GMT, Tobias Hotz wrote: > For example, if the instruction following the test checks if the value is greater (than zero), the sign and zero flags need to be set and the overflow flag needs to be cleared.
Not all instructions (such as add) satisfy this requirement, so in this case we would need to emit the test, but we could omit it if we only check for zero, as that only requires the ZF, which the test instruction sets. OK, I see your concern about ADD. Rather than reject it completely, you allow it depending on the kind of compare, which requires tracking individual flags. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1666297192 From ysuenaga at openjdk.org Sun Aug 6 09:14:28 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Sun, 6 Aug 2023 09:14:28 GMT Subject: RFR: 8313406: nep_invoker_blob can be simplified more In-Reply-To: References: Message-ID: On Mon, 31 Jul 2023 12:22:00 GMT, Yasumasa Suenaga wrote: > In FFM, native functions are called via `nep_invoker_blob`. If the function has two arguments, the stub looks like this: > > > Decoding RuntimeStub - nep_invoker_blob 0x00007fcae394cd10 > -------------------------------------------------------------------------------- > 0x00007fcae394cd80: pushq %rbp > 0x00007fcae394cd81: movq %rsp, %rbp > 0x00007fcae394cd84: subq $0, %rsp > ;; { argument shuffle > 0x00007fcae394cd88: movq %r8, %rax > 0x00007fcae394cd8b: movq %rsi, %r10 > 0x00007fcae394cd8e: movq %rcx, %rsi > 0x00007fcae394cd91: movq %rdx, %rdi > ;; } argument shuffle > 0x00007fcae394cd94: callq *%r10 > 0x00007fcae394cd97: leave > 0x00007fcae394cd98: retq > > > `subq $0, %rsp` is for shadow space on the stack, and `movq %r8, %rax` is the number of args for a variadic function, so they are not necessary in some cases.
After removing them when they are not needed, the stub becomes: > > > Decoding RuntimeStub - nep_invoker_blob 0x00007fd8778e2810 > -------------------------------------------------------------------------------- > 0x00007fd8778e2880: pushq %rbp > 0x00007fd8778e2881: movq %rsp, %rbp > ;; { argument shuffle > 0x00007fd8778e2884: movq %rsi, %r10 > 0x00007fd8778e2887: movq %rcx, %rsi > 0x00007fd8778e288a: movq %rdx, %rdi > ;; } argument shuffle > 0x00007fd8778e288d: callq *%r10 > 0x00007fd8778e2890: leave > 0x00007fd8778e2891: retq > > > All java/foreign jtreg tests passed. > > These stubs can be seen with the [ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/examples/cpumodel) using `-XX:+UnlockDiagnosticVMOptions -XX:+PrintStubCode` and the hsdis library. This testcase links the code with `Linker.Option.isTrivial()`. > > After this change, FFM performance on [another ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/benchmarks/funccall) improved: > > before: > > Benchmark Mode Cnt Score Error Units > FuncCallComparison.invokeFFMRDTSC thrpt 3 106664071.816 ± 14396524.718 ops/s > FuncCallComparison.rdtsc thrpt 3 108024079.738 ± 13223921.011 ops/s > > > after: > > Benchmark Mode Cnt Score Error Units > FuncCallComparison.invokeFFMRDTSC thrpt 3 107622971.525 ± 12249767.134 ops/s > FuncCallComparison.rdtsc thrpt 3 107695741.608 ± 23983281.346 ops/s > > > Environment: > * CPU: AMD Ryzen 3 3300X > * OS: Fedora 38 x86_64 (Kernel 6.3.8-200.fc38.x86_64) > * Hyper-V 4vCPU, 8GB mem Can I get a second reviewer?
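The two simplifications in this RFR amount to making two emission steps conditional. The sketch below is a hypothetical model that only builds assembly text. It is not HotSpot's downcall stub generator, and `InvokerStubModel`/`emit` are invented names. The `subq` is skipped when no stack space is needed, and the `%rax` setup is emitted only for variadic callees. Under the System V x86-64 ABI, `%al` gives a variadic callee an upper bound on the number of vector registers used for arguments. The `%r8` source register is simply copied from the quoted dump.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the stub layout; emits AT&T-syntax text only.
final class InvokerStubModel {
    static List<String> emit(int frameBytes, boolean variadic, List<String> shuffle) {
        List<String> code = new ArrayList<>();
        code.add("pushq %rbp");
        code.add("movq %rsp, %rbp");
        if (frameBytes > 0) {
            // Reserve stack (e.g. shadow space) only when something needs it.
            code.add("subq $" + frameBytes + ", %rsp");
        }
        if (variadic) {
            // Only variadic callees read %al, so skip this move otherwise.
            code.add("movq %r8, %rax");
        }
        code.addAll(shuffle); // argument shuffle into native registers
        code.add("callq *%r10");
        code.add("leave"); // restores %rsp from %rbp, so no matching addq is needed
        code.add("retq");
        return code;
    }

    public static void main(String[] args) {
        List<String> shuffle = List.of("movq %rsi, %r10", "movq %rcx, %rsi", "movq %rdx, %rdi");
        emit(0, false, shuffle).forEach(System.out::println);
    }
}
```

With a zero frame size and a non-variadic callee, `emit` produces the same instruction sequence as the second dump above.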
------------- PR Comment: https://git.openjdk.org/jdk/pull/15089#issuecomment-1666785232 From jvernee at openjdk.org Mon Aug 7 06:50:37 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 7 Aug 2023 06:50:37 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v4] In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 15:58:02 GMT, Tobias Hotz wrote: >> This patch adds peephole rules to remove TEST instructions that operate on the result of, and immediately follow, an AND, XOR or OR instruction. >> This pattern can emerge if the result of the AND is compared against two values where one of the values is zero. The matcher cannot know that the instruction mentioned above has already set the flags register to the same values. >> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test, TEST, AND, XOR and OR set the flags to the same values, so this should be safe. >> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: >> Results on Intel Core i5-8250U CPU >> Before this patch: >> >> Benchmark Mode Cnt Score Error Units >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> >> After this patch: >> >> Benchmark Mode Cnt Score Error Units Change >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) >> TestRemovalPeephole.bench... > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Exchange and for or in the tests > > and will get matched to a test_reg_reg, so it was pointless Thanks for the updated version. (Sorry that it took me a while to get back to this.) I left a few small inline comments. What testing did you run on this? (I'll also run tier 1-4 in our CI.) src/hotspot/cpu/x86/peephole_x86_64.cpp line 197: > 195: juint required_flags = 0; > 196: // Search for the uses of the node and compute which flags are required > 197: for (DUIterator i = test_to_check->outs(); test_to_check->has_out(i); i++) { AFAICS `fast_outs` can be used here, since you don't modify the `_opnds`. src/hotspot/cpu/x86/x86.ad line 1268: > 1266: Flag_clears_overflow_flag = Node::_last_flag << 10, > 1267: Flag_clears_sign_flag = Node::_last_flag << 11, > 1268: _last_flag = Flag_clears_sign_flag I think adding the flags here is good. If the number of flags becomes a problem, we could instead generate virtual methods on all the nodes to return the flag mask.
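The `required_flags` computation and the `Flag_sets_*`/`Flag_clears_*` bits discussed above can be modeled with plain bit masks. In this hypothetical sketch, `TestRemovalModel`, `Cond` and the bit values are invented, and only a few condition codes are shown. Each use of the flags contributes the guarantees it needs, and `test r, r` on the producer's result can be dropped only if the producer provides all of them. AND/OR/XOR qualify for every condition listed. ADD qualifies only for conditions that read ZF (such as EQ/NE), because its OF and CF reflect the addition rather than being cleared the way TEST clears them.

```java
// Hypothetical bit-mask model of deciding whether "test r, r" is redundant.
final class TestRemovalModel {
    static final int SETS_ZERO       = 1 << 0; // ZF reflects the result
    static final int SETS_SIGN       = 1 << 1; // SF reflects the result
    static final int SETS_PARITY     = 1 << 2; // PF reflects the result
    static final int CLEARS_OVERFLOW = 1 << 3; // OF == 0, as after TEST
    static final int CLEARS_CARRY    = 1 << 4; // CF == 0, as after TEST

    // What each condition needs so that it reads what TEST would have produced.
    enum Cond {
        EQ(SETS_ZERO),                               // ZF == 1
        NE(SETS_ZERO),                               // ZF == 0
        LT(SETS_SIGN | CLEARS_OVERFLOW),             // SF != OF
        GE(SETS_SIGN | CLEARS_OVERFLOW),             // SF == OF
        GT(SETS_ZERO | SETS_SIGN | CLEARS_OVERFLOW), // ZF == 0 && SF == OF
        ABOVE(SETS_ZERO | CLEARS_CARRY);             // CF == 0 && ZF == 0

        final int needs;

        Cond(int needs) {
            this.needs = needs;
        }
    }

    // AND/OR/XOR: ZF, SF, PF from the result; OF and CF cleared (same as TEST).
    static final int AND_OR_XOR = SETS_ZERO | SETS_SIGN | SETS_PARITY | CLEARS_OVERFLOW | CLEARS_CARRY;
    // ADD: ZF, SF, PF from the result, but OF and CF reflect the addition.
    static final int ADD = SETS_ZERO | SETS_SIGN | SETS_PARITY;

    static boolean canRemoveTest(int producerFlags, Cond... uses) {
        int required = 0;
        for (Cond c : uses) {
            required |= c.needs; // union of what all flag users need
        }
        return (producerFlags & required) == required;
    }

    public static void main(String[] args) {
        System.out.println(canRemoveTest(AND_OR_XOR, Cond.GT));   // true
        System.out.println(canRemoveTest(ADD, Cond.EQ, Cond.NE)); // true
        System.out.println(canRemoveTest(ADD, Cond.EQ, Cond.GT)); // false
    }
}
```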
src/hotspot/cpu/x86/x86_64.ad line 7679: > 7677: effect(KILL cr); > 7678: flag(PD::Flag_sets_overflow_flag, PD::Flag_sets_sign_flag, PD::Flag_sets_zero_flag, PD::Flag_sets_carry_flag, PD::Flag_sets_parity_flag); > 7679: Please remove these extra blank lines. On `addI_rReg_mem`, `addI_mem_rReg`, and `blsrL_rReg_rReg`. src/hotspot/share/adlc/adlparse.cpp line 232: > 230: else if (!strcmp(ident, "size")) instr->_size = size_parse(instr); > 231: else if (!strcmp(ident, "effect")) effect_parse(instr); > 232: else if (!strcmp(ident, "flag")) instr->_flag = flag_parse(instr); Suggestion: else if (!strcmp(ident, "size")) instr->_size = size_parse(instr); else if (!strcmp(ident, "effect")) effect_parse(instr); else if (!strcmp(ident, "flag")) instr->_flag = flag_parse(instr); src/hotspot/share/adlc/output_c.cpp line 4019: > 4017: if (inst->_flag != nullptr) { > 4018: Flag* node = inst->_flag; > 4019: const char* prefix = "Node::"; You could potentially make the prefix here `Node::PD::`, then the extra `PD::` could be removed from the .ad file (I don't think it really adds much?). src/hotspot/share/adlc/output_c.cpp line 4023: > 4021: do { > 4022: if (!node_flags_set) { > 4023: fprintf(fp_cpp, "%s node->add_flag(%s%s", indent, strncmp(node->_name, prefix, strlen(prefix)) != 0 ? prefix : "", node->_name); This seems to be guarding against a case where the flag is declared with the prefix already in the .ad file. Is this required for something? (Otherwise I suggest just using `node->_name` here, as it forces the flag declarations in the .ad file to be consistent). test/hotspot/jtreg/compiler/c2/irTests/TestTestRemovalPeephole.java line 33: > 31: /* > 32: * @test > 33: * @summary Test that patterns leading to Conv2B are correctly expanded. Summary seems to be incorrect. 
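Tobias's point that ADD cannot always stand in for the TEST can be made concrete with a small flag model for 32-bit operands. In this hypothetical sketch, `AddVsTest` and its helpers are invented names, and PF and AF are omitted. The model computes ZF, SF, OF and CF after `add a, b` and after `test r, r` on the same result. When the addition overflows, a signed less-than branch (SF != OF) gives different answers, while an equality branch (ZF only) agrees.

```java
// Hypothetical 32-bit flag model comparing ADD with TEST on the same result.
final class AddVsTest {
    record Flags(boolean zf, boolean sf, boolean of, boolean cf) {}

    // Flags after "test r, r": ZF/SF from r, OF and CF cleared.
    static Flags afterTest(int r) {
        return new Flags(r == 0, r < 0, false, false);
    }

    // Flags after a 32-bit "add a, b".
    static Flags afterAdd(int a, int b) {
        int r = a + b;                                  // wraps like the hardware
        boolean of = ((a ^ r) & (b ^ r)) < 0;           // signed overflow
        boolean cf = Integer.compareUnsigned(r, a) < 0; // unsigned carry out
        return new Flags(r == 0, r < 0, of, cf);
    }

    // Signed "less than" branch (JL): SF != OF.
    static boolean jl(Flags f) {
        return f.sf() != f.of();
    }

    // Equality branch (JE): ZF only.
    static boolean je(Flags f) {
        return f.zf();
    }

    public static void main(String[] args) {
        int a = Integer.MAX_VALUE;
        int b = 1;
        int r = a + b; // Integer.MIN_VALUE
        System.out.println(jl(afterTest(r)) + " " + jl(afterAdd(a, b))); // true false
        System.out.println(je(afterTest(r)) == je(afterAdd(a, b)));      // true
    }
}
```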
------------- PR Review: https://git.openjdk.org/jdk/pull/14172#pullrequestreview-1564605775 PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1285432374 PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1285420960 PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1285421697 PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1285413991 PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1285419421 PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1285420190 PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1285438292 From chagedorn at openjdk.org Mon Aug 7 09:17:51 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Aug 2023 09:17:51 GMT Subject: RFR: 8305636: Expand and clean up predicate classes and move them into separate files [v4] In-Reply-To: References: Message-ID: <4N9cXrbYycEO6CFkhAgkElGGR47Q-M88Bl83r7DyOZY=.e3d8b2e4-d728-4602-b7d5-4f29272871d4@github.com> On Thu, 3 Aug 2023 08:21:57 GMT, Christian Hagedorn wrote: >> This is the third clean-up PR towards fixing issues with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). This patch does not change anything in the way the old Assertion Predicates work. >> >> After collecting and moving the predicate code in the last clean-up PR https://github.com/openjdk/jdk/pull/14017 to the classes `Predicates/ParsePredicates`, I'm now completely moving the code to separate `predicates.cpp/hpp` files. By doing so, I also updated the predicate description and updated some namings. Since this description is also moved to the new files, I've committed the description update separately to better reflect these changes. >> >> Changes include: >> - Moved `Predicates/ParsePredicates` classes to new files `predicates.cpp/hpp`. 
>> - Turning the `Predicates` utility class into a real class to represent all predicates: >> - Contains three `PredicateBlock` fields for each Predicate Block (see description of `Predicate Block`). >> - The `PredicateBlock` class offers methods to query the presence of predicates and to access them (e.g. get the Parse Predicate projection). >> - In the process, the `ParsePredicates` could be removed as the Parse Predicates are now covered by the `PredicateBlock` class. >> - New `AssertionPredicatesWithHalt` class to skip over Assertion Predicates (will be further cleaned up later with the complete fix in JDK-8288981). >> - Updated predicate description and moved to `predicates.hpp`. >> - While testing a prototype fix of JDK-8288981, I've came to the conclusion that we should not move all Assertion Predicates to a separate block below the Parse and Hoisted Predicates, because it prevented further application of Loop Predication due to pins of data nodes to these Assertion Predicates while the Hoisted Predicates needed them above the Assertion Predicates (i.e. dominance problems leading to bad graph assertions). I've removed that part of the description that gave a heads-up about that change. >> - Small clean-ups such as variable renaming or code move. >> >> Not included: >> - Refactoring predicate traversal to clone/copy/initialize predicates for loop unswitching, pre/main/post, loop peeling etc. (this is only done in the actual fix in JDK-8288981 which requires some updates anyways - so this refactoring is not done here (yet)). >> >> Testing: Tier1-7, hs-precheckin-comp, hs-comp-stress >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: > > - Merge branch 'master' into JDK-8305636 > - Update src/hotspot/share/opto/predicates.hpp > > Co-authored-by: Tobias Hartmann > - Renaming Hoisted Predicate -> Hoisted Check Predicate in description and comments as discussed offline with Tobias, fixing additional typos in description > - 8305636: Expand and clean up predicate classes and move them into separate files > - Update description Additional testing after merging with master looked good. Thanks @TobiHartmann and @rwestrel again for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14814#issuecomment-1667491798 From chagedorn at openjdk.org Mon Aug 7 09:17:53 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Aug 2023 09:17:53 GMT Subject: Integrated: 8305636: Expand and clean up predicate classes and move them into separate files In-Reply-To: References: Message-ID: On Mon, 10 Jul 2023 15:17:37 GMT, Christian Hagedorn wrote: > This is the third clean-up PR towards fixing issues with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). This patch does not change anything in the way the old Assertion Predicates work. > > After collecting and moving the predicate code in the last clean-up PR https://github.com/openjdk/jdk/pull/14017 to the classes `Predicates/ParsePredicates`, I'm now completely moving the code to separate `predicates.cpp/hpp` files. By doing so, I also updated the predicate description and updated some namings. Since this description is also moved to the new files, I've committed the description update separately to better reflect these changes. > > Changes include: > - Moved `Predicates/ParsePredicates` classes to new files `predicates.cpp/hpp`. > - Turning the `Predicates` utility class into a real class to represent all predicates: > - Contains three `PredicateBlock` fields for each Predicate Block (see description of `Predicate Block`). 
> - The `PredicateBlock` class offers methods to query the presence of predicates and to access them (e.g. get the Parse Predicate projection). > - In the process, the `ParsePredicates` could be removed as the Parse Predicates are now covered by the `PredicateBlock` class. > - New `AssertionPredicatesWithHalt` class to skip over Assertion Predicates (will be further cleaned up later with the complete fix in JDK-8288981). > - Updated predicate description and moved to `predicates.hpp`. > - While testing a prototype fix of JDK-8288981, I've came to the conclusion that we should not move all Assertion Predicates to a separate block below the Parse and Hoisted Predicates, because it prevented further application of Loop Predication due to pins of data nodes to these Assertion Predicates while the Hoisted Predicates needed them above the Assertion Predicates (i.e. dominance problems leading to bad graph assertions). I've removed that part of the description that gave a heads-up about that change. > - Small clean-ups such as variable renaming or code move. > > Not included: > - Refactoring predicate traversal to clone/copy/initialize predicates for loop unswitching, pre/main/post, loop peeling etc. (this is only done in the actual fix in JDK-8288981 which requires some updates anyways - so this refactoring is not done here (yet)). > > Testing: Tier1-7, hs-precheckin-comp, hs-comp-stress > > Thanks, > Christian This pull request has now been integrated. 
Changeset: dc016047 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/dc01604756c22889412f9f25b534488180327317 Stats: 1182 lines in 10 files changed: 566 ins; 470 del; 146 mod 8305636: Expand and clean up predicate classes and move them into separate files Reviewed-by: thartmann, roland ------------- PR: https://git.openjdk.org/jdk/pull/14814 From pli at openjdk.org Mon Aug 7 09:53:43 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 7 Aug 2023 09:53:43 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v3] In-Reply-To: References: Message-ID: > This patch removes `@require vm.flagless` annotations from HotSpot jtreg tests in `compiler/vectorization/runner`. All jtreg cases in this folder are invoked by test driver `VectorizationTestRunner.java` which checks both correctness and vectorizability (IR) for each test method. We added flagless requirement before because extra flags may mess with compiler control in the test driver for correctness check. But `flagless` has a side effect that it makes tests with extra flags skipped. So we propose to get rid of it now. > > To adapt the removal of `@require vm.flagless`, a few checks are added in the test driver to skip the correctness check if extra flags make the compiler control not work. This patch also moves previously hard-coded flag `-XX:-OptimizeFill` in the test driver to conditions in IR checks. > > Tested various of compiler control related VM flags on x86 and AArch64. 
Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: Revert to the 1st commit and re-address comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15011/files - new: https://git.openjdk.org/jdk/pull/15011/files/ac509680..5bb67000 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15011&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15011&range=01-02 Stats: 63 lines in 23 files changed: 17 ins; 9 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/15011.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15011/head:pull/15011 PR: https://git.openjdk.org/jdk/pull/15011 From pli at openjdk.org Mon Aug 7 09:56:31 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 7 Aug 2023 09:56:31 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v2] In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 20:10:19 GMT, Vladimir Kozlov wrote: >> Hi @vnkozlov , >> >> Thanks for your reply. But it still has problems. >> >>> About your change to allow -Xbatch. Let me clarify, if you exclude -Xcomp mode (which I agree with) by checking UseInterpreter flag for true, then a method could be always executed in Interpeter to get reference result (even with -XX:CompileThreshold=100) by calling method once first (we do that in other tests). >> >>> You don't need to call WB.lockCompilation() if you exclude -Xcomp mode. There will be no compilation requests for called method when you call the method first time because compilation threshold will not be reached - it is guarantee that method will be executed in Interpreter. And you have the assert to verify that. >> >> These tests are a bit different because we test loops. If the loop iteration count reaches some threshold, the loop will be *OSR compiled* even test method is called only once. I just did an experiment according to your suggestion. 
After removing `WB.lockCompilation()` and updating loop iteration count to 100,000, I got assertion failure that tells me the test method is NOT running in interpreter. >> >> >> STDERR: >> java.lang.AssertionError >> at compiler.vectorization.runner.VectorizationTestRunner.runTestOnMethod(VectorizationTestRunner.java:131) >> at compiler.vectorization.runner.VectorizationTestRunner.run(VectorizationTestRunner.java:73) >> at compiler.vectorization.runner.VectorizationTestRunner.main(VectorizationTestRunner.java:215) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) >> at java.base/java.lang.reflect.Method.invoke(Method.java:580) >> at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138) >> at java.base/java.lang.Thread.run(Thread.java:1570) >> >> >> A solution to this may be adding one more check of `CICompileOSR` is OFF if we still want to use interpreted execution for the reference result. >> >> Now the question is, which verification approach do you think is better? "C2 vs. interpreted" or "C2 vs. C1"? > >> A solution to this may be adding one more check of `CICompileOSR` is OFF if we still want to use interpreted execution for the reference result. > > I would suggest to use `WB.setBooleanVMFlag("CICompileOSR", false);`. But it is debug flag which can be set only in debug VM. There are may be other product flags you can temporary set to avoid compilation without locking. > >> >> Now the question is, which verification approach do you think is better? "C2 vs. interpreted" or "C2 vs. C1"? > > We usually use Interpreter as gold standard. Hi @vnkozlov , Thanks for your comments. I have reverted the patch to my 1st commit and re-addressed your comments. > I would suggest to use WB.setBooleanVMFlag("CICompileOSR", false);. But it is debug flag which can be set only in debug VM. There are may be other product flags you can temporary set to avoid compilation without locking. 
In my new commit, I choose to set and restore `UseCompiler` before and after the interpreter run. I have re-tested various of compiler control options and no jtreg timeout is seen now. Please let me know if this looks good to you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15011#issuecomment-1667552388 From shade at openjdk.org Mon Aug 7 10:23:30 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 7 Aug 2023 10:23:30 GMT Subject: RFR: 8313676: Amend TestLoadIndexedMismatch test to target intrinsic directly In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 14:39:40 GMT, Aleksey Shipilev wrote: > Thanks! Any more reviewers, or is this trivial enough to go in? Ping? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15136#issuecomment-1667592462 From rehn at openjdk.org Mon Aug 7 10:57:57 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 7 Aug 2023 10:57:57 GMT Subject: RFR: 8295795: hsdis does not build with binutils 2.39+ [v3] In-Reply-To: References: Message-ID: > Hi please consider. > > This works with 2.30, 2.34, 2.38, 2.39, 2.40, 2.41 and current master head. (tested x64 and some RV) > > There are 4 changes in binutils we work around. > - zstd compressed debug sections > - libsframe added > - init_disassemble_info() change > - libbfd.a is only present in .lib directory in newer binutils builds (older it is in both directories) (I think the issue is that we never do make install, thus have dependency on internal artifact placement) > > Specific to RV, there is a bug in binutils causing the standard extensions not being added to disassembler if we pass in NULL. > > This no way near perfect, but at least we can build hsdis with any contemporary binutils. > > Todo better I think we need to build and install binutils to check the version and then use that version to figure out what options to use when re-building and re-installing binutils for hsdis. 
> > I asked tool-chain people about our issues, they said, you can't do that. I.e. have source dependencies on many binutils versions. > > As RV is new and have new instructions added to it frequently we really need to be able to build with bleeding-edge binutils. (capstone RV is not actively worked on, llvm have many more dependencies) Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Reverted bad change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15138/files - new: https://git.openjdk.org/jdk/pull/15138/files/a8ce2d37..b3c81f88 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15138&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15138&range=01-02 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15138.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15138/head:pull/15138 PR: https://git.openjdk.org/jdk/pull/15138 From chagedorn at openjdk.org Mon Aug 7 11:18:30 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 7 Aug 2023 11:18:30 GMT Subject: RFR: 8313676: Amend TestLoadIndexedMismatch test to target intrinsic directly In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 10:43:34 GMT, Aleksey Shipilev wrote: > See the bug for the reasons. Basically, we want to target the intrinsic directly, to avoid the dependence on the JDK code shape. > > Additional testing: > - [x] mainline: test is still sensitive to JDK-8313402 fix > - [x] 17u: test is _now_ sensitive to JDK-8313402 fix Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15136#pullrequestreview-1565104855 From shade at openjdk.org Mon Aug 7 11:29:36 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 7 Aug 2023 11:29:36 GMT Subject: RFR: 8313676: Amend TestLoadIndexedMismatch test to target intrinsic directly In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 10:43:34 GMT, Aleksey Shipilev wrote: > See the bug for the reasons. Basically, we want to target the intrinsic directly, to avoid the dependence on the JDK code shape. > > Additional testing: > - [x] mainline: test is still sensitive to JDK-8313402 fix > - [x] 17u: test is _now_ sensitive to JDK-8313402 fix Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15136#issuecomment-1667677699 From shade at openjdk.org Mon Aug 7 11:29:37 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 7 Aug 2023 11:29:37 GMT Subject: Integrated: 8313676: Amend TestLoadIndexedMismatch test to target intrinsic directly In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 10:43:34 GMT, Aleksey Shipilev wrote: > See the bug for the reasons. Basically, we want to target the intrinsic directly, to avoid the dependence on the JDK code shape. > > Additional testing: > - [x] mainline: test is still sensitive to JDK-8313402 fix > - [x] 17u: test is _now_ sensitive to JDK-8313402 fix This pull request has now been integrated. 
Changeset: 4b192a8d Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/4b192a8dc37297f0746c0c68322e0168d9f47771 Stats: 7 lines in 2 files changed: 5 ins; 0 del; 2 mod 8313676: Amend TestLoadIndexedMismatch test to target intrinsic directly Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/15136 From duke at openjdk.org Mon Aug 7 11:45:27 2023 From: duke at openjdk.org (Tobias Hotz) Date: Mon, 7 Aug 2023 11:45:27 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v5] In-Reply-To: References: Message-ID: > This patch adds peephole rules to remove TEST instructions that operate on the result of, and immediately follow, an AND, XOR or OR instruction. > This pattern can emerge if the result of the AND is compared against two values where one of the values is zero. The matcher does not know that the preceding instruction already sets the flags register to the same state. > According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to the same values for TEST, AND, XOR and OR, so this should be safe. > By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: > Results on Intel Core i5-8250U CPU > Before this patch: > > Benchmark Mode Cnt Score Error Units > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op > TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ±
0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op > TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > > After this patch: > > Benchmark Mode Cnt Score Error Units Change > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.074 ± 0.011... Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: Adress review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14172/files - new: https://git.openjdk.org/jdk/pull/14172/files/af934150..aae31d2d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14172&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14172&range=03-04 Stats: 10 lines in 5 files changed: 0 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/14172.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14172/head:pull/14172 PR: https://git.openjdk.org/jdk/pull/14172 From duke at openjdk.org Mon Aug 7 11:45:31 2023 From: duke at openjdk.org (Tobias Hotz) Date: Mon, 7 Aug 2023 11:45:31 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v4] In-Reply-To: References: Message-ID: On Mon, 7 Aug 2023 06:15:16 GMT, Jorn Vernee wrote: >> Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: >> >>
Exchange and for or in the tests >> >> and will get matched to a test_reg_reg, so it was pointless > > src/hotspot/share/adlc/output_c.cpp line 4019: > >> 4017: if (inst->_flag != nullptr) { >> 4018: Flag* node = inst->_flag; >> 4019: const char* prefix = "Node::"; > > You could potentially make the prefix here `Node::PD::`, then the extra `PD::` could be removed from the .ad file (I don't think it really adds much?). Well I thought that it could also be used to add some non-arch specific flags in the future, and with this the keyword would be more generic and allow for this as well. > src/hotspot/share/adlc/output_c.cpp line 4023: > >> 4021: do { >> 4022: if (!node_flags_set) { >> 4023: fprintf(fp_cpp, "%s node->add_flag(%s%s", indent, strncmp(node->_name, prefix, strlen(prefix)) != 0 ? prefix : "", node->_name); > > This seems to be guarding against a case where the flag is declared with the prefix already in the .ad file. Is this required for something? > > (Otherwise I suggest just using `node->_name` here, as it forces the flag declarations in the .ad file to be consistent). 
No, this is just a leftover from an earlier design, I'll remove the check and always add the prefix ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1285750656 PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1285749667 From jvernee at openjdk.org Mon Aug 7 13:04:34 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 7 Aug 2023 13:04:34 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v4] In-Reply-To: References: Message-ID: On Mon, 7 Aug 2023 11:39:10 GMT, Tobias Hotz wrote: >> src/hotspot/share/adlc/output_c.cpp line 4019: >> >>> 4017: if (inst->_flag != nullptr) { >>> 4018: Flag* node = inst->_flag; >>> 4019: const char* prefix = "Node::"; >> >> You could potentially make the prefix here `Node::PD::`, then the extra `PD::` could be removed from the .ad file (I don't think it really adds much?). > > Well I thought that it could also be used to add some non-arch specific flags in the future, and with this the keyword would be more generic and allow for this as well. Ok, that sounds good. Keep it then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1285838828 From thartmann at openjdk.org Mon Aug 7 13:21:31 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 7 Aug 2023 13:21:31 GMT Subject: RFR: 8313421: [JVMCI] avoid locking class loader in CompilerToVM.lookupType In-Reply-To: References: Message-ID: On Wed, 2 Aug 2023 20:33:49 GMT, Doug Simon wrote: > This PR removes the need to lock the system class loader when converting Class instances for boot and platform classes to ResolvedJavaType objects. Not only is the system class loader a suboptimal loader for resolving these classes but locking it can cause deadlock in some JDK tests (e.g. `test/jdk/java/lang/System/LoggerFinder/`) when run with `-Xcomp`. 
For example, a thread that holds the system class loader lock and triggers a blocking compilation will deadlock with the compiler thread servicing the compilation if the compilation requires calling `CompilerToVM.lookupType` (which most compilations do). Looks good to me but I'm not an expert in this code. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15128#pullrequestreview-1565314386 From kvn at openjdk.org Mon Aug 7 16:02:32 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 7 Aug 2023 16:02:32 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v3] In-Reply-To: References: Message-ID: On Mon, 7 Aug 2023 09:53:43 GMT, Pengfei Li wrote: >> This patch removes `@require vm.flagless` annotations from HotSpot jtreg tests in `compiler/vectorization/runner`. All jtreg cases in this folder are invoked by test driver `VectorizationTestRunner.java` which checks both correctness and vectorizability (IR) for each test method. We added flagless requirement before because extra flags may mess with compiler control in the test driver for correctness check. But `flagless` has a side effect that it makes tests with extra flags skipped. So we propose to get rid of it now. >> >> To adapt the removal of `@require vm.flagless`, a few checks are added in the test driver to skip the correctness check if extra flags make the compiler control not work. This patch also moves previously hard-coded flag `-XX:-OptimizeFill` in the test driver to conditions in IR checks. >> >> Tested various of compiler control related VM flags on x86 and AArch64. > > Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: > > Revert to the 1st commit and re-address comments This looks good. ------------- Marked as reviewed by kvn (Reviewer). 
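In the 8309697 thread, Pengfei describes setting `UseCompiler` to false before the interpreter reference run and restoring it afterwards. He reports that this avoids the OSR compilation that broke the earlier approach. The sketch below shows only that save/restore pattern. A real jtreg test would use the WhiteBox API (the thread mentions `WB.setBooleanVMFlag`). The `VmFlags` interface and the in-memory `FakeFlags` stand-in are hypothetical, so the pattern can be exercised outside a VM. The `try`/`finally` ensures the flag is restored even if the reference run throws.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical accessor for boolean VM flags; a jtreg test would go through WhiteBox instead.
interface VmFlags {
    boolean getBoolean(String name);
    void setBoolean(String name, boolean value);
}

final class ReferenceRun {
    // Disable the compilers for the duration of the task, then restore the previous value
    // even if the task fails, so later C2 runs are unaffected.
    static <T> T interpreted(VmFlags flags, Supplier<T> task) {
        boolean saved = flags.getBoolean("UseCompiler");
        flags.setBoolean("UseCompiler", false);
        try {
            return task.get();
        } finally {
            flags.setBoolean("UseCompiler", saved);
        }
    }
}

// In-memory stand-in so the pattern can be exercised outside a real VM.
final class FakeFlags implements VmFlags {
    private final Map<String, Boolean> values = new HashMap<>();

    @Override
    public boolean getBoolean(String name) {
        Boolean v = values.get(name);
        if (v == null) {
            throw new IllegalStateException("unknown flag " + name);
        }
        return v;
    }

    @Override
    public void setBoolean(String name, boolean value) {
        values.put(name, value);
    }
}
```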
PR Review: https://git.openjdk.org/jdk/pull/15011#pullrequestreview-1565682967 From xliu at openjdk.org Mon Aug 7 18:40:48 2023 From: xliu at openjdk.org (Xin Liu) Date: Mon, 7 Aug 2023 18:40:48 GMT Subject: RFR: 8312420: Integrate Graal's blender micro benchmark In-Reply-To: <3Tt1Oj75h-pOB0gIKdkQIuugSWz0hodGdb7YZmNtZ6g=.f065670a-abc5-4e71-96d5-f935656f66bd@github.com> References: <5sj7hpmmChUitKVYH-je8xq3AAA_GkjcFXJl6uGnGQc=.59e78f75-12a8-4b7b-8817-627f65149718@github.com> <3Tt1Oj75h-pOB0gIKdkQIuugSWz0hodGdb7YZmNtZ6g=.f065670a-abc5-4e71-96d5-f935656f66bd@github.com> Message-ID: <6AlWt5ppfT4phpkHQLZYuiEJtLViZaoOUaBgu9qdiaw=.ab5eca49-873f-4333-a7bb-41bf5d70525b@github.com> On Fri, 21 Jul 2023 07:44:18 GMT, Joshua Cao wrote: >> We would like to integrate Graal's blender micro benchmark from https://www.graalvm.org/22.1/examples/java-performance-examples/. We have been using this benchmark to test our partial escape analysis work (https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2023-July/066670.html). This test can exist independently of the project. 
>> >> >> example command to run test: >> >> >> make run-test TEST=micro:org.openjdk.bench.vm.compiler.pea.Blender MICRO="FORK=1;OPTIONS=-prof gc -gc true" >> >> >> example output (not complete): >> >> >> Benchmark (iteration) Mode Cnt Score Error Units >> Blender.initialize 1 avgt 227997775.000 ns/op >> Blender.initialize:·gc.alloc.rate 1 avgt 167.192 MB/sec >> Blender.initialize:·gc.alloc.rate.norm 1 avgt 40000081.600 B/op >> Blender.initialize:·gc.count 1 avgt 4.000 counts >> Blender.initialize:·gc.time 1 avgt 65.000 ms >> Blender.initialize 2 avgt 226255767.800 ns/op >> Blender.initialize:·gc.alloc.rate 2 avgt 168.466 MB/sec >> Blender.initialize:·gc.alloc.rate.norm 2 avgt 40000081.600 B/op >> Blender.initialize:·gc.count 2 avgt 4.000 counts >> Blender.initialize:·gc.time 2 avgt 58.000 ms >> Blender.initialize 3 avgt 225596324.600 ns/op >> Blender.initialize:·gc.alloc.rate 3 avgt 168.960 MB/sec >> Blender.initialize:·gc.alloc.rate.norm 3 avgt 40000081.600 B/op >> Blender.initialize:·gc.count 3 avgt 4.000 counts >> Blender.initialize:·gc.time 3 avgt 55.000 ms >> Blender.initialize 4 avgt 224856811.000 ns/op >> Blender.initialize:·gc.alloc.rate 4 avgt 169.520 MB/sec >> Blender.initialize:·gc.alloc.rate.norm 4 avgt 40000081.600 B/op >> Blender.initialize:·gc.count 4 avgt ... > > Can we still merge this into OpenJDK? For example, I can close this PR, leave the JBS issue open, and let someone at Oracle author the patch. Would folks at Oracle want to integrate this benchmark into OpenJDK? Hi @caojoshua, I don't understand the purpose of the parameter 'iteration'. Are you trying to differentiate the possibility of the predicate `(color.r + color.g + color.b) % 42 == 0`? All 11 candidate values are less than 20, so `iteration / 20` is always 0, right? Do we expect to see any difference in `Blender.initialize:·gc.alloc.rate.norm` if we run it with GraalVM?
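Xin Liu's point rests on Java integer division, which truncates toward zero. For every value in [0, 20), `iteration / 20` is 0, so that term cannot vary across the benchmark's parameter values. The check below exhaustively confirms that arithmetic claim. It does not reproduce the benchmark's actual `@Param` list, which is not quoted here. `IterationDivision` is a hypothetical name.

```java
import java.util.stream.IntStream;

final class IterationDivision {
    // True if i / divisor == 0 for every i in [0, limitExclusive); Java int division truncates toward zero.
    static boolean allDivideToZero(int limitExclusive, int divisor) {
        return IntStream.range(0, limitExclusive).allMatch(i -> i / divisor == 0);
    }
}
```

Any parameter value of 20 or more would change the quotient, since `20 / 20 == 1`.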
------------- PR Comment: https://git.openjdk.org/jdk/pull/14941#issuecomment-1668400563 From btaylor at openjdk.org Mon Aug 7 20:24:01 2023 From: btaylor at openjdk.org (Ben Taylor) Date: Mon, 7 Aug 2023 20:24:01 GMT Subject: RFR: 8312597: Convert TraceTypeProfile to UL Message-ID: This PR adds the output from `-XX:+TraceTypeProfile` to the `jit` and `inlining` tags in unified logging. It also adds minimal tests for `-XX:+TraceTypeProfile` and `-Xlog:jit*=debug`. Change passes tier1 tests. ------------- Commit messages: - 8312597: Convert TraceTypeProfile to UL Changes: https://git.openjdk.org/jdk/pull/15167/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15167&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312597 Stats: 93 lines in 3 files changed: 86 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/15167.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15167/head:pull/15167 PR: https://git.openjdk.org/jdk/pull/15167 From duke at openjdk.org Mon Aug 7 21:01:51 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 7 Aug 2023 21:01:51 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v20] In-Reply-To: References: Message-ID: <76QbMpTJL41HzLBGljF4qze4cGI6JR9hVYvqbnqc2I0=.32b9f770-04e1-4d7f-913e-5e36ff2a96b6@github.com> > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows upto ~13x improvement for 32-bit datatypes (int, float) and upto 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. 
> > **Arrays.sort performance data using JMH benchmarks** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | > | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | > | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | > | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | > | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | > | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | > | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | > | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | > | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | > | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | > | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | > | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | > | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | > | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | > | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | > | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: change names from avx512 to x86_64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/13f4aaf4..c49657ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=18-19 Stats: 13 lines in 1 file changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From jvernee at openjdk.org Tue Aug 8 08:59:34 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 8 Aug 2023 08:59:34 GMT Subject: RFR: 8312213: Remove unnecessary TEST 
instructions on x86 when flags reg will already be set [v5] In-Reply-To: References: Message-ID: <9JtUlhmYV2VspjO_0iYBl78q2wMfe8X6sjDs7fzxRIg=.8569d1d5-fc89-4885-b6b9-e8c5ac1f5c86@github.com> On Mon, 7 Aug 2023 11:45:27 GMT, Tobias Hotz wrote:
>> This patch adds peephole rules to remove TEST instructions that operate on the result of, and immediately follow, an AND, XOR or OR instruction.
>> This pattern can emerge if the result of the AND is compared against two values, one of which is zero. The matcher cannot know that the preceding instruction has already set the flags register to the same value.
>> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test, TEST, AND, XOR and OR set the flags to the same values, so this should be safe.
>> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed:
>> Results on Intel Core i5-8250U CPU
>> Before this patch:
>>
>> Benchmark Mode Cnt Score Error Units
>> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
>>
>> After this patch:
>>
>> Benchmark Mode Cnt Score Error Units Change
>> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster
>> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged)
>> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged)
>> TestRemovalPeephole.bench...
>
> Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision:
>
> Adress review comments

The new test is failing in our CI. The failing cases are:

testIntAddtionEquals0(int,int)
testIntAddtionNotEquals0(int,int)
testIntOrEquals0(int,int)
testIntOrGreater0(int,int)
testIntOrNotEquals0(int,int)
testLongOrGreater0(long,long)

It looks like all of them fail because of interfering `loadConI0` or `loadConL0` nodes, e.g. for `testIntAddtionEquals0(int,int)`:

RDX 11 addI_rReg === _ 15 16 [[ 12 10 ]]
RFLAGS 12 MachProj === 11 [[ ]] #1
RAX 13 loadConI0 === 1 [[ 14 9 ]] #0/0x00000000
RFLAGS 14 MachProj === 13 [[ ]] #1
RFLAGS 10 testI_reg === _ 11 [[ 9 ]] #0/0x00000000
RAX 9 cmovI_imm_01 === _ 10 13 [[ 2 ]] ne#1/0x00000001 !jvms: TestTestRemovalPeephole::testIntAddtionEquals0 @ bci:13 (line 50)

The failure occurs in the same way on several platforms.
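The safety argument quoted above relies on the documented flag behavior of these instructions: AND, OR, XOR and TEST all clear CF and OF, set ZF, SF and PF from their result, and leave AF undefined. The Java sketch below is a simplified model of those rules, not HotSpot code; the class and method names are hypothetical, and it needs Java 16+ for the record. It checks that a `TEST r, r` issued right after one of these instructions would reproduce the same modeled flags. The model says nothing about ADD, which derives CF and OF from the arithmetic and therefore does not, in general, leave the same flags that TEST would.

```java
public class LogicFlagsSketch {
    // Simplified model of the status flags that AND, OR, XOR and TEST define.
    // AF is left out because it is undefined for these instructions.
    record Flags(boolean zf, boolean sf, boolean pf, boolean cf, boolean of) {}

    // All four instructions clear CF and OF and derive ZF, SF and PF from the result.
    static Flags logicFlags(int result) {
        boolean zf = result == 0;
        boolean sf = result < 0;                               // sign bit of the result
        boolean pf = Integer.bitCount(result & 0xFF) % 2 == 0; // even parity of the low byte
        return new Flags(zf, sf, pf, false, false);
    }

    // Flags left by "TEST r, r" when r holds the result of the previous instruction.
    static Flags testSelf(int r) {
        return logicFlags(r & r);
    }

    // True if, for every pair of sample operands, TEST r, r would leave the
    // same modeled flags as the preceding AND, OR or XOR.
    static boolean testIsRedundant(int[] samples) {
        for (int a : samples) {
            for (int b : samples) {
                for (int r : new int[] {a & b, a | b, a ^ b}) {
                    if (!logicFlags(r).equals(testSelf(r))) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x80, 0x7FFFFFFF, Integer.MIN_VALUE, 0x1234};
        System.out.println("TEST redundant in model: " + testIsRedundant(samples));
    }
}
```

Because the flags depend only on the result value, the check is trivially true in this model; the point is that no instruction-specific flag (such as a carry) is involved for these four opcodes.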
------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1669203505 From thartmann at openjdk.org Tue Aug 8 11:00:01 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Aug 2023 11:00:01 GMT Subject: RFR: 8313345: SuperWord fails due to CMove without matching Bool pack Message-ID: <-wxTbM1_ST7Nh5NxKFlbQRkOV1RyVAST_ejnXA6sYE8=.dce6359c-5b82-43ad-a6ce-70d1de73e296@github.com> SuperWord fails after [JDK-8306302](https://bugs.openjdk.org/browse/JDK-8306302), when trying to convert `Bool + Cmp + CMove` packs into `VectorMaskCmp + VectorBlend` because it does not find the `Bool` (and `Cmp`) packs for a `CMoveD`: After filter_packs packset Pack: 0 align: 0 674 StoreD === 691 695 678 675 [[ 669 672 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=603,364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) align: 8 669 StoreD === 691 674 673 670 [[ 603 606 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) align: 16 603 StoreD === 691 669 607 604 [[ 367 364 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) align: 24 364 StoreD === 691 603 368 428 [[ 695 363 514 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) Pack: 1 align: 0 677 LoadD === 525 695 678 [[ 675 676 676 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double 
!orig=606,367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) align: 8 672 LoadD === 525 674 673 [[ 670 671 671 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) align: 16 606 LoadD === 525 669 607 [[ 604 605 605 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) align: 24 367 LoadD === 525 603 368 [[ 366 366 428 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) Pack: 2 align: 0 675 CMoveD === _ 327 676 677 [[ 674 ]] #double !orig=604,428,[393],278 !jvms: Reproducer2$A::fill @ bci:14 (line 16) align: 8 670 CMoveD === _ 327 671 672 [[ 669 ]] #double !orig=428,[393],278 !jvms: Reproducer2$A::fill @ bci:14 (line 16) align: 16 604 CMoveD === _ 327 605 606 [[ 603 ]] #double !orig=428,[393],278 !jvms: Reproducer2$A::fill @ bci:14 (line 16) align: 24 428 CMoveD === _ 327 366 367 [[ 364 ]] #double !orig=[393],278 !jvms: Reproducer2$A::fill @ bci:14 (line 16) Pack: 3 align: 0 676 MulD === _ 677 677 [[ 675 ]] !orig=605,366,273 !jvms: Reproducer2$A::transform @ bci:2 (line 21) Reproducer2$A::fill @ bci:14 (line 16) align: 8 671 MulD === _ 672 672 [[ 670 ]] !orig=366,273 !jvms: Reproducer2$A::transform @ bci:2 (line 21) Reproducer2$A::fill @ bci:14 (line 16) align: 16 605 MulD === _ 606 606 [[ 604 ]] !orig=366,273 !jvms: Reproducer2$A::transform @ bci:2 (line 21) Reproducer2$A::fill @ bci:14 (line 16) align: 24 366 MulD === _ 367 367 [[ 428 ]] !orig=273 !jvms: Reproducer2$A::transform @ bci:2 (line 21) Reproducer2$A::fill @ bci:14 (line 16) In the failing case, both the `Cmp` and the `Bool` are outside of the loop. I propose to detect this case in `SuperWord::profitable` and simply bail out. 
This is obviously not a profitability check, but the fix for [JDK-8306302](https://bugs.openjdk.org/browse/JDK-8306302) already added similar checks just above. This should be refactored at some point. @eme64, I think you had plans for that, right?

Looking at https://github.com/openjdk/jdk/pull/13493, I noticed the following statement:

> From what I understand, we currently never introduce a CMoveF/D, unless asked for by UseCMoveUnconditionally (C->use_c_move()). If the flag is set, we attribute no cost to the CMove, else we take Matcher::float_cmove_cost(), which seems to be ConditionalMoveLimit, and so the Phi is never converted into a CMove.

This is not true, because `Matcher::float_cmove_cost()` is `0` on AArch64 and RISC-V: https://github.com/openjdk/jdk/blob/055b4b426cbc56d97e82219f3dd3aba1ebf977e4/src/hotspot/cpu/aarch64/matcher_aarch64.hpp#L70-L73 The statement does hold on the other platforms, where we would need to set `-XX:+UseCMoveUnconditionally` to avoid the bailout. I added runs for both cases to the test.

As also stated in https://github.com/openjdk/jdk/pull/13493, this only affects `CMoveF` and `CMoveD`. For other `CMove` nodes, we bail out during `filter_packs` with "Unimplemented" because `VectorNode::implemented` -> `VectorNode::opcode` only handles `Op_CMoveF` and `Op_CMoveD`: https://github.com/openjdk/jdk/blob/055b4b426cbc56d97e82219f3dd3aba1ebf977e4/src/hotspot/share/opto/vectornode.cpp#L84-L87

I first tried to bail out in `SuperWord::output`, but that does not work because the graph has already been modified at that point: we hit an assert in IGVN due to a vector vs. non-vector type mismatch.
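The packset dump above (StoreD, LoadD, CMoveD and MulD packs, where every CMoveD takes the same condition node, 327, from outside the loop) suggests a loop of roughly the following shape. This Java sketch is a hypothetical illustration, not the actual `Reproducer2` test; the class and method names are made up. Whether C2 turns the select into a `CMoveD` at all (rather than, say, keeping a branch) depends on profiling, on the platform's `Matcher::float_cmove_cost()` and on flags such as `-XX:+UseCMoveUnconditionally`, as discussed above.

```java
import java.util.Arrays;

public class InvariantConditionLoopSketch {
    // Hypothetical loop shape: 'keep' does not change inside the loop, so the
    // compare and Bool that test it can be placed outside the loop, while the
    // per-element select (a CMoveD candidate) stays inside it.
    static void fill(double[] a, boolean keep) {
        for (int i = 0; i < a.length; i++) {
            double v = a[i];          // LoadD
            a[i] = keep ? v : v * v;  // MulD feeding a select, then StoreD
        }
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0, 4.0};
        fill(a, false);
        System.out.println(Arrays.toString(a)); // [1.0, 4.0, 9.0, 16.0]
    }
}
```

SuperWord can pack the loads, multiplies, selects and stores of such a loop, but there is no packed `Bool`/`Cmp` to turn into a `VectorMaskCmp`, which is the situation the proposed check rejects.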
In general, I don't understand how the `do_reserve_copy` bailouts are supposed to work, because we might have already replaced nodes with vector nodes and the `do_reserve_copy` logic does not undo these changes: https://github.com/openjdk/jdk/blob/055b4b426cbc56d97e82219f3dd3aba1ebf977e4/src/hotspot/share/opto/superword.cpp#L2690-L2694 @eme64, I vaguely remember that we talked about this before. Did you observe a similar issue? Do we have a tracking bug for this broken bailout logic?

Big thanks to @SirYwell for the report and test case and to @eme64 for the initial investigation!

Thanks,
Tobias

------------- Commit messages: - Fix - 8313345: SuperWord fails due to CMove without matching Bool pack Changes: https://git.openjdk.org/jdk/pull/15189/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15189&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8313345 Stats: 79 lines in 2 files changed: 79 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15189.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15189/head:pull/15189 PR: https://git.openjdk.org/jdk/pull/15189 From chagedorn at openjdk.org Tue Aug 8 11:10:33 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 8 Aug 2023 11:10:33 GMT Subject: RFR: 8313345: SuperWord fails due to CMove without matching Bool pack In-Reply-To: <-wxTbM1_ST7Nh5NxKFlbQRkOV1RyVAST_ejnXA6sYE8=.dce6359c-5b82-43ad-a6ce-70d1de73e296@github.com> References: <-wxTbM1_ST7Nh5NxKFlbQRkOV1RyVAST_ejnXA6sYE8=.dce6359c-5b82-43ad-a6ce-70d1de73e296@github.com> Message-ID: On Tue, 8 Aug 2023 10:50:19 GMT, Tobias Hartmann wrote: > SuperWord fails after [JDK-8306302](https://bugs.openjdk.org/browse/JDK-8306302), when trying to convert `Bool + Cmp + CMove` packs into `VectorMaskCmp + VectorBlend` because it does not find the `Bool` (and `Cmp`) packs for a `CMoveD`: > > > After filter_packs > packset > Pack: 0 > align: 0 674 StoreD === 691 695 678 675 [[ 669 672 ]] @double[int:>=0]
(java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=603,364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > align: 8 669 StoreD === 691 674 673 670 [[ 603 606 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > align: 16 603 StoreD === 691 669 607 604 [[ 367 364 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > align: 24 364 StoreD === 691 603 368 428 [[ 695 363 514 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > Pack: 1 > align: 0 677 LoadD === 525 695 678 [[ 675 676 676 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=606,367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 8 672 LoadD === 525 674 673 [[ 670 671 671 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 16 606 LoadD === 525 669 607 [[ 604 605 605 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 24 367 LoadD === 525 603 368 [[ 366 366 428 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > Pack: 2 > align: 0 675 CMoveD === _ 327 676 677 [[ 674 ]] #double 
!orig=604,428,[393],278 !jvms: Reproducer2$A::fill @ bci:14 (line 16) > align: 8 670 CMoveD === _ 327 671 672 [... That looks reasonable to me. @eme64 should definitely also have a look at this. > This is obviously not a profitability check but the fix for [JDK-8306302](https://bugs.openjdk.org/browse/JDK-8306302) already added similar checks just above Makes sense to add it there as well. I agree that we should clean this up at some point. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15189#pullrequestreview-1567054348 From thartmann at openjdk.org Tue Aug 8 11:43:31 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Aug 2023 11:43:31 GMT Subject: RFR: 8313345: SuperWord fails due to CMove without matching Bool pack In-Reply-To: <-wxTbM1_ST7Nh5NxKFlbQRkOV1RyVAST_ejnXA6sYE8=.dce6359c-5b82-43ad-a6ce-70d1de73e296@github.com> References: <-wxTbM1_ST7Nh5NxKFlbQRkOV1RyVAST_ejnXA6sYE8=.dce6359c-5b82-43ad-a6ce-70d1de73e296@github.com> Message-ID: On Tue, 8 Aug 2023 10:50:19 GMT, Tobias Hartmann wrote: > SuperWord fails after [JDK-8306302](https://bugs.openjdk.org/browse/JDK-8306302), when trying to convert `Bool + Cmp + CMove` packs into `VectorMaskCmp + VectorBlend` because it does not find the `Bool` (and `Cmp`) packs for a `CMoveD`: > > > After filter_packs > packset > Pack: 0 > align: 0 674 StoreD === 691 695 678 675 [[ 669 672 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=603,364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > align: 8 669 StoreD === 691 674 673 670 [[ 603 606 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > align: 16 603 StoreD === 691 669 607 604 [[ 
367 364 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > align: 24 364 StoreD === 691 603 368 428 [[ 695 363 514 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > Pack: 1 > align: 0 677 LoadD === 525 695 678 [[ 675 676 676 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=606,367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 8 672 LoadD === 525 674 673 [[ 670 671 671 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 16 606 LoadD === 525 669 607 [[ 604 605 605 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 24 367 LoadD === 525 603 368 [[ 366 366 428 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > Pack: 2 > align: 0 675 CMoveD === _ 327 676 677 [[ 674 ]] #double !orig=604,428,[393],278 !jvms: Reproducer2$A::fill @ bci:14 (line 16) > align: 8 670 CMoveD === _ 327 671 672 [... Thanks for the review, Christian! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15189#issuecomment-1669453278 From dnsimon at openjdk.org Tue Aug 8 13:54:53 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 8 Aug 2023 13:54:53 GMT Subject: RFR: 8313421: [JVMCI] avoid locking class loader in CompilerToVM.lookupType [v2] In-Reply-To: References: Message-ID: <5y-ZYjEZQrFUVuAvDQTWVGb4hOtBVzzpnWoKmK_-GAY=.a0c10c2e-3a7c-418f-a71c-ed86ecda7eaa@github.com> > This PR removes the need to lock the system class loader when converting Class instances for boot and platform classes to ResolvedJavaType objects. Not only is the system class loader a suboptimal loader for resolving these classes, but locking it can also cause deadlocks in some JDK tests (e.g. `test/jdk/java/lang/System/LoggerFinder/`) when run with `-Xcomp`. For example, a thread that holds the system class loader lock and triggers a blocking compilation will deadlock with the compiler thread servicing the compilation if the compilation requires calling `CompilerToVM.lookupType` (which most compilations do). Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
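The deadlock described in the PR above is a lock cycle across two threads: the application thread holds the loader lock while waiting for the compiler thread, and the compiler thread needs that same lock. The sketch below models it with plain `java.util.concurrent` primitives; it is not JVMCI code. A `ReentrantLock` stands in for the system class loader's lock, a single-thread executor stands in for the compiler thread, and the method names are hypothetical. To keep the sketch from hanging, the simulated type lookup uses `tryLock` with a timeout and reports failure where the real lookup would block forever.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LoaderLockDeadlockSketch {
    // Stand-in for the system class loader's lock (hypothetical model).
    static final ReentrantLock loaderLock = new ReentrantLock();

    // Stand-in for a compilation that performs a type lookup requiring the
    // loader lock. Returns false where the real lookup would block forever.
    static boolean lookupTypeNeedingLoaderLock(long timeoutMs) throws InterruptedException {
        if (loaderLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            try {
                return true;
            } finally {
                loaderLock.unlock();
            }
        }
        return false;
    }

    // An application thread holds the loader lock and then waits for a blocking
    // compilation (as with -Xcomp) that runs on a separate "compiler" thread.
    static boolean lookupSucceedsWhileLoaderLocked() throws Exception {
        ExecutorService compilerThread = Executors.newSingleThreadExecutor();
        loaderLock.lock();
        try {
            Future<Boolean> compilation = compilerThread.submit(() -> lookupTypeNeedingLoaderLock(100));
            return compilation.get(); // the waiting thread still holds loaderLock here
        } finally {
            loaderLock.unlock();
            compilerThread.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Prints false: the lookup cannot take the lock held by the waiting thread.
        System.out.println(lookupSucceedsWhileLoaderLocked());
    }
}
```

In this model, removing the lock acquisition from the lookup path, which is what the PR does for boot and platform classes, breaks the cycle.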
The pull request contains two additional commits since the last revision: - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8313421 - avoid locking class loader in CompilerToVM.lookupType (JDK-8313421) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15128/files - new: https://git.openjdk.org/jdk/pull/15128/files/c32899db..86b6489a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15128&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15128&range=00-01 Stats: 17879 lines in 733 files changed: 8386 ins; 4137 del; 5356 mod Patch: https://git.openjdk.org/jdk/pull/15128.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15128/head:pull/15128 PR: https://git.openjdk.org/jdk/pull/15128 From duke at openjdk.org Tue Aug 8 14:02:08 2023 From: duke at openjdk.org (Tobias Hotz) Date: Tue, 8 Aug 2023 14:02:08 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v5] In-Reply-To: References: Message-ID: On Mon, 7 Aug 2023 11:45:27 GMT, Tobias Hotz wrote:
>> This patch adds peephole rules to remove TEST instructions that operate on the result of, and immediately follow, an AND, XOR or OR instruction.
>> This pattern can emerge if the result of the AND is compared against two values, one of which is zero. The matcher cannot know that the preceding instruction has already set the flags register to the same value.
>> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test, TEST, AND, XOR and OR set the flags to the same values, so this should be safe.
>> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed:
>> Results on Intel Core i5-8250U CPU
>> Before this patch:
>>
>> Benchmark Mode Cnt Score Error Units
>> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
>>
>> After this patch:
>>
>> Benchmark Mode Cnt Score Error Units Change
>> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster
>> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged)
>> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged)
>> TestRemovalPeephole.bench...
>
> Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision:
>
> Adress review comments

Hmm, yeah, then it is emitting a CMOV, which currently does not work for the reasons outlined above. I've changed the tests so that they always produce a branch.
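The follow-up commit ("Add a side effect to the IR tests to make sure we do not emit CMOVs there") describes how the tests were changed to always produce a branch. The Java sketch below is a hypothetical illustration of that idea, not the actual test code; the class, field and method names are made up. A store to a static field on one path of the `if` gives that path a memory side effect; our understanding, which depends on C2 internals, is that C2 does not if-convert such a diamond into a conditional move, so the compiled code keeps an AND, a flag check and a branch, the shape the peephole targets.

```java
public class BranchInsteadOfCmovSketch {
    // Hypothetical static field; the store below is the "side effect".
    static int sideEffect;

    // Without the store, both paths would only produce a value, a shape that
    // C2 may compile to a CMOV when it has no profile data. The store on one
    // path is intended to keep the compiled code a compare-and-branch.
    static int andEquals0(int a, int b) {
        int r = a & b;
        if (r == 0) {
            sideEffect++;
            return 1;
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(andEquals0(0b1010, 0b0101)); // 1
        System.out.println(andEquals0(0b1010, 0b0010)); // 0
        System.out.println(sideEffect);                 // 1
    }
}
```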
Regarding testing on my own: I tested the previous commit using GHA and ran the tier 1 tests on my Linux machine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1669672205 From duke at openjdk.org Tue Aug 8 14:02:08 2023 From: duke at openjdk.org (Tobias Hotz) Date: Tue, 8 Aug 2023 14:02:08 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v6] In-Reply-To: References: Message-ID: <2Vl2c9rrW0AIEWr_t-7Wn0yZUHpqA9jDBDcOoADsxs8=.139baa19-a81b-4d45-80e6-d7017376d26a@github.com>
> This patch adds peephole rules to remove TEST instructions that operate on the result of, and immediately follow, an AND, XOR or OR instruction.
> This pattern can emerge if the result of the AND is compared against two values, one of which is zero. The matcher cannot know that the preceding instruction has already set the flags register to the same value.
> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test, TEST, AND, XOR and OR set the flags to the same values, so this should be safe.
> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed:
> Results on Intel Core i5-8250U CPU
> Before this patch:
>
> Benchmark Mode Cnt Score Error Units
> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op
> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op
> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op
> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op
> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op
> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op
> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op
> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op
> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
>
> After this patch:
>
> Benchmark Mode Cnt Score Error Units Change
> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster
> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged)
> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged)
> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.074 ± 0.011...

Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: Add a side effect to the IR tests to make sure we do not emit CMOVs there Without Tiered Compilation, no profile data is present, which means a CMOV would always be emitted.
Keep the compiler from doing that, as the peephole currently does not work with CMOV instructions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14172/files - new: https://git.openjdk.org/jdk/pull/14172/files/aae31d2d..9872e719 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14172&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14172&range=04-05 Stats: 53 lines in 1 file changed: 41 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/14172.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14172/head:pull/14172 PR: https://git.openjdk.org/jdk/pull/14172 From jvernee at openjdk.org Tue Aug 8 14:21:36 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 8 Aug 2023 14:21:36 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v6] In-Reply-To: <2Vl2c9rrW0AIEWr_t-7Wn0yZUHpqA9jDBDcOoADsxs8=.139baa19-a81b-4d45-80e6-d7017376d26a@github.com> References: <2Vl2c9rrW0AIEWr_t-7Wn0yZUHpqA9jDBDcOoADsxs8=.139baa19-a81b-4d45-80e6-d7017376d26a@github.com> Message-ID: <9VFq3F8FbcL-Rf3JOrYrJ4j1BqJXDgnqdIY_bM2tAhY=.321bc122-f95c-4e88-9973-4f401d1c2a35@github.com> On Tue, 8 Aug 2023 14:02:08 GMT, Tobias Hotz wrote:
>> This patch adds peephole rules to remove TEST instructions that operate on the result of, and immediately follow, an AND, XOR or OR instruction.
>> This pattern can emerge if the result of the AND is compared against two values, one of which is zero. The matcher cannot know that the preceding instruction has already set the flags register to the same value.
>> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test, TEST, AND, XOR and OR set the flags to the same values, so this should be safe.
>> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed:
>> Results on Intel Core i5-8250U CPU
>> Before this patch:
>>
>> Benchmark Mode Cnt Score Error Units
>> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op
>> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op
>> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op
>> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op
>>
>> After this patch:
>>
>> Benchmark Mode Cnt Score Error Units Change
>> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster
>> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged)
>> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged)
>> TestRemovalPeephole.bench...
>
> Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision:
>
> Add a side effect to the IR tests to make sure we do not emit CMOVs there
>
> Without Tiered Compilation, no profile data is present, which means a CMOV would always be emitted.
Keep the compiler from doing that, as the peephole currently does not work with CMOV instructions Ok, thanks. I've submitted another CI run. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1669711258 From shade at openjdk.org Tue Aug 8 15:03:32 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 8 Aug 2023 15:03:32 GMT Subject: RFR: 8312597: Convert TraceTypeProfile to UL In-Reply-To: References: Message-ID: <8NKtLo__Trp9eksG2K4kCpM7d73MoA0ws5LYWQEVK_o=.d82a4611-1786-462b-a64e-7907e9b2475b@github.com> On Fri, 4 Aug 2023 20:44:36 GMT, Ben Taylor wrote: > This PR adds the output from `-XX:+TraceTypeProfile` to the `jit` and `inlining` tags in unified logging. It also adds minimal tests for `-XX:+TraceTypeProfile` and `-Xlog:jit*=debug`. > > Change passes tier1 tests. Looks okay to me! Other reviewers might want to take a look. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15167#pullrequestreview-1567519502 From epeter at openjdk.org Tue Aug 8 16:47:32 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 8 Aug 2023 16:47:32 GMT Subject: RFR: 8313345: SuperWord fails due to CMove without matching Bool pack In-Reply-To: <-wxTbM1_ST7Nh5NxKFlbQRkOV1RyVAST_ejnXA6sYE8=.dce6359c-5b82-43ad-a6ce-70d1de73e296@github.com> References: <-wxTbM1_ST7Nh5NxKFlbQRkOV1RyVAST_ejnXA6sYE8=.dce6359c-5b82-43ad-a6ce-70d1de73e296@github.com> Message-ID: On Tue, 8 Aug 2023 10:50:19 GMT, Tobias Hartmann wrote: > SuperWord fails after [JDK-8306302](https://bugs.openjdk.org/browse/JDK-8306302), when trying to convert `Bool + Cmp + CMove` packs into `VectorMaskCmp + VectorBlend` because it does not find the `Bool` (and `Cmp`) packs for a `CMoveD`: > > > After filter_packs > packset > Pack: 0 > align: 0 674 StoreD === 691 695 678 675 [[ 669 672 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] 
(java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=603,364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > align: 8 669 StoreD === 691 674 673 670 [[ 603 606 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > align: 16 603 StoreD === 691 669 607 604 [[ 367 364 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > align: 24 364 StoreD === 691 603 368 428 [[ 695 363 514 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > Pack: 1 > align: 0 677 LoadD === 525 695 678 [[ 675 676 676 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=606,367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 8 672 LoadD === 525 674 673 [[ 670 671 671 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 16 606 LoadD === 525 669 607 [[ 604 605 605 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 24 367 LoadD === 525 603 368 [[ 366 366 428 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > Pack: 2 > align: 0 675 CMoveD === _ 327 676 677 [[ 674 ]] #double !orig=604,428,[393],278 !jvms: Reproducer2$A::fill @ bci:14 (line 16) > align: 8 670 CMoveD 
=== _ 327 671 672 [... Thanks @TobiHartmann for fixing this. This is exactly what I was planning to do: reject it in either the implementable or the profitable check. And yes, we need to disentangle the profitability and all the correctness checks in the future; I plan to take that up after a few other items. About `do_reserve_copy`: the idea is that we make a whole copy of the loop and can swap that back in if there are issues during output. Not sure why that did not work in your case exactly. But my proposal is that we should not make the copy at all; it is unnecessary overhead. All correctness and profitability checks are to be run before output. So if any assumption is violated in output, that would be a bug. We could still bail out of compilation, but bailing out of SuperWord would not be possible as the graph is already partially modified. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15189#pullrequestreview-1567736814 From epeter at openjdk.org Tue Aug 8 17:20:01 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 8 Aug 2023 17:20:01 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v26] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths than before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now matched with their `type` and `size`.
The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match IR nodes dumped like this: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straightforward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equal to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` node with type `float`, and the `size` exactly equal to the smaller of `LoopMaxUnroll` or the maximal size allow... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 71 commits: - manual merge from master - duplicate rules in VectorLogicalOpIdentityTest.java - Merge branch 'master' into JDK-8310308 - Duplicated =1 counts for vector nodes in compiler/vectorapi/reshape/tests/TestVectorCast.java - Merge branch 'master' into JDK-8310308 - Fix with canTrustVectorSize for Cascade Lake - TestSpillTheBeans.java - print VMInfo from Test VM - merge from master, manual merge for VectorLogicalOpIdentityTest.java - Response to Tobias' review - ... and 61 more: https://git.openjdk.org/jdk/compare/509f80bb...48fa52ba ------------- Changes: https://git.openjdk.org/jdk/pull/14539/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=25 Stats: 3561 lines in 67 files changed: 1494 ins; 21 del; 2046 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From duke at openjdk.org Tue Aug 8 17:23:34 2023 From: duke at openjdk.org (Hannes Greule) Date: Tue, 8 Aug 2023 17:23:34 GMT Subject: RFR: 8313345: SuperWord fails due to CMove without matching Bool pack In-Reply-To: <-wxTbM1_ST7Nh5NxKFlbQRkOV1RyVAST_ejnXA6sYE8=.dce6359c-5b82-43ad-a6ce-70d1de73e296@github.com> References: <-wxTbM1_ST7Nh5NxKFlbQRkOV1RyVAST_ejnXA6sYE8=.dce6359c-5b82-43ad-a6ce-70d1de73e296@github.com> Message-ID: <0-JFWgm7tR1ZUQFRP8Nv65PBz7fVwYz1a_NJ3FC6e_E=.85fbb0b2-0923-4617-86b9-6e6c29399037@github.com> On Tue, 8 Aug 2023 10:50:19 GMT, Tobias Hartmann wrote: > SuperWord fails after [JDK-8306302](https://bugs.openjdk.org/browse/JDK-8306302), when trying to convert `Bool + Cmp + CMove` packs into `VectorMaskCmp + VectorBlend` because it does not find the `Bool` (and `Cmp`) packs for a `CMoveD`: > > > After filter_packs > packset > Pack: 0 > align: 0 674 StoreD === 691 695 678 675 [[ 669 672 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] 
(java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=603,364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > align: 8 669 StoreD === 691 674 673 670 [[ 603 606 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > align: 16 603 StoreD === 691 669 607 604 [[ 367 364 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > align: 24 364 StoreD === 691 603 368 428 [[ 695 363 514 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > Pack: 1 > align: 0 677 LoadD === 525 695 678 [[ 675 676 676 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=606,367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 8 672 LoadD === 525 674 673 [[ 670 671 671 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 16 606 LoadD === 525 669 607 [[ 604 605 605 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 24 367 LoadD === 525 603 368 [[ 366 366 428 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > Pack: 2 > align: 0 675 CMoveD === _ 327 676 677 [[ 674 ]] #double !orig=604,428,[393],278 !jvms: Reproducer2$A::fill @ bci:14 (line 16) > align: 8 670 CMoveD 
=== _ 327 671 672 [... I can confirm that this fixes the original issue. Thanks! ------------- Marked as reviewed by SirYwell at github.com (no known OpenJDK username). PR Review: https://git.openjdk.org/jdk/pull/15189#pullrequestreview-1567800487 From epeter at openjdk.org Tue Aug 8 17:47:35 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 8 Aug 2023 17:47:35 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v2] In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 20:10:19 GMT, Vladimir Kozlov wrote: >> Hi @vnkozlov, >> >> Thanks for your reply. But it still has problems. >> >>> About your change to allow -Xbatch. Let me clarify: if you exclude -Xcomp mode (which I agree with) by checking that the UseInterpreter flag is true, then a method could always be executed in the Interpreter to get the reference result (even with -XX:CompileThreshold=100) by calling the method once first (we do that in other tests). >> >>> You don't need to call WB.lockCompilation() if you exclude -Xcomp mode. There will be no compilation requests for the called method when you call it the first time, because the compilation threshold will not be reached - it is guaranteed that the method will be executed in the Interpreter. And you have the assert to verify that. >> >> These tests are a bit different because we test loops. If the loop iteration count reaches some threshold, the loop will be *OSR compiled* even if the test method is called only once. I just did an experiment according to your suggestion. After removing `WB.lockCompilation()` and updating the loop iteration count to 100,000, I got an assertion failure that tells me the test method is NOT running in the interpreter.
>> >> >> STDERR: >> java.lang.AssertionError >> at compiler.vectorization.runner.VectorizationTestRunner.runTestOnMethod(VectorizationTestRunner.java:131) >> at compiler.vectorization.runner.VectorizationTestRunner.run(VectorizationTestRunner.java:73) >> at compiler.vectorization.runner.VectorizationTestRunner.main(VectorizationTestRunner.java:215) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) >> at java.base/java.lang.reflect.Method.invoke(Method.java:580) >> at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138) >> at java.base/java.lang.Thread.run(Thread.java:1570) >> >> >> A solution to this may be adding one more check that `CICompileOSR` is OFF if we still want to use interpreted execution for the reference result. >> >> Now the question is, which verification approach do you think is better? "C2 vs. interpreted" or "C2 vs. C1"? > >> A solution to this may be adding one more check that `CICompileOSR` is OFF if we still want to use interpreted execution for the reference result. > > I would suggest using `WB.setBooleanVMFlag("CICompileOSR", false);`. But it is a debug flag, which can be set only in a debug VM. There may be other product flags you can temporarily set to avoid compilation without locking. > >> >> Now the question is, which verification approach do you think is better? "C2 vs. interpreted" or "C2 vs. C1"? > > We usually use the Interpreter as the gold standard. @vnkozlov @TobiHartmann we should re-run testing from our side. @pfustc Why do you only test correctness (compare results) under some conditions? Is there not a risk that we skip it in some cases where we should do it, just because we get the conditions slightly wrong? Just FYI: we should integrate this whole result-correctness testing into the IR framework. I filed [JDK-8310533](https://bugs.openjdk.org/browse/JDK-8310533). That would make it easier to use for new tests.
It could also be used for any test, not just the ones located in `test/hotspot/jtreg/compiler/vectorization`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15011#issuecomment-1670043666 From erikj at openjdk.org Tue Aug 8 18:41:34 2023 From: erikj at openjdk.org (Erik Joelsson) Date: Tue, 8 Aug 2023 18:41:34 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API In-Reply-To: References: Message-ID: <22KaxTbA_XWZS28f7xM5oEXoZhlNttZvNNNnc66Mi-c=.0e98ec5f-7a47-41f1-b65b-37c0ac81d308@github.com> On Tue, 1 Aug 2023 10:29:06 GMT, Jorn Vernee wrote: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encodings to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle.
This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable-length arrays or variable-size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now-redundant API. Clients can simply use the difference between the base addresses of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray` by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... Build change looks ok. ------------- PR Review: https://git.openjdk.org/jdk/pull/15103#pullrequestreview-1567918097 From phh at openjdk.org Wed Aug 9 00:03:30 2023 From: phh at openjdk.org (Paul Hohensee) Date: Wed, 9 Aug 2023 00:03:30 GMT Subject: RFR: 8312597: Convert TraceTypeProfile to UL In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 20:44:36 GMT, Ben Taylor wrote: > This PR adds the output from `-XX:+TraceTypeProfile` to the `jit` and `inlining` tags in unified logging. It also adds minimal tests for `-XX:+TraceTypeProfile` and `-Xlog:jit*=debug`. > > Change passes tier1 tests. Looks fine to me. ------------- Marked as reviewed by phh (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15167#pullrequestreview-1568513893 From lmesnik at openjdk.org Wed Aug 9 03:10:42 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 9 Aug 2023 03:10:42 GMT Subject: RFR: 8312194: test/hotspot/jtreg/applications/ctw/modules/jdk_crypto_ec.java cannot handle empty modules Message-ID: Removed empty module so CTW doesn't fail. ------------- Commit messages: - removed empty module Changes: https://git.openjdk.org/jdk/pull/15201/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15201&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312194 Stats: 39 lines in 2 files changed: 0 ins; 39 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15201/head:pull/15201 PR: https://git.openjdk.org/jdk/pull/15201 From thartmann at openjdk.org Wed Aug 9 05:18:59 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Aug 2023 05:18:59 GMT Subject: RFR: 8313345: SuperWord fails due to CMove without matching Bool pack In-Reply-To: <-wxTbM1_ST7Nh5NxKFlbQRkOV1RyVAST_ejnXA6sYE8=.dce6359c-5b82-43ad-a6ce-70d1de73e296@github.com> References: <-wxTbM1_ST7Nh5NxKFlbQRkOV1RyVAST_ejnXA6sYE8=.dce6359c-5b82-43ad-a6ce-70d1de73e296@github.com> Message-ID: <4genZKVllK49kDDJ7zLPRH6Aedabmafa4LJprfxQ4YY=.a7e4a536-54c7-45a4-9327-a071045186ca@github.com> On Tue, 8 Aug 2023 10:50:19 GMT, Tobias Hartmann wrote: > SuperWord fails after [JDK-8306302](https://bugs.openjdk.org/browse/JDK-8306302), when trying to convert `Bool + Cmp + CMove` packs into `VectorMaskCmp + VectorBlend` because it does not find the `Bool` (and `Cmp`) packs for a `CMoveD`: > > > After filter_packs > packset > Pack: 0 > align: 0 674 StoreD === 691 695 678 675 [[ 669 672 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=603,364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 
16) > align: 8 669 StoreD === 691 674 673 670 [[ 603 606 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > align: 16 603 StoreD === 691 669 607 604 [[ 367 364 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > align: 24 364 StoreD === 691 603 368 428 [[ 695 363 514 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > Pack: 1 > align: 0 677 LoadD === 525 695 678 [[ 675 676 676 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=606,367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 8 672 LoadD === 525 674 673 [[ 670 671 671 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 16 606 LoadD === 525 669 607 [[ 604 605 605 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 24 367 LoadD === 525 603 368 [[ 366 366 428 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > Pack: 2 > align: 0 675 CMoveD === _ 327 676 677 [[ 674 ]] #double !orig=604,428,[393],278 !jvms: Reproducer2$A::fill @ bci:14 (line 16) > align: 8 670 CMoveD === _ 327 671 672 [... Emanuel, thanks for the review and the details. Your proposal sounds great. 
Hannes, thanks for confirming that the fix resolves the issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15189#issuecomment-1670682114 From thartmann at openjdk.org Wed Aug 9 05:19:01 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Aug 2023 05:19:01 GMT Subject: Integrated: 8313345: SuperWord fails due to CMove without matching Bool pack In-Reply-To: <-wxTbM1_ST7Nh5NxKFlbQRkOV1RyVAST_ejnXA6sYE8=.dce6359c-5b82-43ad-a6ce-70d1de73e296@github.com> References: <-wxTbM1_ST7Nh5NxKFlbQRkOV1RyVAST_ejnXA6sYE8=.dce6359c-5b82-43ad-a6ce-70d1de73e296@github.com> Message-ID: On Tue, 8 Aug 2023 10:50:19 GMT, Tobias Hartmann wrote: > SuperWord fails after [JDK-8306302](https://bugs.openjdk.org/browse/JDK-8306302), when trying to convert `Bool + Cmp + CMove` packs into `VectorMaskCmp + VectorBlend` because it does not find the `Bool` (and `Cmp`) packs for a `CMoveD`: > > > After filter_packs > packset > Pack: 0 > align: 0 674 StoreD === 691 695 678 675 [[ 669 672 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=603,364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > align: 8 669 StoreD === 691 674 673 670 [[ 603 606 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > align: 16 603 StoreD === 691 669 607 604 [[ 367 364 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=364,300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > align: 24 364 StoreD === 691 603 368 428 [[ 695 363 514 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; Memory: @double[int:>=0] 
(java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=8; !orig=300 !jvms: Reproducer2$A::fill @ bci:17 (line 16) > Pack: 1 > align: 0 677 LoadD === 525 695 678 [[ 675 676 676 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=606,367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 8 672 LoadD === 525 674 673 [[ 670 671 671 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 16 606 LoadD === 525 669 607 [[ 604 605 605 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=367,193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > align: 24 367 LoadD === 525 603 368 [[ 366 366 428 ]] @double[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #double !orig=193 !jvms: Reproducer2$A::fill @ bci:13 (line 16) > Pack: 2 > align: 0 675 CMoveD === _ 327 676 677 [[ 674 ]] #double !orig=604,428,[393],278 !jvms: Reproducer2$A::fill @ bci:14 (line 16) > align: 8 670 CMoveD === _ 327 671 672 [... This pull request has now been integrated. Changeset: d3b578f1 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/d3b578f1c9d296ce8f99c70069df886e9f2dbef9 Stats: 79 lines in 2 files changed: 79 ins; 0 del; 0 mod 8313345: SuperWord fails due to CMove without matching Bool pack Co-authored-by: Emanuel Peter Co-authored-by: Hannes Greule Reviewed-by: chagedorn, epeter, hgreule ------------- PR: https://git.openjdk.org/jdk/pull/15189 From thartmann at openjdk.org Wed Aug 9 05:35:52 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Aug 2023 05:35:52 GMT Subject: [jdk21] RFR: 8313345: SuperWord fails due to CMove without matching Bool pack Message-ID: Backport of [JDK-8313345](https://bugs.openjdk.java.net/browse/JDK-8313345). Applies cleanly. 
Thanks, Tobias ------------- Commit messages: - 8313345: SuperWord fails due to CMove without matching Bool pack Changes: https://git.openjdk.org/jdk21/pull/168/files Webrev: https://webrevs.openjdk.org/?repo=jdk21&pr=168&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8313345 Stats: 79 lines in 2 files changed: 79 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk21/pull/168.diff Fetch: git fetch https://git.openjdk.org/jdk21.git pull/168/head:pull/168 PR: https://git.openjdk.org/jdk21/pull/168 From thartmann at openjdk.org Wed Aug 9 05:36:31 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Aug 2023 05:36:31 GMT Subject: RFR: 8307462: [REDO] VmObjectAlloc is not generated by intrinsics methods which allocate objects [v3] In-Reply-To: <9_aaKm7wC_9cQH2qHKbb5myvx7roacZJVzWRLZ8NdQM=.6dc482e0-c215-4a1b-b84a-d819f0ee3979@github.com> References: <9_aaKm7wC_9cQH2qHKbb5myvx7roacZJVzWRLZ8NdQM=.6dc482e0-c215-4a1b-b84a-d819f0ee3979@github.com> Message-ID: On Fri, 4 Aug 2023 19:45:56 GMT, Leonid Mesnik wrote: >> The fix adds posting of VmObjectAlloc events by Unsafe.allocateInstance(Class cls). The previous attempt to post the event directly from 'LibraryCallKit::inline_unsafe_allocate()' caused a performance regression even if the JVMTI event is not enabled. Some optimizations were disabled just because of the possible usage and escaping of the newly allocated object. >> So event posting is done by returning to the interpreter if events are enabled. >> >> I verified that the performance (run locally only) of >> org.renaissance.jdk.streams.JmhScrabble.runOperation >> doesn't change if events are not enabled. >> >> There might be other intrinsics like 'LibraryCallKit::inline_unsafe_newArray()' where the VM allocates memory. I'm going to file a separate issue to find and fix them. >> >> Many thanks to Tobias H. for the proposed solution. >> >> Testing with all tiers.
> > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > The too many deopts check should be first. Looks good to me. src/hotspot/share/opto/library_call.cpp line 2845: > 2843: } > 2844: if (stopped()) > 2845: return true; Suggestion: if (stopped()) { return true; } ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15110#pullrequestreview-1568773710 PR Review Comment: https://git.openjdk.org/jdk/pull/15110#discussion_r1287968367 From chagedorn at openjdk.org Wed Aug 9 06:04:40 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 9 Aug 2023 06:04:40 GMT Subject: [jdk21] RFR: 8313345: SuperWord fails due to CMove without matching Bool pack In-Reply-To: References: Message-ID: On Wed, 9 Aug 2023 05:28:42 GMT, Tobias Hartmann wrote: > Backport of [JDK-8313345](https://bugs.openjdk.java.net/browse/JDK-8313345). Applies cleanly. > > Thanks, > Tobias Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk21/pull/168#pullrequestreview-1568803049 From lmesnik at openjdk.org Wed Aug 9 06:17:54 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 9 Aug 2023 06:17:54 GMT Subject: RFR: 8307462: [REDO] VmObjectAlloc is not generated by intrinsics methods which allocate objects [v4] In-Reply-To: References: Message-ID: > The fix adds posting of VmObjectAlloc events by Unsafe.allocateInstance(Class cls). The previous attempt to post the event directly from 'LibraryCallKit::inline_unsafe_allocate()' caused a performance regression even if the JVMTI event is not enabled. Some optimizations were disabled just because of the possible usage and escaping of the newly allocated object. > So event posting is done by returning to the interpreter if events are enabled. > > I verified that the performance (run locally only) of > org.renaissance.jdk.streams.JmhScrabble.runOperation > doesn't change if events are not enabled.
> > There might be other intrinsics like 'LibraryCallKit::inline_unsafe_newArray()' where the VM allocates memory. I'm going to file a separate issue to find and fix them. > > Many thanks to Tobias H. for the proposed solution. > > Testing with all tiers. Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: added braces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15110/files - new: https://git.openjdk.org/jdk/pull/15110/files/64871b91..4f52aa79 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15110&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15110&range=02-03 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15110.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15110/head:pull/15110 PR: https://git.openjdk.org/jdk/pull/15110 From thartmann at openjdk.org Wed Aug 9 06:21:30 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Aug 2023 06:21:30 GMT Subject: [jdk21] RFR: 8313345: SuperWord fails due to CMove without matching Bool pack In-Reply-To: References: Message-ID: <0UzVCvsGKNPNdK0EonZRIn6G7MUMjo0kLxuRGj9oLGc=.cfa7f528-810e-43c3-99d4-605df411161b@github.com> On Wed, 9 Aug 2023 05:28:42 GMT, Tobias Hartmann wrote: > Backport of [JDK-8313345](https://bugs.openjdk.java.net/browse/JDK-8313345). Applies cleanly. > > Thanks, > Tobias Thanks, Christian!
------------- PR Comment: https://git.openjdk.org/jdk21/pull/168#issuecomment-1670737888 From lmesnik at openjdk.org Wed Aug 9 06:32:41 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 9 Aug 2023 06:32:41 GMT Subject: RFR: 8307462: [REDO] VmObjectAlloc is not generated by intrinsics methods which allocate objects [v3] In-Reply-To: References: <9_aaKm7wC_9cQH2qHKbb5myvx7roacZJVzWRLZ8NdQM=.6dc482e0-c215-4a1b-b84a-d819f0ee3979@github.com> Message-ID: On Fri, 4 Aug 2023 20:37:00 GMT, Serguei Spitsyn wrote: >> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: >> >> The too many deopts check should be first. > > This looks okay to me. > It needs to be reviewed by someone from the compiler team. > Thanks, > Serguei @sspitsyn, @TobiHartmann Thank you for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15110#issuecomment-1670746545 From lmesnik at openjdk.org Wed Aug 9 06:32:43 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 9 Aug 2023 06:32:43 GMT Subject: Integrated: 8307462: [REDO] VmObjectAlloc is not generated by intrinsics methods which allocate objects In-Reply-To: References: Message-ID: On Tue, 1 Aug 2023 19:49:51 GMT, Leonid Mesnik wrote: > The fix adds posting of VmObjectAlloc events by Unsafe.allocateInstance(Class cls). The previous attempt to post the event directly from 'LibraryCallKit::inline_unsafe_allocate()' caused a performance regression even if the JVMTI event is not enabled. Some optimizations were disabled just because of the possible usage and escaping of the newly allocated object. > So event posting is done by returning to the interpreter if events are enabled. > > I verified that the performance (run locally only) of > org.renaissance.jdk.streams.JmhScrabble.runOperation > doesn't change if events are not enabled. > > There might be other intrinsics like 'LibraryCallKit::inline_unsafe_newArray()' where the VM allocates memory.
I'm going to file a separate issue to find and fix them. > > Many thanks to Tobias H. for the proposed solution. > > Testing with all tiers. This pull request has now been integrated. Changeset: 3fb4805b Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/3fb4805b1ad6d66924fd961f62126a91d188abab Stats: 37 lines in 6 files changed: 33 ins; 4 del; 0 mod 8307462: [REDO] VmObjectAlloc is not generated by intrinsics methods which allocate objects Reviewed-by: sspitsyn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/15110 From thartmann at openjdk.org Wed Aug 9 08:07:29 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Aug 2023 08:07:29 GMT Subject: [jdk21] Integrated: 8313345: SuperWord fails due to CMove without matching Bool pack In-Reply-To: References: Message-ID: On Wed, 9 Aug 2023 05:28:42 GMT, Tobias Hartmann wrote: > Backport of [JDK-8313345](https://bugs.openjdk.java.net/browse/JDK-8313345). Applies cleanly. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 01a5df68 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk21/commit/01a5df689b675d69b4360ab87840c815cace9c9c Stats: 79 lines in 2 files changed: 79 ins; 0 del; 0 mod 8313345: SuperWord fails due to CMove without matching Bool pack Reviewed-by: chagedorn Backport-of: d3b578f1c9d296ce8f99c70069df886e9f2dbef9 ------------- PR: https://git.openjdk.org/jdk21/pull/168 From dlong at openjdk.org Wed Aug 9 09:08:59 2023 From: dlong at openjdk.org (Dean Long) Date: Wed, 9 Aug 2023 09:08:59 GMT Subject: RFR: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers. [v2] In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 07:12:08 GMT, Yudi Zheng wrote: >> This PR exports `vmIntrinsic::is_intrinsic_available`, `Compiler::is_intrinsic_supported`, and `C2Compiler::is_intrinsic_supported` results to the JVMCI compiler.
This allows JVMCI compilers to comply with `-XX:DisableIntrinsic`, `-XX:ControlIntrinsic`, `-XX:-UseXXXIntrinsic`, and is essential for running tests that depend on these flags, e.g., `java/lang/Float/Binary16ConversionNaN`, which returns a different result in the interpreter with `-XX:DisableIntrinsic=_float16ToFloat,_floatToFloat16`. >> This PR also attempts to fix some of the `is_intrinsic_available` results. Please see the inlined comments. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > update is_intrinsic_supported for _dcopySign,_fcopySign. I don't like having the same logic in two places, because then those two places need to be kept in sync. Either the stubs should be generated based on is_intrinsic_supported(), or is_intrinsic_supported() should check if the stub was generated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15133#issuecomment-1670921720 From pli at openjdk.org Wed Aug 9 09:13:28 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 9 Aug 2023 09:13:28 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v4] In-Reply-To: References: Message-ID: > This patch removes `@requires vm.flagless` annotations from HotSpot jtreg tests in `compiler/vectorization/runner`. All jtreg cases in this folder are invoked by the test driver `VectorizationTestRunner.java`, which checks both correctness and vectorizability (IR) for each test method. We added the flagless requirement before because extra flags may interfere with compiler control in the test driver for the correctness check. But `flagless` has a side effect: tests run with extra flags are skipped. So we propose to get rid of it now. > > To adapt to the removal of `@requires vm.flagless`, a few checks are added in the test driver to skip the correctness check if extra flags prevent compiler control from working.
This patch also moves the previously hard-coded flag `-XX:-OptimizeFill` in the test driver to conditions in IR checks. > > Tested various compiler-control-related VM flags on x86 and AArch64. Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: Remove useless conditions and imports ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15011/files - new: https://git.openjdk.org/jdk/pull/15011/files/5bb67000..00d48cc8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15011&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15011&range=02-03 Stats: 27 lines in 1 file changed: 3 ins; 19 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/15011.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15011/head:pull/15011 PR: https://git.openjdk.org/jdk/pull/15011 From pli at openjdk.org Wed Aug 9 09:36:04 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 9 Aug 2023 09:36:04 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v2] In-Reply-To: References: Message-ID: On Tue, 8 Aug 2023 17:44:46 GMT, Emanuel Peter wrote: >>> A solution to this may be adding one more check that `CICompileOSR` is OFF if we still want to use interpreted execution for the reference result. >> >> I would suggest using `WB.setBooleanVMFlag("CICompileOSR", false);`. But it is a debug flag which can be set only in a debug VM. There may be other product flags you can temporarily set to avoid compilation without locking. >> >>> >>> Now the question is, which verification approach do you think is better? "C2 vs. interpreted" or "C2 vs. C1"? >> >> We usually use the interpreter as the gold standard. > > @vnkozlov @TobiHartmann we should re-run testing from our side. > > @pfustc Why do you only test correctness (compare results) under some conditions? Is there not a risk that we skip it in some cases where we should do it, just because we get the conditions slightly wrong?
> > Just FYI: we should integrate this whole result-correctness testing into the IR framework. I filed [JDK-8310533](https://bugs.openjdk.org/browse/JDK-8310533). That would make it easier to use for new tests. It could also be used for any test, not just the ones located in `test/hotspot/jtreg/compiler/vectorization`. Hi @eme64, thanks for looking at this. > @pfustc Why do you only test correctness (compare results) under some conditions? Is there not a risk that we skip it in some cases where we should do it, just because we get the conditions slightly wrong? Yes, you are right! These conditions were added earlier to avoid jtreg hanging when compilation is locked. But now I can remove them because the lock is removed. In my latest commit, I have removed the conditions and some useless imports. > Just FYI: we should integrate this whole result-correctness testing into the IR framework. I filed [JDK-8310533](https://bugs.openjdk.org/browse/JDK-8310533). That would make it easier to use for new tests. It could also be used for any test, not just the ones located in test/hotspot/jtreg/compiler/vectorization. I have noticed this JBS issue before. The reason I didn't add the correctness check to the IR framework is that I implemented this kind of check before the IR framework existed. (We have used it internally for a few years.) But anyway, it is a good proposal and I'm willing to help if needed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15011#issuecomment-1670964757 From dnsimon at openjdk.org Wed Aug 9 09:56:29 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 9 Aug 2023 09:56:29 GMT Subject: RFR: 8313899: JVMCI exception Translation can fail in TranslatedException.<init>
Message-ID: In a test that stresses metaspace (such as `vmTestbase/vm/mlvm/hiddenloader/stress/oome/metaspace/Test.java`) that also uses `-Xcomp -XX:-TieredCompilation`, we've seen a failure in `TranslatedException.<init>` due to exhausted metaspace: java.lang.OutOfMemoryError: Metaspace at jdk.internal.vm.TranslatedException.encodeThrowable(java.base@21-galahadeestaging/TranslatedException.java:176) at jdk.internal.vm.TranslatedException.<init>(java.base@21-galahadeestaging/TranslatedException.java:61) at jdk.internal.vm.VMSupport.encodeThrowable(java.base@21-galahadeestaging/VMSupport.java:171) This PR pushes a fix such that this exception is properly handled in the VM (i.e. causing a compilation bailout) instead of leading to a VM crash. The PR includes 2 bits of debug code guarded by system properties that enable the handling to be tested in libgraal. The test itself is not included as libgraal is not part of OpenJDK. ------------- Commit messages: - handle exception in TranslatedException.<init> (JDK-8313899) Changes: https://git.openjdk.org/jdk/pull/15198/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15198&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8313899 Stats: 43 lines in 3 files changed: 41 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15198/head:pull/15198 PR: https://git.openjdk.org/jdk/pull/15198 From never at openjdk.org Wed Aug 9 09:56:29 2023 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 9 Aug 2023 09:56:29 GMT Subject: RFR: 8313899: JVMCI exception Translation can fail in TranslatedException.<init>
In-Reply-To: References: Message-ID: On Tue, 8 Aug 2023 20:52:29 GMT, Doug Simon wrote: > In a test that stresses metaspace (such as `vmTestbase/vm/mlvm/hiddenloader/stress/oome/metaspace/Test.java`) that also uses `-Xcomp -XX:-TieredCompilation`, we've seen a failure in `TranslatedException.<init>` due to exhausted metaspace: > > java.lang.OutOfMemoryError: Metaspace > at jdk.internal.vm.TranslatedException.encodeThrowable(java.base@21-galahadeestaging/TranslatedException.java:176) > at jdk.internal.vm.TranslatedException.<init>(java.base@21-galahadeestaging/TranslatedException.java:61) > at jdk.internal.vm.VMSupport.encodeThrowable(java.base@21-galahadeestaging/VMSupport.java:171) > > This PR pushes a fix such that this exception is properly handled in the VM (i.e. causing a compilation bailout) instead of leading to a VM crash. > > The PR includes 2 bits of debug code guarded by system properties that enable the handling to be tested in libgraal. The test itself is not included as libgraal is not part of OpenJDK. Looks good. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15198#pullrequestreview-1568664065 From kvn at openjdk.org Wed Aug 9 19:07:29 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 9 Aug 2023 19:07:29 GMT Subject: RFR: 8312194: test/hotspot/jtreg/applications/ctw/modules/jdk_crypto_ec.java cannot handle empty modules In-Reply-To: References: Message-ID: On Wed, 9 Aug 2023 03:03:47 GMT, Leonid Mesnik wrote: > Removed empty module so CTW doesn't fail. okay ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15201#pullrequestreview-1570338776 From never at openjdk.org Wed Aug 9 20:57:29 2023 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 9 Aug 2023 20:57:29 GMT Subject: RFR: 8311557: [JVMCI] deadlock with JVMTI thread suspension [v2] In-Reply-To: References: Message-ID: > Java based JVMCI compiler threads are more like normal Java threads so they aren't `hidden_from_external_view` like the native compilers. This can lead to deadlocks if you use JVMTI to suspend all threads since this will block the compiler queue and can block execution if background compilation is disabled. It's reasonable to treat libgraal threads like native threads in this regard. Making jargraal threads hidden too would interfere with using profiling and debugging tools on them so I've left that alone but it might be worth changing the JVMTI suspend and resume functions to explicitly skip compiler threads as well. Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Refactor logic and add LibJVMCICompilerThreadHidden - Merge branch 'master' into tkr-jvmci-hidden - 8311557: [JVMCI] deadlock with JVMTI thread suspension ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14799/files - new: https://git.openjdk.org/jdk/pull/14799/files/59b9edab..1f08dcc4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14799&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14799&range=00-01 Stats: 74808 lines in 1968 files changed: 39246 ins; 26179 del; 9383 mod Patch: https://git.openjdk.org/jdk/pull/14799.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14799/head:pull/14799 PR: https://git.openjdk.org/jdk/pull/14799 From never at openjdk.org Wed Aug 9 20:58:28 2023 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 9 Aug 2023 20:58:28 GMT Subject: RFR: 8311557: [JVMCI] deadlock with JVMTI thread suspension In-Reply-To: References: Message-ID: On Fri, 7 Jul 2023 06:13:21 GMT, Tom Rodriguez wrote: > Java based JVMCI compiler threads are more like normal Java threads so they aren't `hidden_from_external_view` like the native compilers. This can lead to deadlocks if you use JVMTI to suspend all threads since this will block the compiler queue and can block execution if background compilation is disabled. It's reasonable to treat libgraal threads like native threads in this regard. Making jargraal threads hidden too would interfere with using profiling and debugging tools on them so I've left that alone but it might be worth changing the JVMTI suspend and resume functions to explicitly skip compiler threads as well. I did some local testing and unsurprisingly these changes make it impossible to attach a debugger to these threads. In rare cases during development it might be necessary to do this since there are still some helper Java calls with native JVMCI. So I've added a new flag LibJVMCICompilerThreadHidden to control this behaviour.
I also refactored the code a bit because I figured I was missing some JVMCI_ONLY macros in the shared code. Could I get a rereview? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14799#issuecomment-1672108594 From dnsimon at openjdk.org Wed Aug 9 21:09:02 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 9 Aug 2023 21:09:02 GMT Subject: RFR: 8311557: [JVMCI] deadlock with JVMTI thread suspension [v2] In-Reply-To: References: Message-ID: On Wed, 9 Aug 2023 20:57:29 GMT, Tom Rodriguez wrote: >> Java based JVMCI compiler threads are more like normal Java threads so they aren't `hidden_from_external_view` like the native compilers. This can lead to deadlocks if you use JVMTI to suspend all threads since this will block the compiler queue and can block execution if background compilation is disabled. It's reasonable to treat libgraal threads like native threads in this regard. Making jargraal threads hidden too would interfere with using profiling and debugging tools on them so I've left that alone but it might be worth changing the JVMTI suspend and resume functions to explicitly skip compiler threads as well. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Refactor logic and add LibJVMCICompilerThreadHidden > - Merge branch 'master' into tkr-jvmci-hidden > - 8311557: [JVMCI] deadlock with JVMTI thread suspension Marked as reviewed by dnsimon (Reviewer). src/hotspot/share/compiler/abstractCompiler.hpp line 154: > 152: CompilerType type() const { return _type; } > 153: > 154: virtual bool is_hidden_from_external_view() const { return false; } It would be nice if this had a comment explaining what "hidden from external view" implies.
But I see that `Thread::is_hidden_from_external_view` has no documentation either so I guess there's not much that can be explained here if the broader concept is somewhat undefined. src/hotspot/share/jvmci/jvmciCompiler.hpp line 107: > 105: > 106: virtual bool is_hidden_from_external_view() const { return UseJVMCINativeLibrary && LibJVMCICompilerThreadHidden; } > 107: Remove extra blank line. src/hotspot/share/jvmci/jvmci_globals.hpp line 162: > 160: product(bool, LibJVMCICompilerThreadHidden, true, DIAGNOSTIC, \ > 161: "If true then native JVMCI compiler threads are hidden from " \ > 162: "JVMTI and FlightRecorder. This must be set to false if you" \ `you"` -> `you "` `threads"` -> `threads."` As far as I can tell, this help text is never printed (is it?) but it may as well be properly formatted. ------------- PR Review: https://git.openjdk.org/jdk/pull/14799#pullrequestreview-1570538234 PR Review Comment: https://git.openjdk.org/jdk/pull/14799#discussion_r1289186285 PR Review Comment: https://git.openjdk.org/jdk/pull/14799#discussion_r1289186899 PR Review Comment: https://git.openjdk.org/jdk/pull/14799#discussion_r1289189957 From never at openjdk.org Thu Aug 10 01:20:34 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 10 Aug 2023 01:20:34 GMT Subject: RFR: 8311557: [JVMCI] deadlock with JVMTI thread suspension [v3] In-Reply-To: References: Message-ID: > Java based JVMCI compiler threads are more like normal Java threads so they aren't `hidden_from_external_view` like the native compilers. This can lead to deadlocks if you use JVMTI to suspend all threads since this will block the compiler queue and can block execution if background compilation is disabled. It's reasonable to treat libgraal threads like native threads in this regard.
Making jargraal threads hidden too would interfere with using profiling and debugging tools on them so I've left that alone but it might be worth changing the JVMTI suspend and resume functions to explicitly skip compiler threads as well. Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14799/files - new: https://git.openjdk.org/jdk/pull/14799/files/1f08dcc4..334b0347 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14799&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14799&range=01-02 Stats: 5 lines in 3 files changed: 2 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14799.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14799/head:pull/14799 PR: https://git.openjdk.org/jdk/pull/14799 From never at openjdk.org Thu Aug 10 01:20:58 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 10 Aug 2023 01:20:58 GMT Subject: RFR: 8311557: [JVMCI] deadlock with JVMTI thread suspension [v2] In-Reply-To: References: Message-ID: <1OMHdTSahD0kjEUENiNdcDDVnwyC1sv3pcLslTi1Mak=.49b03f1d-f576-409d-9059-412c2bb10a74@github.com> On Wed, 9 Aug 2023 20:55:32 GMT, Doug Simon wrote: >> Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Refactor logic and add LibJVMCICompilerThreadHidden >> - Merge branch 'master' into tkr-jvmci-hidden >> - 8311557: [JVMCI] deadlock with JVMTI thread suspension > > src/hotspot/share/compiler/abstractCompiler.hpp line 154: > >> 152: CompilerType type() const { return _type; } >> 153: >> 154: virtual bool is_hidden_from_external_view() const { return false; } > > It would be nice if this had a comment explaining what "hidden from external view" implies.
But I see that `Thread::is_hidden_from_external_view` has no documentation either so I guess there's not much that can be explained here if the broader concept is somewhat undefined. I added a comment in CompilerThread. > src/hotspot/share/jvmci/jvmciCompiler.hpp line 107: > >> 105: >> 106: virtual bool is_hidden_from_external_view() const { return UseJVMCINativeLibrary && LibJVMCICompilerThreadHidden; } >> 107: > > Remove extra blank line. Ok > src/hotspot/share/jvmci/jvmci_globals.hpp line 162: > >> 160: product(bool, LibJVMCICompilerThreadHidden, true, DIAGNOSTIC, \ >> 161: "If true then native JVMCI compiler threads are hidden from " \ >> 162: "JVMTI and FlightRecorder. This must be set to false if you" \ > > `you"` -> `you "` > `threads"` -> `threads."` > > As far as I can tell, this help text is never printed (is it?) but it may as well be properly formatted. Fixed. It's not in the product but it does function as documentation for the options. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14799#discussion_r1289405007 PR Review Comment: https://git.openjdk.org/jdk/pull/14799#discussion_r1289404492 PR Review Comment: https://git.openjdk.org/jdk/pull/14799#discussion_r1289404280 From never at openjdk.org Thu Aug 10 01:33:58 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 10 Aug 2023 01:33:58 GMT Subject: RFR: 8314061: [JVMCI] DeoptimizeALot stress logic breaks deferred barriers Message-ID: JVMCIRuntime::new_array_common includes a little bit of stress logic that changes how it returns when DeoptimizeALot is set. This can cause it to bypass the call to SharedRuntime::on_slowpath_allocation_exit(current) which is where the deferred card mark logic lives. This can lead to random crashes of various kinds.
------------- Commit messages: - 8314061: [JVMCI] DeoptimizeALot stress logic breaks deferred barriers Changes: https://git.openjdk.org/jdk/pull/15218/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15218&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314061 Stats: 4 lines in 1 file changed: 2 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15218.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15218/head:pull/15218 PR: https://git.openjdk.org/jdk/pull/15218 From dholmes at openjdk.org Thu Aug 10 05:36:58 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 10 Aug 2023 05:36:58 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API In-Reply-To: References: Message-ID: On Tue, 1 Aug 2023 10:29:06 GMT, Jorn Vernee wrote: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. 
https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base addresses of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... Hotspot test changes look fine. Thanks ------------- Marked as reviewed by dholmes (Reviewer).
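Item 4 in the list above (the extra 'base offset' coordinate) can be illustrated with a small sketch. It assumes JDK 22 or later, where `java.lang.foreign` is final; the class and method names are made up for illustration. Two 4-int arrays are stored back to back in one segment, and a single var handle derived from a 4-int sequence layout reads either array by passing that array's start as the base offset, with no second layout or slice needed. The sequence layout also passes an explicit element count, as item 2 now requires.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SequenceLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

public class BaseOffsetDemo {
    // Sequence layouts need an explicit element count (here: 4 ints, 16 bytes).
    static final SequenceLayout FOUR_INTS =
            MemoryLayout.sequenceLayout(4, ValueLayout.JAVA_INT);

    // Access coordinates: (MemorySegment segment, long baseOffset, long index).
    static final VarHandle ELEM =
            FOUR_INTS.varHandle(MemoryLayout.PathElement.sequenceElement());

    // Stores two 4-int arrays back to back in one segment, then reads one
    // element of the second array by passing its start as the base offset.
    static int secondArrayElement(long index) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(MemoryLayout.sequenceLayout(2, FOUR_INTS));
            long secondBase = FOUR_INTS.byteSize(); // second array starts at byte 16
            for (long i = 0; i < 4; i++) {
                ELEM.set(seg, 0L, i, (int) i);                 // first array: 0, 1, 2, 3
                ELEM.set(seg, secondBase, i, (int) (100 + i)); // second array: 100..103
            }
            return (int) ELEM.get(seg, secondBase, index);
        }
    }

    public static void main(String[] args) {
        System.out.println(secondArrayElement(2)); // prints 102
    }
}
```

The same handle accepts any base offset computed at run time, which is what makes it usable with 'dynamic' layouts whose position is not known statically.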
PR Review: https://git.openjdk.org/jdk/pull/15103#pullrequestreview-1571033557 From thartmann at openjdk.org Thu Aug 10 05:50:59 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 10 Aug 2023 05:50:59 GMT Subject: RFR: 8314061: [JVMCI] DeoptimizeALot stress logic breaks deferred barriers In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 01:06:35 GMT, Tom Rodriguez wrote: > JVMCIRuntime::new_array_common includes a little bit of stress logic that changes how it returns when DeoptimizeALot is set. This can cause it to bypass the call to SharedRuntime::on_slowpath_allocation_exit(current) which is where the deferred card mark logic lives. This can lead to random crashes of various kinds. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15218#pullrequestreview-1571044722 From thartmann at openjdk.org Thu Aug 10 06:34:07 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 10 Aug 2023 06:34:07 GMT Subject: RFR: 8312194: test/hotspot/jtreg/applications/ctw/modules/jdk_crypto_ec.java cannot handle empty modules In-Reply-To: References: Message-ID: On Wed, 9 Aug 2023 03:03:47 GMT, Leonid Mesnik wrote: > Removed empty module so CTW doesn't fail. Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15201#pullrequestreview-1571105537 From thartmann at openjdk.org Thu Aug 10 06:34:58 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 10 Aug 2023 06:34:58 GMT Subject: RFR: 8311557: [JVMCI] deadlock with JVMTI thread suspension [v3] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 01:20:34 GMT, Tom Rodriguez wrote: >> Java based JVMCI compiler threads are more like normal Java threads so they aren't `hidden_from_external_view` like the native compilers. 
This can lead to deadlocks if you use JVMTI to suspend all threads since this will block the compiler queue and can block execution if background compilation is disabled. It's reasonable to treat libgraal threads like native threads in this regard. Making jargraal threads hidden too would interfere with using profiling and debugging tools on them so I've left that alone but it might be worth changing the JVMTI suspend and resume functions to explicitly skip compiler threads as well. > > Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Still looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14799#pullrequestreview-1571103255 From thartmann at openjdk.org Thu Aug 10 07:05:58 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 10 Aug 2023 07:05:58 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v6] In-Reply-To: <9VFq3F8FbcL-Rf3JOrYrJ4j1BqJXDgnqdIY_bM2tAhY=.321bc122-f95c-4e88-9973-4f401d1c2a35@github.com> References: <2Vl2c9rrW0AIEWr_t-7Wn0yZUHpqA9jDBDcOoADsxs8=.139baa19-a81b-4d45-80e6-d7017376d26a@github.com> <9VFq3F8FbcL-Rf3JOrYrJ4j1BqJXDgnqdIY_bM2tAhY=.321bc122-f95c-4e88-9973-4f401d1c2a35@github.com> Message-ID: <31D5TzOWp6pow8HpfF4rZjvnk2H2qjmfTQng6OJeC-8=.a6a39fa3-a1ca-4027-b215-ae6439328d88@github.com> On Tue, 8 Aug 2023 14:18:51 GMT, Jorn Vernee wrote: >> Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: >> >> Add a side effect to the IR tests to make sure we do not emit CMOVs there >> >> Without Tiered Compilation, no profile data is present, which means a CMOV would always be emitted. Keep the compiler from doing that, as the peephole currently does not work with CMOV instructions > > Ok, thanks. I've submitted another CI run.
@JornVernee's testing failed: `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation` Failed IR Rules (6) of Methods (6) ---------------------------------- 1) Method "public boolean compiler.c2.irTests.TestTestRemovalPeephole.testIntAddtionEquals0(int,int)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={FINAL_CODE}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, applyIfAnd={}, failOn={"_#X86_TESTI_REG#_", "_#X86_TESTL_REG#_"}, applyIfOr={}, applyIfNot={})" > Phase "Final Code": - failOn: Graph contains forbidden nodes: * Constraint 1: "(\\d+(\\s){2}(testI_reg.*)+(\\s){2}===.*)" - Matched forbidden node: * 10 testI_reg === _ 11 [[ 9 ]] #0/0x00000000 2) Method "public boolean compiler.c2.irTests.TestTestRemovalPeephole.testIntAddtionNotEquals0(int,int)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={FINAL_CODE}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, applyIfAnd={}, failOn={"_#X86_TESTI_REG#_", "_#X86_TESTL_REG#_"}, applyIfOr={}, applyIfNot={})" > Phase "Final Code": - failOn: Graph contains forbidden nodes: * Constraint 1: "(\\d+(\\s){2}(testI_reg.*)+(\\s){2}===.*)" - Matched forbidden node: * 10 testI_reg === _ 11 [[ 9 ]] #0/0x00000000 3) Method "public boolean compiler.c2.irTests.TestTestRemovalPeephole.testIntOrEquals0(int,int)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={FINAL_CODE}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, applyIfAnd={}, failOn={"_#X86_TESTI_REG#_", "_#X86_TESTL_REG#_"}, applyIfOr={}, applyIfNot={})" > Phase "Final Code": - failOn: Graph contains forbidden nodes: * Constraint 1: "(\\d+(\\s){2}(testI_reg.*)+(\\s){2}===.*)" - Matched forbidden node: * 10 testI_reg === _ 11 [[ 9 ]] #0/0x00000000 4) Method "public boolean 
compiler.c2.irTests.TestTestRemovalPeephole.testIntOrGreater0(int,int)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={FINAL_CODE}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, applyIfAnd={}, failOn={"_#X86_TESTI_REG#_", "_#X86_TESTL_REG#_"}, applyIfOr={}, applyIfNot={})" > Phase "Final Code": - failOn: Graph contains forbidden nodes: * Constraint 1: "(\\d+(\\s){2}(testI_reg.*)+(\\s){2}===.*)" - Matched forbidden node: * 10 testI_reg === _ 11 [[ 9 ]] #0/0x00000000 5) Method "public boolean compiler.c2.irTests.TestTestRemovalPeephole.testIntOrNotEquals0(int,int)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={FINAL_CODE}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, applyIfAnd={}, failOn={"_#X86_TESTI_REG#_", "_#X86_TESTL_REG#_"}, applyIfOr={}, applyIfNot={})" > Phase "Final Code": - failOn: Graph contains forbidden nodes: * Constraint 1: "(\\d+(\\s){2}(testI_reg.*)+(\\s){2}===.*)" - Matched forbidden node: * 10 testI_reg === _ 11 [[ 9 ]] #0/0x00000000 6) Method "public boolean compiler.c2.irTests.TestTestRemovalPeephole.testLongOrGreater0(long,long)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={FINAL_CODE}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, applyIfAnd={}, failOn={"_#X86_TESTI_REG#_", "_#X86_TESTL_REG#_"}, applyIfOr={}, applyIfNot={})" > Phase "Final Code": - failOn: Graph contains forbidden nodes: * Constraint 2: "(\\d+(\\s){2}(testL_reg.*)+(\\s){2}===.*)" - Matched forbidden node: * 10 testL_reg === _ 11 [[ 9 ]] #0/0x0000000000000000 ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1672652404 From thartmann at openjdk.org Thu Aug 10 07:05:59 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 10 Aug 2023 07:05:59 GMT Subject: RFR: 8312547: Max/Min nodes Value implementation could 
be improved [v2] In-Reply-To: <7i5B-9hTl8oTKGpdMEiCsKEWf8a0M1HHpOZUsLYXrPI=.29dc1ce8-9666-4aa7-b63b-36610026c53a@github.com> References: <7i5B-9hTl8oTKGpdMEiCsKEWf8a0M1HHpOZUsLYXrPI=.29dc1ce8-9666-4aa7-b63b-36610026c53a@github.com> Message-ID: On Tue, 25 Jul 2023 18:38:06 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch removes the early return in `AddNode::Value` in case one of the inputs is a bottom, which may affect the value calculation of nodes such as `Min/MaxNode`. >> >> Please kindly review, thanks very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix min/maxfp nodes Looks good to me. src/hotspot/share/opto/addnode.cpp line 220: > 218: // Either input is TOP ==> the result is TOP > 219: const Type *t1 = phase->type(in(1)); > 220: const Type *t2 = phase->type(in(2)); Suggestion: const Type* t1 = phase->type(in(1)); const Type* t2 = phase->type(in(2)); ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15021#pullrequestreview-1571131511 PR Review Comment: https://git.openjdk.org/jdk/pull/15021#discussion_r1289632391 From dnsimon at openjdk.org Thu Aug 10 08:17:02 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 10 Aug 2023 08:17:02 GMT Subject: RFR: 8311557: [JVMCI] deadlock with JVMTI thread suspension [v3] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 01:20:34 GMT, Tom Rodriguez wrote: >> Java based JVMCI compiler threads are more like normal Java threads so they aren't `hidden_from_external_view` like the native compilers. This can lead to deadlocks if you use JVMTI to suspend all threads since this will block the compiler queue and can block execution if background compilation is disabled. It's reasonable to treat libgraal threads like native threads in this regard.
Making jargraal threads hidden too would interfere with using profiling and debugging tools on them, so I've left that alone, but it might be worth changing the JVMTI suspend and resume functions to explicitly skip compiler threads as well. > > Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Marked as reviewed by dnsimon (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14799#pullrequestreview-1571281599 From dnsimon at openjdk.org Thu Aug 10 08:32:28 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 10 Aug 2023 08:32:28 GMT Subject: RFR: 8314061: [JVMCI] DeoptimizeALot stress logic breaks deferred barriers In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 01:06:35 GMT, Tom Rodriguez wrote: > JVMCIRuntime::new_array_common includes a little bit of stress logic that changes how it returns when DeoptimizeALot is set. This can cause it to bypass the call to SharedRuntime::on_slowpath_allocation_exit(current) which is where the deferred card mark logic lives. This can lead to random crashes of various kinds. Marked as reviewed by dnsimon (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/15218#pullrequestreview-1571285872 From dnsimon at openjdk.org Thu Aug 10 08:36:28 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 10 Aug 2023 08:36:28 GMT Subject: RFR: 8313421: [JVMCI] avoid locking class loader in CompilerToVM.lookupType [v2] In-Reply-To: <5y-ZYjEZQrFUVuAvDQTWVGb4hOtBVzzpnWoKmK_-GAY=.a0c10c2e-3a7c-418f-a71c-ed86ecda7eaa@github.com> References: <5y-ZYjEZQrFUVuAvDQTWVGb4hOtBVzzpnWoKmK_-GAY=.a0c10c2e-3a7c-418f-a71c-ed86ecda7eaa@github.com> Message-ID: <_dnAOEYiYQA_ktAwKozDYzv5c-UT1ao49Ot05SVS08U=.42892582-b7c5-4223-b5b7-06b3de8d230d@github.com> On Tue, 8 Aug 2023 13:54:53 GMT, Doug Simon wrote: >> This PR removes the need to lock the system class loader when converting Class instances for boot and platform classes to ResolvedJavaType objects. Not only is the system class loader a suboptimal loader for resolving these classes but locking it can cause deadlock in some JDK tests (e.g. `test/jdk/java/lang/System/LoggerFinder/`) when run with `-Xcomp`. For example, a thread that holds the system class loader lock and triggers a blocking compilation will deadlock with the compiler thread servicing the compilation if the compilation requires calling `CompilerToVM.lookupType` (which most compilations do). > > Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8313421 > - avoid locking class loader in CompilerToVM.lookupType (JDK-8313421) Thanks for the reviews. 
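The deadlock described above (a thread that holds the system class loader lock requests a blocking compilation, while the compiler thread servicing it needs that same lock inside `CompilerToVM.lookupType`) is an ordinary lock-ordering cycle. The following Java sketch models it under stated assumptions: a `ReentrantLock` stands in for the class loader lock and a single-thread executor stands in for the compiler thread. All names are hypothetical; this is not HotSpot or JVMCI code. A timed `tryLock` is used so the sketch reports the would-be deadlock instead of hanging.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LoaderLockDeadlockSketch {
    // Hypothetical stand-in for the system class loader's lock.
    static final ReentrantLock loaderLock = new ReentrantLock();

    // Models a compilation that resolves a type. If needsLoaderLock is true, the
    // "compiler thread" must take the loader lock (the old behavior); otherwise it
    // resolves boot/platform classes without it (the behavior after the fix).
    static String compile(boolean needsLoaderLock) throws InterruptedException {
        if (needsLoaderLock) {
            // Timed attempt so the sketch reports the cycle instead of blocking forever.
            if (!loaderLock.tryLock(200, TimeUnit.MILLISECONDS)) {
                return "would deadlock";
            }
            try {
                return "compiled";
            } finally {
                loaderLock.unlock();
            }
        }
        return "compiled";
    }

    // The application thread holds the loader lock and waits for the compiler
    // thread to finish, like a blocking compilation under -Xcomp.
    static String requestBlockingCompilation(boolean needsLoaderLock) throws Exception {
        ExecutorService compilerThread = Executors.newSingleThreadExecutor();
        loaderLock.lock();
        try {
            Future<String> result = compilerThread.submit(() -> compile(needsLoaderLock));
            return result.get();
        } finally {
            loaderLock.unlock();
            compilerThread.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(requestBlockingCompilation(true));
        System.out.println(requestBlockingCompilation(false));
    }
}
```

The first call shows the cycle (the compiler thread cannot get the lock its requester holds); the second shows why removing the lock acquisition from the compiler-side lookup breaks it.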
------------- PR Comment: https://git.openjdk.org/jdk/pull/15128#issuecomment-1672770989 From dnsimon at openjdk.org Thu Aug 10 08:36:58 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 10 Aug 2023 08:36:58 GMT Subject: Integrated: 8313421: [JVMCI] avoid locking class loader in CompilerToVM.lookupType In-Reply-To: References: Message-ID: On Wed, 2 Aug 2023 20:33:49 GMT, Doug Simon wrote: > This PR removes the need to lock the system class loader when converting Class instances for boot and platform classes to ResolvedJavaType objects. Not only is the system class loader a suboptimal loader for resolving these classes but locking it can cause deadlock in some JDK tests (e.g. `test/jdk/java/lang/System/LoggerFinder/`) when run with `-Xcomp`. For example, a thread that holds the system class loader lock and triggers a blocking compilation will deadlock with the compiler thread servicing the compilation if the compilation requires calling `CompilerToVM.lookupType` (which most compilations do). This pull request has now been integrated. 
Changeset: 83adaf54 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/83adaf5477d1aa0128079a60be8847319dbadccc Stats: 93 lines in 8 files changed: 41 ins; 19 del; 33 mod 8313421: [JVMCI] avoid locking class loader in CompilerToVM.lookupType Reviewed-by: never, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/15128 From jvernee at openjdk.org Thu Aug 10 10:38:28 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 10 Aug 2023 10:38:28 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v6] In-Reply-To: <31D5TzOWp6pow8HpfF4rZjvnk2H2qjmfTQng6OJeC-8=.a6a39fa3-a1ca-4027-b215-ae6439328d88@github.com> References: <2Vl2c9rrW0AIEWr_t-7Wn0yZUHpqA9jDBDcOoADsxs8=.139baa19-a81b-4d45-80e6-d7017376d26a@github.com> <9VFq3F8FbcL-Rf3JOrYrJ4j1BqJXDgnqdIY_bM2tAhY=.321bc122-f95c-4e88-9973-4f401d1c2a35@github.com> <31D5TzOWp6pow8HpfF4rZjvnk2H2qjmfTQng6OJeC-8=.a6a39fa3-a1ca-4027-b215-ae6439328d88@github.com> Message-ID: On Thu, 10 Aug 2023 06:48:21 GMT, Tobias Hartmann wrote: >> Ok, thanks. I've submitted another CI run. 
> > @JornVernee's testing failed: > > `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation` > > > > Failed IR Rules (6) of Methods (6) > ---------------------------------- > 1) Method "public boolean compiler.c2.irTests.TestTestRemovalPeephole.testIntAddtionEquals0(int,int)" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={FINAL_CODE}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, applyIfAnd={}, failOn={"_#X86_TESTI_REG#_", "_#X86_TESTL_REG#_"}, applyIfOr={}, applyIfNot={})" > > Phase "Final Code": > - failOn: Graph contains forbidden nodes: > * Constraint 1: "(\\d+(\\s){2}(testI_reg.*)+(\\s){2}===.*)" > - Matched forbidden node: > * 10 testI_reg === _ 11 [[ 9 ]] #0/0x00000000 > > 2) Method "public boolean compiler.c2.irTests.TestTestRemovalPeephole.testIntAddtionNotEquals0(int,int)" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={FINAL_CODE}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, applyIfAnd={}, failOn={"_#X86_TESTI_REG#_", "_#X86_TESTL_REG#_"}, applyIfOr={}, applyIfNot={})" > > Phase "Final Code": > - failOn: Graph contains forbidden nodes: > * Constraint 1: "(\\d+(\\s){2}(testI_reg.*)+(\\s){2}===.*)" > - Matched forbidden node: > * 10 testI_reg === _ 11 [[ 9 ]] #0/0x00000000 > > 3) Method "public boolean compiler.c2.irTests.TestTestRemovalPeephole.testIntOrEquals0(int,int)" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={FINAL_CODE}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, applyIfAnd={}, failOn={"_#X86_TESTI_REG#_", "_#X86_TESTL_REG#_"}, applyIfOr={}, applyIfNot={})" > > Phase "Final Code": > - failOn: Graph contains forbidden nodes: > * Constraint 1: "(\\d+(\\s){2}(testI_reg.*)+(\\s){2}===.*)" > - Matched forbidden node: > * 10 testI_reg === _ 11 [[ 9 ]] #0/0x00000000 > > 4) 
Method "public boolean compiler.c2.irTests.TestTestRemovalPeephole.testIntOrGreater0(int,int)" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={FINAL_CODE}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, applyIfAnd={}, failOn={"_#X86_TESTI_REG#_", "_#X86_TESTL_REG#_"}, applyIfOr={}, applyIfNot={})" > > Phase "Final Code": > ... Hey @TobiHartmann that's the old test run for commit hash `af93415`, the newer one for commit `9872e71` came back clean (there are just 3 known failures). ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1672958797 From epeter at openjdk.org Thu Aug 10 10:51:01 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 10 Aug 2023 10:51:01 GMT Subject: RFR: 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE In-Reply-To: References: Message-ID: On Tue, 25 Jul 2023 07:42:59 GMT, Pengfei Li wrote: > Hotspot jtreg `compiler/loopopts/superword/TestDependencyOffsets.java` fails on AArch64 CPUs with 512-bit SVE. The reason is that many test loops in the code cannot be vectorized due to data dependence but IR tests assume they can. > > On AArch64, these IR tests just check the CPU feature of `asimd` and incorrectly assume AArch64 vectors are at most 256 bits. But actually, `asimd` on AArch64 only represents NEON vectors which are at most 128 bits. AArch64 CPUs may have another feature of `sve` which represents scalable vectors of at most 2048 bits. The vectorization won't succeed on 512-bit SVE CPUs if the memory offset between some read and write is less than 512 bits. > > As this jtreg is auto-generated by a python script, we have updated the script and re-generated this jtreg. In this new version, we checked the auto-vectorization on both NEON-only and NEON+SVE platforms. Below is the diff of the generator script. We have also attached the new script to the JBS page.
> > > @@ -321,7 +321,8 @@ class Type: > p.append(Platform("avx512", ["avx512", "true"], 64)) > else: > assert False, "type not implemented" + self.name > - p.append(Platform("asimd", ["asimd", "true"], 32)) > + p.append(Platform("asimd", ["asimd", "true", "sve", "false"], 16)) > + p.append(Platform("sve", ["sve", "true"], 256)) > return p > > class Test: > @@ -457,7 +458,7 @@ class Generator: > lines.append(" * and various MaxVectorSize values, and +- AlignVector.") > lines.append(" *") > lines.append(" * Note: this test is auto-generated. Please modify / generate with script:") > - lines.append(" * https://bugs.openjdk.org/browse/JDK-8308606") > + lines.append(" * https://bugs.openjdk.org/browse/JDK-8312570") > lines.append(" *") > lines.append(" * Types: " + ", ".join([t.name for t in self.types])) > lines.append(" * Offsets: " + ", ".join([str(o) for o in self.offsets])) > @@ -598,7 +599,8 @@ class Generator: > # IR rules > for p in test.t.platforms(): > elements = p.vector_width // test.t.size > - lines.append(f" // CPU: {p.name} -> vector_width: {p.vector_width} -> elements in vector: {elements}") > + max_pre = "max " if p.name == "sve" else "" > + lines.append(f" // CPU: {p.name} -> {max_pre}vector_width: {p.vector_width} -> {max_pre}elements in vector: {elements}") > ############### -Align... @pfustc Thanks for the changes and explanations, looks good to me! :) Ah. Just one more idea: Since you now have even longer vector widths with 2048 bits: Should we not add some cases with even larger dependency offsets? We should go further than `-196, 196`. We could consider adding `255, 256, 511, 512, 1024, 1536` (positive and negative). Of course the question is if that increases the runtime too much, what do you think? ------------- Marked as reviewed by epeter (Reviewer). 
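The generator diff computes the number of elements per vector as `vector_width // size` and now treats SVE's 256 bytes as a maximum rather than a fixed width. As a rough illustration of why a short dependency offset blocks full-width vectorization on wider vectors, here is a simplified Java model. It assumes a store-ahead loop of the form `a[i + offset] = a[i] * k` and ignores alignment, `AlignVector`, unrolling, and any smaller vector length C2 might pick, so it is not the rule that the test generator or SuperWord actually implements. Widths are in bytes: 16 for `asimd` and 256 for the SVE maximum (both from the diff), plus 64 for a 512-bit SVE machine.

```java
public class DependencyOffsetModel {
    // Elements per vector, mirroring the script's `elements = p.vector_width // test.t.size`.
    static int elementsPerVector(int vectorWidthBytes, int elementSizeBytes) {
        return vectorWidthBytes / elementSizeBytes;
    }

    // Simplified model (assumption): for a[i + offset] = a[i] * k with offset > 0,
    // iteration i + offset reads what iteration i wrote. A vector that loads N
    // elements before storing any is only correct if that read falls into a later
    // vector, i.e. offset >= N. Non-positive offsets only read ahead of the store,
    // so they are treated as safe here.
    static boolean fullWidthSafe(int offset, int vectorWidthBytes, int elementSizeBytes) {
        int n = elementsPerVector(vectorWidthBytes, elementSizeBytes);
        return offset <= 0 || offset >= n;
    }

    public static void main(String[] args) {
        int intSize = 4;
        int offset = 8; // 8 ints = 32 bytes = 256 bits
        System.out.println("NEON (16 B): " + fullWidthSafe(offset, 16, intSize));
        System.out.println("SVE-512 (64 B): " + fullWidthSafe(offset, 64, intSize));
        System.out.println("SVE max (256 B): " + fullWidthSafe(offset, 256, intSize));
    }
}
```

With `int` elements, an offset of 8 (256 bits) fits a 128-bit NEON vector but not a 512-bit SVE vector, which matches the failure mode described for 512-bit SVE.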
PR Review: https://git.openjdk.org/jdk/pull/15010#pullrequestreview-1571589212 From epeter at openjdk.org Thu Aug 10 11:06:01 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 10 Aug 2023 11:06:01 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v2] In-Reply-To: References: Message-ID: <5EkrQNxjV5u0NIlfLg8BvCsLLYK-7qNMkT46dFA4_RU=.1ba35d62-0d2c-4765-9662-563751c4fd5c@github.com> On Wed, 9 Aug 2023 09:15:28 GMT, Pengfei Li wrote: >> @vnkozlov @TobiHartmann we should re-run testing from our side. >> >> @pfustc Why do you only test correctness (compare results) in some conditions? Is there not a risk that we miss doing it in some cases we should do it, just because we get the conditions slightly wrong? >> >> Just FYI: we should integrate this whole correctness of results testing into the IR framework. I filed [JDK-8310533](https://bugs.openjdk.org/browse/JDK-8310533). That would make it easier to use for new tests. It could also be used for any test, not just the ones located in `test/hotspot/jtreg/compiler/vectorization`. > > Hi @eme64 , > > Thanks for looking at this. > >> @pfustc Why do you only test correctness (compare results) in some conditions? Is there not a risk that we miss doing it in some cases we should do it, just because we get the conditions slightly wrong? > > Yes, you are right! These conditions were added earlier to avoid jtreg hanging when compilation is locked. But now I can remove them because the lock is removed. In my latest commit, I have removed the conditions and some useless imports. > >> Just FYI: we should integrate this whole correctness of results testing into the IR framework. I filed [JDK-8310533](https://bugs.openjdk.org/browse/JDK-8310533). That would make it easier to use for new tests. It could also be used for any test, not just the ones located in test/hotspot/jtreg/compiler/vectorization. > > I have noticed this JBS before.
The reason I didn't add a correctness check to the IR framework is that I implemented this kind of check before the IR framework existed. (We have used it internally for a few years.) But anyway, it is a good proposal and I'm willing to help if needed. @pfustc This looks good to me, thanks for making these changes. It will really increase the coverage. Maybe @vnkozlov should quickly look at it again if he still agrees. @TobiHartmann is running the testing again, just in case the dropped conditions change something. I will give you my approval after those tests are passing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15011#issuecomment-1673001544 From thartmann at openjdk.org Thu Aug 10 11:07:58 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 10 Aug 2023 11:07:58 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v6] In-Reply-To: <2Vl2c9rrW0AIEWr_t-7Wn0yZUHpqA9jDBDcOoADsxs8=.139baa19-a81b-4d45-80e6-d7017376d26a@github.com> References: <2Vl2c9rrW0AIEWr_t-7Wn0yZUHpqA9jDBDcOoADsxs8=.139baa19-a81b-4d45-80e6-d7017376d26a@github.com> Message-ID: On Tue, 8 Aug 2023 14:02:08 GMT, Tobias Hotz wrote: >> This patch adds peephole rules to remove TEST instructions that operate on the result and right after an AND, XOR or OR instruction. >> This pattern can emerge if the result of the AND is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instruction mentioned above also sets the flags register to the same value. >> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to the same values for TEST, AND, XOR and OR, so this should be safe.
>> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: >> Results on Intel Core i5-8250U CPU >> Before this patch: >> >> Benchmark Mode Cnt Score Error Units >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> >> After this patch: >> >> Benchmark Mode Cnt Score Error Units Change >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) >> TestRemovalPeephole.bench... > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Add a side effect to the IR tests to make sure we do not emit CMOVs there > > Without Tiered Compilation, no profile data is present, which means a CMOV would always be emitted.
Keep the compiler from doing that, as the peephole currently does not work with CMOV instructions Ah, sorry for the confusion then :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1672988980 From epeter at openjdk.org Thu Aug 10 11:25:34 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 10 Aug 2023 11:25:34 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v15] In-Reply-To: References: Message-ID: On Wed, 12 Jul 2023 17:47:26 GMT, Quan Anh Mai wrote: >> @merykitty I just discussed the testing with @TobiHartmann . He just came across this test: >> `test/hotspot/jtreg/compiler/c2/TestUnsignedByteCompare1.java`. >> The cool thing is that you can "simulate" constants with `MethodHandles.constant`. At runtime, the invocation apparently speculates-and-traps it to a constant value. That means you can just set a new value, it deopts, and hopefully eventually re-compiles with the next constants. >> >> You could easily set up one of these tests per node. And maybe throw in some interesting ranges for the `dividend`. >> >> An interesting experiment would be to have an IR test that works with a random constant, and then have an IR rule that fails if we find a `div` node. At least for those cases where that should work. And then you can easily compare the div results with a non-compiled method that computes the same value. > > @eme64 Thanks a lot for taking a look at this patch, I will address your remaining comments soon. > > The basic idea of the transformation in `javaArithmetic.hpp` is to find `M` and `s` such that `x / c = floor(x * M / 2**s)` for every interesting value of `x`. The remaining transformation in `divnode.cpp` is to convert this calculation from integer arithmetic to modular arithmetic. This is easy if the representative in the congruence class of an operand is always equal to itself, in which case we can do the calculation directly.
For other cases, we have to do additional calculation to take into consideration the difference between arithmetic calculations in the two domains. @merykitty I'm mostly out of the office until September 9 (FYI). It would be really cool if this made it in. I'm currently playing with `MethodHandles.constant`, and it is really easy to have "random" compile-time constants. ------------- PR Comment: https://git.openjdk.org/jdk/pull/9947#issuecomment-1673007195 From jvernee at openjdk.org Thu Aug 10 11:52:58 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 10 Aug 2023 11:52:58 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v6] In-Reply-To: <2Vl2c9rrW0AIEWr_t-7Wn0yZUHpqA9jDBDcOoADsxs8=.139baa19-a81b-4d45-80e6-d7017376d26a@github.com> References: <2Vl2c9rrW0AIEWr_t-7Wn0yZUHpqA9jDBDcOoADsxs8=.139baa19-a81b-4d45-80e6-d7017376d26a@github.com> Message-ID: <-qlslxgMp-QExMrmMtdukLNWlCpqPSxI3tc3gbuyzR0=.6bc13270-788d-4b84-8924-f8019cdae6e8@github.com> On Tue, 8 Aug 2023 14:02:08 GMT, Tobias Hotz wrote: >> This patch adds peephole rules to remove TEST instructions that operate on the result and right after an AND, XOR or OR instruction. >> This pattern can emerge if the result of the AND is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instruction mentioned above also sets the flags register to the same value. >> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to the same values for TEST, AND, XOR and OR, so this should be safe.
>> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: >> Results on Intel Core i5-8250U CPU >> Before this patch: >> >> Benchmark Mode Cnt Score Error Units >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> >> After this patch: >> >> Benchmark Mode Cnt Score Error Units Change >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) >> TestRemovalPeephole.bench... > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Add a side effect to the IR tests to make sure we do not emit CMOVs there > > Without Tiered Compilation, no profile data is present, which means a CMOV would always be emitted.
Keep the compiler from doing that, as the peephole currently does not work with CMOV instructions I think this looks mostly good now from my perspective. It would be nice to know why there's a regression in the `benchmarkOrTestFusableInt` case though. I'll try to reproduce here as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1673047023 From thartmann at openjdk.org Thu Aug 10 12:21:10 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 10 Aug 2023 12:21:10 GMT Subject: RFR: 8313899: JVMCI exception Translation can fail in TranslatedException. In-Reply-To: References: Message-ID: On Tue, 8 Aug 2023 20:52:29 GMT, Doug Simon wrote: > In a test that stresses metaspace (such as `vmTestbase/vm/mlvm/hiddenloader/stress/oome/metaspace/Test.java`) that also uses `-Xcomp -XX:-TieredCompilation`, we've seen a failure in `TranslatedException.` due to exhausted metaspace: > > java.lang.OutOfMemoryError: Metaspace > at jdk.internal.vm.TranslatedException.encodeThrowable(java.base at 21/TranslatedException.java:176) > at jdk.internal.vm.TranslatedException.(java.base at 21/TranslatedException.java:61) > at jdk.internal.vm.VMSupport.encodeThrowable(java.base at 21/VMSupport.java:171) > > This PR pushes a fix such that this exception is properly handled in the VM (i.e. causing a compilation bailout) instead of leading to a VM crash. > > The PR includes 2 bits of debug code guarded by system properties that enable the handling to be tested in libgraal. The test itself is not included as libgraal is not part of OpenJDK. Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15198#pullrequestreview-1571741510 From dnsimon at openjdk.org Thu Aug 10 12:22:00 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 10 Aug 2023 12:22:00 GMT Subject: RFR: 8313899: JVMCI exception Translation can fail in TranslatedException. 
In-Reply-To: References: Message-ID: On Tue, 8 Aug 2023 20:52:29 GMT, Doug Simon wrote: > In a test that stresses metaspace (such as `vmTestbase/vm/mlvm/hiddenloader/stress/oome/metaspace/Test.java`) that also uses `-Xcomp -XX:-TieredCompilation`, we've seen a failure in `TranslatedException.` due to exhausted metaspace: > > java.lang.OutOfMemoryError: Metaspace > at jdk.internal.vm.TranslatedException.encodeThrowable(java.base at 21/TranslatedException.java:176) > at jdk.internal.vm.TranslatedException.(java.base at 21/TranslatedException.java:61) > at jdk.internal.vm.VMSupport.encodeThrowable(java.base at 21/VMSupport.java:171) > > This PR pushes a fix such that this exception is properly handled in the VM (i.e. causing a compilation bailout) instead of leading to a VM crash. > > The PR includes 2 bits of debug code guarded by system properties that enable the handling to be tested in libgraal. The test itself is not included as libgraal is not part of OpenJDK. src/hotspot/share/jvmci/jvmciEnv.cpp line 472: > 470: vmSymbols::encodeThrowable_name(), > 471: vmSymbols::encodeThrowable_signature(), &jargs, THREAD); > 472: if (handle_pending_exception(THREAD, buffer, buffer_size)) { This is the actual bug fix: handle any exception occurring in the Java upcall. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15198#discussion_r1290014253 From epeter at openjdk.org Thu Aug 10 12:37:29 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 10 Aug 2023 12:37:29 GMT Subject: RFR: 8312332: C2: Refactor SWPointer out from SuperWord In-Reply-To: References: Message-ID: On Tue, 25 Jul 2023 08:51:38 GMT, Pengfei Li wrote: > As discussed in JDK-8308994, we should first do some refactoring work before proceeding with the new post loop vectorization. In this patch, we have done the following. > > 1) We have created new C2 source files `vectorization.[cpp|hpp]` for shared logics and utilities for C2's auto-vectorization. 
So far we have moved class `SWPointer` and `VectorElementSizeStats` here from `superword.[cpp|hpp]`. > > 2) We have decoupled `SWPointer` from class `SuperWord` and renamed it to `VPointer` as it will be used by vectorizers other than SuperWord. The original class `SWPointer` and its inner class `Tracer` both have a `_slp` field initialized in their constructors. In this patch, we have replaced them by other fields and re-written the constructors for the same functionality. Original `SWPointer::invariant()` calls function `SuperWord::find_pre_loop_end()` for loop invariant checks. To help decoupling, we moved function `find_pre_loop_end()` to class `CountedLoopNode`. As function `SWPointer::Tracer::invariant_1()` is tightly coupled with `SuperWord` but only prints some debug messages, we temporarily removed it in this patch. We will consider adding it back after later refactoring of `SuperWord` so we added a `TODO` at its call site in this patch. > > 3) We have a lot of memory phi node checks in loop optimizations. So we added a utility function `is_memory_phi()` in `node.hpp`. > > Tested tier1~3 on x86 and AArch64. Also manually verified that option `VectorizeDebug` in compiler directives still works well. @pfustc Thanks a lot for moving this code, and especially for untangling it from SuperWord. This will be helpful for many future vectorization projects. I left a few comments, but otherwise this looks straight-forward and good to me. src/hotspot/share/opto/vectorization.cpp line 40: > 38: #endif > 39: > 40: VPointer::VPointer(MemNode* mem, PhaseIdealLoop* phase, IdealLoopTree* lpt, You could also call it `LPointer` or `LoopPointer`. `VPointer` sounds like `VectorPointer` - but it is not a pointer of a vector but a scalar memop. That could be confusing. 
But you could also argue it is a `VectorizationPointer`, and hence `VPointer`. src/hotspot/share/opto/vectorization.cpp line 50: > 48: _nstack(nstack), _analyze_only(analyze_only), _stack_idx(0) > 49: #ifndef PRODUCT > 50: , _tracer((phase->C->directive()->VectorizeDebugOption & 2) > 0) You should also refactor the accessors for `VectorizeDebugOption`. I would move it from SuperWord to `vectorization.hpp/cpp` somehow. We should only do the "masking" `& 2` in one single place. src/hotspot/share/opto/vectorization.cpp line 131: > 129: bool VPointer::invariant(Node* n) const { > 130: NOT_PRODUCT(Tracer::Depth dd;) > 131: // TODO: Add more trace output for invariant check after later refactoring We generally don't like `TODO`s in the code. Best is to just drop it from the code and file an RFE if you think it is really important. When did this even trace anything? `_slp->_lpt->is_member(_slp->_phase->get_loop(n_c)) != (int)_slp->in_bb(n)` Do you think this tracing is relevant enough? src/hotspot/share/opto/vectorization.cpp line 145: > 143: Node* n_c = phase()->get_ctrl(n); > 144: return phase()->is_dominator(n_c, pre_loop_end->loopnode()); > 145: } Is `pre_loop_end != nullptr` possible here? Before your patch we always found `_slp->pre_loop_head()`. I'm just worried that if we do not find it, then we still return `is_not_member`, but `n` is still located in the space between pre and post loop. What do you think about this? And: would it make sense to cache the `pre_loop_head` in the `VPointer`?
------------- PR Review: https://git.openjdk.org/jdk/pull/15013#pullrequestreview-1571704233 PR Review Comment: https://git.openjdk.org/jdk/pull/15013#discussion_r1290037785 PR Review Comment: https://git.openjdk.org/jdk/pull/15013#discussion_r1290023212 PR Review Comment: https://git.openjdk.org/jdk/pull/15013#discussion_r1290006470 PR Review Comment: https://git.openjdk.org/jdk/pull/15013#discussion_r1290015263 From epeter at openjdk.org Thu Aug 10 12:38:29 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 10 Aug 2023 12:38:29 GMT Subject: RFR: 8312332: C2: Refactor SWPointer out from SuperWord In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 11:50:12 GMT, Emanuel Peter wrote: >> As discussed in JDK-8308994, we should first do some refactoring work before proceeding with the new post loop vectorization. In this patch, we have done the following. >> >> 1) We have created new C2 source files `vectorization.[cpp|hpp]` for shared logics and utilities for C2's auto-vectorization. So far we have moved class `SWPointer` and `VectorElementSizeStats` here from `superword.[cpp|hpp]`. >> >> 2) We have decoupled `SWPointer` from class `SuperWord` and renamed it to `VPointer` as it will be used by vectorizers other than SuperWord. The original class `SWPointer` and its inner class `Tracer` both have a `_slp` field initialized in their constructors. In this patch, we have replaced them by other fields and re-written the constructors for the same functionality. Original `SWPointer::invariant()` calls function `SuperWord::find_pre_loop_end()` for loop invariant checks. To help decoupling, we moved function `find_pre_loop_end()` to class `CountedLoopNode`. As function `SWPointer::Tracer::invariant_1()` is tightly coupled with `SuperWord` but only prints some debug messages, we temporarily removed it in this patch. We will consider adding it back after later refactoring of `SuperWord` so we added a `TODO` at its call site in this patch. 
>> >> 3) We have a lot of memory phi node checks in loop optimizations. So we added a utility function `is_memory_phi()` in `node.hpp`. >> >> Tested tier1~3 on x86 and AArch64. Also manually verified that option `VectorizeDebug` in compiler directives still works well. > > src/hotspot/share/opto/vectorization.cpp line 131: > >> 129: bool VPointer::invariant(Node* n) const { >> 130: NOT_PRODUCT(Tracer::Depth dd;) >> 131: // TODO: Add more trace output for invariant check after later refactoring > > We generally don't like `TODO`s in the code. Best is to just drop it in the code and file an RFE if you think it is really important. > > When did this even trace anything? > `_slp->_lpt->is_member(_slp->_phase->get_loop(n_c)) != (int)_slp->in_bb(n)` > > Do you think this tracing is relevant enough? If it should never happen: can we add an assert somewhere instead? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15013#discussion_r1290044129 From epeter at openjdk.org Thu Aug 10 13:39:28 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 10 Aug 2023 13:39:28 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v5] In-Reply-To: References: Message-ID: On Mon, 24 Jul 2023 04:12:02 GMT, Fei Gao wrote: >> Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like: >> >> >> match(Set dst (FmaF src3 (Binary (NegF src1) src2))); >> match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); >> >> >> Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules. >> >> Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. >> >> After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k. 
>> >> The patch passed all tier 1 - 3 on aarch64 and x86 platforms. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge branch 'master' into fg8308340 > - Merge branch 'master' into fg8308340 > - Merge branch 'master' into fg8308340 > - Move check for UseFMA from c2compiler.cpp to Matcher::match_rule_supported in .ad files > - Merge branch 'master' into fg8308340 > - 8308340: C2: Idealize Fma nodes > > Some platforms, like aarch64, ppc, and riscv, support fusing > `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating > partially symmetric match rules like: > > ``` > match(Set dst (FmaF src3 (Binary (NegF src1) src2))); > match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); > ``` > > Since `Fma` is partially commutative, the patch is to convert > `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, > making node patterns canonical. Then we can remove redundant > rules. > > Also, we should guarantee that C2 generates `Fma` nodes only on > platforms supporting `Fma` instructions before matcher, so we > can remove all `predicate(UseFMA)` for all `Fma` rules. > > After the patch, the code size of libjvm.so on aarch64 platform > decreased by 63.4k. > > The patch passed all tier 1 - 3 on aarch64 and x86 platforms. @fg1417 This looks like a reasonable refactoring. We should probably verify that the canonicalization happened if the normal `fma` matcher rule is chosen. We should add asserts that the first argument is not a negation (you could check the second argument also, just in case). What do you think? src/hotspot/cpu/x86/x86.ad line 3975: > 3973: // a * b + c > 3974: instruct fmaF_reg(regF a, regF b, regF c) %{ > 3975: predicate(UseFMA); You could add an assert to the encoding code, just to ensure that we do not generate bad code, even if it is never executed during testing.
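The canonicalization described in this thread is value-preserving for scalar `Fma` because `(-a)*b` and `b*(-a)` are the same exact product, so both forms round once to the same result. It is not value-preserving for a masked vector FMA whose inactive lanes must return the first input, which is the constraint raised for masked `FmaV` nodes. The following is a minimal plain-Java sketch of both points; `maskedFma` is a hypothetical lane-by-lane model written for illustration, not the Vector API or C2's `FmaV` implementation.

```java
import java.util.Arrays;

class FmaCanonicalizationSketch {
    // Scalar case: fma(-a, b, c) and fma(b, -a, c) both round the exact value
    // (-a)*b + c once, so swapping the multiplicands cannot change the result.
    static boolean scalarSwapIsSafe(double a, double b, double c) {
        double original = Math.fma(-a, b, c);
        double swapped = Math.fma(b, -a, c);
        return Double.compare(original, swapped) == 0;
    }

    // Hypothetical model of a masked lane-wise FMA: active lanes compute
    // x*y + z, inactive lanes pass the FIRST input through unchanged.
    static double[] maskedFma(double[] x, double[] y, double[] z, boolean[] mask) {
        double[] result = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            result[i] = mask[i] ? Math.fma(x[i], y[i], z[i]) : x[i];
        }
        return result;
    }

    static double[] negate(double[] v) {
        double[] result = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            result[i] = -v[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] a = {1.5, 2.0, -3.0, 4.25};
        double[] b = {0.5, 3.0, 7.0, -2.0};
        double[] c = {1.0, 1.0, 1.0, 1.0};
        boolean[] mask = {true, false, true, false};

        double[] negA = negate(a);
        // Active lanes agree; inactive lanes return -a here ...
        double[] original = maskedFma(negA, b, c, mask);
        // ... but return b once the multiplicands are swapped.
        double[] swapped = maskedFma(b, negA, c, mask);

        System.out.println("scalar swap safe: " + scalarSwapIsSafe(1.5, 0.5, 1.0));
        System.out.println("masked original:  " + Arrays.toString(original));
        System.out.println("masked swapped:   " + Arrays.toString(swapped));
    }
}
```

The scalar check is why the rewrite is legal in `FmaNode::Ideal`; the masked model shows why the same swap must be skipped when the inactive lanes are defined by the first input.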
src/hotspot/share/opto/mulnode.cpp line 1717: > 1715: //------------------------------Ideal------------------------------------------ > 1716: Node* FmaNode::Ideal(PhaseGVN* phase, bool can_reshape) { > 1717: // We canonicalize the node by converting "(-a)*b+c" into "b*(-a)+c" Add the motivation to the comment: // This reduces the number of rules in the matcher, as we only need to check // for negations on the second argument, and not the symmetric case where // the first argument is negated. test/hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java line 63: > 61: private static final VectorSpecies S_SPECIES = ShortVector.SPECIES_MAX; > 62: > 63: private static int LENGTH = 128; What is the reason for the reduction? Speed? ------------- PR Review: https://git.openjdk.org/jdk/pull/14576#pullrequestreview-1571816582 PR Review Comment: https://git.openjdk.org/jdk/pull/14576#discussion_r1290116448 PR Review Comment: https://git.openjdk.org/jdk/pull/14576#discussion_r1290112263 PR Review Comment: https://git.openjdk.org/jdk/pull/14576#discussion_r1290076923 From epeter at openjdk.org Thu Aug 10 13:40:28 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 10 Aug 2023 13:40:28 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v4] In-Reply-To: References: <2mha0SwQTlnWaTLdEMX6RQ1KqhasbfWKz503bmkxKhw=.eb3a1229-b222-4d02-b2f6-35c0e50e5766@github.com> <25TmMNCogoj1jgszVmsFMDfkBVov20V_zM9G0x8cqDQ=.5502aded-0509-4afb-b5ad-47084dbfa430@github.com> Message-ID: On Thu, 20 Jul 2023 09:34:27 GMT, Fei Gao wrote: >> Thanks. How are `FmaV` nodes with mask handled then? Are they transformed into equivalent nodes without mask? > > Actually, there is no handling of `FmaV` nodes **with mask** in this patch, whether in the C2 mid-end or codegen backend. The GVN transformation just skips them.
And I suppose `FmaV` nodes with mask can't be transformed into nodes **without mask**, unless C2 can guarantee that the mask is all true (current C2 does not support this transformation). Thanks. @fg1417 I only understood the comment with the help of your explanations in this thread. I think you should improve the comment. I would not mention the Vector API. We may generate `FmaV` through an auto-vectorizer. Though I guess that is unlikely, since the scalar version `Fma::Ideal` would already reshape things. Suggestion: // We canonicalize the node by converting "(-a)*b+c" into "b*(-a)+c" // This reduces the number of rules in the matcher, as we only need to check // for negations on the second argument, and not the symmetric case where // the first argument is negated. // We cannot do this if the FmaV is masked: the inactive lanes have to return // the first input (i.e. "-a"). If we were to swap the inputs, the inactive lanes would // incorrectly return "b". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14576#discussion_r1290108363 From chagedorn at openjdk.org Thu Aug 10 13:56:02 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 10 Aug 2023 13:56:02 GMT Subject: RFR: 8314106: C2: assert(is_valid()) failed: must be valid after JDK-8305636 Message-ID: In the failing test case, we are unswitching a loop for which we've already removed Parse Predicates with `Compile::cleanup_parse_predicates()`. We are wrongly checking if a predicate block is non-empty (i.e. find the Parse **or** Runtime Predicates) instead of only checking if we find the Parse Predicate: https://github.com/openjdk/jdk/blob/23fe2ece586d3ed750e905e1b71a2cd1da91f335/src/hotspot/share/opto/loopPredicate.cpp#L448-L453 In the test case, we have a predicate block that contains Runtime Predicates from Loop Predication but no Parse Predicate anymore.
Therefore, when trying to clone the non-existing Parse Predicate, we fail with the assertion because we do not have a valid Parse Predicate. The fix is to only clone a Parse Predicate and the Assertion Predicates for a predicate block if the Parse Predicate is actually there. This is not entirely correct because we would also need to clone Assertion Predicates in the absence of Parse Predicates. But this was already wrong before JDK-8305636: https://github.com/openjdk/jdk/blob/a38fdaf18dfeeb23775516d1986c720190ba9fc2/src/hotspot/share/opto/loopPredicate.cpp#L598-L612 This will only be fixed with the complete fix ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). The proposed fix here just reverts back to the old behavior before JDK-8305636. Thanks, Christian ------------- Commit messages: - 8314106: C2: assert(is_valid()) failed: must be valid after JDK-8305636 Changes: https://git.openjdk.org/jdk/pull/15225/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15225&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314106 Stats: 68 lines in 2 files changed: 66 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15225.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15225/head:pull/15225 PR: https://git.openjdk.org/jdk/pull/15225 From thartmann at openjdk.org Thu Aug 10 13:56:38 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 10 Aug 2023 13:56:38 GMT Subject: RFR: 8313899: JVMCI exception Translation can fail in TranslatedException. 
[v2] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 13:41:05 GMT, Doug Simon wrote: >> In a test that stresses metaspace (such as `vmTestbase/vm/mlvm/hiddenloader/stress/oome/metaspace/Test.java`) that also uses `-Xcomp -XX:-TieredCompilation`, we've seen a failure in `TranslatedException.` due to exhausted metaspace: >> >> java.lang.OutOfMemoryError: Metaspace >> at jdk.internal.vm.TranslatedException.encodeThrowable(java.base@21/TranslatedException.java:176) >> at jdk.internal.vm.TranslatedException.(java.base@21/TranslatedException.java:61) >> at jdk.internal.vm.VMSupport.encodeThrowable(java.base@21/VMSupport.java:171) >> >> This PR pushes a fix such that this exception is properly handled in the VM (i.e. causing a compilation bailout) instead of leading to a VM crash. >> >> The PR includes 2 bits of debug code guarded by system properties that enable the handling to be tested in libgraal. The test itself is not included as libgraal is not part of OpenJDK. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - guard test-only code with ASSERT instead of !PRODUCT > - omit test-only code in product build Still looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15198#pullrequestreview-1571929038 From dnsimon at openjdk.org Thu Aug 10 13:56:37 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 10 Aug 2023 13:56:37 GMT Subject: RFR: 8313899: JVMCI exception Translation can fail in TranslatedException.
[v2] In-Reply-To: References: Message-ID: > In a test that stresses metaspace (such as `vmTestbase/vm/mlvm/hiddenloader/stress/oome/metaspace/Test.java`) that also uses `-Xcomp -XX:-TieredCompilation`, we've seen a failure in `TranslatedException.` due to exhausted metaspace: > > java.lang.OutOfMemoryError: Metaspace > at jdk.internal.vm.TranslatedException.encodeThrowable(java.base@21/TranslatedException.java:176) > at jdk.internal.vm.TranslatedException.(java.base@21/TranslatedException.java:61) > at jdk.internal.vm.VMSupport.encodeThrowable(java.base@21/VMSupport.java:171) > > This PR pushes a fix such that this exception is properly handled in the VM (i.e. causing a compilation bailout) instead of leading to a VM crash. > > The PR includes 2 bits of debug code guarded by system properties that enable the handling to be tested in libgraal. The test itself is not included as libgraal is not part of OpenJDK. Doug Simon has updated the pull request incrementally with two additional commits since the last revision: - guard test-only code with ASSERT instead of !PRODUCT - omit test-only code in product build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15198/files - new: https://git.openjdk.org/jdk/pull/15198/files/529258a8..f160cb80 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15198&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15198&range=00-01 Stats: 17 lines in 4 files changed: 17 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15198/head:pull/15198 PR: https://git.openjdk.org/jdk/pull/15198 From qamai at openjdk.org Thu Aug 10 13:57:32 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 10 Aug 2023 13:57:32 GMT Subject: RFR: 8312547: Max/Min nodes Value implementation could be improved [v3] In-Reply-To: References: Message-ID: <5CfY1gow6NTUMu1rnz916kViyij3YA1ZaZrJbvlqgqI=.b1a11538-07a3-4149-847b-85d67f43e2c8@github.com> > Hi, > >
This patch removes the early return in `AddNode::Value` in case one of the inputs is a bottom, which may affect the value calculation of nodes such as `Min/MaxNode`. > > Please kindly review, thanks very much. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: address review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15021/files - new: https://git.openjdk.org/jdk/pull/15021/files/ef2d3dfb..4ae1ad36 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15021&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15021&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15021.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15021/head:pull/15021 PR: https://git.openjdk.org/jdk/pull/15021 From qamai at openjdk.org Thu Aug 10 14:01:28 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 10 Aug 2023 14:01:28 GMT Subject: RFR: 8312547: Max/Min nodes Value implementation could be improved [v2] In-Reply-To: References: <7i5B-9hTl8oTKGpdMEiCsKEWf8a0M1HHpOZUsLYXrPI=.29dc1ce8-9666-4aa7-b63b-36610026c53a@github.com> Message-ID: <-tS92lZ7rv1tMzqzVxG424DMTEugbdwtMG9no0SO9Vc=.e922a467-2a92-4bd1-9e58-e06711bbc717@github.com> On Thu, 10 Aug 2023 06:45:05 GMT, Tobias Hartmann wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> fix min/maxfp nodes > > Looks good to me. @TobiHartmann Thanks for your reviews, I have adjusted the code style there. 
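To illustrate why the early return in `AddNode::Value` matters for `Min/MaxNode`, here is a minimal interval sketch. It is a model under stated assumptions, not C2 code: `IntRange` and `MinMaxValueSketch` are hypothetical names, and the full `int` range stands in for the bottom type. With the early return, `max(x, 5)` for an unknown `x` degrades to bottom; computing bounds from both inputs still proves the result is at least 5 (and, for `min`, at most 5).

```java
// Hypothetical stand-in for a C2 integer type: the closed range [lo, hi].
record IntRange(int lo, int hi) {
    // The full int range plays the role of "bottom" (nothing is known).
    static final IntRange BOTTOM = new IntRange(Integer.MIN_VALUE, Integer.MAX_VALUE);

    boolean isBottom() {
        return lo == Integer.MIN_VALUE && hi == Integer.MAX_VALUE;
    }
}

class MinMaxValueSketch {
    // max(x, y) always lies in [max(x.lo, y.lo), max(x.hi, y.hi)].
    static IntRange maxRange(IntRange x, IntRange y) {
        return new IntRange(Math.max(x.lo(), y.lo()), Math.max(x.hi(), y.hi()));
    }

    // min(x, y) always lies in [min(x.lo, y.lo), min(x.hi, y.hi)].
    static IntRange minRange(IntRange x, IntRange y) {
        return new IntRange(Math.min(x.lo(), y.lo()), Math.min(x.hi(), y.hi()));
    }

    // Models the removed early return: give up as soon as one input is bottom.
    static IntRange maxWithEarlyReturn(IntRange x, IntRange y) {
        if (x.isBottom() || y.isBottom()) {
            return IntRange.BOTTOM;
        }
        return maxRange(x, y);
    }

    public static void main(String[] args) {
        IntRange unknown = IntRange.BOTTOM;   // e.g. an arbitrary int parameter
        IntRange five = new IntRange(5, 5);   // the constant 5
        System.out.println(maxWithEarlyReturn(unknown, five)); // lower bound is lost
        System.out.println(maxRange(unknown, five));           // keeps lo = 5
        System.out.println(minRange(unknown, five));           // keeps hi = 5
    }
}
```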
------------- PR Comment: https://git.openjdk.org/jdk/pull/15021#issuecomment-1673231073 From thartmann at openjdk.org Thu Aug 10 14:06:01 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 10 Aug 2023 14:06:01 GMT Subject: RFR: 8314106: C2: assert(is_valid()) failed: must be valid after JDK-8305636 In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 13:35:46 GMT, Christian Hagedorn wrote: > In the failing test case, we are unswitching a loop for which we've already removed Parse Predicates with `Compile::cleanup_parse_predicates()`. We are wrongly checking if a predicate block is non-empty (i.e. find the Parse **or** Runtime Predicates) instead of only checking if we find the Parse Predicate: > https://github.com/openjdk/jdk/blob/23fe2ece586d3ed750e905e1b71a2cd1da91f335/src/hotspot/share/opto/loopPredicate.cpp#L448-L453 > > In the test case, we have a predicate block that contains Runtime Predicates from Loop Predication but no Parse Predicate anymore. Therefore, when trying to clone the non-existing Parse Predicate, we fail with the assertion because we do not have a valid Parse Predicate. > > The fix is to only clone a Parse Predicate and the Assertion Predicates for a predicate block if the Parse Predicate is actually there. This is not entirely correct because we would also need to clone Assertion Predicates in the absence of Parse Predicates. But this was already wrong before JDK-8305636: > https://github.com/openjdk/jdk/blob/a38fdaf18dfeeb23775516d1986c720190ba9fc2/src/hotspot/share/opto/loopPredicate.cpp#L598-L612 > > This will only be fixed with the complete fix ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). The proposed fix here just reverts back to the old behavior before JDK-8305636. > > Thanks, > Christian Looks good and trivial to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15225#pullrequestreview-1571978296 From yzheng at openjdk.org Thu Aug 10 14:41:09 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 10 Aug 2023 14:41:09 GMT Subject: RFR: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers. [v3] In-Reply-To: References: Message-ID: > This PR exports `vmIntrinsic::is_intrinsic_available`, `Compiler::is_intrinsic_supported`, and `C2Compiler::is_intrinsic_supported` results to JVMCI compilers. This allows JVMCI compilers to comply with `-XX:DisableIntrinsic`, `-XX:ControlIntrinsic`, `-XX:-UseXXXIntrinsic`, and is essential for running tests that depend on these flags, e.g., `java/lang/Float/Binary16ConversionNaN`, which returns a different result in the interpreter with `-XX:DisableIntrinsic=_float16ToFloat,_floatToFloat16`. > This PR also attempts to fix some of the `is_intrinsic_available` results. Please see the inlined comments. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: avoid duplicating cpu feature checks. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15133/files - new: https://git.openjdk.org/jdk/pull/15133/files/fba23164..c6e6dc64 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15133&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15133&range=01-02 Stats: 54 lines in 2 files changed: 9 ins; 45 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15133/head:pull/15133 PR: https://git.openjdk.org/jdk/pull/15133 From yzheng at openjdk.org Thu Aug 10 15:00:59 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 10 Aug 2023 15:00:59 GMT Subject: RFR: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers.
[v4] In-Reply-To: References: Message-ID: > This PR exports `vmIntrinsic::is_intrinsic_available`, `Compiler::is_intrinsic_supported`, and `C2Compiler::is_intrinsic_supported` results to JVMCI compilers. This allows JVMCI compilers to comply with `-XX:DisableIntrinsic`, `-XX:ControlIntrinsic`, `-XX:-UseXXXIntrinsic`, and is essential for running tests that depend on these flags, e.g., `java/lang/Float/Binary16ConversionNaN`, which returns a different result in the interpreter with `-XX:DisableIntrinsic=_float16ToFloat,_floatToFloat16`. > This PR also attempts to fix some of the `is_intrinsic_available` results. Please see the inlined comments. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: cleanup. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15133/files - new: https://git.openjdk.org/jdk/pull/15133/files/c6e6dc64..1d2b83e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15133&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15133&range=02-03 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15133/head:pull/15133 PR: https://git.openjdk.org/jdk/pull/15133 From yzheng at openjdk.org Thu Aug 10 15:04:58 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 10 Aug 2023 15:04:58 GMT Subject: RFR: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers. [v2] In-Reply-To: References: Message-ID: <9WeBz8r5aL5xW5HDjw-za0NOTMZU7VEgI0PadSri5aU=.3c958061-4355-4ccf-b3ea-9895bb6dd68a@github.com> On Wed, 9 Aug 2023 08:46:36 GMT, Dean Long wrote: > I don't like having the same logic in two places, because then those two places need to be kept in sync. Either the stubs should be generated based on is_intrinsic_supported(), or is_intrinsic_supported() should check if the stub was generated.
I have dropped the redundant CPU feature checks, and for those intrinsics with stubs, I now check whether the stub pointer is nullptr in `C2Compiler::is_intrinsic_supported`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15133#issuecomment-1673331114 From jvernee at openjdk.org Thu Aug 10 15:05:58 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 10 Aug 2023 15:05:58 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API In-Reply-To: References: Message-ID: <7K7n7ahPfknWU6q5Wu9upiPKhMmXzwA5jthvvJmt7A4=.3072efdc-e2cc-4250-a5b1-feabe104ae30@github.com> On Tue, 1 Aug 2023 10:29:06 GMT, Jorn Vernee wrote: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended support to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle.
This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base addresses of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray` by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... Thanks for the reviews so far. That also reminds me to mention: while there are a lot of files touched, most of the updates are simple one-line changes to test files removing the `@enablePreview` tag from the jtreg test information. So, probably most of this is easy to review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15103#issuecomment-1673384016 From lmesnik at openjdk.org Thu Aug 10 15:40:29 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 10 Aug 2023 15:40:29 GMT Subject: Integrated: 8312194: test/hotspot/jtreg/applications/ctw/modules/jdk_crypto_ec.java cannot handle empty modules In-Reply-To: References: Message-ID: On Wed, 9 Aug 2023 03:03:47 GMT, Leonid Mesnik wrote: > Removed empty module so CTW doesn't fail. This pull request has now been integrated.
Changeset: e7c83ea9 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/e7c83ea948f8b2cd7caf7e59d3cf6b087807dba7 Stats: 39 lines in 2 files changed: 0 ins; 39 del; 0 mod 8312194: test/hotspot/jtreg/applications/ctw/modules/jdk_crypto_ec.java cannot handle empty modules Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/15201 From duke at openjdk.org Thu Aug 10 15:55:29 2023 From: duke at openjdk.org (Swati Sharma) Date: Thu, 10 Aug 2023 15:55:29 GMT Subject: RFR: 8314085: Fixing scope from benchmark to thread for JMH tests having shared state Message-ID: In addition to [JDK-8311178](https://bugs.openjdk.org/browse/JDK-8311178), this change switches the JMH state scope from `Scope.Benchmark` to `Scope.Thread` for the benchmark files below, which have shared state; this also fixes scalability problems in a few of the benchmarks. org/openjdk/bench/java/io/DataInputStreamTest.java org/openjdk/bench/java/lang/ArrayClone.java org/openjdk/bench/java/lang/StringCompareToDifferentLength.java org/openjdk/bench/java/lang/StringCompareToIgnoreCase.java org/openjdk/bench/java/lang/StringComparisons.java org/openjdk/bench/java/lang/StringEquals.java org/openjdk/bench/java/lang/StringFormat.java org/openjdk/bench/java/lang/StringReplace.java org/openjdk/bench/java/lang/StringSubstring.java org/openjdk/bench/java/lang/StringTemplateFMT.java org/openjdk/bench/java/lang/constant/MethodTypeDescFactories.java org/openjdk/bench/java/lang/constant/ReferenceClassDescResolve.java org/openjdk/bench/java/lang/invoke/MethodHandlesConstant.java org/openjdk/bench/java/lang/invoke/MethodHandlesIdentity.java org/openjdk/bench/java/lang/invoke/MethodHandlesThrowException.java org/openjdk/bench/java/lang/invoke/MethodTypeAppendParams.java org/openjdk/bench/java/lang/invoke/MethodTypeChangeParam.java org/openjdk/bench/java/lang/invoke/MethodTypeChangeReturn.java org/openjdk/bench/java/lang/invoke/MethodTypeDropParams.java org/openjdk/bench/java/lang/invoke/MethodTypeGenerify.java
org/openjdk/bench/java/lang/invoke/MethodTypeInsertParams.java org/openjdk/bench/java/security/CipherSuiteBench.java org/openjdk/bench/java/time/GetYearBench.java org/openjdk/bench/java/time/InstantBench.java org/openjdk/bench/java/time/format/DateTimeFormatterWithPaddingBench.java org/openjdk/bench/java/util/ListArgs.java org/openjdk/bench/java/util/LocaleDefaults.java org/openjdk/bench/java/util/TestAdler32.java org/openjdk/bench/java/util/TestCRC32.java org/openjdk/bench/java/util/TestCRC32C.java org/openjdk/bench/java/util/regex/Exponential.java org/openjdk/bench/java/util/regex/Primality.java org/openjdk/bench/java/util/regex/Trim.java org/openjdk/bench/javax/crypto/AESReinit.java org/openjdk/bench/jdk/incubator/vector/LoadMaskedIOOBEBenchmark.java org/openjdk/bench/vm/compiler/Rotation.java org/openjdk/bench/vm/compiler/x86/ConvertF2I.java org/openjdk/bench/vm/compiler/x86/BasicRules.java Please review and provide your feedback. Thanks, Swati ------------- Commit messages: - 8314085: Fixing scope from benchmark to thread for JMH tests having shared state Changes: https://git.openjdk.org/jdk/pull/15230/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15230&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314085 Stats: 46 lines in 38 files changed: 0 ins; 0 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/15230.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15230/head:pull/15230 PR: https://git.openjdk.org/jdk/pull/15230 From kvn at openjdk.org Thu Aug 10 16:32:03 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Aug 2023 16:32:03 GMT Subject: RFR: 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE In-Reply-To: References: Message-ID: On Tue, 25 Jul 2023 07:42:59 GMT, Pengfei Li wrote: > Hotspot jtreg `compiler/loopopts/superword/TestDependencyOffsets.java` fails on AArch64 CPUs with 512-bit SVE. 
The reason is that many test loops in the code cannot be vectorized due to data dependences, but the IR tests assume they can. > > On AArch64, these IR tests only check the `asimd` CPU feature and incorrectly assume that AArch64 vectors are at most 256 bits. But actually, `asimd` on AArch64 only represents NEON vectors, which are at most 128 bits. AArch64 CPUs may also have the `sve` feature, which represents scalable vectors of at most 2048 bits. The vectorization won't succeed on 512-bit SVE CPUs if the memory offset between some read and write is less than 512 bits. > > As this jtreg test is auto-generated by a Python script, we have updated the script and regenerated the test. In this new version, we check auto-vectorization on both NEON-only and NEON+SVE platforms. Below is the diff of the generator script. We have also attached the new script to the JBS page. > > > @@ -321,7 +321,8 @@ class Type: > p.append(Platform("avx512", ["avx512", "true"], 64)) > else: > assert False, "type not implemented" + self.name > - p.append(Platform("asimd", ["asimd", "true"], 32)) > + p.append(Platform("asimd", ["asimd", "true", "sve", "false"], 16)) > + p.append(Platform("sve", ["sve", "true"], 256)) > return p > > class Test: > @@ -457,7 +458,7 @@ class Generator: > lines.append(" * and various MaxVectorSize values, and +- AlignVector.") > lines.append(" *") > lines.append(" * Note: this test is auto-generated. Please modify / generate with script:")
Please modify / generate with script:") > - lines.append(" * https://bugs.openjdk.org/browse/JDK-8308606") > + lines.append(" * https://bugs.openjdk.org/browse/JDK-8312570") > lines.append(" *") > lines.append(" * Types: " + ", ".join([t.name for t in self.types])) > lines.append(" * Offsets: " + ", ".join([str(o) for o in self.offsets])) > @@ -598,7 +599,8 @@ class Generator: > # IR rules > for p in test.t.platforms(): > elements = p.vector_width // test.t.size > - lines.append(f" // CPU: {p.name} -> vector_width: {p.vector_width} -> elements in vector: {elements}") > + max_pre = "max " if p.name == "sve" else "" > + lines.append(f" // CPU: {p.name} -> {max_pre}vector_width: {p.vector_width} -> {max_pre}elements in vector: {elements}") > ############### -Align... Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15010#pullrequestreview-1572278425 From kvn at openjdk.org Thu Aug 10 16:37:28 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Aug 2023 16:37:28 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v4] In-Reply-To: References: Message-ID: On Wed, 9 Aug 2023 09:13:28 GMT, Pengfei Li wrote: >> This patch removes `@requires vm.flagless` annotations from HotSpot jtreg tests in `compiler/vectorization/runner`. All jtreg cases in this folder are invoked by the test driver `VectorizationTestRunner.java`, which checks both correctness and vectorizability (IR) for each test method. We added the flagless requirement earlier because extra flags may interfere with the compiler control that the test driver uses for the correctness check. But `flagless` has the side effect that tests are skipped whenever extra flags are passed. So we propose to get rid of it now. >> >> To accommodate the removal of `@requires vm.flagless`, a few checks are added in the test driver to skip the correctness check if extra flags prevent the compiler control from working.
This patch also moves the previously hard-coded flag `-XX:-OptimizeFill` from the test driver into conditions of the IR checks. >> >> Tested various compiler-control-related VM flags on x86 and AArch64. > > Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: > > Remove useless conditions and imports I am fine with the update. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15011#pullrequestreview-1572286304 From kvn at openjdk.org Thu Aug 10 16:49:31 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Aug 2023 16:49:31 GMT Subject: RFR: 8314106: C2: assert(is_valid()) failed: must be valid after JDK-8305636 In-Reply-To: References: Message-ID: <4Vjn2azliPNi-OwU7Qcgk9F9Kxk9DHe_Hvsi1ItArho=.f6f4635c-4ca3-4946-998b-95785e6f5679@github.com> On Thu, 10 Aug 2023 13:35:46 GMT, Christian Hagedorn wrote: > In the failing test case, we are unswitching a loop for which we've already removed Parse Predicates with `Compile::cleanup_parse_predicates()`. We are wrongly checking if a predicate block is non-empty (i.e. find the Parse **or** Runtime Predicates) instead of only checking if we find the Parse Predicate: > https://github.com/openjdk/jdk/blob/23fe2ece586d3ed750e905e1b71a2cd1da91f335/src/hotspot/share/opto/loopPredicate.cpp#L448-L453 > > In the test case, we have a predicate block that contains Runtime Predicates from Loop Predication but no Parse Predicate anymore. Therefore, when trying to clone the non-existing Parse Predicate, we fail with the assertion because we do not have a valid Parse Predicate. > > The fix is to only clone a Parse Predicate and the Assertion Predicates for a predicate block if the Parse Predicate is actually there. This is not entirely correct because we would also need to clone Assertion Predicates in the absence of Parse Predicates.
But this was already wrong before JDK-8305636: > https://github.com/openjdk/jdk/blob/a38fdaf18dfeeb23775516d1986c720190ba9fc2/src/hotspot/share/opto/loopPredicate.cpp#L598-L612 > > This will only be fixed with the complete fix ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). The proposed fix here just reverts back to the old behavior before JDK-8305636. > > Thanks, > Christian Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15225#pullrequestreview-1572298570 From never at openjdk.org Thu Aug 10 16:53:28 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 10 Aug 2023 16:53:28 GMT Subject: RFR: 8314061: [JVMCI] DeoptimizeALot stress logic breaks deferred barriers In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 01:06:35 GMT, Tom Rodriguez wrote: > JVMCIRuntime::new_array_common includes a little bit of stress logic that changes how it returns when DeoptimizeALot is set. This can cause it to bypass the call to SharedRuntime::on_slowpath_allocation_exit(current) which is where the deferred card mark logic lives. This can lead to random crashes of various kinds. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15218#issuecomment-1673557828 From never at openjdk.org Thu Aug 10 16:53:29 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 10 Aug 2023 16:53:29 GMT Subject: Integrated: 8314061: [JVMCI] DeoptimizeALot stress logic breaks deferred barriers In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 01:06:35 GMT, Tom Rodriguez wrote: > JVMCIRuntime::new_array_common includes a little bit of stress logic that changes how it returns when DeoptimizeALot is set. This can cause it to bypass the call to SharedRuntime::on_slowpath_allocation_exit(current) which is where the deferred card mark logic lives. This can lead to random crashes of various kinds. This pull request has now been integrated. 
Changeset: 1875b287 Author: Tom Rodriguez URL: https://git.openjdk.org/jdk/commit/1875b2872baa566fa11f92006c8eba7642267213 Stats: 4 lines in 1 file changed: 2 ins; 1 del; 1 mod 8314061: [JVMCI] DeoptimizeALot stress logic breaks deferred barriers Reviewed-by: thartmann, dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/15218 From kvn at openjdk.org Thu Aug 10 17:05:58 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Aug 2023 17:05:58 GMT Subject: RFR: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers. [v4] In-Reply-To: References: Message-ID: <4i5IXOTKEPmaWHmsb5lvsdT-7psgFlUKcs6isBM9rXE=.dbf6919a-b786-4a28-9c84-00793f156647@github.com> On Thu, 10 Aug 2023 16:55:24 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/vm_version_x86.hpp line 689: >> >>> 687: static bool supports_avxonly() { return ((supports_avx2() || supports_avx()) && !supports_evex()); } >>> 688: static bool supports_sha() { return (_features & CPU_SHA) != 0; } >>> 689: static bool supports_fma() { return (_features & CPU_FMA) != 0 && supports_avx(); } >> >> https://github.com/openjdk/jdk/blob/53ca75b18ea419d469758475fac8352bf915b484/src/hotspot/cpu/x86/vm_version_x86.cpp#L1154-L1158 >> implies the FMA intrinsic can be used without AVX > > https://bugs.openjdk.org/browse/JDK-8181616 added the supports_avx() check because the new FMA vectorization needs AVX: https://cr.openjdk.org/~vdeshpande/8181616/webrev.01/ > Then we hit bug https://bugs.openjdk.org/browse/JDK-8182114 and band-aided it by restoring the UseSSE check. > That change came before 8296168, which switches off UseAVX if UseSSE < 4: > https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L908 > > This FMA check happens after UseSSE and UseAVX are set. I suggest removing the UseSSE check here instead and keeping supports_avx(). That said, you may remove supports_avx() here, but then you need to add it to the assembler vector instructions that currently have only a supports_fma() check.
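A minimal Java sketch of the ordering argument in the comment above. It rests on stated assumptions: the class, method names, and bit positions (`FmaFeatureSketch`, `CPU_AVX`, `CPU_FMA`, `processFlags`) are hypothetical stand-ins rather than HotSpot's real values, and clearing the AVX feature bit when AVX is switched off is a simplification. The sketch models one point: the FMA check runs after UseSSE/UseAVX processing, so a `supports_fma()` that already requires AVX support makes a separate UseSSE test redundant.

```java
// Hypothetical, simplified model of the x86 feature checks discussed in the review.
// Names and bit positions are illustrative; they are not HotSpot's actual values.
final class FmaFeatureSketch {
    static final long CPU_AVX = 1L << 0; // hypothetical bit
    static final long CPU_FMA = 1L << 1; // hypothetical bit

    // Simplified flag processing. Per the review comment, JDK-8296168 switches off
    // UseAVX when UseSSE < 4; this model assumes that switching AVX off also clears
    // the AVX feature bit before any supports_*() query runs.
    static long processFlags(long detected, int useSse, int useAvx) {
        int effectiveUseAvx = (useSse < 4) ? 0 : useAvx;
        long features = detected;
        if (effectiveUseAvx < 1) {
            features &= ~CPU_AVX;
        }
        return features;
    }

    static boolean supportsAvx(long features) {
        return (features & CPU_AVX) != 0;
    }

    // Same shape as the quoted supports_fma(): the FMA bit AND AVX support.
    static boolean supportsFma(long features) {
        return (features & CPU_FMA) != 0 && supportsAvx(features);
    }

    // Flags are processed first, so supportsFma() can never be true when
    // UseSSE < 4; an extra UseSSE test at the call site adds nothing in this model.
    static boolean fmaIntrinsicAvailable(long detected, int useSse, int useAvx) {
        return supportsFma(processFlags(detected, useSse, useAvx));
    }
}
```

With both the AVX and FMA bits detected but `UseSSE=2`, the model reports no FMA support. The AVX requirement alone already covers that case.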
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15133#discussion_r1290430146 From kvn at openjdk.org Thu Aug 10 17:05:29 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Aug 2023 17:05:29 GMT Subject: RFR: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers. [v4] In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 07:09:18 GMT, Yudi Zheng wrote: >> Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: >> >> cleanup. > > src/hotspot/cpu/x86/vm_version_x86.hpp line 689: > >> 687: static bool supports_avxonly() { return ((supports_avx2() || supports_avx()) && !supports_evex()); } >> 688: static bool supports_sha() { return (_features & CPU_SHA) != 0; } >> 689: static bool supports_fma() { return (_features & CPU_FMA) != 0 && supports_avx(); } > > https://github.com/openjdk/jdk/blob/53ca75b18ea419d469758475fac8352bf915b484/src/hotspot/cpu/x86/vm_version_x86.cpp#L1154-L1158 > implies the FMA intrinsic can be used without AVX https://bugs.openjdk.org/browse/JDK-8181616 added the supports_avx() check because the new FMA vectorization needs AVX: https://cr.openjdk.org/~vdeshpande/8181616/webrev.01/ Then we hit bug https://bugs.openjdk.org/browse/JDK-8182114 and band-aided it by restoring the UseSSE check. That change came before 8296168, which switches off UseAVX if UseSSE < 4: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L908 This FMA check happens after UseSSE and UseAVX are set. I suggest removing the UseSSE check here instead and keeping supports_avx(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15133#discussion_r1290424028 From kvn at openjdk.org Thu Aug 10 17:07:28 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 10 Aug 2023 17:07:28 GMT Subject: RFR: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers.
[v2] In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 07:07:02 GMT, Yudi Zheng wrote: >> Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: >> >> update is_intrinsic_supported for _dcopySign,_fcopySign. > > src/hotspot/share/opto/c2compiler.cpp line 237: > >> 235: case vmIntrinsics::_electronicCodeBook_decryptAESCrypt: >> 236: if (StubRoutines::electronicCodeBook_decryptAESCrypt() == nullptr) return false; >> 237: break; > > These two intrinsics were marked as supported on non-x86 platforms where the underlying stubs are not generated. Good catch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15133#discussion_r1290397831 From never at openjdk.org Thu Aug 10 17:26:28 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 10 Aug 2023 17:26:28 GMT Subject: RFR: 8311557: [JVMCI] deadlock with JVMTI thread suspension [v4] In-Reply-To: References: Message-ID: > Java-based JVMCI compiler threads are more like normal Java threads, so they aren't `hidden_from_external_view` like the native compilers. This can lead to deadlocks if you use JVMTI to suspend all threads, since this will block the compiler queue and can block execution if background compilation is disabled. It's reasonable to treat libgraal threads like native threads in this regard. Making jargraal threads hidden too would interfere with using profiling and debugging tools on them, so I've left that alone, but it might be worth changing the JVMTI suspend and resume functions to explicitly skip compiler threads as well.
Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: LibJVMCICompilerThreadHidden should just be EXPERIMENTAL ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14799/files - new: https://git.openjdk.org/jdk/pull/14799/files/334b0347..124e7283 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14799&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14799&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14799.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14799/head:pull/14799 PR: https://git.openjdk.org/jdk/pull/14799 From dnsimon at openjdk.org Thu Aug 10 17:50:58 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 10 Aug 2023 17:50:58 GMT Subject: RFR: 8311557: [JVMCI] deadlock with JVMTI thread suspension [v4] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 17:26:28 GMT, Tom Rodriguez wrote: >> Java-based JVMCI compiler threads are more like normal Java threads, so they aren't `hidden_from_external_view` like the native compilers. This can lead to deadlocks if you use JVMTI to suspend all threads, since this will block the compiler queue and can block execution if background compilation is disabled. It's reasonable to treat libgraal threads like native threads in this regard. Making jargraal threads hidden too would interfere with using profiling and debugging tools on them, so I've left that alone, but it might be worth changing the JVMTI suspend and resume functions to explicitly skip compiler threads as well. > > Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: > > LibJVMCICompilerThreadHidden should just be EXPERIMENTAL Marked as reviewed by dnsimon (Reviewer).
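The deadlock described in the 8311557 thread comes down to which threads a JVMTI "suspend all" request can reach. The following Java sketch models that filter. `SuspendAllSketch`, `Kind`, and `hideLibJvmciThreads` are hypothetical names; the last one loosely stands in for the experimental `LibJVMCICompilerThreadHidden` flag. This illustrates the policy under discussion and is not HotSpot or JVMTI code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of which threads a JVMTI-style "suspend all" request reaches.
// None of these names are real HotSpot or JVMTI APIs.
final class SuspendAllSketch {
    enum Kind { APPLICATION, NATIVE_COMPILER, LIBGRAAL_COMPILER, JARGRAAL_COMPILER }

    record ModelThread(String name, Kind kind) {}

    // Native compiler threads are hidden from external view; libgraal threads are
    // treated the same way only when the (experimental) hiding option is on.
    // jargraal threads stay visible so profilers and debuggers can still see them.
    static boolean hiddenFromExternalView(ModelThread t, boolean hideLibJvmciThreads) {
        return switch (t.kind()) {
            case NATIVE_COMPILER -> true;
            case LIBGRAAL_COMPILER -> hideLibJvmciThreads;
            case APPLICATION, JARGRAAL_COMPILER -> false;
        };
    }

    // Returns the threads a "suspend all" request would actually suspend.
    static List<ModelThread> suspendAll(List<ModelThread> threads, boolean hideLibJvmciThreads) {
        List<ModelThread> suspended = new ArrayList<>();
        for (ModelThread t : threads) {
            if (!hiddenFromExternalView(t, hideLibJvmciThreads)) {
                suspended.add(t);
            }
        }
        return suspended;
    }
}
```

When the libgraal thread is not hidden, it is suspended along with application threads. A Java thread that then waits on a foreground compilation can never make progress, which is the deadlock the PR describes.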
------------- PR Review: https://git.openjdk.org/jdk/pull/14799#pullrequestreview-1572397856 From jvernee at openjdk.org Thu Aug 10 17:57:01 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 10 Aug 2023 17:57:01 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v2] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. 
https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: - 8313894: Rename isTrivial linker option to critical Reviewed-by: pminborg, mcimadamore - 8313680: Disallow combining caputreCallState with isTrivial Reviewed-by: mcimadamore - Merge branch 'master' into JEP22 - use immutable map for fallback linker canonical layouts - 8313265: Move the FFM API out of preview Reviewed-by: mcimadamore - 8313005: Ensure native access check can fold away Reviewed-by: mcimadamore - 8312981: Make the linker API required Reviewed-by: mcimadamore - 8312615: Ensure jdk_foreign tests pass on linux-x86 Reviewed-by: mcimadamore - 8312186: TestStringEncodingFails for UTF-32 Reviewed-by: mcimadamore - 8312059: Clarify the documention for variadic functions 8310646: Javadoc around prototype-less functions might be incorrect Reviewed-by: mcimadamore - ...
and 7 more: https://git.openjdk.org/jdk/compare/23fe2ece...74bbe721 ------------- Changes: https://git.openjdk.org/jdk/pull/15103/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=01 Stats: 2817 lines in 230 files changed: 1239 ins; 894 del; 684 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From jvernee at openjdk.org Thu Aug 10 17:57:03 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 10 Aug 2023 17:57:03 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API In-Reply-To: References: Message-ID: On Tue, 1 Aug 2023 10:29:06 GMT, Jorn Vernee wrote: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. 
https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... Addressing some offline review comments.
Two more (small) changes have been added: - Disallow combining the `captureCallState` and `isTrivial` linker options (see https://github.com/openjdk/panama-foreign/pull/856) - Rename `isTrivial` to `critical` (see https://github.com/openjdk/panama-foreign/pull/859) ------------- PR Comment: https://git.openjdk.org/jdk/pull/15103#issuecomment-1673627002 From jbhateja at openjdk.org Thu Aug 10 18:22:59 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 10 Aug 2023 18:22:59 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v6] In-Reply-To: <2Vl2c9rrW0AIEWr_t-7Wn0yZUHpqA9jDBDcOoADsxs8=.139baa19-a81b-4d45-80e6-d7017376d26a@github.com> References: <2Vl2c9rrW0AIEWr_t-7Wn0yZUHpqA9jDBDcOoADsxs8=.139baa19-a81b-4d45-80e6-d7017376d26a@github.com> Message-ID: On Tue, 8 Aug 2023 14:02:08 GMT, Tobias Hotz wrote: >> This patch adds peephole rules to remove TEST instructions that operate on the result of, and immediately follow, an AND, XOR or OR instruction. >> This pattern can emerge if the result of the AND is compared against two values where one of the values is zero. The matcher cannot know that the preceding instruction has already set the flags register to the same value. >> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test, the flags are set to the same values for TEST, AND, XOR and OR, so this should be safe. >> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: >> Results on Intel Core i5-8250U CPU >> Before this patch: >> >> Benchmark Mode Cnt Score Error Units >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> >> After this patch: >> >> Benchmark Mode Cnt Score Error Units Change >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) >> TestRemovalPeephole.bench... > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Add a side effect to the IR tests to make sure we do not emit CMOVs there > > Without Tiered Compilation, no profile data is present, which means a CMOV would always be emitted.
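The safety argument behind this peephole is that AND, OR, XOR and TEST all set the status flags the same way. A TEST of the register that one of them has just written therefore recomputes flags that are already in place. Below is a minimal Java model of that argument. It assumes the 32-bit forms and ignores AF, which these instructions leave undefined. `LogicalFlagsSketch` and its methods are hypothetical. The real `Peephole::test_may_remove` also checks which EFLAGS the downstream instructions need, and this model does not.

```java
// Hypothetical model of the status flags written by x86 logical instructions.
// Per the instruction references cited in the PR, AND, OR, XOR and TEST set SF, ZF
// and PF from the result and clear OF and CF; AF is undefined and not modeled.
final class LogicalFlagsSketch {
    record Flags(boolean zf, boolean sf, boolean pf, boolean cf, boolean of) {}

    static Flags fromResult(int result) {
        boolean zf = result == 0;                                 // zero flag
        boolean sf = result < 0;                                  // sign bit of the 32-bit result
        boolean pf = (Integer.bitCount(result & 0xFF) & 1) == 0;  // even parity of the low byte
        return new Flags(zf, sf, pf, false, false);               // CF and OF are cleared
    }

    static Flags and(int a, int b) { return fromResult(a & b); }
    static Flags or(int a, int b)  { return fromResult(a | b); }
    static Flags xor(int a, int b) { return fromResult(a ^ b); }

    // TEST r, r computes r & r (== r) and discards the value, keeping only the flags.
    static Flags testSelf(int r) { return fromResult(r & r); }

    // The peephole's premise: TEST on the register just written by AND/OR/XOR
    // produces exactly the flags that instruction already produced.
    static boolean testIsRedundant(int a, int b) {
        return and(a, b).equals(testSelf(a & b))
            && or(a, b).equals(testSelf(a | b))
            && xor(a, b).equals(testSelf(a ^ b));
    }
}
```

Because `r & r == r`, the equality holds for every input, which is why removing the TEST cannot change a later flag-consuming branch in this model.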
Keep the compiler from doing that, as the peephole currently does not work with CMOV instructions. src/hotspot/cpu/x86/peephole_x86_64.cpp line 175: > 173: // It checks the required EFLAGS for the downstream instructions of the TEST > 174: // and removes the TEST if the preceding instructions already sets all these flags > 175: bool Peephole::test_may_remove(Block* block, int block_index, PhaseCFG* cfg_, PhaseRegAlloc* ra_, FYI, processor front-end decoders employ macro-fusion techniques to emit a single micro-operation for TEST + JMP patterns and many more; please refer to section 3.4.2.2 of the [Optimization manual](https://cdrdv2.intel.com/v1/dl/getContent/671488?explicitVersion=true). So the effect of this optimization may not be evident in most general use cases, where a JMP follows the TEST. This is more of a size-oriented optimization, where one of two back-to-back FLAGS-affecting instructions is removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1289502680 From jbhateja at openjdk.org Thu Aug 10 18:23:58 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 10 Aug 2023 18:23:58 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set [v6] In-Reply-To: References: <2Vl2c9rrW0AIEWr_t-7Wn0yZUHpqA9jDBDcOoADsxs8=.139baa19-a81b-4d45-80e6-d7017376d26a@github.com> Message-ID: On Thu, 10 Aug 2023 03:31:48 GMT, Jatin Bhateja wrote: >> Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: >> >> Add a side effect to the IR tests to make sure we do not emit CMOVs there >> >> Without Tiered Compilation, no profile data is present, which means a CMOV would always be emitted.
Keep the compiler from doing that, as the peephole currently does not work with CMOV instructions > > src/hotspot/cpu/x86/peephole_x86_64.cpp line 175: > >> 173: // It checks the required EFLAGS for the downstream instructions of the TEST >> 174: // and removes the TEST if the preceding instructions already sets all these flags >> 175: bool Peephole::test_may_remove(Block* block, int block_index, PhaseCFG* cfg_, PhaseRegAlloc* ra_, > > FYI, processor front-end decoders employ macro-fusion techniques to emit a single micro-operation for TEST + JMP patterns and many more; please refer to section 3.4.2.2 of the [Optimization manual](https://cdrdv2.intel.com/v1/dl/getContent/671488?explicitVersion=true). So the effect of this optimization may not be evident in most general use cases, where a JMP follows the TEST. This is more of a size-oriented optimization, where one of two back-to-back FLAGS-affecting instructions is removed. I was curious to see the impact of this patch on micro-fusion. Sapphire Rapids has an explicit event for it and shows a 3x drop in the number of micro-fused operations with the attached test, with not much impact on performance. [micro.txt](https://github.com/openjdk/jdk/files/12312909/micro.txt) ![PR14172](https://github.com/openjdk/jdk/assets/59989778/b51f9df9-a6a8-4dae-aa17-68c4cc2c0f35) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1290502246 From dnsimon at openjdk.org Thu Aug 10 19:10:28 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 10 Aug 2023 19:10:28 GMT Subject: RFR: 8313899: JVMCI exception Translation can fail in TranslatedException.
[v2] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 13:56:37 GMT, Doug Simon wrote: >> In a test that stresses metaspace (such as `vmTestbase/vm/mlvm/hiddenloader/stress/oome/metaspace/Test.java`) that also uses `-Xcomp -XX:-TieredCompilation`, we've seen a failure in `TranslatedException.` due to exhausted metaspace: >> >> java.lang.OutOfMemoryError: Metaspace >> at jdk.internal.vm.TranslatedException.encodeThrowable(java.base at 21/TranslatedException.java:176) >> at jdk.internal.vm.TranslatedException.(java.base at 21/TranslatedException.java:61) >> at jdk.internal.vm.VMSupport.encodeThrowable(java.base at 21/VMSupport.java:171) >> >> This PR pushes a fix such that this exception is properly handled in the VM (i.e. causing a compilation bailout) instead of leading to a VM crash. >> >> The PR includes 2 bits of debug code guarded by system properties that enable the handling to be tested in libgraal. The test itself is not included as libgraal is not part of OpenJDK. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - guard test-only code with ASSERT instead of !PRODUCT > - omit test-only code in product build Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15198#issuecomment-1673739930 From dnsimon at openjdk.org Thu Aug 10 19:10:29 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 10 Aug 2023 19:10:29 GMT Subject: Integrated: 8313899: JVMCI exception Translation can fail in TranslatedException. 
In-Reply-To: References: Message-ID: On Tue, 8 Aug 2023 20:52:29 GMT, Doug Simon wrote: > In a test that stresses metaspace (such as `vmTestbase/vm/mlvm/hiddenloader/stress/oome/metaspace/Test.java`) that also uses `-Xcomp -XX:-TieredCompilation`, we've seen a failure in `TranslatedException.` due to exhausted metaspace: > > java.lang.OutOfMemoryError: Metaspace > at jdk.internal.vm.TranslatedException.encodeThrowable(java.base at 21/TranslatedException.java:176) > at jdk.internal.vm.TranslatedException.(java.base at 21/TranslatedException.java:61) > at jdk.internal.vm.VMSupport.encodeThrowable(java.base at 21/VMSupport.java:171) > > This PR pushes a fix such that this exception is properly handled in the VM (i.e. causing a compilation bailout) instead of leading to a VM crash. > > The PR includes 2 bits of debug code guarded by system properties that enable the handling to be tested in libgraal. The test itself is not included as libgraal is not part of OpenJDK. This pull request has now been integrated. Changeset: 6f5c903d Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/6f5c903d10aa5f7ff979a79f121609c167f88eff Stats: 60 lines in 6 files changed: 58 ins; 1 del; 1 mod 8313899: JVMCI exception Translation can fail in TranslatedException. Reviewed-by: never, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/15198 From jvernee at openjdk.org Thu Aug 10 20:43:28 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 10 Aug 2023 20:43:28 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v3] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. 
https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. 
https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: - enable fallback linker on linux x86 in GHA - make Arena::allocate abstract ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/74bbe721..147e79d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=01-02 Stats: 20 lines in 4 files changed: 9 ins; 7 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From bpb at openjdk.org Thu Aug 10 21:49:58 2023 From: bpb at openjdk.org (Brian Burkhalter) Date: Thu, 10 Aug 2023 21:49:58 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v3] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 20:43:28 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2.
https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO...
> > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - enable fallback linker on linux x86 in GHA > - make Arena::allocate abstract The few, simple NIO changes are fine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15103#issuecomment-1673943826 From liach at openjdk.org Thu Aug 10 23:46:58 2023 From: liach at openjdk.org (Chen Liang) Date: Thu, 10 Aug 2023 23:46:58 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v3] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 20:43:28 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. 
This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base addresses of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - enable fallback linker on linux x86 in GHA > - make Arena::allocate abstract Just curious, what's the rationale for finalizing the API when there are significant changes from the last preview? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15103#issuecomment-1674051619 From iklam at openjdk.org Fri Aug 11 01:47:28 2023 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 11 Aug 2023 01:47:28 GMT Subject: RFR: 8314078: HotSpotConstantPool.lookupField() asserts due to field changes in ConstantPool.cpp Message-ID: This PR updates Java code in JVMCI to match the C code changes in [JDK-8301996](https://bugs.openjdk.java.net/browse/JDK-8301996) that modified the constant pool layout.
Essentially, the index following a getfield/putfield/getstatic/putstatic bytecode is no longer a CpCacheIndex, but an index for `ConstantPool::resolved_field_entry_at(int field_index)`. The assertion (and subsequent crash) happens only in debug builds. Out of pure luck, in product builds JVMCI produces the correct output even after [JDK-8301996](https://bugs.openjdk.java.net/browse/JDK-8301996), although the code was doing the wrong thing. This PR has (so far) two commits: - 6527e67b1832087d180d2b50b65aaeca2e244c28 The actual functional change to use the `rawIndex` that follows a field bytecode. - c322b8e71d4d9e33bd065e64420101010f9127fc Fixed incorrectly named parameters and variables in the JVMCI code and JavaDoc. In most cases, `cpi` needs to be changed to `rawIndex` to reflect the correct type of the index. To help reviewing, I am limiting the renaming to just those affected by the field changes (without the renames, it's hard to validate that I am actually doing the right thing). There are still some cases of `cpi` that need to be changed to `rawIndex`. I will fix those in a separate RFE. E.g.
in ConstantPool.java: `default JavaMethod lookupMethod(int cpi, int opcode) { return lookupMethod(cpi, opcode, null); }` ------------- Commit messages: - (no actual code changes) Fixed variable names and javadoc so that changes in this PR are consistent - 8314078: HotSpotConstantPool.lookupField() asserts due to field changes in ConstantPool.cpp Changes: https://git.openjdk.org/jdk/pull/15237/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15237&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314078 Stats: 123 lines in 5 files changed: 62 ins; 8 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/15237.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15237/head:pull/15237 PR: https://git.openjdk.org/jdk/pull/15237 From thartmann at openjdk.org Fri Aug 11 06:48:29 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 11 Aug 2023 06:48:29 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v2] In-Reply-To: <5EkrQNxjV5u0NIlfLg8BvCsLLYK-7qNMkT46dFA4_RU=.1ba35d62-0d2c-4765-9662-563751c4fd5c@github.com> References: <5EkrQNxjV5u0NIlfLg8BvCsLLYK-7qNMkT46dFA4_RU=.1ba35d62-0d2c-4765-9662-563751c4fd5c@github.com> Message-ID: On Thu, 10 Aug 2023 10:55:54 GMT, Emanuel Peter wrote: >> Hi @eme64, >> >> Thanks for looking at this. >> >>> @pfustc Why do you only test correctness (compare results) in some conditions? Is there not a risk that we miss doing it in some cases we should do it, just because we get the conditions slightly wrong? >> >> Yes, you are right! These conditions were added earlier to avoid jtreg hanging when compilation is locked. But now I can remove them because the lock has been removed. In my latest commit, I have removed the conditions and some useless imports. >> >>> Just FYI: we should integrate this whole correctness of results testing into the IR framework. I filed [JDK-8310533](https://bugs.openjdk.org/browse/JDK-8310533). That would make it easier to use for new tests.
It could also be used for any test, not just the ones located in test/hotspot/jtreg/compiler/vectorization. >> >> I have noticed this JBS issue before. The reason I didn't add a correctness check to the IR framework is that I implemented this kind of check before the IR framework existed. (We have used it internally for a few years.) But anyway, it is a good proposal and I'm willing to help if needed. > > @pfustc This looks good to me, thanks for making these changes. It will really increase the coverage. > Maybe @vnkozlov should quickly look at it again if he still agrees. > @TobiHartmann is running the testing again, just in case the dropped conditions change something. > I will give you my approval after those tests are passing. All tests passed (@eme64). ------------- PR Comment: https://git.openjdk.org/jdk/pull/15011#issuecomment-1674272896 From thartmann at openjdk.org Fri Aug 11 06:47:29 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 11 Aug 2023 06:47:29 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v4] In-Reply-To: References: Message-ID: On Wed, 9 Aug 2023 09:13:28 GMT, Pengfei Li wrote: >> This patch removes `@requires vm.flagless` annotations from HotSpot jtreg tests in `compiler/vectorization/runner`. All jtreg cases in this folder are invoked by the test driver `VectorizationTestRunner.java`, which checks both correctness and vectorizability (IR) for each test method. We added the flagless requirement earlier because extra flags may interfere with compiler control in the test driver during the correctness check. But `flagless` has the side effect that tests run with extra flags are skipped. So we propose to get rid of it now. >> >> To accommodate the removal of `@requires vm.flagless`, a few checks are added in the test driver to skip the correctness check if extra flags prevent compiler control from working.
This patch also moves the previously hard-coded flag `-XX:-OptimizeFill` from the test driver to conditions in IR checks. >> >> Tested various compiler-control-related VM flags on x86 and AArch64. > > Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: > > Remove useless conditions and imports Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15011#pullrequestreview-1573072563 From epeter at openjdk.org Fri Aug 11 08:04:28 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 11 Aug 2023 08:04:28 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v4] In-Reply-To: References: Message-ID: On Wed, 9 Aug 2023 09:13:28 GMT, Pengfei Li wrote: >> This patch removes `@requires vm.flagless` annotations from HotSpot jtreg tests in `compiler/vectorization/runner`. All jtreg cases in this folder are invoked by the test driver `VectorizationTestRunner.java`, which checks both correctness and vectorizability (IR) for each test method. We added the flagless requirement earlier because extra flags may interfere with compiler control in the test driver during the correctness check. But `flagless` has the side effect that tests run with extra flags are skipped. So we propose to get rid of it now. >> >> To accommodate the removal of `@requires vm.flagless`, a few checks are added in the test driver to skip the correctness check if extra flags prevent compiler control from working. This patch also moves the previously hard-coded flag `-XX:-OptimizeFill` from the test driver to conditions in IR checks. >> >> Tested various compiler-control-related VM flags on x86 and AArch64. > > Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: > > Remove useless conditions and imports Thanks @TobiHartmann. -> approved ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15011#pullrequestreview-1573165970 From chagedorn at openjdk.org Fri Aug 11 08:33:29 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 11 Aug 2023 08:33:29 GMT Subject: RFR: 8314106: C2: assert(is_valid()) failed: must be valid after JDK-8305636 In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 13:35:46 GMT, Christian Hagedorn wrote: > In the failing test case, we are unswitching a loop for which we've already removed Parse Predicates with `Compile::cleanup_parse_predicates()`. We are wrongly checking whether a predicate block is non-empty (i.e. whether we find Parse **or** Runtime Predicates) instead of only checking whether we find the Parse Predicate: > https://github.com/openjdk/jdk/blob/23fe2ece586d3ed750e905e1b71a2cd1da91f335/src/hotspot/share/opto/loopPredicate.cpp#L448-L453 > > In the test case, we have a predicate block that contains Runtime Predicates from Loop Predication but no Parse Predicate anymore. Therefore, when trying to clone the non-existing Parse Predicate, we fail with the assertion because we do not have a valid Parse Predicate. > > The fix is to only clone a Parse Predicate and the Assertion Predicates for a predicate block if the Parse Predicate is actually there. This is not entirely correct because we would also need to clone Assertion Predicates in the absence of Parse Predicates. But this was already wrong before JDK-8305636: > https://github.com/openjdk/jdk/blob/a38fdaf18dfeeb23775516d1986c720190ba9fc2/src/hotspot/share/opto/loopPredicate.cpp#L598-L612 > > This will only be fixed with the complete fix ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). The proposed fix here just reverts to the old behavior before JDK-8305636. > > Thanks, > Christian Thanks Tobias and Vladimir for your reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/15225#issuecomment-1674374719 From chagedorn at openjdk.org Fri Aug 11 08:36:03 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 11 Aug 2023 08:36:03 GMT Subject: RFR: 8314116: C2: assert(false) failed: malformed control flow after JDK-8305636 Message-ID: In the test case, a Template Assertion Predicate is not removed when a loop is dying. After applying more loop opts, it ends up above a different loop. We then peel this loop and create an Initialized Assertion Predicate from the template which gets completely unrelated values from the already removed loop. This causes some nodes to die and we end up with a broken graph. This problem of not removing Template Assertion Predicates of dying loops which end up at different loops was already known before JDK-8305636 (see [analysis in JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428?focusedCommentId=14571901&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14571901)). Apparently, with JDK-8305636, this became much more likely because I've accidentally already included a fix for Loop Peeling when moving some refactorings from the complete fix ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)) to JDK-8305636. The included fix creates Initialized Assertion Predicates when peeling a loop even though Parse Predicates have already been removed. This needs to be done eventually but seems to trigger JDK-8305428 more often with a different manifestation. I therefore suggest reverting Loop Peeling to its old state: https://github.com/openjdk/jdk/blob/a38fdaf18dfeeb23775516d1986c720190ba9fc2/src/hotspot/share/opto/loopTransform.cpp#L781-L789 https://github.com/openjdk/jdk/blob/a38fdaf18dfeeb23775516d1986c720190ba9fc2/src/hotspot/share/opto/loopTransform.cpp#L2068-L2075 where we only create Initialized Assertion Predicates if there are actually Parse Predicates available.
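The guard described above can be sketched as a toy Java model. Everything in it is hypothetical (`PeelGuardSketch`, `PredicateBlock`, `initializedPredicatesForPeeling`); it is not C2's C++ code and ignores how predicates are actually represented in the graph. It only illustrates the reverted rule: when peeling, create Initialized Assertion Predicates from the templates only if a Parse Predicate is still present.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical types, not C2's predicate classes.
final class PeelGuardSketch {
    static final class PredicateBlock {
        final boolean hasParsePredicate;                 // is a Parse Predicate still present?
        final List<String> templateAssertionPredicates;  // templates above the loop

        PredicateBlock(boolean hasParsePredicate, List<String> templateAssertionPredicates) {
            this.hasParsePredicate = hasParsePredicate;
            this.templateAssertionPredicates = templateAssertionPredicates;
        }
    }

    // Returns the Initialized Assertion Predicates to create when peeling the loop.
    static List<String> initializedPredicatesForPeeling(PredicateBlock block) {
        List<String> initialized = new ArrayList<>();
        if (!block.hasParsePredicate) {
            return initialized; // Parse Predicates already removed: create nothing
        }
        for (String template : block.templateAssertionPredicates) {
            initialized.add("Initialized(" + template + ")");
        }
        return initialized;
    }

    public static void main(String[] args) {
        PredicateBlock withParse = new PredicateBlock(true, List.of("T1", "T2"));
        PredicateBlock withoutParse = new PredicateBlock(false, List.of("T1"));
        System.out.println(initializedPredicatesForPeeling(withParse));    // [Initialized(T1), Initialized(T2)]
        System.out.println(initializedPredicatesForPeeling(withoutParse)); // []
    }
}
```

As the PR itself notes, skipping the templates when no Parse Predicate is left is the old, knowingly incomplete behavior; the model does not capture the complete fix tracked in JDK-8288981.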
Thanks, Christian ------------- Commit messages: - 8314116: C2: assert(false) failed: malformed control flow after JDK-8305636 Changes: https://git.openjdk.org/jdk/pull/15244/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15244&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314116 Stats: 86 lines in 2 files changed: 85 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15244/head:pull/15244 PR: https://git.openjdk.org/jdk/pull/15244 From chagedorn at openjdk.org Fri Aug 11 09:01:28 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 11 Aug 2023 09:01:28 GMT Subject: RFR: 8314116: C2: assert(false) failed: malformed control flow after JDK-8305636 In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 08:13:57 GMT, Christian Hagedorn wrote: > In the test case, a Template Assertion Predicate is not removed when a loop is dying. After applying more loop opts, it ends up above a different loop. We then peel this loop and create an Initialized Assertion Predicate from the template which gets completely unrelated values from the already removed loop. This causes some nodes to die and we end up with a broken graph. > > This problem of not removing Template Assertion Predicates of dying loops which end up at different loops was already known before JDK-8305636 (see [analysis in JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428?focusedCommentId=14571901&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14571901)). > > Apparently, with JDK-8305636, this became much more likely because I've accidentally already included a fix for Loop Peeling when moving some refactorings from the complete fix ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)) to JDK-8305636. The included fix creates Initialized Assertion Predicates when peeling a loop even though Parse Predicates have already been removed. 
This needs to be done eventually but seems to trigger JDK-8305428 more often with a different manifestation. > > I therefore suggest reverting Loop Peeling to its old state: > https://github.com/openjdk/jdk/blob/a38fdaf18dfeeb23775516d1986c720190ba9fc2/src/hotspot/share/opto/loopTransform.cpp#L781-L789 > > https://github.com/openjdk/jdk/blob/a38fdaf18dfeeb23775516d1986c720190ba9fc2/src/hotspot/share/opto/loopTransform.cpp#L2068-L2075 > > where we only create Initialized Assertion Predicates if there are actually Parse Predicates available. > > Thanks, > Christian Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15244#issuecomment-1674408031 From thartmann at openjdk.org Fri Aug 11 09:00:58 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 11 Aug 2023 09:00:58 GMT Subject: RFR: 8314116: C2: assert(false) failed: malformed control flow after JDK-8305636 In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 08:13:57 GMT, Christian Hagedorn wrote: > In the test case, a Template Assertion Predicate is not removed when a loop is dying. After applying more loop opts, it ends up above a different loop. We then peel this loop and create an Initialized Assertion Predicate from the template which gets completely unrelated values from the already removed loop. This causes some nodes to die and we end up with a broken graph. > > This problem of not removing Template Assertion Predicates of dying loops which end up at different loops was already known before JDK-8305636 (see [analysis in JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428?focusedCommentId=14571901&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14571901)). > > Apparently, with JDK-8305636, this became much more likely because I've accidentally already included a fix for Loop Peeling when moving some refactorings from the complete fix ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)) to JDK-8305636.
The included fix creates Initialized Assertion Predicates when peeling a loop even though Parse Predicates have already been removed. This needs to be done eventually but seems to trigger JDK-8305428 more often with a different manifestation. > > I therefore suggest reverting Loop Peeling to its old state: > https://github.com/openjdk/jdk/blob/a38fdaf18dfeeb23775516d1986c720190ba9fc2/src/hotspot/share/opto/loopTransform.cpp#L781-L789 > > https://github.com/openjdk/jdk/blob/a38fdaf18dfeeb23775516d1986c720190ba9fc2/src/hotspot/share/opto/loopTransform.cpp#L2068-L2075 > > where we only create Initialized Assertion Predicates if there are actually Parse Predicates available. > > Thanks, > Christian Looks good and trivial to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15244#pullrequestreview-1573258083 From epeter at openjdk.org Fri Aug 11 11:30:29 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 11 Aug 2023 11:30:29 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v27] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths than before, leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`.
The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match IR nodes dumped like this: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straightforward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there are no nodes, or only a limited number of them (`failOn` or comparison `<` or `<=` or `=0`); in those cases we usually want to make sure there is no node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equal to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` node with type `float`, and the `size` exactly equal to the smaller of `LoopMaxUnroll` or the maximal size allow...
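For reference, `[`, `]`, `{` and `}` must be escaped to match literally in a Java regex; otherwise `[8]` is a character class matching the digit `8`, and `{int}` is an illegal repetition. Below is a simplified, hypothetical helper (`IrDumpRegexSketch` is not part of the IR framework, whose generated patterns may differ) that matches a dumped vector node by name, element count and element type.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the IR framework builds its own (different) patterns.
final class IrDumpRegexSketch {
    // Matches "<id> <NodeName> === ... vector<letter>[<size>]:{<type>}" within a node dump line.
    static Pattern vectorNode(String nodeName, int size, String type) {
        return Pattern.compile("\\d+\\s+" + Pattern.quote(nodeName)
                + "\\s+===.*vector[A-Za-z]\\[" + size + "\\]:\\{" + Pattern.quote(type) + "\\}");
    }

    static boolean matches(String dumpLine, String nodeName, int size, String type) {
        return vectorNode(nodeName, size, type).matcher(dumpLine).find();
    }

    public static void main(String[] args) {
        String dump = "1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...";
        System.out.println(matches(dump, "VectorCastF2X", 8, "int"));  // true
        System.out.println(matches(dump, "VectorCastF2X", 4, "int"));  // false: wrong size
        System.out.println(matches(dump, "VectorCastF2X", 8, "long")); // false: wrong type
    }
}
```

Using `\s+` instead of `(\s){2}` makes the sketch indifferent to how many spaces separate the fields in the dump.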
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: enable equal count comparison, tighten default cascade lake special casing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/48fa52ba..8b2b8f2d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=25-26 Stats: 494 lines in 12 files changed: 140 ins; 171 del; 183 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From epeter at openjdk.org Fri Aug 11 11:44:29 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 11 Aug 2023 11:44:29 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v28] In-Reply-To: References: Message-ID: > **TODO** tests running with Cascade Lake simulation. Remove TODOs and rerun tests afterwards. > > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths than before, leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`.
The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match IR nodes dumped like this: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straightforward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there are no nodes, or only a limited number of them (`failOn` or comparison `<` or `<=` or `=0`); in those cases we usually want to make sure there is no node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equal to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` node with type `f...
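The `max_<type>` and `min(...)` size expressions reduce to simple arithmetic: the maximal number of lanes is the usable vector width in bytes divided by the element size. The sketch below assumes the usable width is `min(hardware vector width, MaxVectorSize)` in bytes; `VectorSizeSketch` and `maxLanes` are hypothetical names, and the real framework derives these values from CPU features and VM flags rather than taking them as parameters. With 32-byte vectors, 8 ints fit, which is consistent with the `#vectory[8]:{int}` dump quoted above.

```java
// Hypothetical helper: not the IR framework's implementation.
final class VectorSizeSketch {
    // Lanes that fit in the usable vector width
    // (assumed: min of hardware width and MaxVectorSize, both in bytes).
    static int maxLanes(int hardwareVectorBytes, int maxVectorSizeBytes, int elementBytes) {
        return Math.min(hardwareVectorBytes, maxVectorSizeBytes) / elementBytes;
    }

    public static void main(String[] args) {
        // 32-byte vectors, MaxVectorSize = 64: 8 ints and 4 doubles fit.
        int maxInt = maxLanes(32, 64, Integer.BYTES);
        int maxDouble = maxLanes(32, 64, Double.BYTES);
        System.out.println(maxInt + " " + Math.min(4, maxDouble)); // 8 4
        // 16-byte vectors: only 2 doubles fit, so min(4, max_double) is 2.
        System.out.println(Math.min(4, maxLanes(16, 64, Double.BYTES))); // 2
    }
}
```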
Emanuel Peter has updated the pull request incrementally with three additional commits since the last revision: - fix copyright format - fix whitespace issues - fix TestSafepointWhilePrinting.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/8b2b8f2d..a7834f8f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=26-27 Stats: 8 lines in 3 files changed: 3 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From dnsimon at openjdk.org Fri Aug 11 12:27:30 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 11 Aug 2023 12:27:30 GMT Subject: RFR: 8314078: HotSpotConstantPool.lookupField() asserts due to field changes in ConstantPool.cpp In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 01:15:01 GMT, Ioi Lam wrote: > This PR updates Java code in JVMCI to match the C code changes in [JDK-8301996](https://bugs.openjdk.java.net/browse/JDK-8301996) that modified the constant pool layout. Essentially, the index following a getfield/putfield/getstatic/putstatic bytecode is no longer a CpCacheIndex, but an index for `ConstantPool::resolved_field_entry_at(int field_index)`. > > The assertion (and subsequent crash) happens only in debug builds. Out of pure luck, in product builds JVMCI produces the correct output even after [JDK-8301996](https://bugs.openjdk.java.net/browse/JDK-8301996), although the code was doing the wrong thing. > > This PR has (so far) two commits: > > - 6527e67b1832087d180d2b50b65aaeca2e244c28 The actual functional change to use the `rawIndex` that follows a field bytecode. > - c322b8e71d4d9e33bd065e64420101010f9127fc Fixed incorrectly named parameters and variables in the JVMCI code and JavaDoc.
In most cases, `cpi` needs to be changed to `rawIndex` to reflect the correct type of the index. > > To help reviewing, I am limiting the renaming to just those affected by the field changes (without the renames, it's hard to validate that I am actually doing the right thing). > > There are still some cases of `cpi` that need to be changed to `rawIndex`. I will fix those in a separate RFE. E.g. in ConstantPool.java: > > > default JavaMethod lookupMethod(int cpi, int opcode) { > return lookupMethod(cpi, opcode, null); > } Marked as reviewed by dnsimon (Reviewer). Thanks for doing this, Ioi. In this PR or the follow-up renaming RFE, could you please add a "decoder ring" comment to the javadoc for ConstantPool? An incomplete example:

     * The following terminology is used when indexing a constant pool entry:
     * <ul>
     * <li>rawIndex - index in the bytecode stream after the opcode (could be rewritten for some bytecodes)
     * <li>cpi - the class file constant pool index
     * <li>cpci - a constant pool cache index
     * </ul>
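To make the terminology concrete: in an unrewritten class file, the two operand bytes after getstatic/putstatic/getfield/putfield form a big-endian u2 constant pool index (a cpi, per the JVM specification). Per the PR description, once HotSpot has rewritten the bytecodes, the value in that slot is a rawIndex into the resolved field entries instead, so it must not be treated as a cpi. The sketch below (hypothetical class and method names) only decodes the class-file form.

```java
// Hypothetical helper for the class-file (unrewritten) form only.
final class FieldOperandSketch {
    // Field access opcodes from the JVM specification.
    static final int GETSTATIC = 0xB2, PUTSTATIC = 0xB3, GETFIELD = 0xB4, PUTFIELD = 0xB5;

    static boolean isFieldOpcode(int opcode) {
        return opcode >= GETSTATIC && opcode <= PUTFIELD;
    }

    // Reads the big-endian u2 constant pool index (cpi) that follows a field opcode at bci.
    static int classFileCpi(byte[] code, int bci) {
        int opcode = code[bci] & 0xFF;
        if (!isFieldOpcode(opcode)) {
            throw new IllegalArgumentException("not a field bytecode: 0x" + Integer.toHexString(opcode));
        }
        return ((code[bci + 1] & 0xFF) << 8) | (code[bci + 2] & 0xFF);
    }

    public static void main(String[] args) {
        // aload_0 (0x2A), then getfield with operand bytes 0x01 0x02
        byte[] code = {0x2A, (byte) 0xB4, 0x01, 0x02};
        System.out.println(classFileCpi(code, 1)); // 258
    }
}
```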
------------- PR Review: https://git.openjdk.org/jdk/pull/15237#pullrequestreview-1573540144 PR Comment: https://git.openjdk.org/jdk/pull/15237#issuecomment-1674643799 From jvernee at openjdk.org Fri Aug 11 12:37:59 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 11 Aug 2023 12:37:59 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v4] In-Reply-To: References: Message-ID: <2oLXen9Xs8SSrIgV9q_2KV0CA0ioUIdUZDqtsSkQ-BI=.b30a8a5b-1f03-480e-8d53-9f107ab35d64@github.com> > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. 
This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base addresses of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... 
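The 'base offset' coordinate from item 4 can be illustrated without the FFM API itself. The sketch below is only an analogy using a plain `java.nio.ByteBuffer` and hypothetical helpers (`BaseOffsetSketch`, `offsetOf`, `rowOffsets`); it is not the `java.lang.foreign` API. The in-row offset comes from a fixed element layout (4-byte ints), the row's base offset is computed at run time for a jagged array whose row lengths are only known dynamically, and the two are simply added.

```java
import java.nio.ByteBuffer;

// Analogy only: plain ByteBuffer and hypothetical helpers, not java.lang.foreign.
final class BaseOffsetSketch {
    static final int ELEM = Integer.BYTES; // fixed element "layout": 4-byte int

    // Static part: offset of element `index`, added to a caller-supplied base offset.
    static int offsetOf(long baseOffset, long index) {
        return Math.toIntExact(baseOffset + index * ELEM);
    }

    // Dynamic part: start offset of each row of a jagged int array
    // whose row lengths are only known at run time.
    static long[] rowOffsets(int[] rowLengths) {
        long[] offsets = new long[rowLengths.length];
        long next = 0;
        for (int r = 0; r < rowLengths.length; r++) {
            offsets[r] = next;
            next += (long) rowLengths[r] * ELEM;
        }
        return offsets;
    }

    public static void main(String[] args) {
        int[] rowLengths = {3, 1, 2};
        long[] rows = rowOffsets(rowLengths); // {0, 12, 16}
        ByteBuffer buf = ByteBuffer.allocate(6 * ELEM);
        int value = 0;
        for (int r = 0; r < rowLengths.length; r++) {
            for (int c = 0; c < rowLengths[r]; c++) {
                buf.putInt(offsetOf(rows[r], c), value++);
            }
        }
        System.out.println(buf.getInt(offsetOf(rows[2], 1))); // 5
    }
}
```

In the FFM API, the accessor's own offset computation plays the role of `offsetOf`, and the new coordinate is where a dynamically computed base such as `rows[r]` would be passed.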
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: remove spurious imports ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/147e79d3..141096b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From fbredberg at openjdk.org Fri Aug 11 14:15:33 2023 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 11 Aug 2023 14:15:33 GMT Subject: RFR: 8313419: Template interpreter produces no safepoint check for return bytecodes Message-ID: The template interpreter produces a safepoint check for return bytecodes (TemplateTable::_return(TosState state)) on x86, ppc64le and s390, but not on aarch64, arm32, and riscv64. This PR adds the missing safepoint check to aarch64, arm32, and riscv64. Tested tier1-tier7 on aarch64. Both arm32 and riscv64 were sanity-tested using Qemu.
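Conceptually, the change adds a poll on the return path. The toy Java model below uses hypothetical names and is not HotSpot code; it ignores the real poll-word encoding, thread-local handshakes and the runtime call. It only shows the shape of the check: if the VM has armed the thread's poll, the return path calls into a safepoint handler before the frame is left.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model only: hypothetical names, not HotSpot's template interpreter.
final class ReturnPollSketch {
    // Stand-in for the per-thread poll word that the VM arms to request a safepoint.
    private final AtomicBoolean pollArmed = new AtomicBoolean(false);
    private int safepointCalls = 0;

    void arm()    { pollArmed.set(true); }   // done by the VM, not by the thread
    void disarm() { pollArmed.set(false); }  // done by the VM once the safepoint is over
    int safepointCalls() { return safepointCalls; }

    // Models the return path: poll before leaving the frame.
    int returnWithPoll(int result) {
        if (pollArmed.get()) {
            safepointCalls++; // a real VM would call into the runtime and block here
        }
        return result;
    }

    public static void main(String[] args) {
        ReturnPollSketch t = new ReturnPollSketch();
        t.returnWithPoll(1); // poll not armed: fast path
        t.arm();
        t.returnWithPoll(2); // poll armed: takes the slow path
        t.disarm();
        t.returnWithPoll(3);
        System.out.println(t.safepointCalls()); // 1
    }
}
```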
------------- Commit messages: - 8313419: Template interpreter produces no safepoint check for return bytecodes Changes: https://git.openjdk.org/jdk/pull/15248/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15248&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8313419 Stats: 35 lines in 3 files changed: 35 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15248.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15248/head:pull/15248 PR: https://git.openjdk.org/jdk/pull/15248 From fbredberg at openjdk.org Fri Aug 11 14:15:34 2023 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 11 Aug 2023 14:15:34 GMT Subject: RFR: 8313419: Template interpreter produces no safepoint check for return bytecodes In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 13:22:19 GMT, Fredrik Bredberg wrote: > The template interpreter produces a safepoint check for return bytecodes (TemplateTable::_return(TosState state)) on x86, ppc64le and s390, but not on aarch64, arm32, and riscv64. > > This PR adds the missing safepoint check to aarch64, arm32, and riscv64. > > Tested tier1-tier7 on aarch64. Both arm32 and riscv64 were sanity-tested using Qemu. I've done basic testing on riscv64 and arm32 using Qemu, but would appreciate it if @RealFYang and @bulasevich could take it for a real test drive. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15248#issuecomment-1674827460 From yzheng at openjdk.org Fri Aug 11 15:20:59 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 11 Aug 2023 15:20:59 GMT Subject: RFR: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers. [v5] In-Reply-To: References: Message-ID: > This PR exports `vmIntrinsic::is_intrinsic_available`, `Compiler::is_intrinsic_supported`, and `C2Compiler::is_intrinsic_supported` results to JVMCI compilers.
This allows JVMCI compilers to comply with `-XX:DisableIntrinsic`, `-XX:ControlIntrinsic`, `-XX:-UseXXXIntrinsic`, and is essential for running tests that depend on these flags, e.g., `java/lang/Float/Binary16ConversionNaN`, which returns a different result in the interpreter with `-XX:DisableIntrinsic=_float16ToFloat,_floatToFloat16`. > This PR also attempts to fix some of the `is_intrinsic_available` results. Please see the inlined comments. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: revert change in supports_fma. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15133/files - new: https://git.openjdk.org/jdk/pull/15133/files/1d2b83e6..028aaf5a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15133&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15133&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15133/head:pull/15133 PR: https://git.openjdk.org/jdk/pull/15133 From yzheng at openjdk.org Fri Aug 11 15:22:58 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 11 Aug 2023 15:22:58 GMT Subject: RFR: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers. [v5] In-Reply-To: <4i5IXOTKEPmaWHmsb5lvsdT-7psgFlUKcs6isBM9rXE=.dbf6919a-b786-4a28-9c84-00793f156647@github.com> References: <4i5IXOTKEPmaWHmsb5lvsdT-7psgFlUKcs6isBM9rXE=.dbf6919a-b786-4a28-9c84-00793f156647@github.com> Message-ID: On Thu, 10 Aug 2023 17:00:30 GMT, Vladimir Kozlov wrote: >> https://bugs.openjdk.org/browse/JDK-8181616 added a support_avx() check because the new Fma vectorization needs AVX: https://cr.openjdk.org/~vdeshpande/8181616/webrev.01/ >> Then we hit bug https://bugs.openjdk.org/browse/JDK-8182114 and band-aided it by restoring the UseSSE check.
>> That change came before 8296168, which switches off UseAVX if UseSSE < 4: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L908 >> >> This FMA check happens after UseSSE and UseAVX are set. I suggest removing the UseSSE check here instead and keeping support_avx(). > > That said, you may remove support_avx() here, but then you need to add it to the assembler vector instructions that currently have only a support_fma() check. Thanks for the references! I have reverted this change and will adjust the Graal intrinsic accordingly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15133#discussion_r1291448160 From epeter at openjdk.org Fri Aug 11 16:03:29 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 11 Aug 2023 16:03:29 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v29] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want not only to find the relevant IR nodes, but also to assert that they have the maximal length they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification, a future change could cause a regression where we still vectorize, but at shorter vector widths than before, leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now matched by both their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match IR nodes dumped like this: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straightforward. In most cases, you can use the nodes as before, and we now implicitly check for the maximal size.
However, in some cases we want to ensure there are no nodes, or only a limited number of them (`failOn` or comparison `<` or `<=` or `=0`). In those cases we usually want to make sure there is no node of any size, so we match any size by default. The size can also be explicitly constrained using `IRNode.VECTOR_SIZE`.
>
> Some examples:
> 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and the maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case.
> 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size.
> 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for the Vector API when the vector species is fixed.
> 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equal to `min(4, max_double)` (so 4 elements, or, if the hardware allows fewer `doubles`, that number).
> 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` node with type `float`, and the `size` exactly equal to the smaller of `LoopMaxUnroll` or the maximal size allow...
Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 77 commits: - Merge branch 'master' into JDK-8310308 - take out cascade lake simulation - fix copyright format - fix whitespace issues - fix TestSafepointWhilePrinting.java - enable equal count comparison, tighten default cascade lake special casing - manual merge from master - duplicate rules in VectorLogicalOpIdentityTest.java - Merge branch 'master' into JDK-8310308 - Duplicated =1 counts for vector nodes in compiler/vectorapi/reshape/tests/TestVectorCast.java - ... and 67 more: https://git.openjdk.org/jdk/compare/12326770...ebeb5898 ------------- Changes: https://git.openjdk.org/jdk/pull/14539/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=28 Stats: 3532 lines in 70 files changed: 1464 ins; 21 del; 2047 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From coleenp at openjdk.org Fri Aug 11 16:10:58 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 11 Aug 2023 16:10:58 GMT Subject: RFR: 8314078: HotSpotConstantPool.lookupField() asserts due to field changes in ConstantPool.cpp In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 01:15:01 GMT, Ioi Lam wrote: > This PR updates Java code in JVMCI to match the C code changes in [JDK-8301996](https://bugs.openjdk.java.net/browse/JDK-8301996) that modified the constant pool layout. Essentially, the indices after a getfield/putfield/getstatic/putstatic bytecode are no longer CpCacheIndex values, but indices for `ConstantPool::resolved_field_entry_at(int field_index)`. > > The assertion (and subsequent crash) happens only in debug builds. Out of pure luck, in product builds JVMCI produces the correct output even after [JDK-8301996](https://bugs.openjdk.java.net/browse/JDK-8301996), although the code was doing the wrong thing. > > This PR has (so far) two commits: >
> > This PR has (so far) two commits: > > - 6527e67b1832087d180d2b50b65aaeca2e244c28 The actual functional change to use the `rawIndex` that follows a field bytecode. > - c322b8e71d4d9e33bd065e64420101010f9127fc Fixed incorrectly named parameters and variables in the JVMCI code and JavaDoc. In most cases, `cpi` needs to be changed to `rawIndex` to reflect the correct type of the index. > > To help reviewing, I am limiting the renaming to just those affected by the field changes (without the renames, it's hard to validate that I am actually doing the right thing). > > There are still some cases of `cpi` that need to be changed to `rawIndex`. I will fix those in a separate RFE. E.g. in ConstantPool.java: > > > default JavaMethod lookupMethod(int cpi, int opcode) { > return lookupMethod(cpi, opcode, null); > } This looks good. I don't know how we missed it so thank you for fixing this. Thank you for fixing the variable names, which could have been part of the confusion. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15237#pullrequestreview-1573971843 From kvn at openjdk.org Fri Aug 11 16:22:08 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 11 Aug 2023 16:22:08 GMT Subject: RFR: 8314116: C2: assert(false) failed: malformed control flow after JDK-8305636 In-Reply-To: References: Message-ID: <581NkorFYy9DeXPnyEQNB6UU-BjZeKtWgbdpIpLcAX8=.c2d873fd-21b4-449f-8843-a0c15928cbd2@github.com> On Fri, 11 Aug 2023 08:13:57 GMT, Christian Hagedorn wrote: > In the test case, a Template Assertion Predicate is not removed when a loop is dying. After applying more loop opts, it ends up above a different loop. We then peel this loop and create an Initialized Assertion Predicate from the template which gets completely unrelated values from the already removed loop. This causes some nodes to die and we end up with a broken graph. 
> > This problem of not removing Template Assertion Predicates of dying loops which end up at different loops was already known before JDK-8305636 (see [analysis in JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428?focusedCommentId=14571901&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14571901)). > > Apparently, with JDK-8305636, this became much more likely because I had accidentally already included a fix for Loop Peeling when moving some refactorings from the complete fix ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)) to JDK-8305636. The included fix creates Initialized Assertion Predicates when peeling a loop even though Parse Predicates have already been removed. This needs to be done eventually, but it seems to trigger JDK-8305428 more often, with a different manifestation. > > I therefore suggest reverting Loop Peeling to the old state: > https://github.com/openjdk/jdk/blob/a38fdaf18dfeeb23775516d1986c720190ba9fc2/src/hotspot/share/opto/loopTransform.cpp#L781-L789 > > https://github.com/openjdk/jdk/blob/a38fdaf18dfeeb23775516d1986c720190ba9fc2/src/hotspot/share/opto/loopTransform.cpp#L2068-L2075 > > where we only create Initialized Assertion Predicates if there are actually Parse Predicates available. > > Thanks, > Christian Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15244#pullrequestreview-1574004777 From jvernee at openjdk.org Fri Aug 11 16:53:58 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 11 Aug 2023 16:53:58 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v3] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 23:31:57 GMT, Chen Liang wrote: > Just curious, what's the rationale for finalizing the API when there are significant changes from the last preview? A preview API is finalized when it is ready.
The preview process, as outlined by [JEP 12](https://bugs.openjdk.org/browse/JDK-8195734), does not place a mandate on the amount of changes that a JEP that finalizes a preview API should or should not contain. It only requires that the changes since the last preview iteration are noted (which we have done). Though, the amount of changes can be used to inform the decision to finalize. We feel that the FFM API is ready for finalization, and does not require another round of preview. In this case in particular: previous iterations contained significant changes to the API, including re-shuffling of some of the core APIs. (See e.g. https://github.com/openjdk/jdk/pull/13079#issuecomment-1476648707) In contrast this JEP contains mostly superficial changes to the API, that are not likely to impact how a client would write a program using the FFM API. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15103#issuecomment-1675057357 From uschindler at openjdk.org Fri Aug 11 17:23:28 2023 From: uschindler at openjdk.org (Uwe Schindler) Date: Fri, 11 Aug 2023 17:23:28 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v4] In-Reply-To: <2oLXen9Xs8SSrIgV9q_2KV0CA0ioUIdUZDqtsSkQ-BI=.b30a8a5b-1f03-480e-8d53-9f107ab35d64@github.com> References: <2oLXen9Xs8SSrIgV9q_2KV0CA0ioUIdUZDqtsSkQ-BI=.b30a8a5b-1f03-480e-8d53-9f107ab35d64@github.com> Message-ID: On Fri, 11 Aug 2023 12:37:59 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. 
https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended support to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable-length arrays or variable-size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base addresses of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray` by renaming methods that both allocate and initialize memory segments to `allocateFrom`. (See the original PR for the problematic case.) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8.
https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > remove spurious imports > > Just curious, what's the rationale for finalizing the API when there are significant changes from the last preview? > > A preview API is finalized when it is ready. The preview process, as outlined by [JEP 12](https://bugs.openjdk.org/browse/JDK-8195734), does not place a mandate on the amount of changes that a JEP that finalizes a preview API should or should not contain. It only requires that the changes since the last preview iteration are noted (which we have done). Though, the amount of changes can be used to inform the decision to finalize. We feel that the FFM API is ready for finalization, and does not require another round of preview. > > In this case in particular: previous iterations contained significant changes to the API, including re-shuffling of some of the core APIs. (See e.g. [#13079 (comment)](https://github.com/openjdk/jdk/pull/13079#issuecomment-1476648707)) In contrast this JEP contains mostly superficial changes to the API, that are not likely to impact how a client would write a program using the FFM API. In addition, if somebody wrote code against the preview API, he/she needs to update it anyway, because the class files are marked with the preview flag. So all is fine. We just make the API ready to use for everybody, and therefore all early adopters need to adapt. It won't affect anybody else. To me it would be strange if code went out of preview without changes: if there were no changes, why was it in preview in the last JDK feature version?
------------- PR Comment: https://git.openjdk.org/jdk/pull/15103#issuecomment-1675101701 From iklam at openjdk.org Fri Aug 11 20:28:58 2023 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 11 Aug 2023 20:28:58 GMT Subject: RFR: 8314078: HotSpotConstantPool.lookupField() asserts due to field changes in ConstantPool.cpp [v2] In-Reply-To: References: Message-ID: > This PR updates Java code in JVMCI to match the C code changes in [JDK-8301996](https://bugs.openjdk.java.net/browse/JDK-8301996) that modified the constant pool layout. Essentially, the indices after a getfield/putfield/getstatic/putstatic bytecode are no longer CpCacheIndex values, but indices for `ConstantPool::resolved_field_entry_at(int field_index)`. > > The assertion (and subsequent crash) happens only in debug builds. Out of pure luck, in product builds JVMCI produces the correct output even after [JDK-8301996](https://bugs.openjdk.java.net/browse/JDK-8301996), although the code was doing the wrong thing. > > This PR has (so far) two commits: > > - 6527e67b1832087d180d2b50b65aaeca2e244c28 The actual functional change to use the `rawIndex` that follows a field bytecode. > - c322b8e71d4d9e33bd065e64420101010f9127fc Fixed incorrectly named parameters and variables in the JVMCI code and JavaDoc. In most cases, `cpi` needs to be changed to `rawIndex` to reflect the correct type of the index. > > To help reviewing, I am limiting the renaming to just those affected by the field changes (without the renames, it's hard to validate that I am actually doing the right thing). > > There are still some cases of `cpi` that need to be changed to `rawIndex`. I will fix those in a separate RFE. E.g.
in ConstantPool.java: > > > default JavaMethod lookupMethod(int cpi, int opcode) { > return lookupMethod(cpi, opcode, null); > } Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @dougxc review: Added comments about rawIndex vs cpi ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15237/files - new: https://git.openjdk.org/jdk/pull/15237/files/c322b8e7..21976c06 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15237&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15237&range=00-01 Stats: 11 lines in 1 file changed: 8 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15237.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15237/head:pull/15237 PR: https://git.openjdk.org/jdk/pull/15237 From iklam at openjdk.org Fri Aug 11 20:29:30 2023 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 11 Aug 2023 20:29:30 GMT Subject: RFR: 8314078: HotSpotConstantPool.lookupField() asserts due to field changes in ConstantPool.cpp In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 12:06:36 GMT, Doug Simon wrote: > Thanks for doing this Ioi. > > In this PR or the follow-up renaming RFE, could you please add a "decoder ring" comment to the javadoc for ConstantPool. An incomplete example: > > ``` > * The following terminology is used when indexing a constant pool entry: > *
> * <ul>
> * <li>rawIndex - index in the bytecode stream after the opcode (could be rewritten for some bytecodes)</li>
> * <li>cpi - the class file constant pool index</li>
> * <li>cpci - a constant pool cache index</li>
> * </ul>
> ```
Hi Doug, thanks for the review. I added the comments to ConstantPool.java. I omitted cpci as it's not part of the API. I hope to add more details in the comments when fixing the other incorrectly named variables in [JDK-8314172](https://bugs.openjdk.org/browse/JDK-8314172) ------------- PR Comment: https://git.openjdk.org/jdk/pull/15237#issuecomment-1675311831 From sviswanathan at openjdk.org Fri Aug 11 21:18:58 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 11 Aug 2023 21:18:58 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v20] In-Reply-To: <76QbMpTJL41HzLBGljF4qze4cGI6JR9hVYvqbnqc2I0=.32b9f770-04e1-4d7f-913e-5e36ff2a96b6@github.com> References: <76QbMpTJL41HzLBGljF4qze4cGI6JR9hVYvqbnqc2I0=.32b9f770-04e1-4d7f-913e-5e36ff2a96b6@github.com> Message-ID: On Mon, 7 Aug 2023 21:01:51 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>>
>> **Arrays.sort performance data using JMH benchmarks**
>>
>> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
>> | --- | --- | --- | --- | --- |
>> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x |
>> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x |
>> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** |
>> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** |
>> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x |
>> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x |
>> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** |
>> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** |
>> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x |
>> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x |
>> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** |
>> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** |
>> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x |
>> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x |
>> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** |
>> | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** |
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
> change names from avx512 to x86_64
src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp line 391:
> 389: } calls[] = {
> 390: "arraysort_stub",
> 391: { { TypeFunc::Parms, ShenandoahLoad }, { TypeFunc::Parms+1, ShenandoahStore }, { -1, ShenandoahNone },
Only the first parameter is an array.
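The Speedup column in the quoted table is the baseline time divided by the AVX512 sort time, rounded to one decimal place. As a quick cross-check, here is a minimal Java sketch; the class and method names are hypothetical, and only the us/op figures (from the table's 100000-element rows) come from the PR.

```java
import java.util.Locale;

// Hypothetical cross-check of the "Speedup" column in the quoted Arrays.sort table:
// speedup = baseline (us/op) / AVX512 sort (us/op), rounded to one decimal place.
public class SpeedupCheck {

    /** Baseline time divided by optimized time, rounded to one decimal place. */
    static double speedup(double baselineUs, double avx512Us) {
        return Math.round(baselineUs / avx512Us * 10) / 10.0;
    }

    public static void main(String[] args) {
        String[] names = {"doubleSort", "floatSort", "intSort", "longSort"};
        // {baseline us/op, AVX512 sort us/op} from the 100000-element rows of the table
        double[][] rows = {
            {4721.17, 579.819},
            {4804.716, 427.397},
            {4484.94, 381.01},
            {4448.171, 696.773},
        };
        for (int i = 0; i < rows.length; i++) {
            // Prints 8.1x, 11.2x, 11.8x and 6.4x, matching the table
            System.out.printf(Locale.ROOT, "%s (100000): %.1fx%n",
                    names[i], speedup(rows[i][0], rows[i][1]));
        }
    }
}
```

Rounding with `Math.round(x * 10) / 10.0` mirrors the one-decimal presentation of the table; the same helper reproduces the smaller-array rows as well (for example 310.97 / 24.082 gives 12.9).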
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1291772928 From duke at openjdk.org Fri Aug 11 22:26:58 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 11 Aug 2023 22:26:58 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v21] In-Reply-To: References: Message-ID: <27d0xFfwH6b5ZXr6_WIQRvng_t7BLWy5TPdcMT2ZUBI=.717a94c7-6a1b-4368-85c9-52c5690ac611@github.com>
> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>
> This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>
> **Arrays.sort performance data using JMH benchmarks**
>
> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
> | --- | --- | --- | --- | --- |
> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x |
> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x |
> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** |
> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** |
> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x |
> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x |
> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** |
> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** |
> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x |
> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x |
> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** |
> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** |
> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x |
> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x |
> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** |
> | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** |
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Fix signature for Shenandoah support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/c49657ee..58467994 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=19-20 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From duke at openjdk.org Fri Aug 11 22:31:28 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 11 Aug 2023 22:31:28 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v20] In-Reply-To: References: <76QbMpTJL41HzLBGljF4qze4cGI6JR9hVYvqbnqc2I0=.32b9f770-04e1-4d7f-913e-5e36ff2a96b6@github.com> Message-ID: On Fri, 11 Aug 2023 21:04:36 GMT, Sandhya Viswanathan wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> change names from avx512 to x86_64 > > src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp line 391: > >> 389: } calls[] = { >> 390: "arraysort_stub", >> 391: { { TypeFunc::Parms, ShenandoahLoad }, { TypeFunc::Parms+1, ShenandoahStore }, { -1, ShenandoahNone }, > > Only the first parameter is an array. Please see the fixed signature in the latest commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1291815951 From kvn at openjdk.org Sat Aug 12 00:46:28 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 12 Aug 2023 00:46:28 GMT Subject: RFR: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers.
[v5] In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 15:20:59 GMT, Yudi Zheng wrote: >> This PR exports `vmIntrinsic::is_intrinsic_available`, `Compiler::is_intrinsic_supported`, and `C2Compiler::is_intrinsic_supported` results to JVMCI compilers. This allows JVMCI compilers to comply with `-XX:DisableIntrinsic`, `-XX:ControlIntrinsic`, `-XX:-UseXXXIntrinsic`, and is essential for running tests that depend on these flags, e.g., `java/lang/Float/Binary16ConversionNaN`, which returns a different result in the interpreter with `-XX:DisableIntrinsic=_float16ToFloat,_floatToFloat16`. >> This PR also attempts to fix some of the `is_intrinsic_available` results. Please see the inlined comments. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > revert change in supports_fma. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15133#pullrequestreview-1574578577 From kvn at openjdk.org Sat Aug 12 00:46:30 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 12 Aug 2023 00:46:30 GMT Subject: RFR: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers. [v2] In-Reply-To: <9WeBz8r5aL5xW5HDjw-za0NOTMZU7VEgI0PadSri5aU=.3c958061-4355-4ccf-b3ea-9895bb6dd68a@github.com> References: <9WeBz8r5aL5xW5HDjw-za0NOTMZU7VEgI0PadSri5aU=.3c958061-4355-4ccf-b3ea-9895bb6dd68a@github.com> Message-ID: On Thu, 10 Aug 2023 14:28:39 GMT, Yudi Zheng wrote: >> I don't like having the same logic in two places, because then those two places need to be kept in sync. Either the stubs should be generated based on is_intrinsic_supported(), or is_intrinsic_supported() should check whether the stub was generated.
> > I have dropped the redundant CPU feature checks, and for those intrinsics with stubs, I check whether the stub pointer is nullptr in `C2Compiler::is_intrinsic_supported` @mur47x111 please rerun testing with the latest version before integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15133#issuecomment-1675567914 From duke at openjdk.org Sat Aug 12 22:29:58 2023 From: duke at openjdk.org (iaroslavski) Date: Sat, 12 Aug 2023 22:29:58 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 22:29:37 GMT, Srinivas Vamsi Parasa wrote: >>> Also need to handle arraySort in file: share/gc/shenandoah/c2/shenandoahSupport.cpp, function: ShenandoahBarrierC2Support::verify around line 3000. >> >> Updated the code in ShenandoahBarrierC2Support as suggested. > >> @vamsi-parasa With fastdebug build I see the following error: Internal Error (jdk/src/hotspot/share/opto/escape.cpp:1196), pid=3543536, tid=3543559 fatal error: EA unexpected CallLeaf arraysort_stub >> >> Please take a look. > > This was fixed as well. Hello @vamsi-parasa! Thank you for your improvements to sorting! Please see my comments and suggestions:
1. You introduced one method arraySort which is common for int/float/long/double types. What if we had 4 methods, one for each type? In that case we wouldn't need the switch and the UnsupportedOperationException. What do you think?
2. Please pay attention to the argument "long offset" in method arraySort(). It is not used in this method. What is it for?
3. Do you have plans to apply your approach to Arrays.parallelSort()?
4. I see the benchmarking file test/micro/org/openjdk/bench/java/util/ArraysSort.java, sounds good, thank you for testing! At the same time, PR https://github.com/openjdk/jdk/pull/13568 has another benchmarking test with the same name. It would be nice to have one version.
Could you please run the benchmarking test from PR https://github.com/openjdk/jdk/pull/13568 with your changes?
5. As you can see, PR https://github.com/openjdk/jdk/pull/13568 contains the optimized version of DualPivotQuicksort. I would appreciate it if you ran both benchmarking tests with the updated class DualPivotQuicksort and shared the results.
Many thanks, Vladimir Yaroslavskiy ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1676124093 From ysuenaga at openjdk.org Mon Aug 14 01:52:28 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Mon, 14 Aug 2023 01:52:28 GMT Subject: RFR: 8313406: nep_invoker_blob can be simplified more In-Reply-To: References: Message-ID: On Mon, 31 Jul 2023 12:22:00 GMT, Yasumasa Suenaga wrote:
> In FFM, native functions are called via `nep_invoker_blob`. If the function has two arguments, the stub looks like the following:
>
>
> Decoding RuntimeStub - nep_invoker_blob 0x00007fcae394cd10
> --------------------------------------------------------------------------------
> 0x00007fcae394cd80: pushq %rbp
> 0x00007fcae394cd81: movq %rsp, %rbp
> 0x00007fcae394cd84: subq $0, %rsp
> ;; { argument shuffle
> 0x00007fcae394cd88: movq %r8, %rax
> 0x00007fcae394cd8b: movq %rsi, %r10
> 0x00007fcae394cd8e: movq %rcx, %rsi
> 0x00007fcae394cd91: movq %rdx, %rdi
> ;; } argument shuffle
> 0x00007fcae394cd94: callq *%r10
> 0x00007fcae394cd97: leave
> 0x00007fcae394cd98: retq
>
>
> `subq $0, %rsp` reserves (shadow) space on the stack, and `movq %r8, %rax` passes the number of vector registers used by a variadic call. So they are not necessary in some cases.
They should be removed when they are not needed, as follows: > > > Decoding RuntimeStub - nep_invoker_blob 0x00007fd8778e2810 > -------------------------------------------------------------------------------- > 0x00007fd8778e2880: pushq %rbp > 0x00007fd8778e2881: movq %rsp, %rbp > ;; { argument shuffle > 0x00007fd8778e2884: movq %rsi, %r10 > 0x00007fd8778e2887: movq %rcx, %rsi > 0x00007fd8778e288a: movq %rdx, %rdi > ;; } argument shuffle > 0x00007fd8778e288d: callq *%r10 > 0x00007fd8778e2890: leave > 0x00007fd8778e2891: retq > > > All java/foreign jtreg tests passed. > > This stub code can be seen in the [ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/examples/cpumodel) with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintStubCode` and the hsdis library. This testcase links the code with `Linker.Option.isTrivial()`. > > After this change, FFM performance on [another ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/benchmarks/funccall) improved: > > before: > > Benchmark Mode Cnt Score Error Units > FuncCallComparison.invokeFFMRDTSC thrpt 3 106664071.816 ± 14396524.718 ops/s > FuncCallComparison.rdtsc thrpt 3 108024079.738 ± 13223921.011 ops/s > > > after: > > Benchmark Mode Cnt Score Error Units > FuncCallComparison.invokeFFMRDTSC thrpt 3 107622971.525 ± 12249767.134 ops/s > FuncCallComparison.rdtsc thrpt 3 107695741.608 ± 23983281.346 ops/s > > > Environment: > * CPU: AMD Ryzen 3 3300X > * OS: Fedora 38 x86_64 (Kernel 6.3.8-200.fc38.x86_64) > * Hyper-V 4vCPU, 8GB mem PING: could you review this PR? I need one more reviewer to push. This PR has passed the java/foreign jtreg tests and Oracle's CI.
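To make the idea concrete, here is a minimal, hypothetical Java model of the decision the PR describes: emit the stack adjustment only when a non-zero shadow space is actually needed, and emit the RAX setup only when the callee needs it (the variadic case). The class and method names are invented for illustration and do not correspond to HotSpot's stub generator, which is written in C++. The instruction strings are copied from the dumps above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the nep_invoker_blob prologue decision; not HotSpot code.
final class InvokerPrologueSketch {

    // Returns the prologue instructions (as text) for a downcall stub.
    // shadowSpaceBytes: stack bytes to reserve for the callee (0 if none).
    // needsRaxSetup:    true if the callee expects RAX to be set (variadic case).
    static List<String> prologue(int shadowSpaceBytes, boolean needsRaxSetup) {
        List<String> insns = new ArrayList<>();
        insns.add("pushq %rbp");
        insns.add("movq %rsp, %rbp");
        if (shadowSpaceBytes > 0) {
            // Skipped when nothing needs reserving, which avoids "subq $0, %rsp".
            insns.add("subq $" + shadowSpaceBytes + ", %rsp");
        }
        if (needsRaxSetup) {
            // Source register copied from the dump above; in a real stub it is
            // whatever the argument shuffle assigns.
            insns.add("movq %r8, %rax");
        }
        return insns;
    }

    public static void main(String[] args) {
        // Non-variadic call without shadow space: only the frame setup remains.
        prologue(0, false).forEach(System.out::println);
    }
}
```

Running `main` prints only the two frame-setup instructions, which is the prologue shape of the simplified stub shown in the second dump.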
------------- PR Comment: https://git.openjdk.org/jdk/pull/15089#issuecomment-1676545828 From yzheng at openjdk.org Mon Aug 14 06:27:58 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 14 Aug 2023 06:27:58 GMT Subject: RFR: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers. [v5] In-Reply-To: References: Message-ID: <5Jq7V3PisHoJW_xkn5PCCWpFDVE284iVn_n7qUzymIk=.c1ad14e1-1d37-4c54-92c5-40f8dd479362@github.com> On Fri, 11 Aug 2023 15:20:59 GMT, Yudi Zheng wrote: >> This PR exports `vmIntrinsic::is_intrinsic_available`, `Compiler::is_intrinsic_supported`, and `C2Compiler::is_intrinsic_supported` results to JVMCI compilers. This allows JVMCI compilers to comply with `-XX:DisableIntrinsic`, `-XX:ControlIntrinsic`, `-XX:-UseXXXIntrinsic`, and is essential for running tests that depend on these flags, e.g., `java/lang/Float/Binary16ConversionNaN`, which returns a different result in the interpreter with `-XX:DisableIntrinsic=_float16ToFloat,_floatToFloat16`. >> This PR also attempts to fix some of the `is_intrinsic_available` results. Please see the inlined comments. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > revert change in supports_fma.
Passed tier1-3 ------------- PR Comment: https://git.openjdk.org/jdk/pull/15133#issuecomment-1676730285 From fgao at openjdk.org Mon Aug 14 07:42:07 2023 From: fgao at openjdk.org (Fei Gao) Date: Mon, 14 Aug 2023 07:42:07 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v6] In-Reply-To: References: Message-ID: > Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like: > > > match(Set dst (FmaF src3 (Binary (NegF src1) src2))); > match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); > > > Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules. > > Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. > > After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k. > > The patch passed all tier 1 - 3 on aarch64 and x86 platforms. Fei Gao has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: - Improve comments and add assertions - Merge branch 'master' into fg8308340 - Merge branch 'master' into fg8308340 - Merge branch 'master' into fg8308340 - Merge branch 'master' into fg8308340 - Move check for UseFMA from c2compiler.cpp to Matcher::match_rule_supported in .ad files - Merge branch 'master' into fg8308340 - 8308340: C2: Idealize Fma nodes Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like: ``` match(Set dst (FmaF src3 (Binary (NegF src1) src2))); match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); ``` Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules. Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k. The patch passed all tier 1 - 3 on aarch64 and x86 platforms. ------------- Changes: https://git.openjdk.org/jdk/pull/14576/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14576&range=05 Stats: 689 lines in 20 files changed: 469 ins; 118 del; 102 mod Patch: https://git.openjdk.org/jdk/pull/14576.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14576/head:pull/14576 PR: https://git.openjdk.org/jdk/pull/14576 From fgao at openjdk.org Mon Aug 14 07:42:29 2023 From: fgao at openjdk.org (Fei Gao) Date: Mon, 14 Aug 2023 07:42:29 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v6] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 13:19:06 GMT, Emanuel Peter wrote: >> Fei Gao has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains eight commits: >> >> - Improve comments and add assertions >> - Merge branch 'master' into fg8308340 >> - Merge branch 'master' into fg8308340 >> - Merge branch 'master' into fg8308340 >> - Merge branch 'master' into fg8308340 >> - Move check for UseFMA from c2compiler.cpp to Matcher::match_rule_supported in .ad files >> - Merge branch 'master' into fg8308340 >> - 8308340: C2: Idealize Fma nodes >> >> Some platforms, like aarch64, ppc, and riscv, support fusing >> `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating >> partially symmetric match rules like: >> >> ``` >> match(Set dst (FmaF src3 (Binary (NegF src1) src2))); >> match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); >> ``` >> >> Since `Fma` is partially commutative, the patch is to convert >> `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, >> making node patterns canonical. Then we can remove redundant >> rules. >> >> Also, we should guarantee that C2 generates `Fma` nodes only on >> platforms supporting `Fma` instructions before matcher, so we >> can remove all `predicate(UseFMA)` for all `Fma` rules. >> >> After the patch, the code size of libjvm.so on aarch64 platform >> decreased by 63.4k. >> >> The patch passed all tier 1 - 3 on aarch64 and x86 platforms. > > src/hotspot/cpu/x86/x86.ad line 3975: > >> 3973: // a * b + c >> 3974: instruct fmaF_reg(regF a, regF b, regF c) %{ >> 3975: predicate(UseFMA); > > You could add an assert to the encoding code. Just to ensure that we do not generate bad code, even if it is never executed during testing. Yes, thanks for your suggestion! Updated in the new commit.
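The canonicalization described in the commit message can be illustrated on a toy expression tree. The sketch below is a hypothetical Java model (Java 16+ for records and pattern matching); the `Expr`, `Var`, `Neg`, `Fma` and `FmaCanonicalizer` names are invented, while C2's real transformation is done on IR nodes in `FmaNode::Ideal` in C++. It swaps the multiplicands when only the first one is a negation, so a negated multiplicand can only appear in the second position. The swap preserves the result because the exact product `(-a)*b` equals `b*(-a)` and `Math.fma` rounds only once.

```java
// Hypothetical mini-IR for illustration only; not C2's Node classes.
interface Expr {}
record Var(String name) implements Expr {}
record Neg(Expr in) implements Expr {}
record Fma(Expr a, Expr b, Expr c) implements Expr {} // a * b + c

final class FmaCanonicalizer {
    // Rewrites fma(-a, b, c) into fma(b, -a, c) so that a negated multiplicand
    // only ever appears as the second input. If both multiplicands are negated,
    // swapping would not help, so this sketch leaves the node unchanged.
    static Expr canonicalize(Expr e) {
        if (e instanceof Fma f && f.a() instanceof Neg && !(f.b() instanceof Neg)) {
            return new Fma(f.b(), f.a(), f.c());
        }
        return e;
    }

    public static void main(String[] args) {
        Expr e = new Fma(new Neg(new Var("a")), new Var("b"), new Var("c"));
        System.out.println(canonicalize(e));
        // Numerically, the swap does not change the result.
        float a = 1.5f, b = 2.25f, c = -0.75f;
        System.out.println(Math.fma(-a, b, c) == Math.fma(b, -a, c));
    }
}
```

Once every negated multiplicand sits in the second position, a backend only needs the second of the two symmetric match rules quoted above, which is what allows the redundant rules to be removed.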
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14576#discussion_r1293070879 From fgao at openjdk.org Mon Aug 14 07:42:33 2023 From: fgao at openjdk.org (Fei Gao) Date: Mon, 14 Aug 2023 07:42:33 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v5] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 13:16:04 GMT, Emanuel Peter wrote: >> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Merge branch 'master' into fg8308340 >> - Merge branch 'master' into fg8308340 >> - Merge branch 'master' into fg8308340 >> - Move check for UseFMA from c2compiler.cpp to Matcher::match_rule_supported in .ad files >> - Merge branch 'master' into fg8308340 >> - 8308340: C2: Idealize Fma nodes >> >> Some platforms, like aarch64, ppc, and riscv, support fusing >> `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating >> partially symmetric match rules like: >> >> ``` >> match(Set dst (FmaF src3 (Binary (NegF src1) src2))); >> match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); >> ``` >> >> Since `Fma` is partially commutative, the patch is to convert >> `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, >> making node patterns canonical. Then we can remove redundant >> rules. >> >> Also, we should guarantee that C2 generates `Fma` nodes only on >> platforms supporting `Fma` instructions before matcher, so we >> can remove all `predicate(UseFMA)` for all `Fma` rules. >> >> After the patch, the code size of libjvm.so on aarch64 platform >> decreased by 63.4k. >> >> The patch passed all tier 1 - 3 on aarch64 and x86 platforms.
> > src/hotspot/share/opto/mulnode.cpp line 1717: > >> 1715: //------------------------------Ideal------------------------------------------ >> 1716: Node* FmaNode::Ideal(PhaseGVN* phase, bool can_reshape) { >> 1717: // We canonicalize the node by converting "(-a)*b+c" into "b*(-a)+c" > > Add motivation to comment > > > // This reduces the number of rules in the matcher, as we only need to check > // for negations on the second argument, and not the symmetric case where > // the first argument is negated. Thanks! Done. > test/hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java line 63: > >> 61: private static final VectorSpecies S_SPECIES = ShortVector.SPECIES_MAX; >> 62: >> 63: private static int LENGTH = 128; > > What is the reason for the reduction? Speed? Yes, it's for speeding up. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14576#discussion_r1293070063 PR Review Comment: https://git.openjdk.org/jdk/pull/14576#discussion_r1293069544 From fgao at openjdk.org Mon Aug 14 07:43:28 2023 From: fgao at openjdk.org (Fei Gao) Date: Mon, 14 Aug 2023 07:43:28 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v4] In-Reply-To: References: <2mha0SwQTlnWaTLdEMX6RQ1KqhasbfWKz503bmkxKhw=.eb3a1229-b222-4d02-b2f6-35c0e50e5766@github.com> <25TmMNCogoj1jgszVmsFMDfkBVov20V_zM9G0x8cqDQ=.5502aded-0509-4afb-b5ad-47084dbfa430@github.com> Message-ID: On Thu, 10 Aug 2023 13:13:18 GMT, Emanuel Peter wrote: >> Actually, there is no handling on `FmaV` nodes **with mask** in this patch, whether in the C2 mid-end or codegen backend. The gvn transformation just skips them. And I suppose `FmaV` nodes with mask can't be transformed into nodes **without mask**, except that C2 can guarantee that the mask is all true (this transformation has not been supported by current C2). Thanks. > > @fg1417 I only understood the comment with the help of your explanations in this thread. I think you should improve the comment. I would not mention the vectorapi. 
We may generate `FmaV` through an auto-vectorizer, though I guess that is unlikely, since the scalar version `Fma::Ideal` would already reshape things. > > Suggestion: > > // We canonicalize the node by converting "(-a)*b+c" into "b*(-a)+c" > // This reduces the number of rules in the matcher, as we only need to check > // for negations on the second argument, and not the symmetric case where > // the first argument is negated. > // We cannot do this if the FmaV is masked. The inactive lanes have to return > // the first input (i.e. "-a"). If we were to swap the inputs, the inactive lanes would > // incorrectly return "b". Thanks! Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14576#discussion_r1293070273 From fgao at openjdk.org Mon Aug 14 07:56:35 2023 From: fgao at openjdk.org (Fei Gao) Date: Mon, 14 Aug 2023 07:56:35 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v5] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 13:34:05 GMT, Emanuel Peter wrote: > We should probably do the verification that the canonicalization happened, if the normal `fma` matcher rule is chosen. We should add asserts that the first argument is not a negation (you could check the second argument also, just in case). What do you think? Hi @eme64, thanks for your review! The check may be more complex than expected. The matcher can fuse two instructions into one only when there is no other use of the inputs. This means that we can do the fusion for a case like: return Math.fma(-a, b, c); But we can't fuse them for a case like: float tmp = -a; return Math.fma(tmp, b, c) + (-a); For the second case, we still match the normal `neg` + `fma` rules separately, instead of the combined rules. So, we can't simply guarantee that the first argument is not a negation when the normal `fma` matcher rule is chosen. If we have to consider def-use relationships during instruction selection, the check may become complex. WDYT? Thanks.
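The fusion constraint described in the reply can be modeled with explicit use lists. A negation can be folded into a combined `fma` rule only if the `FmaF` is its sole user. Otherwise, the negation still has to be computed for its other users. The sketch below is hypothetical Java: the `MiniNode` and `FusionCheck` names are invented and are not C2's `Node` or `Matcher` API. Inputs are in `a, b, c` order for `a * b + c`, and the negation is assumed to have already been canonicalized into the second position. For the second example, GVN is assumed to common the two `-a` expressions into one `NegF` node with two uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical IR node with explicit def-use edges; not C2's Node class.
final class MiniNode {
    final String op;
    final List<MiniNode> inputs = new ArrayList<>();
    final List<MiniNode> uses = new ArrayList<>();

    MiniNode(String op, MiniNode... ins) {
        this.op = op;
        for (MiniNode in : ins) {
            inputs.add(in);
            in.uses.add(this); // record the def-use edge
        }
    }
}

final class FusionCheck {
    // A NegF in the second multiplicand position may be folded into a combined
    // FMA match rule only if this FmaF is its only user.
    static boolean canFoldNeg(MiniNode fma) {
        MiniNode second = fma.inputs.get(1);
        return second.op.equals("NegF") && second.uses.size() == 1;
    }

    public static void main(String[] args) {
        MiniNode a = new MiniNode("ParmF"), b = new MiniNode("ParmF"), c = new MiniNode("ParmF");

        // return Math.fma(-a, b, c);  canonicalized to fma(b, -a, c)
        MiniNode neg1 = new MiniNode("NegF", a);
        MiniNode fma1 = new MiniNode("FmaF", b, neg1, c);
        System.out.println(canFoldNeg(fma1)); // true

        // float tmp = -a; return Math.fma(tmp, b, c) + (-a);  one shared NegF
        MiniNode neg2 = new MiniNode("NegF", a);
        MiniNode fma2 = new MiniNode("FmaF", b, neg2, c);
        new MiniNode("AddF", fma2, neg2);
        System.out.println(canFoldNeg(fma2)); // false
    }
}
```

In the shared-negation case the plain `neg` and `fma` rules are selected, so a plain `fma` rule can legitimately see a `NegF` input. This is the complication that makes a simple assertion in the encoding code insufficient.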
------------- PR Comment: https://git.openjdk.org/jdk/pull/14576#issuecomment-1676838806 From chagedorn at openjdk.org Mon Aug 14 08:40:28 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 14 Aug 2023 08:40:28 GMT Subject: Integrated: 8314106: C2: assert(is_valid()) failed: must be valid after JDK-8305636 In-Reply-To: References: Message-ID: <28MxKbjs3DlBUZHO-urlMyV_Ko8dapkexiNv53XgqEs=.41fcce82-e6f9-4bbc-8ea9-0d8514757d87@github.com> On Thu, 10 Aug 2023 13:35:46 GMT, Christian Hagedorn wrote: > In the failing test case, we are unswitching a loop for which we've already removed Parse Predicates with `Compile::cleanup_parse_predicates()`. We are wrongly checking if a predicate block is non-empty (i.e. find the Parse **or** Runtime Predicates) instead of only checking if we find the Parse Predicate: > https://github.com/openjdk/jdk/blob/23fe2ece586d3ed750e905e1b71a2cd1da91f335/src/hotspot/share/opto/loopPredicate.cpp#L448-L453 > > In the test case, we have a predicate block that contains Runtime Predicates from Loop Predication but no Parse Predicate anymore. Therefore, when trying to clone the non-existing Parse Predicate, we fail with the assertion because we do not have a valid Parse Predicate. > > The fix is to only clone a Parse Predicate and the Assertion Predicates for a predicate block if the Parse Predicate is actually there. This is not entirely correct because we would also need to clone Assertion Predicates in the absence of Parse Predicates. But this was already wrong before JDK-8305636: > https://github.com/openjdk/jdk/blob/a38fdaf18dfeeb23775516d1986c720190ba9fc2/src/hotspot/share/opto/loopPredicate.cpp#L598-L612 > > This will only be fixed with the complete fix ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). The proposed fix here just reverts back to the old behavior before JDK-8305636. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 1de5bf1c Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/1de5bf1ce94c20bc2fd481cd4387f170b0d3c63d Stats: 68 lines in 2 files changed: 66 ins; 1 del; 1 mod 8314106: C2: assert(is_valid()) failed: must be valid after JDK-8305636 Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/15225 From chagedorn at openjdk.org Mon Aug 14 08:41:00 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 14 Aug 2023 08:41:00 GMT Subject: Integrated: 8314116: C2: assert(false) failed: malformed control flow after JDK-8305636 In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 08:13:57 GMT, Christian Hagedorn wrote: > In the test case, a Template Assertion Predicate is not removed when a loop is dying. After applying more loop opts, it ends up above a different loop. We then peel this loop and create an Initialized Assertion Predicate from the template which gets completely unrelated values from the already removed loop. This causes some nodes to die and we end up with a broken graph. > > This problem of not removing Template Assertion Predicates of dying loops which end up at different loops was already known before JDK-8305636 (see [analysis in JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428?focusedCommentId=14571901&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14571901)). > > Apparently, with JDK-8305636, this became much more likely because I've accidentally already included a fix for Loop Peeling when moving some refactorings from the complete fix ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)) to JDK-8305636. The included fix creates Initialized Assertion Predicates when peeling a loop even though Parse Predicates have already been removed. This needs to be done eventually but seems to trigger JDK-8305428 more often with a different manifestation. 
> > I therefore suggest to revert Loop Peeling back to the old state: > https://github.com/openjdk/jdk/blob/a38fdaf18dfeeb23775516d1986c720190ba9fc2/src/hotspot/share/opto/loopTransform.cpp#L781-L789 > > https://github.com/openjdk/jdk/blob/a38fdaf18dfeeb23775516d1986c720190ba9fc2/src/hotspot/share/opto/loopTransform.cpp#L2068-L2075 > > where we only create Initialized Assertion Predicates if there are actually Parse Predicates available. > > Thanks, > Christian This pull request has now been integrated. Changeset: a39ed108 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/a39ed1087b3c188f06c9aa602313f3b9bf20f9c2 Stats: 86 lines in 2 files changed: 85 ins; 0 del; 1 mod 8314116: C2: assert(false) failed: malformed control flow after JDK-8305636 Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/15244 From chagedorn at openjdk.org Mon Aug 14 08:40:58 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 14 Aug 2023 08:40:58 GMT Subject: RFR: 8314116: C2: assert(false) failed: malformed control flow after JDK-8305636 In-Reply-To: References: Message-ID: <1xYUsbVwLM5R6hwNmd43BNzfaY_vYU-Dv8fbeE6UlEQ=.de4e2e21-258f-49e8-9e3c-70cd1a997f6c@github.com> On Fri, 11 Aug 2023 08:13:57 GMT, Christian Hagedorn wrote: > In the test case, a Template Assertion Predicate is not removed when a loop is dying. After applying more loop opts, it ends up above a different loop. We then peel this loop and create an Initialized Assertion Predicate from the template which gets completely unrelated values from the already removed loop. This causes some nodes to die and we end up with a broken graph. > > This problem of not removing Template Assertion Predicates of dying loops which end up at different loops was already known before JDK-8305636 (see [analysis in JDK-8305428](https://bugs.openjdk.org/browse/JDK-8305428?focusedCommentId=14571901&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14571901)). 
> > Apparently, with JDK-8305636, this became much more likely because I've accidentally already included a fix for Loop Peeling when moving some refactorings from the complete fix ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)) to JDK-8305636. The included fix creates Initialized Assertion Predicates when peeling a loop even though Parse Predicates have already been removed. This needs to be done eventually but seems to trigger JDK-8305428 more often with a different manifestation. > > I therefore suggest to revert Loop Peeling back to the old state: > https://github.com/openjdk/jdk/blob/a38fdaf18dfeeb23775516d1986c720190ba9fc2/src/hotspot/share/opto/loopTransform.cpp#L781-L789 > > https://github.com/openjdk/jdk/blob/a38fdaf18dfeeb23775516d1986c720190ba9fc2/src/hotspot/share/opto/loopTransform.cpp#L2068-L2075 > > where we only create Initialized Assertion Predicates if there are actually Parse Predicates available. > > Thanks, > Christian Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15244#issuecomment-1676885035 From dnsimon at openjdk.org Mon Aug 14 09:12:00 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 14 Aug 2023 09:12:00 GMT Subject: RFR: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers. [v5] In-Reply-To: References: Message-ID: <2IS7HQyrcVXQiRYs5qk7jKfftECxVxfbiGGyT4Spf-Q=.6585bddb-e014-41b7-83ac-fe614cf9cfaf@github.com> On Fri, 11 Aug 2023 15:20:59 GMT, Yudi Zheng wrote: >> This PR exports `vmIntrinsic::is_intrinsic_available`, `Compiler::is_intrinsic_supported`, and `C2Compiler::is_intrinsic_supported` results to JVMCI compiler. 
This allows JVMCI compilers to comply with `-XX:DisableIntrinsic`, `-XX:ControlIntrinsic`, `-XX:-UseXXXIntrinsic`, and is essential for running tests that depend on these flags, e.g., `java/lang/Float/Binary16ConversionNaN`, which returns a different result in the interpreter with `-XX:DisableIntrinsic=_float16ToFloat,_floatToFloat16`. >> This PR also attempts to fix some of the `is_intrinsic_available` results. Please see the inlined comments. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > revert change in supports_fma. Marked as reviewed by dnsimon (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15133#pullrequestreview-1576399256 From yzheng at openjdk.org Mon Aug 14 09:12:01 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 14 Aug 2023 09:12:01 GMT Subject: Integrated: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers. In-Reply-To: References: Message-ID: On Thu, 3 Aug 2023 06:48:27 GMT, Yudi Zheng wrote: > This PR exports `vmIntrinsic::is_intrinsic_available`, `Compiler::is_intrinsic_supported`, and `C2Compiler::is_intrinsic_supported` results to JVMCI compilers. This allows JVMCI compilers to comply with `-XX:DisableIntrinsic`, `-XX:ControlIntrinsic`, `-XX:-UseXXXIntrinsic`, and is essential for running tests that depend on these flags, e.g., `java/lang/Float/Binary16ConversionNaN`, which returns a different result in the interpreter with `-XX:DisableIntrinsic=_float16ToFloat,_floatToFloat16`. > This PR also attempts to fix some of the `is_intrinsic_available` results. Please see the inlined comments. This pull request has now been integrated. Changeset: 4164693f Author: Yudi Zheng Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/4164693f3bf15a2f3e03dee72e1ca3fb8d82582c Stats: 80 lines in 10 files changed: 63 ins; 6 del; 11 mod 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers.
Reviewed-by: dnsimon, kvn ------------- PR: https://git.openjdk.org/jdk/pull/15133 From thartmann at openjdk.org Mon Aug 14 10:54:58 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 14 Aug 2023 10:54:58 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v29] In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 16:03:29 GMT, Emanuel Peter wrote: >> For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. >> >> **Motivation** >> I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths than before - leading to performance loss. >> >> **How to use it** >> >> All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now matched with their `type` and `size`. The regex might now look something like this: >> >> `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` >> which would match IR nodes dumped like this: >> `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` >> >> The goal was to keep it simple and straightforward. In most cases, you can just use the nodes as before, and we now implicitly check for the maximal size. However, in some cases we want to ensure there are no nodes or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is no node of any size, so we match with any size by default. The size can also be explicitly constrained using `IRNode.VECTOR_SIZE`. >> >> Some examples: >> 1.
`@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and the maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. >> 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. >> 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. >> 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equal to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). >> 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` node with type `float`, and the `size` exactly equal to the smaller of... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 77 commits: > > - Merge branch 'master' into JDK-8310308 > - take out cascade lake simulation > - fix copyright format > - fix whitespace issues > - fix TestSafepointWhilePrinting.java > - enable equal count comparison, tighten default cascade lake special casing > - manual merge from master > - duplicate rules in VectorLogicalOpIdentityTest.java > - Merge branch 'master' into JDK-8310308 > - Duplicated =1 counts for vector nodes in compiler/vectorapi/reshape/tests/TestVectorCast.java > - ... and 67 more: https://git.openjdk.org/jdk/compare/12326770...ebeb5898 Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer).
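The dump-line format quoted in the PR description can be matched with an ordinary `java.util.regex` pattern. The sketch below is a hypothetical helper (`VectorNodeRegexSketch.vectorNode` is invented for illustration and is not the IR Framework's `IRNode` API). It escapes the literal `[`, `]`, `{` and `}` explicitly and tolerates any amount of whitespace between fields. The `sizeRegex` argument is a regex fragment: a number for an exact size, or `\d+` for any size.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the IR Framework's IRNode API.
final class VectorNodeRegexSketch {

    // Builds a pattern for a dumped vector node such as
    //   1150  VectorCastF2X  === _ 1151  [[ 1146 ]]  #vectory[8]:{int}
    // sizeRegex is a regex fragment: "8" for exactly 8 lanes, "\\d+" for any size.
    static Pattern vectorNode(String nodeName, String sizeRegex, String elementType) {
        String regex = "\\d+\\s+" + Pattern.quote(nodeName) + "\\S*\\s+===.*"
                + "#vector[A-Za-z]\\[" + sizeRegex + "\\]:\\{"
                + Pattern.quote(elementType) + "\\}";
        return Pattern.compile(regex);
    }

    public static void main(String[] args) {
        String line = "1150  VectorCastF2X  === _ 1151  [[ 1146 ]]  #vectory[8]:{int} ...";
        System.out.println(vectorNode("VectorCastF2X", "8", "int").matcher(line).find());  // true
        System.out.println(vectorNode("VectorCastF2X", "16", "int").matcher(line).find()); // false
    }
}
```

Passing `"\\d+"` as the size mirrors the "any size" matching that the description uses by default for upper-bound checks such as `failOn`.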
PR Review: https://git.openjdk.org/jdk/pull/14539#pullrequestreview-1569236318 From thartmann at openjdk.org Mon Aug 14 10:56:29 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 14 Aug 2023 10:56:29 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v26] In-Reply-To: References: Message-ID: On Tue, 8 Aug 2023 17:20:01 GMT, Emanuel Peter wrote: >> For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. >> >> **Motivation** >> I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths than before - leading to performance loss. >> >> **How to use it** >> >> All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now matched with their `type` and `size`. The regex might now look something like this: >> >> `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` >> which would match IR nodes dumped like this: >> `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` >> >> The goal was to keep it simple and straightforward. In most cases, you can just use the nodes as before, and we now implicitly check for the maximal size. However, in some cases we want to ensure there are no nodes or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is no node of any size, so we match with any size by default. The size can also be explicitly constrained using `IRNode.VECTOR_SIZE`. >> >> Some examples: >> 1.
`@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and the maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. >> 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. >> 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. >> 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equal to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). >> 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` node with type `float`, and the `size` exactly equal to the smaller of... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 71 commits: > > - manual merge from master > - duplicate rules in VectorLogicalOpIdentityTest.java > - Merge branch 'master' into JDK-8310308 > - Duplicated =1 counts for vector nodes in compiler/vectorapi/reshape/tests/TestVectorCast.java > - Merge branch 'master' into JDK-8310308 > - Fix with canTrustVectorSize for Cascade Lake > - TestSpillTheBeans.java > - print VMInfo from Test VM > - merge from master, manual merge for VectorLogicalOpIdentityTest.java > - Response to Tobias' review > - ... and 61 more: https://git.openjdk.org/jdk/compare/509f80bb...48fa52ba test/hotspot/jtreg/compiler/lib/ir_framework/README.md line 90: > 88: ``` > 89: > 90: However, the size does not have to be specified.
In most cases, one either wants to have vectorization at the maximal possible vector width, or no vectorization at all. Hence, for lower bound counts ('>' or '>=') the default size is `IRNode.VECTOR_SIZE_MAX`, and for upper bound counts ('<' or '<=' or '=0' or failOn) the default is `IRNode.VECTOR_SIZE_ANY`. Equal count comparisons with a strictly positive count (e.g. '=2') are not allowed for vector nodes. On machines with 'canTrustVectorSize == false' (cascade lake) the maximal vector width is not predictable currently. Hence, on such a machine we have to automatically weaken the IR rules. All lower bound counts are performed checking with `IRNode.VECTOR_SIZE_ANY`. Upper bound counts with no user specified size are performed with `IRNode.VECTOR_SIZE_ANY` but upper bound counts with a user specified size are not checked at all. Details and reasoning can be found in [RawIRNode](./driver/irmatching/irrule/checkattribute/parsing/RawIRNode.java). Suggestion: However, the size does not have to be specified. In most cases, one either wants to have vectorization at the maximal possible vector width, or no vectorization at all. Hence, for lower bound counts ('>' or '>=') the default size is `IRNode.VECTOR_SIZE_MAX`, and for upper bound counts ('<' or '<=' or '=0' or failOn) the default is `IRNode.VECTOR_SIZE_ANY`. Equal count comparisons with a strictly positive count (e.g. '=2') are not allowed for vector nodes. On machines with 'canTrustVectorSize == false' (Cascade Lake) the maximal vector width is not predictable currently. Hence, on such a machine we have to automatically weaken the IR rules. All lower bound counts are performed checking with `IRNode.VECTOR_SIZE_ANY`. Upper bound counts with no user specified size are performed with `IRNode.VECTOR_SIZE_ANY` but upper bound counts with a user specified size are not checked at all. Details and reasoning can be found in [RawIRNode](./driver/irmatching/irrule/checkattribute/parsing/RawIRNode.java).
Same for other occurrences. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/checkattribute/parsing/RawIRNode.java line 110: > 108: // If we have a size specified but cannot trust the size, and must check an upper > 109: // bound, this can be impossible to count correctly - if we have an incorrect size > 110: // we may count either too many nodes. We just create a impossible regex which will Suggestion: // we may count either too many nodes. We just create an impossible regex which will test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java line 76: > 74: } > 75: case "=" -> { > 76: // "=0" is same as setting upper bound - just like for failOn. But i we compare equals a Suggestion: // "=0" is same as setting upper bound - just like for failOn. But if we compare equals a test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java line 77: > 75: case "=" -> { > 76: // "=0" is same as setting upper bound - just like for failOn. But i we compare equals a > 77: // strictly positive number it is like setting both and upper and lower bound (equal). Suggestion: // strictly positive number it is like setting both upper and lower bound (equal). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1288279325 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1288278270 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1288283599 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1288283843 From epeter at openjdk.org Mon Aug 14 11:17:29 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 14 Aug 2023 11:17:29 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v30] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. 
> > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. 
`@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allow... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Fix comments according to TobiHartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/ebeb5898..ffd5ed81 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=28-29 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From chagedorn at openjdk.org Mon Aug 14 12:55:58 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 14 Aug 2023 12:55:58 GMT Subject: RFR: 8313760: [REDO] Enhance AES performance Message-ID: This reapplies JDK-8308682 (i.e. reverse the backout done with [JDK-8313756](https://bugs.openjdk.org/browse/JDK-8313756)) but attributes it correctly to @theRealAph together with @adinn and @sviswa7 as additional reviewers. The redo applied cleanly. 
Thanks, Christian ------------- Commit messages: - 8313760: [REDO] Enhance AES performance Changes: https://git.openjdk.org/jdk/pull/15267/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15267&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8313760 Stats: 107 lines in 7 files changed: 70 ins; 1 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/15267.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15267/head:pull/15267 PR: https://git.openjdk.org/jdk/pull/15267 From epeter at openjdk.org Mon Aug 14 13:10:59 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 14 Aug 2023 13:10:59 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v6] In-Reply-To: References: Message-ID: On Mon, 14 Aug 2023 07:42:07 GMT, Fei Gao wrote: >> Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like: >> >> >> match(Set dst (FmaF src3 (Binary (NegF src1) src2))); >> match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); >> >> >> Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules. >> >> Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. >> >> After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k. >> >> The patch passed all tier 1 - 3 on aarch64 and x86 platforms. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: > > - Improve comments and add assertions > - Merge branch 'master' into fg8308340 > - Merge branch 'master' into fg8308340 > - Merge branch 'master' into fg8308340 > - Merge branch 'master' into fg8308340 > - Move check for UseFMA from c2compiler.cpp to Matcher::match_rule_supported in .ad files > - Merge branch 'master' into fg8308340 > - 8308340: C2: Idealize Fma nodes > > Some platforms, like aarch64, ppc, and riscv, support fusing > `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating > partially symmetric match rules like: > > ``` > match(Set dst (FmaF src3 (Binary (NegF src1) src2))); > match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); > ``` > > Since `Fma` is partially commutative, the patch is to convert > `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, > making node patterns canonical. Then we can remove redundant > rules. > > Also, we should guarantee that C2 generates `Fma` nodes only on > platforms supporting `Fma` instructions before matcher, so we > can remove all `predicate(UseFMA)` for all `Fma` rules. > > After the patch, the code size of libjvm.so on aarch64 platform > decreased by 63.4k. > > The patch passed all tier 1 - 3 on aarch64 and x86 platforms. @fg1417 fair enough. The checks may be too complex. I'll approve it now :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14576#pullrequestreview-1576783457 From chagedorn at openjdk.org Mon Aug 14 13:57:29 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 14 Aug 2023 13:57:29 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v30] In-Reply-To: References: Message-ID: On Mon, 14 Aug 2023 11:17:29 GMT, Emanuel Peter wrote: >> For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework.
>> >> **Motivation** >> I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. >> >> **How to use it** >> >> All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: >> >> `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` >> which would match with IR nodes dumped like that: >> `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` >> >> The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. >> >> Some examples: >> 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. >> 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. >> 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. >> 4. 
`@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). >> 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Fix comments according to TobiHartmann I have only some minor comments left, otherwise, the update looks good to me! test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 113: > 111: public static final String VECTOR_SIZE_TAG_ANY = "any"; > 112: public static final String VECTOR_SIZE_TAG_MAX = "max_for_type"; > 113: public static final String VECTOR_SIZE_ANY = VECTOR_SIZE + VECTOR_SIZE_TAG_ANY; // default for count "=0" and failOn Suggestion: public static final String VECTOR_SIZE_ANY = VECTOR_SIZE + VECTOR_SIZE_TAG_ANY; // default for counts "=0" and failOn test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 114: > 112: public static final String VECTOR_SIZE_TAG_MAX = "max_for_type"; > 113: public static final String VECTOR_SIZE_ANY = VECTOR_SIZE + VECTOR_SIZE_TAG_ANY; // default for count "=0" and failOn > 114: public static final String VECTOR_SIZE_MAX = VECTOR_SIZE + VECTOR_SIZE_TAG_MAX; // default in count Suggestion: public static final String VECTOR_SIZE_MAX = VECTOR_SIZE + VECTOR_SIZE_TAG_MAX; // default in counts test/hotspot/jtreg/compiler/lib/ir_framework/driver/SuccessOnlyConstraintException.java line 28: > 26: /** > 27: * Exception used to signal that the Constraint should always suceed. 
> 28: */ Suggestion: import compiler.lib.ir_framework.driver.irmatching.irrule.constraint.Constraint; /** * Exception used to signal that a {@link Constraint} should always succeed. */ test/hotspot/jtreg/compiler/lib/ir_framework/driver/SuccessOnlyConstraintException.java line 31: > 29: public class SuccessOnlyConstraintException extends RuntimeException { > 30: public SuccessOnlyConstraintException(String message) { > 31: super("Unhandled SuccessOnlyConstraintException, should have created a Constraint that alway suceeds:" + System.lineSeparator() + message); Suggestion: super("Unhandled SuccessOnlyConstraintException, should have created a Constraint that always succeeds:" + System.lineSeparator() + message); test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/checkattribute/parsing/RawIRNode.java line 92: > 90: if (!userPostfix.isValid() || !vmInfo.canTrustVectorSize()) { > 91: switch (bound) { > 92: case Comparison.Bound.LOWER -> { Suggestion: case LOWER -> { test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/checkattribute/parsing/RawIRNode.java line 105: > 103: } > 104: } > 105: case Comparison.Bound.UPPER -> { Suggestion: case UPPER -> { test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/checkattribute/parsing/RawIRNode.java line 120: > 118: } > 119: } > 120: case Comparison.Bound.EQUAL -> { Suggestion: case EQUAL -> { test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/checkattribute/parsing/RawIRNode.java line 127: > 125: // Equal comparison to a strictly positive number would lead us to an impossible > 126: // situation: we might have to know the exact vector size or else we count too many > 127: // or too few cases. Because of this we forbid this case in general. The comment is outdated and could, for example, be updated to: Suggestion: // or too few cases. We therefore skip such a constraint and treat it as success. 
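A minimal sketch of the default-size rules described in the README text quoted in this thread, written as standalone Java so it runs without the IR framework. The class `VectorSizeDefaults`, its enums, and both methods are hypothetical illustrations, not the framework's API. The `canTrustVectorSize` parameter only models what `VMInfo.canTrustVectorSize()` reports, and the sketch follows the README wording that '=N' with N > 0 is rejected for vector nodes.

```java
// Hypothetical model of the default vector-size rules from the IR framework README.
// SizeSpec, Bound, boundFor and resolve are illustrative names, not the framework's API.
final class VectorSizeDefaults {
    /** Which vector size a count rule ends up being checked against. */
    enum SizeSpec { MAX, ANY, USER_SPECIFIED, NOT_CHECKED }

    /** Whether a count constraint bounds the number of matched nodes from below or above. */
    enum Bound { LOWER, UPPER }

    /** Maps a count comparison (">", ">=", "<", "<=", "=") and its value to a bound. */
    static Bound boundFor(String comparison, int count) {
        switch (comparison) {
            case ">":
            case ">=":
                return Bound.LOWER;
            case "<":
            case "<=":
                return Bound.UPPER;
            case "=":
                // "=0" behaves like failOn, i.e. an upper bound.
                if (count == 0) {
                    return Bound.UPPER;
                }
                throw new IllegalArgumentException("'=" + count + "' is not allowed for vector nodes");
            default:
                throw new IllegalArgumentException("unknown comparison: " + comparison);
        }
    }

    /**
     * Picks the size a rule is checked against. A failOn rule is passed in as Bound.UPPER.
     * canTrustVectorSize models VMInfo.canTrustVectorSize() (false on Cascade Lake).
     */
    static SizeSpec resolve(Bound bound, boolean userSpecifiedSize, boolean canTrustVectorSize) {
        if (canTrustVectorSize) {
            if (userSpecifiedSize) {
                return SizeSpec.USER_SPECIFIED;
            }
            return bound == Bound.LOWER ? SizeSpec.MAX : SizeSpec.ANY;
        }
        // Weakened rules: the maximal vector width is not predictable on this machine.
        if (bound == Bound.LOWER) {
            return SizeSpec.ANY;
        }
        return userSpecifiedSize ? SizeSpec.NOT_CHECKED : SizeSpec.ANY;
    }

    public static void main(String[] args) {
        // Like @IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "}) on a machine whose size can be trusted.
        System.out.println(resolve(boundFor(">", 0), false, true)); // MAX
        // A failOn rule on a machine whose size cannot be trusted.
        System.out.println(resolve(Bound.UPPER, false, false)); // ANY
    }
}
```

The design choice here is that a lower bound asks "did we vectorize at full width?", so it defaults to the maximal size, while an upper bound asks "is there any such vector at all?", so it defaults to any size.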
test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/SuccessConstraintCheck.java line 28: > 26: import compiler.lib.ir_framework.IR; > 27: import compiler.lib.ir_framework.driver.irmatching.MatchResult; > 28: import compiler.lib.ir_framework.shared.Comparison; Suggestion: import compiler.lib.ir_framework.driver.irmatching.MatchResult; test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/SuccessConstraintCheck.java line 38: > 36: */ > 37: class SuccessConstraintCheck implements ConstraintCheck { > 38: public SuccessConstraintCheck() {} This default constructor can be removed since it's implicitly added for us. Suggestion: test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/parser/VMInfo.java line 67: > 65: System.err.println(" -> SuperWord expected to run with 32 byte, not 64 byte, VectorAPI expected to use 64 byte"); > 66: System.err.println(" -> \"canTrustVectorSize == false\", some vector node IR rules are made weaker."); > 67: } Is this a leftover from debugging? If you want to print this information for debugging purposes, I suggest to move this code to `VMInfoParser` and additionally guard it with `VERBOSE || PRINT_IR_ENCODING`. The name `PRINT_IR_ENCODING` is not completely correct here but we might want to clean this up separately at some other point in time. You can keep the verification of calling the `get*()` methods here, though. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/parser/VMInfo.java line 108: > 106: * Some platforms do not behave as expected, and one cannot trust that the vectors > 107: * make use of the full MaxVectorSize. For Cascade Lake we by default only use > 108: * 32 bytes for SuperWord even though MaxVectorSize is 64. But the VectorAPI still Suggestion: For Cascade Lake, we only use 32 bytes for SuperWord by default even though MaxVectorSize is 64. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14539#pullrequestreview-1569244364 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1293347002 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1293347135 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1293359212 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1293361113 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1293452244 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1293452500 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1293452691 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1293455080 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1293363923 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1293457878 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1293468601 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1293475850 From chagedorn at openjdk.org Mon Aug 14 13:57:58 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 14 Aug 2023 13:57:58 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v26] In-Reply-To: References: Message-ID: On Tue, 8 Aug 2023 17:20:01 GMT, Emanuel Peter wrote: >> For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. >> >> **Motivation** >> I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. 
>> >> **How to use it** >> >> All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: >> >> `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` >> which would match with IR nodes dumped like that: >> `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` >> >> The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. >> >> Some examples: >> 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. >> 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. >> 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. >> 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). >> 5. 
`@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 71 commits: > > - manual merge from master > - duplicate rules in VectorLogicalOpIdentityTest.java > - Merge branch 'master' into JDK-8310308 > - Duplicated =1 counts for vector nodes in compiler/vectorapi/reshape/tests/TestVectorCast.java > - Merge branch 'master' into JDK-8310308 > - Fix with canTrustVectorSize for Cascade Lake > - TestSpillTheBeans.java > - print VMInfo from Test VM > - merge from master, manual merge for VectorLogicalOpIdentityTest.java > - Response to Tobias' review > - ... and 61 more: https://git.openjdk.org/jdk/compare/509f80bb...48fa52ba test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 108: > 106: private static final String LOAD_OF_CLASS_POSTFIX = "(:|\\+)\\S* \\*" + END; > 107: > 108: public static final String IMPOSSIBLE_NODE_REGEX = "impossible_node_regex"; Maybe add additional `#` to be on the safe side to never accidentally match it: Suggestion: public static final String IMPOSSIBLE_NODE_REGEX = "#impossible_node_regex#"; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1288283334 From epeter at openjdk.org Mon Aug 14 14:12:29 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 14 Aug 2023 14:12:29 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v26] In-Reply-To: References: Message-ID: On Wed, 9 Aug 2023 10:31:57 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 71 commits: >> >> - manual merge from master >> - duplicate rules in VectorLogicalOpIdentityTest.java >> - Merge branch 'master' into JDK-8310308 >> - Duplicated =1 counts for vector nodes in compiler/vectorapi/reshape/tests/TestVectorCast.java >> - Merge branch 'master' into JDK-8310308 >> - Fix with canTrustVectorSize for Cascade Lake >> - TestSpillTheBeans.java >> - print VMInfo from Test VM >> - merge from master, manual merge for VectorLogicalOpIdentityTest.java >> - Response to Tobias' review >> - ... and 61 more: https://git.openjdk.org/jdk/compare/509f80bb...48fa52ba > > test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 108: > >> 106: private static final String LOAD_OF_CLASS_POSTFIX = "(:|\\+)\\S* \\*" + END; >> 107: >> 108: public static final String IMPOSSIBLE_NODE_REGEX = "impossible_node_regex"; > > Maybe add additional `#` to be on the safe side to never accidentally match it: > Suggestion: > > public static final String IMPOSSIBLE_NODE_REGEX = "#impossible_node_regex#"; Line is now removed, not required any more. 
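The type-and-size matching discussed in this thread works on the `#vector...[N]:{type}` suffix of C2's IR dump lines. The sketch below extracts those fields from a dump line like the `VectorCastF2X` example in the PR description. The class `VectorDumpLine` and its pattern are hypothetical and written for illustration only; they are not the regexes that `IRNode.java` actually builds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the IR framework.
final class VectorDumpLine {
    // <id> <NodeName> === ... #vector<kind>[<element count>]:{<element type>}
    private static final Pattern LINE = Pattern.compile(
            "^\\s*\\d+\\s+(\\w+)\\s+===.*#vector[a-z]\\[(\\d+)\\]:\\{(\\w+)\\}");

    final String node;
    final int size;
    final String type;

    private VectorDumpLine(String node, int size, String type) {
        this.node = node;
        this.size = size;
        this.type = type;
    }

    /** Returns the parsed fields, or null if the line is not a vector node dump. */
    static VectorDumpLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new VectorDumpLine(m.group(1), Integer.parseInt(m.group(2)), m.group(3));
    }

    /** Checks node name and element type; expectedSize < 0 stands for "any size". */
    static boolean matches(String line, String node, String type, int expectedSize) {
        VectorDumpLine d = parse(line);
        return d != null
                && d.node.equals(node)
                && d.type.equals(type)
                && (expectedSize < 0 || d.size == expectedSize);
    }

    public static void main(String[] args) {
        String line = "1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int}";
        System.out.println(matches(line, "VectorCastF2X", "int", 8)); // true
        System.out.println(matches(line, "VectorCastF2X", "int", 4)); // false
    }
}
```

Passing a negative size plays the role of "any size" (the default for upper bounds), while passing the platform's maximal element count plays the role of the "max" default for lower bounds.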
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1293500774 From epeter at openjdk.org Mon Aug 14 14:12:34 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 14 Aug 2023 14:12:34 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v30] In-Reply-To: References: Message-ID: On Mon, 14 Aug 2023 13:35:33 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comments according to TobiHartmann > > test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/parser/VMInfo.java line 67: > >> 65: System.err.println(" -> SuperWord expected to run with 32 byte, not 64 byte, VectorAPI expected to use 64 byte"); >> 66: System.err.println(" -> \"canTrustVectorSize == false\", some vector node IR rules are made weaker."); >> 67: } > > Is this a leftover from debugging? If you want to print this information for debugging purposes, I suggest to move this code to `VMInfoParser` and additionally guard it with `VERBOSE || PRINT_IR_ENCODING`. The name `PRINT_IR_ENCODING` is not completely correct here but we might want to clean this up separately at some other point in time. > > You can keep the verification of calling the `get*()` methods here, though. I can just remove it. It is not necessary any more I think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1293515743 From epeter at openjdk.org Mon Aug 14 14:35:58 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 14 Aug 2023 14:35:58 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v31] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. 
> > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. 
`@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allow... Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - more review suggestions from christian - Apply suggestions from code review (christian) Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/ffd5ed81..ca0b576f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=29-30 Stats: 27 lines in 5 files changed: 2 ins; 15 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From fgao at openjdk.org Mon Aug 14 14:42:30 2023 From: fgao at openjdk.org (Fei Gao) Date: Mon, 14 Aug 2023 14:42:30 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v3] In-Reply-To: References: Message-ID: On Thu, 20 Jul 2023 09:42:36 GMT, Fei Yang wrote: >> Looks good to me. >> You need second review. > >> Thanks for your review @vnkozlov . >> >> I would appreciate it very much if some expert on ppc or riscv could help review it! Perhaps @RealFYang @reinrich > > Hello, the RISC-V part looks fine from what this PR is supposed to do. And this has passed tier1-3 tests on linux-riscv64 platform. Note that I didn't check the shared code changes. 
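The Fma idealization relies on `a * b + c` being equal to `b * a + c`. IEEE 754 multiplication is commutative, so `Math.fma(-a, b, c)` and `Math.fma(b, -a, c)` produce the same result. The toy expression tree below (hypothetical records, not C2's `Node` classes; Java 16+ for records) illustrates rewriting `Fma(Neg(a), b, c)` into `Fma(b, Neg(a), c)`, so that a backend only has to match a negated input in one operand position. Leaving the node unchanged when both multiplicands are negated is a choice made for this sketch, not something stated in the patch.

```java
// Toy model for illustration (Java 16+ for records); not C2's node classes.
final class FmaCanonicalize {
    interface Expr {}

    record Var(String name) implements Expr {}

    record Neg(Expr x) implements Expr {}

    /** Represents a * b + c computed with a single rounding, like Math.fma. */
    record Fma(Expr a, Expr b, Expr c) implements Expr {}

    /** Moves a negation from the first multiplicand to the second one. */
    static Fma idealize(Fma f) {
        if (f.a() instanceof Neg && !(f.b() instanceof Neg)) {
            // Swapping the multiplicands does not change a * b + c.
            return new Fma(f.b(), f.a(), f.c());
        }
        return f;
    }

    public static void main(String[] args) {
        Expr a = new Var("a"), b = new Var("b"), c = new Var("c");
        System.out.println(idealize(new Fma(new Neg(a), b, c)));
        // Both operand orders give the same numeric result.
        System.out.println(Math.fma(-2.0, 3.0, 1.0) == Math.fma(3.0, -2.0, 1.0)); // true
    }
}
```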
Thanks a lot for all your kind reviews and test work, @RealFYang @vnkozlov @eme64 @reinrich. I'll integrate it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14576#issuecomment-1677401754 From iklam at openjdk.org Mon Aug 14 15:54:29 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 14 Aug 2023 15:54:29 GMT Subject: RFR: 8314078: HotSpotConstantPool.lookupField() asserts due to field changes in ConstantPool.cpp In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 12:06:36 GMT, Doug Simon wrote: >> This PR updates Java code in JVMCI to match the C code changes in [JDK-8301996](https://bugs.openjdk.java.net/browse/JDK-8301996) that modified the constant pool layout. Essentially, the indices after a getfield/putfield/getstatic/putstatic bytecode is no longer a CpCacheIndex, but an index for `ConstantPool::resolved_field_entry_at(int field_index)`. >> >> The assertion (and subsequent crash) happen only in debug builds. Out of pure luck, in product build JVMCI produces the correct output even after [JDK-8301996](https://bugs.openjdk.java.net/browse/JDK-8301996), although the code was doing the wrong thing. >> >> This PR has (so far) two commits: >> >> - 6527e67b1832087d180d2b50b65aaeca2e244c28 The actual functional change to use the `rawIndex` that follows a field bytecode. >> - c322b8e71d4d9e33bd065e64420101010f9127fc Fixed incorrectly named parameters and variables in the JVMCI code and JavaDoc. In most cases, `cpi` needs to be changed to `rawIndex` to reflect the correct type of the index. >> >> To help reviewing, I am limiting the renaming to just those affected by the field changes (without the renames, it's hard to validate that I am actually doing the right thing). >> >> There are still some cases of `cpi` that need to be changed to `rawIndex`. I will fix those in a separate RFE. E.g. 
in ConstantPool.java: >> >> >> default JavaMethod lookupMethod(int cpi, int opcode) { >> return lookupMethod(cpi, opcode, null); >> } > > Thanks for doing this Ioi. > > In this PR or the follow-up renaming RFE, could you please add a "decoder ring" comment to the javadoc for ConstantPool. An incomplete example: > > * The following terminology is used when indexing a constant pool entry: > *
> * <ul>
> * <li>rawIndex - index in the bytecode stream after the opcode (could be rewritten for some bytecodes)</li>
> * <li>cpi - the class file constant pool index</li>
> * <li>cpci - a constant pool cache index</li>
> * </ul>
Thanks @dougxc and @coleenp for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15237#issuecomment-1677571227 From iklam at openjdk.org Mon Aug 14 15:55:29 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 14 Aug 2023 15:55:29 GMT Subject: Integrated: 8314078: HotSpotConstantPool.lookupField() asserts due to field changes in ConstantPool.cpp In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 01:15:01 GMT, Ioi Lam wrote: > This PR updates Java code in JVMCI to match the C code changes in [JDK-8301996](https://bugs.openjdk.java.net/browse/JDK-8301996) that modified the constant pool layout. Essentially, the indices after a getfield/putfield/getstatic/putstatic bytecode is no longer a CpCacheIndex, but an index for `ConstantPool::resolved_field_entry_at(int field_index)`. > > The assertion (and subsequent crash) happen only in debug builds. Out of pure luck, in product build JVMCI produces the correct output even after [JDK-8301996](https://bugs.openjdk.java.net/browse/JDK-8301996), although the code was doing the wrong thing. > > This PR has (so far) two commits: > > - 6527e67b1832087d180d2b50b65aaeca2e244c28 The actual functional change to use the `rawIndex` that follows a field bytecode. > - c322b8e71d4d9e33bd065e64420101010f9127fc Fixed incorrectly named parameters and variables in the JVMCI code and JavaDoc. In most cases, `cpi` needs to be changed to `rawIndex` to reflect the correct type of the index. > > To help reviewing, I am limiting the renaming to just those affected by the field changes (without the renames, it's hard to validate that I am actually doing the right thing). > > There are still some cases of `cpi` that need to be changed to `rawIndex`. I will fix those in a separate RFE. E.g. in ConstantPool.java: > > > default JavaMethod lookupMethod(int cpi, int opcode) { > return lookupMethod(cpi, opcode, null); > } This pull request has now been integrated. 
Changeset: 911d1dbb Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/911d1dbbf7362693c736b905b42e5150fc4f8a96 Stats: 131 lines in 5 files changed: 70 ins; 8 del; 53 mod 8314078: HotSpotConstantPool.lookupField() asserts due to field changes in ConstantPool.cpp Reviewed-by: dnsimon, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/15237 From jbhateja at openjdk.org Mon Aug 14 17:55:30 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 14 Aug 2023 17:55:30 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 22:29:37 GMT, Srinivas Vamsi Parasa wrote: >>> Also need to handle arraySort in file: share/gc/shenandoah/c2/shenandoahSupport.cpp, function: ShenandoahBarrierC2Support::verify around line 3000. >> >> Updated the code in ShenandoahBarrierC2Support as suggested. > >> @vamsi-parasa With fastdebug build I see the following error: Internal Error (jdk/src/hotspot/share/opto/escape.cpp:1196), pid=3543536, tid=3543559 fatal error: EA unexpected CallLeaf arraysort_stub >> >> Please take a look. > > This was fixed as well. Hi @vamsi-parasa, if there are limitations to supporting this on Windows, kindly open a follow-up PR and add its link here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1677768179 From vlivanov at openjdk.org Mon Aug 14 18:41:47 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 14 Aug 2023 18:41:47 GMT Subject: RFR: 8313406: nep_invoker_blob can be simplified more In-Reply-To: References: Message-ID: On Mon, 31 Jul 2023 12:22:00 GMT, Yasumasa Suenaga wrote: > In FFM, native functions are called via `nep_invoker_blob`.
If the function has two arguments, the stub looks like this: > > > Decoding RuntimeStub - nep_invoker_blob 0x00007fcae394cd10 > -------------------------------------------------------------------------------- > 0x00007fcae394cd80: pushq %rbp > 0x00007fcae394cd81: movq %rsp, %rbp > 0x00007fcae394cd84: subq $0, %rsp > ;; { argument shuffle > 0x00007fcae394cd88: movq %r8, %rax > 0x00007fcae394cd8b: movq %rsi, %r10 > 0x00007fcae394cd8e: movq %rcx, %rsi > 0x00007fcae394cd91: movq %rdx, %rdi > ;; } argument shuffle > 0x00007fcae394cd94: callq *%r10 > 0x00007fcae394cd97: leave > 0x00007fcae394cd98: retq > > > `subq $0, %rsp` reserves shadow space on the stack, and `movq %r8, %rax` sets the argument count for a variadic function. Neither is necessary in some cases. They should be removed when not needed, as follows: > > > Decoding RuntimeStub - nep_invoker_blob 0x00007fd8778e2810 > -------------------------------------------------------------------------------- > 0x00007fd8778e2880: pushq %rbp > 0x00007fd8778e2881: movq %rsp, %rbp > ;; { argument shuffle > 0x00007fd8778e2884: movq %rsi, %r10 > 0x00007fd8778e2887: movq %rcx, %rsi > 0x00007fd8778e288a: movq %rdx, %rdi > ;; } argument shuffle > 0x00007fd8778e288d: callq *%r10 > 0x00007fd8778e2890: leave > 0x00007fd8778e2891: retq > > > All java/foreign jtreg tests pass. > > This stub code can be seen in the [ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/examples/cpumodel) with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintStubCode` and the hsdis library. This testcase links the code with `Linker.Option.isTrivial()`. > > After this change, FFM performance on [another ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/benchmarks/funccall) improved: > > before: > > Benchmark Mode Cnt Score Error Units > FuncCallComparison.invokeFFMRDTSC thrpt 3 106664071.816 ± 14396524.718 ops/s > FuncCallComparison.rdtsc thrpt 3 108024079.738 ±
13223921.011 ops/s > > > after: > > Benchmark Mode Cnt Score Error Units > FuncCallComparison.invokeFFMRDTSC thrpt 3 107622971.525 ± 12249767.134 ops/s > FuncCallComparison.rdtsc thrpt 3 107695741.608 ± 23983281.346 ops/s > > > Environment: > * CPU: AMD Ryzen 3 3300X > * OS: Fedora 38 x86_64 (Kernel 6.3.8-200.fc38.x86_64) > * Hyper-V 4vCPU, 8GB mem Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15089#pullrequestreview-1577434056 From xxinliu at amazon.com Mon Aug 14 19:21:07 2023 From: xxinliu at amazon.com (Liu, Xin) Date: Mon, 14 Aug 2023 19:21:07 +0000 Subject: Update on PEA in C2 (Episode 5) Message-ID: <72541D9E-9683-4CF6-9150-711696CDABE2@amazon.com> Hi, I would like to give an update on what we have done in the past month. Previously, we mentioned that we planned to materialize objects using DFS. We have completed this, and it fixed 7 jtreg failures caused by object composition. We expected it to fix the performance issue of CounterUppercase.java too. Unfortunately, it didn't pan out; we will cover that later. We set up a custom workflow using GitHub Actions (GHA) to transparently track the status of the PEA_beta branch. The workflow consists of 3 major stages: 1. Build a fastdebug binary on linux-x64. 2. Run smoke tests and CTW the 'java.base' module. 3. Run tier1 tests, broken down into 3 concurrent tasks: hotspot-tier1, jdk-tier1 and langtools-tier1. Since we last reported, we have fixed all regressions in hotspot:tier1. Besides the object composition issue, we also discovered the following issues: 1. It's possible that we have to materialize an object whose monitor counter is unbalanced, e.g. https://github.com/navyxliu/jdk/blob/PEA_beta/PEA/MatInMonitor.java#L6 We work around this case by marking the object Escaped at the MonitorEnter bytecode. 2. Some intrinsics have memory side effects, for instance Unsafe.compareAndSetReference().
We materialize all object references for all intrinsics as if they were non-inlined function calls. (This is too conservative: Object::hashCode() has no side effect on 'this'. We will loosen this constraint.) 3. PredictedCallGenerator introduces an if-else construct based on speculation. PEA materialization may take place in either branch, so we need to merge the allocation state for that. Summarizing the issues above, we found a third source of bugs besides inter-procedural parsing and deoptimization: ideal nodes that do not come directly from bytecode parsing. Since we embed PEA in C2's parser, we depend on Parse to capture Java object semantics. Item 2 above bypasses the bytecodes of intrinsics, so we fail to capture their side effects, i.e. escaping points. The if-else ideal nodes of item 3 do not come from bytecodes either; an invokevirtual or invokeinterface generates them because of UseTypeProfile. We need to pay more attention to this area in the next round of bug hunting. Because hotspot:tier1 is clean, we will focus on the remaining tier1 jtreg tests. In the latest run (https://github.com/navyxliu/jdk/actions/runs/5827349911), we still have 9 failures in jdk:tier1 and 109 failures in langtools:tier1. It looks like C2 PEA has problems dealing with method handles. We have also started running DaCapo and encountered exceptions in h2 and lusearch/luindex, which we are looking into. I would like to continue the discussion of CounterUppercase.java. We still suffer from the duplicated-allocation issue after deploying DFS materialization. The problem is that we punt object elimination to the C2 optimizer, and it turns out the C2 optimizer can't easily eliminate a cyclic object graph (https://bugs.openjdk.org/browse/JDK-8314179). By doing DFS materialization in Uppercase.java, we leave behind a useless object graph just like the one in the attachment we uploaded. Is there a simple solution for this case? We have two ideas on the C2 PEA side. The first one is a workaround and could be a short-term remedy.
When we detect that we have ended up with a redundant allocation (-XX:+PEAParanoid), we recompile the current compilation unit without PEA. 'C2Compiler::compile_method' already has a retry mechanism for a few reasons, e.g. subsume_loads. The second idea is to bring back passive materialization and take responsibility for eliminating the original AllocateNode, which is what Graal PEA does. We have proved that the original object is either scalar replaceable or useless, so we will mark it in Parse and process it in the macro-expansion phase. Of course, we are happy to work together with other developers on JDK-8314179. Leaving the heavy lifting to the C2 optimizer is desirable because it keeps C2 PEA simpler. The biggest challenge I have so far is capturing the failures of the JIT compiler. It's like trying to catch a rare Pokémon that is very likely to escape. I have to spend a lot of time upfront finding a reproducer and then gradually pinpointing the problematic method. I understand that this is an inherent issue of a dynamic compiler. I am told that there's an ongoing project that uses C2 as a PGO static compiler. I wonder if it's possible to somehow convert a dynamic compilation into a static compilation. If I trained C2 to hit the bug in AOT mode, I guess I could bisect all compilation units and find the culprit quicker. Thanks, --lx From btaylor at openjdk.org Mon Aug 14 22:53:32 2023 From: btaylor at openjdk.org (Ben Taylor) Date: Mon, 14 Aug 2023 22:53:32 GMT Subject: Integrated: 8312597: Convert TraceTypeProfile to UL In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 20:44:36 GMT, Ben Taylor wrote: > This PR adds the output from `-XX:+TraceTypeProfile` to the `jit` and `inlining` tags in unified logging. It also adds minimal tests for `-XX:+TraceTypeProfile` and `-Xlog:jit*=debug`. > > The change passes tier1 tests. This pull request has now been integrated.
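Returning to the PEA update from Xin Liu above, DFS materialization of composed objects can be sketched with a small Java model. This is a hypothetical illustration, not C2's implementation: `DfsMaterializer`, `VirtualObject`, and the string "instructions" it emits are invented for the example. Each virtual object is allocated at most once, its reference fields are visited depth-first, and a field store is emitted only after its target exists; the `allocated` set is what lets cycles terminate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of materializing a graph of virtual (not yet allocated) objects.
final class DfsMaterializer {
    static final class VirtualObject {
        final String name;
        final Map<String, VirtualObject> refFields = new LinkedHashMap<>(); // field name -> target

        VirtualObject(String name) { this.name = name; }
    }

    // Returns the emitted "instructions" in program order.
    static List<String> materialize(VirtualObject root) {
        List<String> emitted = new ArrayList<>();
        Set<VirtualObject> allocated = Collections.newSetFromMap(new IdentityHashMap<>());
        visit(root, allocated, emitted);
        return emitted;
    }

    private static void visit(VirtualObject obj, Set<VirtualObject> allocated, List<String> emitted) {
        if (!allocated.add(obj)) {
            return; // already allocated (or in progress): reuse it, which also breaks cycles
        }
        emitted.add("allocate " + obj.name);
        for (Map.Entry<String, VirtualObject> field : obj.refFields.entrySet()) {
            VirtualObject target = field.getValue();
            visit(target, allocated, emitted); // make sure the target exists first
            emitted.add("store " + obj.name + "." + field.getKey() + " = " + target.name);
        }
    }

    public static void main(String[] args) {
        VirtualObject a = new VirtualObject("A");
        VirtualObject b = new VirtualObject("B");
        a.refFields.put("next", b);
        b.refFields.put("next", a); // cycle A -> B -> A
        materialize(a).forEach(System.out::println);
    }
}
```

For the cycle A -> B -> A the model emits `allocate A`, `allocate B`, `store B.next = A`, `store A.next = B`. If nothing else uses A or B afterwards, each allocation still has one user (the other object's store), which is one plausible reading of why a cyclic graph like this is hard for a use-count-driven cleanup to remove (JDK-8314179).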
Changeset: 0074b48a Author: Ben Taylor Committer: Paul Hohensee URL: https://git.openjdk.org/jdk/commit/0074b48ad77d68ece8633a165aaba7f42bb52c5d Stats: 93 lines in 3 files changed: 86 ins; 0 del; 7 mod 8312597: Convert TraceTypeProfile to UL Reviewed-by: shade, phh ------------- PR: https://git.openjdk.org/jdk/pull/15167 From ysuenaga at openjdk.org Mon Aug 14 23:17:17 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Mon, 14 Aug 2023 23:17:17 GMT Subject: Integrated: 8313406: nep_invoker_blob can be simplified more In-Reply-To: References: Message-ID: On Mon, 31 Jul 2023 12:22:00 GMT, Yasumasa Suenaga wrote: > In FFM, native functions are called via `nep_invoker_blob`. If the function has two arguments, the stub looks like this: > > > Decoding RuntimeStub - nep_invoker_blob 0x00007fcae394cd10 > -------------------------------------------------------------------------------- > 0x00007fcae394cd80: pushq %rbp > 0x00007fcae394cd81: movq %rsp, %rbp > 0x00007fcae394cd84: subq $0, %rsp > ;; { argument shuffle > 0x00007fcae394cd88: movq %r8, %rax > 0x00007fcae394cd8b: movq %rsi, %r10 > 0x00007fcae394cd8e: movq %rcx, %rsi > 0x00007fcae394cd91: movq %rdx, %rdi > ;; } argument shuffle > 0x00007fcae394cd94: callq *%r10 > 0x00007fcae394cd97: leave > 0x00007fcae394cd98: retq > > > `subq $0, %rsp` reserves shadow space on the stack, and `movq %r8, %rax` sets the argument count for a variadic function. Neither is necessary in some cases.
They should be removed when not needed, as follows: > > > Decoding RuntimeStub - nep_invoker_blob 0x00007fd8778e2810 > -------------------------------------------------------------------------------- > 0x00007fd8778e2880: pushq %rbp > 0x00007fd8778e2881: movq %rsp, %rbp > ;; { argument shuffle > 0x00007fd8778e2884: movq %rsi, %r10 > 0x00007fd8778e2887: movq %rcx, %rsi > 0x00007fd8778e288a: movq %rdx, %rdi > ;; } argument shuffle > 0x00007fd8778e288d: callq *%r10 > 0x00007fd8778e2890: leave > 0x00007fd8778e2891: retq > > > All java/foreign jtreg tests pass. > > This stub code can be seen in the [ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/examples/cpumodel) with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintStubCode` and the hsdis library. This testcase links the code with `Linker.Option.isTrivial()`. > > After this change, FFM performance on [another ffmasm testcase](https://github.com/YaSuenag/ffmasm/tree/ef7a466ca9607164dbe7be7e68ea509d4bdac998/benchmarks/funccall) improved: > > before: > > Benchmark Mode Cnt Score Error Units > FuncCallComparison.invokeFFMRDTSC thrpt 3 106664071.816 ± 14396524.718 ops/s > FuncCallComparison.rdtsc thrpt 3 108024079.738 ± 13223921.011 ops/s > > > after: > > Benchmark Mode Cnt Score Error Units > FuncCallComparison.invokeFFMRDTSC thrpt 3 107622971.525 ± 12249767.134 ops/s > FuncCallComparison.rdtsc thrpt 3 107695741.608 ± 23983281.346 ops/s > > > Environment: > * CPU: AMD Ryzen 3 3300X > * OS: Fedora 38 x86_64 (Kernel 6.3.8-200.fc38.x86_64) > * Hyper-V 4vCPU, 8GB mem This pull request has now been integrated.
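The change in 8313406 boils down to emitting two prologue instructions only when they are needed. The Java model below sketches that decision; it is hypothetical, not the HotSpot stub generator, and `InvokerPrologueModel`, `shadowSpaceBytes`, and `isVariadic` are illustrative names. The shadow-space condition reflects that Windows x64 requires 32 bytes of shadow space while System V does not; the `%rax` condition reflects the System V AMD64 rule that `%al` carries an upper bound on the number of vector registers used by a variadic call. The register the stub actually copies from is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the invoker stub prologue; not HotSpot code.
final class InvokerPrologueModel {
    static List<String> prologue(int shadowSpaceBytes, boolean isVariadic) {
        List<String> code = new ArrayList<>();
        code.add("pushq %rbp");
        code.add("movq %rsp, %rbp");
        if (shadowSpaceBytes > 0) {
            // Reserve stack space only when the calling convention asks for it.
            code.add("subq $" + shadowSpaceBytes + ", %rsp");
        }
        if (isVariadic) {
            // Set %rax (vector-register count) only for variadic callees.
            code.add("movq <count>, %rax");
        }
        return code;
    }

    public static void main(String[] args) {
        prologue(0, false).forEach(System.out::println); // frame setup only
    }
}
```

For a non-variadic call on Linux, as in the two-argument example from the PR, the model produces only the frame setup, which matches the shorter stub listing.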
Changeset: 583cb754 Author: Yasumasa Suenaga URL: https://git.openjdk.org/jdk/commit/583cb754f38f5d32144e302ce5e82a3b36a2cb78 Stats: 41 lines in 3 files changed: 4 ins; 11 del; 26 mod 8313406: nep_invoker_blob can be simplified more Reviewed-by: jvernee, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/15089 From dholmes at openjdk.org Mon Aug 14 23:18:18 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 14 Aug 2023 23:18:18 GMT Subject: RFR: 8312597: Convert TraceTypeProfile to UL In-Reply-To: References: Message-ID: <_6p2U1JTmujpCHMCAsOFK7bdfmvdPcsvgVolT5Hew7c=.9379bb0b-4ac7-4c16-bf63-77599c00496c@github.com> On Fri, 4 Aug 2023 20:44:36 GMT, Ben Taylor wrote: > This PR adds the output from `-XX:+TraceTypeProfile` to the `jit` and `inlining` tags in unified logging. It also adds minimal tests for `-XX:+TraceTypeProfile` and `-Xlog:jit*=debug`. > > Change passes tier1 tests. The new test files are causing a build failure in our CI due to an issue with the copyright notice and/or GPL header. Trying to ascertain exactly what it doesn't like ... ------------- PR Comment: https://git.openjdk.org/jdk/pull/15167#issuecomment-1678210131 From dholmes at openjdk.org Mon Aug 14 23:29:19 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 14 Aug 2023 23:29:19 GMT Subject: RFR: 8312597: Convert TraceTypeProfile to UL In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 20:44:36 GMT, Ben Taylor wrote: > This PR adds the output from `-XX:+TraceTypeProfile` to the `jit` and `inlining` tags in unified logging. It also adds minimal tests for `-XX:+TraceTypeProfile` and `-Xlog:jit*=debug`. > > Change passes tier1 tests. IANAL but IIUC it is perfectly fine to have an Amazon only copyright notice, but the rest of the GPL header must be what Oracle has defined i.e. Oracle designates the CPE not Amazon, and the Oracle contact information must be present. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15167#issuecomment-1678220004 From sviswanathan at openjdk.org Mon Aug 14 23:50:09 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 14 Aug 2023 23:50:09 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 22:29:37 GMT, Srinivas Vamsi Parasa wrote: >>> Also need to handle arraySort in file: share/gc/shenandoah/c2/shenandoahSupport.cpp, function: ShenandoahBarrierC2Support::verify around line 3000. >> >> Updated the code in ShenandoahBarrierC2Support as suggested. > >> @vamsi-parasa With fastdebug build I see the following error: Internal Error (jdk/src/hotspot/share/opto/escape.cpp:1196), pid=3543536, tid=3543559 fatal error: EA unexpected CallLeaf arraysort_stub >> >> Please take a look. > > This was fixed as well. @vamsi-parasa We need to preserve NaNs. The base (https://github.com/intel/x86-simd-sort) algorithm used doesn't preserve NaNs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1678237261 From dholmes at openjdk.org Tue Aug 15 00:51:25 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 15 Aug 2023 00:51:25 GMT Subject: RFR: 8314244: Incorrect file headers in new tests from JDK-8312597 Message-ID: Trivial fix: - Test files don't need/use the Classpath Exception. - The Oracle contact details were missing. These headers now match other Amazon-only copyrighted test files. Thanks. 
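On the NaN point raised for the AVX512 `Arrays.sort` intrinsic (PR 14227) above: the `Arrays.sort(double[])` Javadoc specifies the total order of `Double.compareTo`, so `-0.0` sorts before `0.0`, NaN sorts after every other value, and all NaNs are treated as equal. Whatever the exact gap in the x86-simd-sort base, an intrinsic has to reproduce that observable result. The check below (the class name is illustrative) runs those edge cases through the existing library sort; it says nothing about the intrinsic's internals.

```java
import java.util.Arrays;

// Checks the documented Arrays.sort(double[]) ordering for -0.0, 0.0 and NaN.
final class SortOrderCheck {
    static double[] sortedCopy(double... values) {
        double[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        double[] result = sortedCopy(3.0, Double.NaN, 0.0, -0.0, 1.0);
        System.out.println(Arrays.toString(result)); // [-0.0, 0.0, 1.0, 3.0, NaN]
    }
}
```

A natural use is as a differential check: sort the same inputs along both the intrinsic and non-intrinsic paths and compare the results with `Arrays.equals(double[], double[])`, which compares elements via `Double.doubleToLongBits` and therefore distinguishes `-0.0` from `0.0`.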
------------- Commit messages: - 8314244: Incorrect file headers in new tests from JDK-8312597 Changes: https://git.openjdk.org/jdk/pull/15282/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15282&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314244 Stats: 16 lines in 2 files changed: 8 ins; 6 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15282.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15282/head:pull/15282 PR: https://git.openjdk.org/jdk/pull/15282 From fgao at openjdk.org Tue Aug 15 01:07:25 2023 From: fgao at openjdk.org (Fei Gao) Date: Tue, 15 Aug 2023 01:07:25 GMT Subject: Integrated: 8308340: C2: Idealize Fma nodes In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 03:26:38 GMT, Fei Gao wrote: > Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like: > > > match(Set dst (FmaF src3 (Binary (NegF src1) src2))); > match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); > > > Since `Fma` is partially commutative, the patch converts `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in the GVN phase, making node patterns canonical. Then we can remove the redundant rules. > > Also, we should guarantee that, before the matcher, C2 generates `Fma` nodes only on platforms supporting `Fma` instructions, so we can remove `predicate(UseFMA)` from all `Fma` rules. > > After the patch, the code size of libjvm.so on the aarch64 platform decreased by 63.4k. > > The patch passed tier1 - tier3 on aarch64 and x86 platforms. This pull request has now been integrated.
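The canonicalization in 8308340 relies on `Math.fma(-a, b, c)` and `Math.fma(b, -a, c)` always producing the same result. Both compute the exact value of (-a)·b + c with a single rounding, and IEEE 754 multiplication is commutative, including the sign of zero and the infinity/NaN cases. The Java check below (a sketch; the class and method names are illustrative) compares the two forms bit for bit over a few edge cases. It verifies only the arithmetic identity, not anything about C2's GVN pass.

```java
// Checks that the operand swap used to canonicalize Fma nodes preserves results.
final class FmaSwapCheck {
    static boolean sameDouble(double a, double b, double c) {
        double original = Math.fma(-a, b, c);
        double swapped = Math.fma(b, -a, c);
        // doubleToLongBits distinguishes -0.0 from 0.0 and treats all NaNs as one value.
        return Double.doubleToLongBits(original) == Double.doubleToLongBits(swapped);
    }

    static boolean sameFloat(float a, float b, float c) {
        float original = Math.fma(-a, b, c);
        float swapped = Math.fma(b, -a, c);
        return Float.floatToIntBits(original) == Float.floatToIntBits(swapped);
    }

    public static void main(String[] args) {
        double[][] cases = {
            {1.5, 2.0, 0.25},
            {0.0, 1.0, 0.0},                      // signed-zero handling
            {-0.0, 1.0, -0.0},
            {1e308, 10.0, -1e308},                // overflowing product
            {Double.POSITIVE_INFINITY, 0.0, 1.0}, // NaN result
            {Double.NaN, 2.0, 3.0},
        };
        for (double[] t : cases) {
            System.out.println(sameDouble(t[0], t[1], t[2])); // true for every case
        }
    }
}
```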
Changeset: 37c6b23f Author: Fei Gao URL: https://git.openjdk.org/jdk/commit/37c6b23f5b82311c82f5fe981f104824f87e3e54 Stats: 689 lines in 20 files changed: 469 ins; 118 del; 102 mod 8308340: C2: Idealize Fma nodes Reviewed-by: kvn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/14576 From iklam at openjdk.org Tue Aug 15 03:56:21 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 15 Aug 2023 03:56:21 GMT Subject: RFR: 8314248: Remove HotSpotConstantPool::isResolvedDynamicInvoke Message-ID: This method is not used and its implementation is wrong. It should be removed. ------------- Commit messages: - 8314248: Remove HotSpotConstantPool::isResolvedDynamicInvoke Changes: https://git.openjdk.org/jdk/pull/15283/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15283&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314248 Stats: 20 lines in 1 file changed: 0 ins; 20 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15283/head:pull/15283 PR: https://git.openjdk.org/jdk/pull/15283 From lmesnik at openjdk.org Tue Aug 15 04:05:07 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 15 Aug 2023 04:05:07 GMT Subject: RFR: 8314244: Incorrect file headers in new tests from JDK-8312597 In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 00:44:36 GMT, David Holmes wrote: > Trivial fix: > > - Test files don't need/use the Classpath Exception. > - The Oracle contact details were missing. > > These headers now match other Amazon-only copyrighted test files. > > Thanks. Marked as reviewed by lmesnik (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/15282#pullrequestreview-1577937335 From kvn at openjdk.org Tue Aug 15 04:32:13 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Aug 2023 04:32:13 GMT Subject: RFR: 8314244: Incorrect file headers in new tests from JDK-8312597 In-Reply-To: References: Message-ID: <8IvDFllDs0eVpem4m5Waz5sQAmPB3VdQaL3SMzM6kY4=.0cf39bb0-5385-4cec-91c5-043a54a3884b@github.com> On Tue, 15 Aug 2023 00:44:36 GMT, David Holmes wrote: > Trivial fix: > > - Test files don't need/use the Classpath Exception. > - The Oracle contact details were missing. > > These headers now match other Amazon-only copyrighted test files. > > Thanks. Good ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15282#pullrequestreview-1577950107 From dholmes at openjdk.org Tue Aug 15 04:32:14 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 15 Aug 2023 04:32:14 GMT Subject: RFR: 8314244: Incorrect file headers in new tests from JDK-8312597 In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 04:02:00 GMT, Leonid Mesnik wrote: >> Trivial fix: >> >> - Test files don't need/use the Classpath Exception. >> - The Oracle contact details were missing. >> >> These headers now match other Amazon-only copyrighted test files. >> >> Thanks. > > Marked as reviewed by lmesnik (Reviewer). Thanks @lmesnik ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15282#issuecomment-1678412750 From dholmes at openjdk.org Tue Aug 15 04:32:15 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 15 Aug 2023 04:32:15 GMT Subject: Integrated: 8314244: Incorrect file headers in new tests from JDK-8312597 In-Reply-To: References: Message-ID: <5c3dBLI9IMrFDOSiRTecW9Etmokp8CK4-Xj1Xyoh8Wg=.77c77fa6-28f7-48b6-b894-198b5a00daa7@github.com> On Tue, 15 Aug 2023 00:44:36 GMT, David Holmes wrote: > Trivial fix: > > - Test files don't need/use the Classpath Exception. > - The Oracle contact details were missing. 
> > These headers now match other Amazon-only copyrighted test files. > > Thanks. This pull request has now been integrated. Changeset: b7dee213 Author: David Holmes URL: https://git.openjdk.org/jdk/commit/b7dee213dfb2d0ec4e22837898bf4837c1fe523d Stats: 16 lines in 2 files changed: 8 ins; 6 del; 2 mod 8314244: Incorrect file headers in new tests from JDK-8312597 Reviewed-by: lmesnik, kvn ------------- PR: https://git.openjdk.org/jdk/pull/15282 From thartmann at openjdk.org Tue Aug 15 06:14:15 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Aug 2023 06:14:15 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v31] In-Reply-To: References: Message-ID: On Mon, 14 Aug 2023 14:35:58 GMT, Emanuel Peter wrote: >> For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. >> >> **Motivation** >> I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. >> >> **How to use it** >> >> All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: >> >> `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` >> which would match with IR nodes dumped like that: >> `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` >> >> The goal was to keep it simple and straightforward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically.
However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. >> >> Some examples: >> 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. >> 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. >> 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. >> 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equal to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). >> 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` node with type `float`, and the `size` exactly equal to the smaller of... > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - more review suggestions from christian > - Apply suggestions from code review (christian) > > Co-authored-by: Christian Hagedorn Marked as reviewed by thartmann (Reviewer).
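The VectorCastF2X regex quoted in the PR description appears to have lost some of its backslash escapes in this archive: as printed, `[8]` is a character class matching only the digit 8, and `:{int}` would make `Pattern.compile` throw because `{int}` is not a valid repetition. The sketch below is an escaped rendering reconstructed here, not copied from `IRNode.java`, together with a check against a sample dump line. The sample assumes fields in the dump are separated by two spaces, as the `(\s){2}` parts of the pattern suggest.

```java
import java.util.regex.Pattern;

// Sketch: escaped form of the VectorCastF2X regex from the PR description.
final class VectorNodeRegexCheck {
    static final Pattern CAST_F2X_INT_8 = Pattern.compile(
        "(\\d+(\\s){2}(VectorCastF2X.*)+(\\s){2}===.*vector[A-Za-z]\\[8\\]:\\{int\\})");

    static boolean matches(String dumpLine) {
        return CAST_F2X_INT_8.matcher(dumpLine).find();
    }

    public static void main(String[] args) {
        // Assumed dump format with two-space separators.
        String line = " 1150  VectorCastF2X  === _ 1151  [[ 1146 ]]  #vectory[8]:{int}";
        System.out.println(matches(line));                       // true
        System.out.println(matches(line.replace("[8]", "[4]"))); // false: wrong size
    }
}
```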
------------- PR Review: https://git.openjdk.org/jdk/pull/14539#pullrequestreview-1578022947 From thartmann at openjdk.org Tue Aug 15 06:15:06 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Aug 2023 06:15:06 GMT Subject: RFR: 8314248: Remove HotSpotConstantPool::isResolvedDynamicInvoke In-Reply-To: References: Message-ID: <5eOVMCPWZ9NSz7zOq9OTh-D1MplPaSRDZ1_VS8LhVyQ=.8af3d336-4d3f-43a7-876f-3c92f1de9941@github.com> On Tue, 15 Aug 2023 03:48:43 GMT, Ioi Lam wrote: > This method is not used and its implementation is wrong. It should be removed. Looks good to me but JVMCI experts should also review this (@dougxc). ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15283#pullrequestreview-1578023819 From epeter at openjdk.org Tue Aug 15 06:32:35 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Aug 2023 06:32:35 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v32] In-Reply-To: References: Message-ID: <8EKARdkSJ46A9U64hG4KUPiINkGyvVUf9FmJ2Qf0xk4=.32740439-7a1b-46ff-bc0c-17c88bba272f@github.com> > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. 
The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straightforward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equal to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` node with type `float`, and the `size` exactly equal to the smaller of `LoopMaxUnroll` or the maximal size allow... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 81 commits: - merge manual - more review suggestions from christian - Apply suggestions from code review (christian) Co-authored-by: Christian Hagedorn - Fix comments according to TobiHartmann - Merge branch 'master' into JDK-8310308 - take out cascade lake simulation - fix copyright format - fix whitespace issues - fix TestSafepointWhilePrinting.java - enable equal count comparison, tighten default cascade lake special casing - ... and 71 more: https://git.openjdk.org/jdk/compare/b7dee213...184dff01 ------------- Changes: https://git.openjdk.org/jdk/pull/14539/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=31 Stats: 3519 lines in 70 files changed: 1451 ins; 21 del; 2047 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From dnsimon at openjdk.org Tue Aug 15 06:53:07 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 15 Aug 2023 06:53:07 GMT Subject: RFR: 8314248: Remove HotSpotConstantPool::isResolvedDynamicInvoke In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 03:48:43 GMT, Ioi Lam wrote: > This method is not used and its implementation is wrong. It should be removed. LGTM. I think this is a leftover from jaotc. ------------- Marked as reviewed by dnsimon (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15283#pullrequestreview-1578052639 From chagedorn at openjdk.org Tue Aug 15 06:54:17 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Aug 2023 06:54:17 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v32] In-Reply-To: <8EKARdkSJ46A9U64hG4KUPiINkGyvVUf9FmJ2Qf0xk4=.32740439-7a1b-46ff-bc0c-17c88bba272f@github.com> References: <8EKARdkSJ46A9U64hG4KUPiINkGyvVUf9FmJ2Qf0xk4=.32740439-7a1b-46ff-bc0c-17c88bba272f@github.com> Message-ID: On Tue, 15 Aug 2023 06:32:35 GMT, Emanuel Peter wrote: >> For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. >> >> **Motivation** >> I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. >> >> **How to use it** >> >> All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: >> >> `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` >> which would match with IR nodes dumped like that: >> `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` >> >> The goal was to keep it simple and straightforward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default.
The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. >> >> Some examples: >> 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. >> 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. >> 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. >> 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equal to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). >> 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` node with type `float`, and the `size` exactly equal to the smaller of... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 81 commits: > > - merge manual > - more review suggestions from christian > - Apply suggestions from code review (christian) > > Co-authored-by: Christian Hagedorn > - Fix comments according to TobiHartmann > - Merge branch 'master' into JDK-8310308 > - take out cascade lake simulation > - fix copyright format > - fix whitespace issues > - fix TestSafepointWhilePrinting.java > - enable equal count comparison, tighten default cascade lake special casing > - ... and 71 more: https://git.openjdk.org/jdk/compare/b7dee213...184dff01 Thanks for the updates, looks good! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14539#pullrequestreview-1578053414 From shade at openjdk.org Tue Aug 15 07:13:18 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 15 Aug 2023 07:13:18 GMT Subject: RFR: 8312597: Convert TraceTypeProfile to UL In-Reply-To: References: Message-ID: On Mon, 14 Aug 2023 23:26:35 GMT, David Holmes wrote: > IANAL but IIUC it is perfectly fine to have an Amazon only copyright notice, but the rest of the GPL header must be what Oracle has defined i.e. Oracle designates the CPE not Amazon, and the Oracle contact information must be present. D'oh. I completely missed the header text was changed. Yes, it should be the way you made it in https://github.com/openjdk/jdk/commit/b7dee213dfb2d0ec4e22837898bf4837c1fe523d. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15167#issuecomment-1678513085 From adinn at openjdk.org Tue Aug 15 07:16:07 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 15 Aug 2023 07:16:07 GMT Subject: RFR: 8313760: [REDO] Enhance AES performance In-Reply-To: References: Message-ID: On Mon, 14 Aug 2023 12:18:02 GMT, Christian Hagedorn wrote: > This reapplies JDK-8308682 (i.e. reverse the backout done with [JDK-8313756](https://bugs.openjdk.org/browse/JDK-8313756)) but attributes it correctly to @theRealAph together with @adinn and @sviswa7 as additional reviewers. > > The redo applied cleanly. > > Thanks, > Christian These changes look fine, thanks! 
------------- PR Review: https://git.openjdk.org/jdk/pull/15267#pullrequestreview-1578073281 From shade at openjdk.org Tue Aug 15 08:57:16 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 15 Aug 2023 08:57:16 GMT Subject: RFR: 8314078: HotSpotConstantPool.lookupField() asserts due to field changes in ConstantPool.cpp [v2] In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 20:28:58 GMT, Ioi Lam wrote: >> This PR updates Java code in JVMCI to match the C code changes in [JDK-8301996](https://bugs.openjdk.java.net/browse/JDK-8301996) that modified the constant pool layout. Essentially, the indices after a getfield/putfield/getstatic/putstatic bytecode is no longer a CpCacheIndex, but an index for `ConstantPool::resolved_field_entry_at(int field_index)`. >> >> The assertion (and subsequent crash) happen only in debug builds. Out of pure luck, in product build JVMCI produces the correct output even after [JDK-8301996](https://bugs.openjdk.java.net/browse/JDK-8301996), although the code was doing the wrong thing. >> >> This PR has (so far) two commits: >> >> - 6527e67b1832087d180d2b50b65aaeca2e244c28 The actual functional change to use the `rawIndex` that follows a field bytecode. >> - c322b8e71d4d9e33bd065e64420101010f9127fc Fixed incorrectly named parameters and variables in the JVMCI code and JavaDoc. In most cases, `cpi` needs to be changed to `rawIndex` to reflect the correct type of the index. >> >> To help reviewing, I am limiting the renaming to just those affected by the field changes (without the renames, it's hard to validate that I am actually doing the right thing). >> >> There are still some cases of `cpi` that need to be changed to `rawIndex`. I will fix those in a separate RFE. E.g. 
in ConstantPool.java: >> >> >> default JavaMethod lookupMethod(int cpi, int opcode) { >> return lookupMethod(cpi, opcode, null); >> } > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @dougxc review: Added comments about rawIndex vs cpi src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 783: > 781: @Override > 782: public JavaField lookupField(int rawIndex, ResolvedJavaMethod method, int opcode) { > 783: final int cpi = compilerToVM().decodeFieldIndexToCPIndex(this, rawIndex); I have the new warning here: `cpi` is not used. I guess it is correct, seeing how methods are accepting `rawIndex` now, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15237#discussion_r1294346394 From epeter at openjdk.org Tue Aug 15 10:09:18 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Aug 2023 10:09:18 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v31] In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 06:11:16 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - more review suggestions from christian >> - Apply suggestions from code review (christian) >> >> Co-authored-by: Christian Hagedorn > > Marked as reviewed by thartmann (Reviewer). Thanks @TobiHartmann @chhagedorn for all the discussions, help and reviews! 
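Returning briefly to the JVMCI review of 8314078 above: the rename from `cpi` to `rawIndex` reflects that, after JDK-8301996, the operand that follows a getfield/putfield/getstatic/putstatic bytecode selects a resolved field entry rather than a constant pool slot, so it has to be decoded before it can be used as a `cpi`. The toy model below illustrates only that distinction. `FieldIndexModel`, `ResolvedFieldEntry` and its `decodeFieldIndexToCPIndex` method are hypothetical stand-ins, not the JVMCI or HotSpot API; in the quoted diff the real `compilerToVM().decodeFieldIndexToCPIndex(this, rawIndex)` call plays the corresponding role.

```java
import java.util.List;

public class FieldIndexModel {
    // Hypothetical stand-in for a resolved field entry: it remembers which
    // constant pool slot (cpi) holds the Fieldref it was created from.
    record ResolvedFieldEntry(int cpIndex) {}

    private final String[] constantPool;                 // cpi -> symbolic Fieldref
    private final List<ResolvedFieldEntry> fieldEntries; // rawIndex -> entry

    FieldIndexModel(String[] constantPool, List<ResolvedFieldEntry> fieldEntries) {
        this.constantPool = constantPool;
        this.fieldEntries = fieldEntries;
    }

    // The bytecode operand (rawIndex) selects a resolved field entry; only
    // that entry knows the actual constant pool index.
    int decodeFieldIndexToCPIndex(int rawIndex) {
        return fieldEntries.get(rawIndex).cpIndex();
    }

    // Decode first, then read the constant pool.
    String lookupField(int rawIndex) {
        return constantPool[decodeFieldIndexToCPIndex(rawIndex)];
    }

    public static void main(String[] args) {
        String[] cp = {null, "Foo.a:I", "Foo.b:J", "Foo.c:Ljava/lang/String;"};
        FieldIndexModel m = new FieldIndexModel(cp,
                List.of(new ResolvedFieldEntry(3), new ResolvedFieldEntry(1)));
        System.out.println(m.lookupField(0)); // Foo.c:Ljava/lang/String;
        // Misreading raw index 0 as a cpi would hit an unrelated slot instead.
        System.out.println(cp[0]); // null
    }
}
```

In this model, a caller that treats the raw index as a `cpi` can appear to work whenever the two numberings happen to coincide, which is consistent with the PR's observation that product builds produced correct output "out of pure luck".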
------------- PR Comment: https://git.openjdk.org/jdk/pull/14539#issuecomment-1678704021 From epeter at openjdk.org Tue Aug 15 10:12:28 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Aug 2023 10:12:28 GMT Subject: Integrated: 8310308: IR Framework: check for type and size of vector nodes In-Reply-To: References: Message-ID: <4A4g9ei-SxvTBCAcDeUCLiMRcUtNs89Co8DO45y4Tko=.50e5195e-af55-48f7-83ca-1fd5be08fee7@github.com> On Mon, 19 Jun 2023 11:52:42 GMT, Emanuel Peter wrote: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. 
`@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allow... This pull request has now been integrated. Changeset: a02d65ef Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/a02d65efccaab5bb7c2f2aad4a2eb5062f545ef8 Stats: 3519 lines in 70 files changed: 1451 ins; 21 del; 2047 mod 8310308: IR Framework: check for type and size of vector nodes Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14539 From chagedorn at openjdk.org Tue Aug 15 10:18:08 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Aug 2023 10:18:08 GMT Subject: RFR: 8313760: [REDO] Enhance AES performance In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 07:13:33 GMT, Andrew Dinn wrote: >> This reapplies JDK-8308682 (i.e. reverse the backout done with [JDK-8313756](https://bugs.openjdk.org/browse/JDK-8313756)) but attributes it correctly to @theRealAph together with @adinn and @sviswa7 as additional reviewers. 
>> >> The redo applied cleanly. >> >> Thanks, >> Christian > > These changes look fine, thanks! Thanks @adinn! I think you additionally need to approve it since the bot does not accept only manually added reviewers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15267#issuecomment-1678712823 From aph at openjdk.org Tue Aug 15 11:26:10 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 15 Aug 2023 11:26:10 GMT Subject: RFR: 8313760: [REDO] Enhance AES performance In-Reply-To: References: Message-ID: <6hZXNmNF0JTrPoYwsCmMHA08KHBj6u0jOThieNrGYhM=.0736f11a-7d6d-446f-a2f0-c1292fb56f30@github.com> On Mon, 14 Aug 2023 12:18:02 GMT, Christian Hagedorn wrote: > This reapplies JDK-8308682 (i.e. reverse the backout done with [JDK-8313756](https://bugs.openjdk.org/browse/JDK-8313756)) but attributes it correctly to @theRealAph together with @adinn and @sviswa7 as additional reviewers. > > The redo applied cleanly. > > Thanks, > Christian Thank you. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15267#pullrequestreview-1578372895 From rehn at openjdk.org Tue Aug 15 11:56:22 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 15 Aug 2023 11:56:22 GMT Subject: RFR: 8314268: Missing include in assembler_riscv.hpp Message-ID: <_lmdoBPLSaog0QT4SOmn5yenC1G8aOJThkw5R-MY5Ho=.c2610b32-0c81-43c9-8686-676f3696f2b9@github.com> Hello, please consider. 
------------- Commit messages: - Added a missing include Changes: https://git.openjdk.org/jdk/pull/15285/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15285&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314268 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15285/head:pull/15285 PR: https://git.openjdk.org/jdk/pull/15285 From shade at openjdk.org Tue Aug 15 12:02:08 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 15 Aug 2023 12:02:08 GMT Subject: RFR: 8314268: Missing include in assembler_riscv.hpp In-Reply-To: <_lmdoBPLSaog0QT4SOmn5yenC1G8aOJThkw5R-MY5Ho=.c2610b32-0c81-43c9-8686-676f3696f2b9@github.com> References: <_lmdoBPLSaog0QT4SOmn5yenC1G8aOJThkw5R-MY5Ho=.c2610b32-0c81-43c9-8686-676f3696f2b9@github.com> Message-ID: <3e1bVB7YoHlguVels4vfYeKtqsAdxa5u0CZB5evngvM=.45d0337f-aa9d-4f12-802d-efe8a6065b0f@github.com> On Tue, 15 Aug 2023 11:50:17 GMT, Robbin Ehn wrote: > Hello, please consider. What's the symptom for the problem? Build failure in some unusual config? Or this is just a cleanliness ("use the symbol in the header, include the definition without relying on transitive includes")? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15285#issuecomment-1678813492 From chagedorn at openjdk.org Tue Aug 15 12:16:07 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Aug 2023 12:16:07 GMT Subject: RFR: 8313760: [REDO] Enhance AES performance In-Reply-To: <6hZXNmNF0JTrPoYwsCmMHA08KHBj6u0jOThieNrGYhM=.0736f11a-7d6d-446f-a2f0-c1292fb56f30@github.com> References: <6hZXNmNF0JTrPoYwsCmMHA08KHBj6u0jOThieNrGYhM=.0736f11a-7d6d-446f-a2f0-c1292fb56f30@github.com> Message-ID: <3ybDhfQAdvqz_6u19hKmkSHaFNpJI2xbsuTOkTk4Whc=.4990c920-183d-41b4-9b69-9c30049fa0df@github.com> On Tue, 15 Aug 2023 11:23:10 GMT, Andrew Haley wrote: >> This reapplies JDK-8308682 (i.e. 
reverse the backout done with [JDK-8313756](https://bugs.openjdk.org/browse/JDK-8313756)) but attributes it correctly to @theRealAph together with @adinn and @sviswa7 as additional reviewers. >> >> The redo applied cleanly. >> >> Thanks, >> Christian > > Thank you. Thanks @theRealAph for your review! For some reason, @adinn was removed again as reviewer by the bot. I'll add him again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15267#issuecomment-1678832314 From chagedorn at openjdk.org Tue Aug 15 12:19:08 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Aug 2023 12:19:08 GMT Subject: RFR: 8313760: [REDO] Enhance AES performance In-Reply-To: <3ybDhfQAdvqz_6u19hKmkSHaFNpJI2xbsuTOkTk4Whc=.4990c920-183d-41b4-9b69-9c30049fa0df@github.com> References: <6hZXNmNF0JTrPoYwsCmMHA08KHBj6u0jOThieNrGYhM=.0736f11a-7d6d-446f-a2f0-c1292fb56f30@github.com> <3ybDhfQAdvqz_6u19hKmkSHaFNpJI2xbsuTOkTk4Whc=.4990c920-183d-41b4-9b69-9c30049fa0df@github.com> Message-ID: On Tue, 15 Aug 2023 12:12:56 GMT, Christian Hagedorn wrote: >> Thank you. > > Thanks @theRealAph for your review! > > For some reason, @adinn was removed again as reviewer by the bot. I'll add him again. > @chhagedorn Reviewer `adinn` has already made an authenticated review of this PR, and does not need to be credited manually. @adinn I'm not sure why it says that but does not list you above. Maybe you need to explicitly approve the PR again to appear in the reviewer list again. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15267#issuecomment-1678835339 From kcr at openjdk.org Tue Aug 15 12:33:11 2023 From: kcr at openjdk.org (Kevin Rushforth) Date: Tue, 15 Aug 2023 12:33:11 GMT Subject: RFR: 8313760: [REDO] Enhance AES performance In-Reply-To: References: <6hZXNmNF0JTrPoYwsCmMHA08KHBj6u0jOThieNrGYhM=.0736f11a-7d6d-446f-a2f0-c1292fb56f30@github.com> <3ybDhfQAdvqz_6u19hKmkSHaFNpJI2xbsuTOkTk4Whc=.4990c920-183d-41b4-9b69-9c30049fa0df@github.com> Message-ID: On Tue, 15 Aug 2023 12:15:55 GMT, Christian Hagedorn wrote: >> Reviewer adinn has already made an authenticated review of this PR, and does not need to be credited manually. > > @adinn I'm not sure why it says that but does not list you above. Maybe you need to explicitly approve the PR again to appear in the reviewer list again. Yes, GitHub now sees @adinn as having reviewed, but without approving it, so Skara will not allow him to be added manually. If he reviews / approves, he will then be listed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15267#issuecomment-1678851603 From chagedorn at openjdk.org Tue Aug 15 12:37:08 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Aug 2023 12:37:08 GMT Subject: RFR: 8313760: [REDO] Enhance AES performance In-Reply-To: References: <6hZXNmNF0JTrPoYwsCmMHA08KHBj6u0jOThieNrGYhM=.0736f11a-7d6d-446f-a2f0-c1292fb56f30@github.com> <3ybDhfQAdvqz_6u19hKmkSHaFNpJI2xbsuTOkTk4Whc=.4990c920-183d-41b4-9b69-9c30049fa0df@github.com> Message-ID: On Tue, 15 Aug 2023 12:30:01 GMT, Kevin Rushforth wrote: >>> @chhagedorn Reviewer `adinn` has already made an authenticated review of this PR, and does not need to be credited manually. >> >> @adinn I'm not sure why it says that but does not list you above. Maybe you need to explicitly approve the PR again to appear in the reviewer list again. > >>> Reviewer adinn has already made an authenticated review of this PR, and does not need to be credited manually. 
>> >> @adinn I'm not sure why it says that but does not list you above. Maybe you need to explicitly approve the PR again to appear in the reviewer list again. > > Yes, GitHub now sees @adinn as having reviewed, but without approving it, so Skara will not allow him to be added manually. If he reviews / approves, he will then be listed. Thanks @kevinrushforth for the explanation! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15267#issuecomment-1678856431 From epeter at openjdk.org Tue Aug 15 12:58:09 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Aug 2023 12:58:09 GMT Subject: RFR: 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE In-Reply-To: References: <-McZdVKFHZcQcCJhosf7KVw34o6ZcAHr0hqGH7QIqsw=.eafa3a9e-f521-4097-aa6c-b00d5302b63d@github.com> Message-ID: On Fri, 4 Aug 2023 03:18:14 GMT, Pengfei Li wrote: >> @pfustc @TobiHartmann I just saw this on my emails. So I'll give a quick response: >> >> We had this running on Aarch64 machines with `asimd` but without `sve`. Why do you think that this even passed with my 32 byte assumption (256 bit)? You say it should only have 128 bit. >> >> What is the `max_pre` for? Is it necessary? >> >> Adding the script to the jdk repo could be nice. But I think there may be some issues with adding python files. We may have to rewrite it in java. But I think for now adding and updating it via JBS is ok. > >> We had this running on Aarch64 machines with asimd but without sve. Why do you think that this even passed with my 32 byte assumption (256 bit)? You say it should only have 128 bit. > > Assuming NEON has larger vector size (256 bit, which is wrong) won't result in any failure on NEON-only machines. But it results in running less IR checks on 256-bit SVE. Let's take below IR condition change as an example. 
> > - applyIfAnd = {"AlignVector", "false", "MaxVectorSize", ">= 8", "MaxVectorSize", "<= 16"}, > + applyIfAnd = {"AlignVector", "false", "MaxVectorSize", ">= 8"}, > > Before this patch, the presence of vector IR nodes is not checked on 256-bit SVE, because the rule requires `MaxVectorSize <= 16`. After this patch, it is checked. The main reason for the failures on 512-bit SVE is the missing `sve == false` check, which lets the IR tests run on machines with vector lengths > 256 bits. > >> What is the max_pre for? Is it necessary? > > It just adds a prefix to make the comment more precise, as SVE uses scalable vectors and the vector length ranges from 128 bits to 2048 bits. @pfustc you will have to merge the changes from [JDK-8310308](https://bugs.openjdk.org/browse/JDK-8310308). ------------- PR Comment: https://git.openjdk.org/jdk/pull/15010#issuecomment-1678887500 From epeter at openjdk.org Tue Aug 15 13:00:13 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Aug 2023 13:00:13 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v2] In-Reply-To: References: Message-ID: On Wed, 9 Aug 2023 09:15:28 GMT, Pengfei Li wrote: >> @vnkozlov @TobiHartmann we should re-run testing from our side. >> >> @pfustc Why do you only test correctness (compare results) in some conditions?
Is there not a risk that we miss doing it in some cases where we should, just because we get the conditions slightly wrong? >> >> Just FYI: we should integrate this whole correctness-of-results testing into the IR framework. I filed [JDK-8310533](https://bugs.openjdk.org/browse/JDK-8310533). That would make it easier to use for new tests. It could also be used for any test, not just the ones located in `test/hotspot/jtreg/compiler/vectorization`. > > Hi @eme64, > > Thanks for looking at this. > >> @pfustc Why do you only test correctness (compare results) in some conditions? Is there not a risk that we miss doing it in some cases where we should, just because we get the conditions slightly wrong? > > Yes, you are right! These conditions were added earlier to avoid jtreg hanging when compilation is locked. Now that the lock has been removed, I can remove them. In my latest commit, I have removed the conditions and some unused imports. > >> Just FYI: we should integrate this whole correctness-of-results testing into the IR framework. I filed [JDK-8310533](https://bugs.openjdk.org/browse/JDK-8310533). That would make it easier to use for new tests. It could also be used for any test, not just the ones located in test/hotspot/jtreg/compiler/vectorization. > > I had noticed this JBS issue before. The reason I didn't add the correctness check to the IR framework is that I implemented this kind of check before the IR framework existed. (We have used it internally for a few years.) But anyway, it is a good proposal and I'm willing to help if needed. @pfustc Since you are working on IR tests here, it would be good if you first merge the changes from [JDK-8310308](https://bugs.openjdk.org/browse/JDK-8310308) and test the IR rules again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15011#issuecomment-1678889705 From epeter at openjdk.org Tue Aug 15 13:28:15 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Aug 2023 13:28:15 GMT Subject: RFR: 8313720: C2 SuperWord: wrong result with -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally Message-ID: **Problem** In my recent fix of [JDK-8306302](https://bugs.openjdk.org/browse/JDK-8306302) I forgot to check that the `Bool` node in the `Cmp -> Bool -> CMove` complex must have the same test value for all `Bool` nodes in the pack.
Without that check, we fail to see the difference between: https://github.com/openjdk/jdk/blob/6d545b1580e0b3df9bc01bd64bd1a616c6ceeb9b/test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java#L354-L357 https://github.com/openjdk/jdk/blob/6d545b1580e0b3df9bc01bd64bd1a616c6ceeb9b/test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java#L384-L387 While the first hand-unrolled example has the same test value (lt) in both lines (so packing is OK), the second example has different test values (lt and le). Before this fix, we would simply assume that they have the same test value, and consequently also use lt for the second line. That can lead to wrong results. **Solution** `SuperWord::isomorphic` should return `false` if two `Bool` nodes do not have the same test value. That ensures that only `Bool` nodes with the same test value will ever be packed, since isomorphism is a requirement for packing. In addition, I added verification code in `SuperWord::output`, just before we turn the scalar `Cmp -> Bool -> CMove` nodes into vector nodes. **Testing** Added a regression test. Ran Tier1-6 + stress testing.
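The failure mode described for 8313720 can be reproduced outside the compiler with a plain-Java model. This is not C2 code. The enum `Test` is a hypothetical two-value stand-in for C2's `BoolTest`, and `isomorphic`, `scalar` and `packedWithFirstTest` are illustrative helpers. The model assumes a hand-unrolled pair of CMove-style selects where lane 0 uses `<` and lane 1 uses `<=`, and it compares the scalar semantics with a "packed" evaluation that reuses the first lane's test for every lane, which is what happened before the fix.

```java
import java.util.Arrays;

public class BoolPackSketch {
    // Hypothetical stand-in for a Bool node's test value (C2's BoolTest has more).
    enum Test {
        LT, LE;

        boolean eval(int a, int b) {
            return this == LT ? a < b : a <= b;
        }
    }

    // The rule from the fix: two Bool nodes are only isomorphic (and hence
    // packable) if they carry the same test value.
    static boolean isomorphic(Test t1, Test t2) {
        return t1 == t2;
    }

    // Scalar semantics: each lane uses its own test, e.g.
    // r[0] = a[0] <  b[0] ? c[0] : d[0]
    // r[1] = a[1] <= b[1] ? c[1] : d[1]
    static int[] scalar(int[] a, int[] b, int[] c, int[] d, Test[] tests) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = tests[i].eval(a[i], b[i]) ? c[i] : d[i];
        }
        return r;
    }

    // Buggy packing: every lane reuses the first lane's test, as if the
    // differing Bool nodes had been treated as isomorphic.
    static int[] packedWithFirstTest(int[] a, int[] b, int[] c, int[] d, Test[] tests) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = tests[0].eval(a[i], b[i]) ? c[i] : d[i];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {1, 2}, b = {1, 2}, c = {10, 20}, d = {-10, -20};
        Test[] tests = {Test.LT, Test.LE};
        System.out.println(isomorphic(tests[0], tests[1]));                           // false
        System.out.println(Arrays.toString(scalar(a, b, c, d, tests)));              // [-10, 20]
        System.out.println(Arrays.toString(packedWithFirstTest(a, b, c, d, tests))); // [-10, -20]
    }
}
```

The two results differ only in lanes where the inputs are equal, since `<` and `<=` disagree exactly there. This is why the bug can stay hidden unless a test feeds equal values, and why rejecting the pack in `isomorphic` is the appropriate place to stop it.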
------------- Commit messages: - fix after merge for vector size - Merge branch 'master' into JDK-8313720 - remove simple test - add IR tests - 8313720: C2 SuperWord: wrong result with -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally Changes: https://git.openjdk.org/jdk/pull/15274/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15274&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8313720 Stats: 173 lines in 2 files changed: 164 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/15274.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15274/head:pull/15274 PR: https://git.openjdk.org/jdk/pull/15274 From rehn at openjdk.org Tue Aug 15 13:31:10 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 15 Aug 2023 13:31:10 GMT Subject: RFR: 8314268: Missing include in assembler_riscv.hpp In-Reply-To: <_lmdoBPLSaog0QT4SOmn5yenC1G8aOJThkw5R-MY5Ho=.c2610b32-0c81-43c9-8686-676f3696f2b9@github.com> References: <_lmdoBPLSaog0QT4SOmn5yenC1G8aOJThkw5R-MY5Ho=.c2610b32-0c81-43c9-8686-676f3696f2b9@github.com> Message-ID: On Tue, 15 Aug 2023 11:50:17 GMT, Robbin Ehn wrote: > Hello, please consider. It came up in local WIP changes where I included assembler_riscv.hpp for Assembler::LMUL, and I noticed this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15285#issuecomment-1678936289 From thartmann at openjdk.org Tue Aug 15 13:45:07 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Aug 2023 13:45:07 GMT Subject: RFR: 8313720: C2 SuperWord: wrong result with -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally In-Reply-To: References: Message-ID: On Mon, 14 Aug 2023 14:57:39 GMT, Emanuel Peter wrote: > **Problem** > > In my recent fix of [JDK-8306302](https://bugs.openjdk.org/browse/JDK-8306302) I forgot to check that the `Bool` node in the `Cmp -> Bool -> CMove` complex must have the same test value for all `Bool` nodes in the pack.
Without that check, we fail to see the difference between: > > https://github.com/openjdk/jdk/blob/6d545b1580e0b3df9bc01bd64bd1a616c6ceeb9b/test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java#L354-L357 > > https://github.com/openjdk/jdk/blob/6d545b1580e0b3df9bc01bd64bd1a616c6ceeb9b/test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java#L384-L387 > > While the first hand-unrolled example has the same test value (tl) in both lines (packing ok) the second example has different test values (lt and le). Before this fix we would just assume they have the same test value, and therefore also use lt for the second line as a consequence. That can lead to wrong results. > > **Solution** > > `SuperWord::isomorphic` should return `false` if two `Bool` nodes do not have the same test value. That ensures that only `Bool` nodes with the same test value will ever be packed, since isomorphism is a requirement for packing. > > In addition, I also added verification code in `SuperWord::output`, just before we turn the `Cmp -> Bool -> CMove` scalar nodes into vector nodes. > > **Testing** > > Added Regression Test. Ran Tier1-6 + stress-testing. src/hotspot/share/opto/superword.cpp line 2660: > 2658: assert(p_bol != nullptr, "CMove must have matching Bool pack"); > 2659: > 2660: #ifndef PRODUCT I think this should be `#ifdef ASSERT`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15274#discussion_r1294613939 From chagedorn at openjdk.org Tue Aug 15 13:49:35 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Aug 2023 13:49:35 GMT Subject: RFR: 8314233: C2: assert(assertion_predicate_has_loop_opaque_node(iff)) failed: unexpected Message-ID: In the testcase, we try to initialize Assertion Predicates from the templates which have an `Opaque4` node. However, the code finds an unrelated `Opaque4` node added by an intrinsic. 
This, obviously, will not guard any `OpaqueLoop*` nodes required for Template Assertion Predicates and we fail with the assertion. While splitting some changes away from the main fix of [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981) for [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636), I wrongly removed the check in loop peeling if an `Opaque4` node belongs to an `If` that also shares the uncommon trap with the Parse Predicate (this is not required in the main fix anymore because Template Assertion Predicates will always only have `HaltNodes` - but optimizing this here is wrong/too early). The fix re-establishes the check for the uncommon trap. Thanks, Christian ------------- Commit messages: - Add new line at end of test - 8314233: C2: assert(assertion_predicate_has_loop_opaque_node(iff)) failed: unexpected Changes: https://git.openjdk.org/jdk/pull/15290/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15290&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314233 Stats: 69 lines in 2 files changed: 68 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15290.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15290/head:pull/15290 PR: https://git.openjdk.org/jdk/pull/15290 From thartmann at openjdk.org Tue Aug 15 13:55:07 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Aug 2023 13:55:07 GMT Subject: RFR: 8314233: C2: assert(assertion_predicate_has_loop_opaque_node(iff)) failed: unexpected In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 13:42:48 GMT, Christian Hagedorn wrote: > In the testcase, we try to initialize Assertion Predicates from the templates which have an `Opaque4` node. However, the code finds an unrelated `Opaque4` node added by an intrinsic. This, obviously, will not guard any `OpaqueLoop*` nodes required for Template Assertion Predicates and we fail with the assertion. 
> > While splitting some changes away from the main fix of [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981) for [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636), I wrongly removed the check in loop peeling if an `Opaque4` node belongs to an `If` that also shares the uncommon trap with the Parse Predicate (this is not required in the main fix anymore because Template Assertion Predicates will always only have `HaltNodes` - but optimizing this here is wrong/too early). > > The fix re-establishes the check for the uncommon trap. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15290#pullrequestreview-1578603607 From chagedorn at openjdk.org Tue Aug 15 13:55:08 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Aug 2023 13:55:08 GMT Subject: RFR: 8314233: C2: assert(assertion_predicate_has_loop_opaque_node(iff)) failed: unexpected In-Reply-To: References: Message-ID: <40CYn7a2i-hdBbTrC0s_k5l7qLHI-w3wwpqK17OqRZk=.d2737132-7824-4885-91b1-e355d751e464@github.com> On Tue, 15 Aug 2023 13:42:48 GMT, Christian Hagedorn wrote: > In the testcase, we try to initialize Assertion Predicates from the templates which have an `Opaque4` node. However, the code finds an unrelated `Opaque4` node added by an intrinsic. This, obviously, will not guard any `OpaqueLoop*` nodes required for Template Assertion Predicates and we fail with the assertion. > > While splitting some changes away from the main fix of [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981) for [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636), I wrongly removed the check in loop peeling if an `Opaque4` node belongs to an `If` that also shares the uncommon trap with the Parse Predicate (this is not required in the main fix anymore because Template Assertion Predicates will always only have `HaltNodes` - but optimizing this here is wrong/too early). 
> > The fix re-establishes the check for the uncommon trap. > > Thanks, > Christian Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15290#issuecomment-1678968040 From chagedorn at openjdk.org Tue Aug 15 14:02:08 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Aug 2023 14:02:08 GMT Subject: RFR: 8313720: C2 SuperWord: wrong result with -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally In-Reply-To: References: Message-ID: On Mon, 14 Aug 2023 14:57:39 GMT, Emanuel Peter wrote: > **Problem** > > In my recent fix of [JDK-8306302](https://bugs.openjdk.org/browse/JDK-8306302) I forgot to check that the `Bool` node in the `Cmp -> Bool -> CMove` complex must have the same test value for all `Bool` nodes in the pack. Without that check, we fail to see the difference between: > > https://github.com/openjdk/jdk/blob/6d545b1580e0b3df9bc01bd64bd1a616c6ceeb9b/test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java#L354-L357 > > https://github.com/openjdk/jdk/blob/6d545b1580e0b3df9bc01bd64bd1a616c6ceeb9b/test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java#L384-L387 > > While the first hand-unrolled example has the same test value (tl) in both lines (packing ok) the second example has different test values (lt and le). Before this fix we would just assume they have the same test value, and therefore also use lt for the second line as a consequence. That can lead to wrong results. > > **Solution** > > `SuperWord::isomorphic` should return `false` if two `Bool` nodes do not have the same test value. That ensures that only `Bool` nodes with the same test value will ever be packed, since isomorphism is a requirement for packing. > > In addition, I also added verification code in `SuperWord::output`, just before we turn the `Cmp -> Bool -> CMove` scalar nodes into vector nodes. > > **Testing** > > Added Regression Test. Ran Tier1-6 + stress-testing. 
Apart from Tobias' suggestion, the fix looks good! You could also add the bug number of this fix to the `@bug` in the JTreg test. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15274#pullrequestreview-1578619983 From epeter at openjdk.org Tue Aug 15 14:38:51 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 15 Aug 2023 14:38:51 GMT Subject: RFR: 8313720: C2 SuperWord: wrong result with -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally [v2] In-Reply-To: References: Message-ID: > **Problem** > > In my recent fix of [JDK-8306302](https://bugs.openjdk.org/browse/JDK-8306302) I forgot to check that the `Bool` node in the `Cmp -> Bool -> CMove` complex must have the same test value for all `Bool` nodes in the pack. Without that check, we fail to see the difference between: > > https://github.com/openjdk/jdk/blob/6d545b1580e0b3df9bc01bd64bd1a616c6ceeb9b/test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java#L354-L357 > > https://github.com/openjdk/jdk/blob/6d545b1580e0b3df9bc01bd64bd1a616c6ceeb9b/test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java#L384-L387 > > While the first hand-unrolled example has the same test value (tl) in both lines (packing ok) the second example has different test values (lt and le). Before this fix we would just assume they have the same test value, and therefore also use lt for the second line as a consequence. That can lead to wrong results. > > **Solution** > > `SuperWord::isomorphic` should return `false` if two `Bool` nodes do not have the same test value. That ensures that only `Bool` nodes with the same test value will ever be packed, since isomorphism is a requirement for packing. > > In addition, I also added verification code in `SuperWord::output`, just before we turn the `Cmp -> Bool -> CMove` scalar nodes into vector nodes. > > **Testing** > > Added Regression Test. Ran Tier1-6 + stress-testing. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: review suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15274/files - new: https://git.openjdk.org/jdk/pull/15274/files/a70951cf..a827c9fe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15274&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15274&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15274.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15274/head:pull/15274 PR: https://git.openjdk.org/jdk/pull/15274 From thartmann at openjdk.org Tue Aug 15 14:46:09 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 15 Aug 2023 14:46:09 GMT Subject: RFR: 8313720: C2 SuperWord: wrong result with -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally [v2] In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 14:38:51 GMT, Emanuel Peter wrote: >> **Problem** >> >> In my recent fix of [JDK-8306302](https://bugs.openjdk.org/browse/JDK-8306302) I forgot to check that the `Bool` node in the `Cmp -> Bool -> CMove` complex must have the same test value for all `Bool` nodes in the pack. Without that check, we fail to see the difference between: >> >> https://github.com/openjdk/jdk/blob/6d545b1580e0b3df9bc01bd64bd1a616c6ceeb9b/test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java#L354-L357 >> >> https://github.com/openjdk/jdk/blob/6d545b1580e0b3df9bc01bd64bd1a616c6ceeb9b/test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java#L384-L387 >> >> While the first hand-unrolled example has the same test value (lt) in both lines (packing ok) the second example has different test values (lt and le). Before this fix we would just assume they have the same test value, and therefore also use lt for the second line as a consequence. That can lead to wrong results.
>> >> **Solution** >> >> `SuperWord::isomorphic` should return `false` if two `Bool` nodes do not have the same test value. That ensures that only `Bool` nodes with the same test value will ever be packed, since isomorphism is a requirement for packing. >> >> In addition, I also added verification code in `SuperWord::output`, just before we turn the `Cmp -> Bool -> CMove` scalar nodes into vector nodes. >> >> **Testing** >> >> Added Regression Test. Ran Tier1-6 + stress-testing. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > review suggestions Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15274#pullrequestreview-1578709488 From chagedorn at openjdk.org Tue Aug 15 15:08:10 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 15 Aug 2023 15:08:10 GMT Subject: RFR: 8313720: C2 SuperWord: wrong result with -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally [v2] In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 14:38:51 GMT, Emanuel Peter wrote: >> **Problem** >> >> In my recent fix of [JDK-8306302](https://bugs.openjdk.org/browse/JDK-8306302) I forgot to check that the `Bool` node in the `Cmp -> Bool -> CMove` complex must have the same test value for all `Bool` nodes in the pack. Without that check, we fail to see the difference between: >> >> https://github.com/openjdk/jdk/blob/6d545b1580e0b3df9bc01bd64bd1a616c6ceeb9b/test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java#L354-L357 >> >> https://github.com/openjdk/jdk/blob/6d545b1580e0b3df9bc01bd64bd1a616c6ceeb9b/test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java#L384-L387 >> >> While the first hand-unrolled example has the same test value (lt) in both lines (packing ok) the second example has different test values (lt and le).
Before this fix we would just assume they have the same test value, and therefore also use lt for the second line as a consequence. That can lead to wrong results. >> >> **Solution** >> >> `SuperWord::isomorphic` should return `false` if two `Bool` nodes do not have the same test value. That ensures that only `Bool` nodes with the same test value will ever be packed, since isomorphism is a requirement for packing. >> >> In addition, I also added verification code in `SuperWord::output`, just before we turn the `Cmp -> Bool -> CMove` scalar nodes into vector nodes. >> >> **Testing** >> >> Added Regression Test. Ran Tier1-6 + stress-testing. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > review suggestions Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15274#pullrequestreview-1578755238 From iklam at openjdk.org Tue Aug 15 15:10:19 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 15 Aug 2023 15:10:19 GMT Subject: RFR: 8314078: HotSpotConstantPool.lookupField() asserts due to field changes in ConstantPool.cpp [v2] In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 08:54:05 GMT, Aleksey Shipilev wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @dougxc review: Added comments about rawIndex vs cpi > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 783: > >> 781: @Override >> 782: public JavaField lookupField(int rawIndex, ResolvedJavaMethod method, int opcode) { >> 783: final int cpi = compilerToVM().decodeFieldIndexToCPIndex(this, rawIndex); > > I have the new warning here: `cpi` is not used. I guess it is correct, seeing how methods are accepting `rawIndex` now, right? You're right. `cpi` is not used, as the next line uses `rawIndex` to obtain information about the field bytecode. 
final int nameAndTypeIndex = getNameAndTypeRefIndexAt(rawIndex, opcode); I'll remove the `cpi` line in my next JVMCI cleanup PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15237#discussion_r1294723660 From never at openjdk.org Tue Aug 15 15:47:15 2023 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 15 Aug 2023 15:47:15 GMT Subject: RFR: 8311557: [JVMCI] deadlock with JVMTI thread suspension [v4] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 17:26:28 GMT, Tom Rodriguez wrote: >> Java-based JVMCI compiler threads are more like normal Java threads so they aren't `hidden_from_external_view` like the native compilers. This can lead to deadlocks if you use JVMTI to suspend all threads since this will block the compiler queue and can block execution if background compilation is disabled. It's reasonable to treat libgraal threads like native threads in this regard. Making jargraal threads hidden too would interfere with using profiling and debugging tools on them, so I've left that alone, but it might be worth changing the JVMTI suspend and resume functions to explicitly skip compiler threads as well. > > Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: > > LibJVMCICompilerThreadHidden should just be EXPERIMENTAL Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14799#issuecomment-1679166520 From never at openjdk.org Tue Aug 15 15:47:17 2023 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 15 Aug 2023 15:47:17 GMT Subject: Integrated: 8311557: [JVMCI] deadlock with JVMTI thread suspension In-Reply-To: References: Message-ID: On Fri, 7 Jul 2023 06:13:21 GMT, Tom Rodriguez wrote: > Java-based JVMCI compiler threads are more like normal Java threads so they aren't `hidden_from_external_view` like the native compilers.
This can lead to deadlocks if you use JVMTI to suspend all threads since this will block the compiler queue and can block execution if background compilation is disabled. It's reasonable to treat libgraal threads like native threads in this regard. Making jargraal threads hidden too would interfere with using profiling and debugging tools on them, so I've left that alone, but it might be worth changing the JVMTI suspend and resume functions to explicitly skip compiler threads as well. This pull request has now been integrated. Changeset: 004651dd Author: Tom Rodriguez URL: https://git.openjdk.org/jdk/commit/004651ddc281be04ea736807797658d64a5a7337 Stats: 23 lines in 6 files changed: 17 ins; 0 del; 6 mod 8311557: [JVMCI] deadlock with JVMTI thread suspension Reviewed-by: thartmann, dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/14799 From iklam at openjdk.org Tue Aug 15 15:57:16 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 15 Aug 2023 15:57:16 GMT Subject: RFR: 8314248: Remove HotSpotConstantPool::isResolvedDynamicInvoke In-Reply-To: <5eOVMCPWZ9NSz7zOq9OTh-D1MplPaSRDZ1_VS8LhVyQ=.8af3d336-4d3f-43a7-876f-3c92f1de9941@github.com> References: <5eOVMCPWZ9NSz7zOq9OTh-D1MplPaSRDZ1_VS8LhVyQ=.8af3d336-4d3f-43a7-876f-3c92f1de9941@github.com> Message-ID: On Tue, 15 Aug 2023 06:12:35 GMT, Tobias Hartmann wrote: >> This method is not used and its implementation is wrong. It should be removed. > > Looks good to me but JVMCI experts should also review this (@dougxc). Thanks @TobiHartmann and @dougxc for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15283#issuecomment-1679180043 From iklam at openjdk.org Tue Aug 15 15:57:18 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 15 Aug 2023 15:57:18 GMT Subject: Integrated: 8314248: Remove HotSpotConstantPool::isResolvedDynamicInvoke In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 03:48:43 GMT, Ioi Lam wrote: > This method is not used and its implementation is wrong.
It should be removed. This pull request has now been integrated. Changeset: 80809ef4 Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/80809ef4ccdfd2ebfa9fd1eaf393d14e443dc760 Stats: 20 lines in 1 file changed: 0 ins; 20 del; 0 mod 8314248: Remove HotSpotConstantPool::isResolvedDynamicInvoke Reviewed-by: thartmann, dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/15283 From kvn at openjdk.org Tue Aug 15 16:01:07 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 15 Aug 2023 16:01:07 GMT Subject: RFR: 8314233: C2: assert(assertion_predicate_has_loop_opaque_node(iff)) failed: unexpected In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 13:42:48 GMT, Christian Hagedorn wrote: > In the testcase, we try to initialize Assertion Predicates from the templates which have an `Opaque4` node. However, the code finds an unrelated `Opaque4` node added by an intrinsic. This, obviously, will not guard any `OpaqueLoop*` nodes required for Template Assertion Predicates and we fail with the assertion. > > While splitting some changes away from the main fix of [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981) for [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636), I wrongly removed the check in loop peeling if an `Opaque4` node belongs to an `If` that also shares the uncommon trap with the Parse Predicate (this is not required in the main fix anymore because Template Assertion Predicates will always only have `HaltNodes` - but optimizing this here is wrong/too early). > > The fix re-establishes the check for the uncommon trap. > > Thanks, > Christian Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15290#pullrequestreview-1578915487 From shade at openjdk.org Tue Aug 15 16:25:08 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 15 Aug 2023 16:25:08 GMT Subject: RFR: 8314268: Missing include in assembler_riscv.hpp In-Reply-To: <_lmdoBPLSaog0QT4SOmn5yenC1G8aOJThkw5R-MY5Ho=.c2610b32-0c81-43c9-8686-676f3696f2b9@github.com> References: <_lmdoBPLSaog0QT4SOmn5yenC1G8aOJThkw5R-MY5Ho=.c2610b32-0c81-43c9-8686-676f3696f2b9@github.com> Message-ID: On Tue, 15 Aug 2023 11:50:17 GMT, Robbin Ehn wrote: > Hello, please consider. It is a bit odd to do this without the actual bug, but I guess it is fine to proactively maintain the includes of headers that define the symbols we use in header definitions. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15285#pullrequestreview-1578958919 From sviswanathan at openjdk.org Tue Aug 15 17:16:09 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 15 Aug 2023 17:16:09 GMT Subject: RFR: 8313760: [REDO] Enhance AES performance In-Reply-To: References: Message-ID: On Mon, 14 Aug 2023 12:18:02 GMT, Christian Hagedorn wrote: > This reapplies JDK-8308682 (i.e. reverse the backout done with [JDK-8313756](https://bugs.openjdk.org/browse/JDK-8313756)) but attributes it correctly to @theRealAph together with @adinn and @sviswa7 as additional reviewers. > > The redo applied cleanly. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15267#pullrequestreview-1579039518 From coleenp at openjdk.org Tue Aug 15 18:01:37 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 15 Aug 2023 18:01:37 GMT Subject: RFR: 8314247: JVMCI: expected int64_t but JavaThread::_held_monitor_count is of type intx Message-ID: Fix graal error with @iklam's fix of making intx and int64_t synonyms. Tested with new test in jvmci tests. 
------------- Commit messages: - 8314247: JVMCI: expected int64_t but JavaThread::_held_monitor_count is of type intx Changes: https://git.openjdk.org/jdk/pull/15295/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15295&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314247 Stats: 13 lines in 2 files changed: 10 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15295/head:pull/15295 PR: https://git.openjdk.org/jdk/pull/15295 From iklam at openjdk.org Tue Aug 15 18:17:09 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 15 Aug 2023 18:17:09 GMT Subject: RFR: 8314247: JVMCI: expected int64_t but JavaThread::_held_monitor_count is of type intx In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 17:54:57 GMT, Coleen Phillimore wrote: > Fix graal error with @iklam's fix of making intx and int64_t synonyms. Tested with new test in jvmci tests. LGTM ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15295#pullrequestreview-1579134378 From rhalade at openjdk.org Tue Aug 15 18:19:09 2023 From: rhalade at openjdk.org (Rajan Halade) Date: Tue, 15 Aug 2023 18:19:09 GMT Subject: RFR: 8313760: [REDO] Enhance AES performance In-Reply-To: References: Message-ID: On Mon, 14 Aug 2023 12:18:02 GMT, Christian Hagedorn wrote: > This reapplies JDK-8308682 (i.e. reverse the backout done with [JDK-8313756](https://bugs.openjdk.org/browse/JDK-8313756)) but attributes it correctly to @theRealAph together with @adinn and @sviswa7 as additional reviewers. > > The redo applied cleanly. > > Thanks, > Christian Marked as reviewed by rhalade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/15267#pullrequestreview-1579137715 From coleenp at openjdk.org Tue Aug 15 18:35:05 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 15 Aug 2023 18:35:05 GMT Subject: RFR: 8314247: JVMCI: expected int64_t but JavaThread::_held_monitor_count is of type intx In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 17:54:57 GMT, Coleen Phillimore wrote: > Fix graal error with @iklam's fix of making intx and int64_t synonyms. Tested with new test in jvmci tests. @dougxc Is this okay with you? Otherwise, I'm not really sure how you want to fix it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15295#issuecomment-1679409937 From duke at openjdk.org Tue Aug 15 19:17:48 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 15 Aug 2023 19:17:48 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v22] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>
> **Arrays.sort performance data using JMH benchmarks**
>
> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
> | --- | --- | --- | --- | --- |
> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x |
> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x |
> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** |
> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** |
> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x |
> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x |
> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** |
> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** |
> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x |
> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x |
> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** |
> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** |
> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x |
> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x |
> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** |
> | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** |

Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Fix preservation of NaNs for floats and doubles ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/58467994..07349ec3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=20-21 Stats: 63 lines in 4 files changed: 4 ins; 56 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From duke at openjdk.org Tue Aug 15 19:17:48 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 15 Aug 2023 19:17:48 GMT Subject: RFR: 8309130:
x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 22:29:37 GMT, Srinivas Vamsi Parasa wrote: >>> Also need to handle arraySort in file: share/gc/shenandoah/c2/shenandoahSupport.cpp, function: ShenandoahBarrierC2Support::verify around line 3000. >> >> Updated the code in ShenandoahBarrierC2Support as suggested. > >> @vamsi-parasa With fastdebug build I see the following error: Internal Error (jdk/src/hotspot/share/opto/escape.cpp:1196), pid=3543536, tid=3543559 fatal error: EA unexpected CallLeaf arraysort_stub >> >> Please take a look. > > This was fixed as well. > @vamsi-parasa We need to preserve NaNs. The base (https://github.com/intel/x86-simd-sort) algorithm used doesn't preserve NaNs. Thanks for catching this Sandhya! This is fixed now in the most recent commit. A preprocessing step is added to move the NaNs to the top of the array. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1679459713 From duke at openjdk.org Tue Aug 15 19:26:11 2023 From: duke at openjdk.org (iaroslavski) Date: Tue, 15 Aug 2023 19:26:11 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: References: Message-ID: <6Q6Iir5vHOVTSn5Not2MlqgHgRJq9EhKkXpW5kGaGkw=.6ce83c89-aabe-4d9b-a6c3-9bba23e49492@github.com> On Tue, 15 Aug 2023 19:12:47 GMT, Srinivas Vamsi Parasa wrote: >>> @vamsi-parasa With fastdebug build I see the following error: Internal Error (jdk/src/hotspot/share/opto/escape.cpp:1196), pid=3543536, tid=3543559 fatal error: EA unexpected CallLeaf arraysort_stub >>> >>> Please take a look. >> >> This was fixed as well. > >> @vamsi-parasa We need to preserve NaNs. The base (https://github.com/intel/x86-simd-sort) algorithm used doesn't preserve NaNs. > > Thanks for catching this Sandhya! This is fixed now in the most recent commit. 
A preprocessing step is added to move the NaNs to the top of the array. Hello @vamsi-parasa! Do you process negative zeros properly? On the one hand, -0.0f equals 0.0f, but negative zeros must be placed before 0.0f. See the javadoc for Arrays.sort(float[] a). The same applies to -0.0d (double type). ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1679474204 From duke at openjdk.org Tue Aug 15 20:04:20 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 15 Aug 2023 20:04:20 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: <6Q6Iir5vHOVTSn5Not2MlqgHgRJq9EhKkXpW5kGaGkw=.6ce83c89-aabe-4d9b-a6c3-9bba23e49492@github.com> References: <6Q6Iir5vHOVTSn5Not2MlqgHgRJq9EhKkXpW5kGaGkw=.6ce83c89-aabe-4d9b-a6c3-9bba23e49492@github.com> Message-ID: On Tue, 15 Aug 2023 19:23:24 GMT, iaroslavski wrote: >>> @vamsi-parasa We need to preserve NaNs. The base (https://github.com/intel/x86-simd-sort) algorithm used doesn't preserve NaNs. >> >> Thanks for catching this Sandhya! This is fixed now in the most recent commit. A preprocessing step is added to move the NaNs to the top of the array. > > Hello @vamsi-parasa! > > Do you process negative zeros properly? On the one hand, -0.0f equals 0.0f, but negative zeros must be placed before 0.0f. > See the javadoc for Arrays.sort(float[] a). The same applies to -0.0d (double type). @iaroslavski Hello Vladimir, Thank you for your comments and suggestions! Please see the answers below: 1) A single method is being used for int/float/long/double currently and is being intrinsified. This helps to handle the call to the AVX512 sort stub in a unified manner without duplication of the code. 2) The long offset is being used during compilation and when passing arguments to the AVX512 sort stub. It's true that it's not being used on the Java side. This is along the lines of vectorizedMismatch for MemorySegment.
In the future, having this API will make it easier to enable sort for memory segments, especially those backed by the native heap. 3) Thank you for suggesting parallelSort! So far, this optimization is intended for the serial sort. Will spend some time to analyze the changes needed to support parallelSort and will provide an update. 4) True, it would be helpful to have one unified benchmark. Will run the benchmark and provide the performance numbers for AVX512 sort and the Arrays.sort(). 5) Sure, will also run the above benchmark and do a performance comparison of AVX512 sort with enhanced DPQS (radix sort) as well. Thank you, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1679526230 From duke at openjdk.org Tue Aug 15 20:08:18 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 15 Aug 2023 20:08:18 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: <6Q6Iir5vHOVTSn5Not2MlqgHgRJq9EhKkXpW5kGaGkw=.6ce83c89-aabe-4d9b-a6c3-9bba23e49492@github.com> References: <6Q6Iir5vHOVTSn5Not2MlqgHgRJq9EhKkXpW5kGaGkw=.6ce83c89-aabe-4d9b-a6c3-9bba23e49492@github.com> Message-ID: On Tue, 15 Aug 2023 19:23:24 GMT, iaroslavski wrote: >>> @vamsi-parasa We need to preserve NaNs. The base (https://github.com/intel/x86-simd-sort) algorithm used doesn't preserve NaNs. >> >> Thanks for catching this Sandhya! This is fixed now in the most recent commit. A preprocessing step is added to move the NaNs to the top of the array. > > Hello @vamsi-parasa! > > Do you process negative zeros properly? On the one hand, -0.0f equals 0.0f, but negative zeros must be placed before 0.0f. > See the javadoc for Arrays.sort(float[] a). The same applies to -0.0d (double type). @iaroslavski Hello Vladimir, The algorithm is handling zeros correctly. It places -0.0 before +0.0.
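The ordering contract discussed above can be checked in plain Java. The sketch below is hypothetical and is not the PR's stub: the helper names are invented, and the NaN pre-pass is only an assumption about what "move the NaNs to the top of the array" means (here, the end of the array, which is where `Arrays.sort(float[])` places NaNs). It moves NaNs out of the way, sorts the NaN-free prefix with `Arrays.sort` as a stand-in for a vectorized kernel, and verifies the result against the `Float.compare` total order, in which -0.0f sorts before 0.0f and NaN sorts after +Infinity.

```java
import java.util.Arrays;

// Hypothetical sketch: a NaN pre-pass plus a check of the Float.compare order.
class FloatSortOrderDemo {

    // Moves all NaNs in a[from, to) to the end of that range and returns the
    // exclusive end of the NaN-free prefix. Relative order is not preserved.
    static int moveNaNsToEnd(float[] a, int from, int to) {
        int end = to;
        for (int i = to - 1; i >= from; i--) {
            if (Float.isNaN(a[i])) {
                end--;
                float tmp = a[i];
                a[i] = a[end];
                a[end] = tmp;
            }
        }
        return end;
    }

    // True if 'a' is non-decreasing under Float.compare, which (unlike '<')
    // distinguishes -0.0f from 0.0f and treats NaN as the largest value.
    static boolean isTotallyOrdered(float[] a) {
        for (int i = 1; i < a.length; i++) {
            if (Float.compare(a[i - 1], a[i]) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        float[] data = {Float.NaN, 0.0f, 1.5f, -0.0f, Float.NEGATIVE_INFINITY, Float.NaN, -2f};

        int end = moveNaNsToEnd(data, 0, data.length);
        // Stand-in for a vectorized kernel: sort only the NaN-free prefix.
        Arrays.sort(data, 0, end);

        System.out.println(Arrays.toString(data)); // [-Infinity, -2.0, -0.0, 0.0, 1.5, NaN, NaN]
        System.out.println(isTotallyOrdered(data)); // true
        System.out.println(-0.0f < 0.0f);           // false: '<' cannot tell the zeros apart
        System.out.println(Float.compare(-0.0f, 0.0f)); // -1
    }
}
```

The last two lines show why a sort built only on `<` comparisons needs extra care: `<` treats the two zeros as equal, while the `Arrays.sort` contract requires -0.0f to come first.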
------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1679532729 From dnsimon at openjdk.org Tue Aug 15 20:42:08 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 15 Aug 2023 20:42:08 GMT Subject: RFR: 8314247: JVMCI: expected int64_t but JavaThread::_held_monitor_count is of type intx In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 17:54:57 GMT, Coleen Phillimore wrote: > Fix graal error with @iklam's fix of making intx and int64_t synonyms. Tested with new test in jvmci tests. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotVMConfigAccess.java line 325: > 323: > 324: // Make sure the native type is still the type we expect. > 325: if (cppType != null && !typeEquals(cppType, entry.type)) { We have conditional code in Graal (e.g. [here](https://github.com/oracle/graal/blob/5097b1dabf01fd6d2ea1ea3b470060a138d49fa2/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/hotspot/GraalHotSpotVMConfig.java#L392-L405)) to deal with type changes in vmstructs so there's no need to fix it here. I'd prefer to not have this code do type aliasing like this as it might hide other changes we need to make in Graal. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15295#discussion_r1295089272 From dholmes at openjdk.org Tue Aug 15 20:51:19 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 15 Aug 2023 20:51:19 GMT Subject: RFR: 8311557: [JVMCI] deadlock with JVMTI thread suspension [v4] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 17:26:28 GMT, Tom Rodriguez wrote: >> Java-based JVMCI compiler threads are more like normal Java threads so they aren't `hidden_from_external_view` like the native compilers. This can lead to deadlocks if you use JVMTI to suspend all threads since this will block the compiler queue and can block execution if background compilation is disabled. It's reasonable to treat libgraal threads like native threads in this regard.
Making jargraal threads hidden too would interfere with using profiling and debugging tools on them, so I've left that alone, but it might be worth changing the JVMTI suspend and resume functions to explicitly skip compiler threads as well. > > Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: > > LibJVMCICompilerThreadHidden should just be EXPERIMENTAL src/hotspot/share/jvmci/jvmci_globals.hpp line 160: > 158: "[default: ./" LIBJVMCI_ERR_FILE "] (%p replaced with pid)") \ > 159: \ > 160: product(bool, LibJVMCICompilerThreadHidden, true, EXPERIMENTAL, \ This sounds more like a diagnostic flag than experimental. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14799#discussion_r1295099859 From coleenp at openjdk.org Tue Aug 15 21:21:19 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 15 Aug 2023 21:21:19 GMT Subject: Withdrawn: 8314247: JVMCI: expected int64_t but JavaThread::_held_monitor_count is of type intx In-Reply-To: References: Message-ID: <2GdofM4avQxfvB0JdSGe13rGjE9WyUVkW6YtqZ8eHG8=.47cd2b8c-c1d5-4922-a781-0d188bec5c55@github.com> On Tue, 15 Aug 2023 17:54:57 GMT, Coleen Phillimore wrote: > Fix graal error with @iklam's fix of making intx and int64_t synonyms. Tested with new test in jvmci tests. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/15295 From coleenp at openjdk.org Tue Aug 15 21:21:18 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 15 Aug 2023 21:21:18 GMT Subject: RFR: 8314247: JVMCI: expected int64_t but JavaThread::_held_monitor_count is of type intx In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 20:39:47 GMT, Doug Simon wrote: >> Fix graal error with @iklam's fix of making intx and int64_t synonyms. Tested with new test in jvmci tests. > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotVMConfigAccess.java line 325: > >> 323: >> 324: // Make sure the native type is still the type we expect. >> 325: if (cppType != null && !typeEquals(cppType, entry.type)) { > > We have conditional code in Graal (e.g. [here](https://github.com/oracle/graal/blob/5097b1dabf01fd6d2ea1ea3b470060a138d49fa2/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/hotspot/GraalHotSpotVMConfig.java#L392-L405)) to deal with type changes in vmstructs so there's no need to fix it here. I'd prefer to not have this code do type aliasing like this as it might hide other changes we need to make in Graal.
> > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotVMConfigAccess.java line 325: > >> 323: >> 324: // Make sure the native type is still the type we expect. >> 325: if (cppType != null && !typeEquals(cppType, entry.type)) { > > We have conditional code in Graal (e.g. [here](https://github.com/oracle/graal/blob/5097b1dabf01fd6d2ea1ea3b470060a138d49fa2/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/hotspot/GraalHotSpotVMConfig.java#L392-L405)) to deal with type changes in vmstructs so there's no need to fix it here. I'd prefer to not have this code do type aliasing like this as it might hide other changes we need to make in Graal. Okay, if you can fix it in Graal, that would be great. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15295#discussion_r1295125128 From never at openjdk.org Tue Aug 15 21:32:15 2023 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 15 Aug 2023 21:32:15 GMT Subject: RFR: 8311557: [JVMCI] deadlock with JVMTI thread suspension cause various failures Message-ID: I accidentally reversed the default in my refactor. Testing is in progress. 
------------- Commit messages: - 8311557: [JVMCI] deadlock with JVMTI thread suspension cause various failures Changes: https://git.openjdk.org/jdk/pull/15300/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15300&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311557 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15300.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15300/head:pull/15300 PR: https://git.openjdk.org/jdk/pull/15300 From iklam at openjdk.org Tue Aug 15 22:24:29 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 15 Aug 2023 22:24:29 GMT Subject: RFR: 8314249: Refactor handling of invokedynamic in JVMCI ConstantPool Message-ID: <6pkU1viiXcWvXP5KwsD9-8HYEy_SoPkR6-Ea4l9OWhs=.dba83e93-e1e8-4778-b2af-f5a1b7792a73@github.com> This PR is part of cleaning up JVMCI to track [JDK-8301993](https://bugs.openjdk.org/browse/JDK-8301993), where the constant pool cache is being removed (as of now, only method references use the CpCache). 1. `rawIndexToConstantPoolIndex()` is used only for the `invokedynamic` bytecode. It should be renamed to `indyIndexConstantPoolIndex()` 2. `rawIndexToConstantPoolCacheIndex()` should not be called for the `invokedynamic` bytecode, which doesn't use cpCache entries after [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995). 3. Some `cpi` parameters should be renamed to `rawIndex` or `which` 4. Added a test case for `ConstantPool.lookupAppendix()`, which was not tested in the JDK repo. I added comments about the 4 types of indices used in HotSpotConstantPool.java: `cpi`, `rawIndex`, `cpci` and `which`. The latter two types will be removed after [JDK-8301993](https://bugs.openjdk.org/browse/JDK-8301993) is complete. Note that there are still some incorrect uses of `cpi` in the implementation and test cases.
Those will be cleaned up in [JDK-8314172](https://bugs.openjdk.org/browse/JDK-8314172) ------------- Commit messages: - fixed whitespace - fixed test - added test case for ConstantPool.lookupAppendix; other code touch by this PR already have test cases - fixed comments - Added docs about the names we use for indices: cpi, rawIndex, cpci and which - 8314249: Refactor handling of invokedynamic in JVMCI ConstantPool Changes: https://git.openjdk.org/jdk/pull/15297/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15297&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314249 Stats: 188 lines in 6 files changed: 120 ins; 23 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/15297.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15297/head:pull/15297 PR: https://git.openjdk.org/jdk/pull/15297 From never at openjdk.org Tue Aug 15 23:31:20 2023 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 15 Aug 2023 23:31:20 GMT Subject: RFR: 8311557: [JVMCI] deadlock with JVMTI thread suspension [v4] In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 20:48:48 GMT, David Holmes wrote: >> Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: >> >> LibJVMCICompilerThreadHidden should just be EXPERIMENTAL > > src/hotspot/share/jvmci/jvmci_globals.hpp line 160: > >> 158: "[default: ./" LIBJVMCI_ERR_FILE "] (%p replaced with pid)") \ >> 159: \ >> 160: product(bool, LibJVMCICompilerThreadHidden, true, EXPERIMENTAL, \ > > This sounds more like a diagnostic flag than experimental. It kind of is, but because of the way [JVMCIGlobals::enable_jvmci_product_mode](https://github.com/tkrodriguez/jdk/blob/tkr-hidden-fix/src/hotspot/share/jvmci/jvmci_globals.cpp#L169) works, it seems easier to just make it experimental, which becomes product like all the other special JVMCI flags.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14799#discussion_r1295236623 From cjplummer at openjdk.org Wed Aug 16 02:02:05 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 16 Aug 2023 02:02:05 GMT Subject: RFR: 8314324: 8311557: [JVMCI] deadlock with JVMTI thread suspension cause various failures In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 21:25:37 GMT, Tom Rodriguez wrote: > I accidentally reversed the default in my refactor. Testing is in progress. Marked as reviewed by cjplummer (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15300#pullrequestreview-1579690548 From fyang at openjdk.org Wed Aug 16 03:37:17 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 16 Aug 2023 03:37:17 GMT Subject: RFR: 8314268: Missing include in assembler_riscv.hpp In-Reply-To: References: <_lmdoBPLSaog0QT4SOmn5yenC1G8aOJThkw5R-MY5Ho=.c2610b32-0c81-43c9-8686-676f3696f2b9@github.com> Message-ID: <4IyW3l1BpJa49VuFbu5ETdyZEfjFz2x-N_OdG5EPADk=.40ba84b5-3f74-4c5c-b7af-05ddf388391d@github.com> On Tue, 15 Aug 2023 13:28:23 GMT, Robbin Ehn wrote: > Just WIP local changes were I included assembler_riscv.hpp for Assebmler::LMUL, and notice this. I guess you might want to include "asm/assembler.inline.hpp" instead which transitively includes assembler_riscv.hpp? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15285#issuecomment-1679907790 From thartmann at openjdk.org Wed Aug 16 05:12:08 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 16 Aug 2023 05:12:08 GMT Subject: RFR: 8314324: "8311557: [JVMCI] deadlock with JVMTI thread suspension" causes various failures In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 21:25:37 GMT, Tom Rodriguez wrote: > I accidentally reversed the default in my refactor. Testing is in progress. Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15300#pullrequestreview-1579811775 From thartmann at openjdk.org Wed Aug 16 05:13:06 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 16 Aug 2023 05:13:06 GMT Subject: RFR: 8314324: "8311557: [JVMCI] deadlock with JVMTI thread suspension" causes various failures In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 21:25:37 GMT, Tom Rodriguez wrote: > I accidentally reversed the default in my refactor. Testing is in progress. Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15300#pullrequestreview-1579811775 From never at openjdk.org Wed Aug 16 06:10:15 2023 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 16 Aug 2023 06:10:15 GMT Subject: Integrated: 8314324: "8311557: [JVMCI] deadlock with JVMTI thread suspension" causes various failures In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 21:25:37 GMT, Tom Rodriguez wrote: > I accidentally reversed the default in my refactor. Testing is in progress. This pull request has now been integrated. Changeset: e1fdef56 Author: Tom Rodriguez URL: https://git.openjdk.org/jdk/commit/e1fdef56135c2987b128884ef632b64c32dd674a Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8314324: "8311557: [JVMCI] deadlock with JVMTI thread suspension" causes various failures Reviewed-by: cjplummer, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/15300 From never at openjdk.org Wed Aug 16 06:10:14 2023 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 16 Aug 2023 06:10:14 GMT Subject: RFR: 8314324: "8311557: [JVMCI] deadlock with JVMTI thread suspension" causes various failures In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 21:25:37 GMT, Tom Rodriguez wrote: > I accidentally reversed the default in my refactor. Testing is in progress. Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15300#issuecomment-1680012992 From chagedorn at openjdk.org Wed Aug 16 07:01:22 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 16 Aug 2023 07:01:22 GMT Subject: RFR: 8314233: C2: assert(assertion_predicate_has_loop_opaque_node(iff)) failed: unexpected In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 13:42:48 GMT, Christian Hagedorn wrote: > In the testcase, we try to initialize Assertion Predicates from the templates which have an `Opaque4` node. However, the code finds an unrelated `Opaque4` node added by an intrinsic. This, obviously, will not guard any `OpaqueLoop*` nodes required for Template Assertion Predicates and we fail with the assertion. > > While splitting some changes away from the main fix of [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981) for [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636), I wrongly removed the check in loop peeling if an `Opaque4` node belongs to an `If` that also shares the uncommon trap with the Parse Predicate (this is not required in the main fix anymore because Template Assertion Predicates will always only have `HaltNodes` - but optimizing this here is wrong/too early). > > The fix re-establishes the check for the uncommon trap. > > Thanks, > Christian Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15290#issuecomment-1680065905 From chagedorn at openjdk.org Wed Aug 16 07:01:24 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 16 Aug 2023 07:01:24 GMT Subject: Integrated: 8314233: C2: assert(assertion_predicate_has_loop_opaque_node(iff)) failed: unexpected In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 13:42:48 GMT, Christian Hagedorn wrote: > In the testcase, we try to initialize Assertion Predicates from the templates which have an `Opaque4` node. However, the code finds an unrelated `Opaque4` node added by an intrinsic. 
This, obviously, will not guard any `OpaqueLoop*` nodes required for Template Assertion Predicates and we fail with the assertion. > > While splitting some changes away from the main fix of [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981) for [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636), I wrongly removed the check in loop peeling if an `Opaque4` node belongs to an `If` that also shares the uncommon trap with the Parse Predicate (this is not required in the main fix anymore because Template Assertion Predicates will always only have `HaltNodes` - but optimizing this here is wrong/too early). > > The fix re-establishes the check for the uncommon trap. > > Thanks, > Christian This pull request has now been integrated. Changeset: 0b12480d Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/0b12480de88dc1d2a8d7ca3aa2597be3df1ebde1 Stats: 69 lines in 2 files changed: 68 ins; 1 del; 0 mod 8314233: C2: assert(assertion_predicate_has_loop_opaque_node(iff)) failed: unexpected Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/15290 From adinn at openjdk.org Wed Aug 16 07:07:11 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 16 Aug 2023 07:07:11 GMT Subject: RFR: 8313760: [REDO] Enhance AES performance In-Reply-To: References: Message-ID: On Mon, 14 Aug 2023 12:18:02 GMT, Christian Hagedorn wrote: > This reapplies JDK-8308682 (i.e. reverse the backout done with [JDK-8313756](https://bugs.openjdk.org/browse/JDK-8313756)) but attributes it correctly to @theRealAph together with @adinn and @sviswa7 as additional reviewers. > > The redo applied cleanly. > > Thanks, > Christian Marked as reviewed by adinn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/15267#pullrequestreview-1579929933 From dnsimon at openjdk.org Wed Aug 16 07:17:07 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 16 Aug 2023 07:17:07 GMT Subject: RFR: 8314249: Refactor handling of invokedynamic in JVMCI ConstantPool In-Reply-To: <6pkU1viiXcWvXP5KwsD9-8HYEy_SoPkR6-Ea4l9OWhs=.dba83e93-e1e8-4778-b2af-f5a1b7792a73@github.com> References: <6pkU1viiXcWvXP5KwsD9-8HYEy_SoPkR6-Ea4l9OWhs=.dba83e93-e1e8-4778-b2af-f5a1b7792a73@github.com> Message-ID: On Tue, 15 Aug 2023 20:03:20 GMT, Ioi Lam wrote: > This PR is part of the clean up JVMCI to track [JDK-8301993](https://bugs.openjdk.org/browse/JDK-8301993), where the constant pool cache is being removed (as of now, only method references use the CpCache). > > 1. `rawIndexToConstantPoolIndex()` is used only for the `invokedynamic` bytecode. It should be renamed to `indyIndexConstantPoolIndex()` > > 2. `rawIndexToConstantPoolCacheIndex()` should not be called for the `invokedynamic` bytecode, which doesn't use cpCache entries after [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995). > > 3. Some `cpi` parameters should be renamed to `rawIndex` or `which` > > 4. Added a test case for `ConstantPool.lookupAppendix()`, which was not tested in the JDK repo. > > I added comments about the 4 types of indices used in HotSpotConstantPool.java: `cpi`, `rawIndex`, `cpci` and `which`. The latter two types will be removed after [JDK-8301993](https://bugs.openjdk.org/browse/JDK-8301993) is complete. > > Note that there are still some incorrect use of `cpi` in the implementation and test cases. Those will be cleaned up in [JDK-8314172](https://bugs.openjdk.org/browse/JDK-8314172) Marked as reviewed by dnsimon (Reviewer). Thanks a lot for this cleanup and adding the extra tests. 
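The 8314249 description separates several kinds of constant pool indices, and the Javadoc under review says a negative `which` is treated as an encoded indy index. A minimal Java sketch of that convention follows. It assumes the encoding is the bitwise complement (`~index`), which is how HotSpot's `ConstantPool::encode_invokedynamic_index` has historically worked; the class and method names are hypothetical and are not JVMCI API.

```java
/**
 * Hypothetical model of the "which" convention: a negative value is an
 * encoded invokedynamic (indy) index, anything else is some other index.
 * Assumes the HotSpot-style encoding in which indy index i is stored as ~i.
 */
public class IndyIndexSketch {

    /** Encodes a non-negative indy index as a negative "which" value. */
    static int encodeIndyIndex(int indyIndex) {
        if (indyIndex < 0) {
            throw new IllegalArgumentException("indy index must be >= 0: " + indyIndex);
        }
        return ~indyIndex; // ~0 == -1, ~1 == -2, ...
    }

    /** Only encodeIndyIndex produces negative values. */
    static boolean isEncodedIndyIndex(int which) {
        return which < 0;
    }

    /** Recovers the original indy index; the complement is its own inverse. */
    static int decodeIndyIndex(int which) {
        if (!isEncodedIndyIndex(which)) {
            throw new IllegalArgumentException("not an encoded indy index: " + which);
        }
        return ~which;
    }

    /** Describes how a caller would interpret a "which" argument. */
    static String describe(int which) {
        return isEncodedIndyIndex(which)
                ? "indy index " + decodeIndyIndex(which)
                : "non-indy index " + which;
    }

    public static void main(String[] args) {
        int which = encodeIndyIndex(3);
        System.out.println(which);           // -4
        System.out.println(describe(which)); // indy index 3
        System.out.println(describe(7));     // non-indy index 7
    }
}
```

Using the sign bit as the discriminator means no extra flag is needed to tell the two index spaces apart, which is also why a positive `which` must never be passed through the indy decoding path.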
src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/CompilerToVM.java line 565: > 563: * Gets the appendix object (if any) associated with the entry identified by {@code which}. > 564: * > 565: * @param which if negative, is treated as an encoded indy index for INVONEDYNAMIC; INVONEDYNAMIC -> INVOKEDYNAMIC src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 60: > 58: * > 59: * > 60: * Note that {@code cpci} and {@code which} are used only in the HotSpot-specific implementation. They are not used by the public iterface in jdk.vm.ci.meta.*. iterface -> interface src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java line 179: > 177: * > 178: * @param index if {@code opcode} is -1, {@code index} is a constant pool index. Otherwise {@code opcode} > 179: * must be ${code Bytecodes.INVOKEDYNAMIC}, and {@code index} must be the operand of that `${code Bytecodes.INVOKEDYNAMIC}` -> `{@code INVOKEDYNAMIC}` (in numerous places) ------------- PR Review: https://git.openjdk.org/jdk/pull/15297#pullrequestreview-1579923317 PR Comment: https://git.openjdk.org/jdk/pull/15297#issuecomment-1680085580 PR Review Comment: https://git.openjdk.org/jdk/pull/15297#discussion_r1295459900 PR Review Comment: https://git.openjdk.org/jdk/pull/15297#discussion_r1295461310 PR Review Comment: https://git.openjdk.org/jdk/pull/15297#discussion_r1295466901 From epeter at openjdk.org Wed Aug 16 07:18:18 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 16 Aug 2023 07:18:18 GMT Subject: RFR: 8313720: C2 SuperWord: wrong result with -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally [v2] In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 14:43:33 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> review suggestions > > Looks good to me. @TobiHartmann @chhagedorn thanks for the review!
------------- PR Comment: https://git.openjdk.org/jdk/pull/15274#issuecomment-1680085962 From epeter at openjdk.org Wed Aug 16 07:18:19 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 16 Aug 2023 07:18:19 GMT Subject: Integrated: 8313720: C2 SuperWord: wrong result with -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally In-Reply-To: References: Message-ID: On Mon, 14 Aug 2023 14:57:39 GMT, Emanuel Peter wrote: > **Problem** > > In my recent fix of [JDK-8306302](https://bugs.openjdk.org/browse/JDK-8306302) I forgot to check that the `Bool` node in the `Cmp -> Bool -> CMove` complex must have the same test value for all `Bool` nodes in the pack. Without that check, we fail to see the difference between: > > https://github.com/openjdk/jdk/blob/6d545b1580e0b3df9bc01bd64bd1a616c6ceeb9b/test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java#L354-L357 > > https://github.com/openjdk/jdk/blob/6d545b1580e0b3df9bc01bd64bd1a616c6ceeb9b/test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java#L384-L387 > > While the first hand-unrolled example has the same test value (lt) in both lines (packing OK), the second example has different test values (lt and le). Before this fix we would just assume they have the same test value, and therefore also use lt for the second line as a consequence. That can lead to wrong results. > > **Solution** > > `SuperWord::isomorphic` should return `false` if two `Bool` nodes do not have the same test value. That ensures that only `Bool` nodes with the same test value will ever be packed, since isomorphism is a requirement for packing. > > In addition, I also added verification code in `SuperWord::output`, just before we turn the `Cmp -> Bool -> CMove` scalar nodes into vector nodes. > > **Testing** > > Added Regression Test. Ran Tier1-6 + stress-testing. This pull request has now been integrated.
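The difference between the two hand-unrolled cases can be reproduced at the scalar level. The sketch below is hypothetical (it is not the code in `TestVectorConditionalMove`): lane 0 uses `<` and lane 1 uses `<=`, and `wronglyPacked` applies the `lt` test to every lane, which is effectively what packing both `Bool` nodes together did before the fix. The two versions disagree whenever the second pair of operands is equal.

```java
import java.util.Arrays;

public class MixedCmoveSketch {

    // Hand-unrolled by two with different tests per lane (lt, then le).
    // These two selects must not be packed into a single vector CMove.
    static void mixedTests(float[] a, float[] b, float[] c, float[] d, float[] r) {
        for (int i = 0; i + 1 < a.length; i += 2) {
            r[i]     = (a[i]     <  b[i])     ? c[i]     : d[i];
            r[i + 1] = (a[i + 1] <= b[i + 1]) ? c[i + 1] : d[i + 1];
        }
    }

    // What a pack that ignored the second Bool's test would compute:
    // the lt predicate applied to every lane.
    static void wronglyPacked(float[] a, float[] b, float[] c, float[] d, float[] r) {
        for (int i = 0; i < a.length; i++) {
            r[i] = (a[i] < b[i]) ? c[i] : d[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f};
        float[] b = {1f, 2f};       // equal operands: lt is false, le is true
        float[] c = {10f, 20f};
        float[] d = {-10f, -20f};
        float[] expected = new float[2];
        float[] packed = new float[2];
        mixedTests(a, b, c, d, expected);
        wronglyPacked(a, b, c, d, packed);
        System.out.println(Arrays.toString(expected)); // [-10.0, 20.0]
        System.out.println(Arrays.toString(packed));   // [-10.0, -20.0]
    }
}
```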
Changeset: d46f0fb3 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/d46f0fb31888db75f5b2b78a162fec16dfc5d0d9 Stats: 174 lines in 2 files changed: 164 ins; 0 del; 10 mod 8313720: C2 SuperWord: wrong result with -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/15274 From chagedorn at openjdk.org Wed Aug 16 07:23:20 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 16 Aug 2023 07:23:20 GMT Subject: RFR: 8313760: [REDO] Enhance AES performance In-Reply-To: References: Message-ID: On Wed, 16 Aug 2023 07:04:17 GMT, Andrew Dinn wrote: >> This reapplies JDK-8308682 (i.e. reverse the backout done with [JDK-8313756](https://bugs.openjdk.org/browse/JDK-8313756)) but attributes it correctly to @theRealAph together with @adinn and @sviswa7 as additional reviewers. >> >> The redo applied cleanly. >> >> Thanks, >> Christian > > Marked as reviewed by adinn (Reviewer). Thanks @adinn, @theRealAph, @sviswa7, and @rhalade for approving this REDO! Sanity testing looked good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15267#issuecomment-1680092285 From chagedorn at openjdk.org Wed Aug 16 07:23:21 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 16 Aug 2023 07:23:21 GMT Subject: Integrated: 8313760: [REDO] Enhance AES performance In-Reply-To: References: Message-ID: <6l8zsntTedi6U40uKI2-6Wc4ixb3sJRt01YkJhrYqIM=.76d27190-3d9b-43a3-b9c8-1266c921bcd9@github.com> On Mon, 14 Aug 2023 12:18:02 GMT, Christian Hagedorn wrote: > This reapplies JDK-8308682 (i.e. reverse the backout done with [JDK-8313756](https://bugs.openjdk.org/browse/JDK-8313756)) but attributes it correctly to @theRealAph together with @adinn and @sviswa7 as additional reviewers. > > The redo applied cleanly. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 49ddb199 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/49ddb1997256d9fb7149d274d8afa18f7c2609a4 Stats: 107 lines in 7 files changed: 70 ins; 1 del; 36 mod 8313760: [REDO] Enhance AES performance Co-authored-by: Andrew Haley Reviewed-by: adinn, aph, sviswanathan, rhalade, kvn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/15267 From rehn at openjdk.org Wed Aug 16 07:58:07 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 16 Aug 2023 07:58:07 GMT Subject: RFR: 8314268: Missing include in assembler_riscv.hpp In-Reply-To: <4IyW3l1BpJa49VuFbu5ETdyZEfjFz2x-N_OdG5EPADk=.40ba84b5-3f74-4c5c-b7af-05ddf388391d@github.com> References: <_lmdoBPLSaog0QT4SOmn5yenC1G8aOJThkw5R-MY5Ho=.c2610b32-0c81-43c9-8686-676f3696f2b9@github.com> <4IyW3l1BpJa49VuFbu5ETdyZEfjFz2x-N_OdG5EPADk=.40ba84b5-3f74-4c5c-b7af-05ddf388391d@github.com> Message-ID: On Wed, 16 Aug 2023 03:32:16 GMT, Fei Yang wrote: > > Just WIP local changes were I included assembler_riscv.hpp for Assebmler::LMUL, and notice this. > > I guess you might want to include "asm/assembler.inline.hpp" instead which would bring assembler_riscv.hpp? It was in another hpp file. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15285#issuecomment-1680136491 From rehn at openjdk.org Wed Aug 16 08:03:14 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 16 Aug 2023 08:03:14 GMT Subject: RFR: 8314268: Missing include in assembler_riscv.hpp In-Reply-To: References: <_lmdoBPLSaog0QT4SOmn5yenC1G8aOJThkw5R-MY5Ho=.c2610b32-0c81-43c9-8686-676f3696f2b9@github.com> Message-ID: On Tue, 15 Aug 2023 16:22:15 GMT, Aleksey Shipilev wrote: > It is a bit odd to do this without the actual bug, but I guess it is fine to proactively maintain the includes of headers that define the symbols we use in header definitions. I have done several, e.g. https://bugs.openjdk.org/browse/JDK-8226227 And other people also such as https://bugs.openjdk.org/browse/JDK-8230888 So I don't find it 'unusual'.
Thanks for review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15285#issuecomment-1680142021 From aph at openjdk.org Wed Aug 16 09:27:23 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 16 Aug 2023 09:27:23 GMT Subject: RFR: 8313760: [REDO] Enhance AES performance In-Reply-To: References: Message-ID: On Mon, 14 Aug 2023 12:18:02 GMT, Christian Hagedorn wrote: > This reapplies JDK-8308682 (i.e. reverse the backout done with [JDK-8313756](https://bugs.openjdk.org/browse/JDK-8313756)) but attributes it correctly to @theRealAph together with @adinn and @sviswa7 as additional reviewers. > > The redo applied cleanly. > > Thanks, > Christian It's come through as co-authored-by instead of authored-by, which is wrong because I wrote every single byte, but never mind. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15267#issuecomment-1680263474 From duke at openjdk.org Wed Aug 16 17:50:16 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 16 Aug 2023 17:50:16 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: <6Q6Iir5vHOVTSn5Not2MlqgHgRJq9EhKkXpW5kGaGkw=.6ce83c89-aabe-4d9b-a6c3-9bba23e49492@github.com> References: <6Q6Iir5vHOVTSn5Not2MlqgHgRJq9EhKkXpW5kGaGkw=.6ce83c89-aabe-4d9b-a6c3-9bba23e49492@github.com> Message-ID: On Tue, 15 Aug 2023 19:23:24 GMT, iaroslavski wrote: >>> @vamsi-parasa We need to preserve NaNs. The base (https://github.com/intel/x86-simd-sort) algorithm used doesn't preserve NaNs. >> >> Thanks for catching this Sandhya! This is fixed now in the most recent commit. A preprocessing step is added to move the NaNs to the top of the array. > > Hello @vamsi-parasa ! > > Do you process negative zeros properly? From one hand -0.0f equals to 0.0f, but negative zeros must be placed before 0.0f. > See javadoc for Arrays.sort(float[] a). The same situation with -0.0d (double type). 
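The ordering the reviewer asks about is part of the `Arrays.sort(float[])` contract, which follows `Float.compareTo`: `-0.0f` sorts before `0.0f`, and NaNs sort after every other value. A kernel that only compares with `<` sees `-0.0f == 0.0f` and cannot order NaNs, so some pre- and post-processing is needed. The sketch below shows one way to do this, similar in spirit to what the existing Java `DualPivotQuicksort` does for floats; it is not the code in this PR, the names are hypothetical, and a plain insertion sort stands in for the SIMD kernel. It moves NaNs to the tail and turns each `-0.0f` into `0.0f` while counting them, sorts the remaining prefix, and then writes the counted negative zeros back at the start of the zero run.

```java
import java.util.Arrays;

public class FloatSortContractSketch {

    // Sorts a so that the result matches Arrays.sort(float[]), using a core
    // sort that relies only on '<' (a stand-in for a vectorized kernel).
    static void sort(float[] a) {
        int end = a.length;       // a[end..] holds the NaNs
        int negativeZeros = 0;

        // Pre-pass: move NaNs to the tail; replace -0.0f by 0.0f and count them.
        for (int k = end - 1; k >= 0; k--) {
            float v = a[k];
            if (Float.isNaN(v)) {
                a[k] = a[--end];
                a[end] = v;
            } else if (v == 0.0f && Float.floatToRawIntBits(v) < 0) {
                a[k] = 0.0f;
                negativeZeros++;
            }
        }

        // Core sort of a[0..end): insertion sort using only '<'.
        for (int i = 1; i < end; i++) {
            float v = a[i];
            int j = i - 1;
            while (j >= 0 && v < a[j]) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = v;
        }

        // Post-pass: the zeros form one run; its first negativeZeros slots become -0.0f.
        int z = 0;
        while (z < end && a[z] < 0.0f) {
            z++;
        }
        for (int n = 0; n < negativeZeros; n++) {
            a[z + n] = -0.0f;
        }
    }

    public static void main(String[] args) {
        float[] x = {0.0f, Float.NaN, -0.0f, 3f, -1f, Float.NaN, -0.0f};
        sort(x);
        System.out.println(Arrays.toString(x)); // [-1.0, -0.0, -0.0, 0.0, 3.0, NaN, NaN]
    }
}
```

The same three phases apply to `double[]`, using `Double.doubleToRawLongBits` to detect `-0.0d`.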
@iaroslavski Hello Vladimir,

Please see the `Arrays.sort()` performance comparison between the current **Java baseline (DPQS)** vs. **AVX512 sort intrinsic** (this PR) using the `ArraysSort.java` JMH [benchmark](https://github.com/openjdk/jdk/pull/13568/files#diff-dee51b13bd1872ff455cec2f29255cfd25014a5dd33dda55a2fc68638c3dd4b2) provided in the PR for [JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)](https://github.com/openjdk/jdk/pull/13568/files#top) (#13568).

- The following command line was used to run the benchmarks: `java -jar $JDK_HOME/build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -jvmArgs "-XX:CompileThreshold=1 -XX:-TieredCompilation" ArraysSort`
- Please see the performance numbers below. The scores shown are the average time (us/op), so lower is better. The last column shows the speedup (baseline time / AVX512 time).
- In most cases AVX512 sort gives a good speedup; the main exceptions are the STAGGER inputs (and, for long, some REPEATED sizes), where a speedup below 1.0 means it is slower.

| Benchmark | Mode | Size | Baseline DPQS (us/op) | AVX512 Sort (us/op) | Speedup |
| --- | --- | --- | --- | --- | --- |
| ArraysSortTests.Double.testSort | RANDOM | 800 | 7.8 | 2.6 | 3.0 |
| ArraysSortTests.Double.testSort | RANDOM | 7000 | 238.3 | 28.0 | 8.5 |
| ArraysSortTests.Double.testSort | RANDOM | 50000 | 2217.9 | 238.4 | 9.3 |
| ArraysSortTests.Double.testSort | RANDOM | 300000 | 15096.9 | 1721.6 | 8.8 |
| ArraysSortTests.Double.testSort | RANDOM | 2000000 | 117451.3 | 13854.8 | 8.5 |
| ArraysSortTests.Double.testSort | REPEATED | 800 | 2.3 | 1.6 | 1.4 |
| ArraysSortTests.Double.testSort | REPEATED | 7000 | 42.7 | 37.2 | 1.1 |
| ArraysSortTests.Double.testSort | REPEATED | 50000 | 458.4 | 386.3 | 1.2 |
| ArraysSortTests.Double.testSort | REPEATED | 300000 | 2933.8 | 342.7 | 8.6 |
| ArraysSortTests.Double.testSort | REPEATED | 2000000 | 18759.5 | 2838.9 | 6.6 |
| ArraysSortTests.Double.testSort | STAGGER | 800 | 2.5 | 2.7 | 0.9 |
| ArraysSortTests.Double.testSort | STAGGER | 7000 | 28.2 | 28.0 | 1.0 |
| ArraysSortTests.Double.testSort | STAGGER | 50000 | 147.2 | 224.0 | 0.7 |
| ArraysSortTests.Double.testSort | STAGGER | 300000 | 1030.6 | 1566.8 | 0.7 |
| ArraysSortTests.Double.testSort | STAGGER | 2000000 | 8498.2 | 12668.3 | 0.7 |
| ArraysSortTests.Double.testSort | SHUFFLE | 800 | 4.7 | 2.7 | 1.7 |
| ArraysSortTests.Double.testSort | SHUFFLE | 7000 | 86.7 | 27.5 | 3.2 |
| ArraysSortTests.Double.testSort | SHUFFLE | 50000 | 844.7 | 233.5 | 3.6 |
| ArraysSortTests.Double.testSort | SHUFFLE | 300000 | 5764.4 | 1569.6 | 3.7 |
| ArraysSortTests.Double.testSort | SHUFFLE | 2000000 | 37797.5 | 12963.4 | 2.9 |
| ArraysSortTests.Float.testSort | RANDOM | 800 | 6.7 | 1.8 | 3.7 |
| ArraysSortTests.Float.testSort | RANDOM | 7000 | 239.2 | 18.8 | 12.7 |
| ArraysSortTests.Float.testSort | RANDOM | 50000 | 2204.3 | 177.2 | 12.4 |
| ArraysSortTests.Float.testSort | RANDOM | 300000 | 15123.9 | 1342.3 | 11.3 |
| ArraysSortTests.Float.testSort | RANDOM | 2000000 | 116263.9 | 10087.5 | 11.5 |
| ArraysSortTests.Float.testSort | REPEATED | 800 | 2.3 | 0.8 | 2.7 |
| ArraysSortTests.Float.testSort | REPEATED | 7000 | 28.2 | 6.6 | 4.3 |
| ArraysSortTests.Float.testSort | REPEATED | 50000 | 469.3 | 34.9 | 13.4 |
| ArraysSortTests.Float.testSort | REPEATED | 300000 | 2996.0 | 207.1 | 14.5 |
| ArraysSortTests.Float.testSort | REPEATED | 2000000 | 19391.7 | 1503.9 | 12.9 |
| ArraysSortTests.Float.testSort | STAGGER | 800 | 2.3 | 1.7 | 1.4 |
| ArraysSortTests.Float.testSort | STAGGER | 7000 | 20.0 | 19.0 | 1.1 |
| ArraysSortTests.Float.testSort | STAGGER | 50000 | 140.5 | 153.6 | 0.9 |
| ArraysSortTests.Float.testSort | STAGGER | 300000 | 831.9 | 1165.8 | 0.7 |
| ArraysSortTests.Float.testSort | STAGGER | 2000000 | 5591.0 | 8600.0 | 0.7 |
| ArraysSortTests.Float.testSort | SHUFFLE | 800 | 4.8 | 1.7 | 2.8 |
| ArraysSortTests.Float.testSort | SHUFFLE | 7000 | 85.1 | 18.5 | 4.6 |
| ArraysSortTests.Float.testSort | SHUFFLE | 50000 | 851.8 | 156.4 | 5.4 |
| ArraysSortTests.Float.testSort | SHUFFLE | 300000 | 5617.5 | 1204.3 | 4.7 |
| ArraysSortTests.Float.testSort | SHUFFLE | 2000000 | 37380.1 | 9040.1 | 4.1 |
| ArraysSortTests.Int.testSort | RANDOM | 800 | 6.3 | 1.3 | 4.9 |
| ArraysSortTests.Int.testSort | RANDOM | 7000 | 209.9 | 14.2 | 14.7 |
| ArraysSortTests.Int.testSort | RANDOM | 50000 | 2037.8 | 153.4 | 13.3 |
| ArraysSortTests.Int.testSort | RANDOM | 300000 | 14119.9 | 1139.2 | 12.4 |
| ArraysSortTests.Int.testSort | RANDOM | 2000000 | 111777.9 | 8509.0 | 13.1 |
| ArraysSortTests.Int.testSort | REPEATED | 800 | 1.6 | 0.5 | 3.3 |
| ArraysSortTests.Int.testSort | REPEATED | 7000 | 23.0 | 3.9 | 5.8 |
| ArraysSortTests.Int.testSort | REPEATED | 50000 | 311.2 | 20.8 | 15.0 |
| ArraysSortTests.Int.testSort | REPEATED | 300000 | 1961.6 | 138.8 | 14.1 |
| ArraysSortTests.Int.testSort | REPEATED | 2000000 | 11834.8 | 732.2 | 16.2 |
| ArraysSortTests.Int.testSort | STAGGER | 800 | 1.7 | 1.2 | 1.5 |
| ArraysSortTests.Int.testSort | STAGGER | 7000 | 15.9 | 14.2 | 1.1 |
| ArraysSortTests.Int.testSort | STAGGER | 50000 | 96.7 | 116.7 | 0.8 |
| ArraysSortTests.Int.testSort | STAGGER | 300000 | 591.1 | 949.2 | 0.6 |
| ArraysSortTests.Int.testSort | STAGGER | 2000000 | 4681.4 | 7195.2 | 0.7 |
| ArraysSortTests.Int.testSort | SHUFFLE | 800 | 4.3 | 1.2 | 3.5 |
| ArraysSortTests.Int.testSort | SHUFFLE | 7000 | 82.8 | 13.9 | 6.0 |
| ArraysSortTests.Int.testSort | SHUFFLE | 50000 | 769.5 | 138.3 | 5.6 |
| ArraysSortTests.Int.testSort | SHUFFLE | 300000 | 5076.7 | 1030.3 | 4.9 |
| ArraysSortTests.Int.testSort | SHUFFLE | 2000000 | 33627.7 | 7631.4 | 4.4 |
| ArraysSortTests.Long.testSort | RANDOM | 800 | 6.4 | 3.0 | 2.2 |
| ArraysSortTests.Long.testSort | RANDOM | 7000 | 204.9 | 33.2 | 6.2 |
| ArraysSortTests.Long.testSort | RANDOM | 50000 | 2061.8 | 281.7 | 7.3 |
| ArraysSortTests.Long.testSort | RANDOM | 300000 | 14055.6 | 1988.4 | 7.1 |
| ArraysSortTests.Long.testSort | RANDOM | 2000000 | 110750.1 | 15483.1 | 7.2 |
| ArraysSortTests.Long.testSort | REPEATED | 800 | 1.6 | 1.7 | 1.0 |
| ArraysSortTests.Long.testSort | REPEATED | 7000 | 19.8 | 35.1 | 0.6 |
| ArraysSortTests.Long.testSort | REPEATED | 50000 | 308.4 | 390.3 | 0.8 |
| ArraysSortTests.Long.testSort | REPEATED | 300000 | 1924.3 | 253.0 | 7.6 |
| ArraysSortTests.Long.testSort | REPEATED | 2000000 | 12113.7 | 1985.3 | 6.1 |
| ArraysSortTests.Long.testSort | STAGGER | 800 | 1.9 | 3.2 | 0.6 |
| ArraysSortTests.Long.testSort | STAGGER | 7000 | 18.2 | 32.8 | 0.6 |
| ArraysSortTests.Long.testSort | STAGGER | 50000 | 110.2 | 265.0 | 0.4 |
| ArraysSortTests.Long.testSort | STAGGER | 300000 | 851.0 | 1827.9 | 0.5 |
| ArraysSortTests.Long.testSort | STAGGER | 2000000 | 6601.0 | 14350.6 | 0.5 |
| ArraysSortTests.Long.testSort | SHUFFLE | 800 | 4.2 | 3.3 | 1.3 |
| ArraysSortTests.Long.testSort | SHUFFLE | 7000 | 83.0 | 32.6 | 2.5 |
| ArraysSortTests.Long.testSort | SHUFFLE | 50000 | 761.7 | 274.2 | 2.8 |
| ArraysSortTests.Long.testSort | SHUFFLE | 300000 | 5204.1 | 1999.7 | 2.6 |
| ArraysSortTests.Long.testSort | SHUFFLE | 2000000 | 33995.3 | 14655.9 | 2.3 |

------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1681035789 From jvernee at openjdk.org Wed Aug 16 18:18:46 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 16 Aug 2023 18:18:46 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v5] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2.
https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ...
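Several of the API changes listed in this FFM description (`Linker::canonicalLayouts`, `allocateFrom`, a native linker that is always available) appear together in a typical downcall. The following is a minimal sketch, not code from the PR. It assumes a JDK in which this API is final (22 or later), a platform whose default lookup exports the C `strlen` function (for example Linux or macOS), and a 64-bit `size_t` that maps to `JAVA_LONG`. Because `downcallHandle` is a restricted method, running it may print a warning unless `--enable-native-access` is given.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public class StrlenDowncall {

    // Calls the C library's strlen on a Java string copied into native memory.
    static long cStringLength(String s) throws Throwable {
        Linker linker = Linker.nativeLinker();
        // canonicalLayouts() maps C type names to this platform's layouts.
        MemoryLayout sizeT = linker.canonicalLayouts().get("size_t");
        MethodHandle strlen = linker.downcallHandle(
                linker.defaultLookup().find("strlen").orElseThrow(),
                FunctionDescriptor.of(sizeT, ValueLayout.ADDRESS));
        try (Arena arena = Arena.ofConfined()) {
            // allocateFrom copies the string (UTF-8 by default) plus a NUL terminator.
            MemorySegment cString = arena.allocateFrom(s);
            // Assumes size_t is JAVA_LONG, so the handle returns a long.
            return (long) strlen.invokeExact(cString);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(cStringLength("Hello")); // 5
    }
}
```

Looking up `size_t` through `canonicalLayouts()` rather than hard-coding `JAVA_LONG` keeps the descriptor platform-specific, although the `(long)` cast still ties this sketch to 64-bit platforms.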
Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: - Merge branch 'master' into JEP22 - remove spurious imports - enable fallback linker on linux x86 in GHA - make Arena::allocate abstract - 8313894: Rename isTrivial linker option to critical Reviewed-by: pminborg, mcimadamore - 8313680: Disallow combining caputreCallState with isTrivial Reviewed-by: mcimadamore - Merge branch 'master' into JEP22 - use immutable map for fallback linker canonical layouts - 8313265: Move the FFM API out of preview Reviewed-by: mcimadamore - 8313005: Ensure native access check can fold away Reviewed-by: mcimadamore - ... and 11 more: https://git.openjdk.org/jdk/compare/6b396da2...5352dc0f ------------- Changes: https://git.openjdk.org/jdk/pull/15103/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=04 Stats: 2834 lines in 232 files changed: 1245 ins; 901 del; 688 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From duke at openjdk.org Wed Aug 16 18:25:13 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 16 Aug 2023 18:25:13 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: <6Q6Iir5vHOVTSn5Not2MlqgHgRJq9EhKkXpW5kGaGkw=.6ce83c89-aabe-4d9b-a6c3-9bba23e49492@github.com> References: <6Q6Iir5vHOVTSn5Not2MlqgHgRJq9EhKkXpW5kGaGkw=.6ce83c89-aabe-4d9b-a6c3-9bba23e49492@github.com> Message-ID: On Tue, 15 Aug 2023 19:23:24 GMT, iaroslavski wrote: >>> @vamsi-parasa We need to preserve NaNs. The base (https://github.com/intel/x86-simd-sort) algorithm used doesn't preserve NaNs. >> >> Thanks for catching this Sandhya! This is fixed now in the most recent commit. A preprocessing step is added to move the NaNs to the top of the array. > > Hello @vamsi-parasa ! 
> > Do you process negative zeros properly? From one hand -0.0f equals to 0.0f, but negative zeros must be placed before 0.0f. > See javadoc for Arrays.sort(float[] a). The same situation with -0.0d (double type). @iaroslavski Hello Vladimir, Please see the `Arrays.sort()` performance comparison between the **enhanced DPQS/Radix sort #13568** vs. **AVX512 sort intrinsic** (this PR) using the `ArraysSort.java` JMH [benchmark](https://github.com/openjdk/jdk/pull/13568/files#diff-dee51b13bd1872ff455cec2f29255cfd25014a5dd33dda55a2fc68638c3dd4b2) provided in the PR for [JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)](https://github.com/openjdk/jdk/pull/13568/files#top) ( #13568) - The following command line was used to run the benchmarks: ` java -jar $JDK_HOME/build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -jvmArgs "-XX:CompileThreshold=1 -XX:-TieredCompilation" ArraysSort` - Please see the performance numbers below. The scores shown are the average time (us/op), thus lower is better. The last column towards the right shows the speedup. - Space complexity was not benchmarked as AVX512 sort is an in-place sorting algorithm while the Radix sort needs extra memory. 
| Benchmark | Mode | Size | Enhanced DPQS/Radix (us/op) | AVX512 Sort (us/op) | Speedup |
| --- | --- | --- | --- | --- | --- |
| ArraysSortTests.Double.testSort | RANDOM | 800 | 8.4 | 2.6 | 3.3x |
| ArraysSortTests.Double.testSort | RANDOM | 7000 | 94.4 | 28.0 | 3.4x |
| ArraysSortTests.Double.testSort | RANDOM | 50000 | 634.9 | 238.4 | 2.7x |
| ArraysSortTests.Double.testSort | RANDOM | 300000 | 3887.6 | 1721.6 | 2.3x |
| ArraysSortTests.Double.testSort | RANDOM | 2000000 | 29348.9 | 13854.8 | 2.1x |
| ArraysSortTests.Double.testSort | REPEATED | 800 | 2.0 | 1.6 | 1.2x |
| ArraysSortTests.Double.testSort | REPEATED | 7000 | 39.6 | 37.2 | 1.1x |
| ArraysSortTests.Double.testSort | REPEATED | 50000 | 470.0 | 386.3 | 1.2x |
| ArraysSortTests.Double.testSort | REPEATED | 300000 | 2940.8 | 342.7 | 8.6x |
| ArraysSortTests.Double.testSort | REPEATED | 2000000 | 19361.4 | 2838.9 | 6.8x |
| ArraysSortTests.Double.testSort | STAGGER | 800 | 2.3 | 2.7 | 0.8x |
| ArraysSortTests.Double.testSort | STAGGER | 7000 | 25.6 | 28.0 | 0.9x |
| ArraysSortTests.Double.testSort | STAGGER | 50000 | 159.8 | 224.0 | 0.7x |
| ArraysSortTests.Double.testSort | STAGGER | 300000 | 945.3 | 1566.8 | 0.6x |
| ArraysSortTests.Double.testSort | STAGGER | 2000000 | 6398.7 | 12668.3 | 0.5x |
| ArraysSortTests.Double.testSort | SHUFFLE | 800 | 4.9 | 2.7 | 1.8x |
| ArraysSortTests.Double.testSort | SHUFFLE | 7000 | 55.9 | 27.5 | 2.0x |
| ArraysSortTests.Double.testSort | SHUFFLE | 50000 | 419.8 | 233.5 | 1.8x |
| ArraysSortTests.Double.testSort | SHUFFLE | 300000 | 2636.2 | 1569.6 | 1.7x |
| ArraysSortTests.Double.testSort | SHUFFLE | 2000000 | 21131.7 | 12963.4 | 1.6x |
| ArraysSortTests.Float.testSort | RANDOM | 800 | 7.4 | 1.8 | 4.1x |
| ArraysSortTests.Float.testSort | RANDOM | 7000 | 46.1 | 18.8 | 2.4x |
| ArraysSortTests.Float.testSort | RANDOM | 50000 | 328.5 | 177.2 | 1.9x |
| ArraysSortTests.Float.testSort | RANDOM | 300000 | 1960.5 | 1342.3 | 1.5x |
| ArraysSortTests.Float.testSort | RANDOM | 2000000 | 14502.4 | 10087.5 | 1.4x |
| ArraysSortTests.Float.testSort | REPEATED | 800 | 2.0 | 0.8 | 2.4x |
| ArraysSortTests.Float.testSort | REPEATED | 7000 | 30.6 | 6.6 | 4.6x |
| ArraysSortTests.Float.testSort | REPEATED | 50000 | 369.0 | 34.9 | 10.6x |
| ArraysSortTests.Float.testSort | REPEATED | 300000 | 2937.7 | 207.1 | 14.2x |
| ArraysSortTests.Float.testSort | REPEATED | 2000000 | 19008.3 | 1503.9 | 12.6x |
| ArraysSortTests.Float.testSort | STAGGER | 800 | 2.2 | 1.7 | 1.3x |
| ArraysSortTests.Float.testSort | STAGGER | 7000 | 25.7 | 19.0 | 1.4x |
| ArraysSortTests.Float.testSort | STAGGER | 50000 | 151.2 | 153.6 | 1.0x |
| ArraysSortTests.Float.testSort | STAGGER | 300000 | 873.0 | 1165.8 | 0.7x |
| ArraysSortTests.Float.testSort | STAGGER | 2000000 | 5720.0 | 8600.0 | 0.7x |
| ArraysSortTests.Float.testSort | SHUFFLE | 800 | 4.9 | 1.7 | 2.9x |
| ArraysSortTests.Float.testSort | SHUFFLE | 7000 | 41.6 | 18.5 | 2.3x |
| ArraysSortTests.Float.testSort | SHUFFLE | 50000 | 362.9 | 156.4 | 2.3x |
| ArraysSortTests.Float.testSort | SHUFFLE | 300000 | 2067.8 | 1204.3 | 1.7x |
| ArraysSortTests.Float.testSort | SHUFFLE | 2000000 | 14591.3 | 9040.1 | 1.6x |
| ArraysSortTests.Int.testSort | RANDOM | 800 | 7.0 | 1.3 | 5.4x |
| ArraysSortTests.Int.testSort | RANDOM | 7000 | 30.4 | 14.2 | 2.1x |
| ArraysSortTests.Int.testSort | RANDOM | 50000 | 255.2 | 153.4 | 1.7x |
| ArraysSortTests.Int.testSort | RANDOM | 300000 | 1618.5 | 1139.2 | 1.4x |
| ArraysSortTests.Int.testSort | RANDOM | 2000000 | 11557.7 | 8509.0 | 1.4x |
| ArraysSortTests.Int.testSort | REPEATED | 800 | 1.2 | 0.5 | 2.5x |
| ArraysSortTests.Int.testSort | REPEATED | 7000 | 10.6 | 3.9 | 2.7x |
| ArraysSortTests.Int.testSort | REPEATED | 50000 | 192.7 | 20.8 | 9.3x |
| ArraysSortTests.Int.testSort | REPEATED | 300000 | 1952.6 | 138.8 | 14.1x |
| ArraysSortTests.Int.testSort | REPEATED | 2000000 | 12969.4 | 732.2 | 17.7x |
| ArraysSortTests.Int.testSort | STAGGER | 800 | 1.5 | 1.2 | 1.3x |
| ArraysSortTests.Int.testSort | STAGGER | 7000 | 13.2 | 14.2 | 0.9x |
| ArraysSortTests.Int.testSort | STAGGER | 50000 | 94.1 | 116.7 | 0.8x |
| ArraysSortTests.Int.testSort | STAGGER | 300000 | 620.0 | 949.2 | 0.7x |
| ArraysSortTests.Int.testSort | STAGGER | 2000000 | 4279.9 | 7195.2 | 0.6x |
| ArraysSortTests.Int.testSort | SHUFFLE | 800 | 4.6 | 1.2 | 3.7x |
| ArraysSortTests.Int.testSort | SHUFFLE | 7000 | 33.2 | 13.9 | 2.4x |
| ArraysSortTests.Int.testSort | SHUFFLE | 50000 | 385.1 | 138.3 | 2.8x |
| ArraysSortTests.Int.testSort | SHUFFLE | 300000 | 2031.9 | 1030.3 | 2.0x |
| ArraysSortTests.Int.testSort | SHUFFLE | 2000000 | 20460.8 | 7631.4 | 2.7x |
| ArraysSortTests.Long.testSort | RANDOM | 800 | 6.8 | 3.0 | 2.3x |
| ArraysSortTests.Long.testSort | RANDOM | 7000 | 83.6 | 33.2 | 2.5x |
| ArraysSortTests.Long.testSort | RANDOM | 50000 | 616.0 | 281.7 | 2.2x |
| ArraysSortTests.Long.testSort | RANDOM | 300000 | 3752.6 | 1988.4 | 1.9x |
| ArraysSortTests.Long.testSort | RANDOM | 2000000 | 28236.2 | 15483.1 | 1.8x |
| ArraysSortTests.Long.testSort | REPEATED | 800 | 1.3 | 1.7 | 0.8x |
| ArraysSortTests.Long.testSort | REPEATED | 7000 | 19.5 | 35.1 | 0.6x |
| ArraysSortTests.Long.testSort | REPEATED | 50000 | 309.8 | 390.3 | 0.8x |
| ArraysSortTests.Long.testSort | REPEATED | 300000 | 2046.2 | 253.0 | 8.1x |
| ArraysSortTests.Long.testSort | REPEATED | 2000000 | 13105.2 | 1985.3 | 6.6x |
| ArraysSortTests.Long.testSort | STAGGER | 800 | 1.6 | 3.2 | 0.5x |
| ArraysSortTests.Long.testSort | STAGGER | 7000 | 13.3 | 32.8 | 0.4x |
| ArraysSortTests.Long.testSort | STAGGER | 50000 | 116.3 | 265.0 | 0.4x |
| ArraysSortTests.Long.testSort | STAGGER | 300000 | 684.3 | 1827.9 | 0.4x |
| ArraysSortTests.Long.testSort | STAGGER | 2000000 | 5022.3 | 14350.6 | 0.3x |
| ArraysSortTests.Long.testSort | SHUFFLE | 800 | 4.4 | 3.3 | 1.3x |
| ArraysSortTests.Long.testSort | SHUFFLE | 7000 | 39.9 | 32.6 | 1.2x |
| ArraysSortTests.Long.testSort | SHUFFLE | 50000 | 614.9 | 274.2 | 2.2x
| | ArraysSortTests.Long.testSort | SHUFFLE | 300000 | 2546.9 | 1999.7 | 1.3x | | ArraysSortTests.Long.testSort | SHUFFLE | 2000000 | 31300.8 | 14655.9 | 2.1x | ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1681080351 From iklam at openjdk.org Wed Aug 16 19:34:32 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 16 Aug 2023 19:34:32 GMT Subject: RFR: 8314249: Refactor handling of invokedynamic in JVMCI ConstantPool [v2] In-Reply-To: <6pkU1viiXcWvXP5KwsD9-8HYEy_SoPkR6-Ea4l9OWhs=.dba83e93-e1e8-4778-b2af-f5a1b7792a73@github.com> References: <6pkU1viiXcWvXP5KwsD9-8HYEy_SoPkR6-Ea4l9OWhs=.dba83e93-e1e8-4778-b2af-f5a1b7792a73@github.com> Message-ID: <-3xlD1aO8Z6hv9_2bHIr8rneyyAhib-UL4Y_r0R9_LM=.47e09803-88f5-4173-b72b-86e32e779d2f@github.com> > This PR is part of the clean up JVMCI to track [JDK-8301993](https://bugs.openjdk.org/browse/JDK-8301993), where the constant pool cache is being removed (as of now, only method references use the CpCache). > > 1. `rawIndexToConstantPoolIndex()` is used only for the `invokedynamic` bytecode. It should be renamed to `indyIndexConstantPoolIndex()` > > 2. `rawIndexToConstantPoolCacheIndex()` should not be called for the `invokedynamic` bytecode, which doesn't use cpCache entries after [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995). > > 3. Some `cpi` parameters should be renamed to `rawIndex` or `which` > > 4. Added a test case for `ConstantPool.lookupAppendix()`, which was not tested in the JDK repo. > > I added comments about the 4 types of indices used in HotSpotConstantPool.java: `cpi`, `rawIndex`, `cpci` and `which`. The latter two types will be removed after [JDK-8301993](https://bugs.openjdk.org/browse/JDK-8301993) is complete. > > Note that there are still some incorrect use of `cpi` in the implementation and test cases. 
Those will be cleaned up in [JDK-8314172](https://bugs.openjdk.org/browse/JDK-8314172) Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @dougxc comments - fixed typos ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15297/files - new: https://git.openjdk.org/jdk/pull/15297/files/337895c6..e09f65fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15297&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15297&range=00-01 Stats: 6 lines in 3 files changed: 1 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/15297.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15297/head:pull/15297 PR: https://git.openjdk.org/jdk/pull/15297 From coleenp at openjdk.org Wed Aug 16 19:53:08 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 16 Aug 2023 19:53:08 GMT Subject: RFR: 8314249: Refactor handling of invokedynamic in JVMCI ConstantPool [v2] In-Reply-To: <-3xlD1aO8Z6hv9_2bHIr8rneyyAhib-UL4Y_r0R9_LM=.47e09803-88f5-4173-b72b-86e32e779d2f@github.com> References: <6pkU1viiXcWvXP5KwsD9-8HYEy_SoPkR6-Ea4l9OWhs=.dba83e93-e1e8-4778-b2af-f5a1b7792a73@github.com> <-3xlD1aO8Z6hv9_2bHIr8rneyyAhib-UL4Y_r0R9_LM=.47e09803-88f5-4173-b72b-86e32e779d2f@github.com> Message-ID: On Wed, 16 Aug 2023 19:34:32 GMT, Ioi Lam wrote: >> This PR is part of the clean up JVMCI to track [JDK-8301993](https://bugs.openjdk.org/browse/JDK-8301993), where the constant pool cache is being removed (as of now, only method references use the CpCache). >> >> 1. `rawIndexToConstantPoolIndex()` is used only for the `invokedynamic` bytecode. It should be renamed to `indyIndexConstantPoolIndex()` >> >> 2. `rawIndexToConstantPoolCacheIndex()` should not be called for the `invokedynamic` bytecode, which doesn't use cpCache entries after [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995). >> >> 3. Some `cpi` parameters should be renamed to `rawIndex` or `which` >> >> 4. 
Added a test case for `ConstantPool.lookupAppendix()`, which was not tested in the JDK repo. >> >> I added comments about the 4 types of indices used in HotSpotConstantPool.java: `cpi`, `rawIndex`, `cpci` and `which`. The latter two types will be removed after [JDK-8301993](https://bugs.openjdk.org/browse/JDK-8301993) is complete. >> >> Note that there are still some incorrect use of `cpi` in the implementation and test cases. Those will be cleaned up in [JDK-8314172](https://bugs.openjdk.org/browse/JDK-8314172) > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @dougxc comments - fixed typos This makes sense. Do we run this test with hotspot testing? src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 588: > 586: @Override > 587: public BootstrapMethodInvocation lookupBootstrapMethodInvocation(int index, int opcode) { > 588: int cpi = opcode == -1 ? index : indyIndexConstantPoolIndex(index, opcode); Why would opcode be -1 here? ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15297#pullrequestreview-1581310736 PR Review Comment: https://git.openjdk.org/jdk/pull/15297#discussion_r1296357340 From dnsimon at openjdk.org Wed Aug 16 20:31:12 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 16 Aug 2023 20:31:12 GMT Subject: RFR: 8314249: Refactor handling of invokedynamic in JVMCI ConstantPool [v2] In-Reply-To: References: <6pkU1viiXcWvXP5KwsD9-8HYEy_SoPkR6-Ea4l9OWhs=.dba83e93-e1e8-4778-b2af-f5a1b7792a73@github.com> <-3xlD1aO8Z6hv9_2bHIr8rneyyAhib-UL4Y_r0R9_LM=.47e09803-88f5-4173-b72b-86e32e779d2f@github.com> Message-ID: <5Tp-G3g7Ib9pLcj3PV1TLwy0H_k6Dyz_Th1OqYyu6yw=.0e516d2d-6b55-447e-afce-e78ef919134a@github.com> On Wed, 16 Aug 2023 19:47:00 GMT, Coleen Phillimore wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @dougxc comments - fixed typos > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 588: > >> 586: @Override >> 587: public BootstrapMethodInvocation lookupBootstrapMethodInvocation(int index, int opcode) { >> 588: int cpi = opcode == -1 ? index : indyIndexConstantPoolIndex(index, opcode); > > Why would opcode be -1 here? So that tests such as TestDynamicConstant.java can [iterate through the constant pool, looking for invokedynamic related entries](https://github.com/openjdk/jdk/blob/f143380d013b8c0e5ab7ca0026c34e27e7946f69/test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.hotspot.test/src/jdk/vm/ci/hotspot/test/TestDynamicConstant.java#L356-L359). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15297#discussion_r1296392827 From duke at openjdk.org Wed Aug 16 21:49:30 2023 From: duke at openjdk.org (iaroslavski) Date: Wed, 16 Aug 2023 21:49:30 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14] In-Reply-To: References: <6Q6Iir5vHOVTSn5Not2MlqgHgRJq9EhKkXpW5kGaGkw=.6ce83c89-aabe-4d9b-a6c3-9bba23e49492@github.com> Message-ID: On Wed, 16 Aug 2023 18:22:19 GMT, Srinivas Vamsi Parasa wrote: >> Hello @vamsi-parasa ! >> >> Do you process negative zeros properly? From one hand -0.0f equals to 0.0f, but negative zeros must be placed before 0.0f. >> See javadoc for Arrays.sort(float[] a). The same situation with -0.0d (double type). > > @iaroslavski > > Hello Vladimir, > > Please see the `Arrays.sort()` performance comparison between the **enhanced DPQS/Radix sort #13568** vs. **AVX512 sort intrinsic** (this PR) using the `ArraysSort.java` JMH [benchmark](https://github.com/openjdk/jdk/pull/13568/files#diff-dee51b13bd1872ff455cec2f29255cfd25014a5dd33dda55a2fc68638c3dd4b2) provided in the PR for [JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)](https://github.com/openjdk/jdk/pull/13568/files#top) ( #13568) > > - The following command line was used to run the benchmarks: ` java -jar $JDK_HOME/build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -jvmArgs "-XX:CompileThreshold=1 -XX:-TieredCompilation" ArraysSortTests` > - Please see the performance numbers below. The scores shown are the average time (us/op), thus lower is better. The last column towards the right shows the speedup. > - Space complexity was not benchmarked as AVX512 sort is an in-place sorting algorithm while the Radix sort needs extra memory. 
>
> | Benchmark | Mode | Size | Enhanced DPQS/Radix (us/op) | AVX512 Sort (us/op) | Speedup |
> | --- | --- | --- | --- | --- | --- |
> | ArraysSortTests.Double.testSort | RANDOM | 800 | 8.4 | 2.6 | 3.3x |
> | ArraysSortTests.Double.testSort | RANDOM | 7000 | 94.4 | 28.0 | 3.4x |
> | ArraysSortTests.Double.testSort | RANDOM | 50000 | 634.9 | 238.4 | 2.7x |
> | ArraysSortTests.Double.testSort | RANDOM | 300000 | 3887.6 | 1721.6 | 2.3x |
> | ArraysSortTests.Double.testSort | RANDOM | 2000000 | 29348.9 | 13854.8 | 2.1x |
> | ArraysSortTests.Double.testSort | REPEATED | 800 | 2.0 | 1.6 | 1.2x |
> | ArraysSortTests.Double.testSort | REPEATED | 7000 | 39.6 | 37.2 | 1.1x |
> | ArraysSortTests.Double.testSort | REPEATED | 50000 | 470.0 | 386.3 | 1.2x |
> | ArraysSortTests.Double.testSort | REPEATED | 300000 | 2940.8 | 342.7 | 8.6x |
> | ArraysSortTests.Double.testSort | REPEATED | 2000000 | 19361.4 | 2838.9 | 6.8x |
> | ArraysSortTests.Double.testSort | STAGGER | 800 | 2.3 | 2.7 | 0.8x |
> | ArraysSortTests.Double.testSort | STAGGER | 7000 | 25.6 | 28.0 | 0.9x |
> | ArraysSortTests.Double.testSort | STAGGER | 50000 | 159.8 | 224.0 | 0.7x |
> | ArraysSortTests.Double.testSort | STAGGER | 300000 | 945.3 | 1566.8 | 0.6x |
> | ArraysSortTests.Double.testSort | STAGGER | 2000000 | 6398.7 | 12668.3 | 0.5x |
> | ArraysSortTests.Double.testSort | SHUFFLE | 800 | 4.9 | 2.7 | 1.8x |
> | ArraysSortTests.Double.testSort | SHUFFLE | 7000 | 55.9 | 27.5 | 2.0x |
> | ArraysSortTests.Double.testSort | SHUFFLE | 50000 | 4...

@vamsi-parasa Hello Vamsi, Many thanks for the great details!
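Vladimir's question about negative zeros concerns the ordering contract of `Arrays.sort(float[])` (and likewise `double[]`): the sort uses the total order of `Float.compareTo`, so `-0.0f` must land before `0.0f` and NaN values at the end, even though `-0.0f == 0.0f` is `true`. Any intrinsic replacement has to reproduce that ordering. The following minimal sketch only exercises the standard library to show the expected result; the class and method names are illustrative and not part of either PR.

```java
import java.util.Arrays;

// Illustrative class; not part of either PR.
class NegativeZeroSortDemo {
    // Returns a sorted copy, leaving the input untouched.
    static float[] sortedCopy(float[] values) {
        float[] copy = values.clone();
        // Arrays.sort(float[]) uses the total order of Float.compareTo:
        // -0.0f sorts before 0.0f and NaN sorts after every other value.
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        float[] data = {0.0f, Float.NaN, -0.0f, 1.0f, -1.0f, Float.NEGATIVE_INFINITY};
        // Prints [-Infinity, -1.0, -0.0, 0.0, 1.0, NaN]
        System.out.println(Arrays.toString(sortedCopy(data)));
        // == treats the two zeros as equal ...
        System.out.println(-0.0f == 0.0f);                  // true
        // ... but the sort order does not.
        System.out.println(Float.compare(-0.0f, 0.0f) < 0); // true
    }
}
```

A test that compares sorted output element by element with `==` would not catch a misplaced `-0.0f`; comparing bit patterns, for example with `Float.floatToRawIntBits`, would.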
------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1681312328 From iklam at openjdk.org Wed Aug 16 22:03:27 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 16 Aug 2023 22:03:27 GMT Subject: RFR: 8314249: Refactor handling of invokedynamic in JVMCI ConstantPool [v2] In-Reply-To: References: <6pkU1viiXcWvXP5KwsD9-8HYEy_SoPkR6-Ea4l9OWhs=.dba83e93-e1e8-4778-b2af-f5a1b7792a73@github.com> <-3xlD1aO8Z6hv9_2bHIr8rneyyAhib-UL4Y_r0R9_LM=.47e09803-88f5-4173-b72b-86e32e779d2f@github.com> Message-ID: On Wed, 16 Aug 2023 19:50:02 GMT, Coleen Phillimore wrote: > This makes sense. Do we run this test with hotspot testing? Yes, tests under `test/hotspot/jtreg/compiler/jvmci` are regularly tested. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15297#issuecomment-1681325411 From fyang at openjdk.org Thu Aug 17 02:15:26 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 17 Aug 2023 02:15:26 GMT Subject: RFR: 8314268: Missing include in assembler_riscv.hpp In-Reply-To: <_lmdoBPLSaog0QT4SOmn5yenC1G8aOJThkw5R-MY5Ho=.c2610b32-0c81-43c9-8686-676f3696f2b9@github.com> References: <_lmdoBPLSaog0QT4SOmn5yenC1G8aOJThkw5R-MY5Ho=.c2610b32-0c81-43c9-8686-676f3696f2b9@github.com> Message-ID: On Tue, 15 Aug 2023 11:50:17 GMT, Robbin Ehn wrote: > Hello, please consider. Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15285#pullrequestreview-1581658503 From rehn at openjdk.org Thu Aug 17 06:49:27 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 17 Aug 2023 06:49:27 GMT Subject: RFR: 8314268: Missing include in assembler_riscv.hpp In-Reply-To: <_lmdoBPLSaog0QT4SOmn5yenC1G8aOJThkw5R-MY5Ho=.c2610b32-0c81-43c9-8686-676f3696f2b9@github.com> References: <_lmdoBPLSaog0QT4SOmn5yenC1G8aOJThkw5R-MY5Ho=.c2610b32-0c81-43c9-8686-676f3696f2b9@github.com> Message-ID: On Tue, 15 Aug 2023 11:50:17 GMT, Robbin Ehn wrote: > Hello, please consider. Thanks for review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15285#issuecomment-1681716856 From fyang at openjdk.org Thu Aug 17 07:45:38 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 17 Aug 2023 07:45:38 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v2] In-Reply-To: References: Message-ID: On Fri, 4 Aug 2023 16:27:50 GMT, Ilya Gavrilin wrote: >> Please review these changes to the RISC-V double rounding intrinsics. >> >> On RISC-V, intrinsics for rounding doubles with a mode (like Math.ceil/floor/rint) were missing. RISC-V has no dedicated instruction for such a conversion, so a two-step conversion is used: double -> long int -> double (using fcvt.l.d, fcvt.d.l). >> >> Also, a rounding mode has to be supplied to the fcvt.x.x instructions. >> >> Rounding mode selection for ceil (similar for floor and rint), according to the Math.ceil requirements: >> >>> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475). >> >> For double -> long int we choose rup (round towards +inf) mode to get the integer that is greater than or equal to the input value. >> For long int -> double we choose rdn (round towards -inf) mode to get the smallest (closest to -inf) representation of the integer obtained from the first conversion. >> >> For inf, NaN, or values of magnitude above 2^63, the input value is returned (any double that large is guaranteed to be an integer). >> Also, when storing the result we copy the sign from the input value (needed because ceil must return -0.0 for inputs in (-1.0, 0.0)). >> >> We have observed significant improvements on hifive and thead boards. >> >> Testing: tier1, tier2 and hotspot:tier3 on hifive. >> >> Performance results on hifive (FpRoundingBenchmark.testceil/floor/rint): >> >> Without intrinsic: >>
>> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
>> FpRoundingBenchmark.testceil         1024  thrpt   25  39.297 ± 0.037  ops/ms
>> FpRoundingBenchmark.testfloor        1024  thrpt   25  39.398 ± 0.018  ops/ms
>> FpRoundingBenchmark.testrint         1024  thrpt   25  36.388 ± 0.844  ops/ms
>>
>> With intrinsic:
>>
>> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
>> FpRoundingBenchmark.testceil         1024  thrpt   25  80.560 ± 0.053  ops/ms
>> FpRoundingBenchmark.testfloor        1024  thrpt   25  80.541 ± 0.081  ops/ms
>> FpRoundingBenchmark.testrint         1024  thrpt   25  80.603 ± 0.071  ops/ms
> > Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: > > Change fsgnj_d(dst, src, src) to fmv_d(dst, src) Changes requested by fyang (Reviewer). src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4250: > 4248: > 4249: // Round double with mode > 4250: We should add a comment here. Maybe: // According to Java SE specification, for floating-point round operations, if // the input is NaN, +/-infinity, or +/-0, the same input is returned as the // rounded result; this differs from the behavior of RISC-V fcvt instructions (which // round out-of-range values to the nearest max or min value), therefore special // handling is needed for NaN, +/-Infinity, +/-0. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4258: > 4256: > 4257: // setting roundig mode to double->long (rm_direct) and long->double (rm_back) conversions > 4258: RoundingMode rm_direct, rm_back; Can we use the same rounding mode for conversions in both directions? Say `rup` for `ceil`, and `rdn` for `floor`. I see this policy is used by both glibc [1] and V8. [1] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/riscv/rv64/rvd/s_ceil.c;h=6c355cd72691c45c97201fe8947683287982ade9;hb=41d8c3bc33bcae1ebb8077b0442caef4917f763a src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1236: > 1234: rmode_floor, > 1235: rmode_rint > 1236: }; Why not use the existing `RoundDoubleModeNode::rmode_ceil`, `RoundDoubleModeNode::rmode_floor` and `RoundDoubleModeNode::rmode_rint` instead?
src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1238: > 1236: }; > 1237: > 1238: void round_double_mode(FloatRegister dst, FloatRegister src, enum Round_double_mode round_mode, Register converted_dbl, Register mask, Register converted_dbl_masked); I would prefer to rename the last three parameters as `tmp1`, `tmp2` and `tmp3`. You could create aliases for these parameters where you feel necessary. src/hotspot/cpu/riscv/riscv.ad line 7706: > 7704: match(Set dst (RoundDoubleMode src rmode)); > 7705: ins_cost(2 * XFER_COST + BRANCH_COST); > 7706: effect(TEMP_DEF dst, TEMP tmp1, TEMP tmp2, TEMP tmp3, KILL cr); Did we kill `cr` anywhere in the assembly code? src/hotspot/cpu/riscv/riscv.ad line 7708: > 7706: effect(TEMP_DEF dst, TEMP tmp1, TEMP tmp2, TEMP tmp3, KILL cr); > 7707: > 7708: format %{ "RoundDoubleMode $src,$rmode" %} Indentation: Please leave a space between the two operands. src/hotspot/cpu/riscv/riscv.ad line 7727: > 7725: } > 7726: %} > 7727: ins_pipe(fp_rnd_d); I think `pipe_class_default` will do here. No need for another new pipe class. 
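The double -> long -> double rounding described in the PR can be modelled in plain Java to check the semantics the intrinsic has to match, including the special cases fyang's suggested comment lists. Java has no directed-rounding conversions, so this sketch emulates the `fcvt.l.d` step (`rup` for ceil, `rdn` for floor, round-to-nearest-even for rint) by truncating toward zero and then correcting. It also returns early for any input of magnitude at least 2^52 rather than the PR's 2^63, since every such double is already an integer. The class and enum names are hypothetical, and this is not the RISC-V code under review.

```java
// Hypothetical model of the double -> long -> double rounding approach;
// not the HotSpot RISC-V implementation.
class RoundViaLong {
    enum Mode { CEIL, FLOOR, RINT }

    // Every double with |x| >= 2^52 (including +/-Infinity) is already integral.
    private static final double INTEGRAL_BOUND = 0x1p52;

    static double round(double x, Mode mode) {
        // NaN, +/-Infinity and large values must come back unchanged; a
        // saturating double -> long conversion would not preserve them.
        if (Double.isNaN(x) || Math.abs(x) >= INTEGRAL_BOUND) {
            return x;
        }
        long t = (long) x; // truncates toward zero; exact for |x| < 2^52
        switch (mode) {
            case CEIL:
                if (x > t) t++;      // emulates rounding toward +inf (rup)
                break;
            case FLOOR:
                if (x < t) t--;      // emulates rounding toward -inf (rdn)
                break;
            case RINT: {
                double frac = x - t; // exact; |frac| < 1
                double a = Math.abs(frac);
                // Round to nearest, ties to even.
                if (a > 0.5 || (a == 0.5 && (t & 1) != 0)) {
                    t += (x > 0) ? 1 : -1;
                }
                break;
            }
        }
        // long -> double is exact here; copying the sign of the input restores
        // -0.0, e.g. ceil(-0.5) yields -0.0 rather than 0.0.
        return Math.copySign((double) t, x);
    }
}
```

Comparing `RoundViaLong.round(x, RoundViaLong.Mode.CEIL)` against `Math.ceil(x)` (and likewise for floor and rint) over inputs such as `-0.5`, `-0.0`, `2.5` and `Double.NaN`, using `Double.doubleToLongBits` so that `-0.0` and `0.0` are distinguished, is a quick way to confirm the special cases.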
------------- PR Review: https://git.openjdk.org/jdk/pull/14991#pullrequestreview-1581868109 PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1296756944 PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1296766101 PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1296734767 PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1296743831 PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1296750058 PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1296751637 PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1296749660 From vkempik at openjdk.org Thu Aug 17 08:17:29 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 17 Aug 2023 08:17:29 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v2] In-Reply-To: References: Message-ID: On Thu, 17 Aug 2023 06:30:47 GMT, Fei Yang wrote: >> Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: >> >> Change fsgnj_d(dst, src, src) to fmv_d(dst, src) > > src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1236: > >> 1234: rmode_floor, >> 1235: rmode_rint >> 1236: }; > > Why not use the existing `RoundDoubleModeNode::rmode_ceil`, `RoundDoubleModeNode::rmode_floor` and `RoundDoubleModeNode::rmode_rint` instead? Not sure it's a good idea: the RoundDoubleModeNode enum is purely a C2 entity. Using a C2 enum in macroAssembler_riscv (not c2_MacroAssembler_riscv) doesn't sound good.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1296845599 From fyang at openjdk.org Thu Aug 17 08:23:27 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 17 Aug 2023 08:23:27 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v2] In-Reply-To: References: Message-ID: <40ZBlZSH8kW1ku-h_JWAOq3h-OtJPbcqV9X6DZ_Ssmg=.2fff99ca-2f9e-45dd-9914-2731663f4c2d@github.com> On Thu, 17 Aug 2023 08:15:01 GMT, Vladimir Kempik wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1236: >> >>> 1234: rmode_floor, >>> 1235: rmode_rint >>> 1236: }; >> >> Why not use the existing `RoundDoubleModeNode::rmode_ceil`, `RoundDoubleModeNode::rmode_floor` and `RoundDoubleModeNode::rmode_rint` instead? > > not sure it's good idea: RoundDoubleModeNode enum is purely C2 entity. Using some C2 enum in macroAssembler_riscv ( not c2_MacroAssembler_riscv) doesn't sound good. Makes sense. So we might further move the new assembler function `MacroAssembler::round_double_mode` into c2_MacroAssembler_riscv, since it's only used by C2 for now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1296852355 From duke at openjdk.org Thu Aug 17 10:02:14 2023 From: duke at openjdk.org (emmyyin) Date: Thu, 17 Aug 2023 10:02:14 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm Message-ID: ### Purpose IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic given how IGV is used. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, this makes each layout more readable.
But on the other hand, the layouts of two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can disrupt the mental model of the graph that the viewer has built up. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". The difference between the algorithms is illustrated in the following video: https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 This work is the result of my Master's thesis, which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). ### Implementation The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager, `HierarchicalStableLayoutManager`, has been added to hold the core algorithm. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used, because the dynamic algorithm needs an initial layout to which the update actions can be applied. The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. Since `HierarchicalStableLayoutManager` tries to preserve the node positions, the layouts might become unreadable after a few iterations.
If that happens, one can "reset" the layout using the `HierarchicalLayoutManager` by clicking the icon for the sea of nodes layout and then going back to the stable sea of nodes layout. ### Testing Testing has mostly been manual. A program that randomly tests graphs has also been used; it can be [found on this branch](https://github.com/emmyyin/jdk/tree/JDK-8309463-test). ### Miscellaneous Some things that are helpful to be aware of: * The code used to test the layout quality and stability for the thesis still remains in the class. * The dynamic algorithm works best for small changes and is recommended for exploration (expanding and collapsing nodes). * Every time a change occurs to the displayed graph, the resulting graph is treated as a completely new graph. It is therefore necessary to go through all the `LayoutNode` objects in both graphs to see if they appear in both graphs, and if so, update the reference from the old node to the new one. The same is done for the edges. To further enhance the algorithm, one could explore: * Automatic resetting with `HierarchicalLayoutManager` * The order in which the update actions are applied * How the layers and positions are chosen for nodes that are to be inserted into the graph ### Known issues * The difference graph functionality (a layout highlighting the difference between two chosen graphs) does not work with the `HierarchicalStableLayoutManager`. ------------- Commit messages: - removing trailing ws - removing unnecessary call - clean up comments - move link check to stable layout manager - speed up layout manager - fic bug: let layout stay the same when no changes to graph - removing trailing ws - fix bug: CFG - bug fix: enable removal of empty layers - fix bug: adding node to empty graph - ...
and 20 more: https://git.openjdk.org/jdk/compare/97df6cf5...9e33de52 Changes: https://git.openjdk.org/jdk/pull/14349/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309463 Stats: 2194 lines in 12 files changed: 2121 ins; 51 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/14349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14349/head:pull/14349 PR: https://git.openjdk.org/jdk/pull/14349 From tholenstein at openjdk.org Thu Aug 17 10:02:19 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 17 Aug 2023 10:02:19 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm In-Reply-To: References: Message-ID: On Wed, 7 Jun 2023 08:33:12 GMT, emmyyin wrote: > ### Purpose > > IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, it makes each layout more readable. But on the other hand, the layout for two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that has occurred to the graph and can damage the internal understanding of the graph that the viewer has obtained. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. > > The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". 
> > The difference between the algorithms is illustrated in the following video: > > > https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 > > > This work is the result of my Master's thesis which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). > > > ### Implementation > > The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appears in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager` has been added which holds the core algorithm. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. > > If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions on. > > The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, that holds the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. > > Since `HierarchicalStableLayoutManager` tries to preserve the node positions, the layouts might get unreadable after a fe... A few tips and comments: - Remove TODOs, unused code (commented out) and personal comments - Adjust copyright year of all touched files to 2023 - Add comments to important and/or complex parts of your code - Can we somehow separate the evaluation from the layout algorithm? - Either in separate files or at least disable evaluation by default with a flag - Do we want to keep the shortcuts? 
- Perhaps keep `LayoutAction1` for relayout with static layout algorithm - rename `LayoutAction1` and give it a better shortcut - ideally leave `HierarchicalLayoutManager.java` untouched or at least make sure other parts of IGV work as before - rename `new_layout.png` , `NewLayoutManager.java` and `EnableNewLayoutAction.java` - make sure to document in the PR description what changed other than the new layout algorithm - e.g. what changed in the settings (`ANIMATION_LIMIT`) - what changed in `ServerCompilerScheduler.java` src/utils/IdealGraphVisualizer/ControlFlow/src/main/java/com/sun/hotspot/igv/controlflow/HierarchicalGraphLayout.java line 164: > 162: > 163: LayoutGraph layoutGraph = new LayoutGraph(links, vertices); > 164: m.doLayout(layoutGraph); should be uncommented src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalLayoutManager.java line 1463: > 1461: boolean hasReversedDown = > 1462: reversedDown.size() > 0 && > 1463: !(reversedDown.size() == 1 && hasSelfEdge); Please avoid style changes to code that you did not touch otherwise (sometimes the IDE does this automatically) src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/DiagramScene.java line 1221: > 1219: HashSet visibleConnections = getVisibleConnections(); > 1220: > 1221: String key = getModel().getGraph().getGroup().getName() + "::" + getModel().getGraph().getName(); What is the purpose of this? Is it needed? src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/DiagramViewModel.java line 369: > 367: Scheduler s = Lookup.getDefault().lookup(Scheduler.class); > 368: graph.clearBlocks(); > 369: s.schedule(graph); was this commented out on purpose? 
If yes, we can remove the code.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14349#issuecomment-1580239463
PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1221161139
PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1221210767
PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1221216343
PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1221218395

From rcastanedalo at openjdk.org Thu Aug 17 10:02:20 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 17 Aug 2023 10:02:20 GMT
Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm
In-Reply-To:
References:
Message-ID:

On Wed, 7 Jun 2023 08:33:12 GMT, emmyyin wrote:

> ### Purpose
>
> IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, this makes each layout more readable. But on the other hand, the layouts of two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can disrupt the understanding of the graph that the viewer has built up. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR.
>
> The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes".
>
> The difference between the algorithms is illustrated in the following video:
>
> https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1
>
> This work is the result of my Master's thesis, which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf).
>
> ### Implementation
>
> The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager` has been added, which holds the core algorithm. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`.
>
> If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions on.
>
> The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions.
>
> Since `HierarchicalStableLayoutManager` tries to preserve the node positions, the layouts might get unreadable after a fe...

Thanks Emmy for creating this pull request! The changes related to improving animations (merged from https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:igv-stable) are orthogonal to the addition of a dynamic layout algorithm and should be extracted, I think, into a separate RFE. That would simplify the task of reviewing and testing your core changes.

> Thanks Emmy for creating this pull request!
>
> The changes related to improving animations (merged from [master...robcasloz:jdk:igv-stable](https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:igv-stable)) are orthogonal to the addition of a dynamic layout algorithm and should be extracted, I think, into a separate RFE. That would simplify the task of reviewing and testing your core changes.

@tobiasholenstein has extracted the animation stuff and put your core work in a branch: https://github.com/openjdk/jdk/compare/master...tobiasholenstein:jdk:rebase-toby (thanks @tobiasholenstein!). That branch is probably a better starting point for this PR.

I just tried out the PR a little bit, and it is starting to look good and robust. Good job, Emmy!

Sometimes, the layout in "Stable sea of nodes" changes when going from one graph to the next in the group even though there are no graph changes at all. Is there any simple solution for that? E.g. not re-computing the layout at all if no graph changes are detected.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14349#issuecomment-1582417671
PR Comment: https://git.openjdk.org/jdk/pull/14349#issuecomment-1586725452
PR Comment: https://git.openjdk.org/jdk/pull/14349#issuecomment-1674363021

From duke at openjdk.org Thu Aug 17 10:02:21 2023
From: duke at openjdk.org (emmyyin)
Date: Thu, 17 Aug 2023 10:02:21 GMT
Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm
In-Reply-To:
References:
Message-ID:

On Wed, 7 Jun 2023 08:33:12 GMT, emmyyin wrote:

> ### Purpose
>
> IGV currently uses a static layout algorithm to visualize graphs. ...

Some TODOs:

- [x] Fix relayout: when the layout drawn by the stable layout manager is no longer readable, redraw it using the regular layout manager. When the user switches back and forth (stable -> regular -> stable), the layout should be reset
- [x] Ensure filters work properly
- [x] Fix CF for graphs drawn by the stable layout manager (in the right-hand window panel)

Note: self-edges are not handled at the moment. Is that something that should be considered?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14349#issuecomment-1667771411
PR Comment: https://git.openjdk.org/jdk/pull/14349#issuecomment-1674327661

From rcastanedalo at openjdk.org Thu Aug 17 10:02:22 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 17 Aug 2023 10:02:22 GMT
Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm
In-Reply-To:
References:
Message-ID: <3aYySAZ5d15Uyx1qzzLENbsBjxc5J3BV1my27QYtt58=.7523b2f0-7cf8-455e-8790-468321432a56@github.com>

On Fri, 11 Aug 2023 07:35:55 GMT, emmyyin wrote:

> Note: self-edges are not handled at the moment. Is that something that should be considered?

Self-edges are only relevant in the CFG view, so we don't need to consider them in the scope of this work.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14349#issuecomment-1674361667

From duke at openjdk.org Thu Aug 17 10:18:11 2023
From: duke at openjdk.org (emmyyin)
Date: Thu, 17 Aug 2023 10:18:11 GMT
Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v2]
In-Reply-To:
References:
Message-ID: <2T20E7tdjqEOnPlFR7qh1URlG5nYnOQxK0Jn_m6IuZ8=.6566a3c0-1dce-4603-8031-4f025fb59d8c@github.com>

> ### Purpose
>
> IGV currently uses a static layout algorithm to visualize graphs. ...

emmyyin has updated the pull request incrementally with one additional commit since the last revision:

  adding back comment

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/14349/files
 - new: https://git.openjdk.org/jdk/pull/14349/files/9e33de52..f7494095

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=00-01

Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/14349.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14349/head:pull/14349

PR: https://git.openjdk.org/jdk/pull/14349

From duke at openjdk.org Thu Aug 17 10:55:46 2023
From: duke at openjdk.org (Kimura Yukihiro)
Date: Thu, 17 Aug 2023 10:55:46 GMT
Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError
Message-ID:

I would like to fix this issue because it is difficult for testers to understand why the test failed.
There is no risk as I just added an assertion message instead of exit code error.
I would appreciate it if someone could review the fix.
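For illustration, one reading of an output-based check (as opposed to treating any non-zero exit code as the adapter failure) is sketched below as self-contained Java. The class, the `checkOutput` helper and the sample output strings are hypothetical; the marker string is the one quoted later in this thread. A real jtreg test would more likely use the test library's `OutputAnalyzer` than this stand-in.

```java
// Hypothetical sketch: accept a non-zero exit only if the VM reported the
// expected adapter allocation failure; otherwise fail with a descriptive message.
final class AdapterOomCheck {
    // Marker string taken from the review discussion (assumed VM output).
    static final String ADAPTER_OOM = "Out of space in CodeCache for adapters";

    static void checkOutput(int exitValue, String output) {
        if (exitValue == 0) {
            return; // normal run, nothing to diagnose
        }
        if (!output.contains(ADAPTER_OOM)) {
            // Non-zero exit for some other reason: surface the output to the tester.
            throw new AssertionError("Unexpected failure (exit code " + exitValue
                    + "): output does not mention adapter allocation:\n" + output);
        }
        // Expected failure mode: the code cache filled up while creating adapters.
    }

    public static void main(String[] args) {
        checkOutput(0, "");
        checkOutput(1, "made-up VM output\n" + ADAPTER_OOM); // accepted
        try {
            checkOutput(1, "some unrelated crash");
            System.out.println("no assertion raised");
        } catch (AssertionError e) {
            System.out.println("assertion raised as expected");
        }
    }
}
```

The design choice here mirrors the concern that a non-zero exit code alone says nothing about its cause: the check only passes when both the exit status and the diagnostic text agree.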
-------------

Commit messages:
 - 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError

Changes: https://git.openjdk.org/jdk/pull/15329/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15329&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8313901
Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/15329.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/15329/head:pull/15329

PR: https://git.openjdk.org/jdk/pull/15329

From shade at openjdk.org Thu Aug 17 11:08:26 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 17 Aug 2023 11:08:26 GMT
Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError
In-Reply-To:
References:
Message-ID:

On Thu, 17 Aug 2023 10:48:49 GMT, Kimura Yukihiro wrote:

> I would like to fix this issue because it is difficult for testers to understand why the test failed.
> There is no risk as I just added an assertion message instead of exit code error.
> I would appreciate it if someone could review the fix.

Sorry, but we should not do this, for two reasons:

1. We cannot be sure that a non-zero exit code means we failed to create adapters.
2. The stderr printout (as shown in the bug) provides enough diagnostic breadcrumbs.

-------------

Changes requested by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15329#pullrequestreview-1582373547

From thartmann at openjdk.org Thu Aug 17 11:17:31 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 17 Aug 2023 11:17:31 GMT
Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError
In-Reply-To:
References:
Message-ID:

On Thu, 17 Aug 2023 10:48:49 GMT, Kimura Yukihiro wrote:

> I would like to fix this issue because it is difficult for testers to understand why the test failed.
> There is no risk as I just added an assertion message instead of exit code error.
> I would appreciate it if someone could review the fix.

`compiler/startup/StartupOutput.java` does the same and should then probably be improved as well. I think we can simply assert that the output contains "Out of space in CodeCache for adapters" in this case.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15329#issuecomment-1682101643

From rcastanedalo at openjdk.org Thu Aug 17 12:44:55 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 17 Aug 2023 12:44:55 GMT
Subject: RFR: 8312749: Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index))
Message-ID:

This changeset ensures that the array copy stub underlying the intrinsic implementation of `Object.clone` only copies the (double-word aligned) payload of the cloned object, excluding the remaining object alignment padding words, when a non-default `ObjectAlignmentInBytes` value is used. This prevents the specialized ZGC stubs for `Object[]` array copy from processing undefined object alignment padding words as valid object pointers. For further details about the specific failure, see the [initial analysis](https://bugs.openjdk.org/browse/JDK-8312749?focusedCommentId=14600658&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14600658) by Erik Österlund and Stefan Karlsson and the comments in the regression test included in this changeset.

As a side benefit, the changeset simplifies the array size computation logic in `GraphKit::new_array()` by decoupling the computation of the header size from that of the alignment padding size.
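The payload/padding distinction can be made concrete with a little arithmetic. The Java sketch below is not HotSpot code: `alignUp`, the 16-byte header, the 8-byte elements (uncompressed oops) and the `Object[3]` example are illustrative assumptions. It rounds the header-plus-elements size up to a double word to get the payload, and only then up to `ObjectAlignmentInBytes` to get the allocated size. The difference between the two is the padding that an oop-by-oop copy of the whole allocation would wrongly visit.

```java
// Illustrative arithmetic only; sizes are assumptions, not HotSpot's real layout code.
final class ClonePayload {
    static final long WORD = 8; // bytes per word on a 64-bit VM

    // Round value up to a multiple of alignment (alignment must be a power of two).
    static long alignUp(long value, long alignment) {
        return (value + alignment - 1) & -alignment;
    }

    // Bytes that actually hold data: header plus elements, double-word aligned.
    static long payloadBytes(long headerBytes, long elemBytes, long length) {
        return alignUp(headerBytes + elemBytes * length, WORD);
    }

    // Bytes reserved for the object once the object alignment is applied at the end.
    static long allocatedBytes(long headerBytes, long elemBytes, long length, long objectAlignment) {
        return alignUp(payloadBytes(headerBytes, elemBytes, length), objectAlignment);
    }

    public static void main(String[] args) {
        long header = 16, elem = 8, length = 3; // assumed Object[3] layout
        for (long align : new long[] {8, 16, 32}) {
            long payload = payloadBytes(header, elem, length);
            long allocated = allocatedBytes(header, elem, length, align);
            System.out.printf("align=%2d payload=%d words, allocated=%d words, padding=%d words%n",
                    align, payload / WORD, allocated / WORD, (allocated - payload) / WORD);
        }
    }
}
```

With the assumed sizes, the payload stays at 5 words while the allocation grows to 6 and 8 words for 16- and 32-byte alignment. Padding words therefore appear only with non-default alignments, which matches the condition described in the changeset.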
#### Testing

##### Functionality

- tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64)
- tier4-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; ZGC-specific tests only)
- tier6-9 (linux-x64; ZGC-specific tests only)
- tier1-3, and a few custom examples, applying [JDK-8139457](https://github.com/openjdk/jdk/pull/11044) (under review) on top of this changeset

##### Performance

Tested performance on the following set of OpenJDK micro-benchmarks on linux-x64 (for both G1 and ZGC, using different `ObjectAlignmentInBytes` values):

- `openjdk.bench.java.lang.ArrayClone.byteClone`
- `openjdk.bench.java.lang.ArrayClone.intClone`
- `openjdk.bench.java.lang.ArrayFiddle.simple_clone`
- `openjdk.bench.java.lang.Clone.cloneLarge`
- `openjdk.bench.java.lang.Clone.cloneThreeDifferent`

No significant regression was observed.

-------------

Commit messages:
 - Remove extra whitespace
 - Revert use of UseNewCode
 - Revert "TEMPORARY: add additional macro-assembly comments"
 - Revert "TEMPORARY: set UseNewCode to true by default"
 - Revert "TEMPORARY: print only 'oop_disjoint_arraycopy_uninit' stub code"
 - Require GenZGC in the test
 - Round up object size at the end of the computation
 - Comment and rename for clarity
 - Add a regression test
 - Remove unused variable
 - ... and 8 more: https://git.openjdk.org/jdk/compare/ec2f38fd...5c56a5e5

Changes: https://git.openjdk.org/jdk/pull/15288/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15288&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8312749
Stats: 114 lines in 4 files changed: 89 ins; 9 del; 16 mod
Patch: https://git.openjdk.org/jdk/pull/15288.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/15288/head:pull/15288

PR: https://git.openjdk.org/jdk/pull/15288

From rcastanedalo at openjdk.org Thu Aug 17 13:02:33 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 17 Aug 2023 13:02:33 GMT
Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v2]
In-Reply-To: <2T20E7tdjqEOnPlFR7qh1URlG5nYnOQxK0Jn_m6IuZ8=.6566a3c0-1dce-4603-8031-4f025fb59d8c@github.com>
References: <2T20E7tdjqEOnPlFR7qh1URlG5nYnOQxK0Jn_m6IuZ8=.6566a3c0-1dce-4603-8031-4f025fb59d8c@github.com>
Message-ID:

On Thu, 17 Aug 2023 10:18:11 GMT, emmyyin wrote:

>> ### Purpose
>>
>> IGV currently uses a static layout algorithm to visualize graphs. ...
>
> emmyyin has updated the pull request incrementally with one additional commit since the last revision:
>
>   adding back comment

src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalLayoutManager.java line 353:

> 351: }
> 352:
> 353: // THIS PART MIGHT NOT BE NECESSARY SINCE ALL EDGES CAN BE DRAWN FROM BOTTOM UP

This TODO item has now been lifted to https://bugs.openjdk.org/browse/JDK-8314512; please remove it from here.
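The incremental idea from the PR description can be pictured with the deliberately simplified Java sketch below: retained nodes stay in place, deleted nodes are dropped, new nodes are added, and the static algorithm is used when there is no previous layout. It is hypothetical and is not the code of `HierarchicalStableLayoutManager`. A layout is reduced to a map from node IDs to positions, `staticLayout` stands in for `HierarchicalLayoutManager`, and new nodes are simply placed on a fresh row. The sketch also reuses the previous layout unchanged when the graph is identical, which is one way to implement the suggestion of not recomputing the layout when no graph changes are detected.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of a position-preserving ("stable") layout.
final class StableLayoutSketch {
    record Point(int x, int y) {}
    record Edge(int from, int to) {}
    record Graph(Set<Integer> nodes, Set<Edge> edges) {}
    // A layout remembers the graph it was computed for and each node's position.
    record Layout(Graph graph, Map<Integer, Point> positions) {}

    static final int ROW_HEIGHT = 50;
    static final int COLUMN_WIDTH = 80;

    // Stand-in for a full static layout: one node per column on a single row.
    static Layout staticLayout(Graph g) {
        Map<Integer, Point> pos = new LinkedHashMap<>();
        int col = 0;
        for (int id : g.nodes()) {
            pos.put(id, new Point(col++ * COLUMN_WIDTH, 0));
        }
        return new Layout(g, pos);
    }

    static Layout stableLayout(Layout previous, Graph next) {
        if (previous == null) {
            return staticLayout(next); // no earlier layout to stay stable against
        }
        if (previous.graph().equals(next)) {
            return previous; // same nodes and edges: reuse the layout as-is
        }
        Map<Integer, Point> pos = new LinkedHashMap<>();
        int maxY = 0;
        for (Map.Entry<Integer, Point> e : previous.positions().entrySet()) {
            if (next.nodes().contains(e.getKey())) { // removed nodes are dropped
                pos.put(e.getKey(), e.getValue()); // retained node keeps its position
                maxY = Math.max(maxY, e.getValue().y());
            }
        }
        int col = 0;
        for (int id : next.nodes()) {
            if (!pos.containsKey(id)) { // new node: place it on a fresh row below
                pos.put(id, new Point(col++ * COLUMN_WIDTH, maxY + ROW_HEIGHT));
            }
        }
        return new Layout(next, pos);
    }

    public static void main(String[] args) {
        Graph g1 = new Graph(Set.of(1, 2, 3), Set.of(new Edge(1, 2), new Edge(2, 3)));
        Graph g2 = new Graph(Set.of(1, 3, 4), Set.of(new Edge(1, 3), new Edge(3, 4)));
        Layout l1 = stableLayout(null, g1);
        Layout l2 = stableLayout(l1, g2);
        System.out.println("node 3 kept its position: "
                + l1.positions().get(3).equals(l2.positions().get(3)));
    }
}
```

A real implementation also has to route edges and keep the layout readable as changes accumulate. The sketch only shows the position-preserving part.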
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1297187380 From duke at openjdk.org Thu Aug 17 13:19:06 2023 From: duke at openjdk.org (emmyyin) Date: Thu, 17 Aug 2023 13:19:06 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v3] In-Reply-To: References: Message-ID: > ### Purpose > > IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, it makes each layout more readable. But on the other hand, the layout for two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that has occurred to the graph and can damage the internal understanding of the graph that the viewer has obtained. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. > > The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". > > The difference between the algorithms is illustrated in the following video: > > > https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 > > > This work is the result of my Master's thesis which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). > > > ### Implementation > > The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. 
By doing so, the nodes that appears in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager` has been added which holds the core algorithm. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. > > If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions on. > > The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, that holds the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. > > Since `HierarchicalStableLayoutManager` tries to preserve the node positions, the layouts might get unreadable after a fe... emmyyin has updated the pull request incrementally with one additional commit since the last revision: remove comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14349/files - new: https://git.openjdk.org/jdk/pull/14349/files/f7494095..a6712410 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14349/head:pull/14349 PR: https://git.openjdk.org/jdk/pull/14349 From duke at openjdk.org Thu Aug 17 14:12:26 2023 From: duke at openjdk.org (emmyyin) Date: Thu, 17 Aug 2023 14:12:26 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v4] In-Reply-To: References: Message-ID: > ### Purpose > > IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. 
A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, it makes each layout more readable. But on the other hand, the layout for two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that has occurred to the graph and can damage the internal understanding of the graph that the viewer has obtained. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. > > The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". > > The difference between the algorithms is illustrated in the following video: > > > https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 > > > This work is the result of my Master's thesis which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). > > > ### Implementation > > The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appears in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager` has been added which holds the core algorithm. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. > > If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions on. 
> > The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, that holds the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. > > Since `HierarchicalStableLayoutManager` tries to preserve the node positions, the layouts might get unreadable after a fe... emmyyin has updated the pull request incrementally with two additional commits since the last revision: - Accept suggestion of deactivating stable sea of nodes for diff graphs Co-authored-by: Toby Holenstein - adding blank line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14349/files - new: https://git.openjdk.org/jdk/pull/14349/files/a6712410..13a10201 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=02-03 Stats: 11 lines in 1 file changed: 11 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14349/head:pull/14349 PR: https://git.openjdk.org/jdk/pull/14349 From tholenstein at openjdk.org Thu Aug 17 14:12:28 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 17 Aug 2023 14:12:28 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v4] In-Reply-To: References: Message-ID: On Thu, 17 Aug 2023 14:06:36 GMT, emmyyin wrote: >> ### Purpose >> >> IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, it makes each layout more readable. 
But on the other hand, the layout for two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that has occurred to the graph and can damage the internal understanding of the graph that the viewer has obtained. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. >> >> The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". >> >> The difference between the algorithms is illustrated in the following video: >> >> >> https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 >> >> >> This work is the result of my Master's thesis which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). >> >> >> ### Implementation >> >> The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appears in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager` has been added which holds the core algorithm. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. >> >> If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions on. >> >> The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, that holds the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. >> >> Since `HierarchicalStableLayoutManager` tries to preserve the node positi... 
> > emmyyin has updated the pull request incrementally with two additional commits since the last revision: > > - Accept suggestion of deactivating stable sea of nodes for diff graphs > > Co-authored-by: Toby Holenstein > - adding blank line src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java line 198: > 196: layoutButtons.add(cfgLayoutButton); > 197: toolBar.add(cfgLayoutButton); > 198: I suggest to deactivate `HierarchicalStableLayoutManager` for difference graphs regarding > The difference graph functionality (layout highlighting the difference between two chosen graphs) is not working for the HierarchicalStableLayoutManager. Suggestion: diagramViewModel.getGraphChangedEvent().addListener(model -> { // HierarchicalStableLayoutManager is not stable for difference graphs boolean isDiffGraph = model.getGraph().isDiffGraph(); // deactivate HierarchicalStableLayoutManager for difference graphs stableSeaLayoutButton.setEnabled(!isDiffGraph); if (stableSeaLayoutButton.isSelected() && isDiffGraph) { // fallback to HierarchicalLayoutManager for difference graphs seaLayoutButton.setSelected(true); } }); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1297273972 From duke at openjdk.org Thu Aug 17 14:17:24 2023 From: duke at openjdk.org (emmyyin) Date: Thu, 17 Aug 2023 14:17:24 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v5] In-Reply-To: References: Message-ID: <7RAXRCktzo9NArTyV_NBwxxLl7zgCaxurRLwwwKzeAM=.91d9abbe-12e0-43af-8e5c-b13052629a56@github.com> > ### Purpose > > IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. 
Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, it makes each layout more readable. But on the other hand, the layout for two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can damage the internal understanding of the graph that the viewer has obtained. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. > > The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". > > The difference between the algorithms is illustrated in the following video: > > > https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 > > > This work is the result of my Master's thesis, which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). > > > ### Implementation > > The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager, `HierarchicalStableLayoutManager`, which holds the core algorithm, has been added. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. > > If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions to. > > The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size.
These are updated, added and removed in accordance with the update actions. > > Since `HierarchicalStableLayoutManager` tries to preserve the node positions, the layouts might get unreadable after a fe... emmyyin has updated the pull request incrementally with one additional commit since the last revision: fixing ws error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14349/files - new: https://git.openjdk.org/jdk/pull/14349/files/13a10201..9d803146 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14349/head:pull/14349 PR: https://git.openjdk.org/jdk/pull/14349 From rehn at openjdk.org Thu Aug 17 14:48:34 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 17 Aug 2023 14:48:34 GMT Subject: Integrated: 8314268: Missing include in assembler_riscv.hpp In-Reply-To: <_lmdoBPLSaog0QT4SOmn5yenC1G8aOJThkw5R-MY5Ho=.c2610b32-0c81-43c9-8686-676f3696f2b9@github.com> References: <_lmdoBPLSaog0QT4SOmn5yenC1G8aOJThkw5R-MY5Ho=.c2610b32-0c81-43c9-8686-676f3696f2b9@github.com> Message-ID: On Tue, 15 Aug 2023 11:50:17 GMT, Robbin Ehn wrote: > Hello, please consider. This pull request has now been integrated. 
Changeset: e8f6b3e4 Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/e8f6b3e4970000e721da9312585e77de49bb8ed8 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8314268: Missing include in assembler_riscv.hpp Reviewed-by: shade, fyang ------------- PR: https://git.openjdk.org/jdk/pull/15285 From shade at openjdk.org Thu Aug 17 15:14:38 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 17 Aug 2023 15:14:38 GMT Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError In-Reply-To: References: Message-ID: On Thu, 17 Aug 2023 10:48:49 GMT, Kimura Yukihiro wrote: > I would like to fix this issue because it is difficult for testers to understand why the test failed. > There is no risk as I just added an assertion message instead of exit code error. > I would appreciate it if someone could review the fix. Ah, I misread the intent of the fix, I think. I agree we should just check that the output contains `Out of space in CodeCache for adapters`, and ignore the exit code. Or better: accept exit code 0; otherwise check that the output contains the failing message. This would not fail the test if it does not reach code cache exhaustion for some reason. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15329#issuecomment-1682456288 PR Comment: https://git.openjdk.org/jdk/pull/15329#issuecomment-1682458629 From bulasevich at openjdk.org Thu Aug 17 15:29:29 2023 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 17 Aug 2023 15:29:29 GMT Subject: RFR: 8313419: Template interpreter produces no safepoint check for return bytecodes In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 13:49:55 GMT, Fredrik Bredberg wrote: > would appreciate if @RealFYang and @bulasevich could take it for a real test drive. Right now (after the "8301996: Move field resolution" commit) arm32 is broken. I tested this change with the latest working revision: tier1-tier3 is fine.
Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15248#issuecomment-1682484190 From kvn at openjdk.org Thu Aug 17 16:25:32 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 17 Aug 2023 16:25:32 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v22] In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 19:17:48 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. >> >> An explanation for the path chosen in the PR to bring in the SIMD Arrays.sort at the top level instead of only bringing in the smaller components from the algorithm is as follows: the key components of Arrays.sort are pivot selection, partitioning, and partition sort. Among these, the two hottest components are partitioning and partition sort. Both could be individually accelerated using SIMD implementations. However, what we noticed was that just bringing in these two individual optimizations gave us half the performance gain compared with bringing in the entire AVX512 SIMD sort. AVX512 SIMD sort implements a single-pivot quicksort algorithm (SPQS) by selecting a single pivot and then recursively partitioning the array into two smaller partitions using SIMD instructions. When the partition size becomes less than or equal to 128, it uses a SIMD bitonic sort using x86 AVX512 intrinsics to sort that partition. However, the default implementation of Arrays.sort() in Java is the dual-pivot quicksort (DPQS), not the SPQS.
If the partitioning in the DPQS is implemented using AVX512, it needs two passes of the single-pivot AVX512 partitioning function (instead of just one in the case of SPQS), thereby losing 50% of the performance.
>>
>> **Arrays.sort performance data using JMH benchmarks**
>>
>> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
>> | --- | --- | --- | --- | --- |
>> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x |
>> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x |
>> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** |
>> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** |
>> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x |
>> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x |
>> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** |
>> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** |
>> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x |
>> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x |
>> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** |
>> ...
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix preservation of NaNs for floats and doubles

Improvements are nice but it would not pay off if you have big regressions. I can accept 0.9x but 0.4x - 0.8x regressions should be investigated and implementation adjusted to avoid them.
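The PR describes SPQS as repeatedly partitioning around one pivot and finishing partitions of 128 or fewer elements with a bitonic sort. As a rough scalar illustration of that control flow only, here is a Java sketch. It uses no SIMD, substitutes insertion sort for the AVX512 bitonic network, and its middle-element pivot with Hoare partitioning is an assumption, not what the intrinsic actually does.

```java
import java.util.Arrays;
import java.util.Random;

// Scalar sketch of single-pivot quicksort (SPQS) with a small-partition cutoff.
// Assumptions: insertion sort stands in for the SIMD bitonic sort, and the pivot
// choice / Hoare partition scheme are illustrative, not taken from the PR.
public final class SpqsSketch {

    // Cutoff from the PR description: partitions of <= 128 elements get a small-array sort.
    static final int SMALL_PARTITION = 128;

    public static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int lo, int hi) {
        while (hi - lo + 1 > SMALL_PARTITION) {
            // One partitioning pass per level: a single pivot splits the range in two.
            int j = partition(a, lo, hi);
            // Recurse on the smaller half and loop on the larger one to bound stack depth.
            if (j - lo < hi - j) {
                sort(a, lo, j);
                lo = j + 1;
            } else {
                sort(a, j + 1, hi);
                hi = j;
            }
        }
        insertionSort(a, lo, hi);
    }

    // Hoare partition around the middle element; afterwards a[lo..j] <= pivot <= a[j+1..hi].
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo - 1;
        int j = hi + 1;
        while (true) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j) {
                return j;
            }
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }

    // Stand-in for the bitonic network used on small partitions.
    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int x = a[i];
            int k = i - 1;
            while (k >= lo && a[k] > x) {
                a[k + 1] = a[k];
                k--;
            }
            a[k + 1] = x;
        }
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(10_000).toArray();
        int[] expected = data.clone();
        Arrays.sort(expected);
        sort(data);
        System.out.println(Arrays.equals(data, expected)); // true
    }
}
```

Each loop iteration makes exactly one partitioning pass over its range. Per the PR's argument, a dual-pivot partition built from the same single-pivot SIMD kernel would need two such passes per level.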
------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1682592597 From pchilanomate at openjdk.org Thu Aug 17 17:07:31 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 17 Aug 2023 17:07:31 GMT Subject: RFR: 8313419: Template interpreter produces no safepoint check for return bytecodes In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 13:22:19 GMT, Fredrik Bredberg wrote: > The template interpreter produces a safepoint check for return bytecodes (TemplateTable::_return(TosState state)) on x86, ppc64le and s390, but not on aarch64, arm32, and riscv64. > > This PR adds the missing safepoint check to aarch64, arm32, and riscv64. > > Tested tier1-tier7 on aarch64. Both arm32 and riscv64 were sanity tested using Qemu. Looks good to me.

src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2206:

> 2204:     __ push(state);
> 2205:     __ push_cont_fastpath(rthread);
> 2206:     __ call_VM(noreg, CAST_FROM_FN_PTR(address, InterpreterRuntime::at_safepoint));

Looking at the code generated for the existing safepoint poll (`TemplateInterpreterGenerator::generate_safept_entry_for()`) I see we add a full memory barrier after the return from `InterpreterRuntime::at_safepoint()`. That would call for adding it here too, although I don't see why we need it. The SafepointMechanism logic already executes the proper barriers after we process pending operations. Same thing for riscv. ------------- Marked as reviewed by pchilanomate (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15248#pullrequestreview-1583055387 PR Review Comment: https://git.openjdk.org/jdk/pull/15248#discussion_r1297495880 From jvernee at openjdk.org Thu Aug 17 17:15:36 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 17 Aug 2023 17:15:36 GMT Subject: RFR: 8314452: Explicitly indicate inlining success/failure in PrintInlining Message-ID: This patch proposes to add a `+` or `-` to messages produced by `PrintInlining`, to indicate whether inlining succeeded or failed. This makes it easier to find inlining failures in an inlining trace, without having to rely on the message to figure out whether inlining succeeded or failed. Looking at inlining failures is often useful for diagnosing the results of benchmarks, but it can be hard to find inlining failures in lengthy traces.

A sample of what this looks like:

    +@ 0 java.lang.foreign.Arena::ofConfined (10 bytes) inline (hot)
    +@ 0 java.lang.Thread::currentThread (0 bytes) (intrinsic)
    +@ 3 jdk.internal.foreign.MemorySessionImpl::createConfined (9 bytes) inline (hot)
    +@ 5 jdk.internal.foreign.ConfinedSession:: (18 bytes) inline (hot)
    +@ 6 jdk.internal.foreign.ConfinedSession$ConfinedResourceList:: (5 bytes) inline (hot)
    +@ 1 jdk.internal.foreign.MemorySessionImpl$ResourceList:: (5 bytes) inline (hot)
    +@ 1 java.lang.Object:: (1 bytes) inline (hot)
    +@ 9 jdk.internal.foreign.MemorySessionImpl:: (20 bytes) inline (hot)
    +@ 1 java.lang.Object:: (1 bytes) inline (hot)
    +@ 6 jdk.internal.foreign.MemorySessionImpl::asArena (9 bytes) inline (hot)
    +@ 5 jdk.internal.foreign.MemorySessionImpl$1:: (10 bytes) inline (hot)
    +@ 6 java.lang.Object:: (1 bytes) inline (hot)
    -@ 8 java.lang.foreign.SegmentAllocator::allocate (24 bytes) already compiled into a big method

Using `grep`/`sls` to find inlining failures:

    > Get-Content inlining_trace.txt | sls '-@'
    -@ 8 java.lang.foreign.SegmentAllocator::allocate (24 bytes) already compiled into a big method
    -@ 34 java.lang.foreign.SegmentAllocator::allocate (24 bytes) already compiled into a big method
    -@ 19 java.lang.invoke.MethodHandle::linkToNative(JJJL)D (0 bytes) native call
    -@ 95 java.lang.foreign.Arena::close (0 bytes) virtual call
    -@ 107 jdk.internal.foreign.MemorySessionImpl::release0 (0 bytes) virtual call
    -@ 14 jdk.internal.misc.Unsafe::freeMemory0 (0 bytes) native method

Note on the implementation: I opted for an enum to indicate inlining success/failure. I was using `bool` first, but ran into issues in some cases because the 'message' pointer was being implicitly converted to `bool`, and since the message itself is optional (`nullptr` by default) this didn't result in compilation errors, but silently omitted the inlining message instead. Using an enum avoids that issue. It also makes the call site a little easier to read, since there are no more `true` and `false` literals. ------------- Commit messages: - polish - kind -> kind_of - Explicitly track inlining success/failure Changes: https://git.openjdk.org/jdk/pull/15315/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15315&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314452 Stats: 51 lines in 10 files changed: 16 ins; 2 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/15315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15315/head:pull/15315 PR: https://git.openjdk.org/jdk/pull/15315 From duke at openjdk.org Thu Aug 17 17:27:29 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 17 Aug 2023 17:27:29 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v22] In-Reply-To: References: Message-ID: On Thu, 17 Aug 2023 16:22:54 GMT, Vladimir Kozlov wrote: > Improvements are nice but it would not pay off if you have big regressions. I can accept 0.9x but 0.4x - 0.8x regressions should be investigated and implementation adjusted to avoid them. Hi Vladimir, Thank you for the suggestion!
Currently, AVX512sort is doing well for Random, Repeated and Shuffle patterns of input data. The regressions are observed for Staggered (Wave) pattern of input data. Will investigate the regressions and adjust the implementations to address them. Thanks, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1682679067 From duke at openjdk.org Thu Aug 17 17:27:30 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 17 Aug 2023 17:27:30 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v22] In-Reply-To: References: Message-ID: On Thu, 17 Aug 2023 17:23:01 GMT, Srinivas Vamsi Parasa wrote: >> Improvements are nice but it would not pay off if you have big regressions. >> I can accept 0.9x but 0.4x - 0.8x regressions should be investigated and implementation adjusted to avoid them. > >> Improvements are nice but it would not pay off if you have big regressions. I can accept 0.9x but 0.4x - 0.8x regressions should be investigated and implementation adjusted to avoid them. > > Hi Vladimir, > > Thank you for the suggestion! > Currently, AVX512sort is doing well for Random, Repeated and Shuffle patterns of input data. The regressions are observed for Staggered (Wave) pattern of input data. > Will investigate the regressions and adjust the implementations to address them. > > Thanks, > Vamsi > Hi @vamsi-parasa , If there are limitations to support this on windows kindly open a follow-up PR and add its link here. Hi Jatin, will open a follow-up PR for Windows and add a link soon. 
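The acceptance criterion discussed in this thread (0.9x tolerated, 0.4x-0.8x to be investigated) can be expressed as a small check. This is a hedged Java sketch: it assumes the JMH scores are average times in us/op, as in the PR table, so speedup is baseline divided by candidate. The class and method names and the hard-coded 0.9 threshold are illustrative, taken from the review comment rather than from any OpenJDK tooling.

```java
import java.util.Locale;

// Sketch: speedup for an average-time JMH score (lower us/op is better) and a check
// against the 0.9x tolerance mentioned in the review. Names are hypothetical.
public final class SpeedupCheck {

    // Assumed threshold from the review comment: 0.9x is acceptable, lower is not.
    static final double MIN_ACCEPTABLE_SPEEDUP = 0.9;

    // For avgt (us/op) scores, speedup = baseline / candidate; > 1.0 means faster.
    static double speedup(double baselineUsPerOp, double candidateUsPerOp) {
        return baselineUsPerOp / candidateUsPerOp;
    }

    // True when the candidate is slower than the tolerated 0.9x.
    static boolean needsInvestigation(double baselineUsPerOp, double candidateUsPerOp) {
        return speedup(baselineUsPerOp, candidateUsPerOp) < MIN_ACCEPTABLE_SPEEDUP;
    }

    public static void main(String[] args) {
        // ArraysSort.intSort, array size 10000, from the table in the PR description.
        System.out.printf(Locale.ROOT, "%.1fx%n", speedup(310.97, 24.082)); // 12.9x
        // A hypothetical 0.5x case: the candidate takes twice as long as the baseline.
        System.out.println(needsInvestigation(100.0, 200.0)); // true
    }
}
```

`Locale.ROOT` keeps the decimal separator a dot regardless of the machine's default locale.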
------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1682680859 From shade at openjdk.org Thu Aug 17 17:34:27 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 17 Aug 2023 17:34:27 GMT Subject: RFR: 8314452: Explicitly indicate inlining success/failure in PrintInlining In-Reply-To: References: Message-ID: <-8U9iphRcMn7AM-y-mfg1XKv6hvDJK8ThAxAh2fqaKM=.88e215be-6952-4c90-9473-6f18dbc26a1c@github.com> On Wed, 16 Aug 2023 17:42:42 GMT, Jorn Vernee wrote: > This patch proposes to add a `+` or `-` to messages produced by `PrintInlining`, to indicate whether inlining succeeded or failed. This makes it easier to find inlining failures in an inlining trace, without having to rely on the message to figure out whether inlining succeeded or failed. Looking at inlining failures is often useful for diagnosing the results of benchmarks, but it can be hard to find inlining failures in lengthy traces. > > A sample of what this looks like: > > > +@ 0 java.lang.foreign.Arena::ofConfined (10 bytes) inline (hot) > +@ 0 java.lang.Thread::currentThread (0 bytes) (intrinsic) > +@ 3 jdk.internal.foreign.MemorySessionImpl::createConfined (9 bytes) inline (hot) > +@ 5 jdk.internal.foreign.ConfinedSession:: (18 bytes) inline (hot) > +@ 6 jdk.internal.foreign.ConfinedSession$ConfinedResourceList:: (5 bytes) inline (hot) > +@ 1 jdk.internal.foreign.MemorySessionImpl$ResourceList:: (5 bytes) inline (hot) > +@ 1 java.lang.Object:: (1 bytes) inline (hot) > +@ 9 jdk.internal.foreign.MemorySessionImpl:: (20 bytes) inline (hot) > +@ 1 java.lang.Object:: (1 bytes) inline (hot) > +@ 6 jdk.internal.foreign.MemorySessionImpl::asArena (9 bytes) inline (hot) > +@ 5 jdk.internal.foreign.MemorySessionImpl$1:: (10 bytes) inline (hot) > +@ 6 java.lang.Object:: (1 bytes) inline (hot) > -@ 8 java.lang.foreign.SegmentAllocator::allocate (24 bytes) already compiled into a big method > > > Using `grep`/`sls` to find inlining failures: > > >> Get-Content inlining_trace.txt | sls 
'-@' > -@ 8 java.lang.foreign.SegmentAllocator::allocate (24 bytes) already compiled into a big method > -@ 34 java.lang.foreign.SegmentAllocator::allocate (24 bytes) already compiled into a big method > -@ 19 java.lang.invoke.MethodHandle::linkToNative(JJJL)D (0 bytes) native call > -@ 95 java.lang.foreign.Arena::close (0 bytes) virtual call > ... The idea looks interesting, but I am not a fan of `+@` and `-@`. Yes, it makes it convenient to grep, I can see that. We need to bikeshed this a bit :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/15315#issuecomment-1682691300 From jvernee at openjdk.org Thu Aug 17 18:51:28 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 17 Aug 2023 18:51:28 GMT Subject: RFR: 8314452: Explicitly indicate inlining success/failure in PrintInlining In-Reply-To: <-8U9iphRcMn7AM-y-mfg1XKv6hvDJK8ThAxAh2fqaKM=.88e215be-6952-4c90-9473-6f18dbc26a1c@github.com> References: <-8U9iphRcMn7AM-y-mfg1XKv6hvDJK8ThAxAh2fqaKM=.88e215be-6952-4c90-9473-6f18dbc26a1c@github.com> Message-ID: <7hM_gRY3d7OcOOFDb82H3oOWHcOmmJau_3q8QSCOj9c=.1cd9f6a9-3062-480e-aee8-7c0c00949ac1@github.com> On Thu, 17 Aug 2023 17:32:09 GMT, Aleksey Shipilev wrote: > The idea looks interesting, but I am not a fan of `+@` and `-@`. Yes, it makes it convenient to grep, I can see that. We need to bikeshed this a bit :) Yes... I wanted something that is easy to grep/Ctrl+F for. Another option would be to add another column to the left of the inlining messages, similar to the one we have for ex handler/synchronize/has monitors. Then it's possible to search for just -/+, but I'm not sure the rest of the message never contains a `-` (as a hyphen, for example). `-@`, on the other hand, seems like it would be a pretty unique pattern.
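To make the trade-off in this thread concrete, here is a hypothetical Java analogue; the actual change is C++ inside HotSpot, and none of these names come from it. An enum selects the `+`/`-` marker so the call site names the outcome instead of passing `true`/`false` (the implicit pointer-to-`bool` conversion Jorn describes is C++-specific, so only the readability benefit carries over to Java). Failures are filtered by matching `-@` only at the start of a line, which sidesteps the concern about hyphens inside messages. The trace format is simplified from the sample in the PR, with no nesting or attribute columns.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical Java analogue of the PrintInlining marker idea (HotSpot's code is C++).
public final class InliningMarkerSketch {

    enum InliningResult {
        SUCCESS('+'), FAILURE('-');

        final char marker;

        InliningResult(char marker) {
            this.marker = marker;
        }
    }

    // Formats one simplified trace line; the enum makes the outcome explicit at the call site.
    static String format(InliningResult result, int bci, String method, String message) {
        return result.marker + "@ " + bci + " " + method + " " + message;
    }

    // Keeps only failure lines. Matching "-@" at the start of the (possibly indented)
    // line avoids false hits from hyphens elsewhere in a message.
    static List<String> failures(List<String> trace) {
        return trace.stream()
                .filter(line -> line.stripLeading().startsWith("-@"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> trace = List.of(
                "  " + format(InliningResult.SUCCESS, 0,
                        "java.lang.Thread::currentThread (0 bytes)", "(intrinsic)"),
                "  " + format(InliningResult.FAILURE, 8,
                        "java.lang.foreign.SegmentAllocator::allocate (24 bytes)",
                        "already compiled into a big method"));
        failures(trace).forEach(System.out::println);
    }
}
```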
------------- PR Comment: https://git.openjdk.org/jdk/pull/15315#issuecomment-1682792782 From duke at openjdk.org Thu Aug 17 19:57:31 2023 From: duke at openjdk.org (iaroslavski) Date: Thu, 17 Aug 2023 19:57:31 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v22] In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 19:17:48 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. >> >> An explanation for the path chosen in the PR to bring in the SIMD Arrays.sort at the top level instead of only bringing in the smaller components from the algorithm is as follows: the key components of Arrays.sort are pivot selection, partitioning, and partition sort. Among these, the two hottest components are partitioning and partition sort. Both could be individually accelerated using SIMD implementations. However, what we noticed was that just bringing in these two individual optimizations gave us half the performance gain compared with bringing in the entire AVX512 SIMD sort. AVX512 SIMD sort implements a single-pivot quicksort algorithm (SPQS) by selecting a single pivot and then recursively partitioning the array into two smaller partitions using SIMD instructions. When the partition size becomes less than or equal to 128, it uses a SIMD bitonic sort using x86 AVX512 intrinsics to sort that partition. However, the default implementation of Arrays.sort() in Java is the dual-pivot quicksort (DPQS), not the SPQS.
If the partitioning in the DPQS is implemented using AVX512, it needs two passes of the single-pivot AVX512 partitioning function (instead of just one in the case of SPQS), thereby losing 50% of the performance.
>>
>> **Arrays.sort performance data using JMH benchmarks**
>>
>> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
>> | --- | --- | --- | --- | --- |
>> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x |
>> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x |
>> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** |
>> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** |
>> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x |
>> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x |
>> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** |
>> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** |
>> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x |
>> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x |
>> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** |
>> ...
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix preservation of NaNs for floats and doubles

Hi Vamsi, You're right, the regressions are observed for STAGGER data (almost sorted data). Merging sort, which is several times faster than quicksort, is applied to these arrays. This means the best approach is to reuse the idea of merging sort; see the DualPivotQuicksort class (take the latest version from Laurent's PR https://github.com/openjdk/jdk/pull/13568). Please be aware that the merging sort in the DualPivotQuicksort class is not a classic merge sort; see the details in the class. What if you port the code from Java to C++, except for the sorting of small arrays?
Going further, you could try to port the code to C++ as is (first version) and then try a second version: sorting of small arrays with a bitonic sorting network (as you did) plus sorting of other data as done in DualPivotQuicksort. Also, please add a new type of array to the JMH class, Already Sorted (ascending and descending order), and check how all implementations perform. It would be nice to see both versions benchmarked and compared with the existing one. What do you think? Best regards, Vladimir (Yaroslavskiy) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1682888288 From iklam at openjdk.org Thu Aug 17 22:54:35 2023 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 17 Aug 2023 22:54:35 GMT Subject: RFR: 8314249: Refactor handling of invokedynamic in JVMCI ConstantPool In-Reply-To: References: <6pkU1viiXcWvXP5KwsD9-8HYEy_SoPkR6-Ea4l9OWhs=.dba83e93-e1e8-4778-b2af-f5a1b7792a73@github.com> Message-ID: On Wed, 16 Aug 2023 07:14:33 GMT, Doug Simon wrote: >> This PR is part of the JVMCI cleanup to track [JDK-8301993](https://bugs.openjdk.org/browse/JDK-8301993), where the constant pool cache is being removed (as of now, only method references use the CpCache). >> >> 1. `rawIndexToConstantPoolIndex()` is used only for the `invokedynamic` bytecode. It should be renamed to `indyIndexConstantPoolIndex()` >> >> 2. `rawIndexToConstantPoolCacheIndex()` should not be called for the `invokedynamic` bytecode, which doesn't use cpCache entries after [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995). >> >> 3. Some `cpi` parameters should be renamed to `rawIndex` or `which` >> >> 4. Added a test case for `ConstantPool.lookupAppendix()`, which was not tested in the JDK repo. >> >> I added comments about the 4 types of indices used in HotSpotConstantPool.java: `cpi`, `rawIndex`, `cpci` and `which`. The latter two types will be removed after [JDK-8301993](https://bugs.openjdk.org/browse/JDK-8301993) is complete.
>> >> Note that there are still some incorrect uses of `cpi` in the implementation and test cases. Those will be cleaned up in [JDK-8314172](https://bugs.openjdk.org/browse/JDK-8314172) > > Thanks a lot for this cleanup and adding the extra tests. Thanks @dougxc @coleenp for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15297#issuecomment-1683078290 From iklam at openjdk.org Thu Aug 17 22:54:36 2023 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 17 Aug 2023 22:54:36 GMT Subject: Integrated: 8314249: Refactor handling of invokedynamic in JVMCI ConstantPool In-Reply-To: <6pkU1viiXcWvXP5KwsD9-8HYEy_SoPkR6-Ea4l9OWhs=.dba83e93-e1e8-4778-b2af-f5a1b7792a73@github.com> References: <6pkU1viiXcWvXP5KwsD9-8HYEy_SoPkR6-Ea4l9OWhs=.dba83e93-e1e8-4778-b2af-f5a1b7792a73@github.com> Message-ID: On Tue, 15 Aug 2023 20:03:20 GMT, Ioi Lam wrote: > This PR is part of the JVMCI cleanup to track [JDK-8301993](https://bugs.openjdk.org/browse/JDK-8301993), where the constant pool cache is being removed (as of now, only method references use the CpCache). > > 1. `rawIndexToConstantPoolIndex()` is used only for the `invokedynamic` bytecode. It should be renamed to `indyIndexConstantPoolIndex()` > > 2. `rawIndexToConstantPoolCacheIndex()` should not be called for the `invokedynamic` bytecode, which doesn't use cpCache entries after [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995). > > 3. Some `cpi` parameters should be renamed to `rawIndex` or `which` > > 4. Added a test case for `ConstantPool.lookupAppendix()`, which was not tested in the JDK repo. > > I added comments about the 4 types of indices used in HotSpotConstantPool.java: `cpi`, `rawIndex`, `cpci` and `which`. The latter two types will be removed after [JDK-8301993](https://bugs.openjdk.org/browse/JDK-8301993) is complete. > > Note that there are still some incorrect uses of `cpi` in the implementation and test cases.
Those will be cleaned up in [JDK-8314172](https://bugs.openjdk.org/browse/JDK-8314172) This pull request has now been integrated. Changeset: 0299364d Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/0299364d85a66c35e616148cbbde314b7d4fb05a Stats: 189 lines in 6 files changed: 121 ins; 23 del; 45 mod 8314249: Refactor handling of invokedynamic in JVMCI ConstantPool Reviewed-by: dnsimon, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/15297 From duke at openjdk.org Fri Aug 18 02:27:21 2023 From: duke at openjdk.org (Kimura Yukihiro) Date: Fri, 18 Aug 2023 02:27:21 GMT Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError [v2] In-Reply-To: References: Message-ID: > I would like to fix this issue because it is difficult for testers to understand why the test failed. > There is no risk as I just added an assertion message instead of exit code error. > I would appreciate it if someone could review the fix. 
Kimura Yukihiro has updated the pull request incrementally with one additional commit since the last revision: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15329/files - new: https://git.openjdk.org/jdk/pull/15329/files/01082f51..24874b1b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15329&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15329&range=00-01 Stats: 7 lines in 1 file changed: 5 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15329/head:pull/15329 PR: https://git.openjdk.org/jdk/pull/15329 From duke at openjdk.org Fri Aug 18 02:27:22 2023 From: duke at openjdk.org (Kimura Yukihiro) Date: Fri, 18 Aug 2023 02:27:22 GMT Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError In-Reply-To: References: Message-ID: On Thu, 17 Aug 2023 10:48:49 GMT, Kimura Yukihiro wrote: > I would like to fix this issue because it is difficult for testers to understand why the test failed. > There is no risk as I just added an assertion message instead of exit code error. > I would appreciate it if someone could review the fix. Hello everyone, Thank you for the review and the comments. I modified the test code. Could you please review the fix? 
Thanks, Kimura Yukihiro ------------- PR Comment: https://git.openjdk.org/jdk/pull/15329#issuecomment-1683237376 From fyang at openjdk.org Fri Aug 18 07:32:31 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 18 Aug 2023 07:32:31 GMT Subject: RFR: 8313419: Template interpreter produces no safepoint check for return bytecodes In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 13:49:55 GMT, Fredrik Bredberg wrote: > I've done basic testing on riscv64 and arm32 using Qemu, but would appreciate if @RealFYang and @bulasevich could take it for a real test drive. Hi, this has passed tier1-3 and hotspot:tier4 tests on the linux-riscv64 platform. Hope that helps. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15248#issuecomment-1683487066 From duke at openjdk.org Fri Aug 18 10:19:13 2023 From: duke at openjdk.org (Kimura Yukihiro) Date: Fri, 18 Aug 2023 10:19:13 GMT Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError [v3] In-Reply-To: References: Message-ID: > I would like to fix this issue because it is difficult for testers to understand why the test failed. > There is no risk as I just added an assertion message instead of exit code error. > I would appreciate it if someone could review the fix.
Kimura Yukihiro has updated the pull request incrementally with one additional commit since the last revision: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15329/files - new: https://git.openjdk.org/jdk/pull/15329/files/24874b1b..6b958a95 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15329&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15329&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15329/head:pull/15329 PR: https://git.openjdk.org/jdk/pull/15329 From duke at openjdk.org Fri Aug 18 10:26:26 2023 From: duke at openjdk.org (Kimura Yukihiro) Date: Fri, 18 Aug 2023 10:26:26 GMT Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError [v3] In-Reply-To: References: Message-ID: On Fri, 18 Aug 2023 10:19:13 GMT, Kimura Yukihiro wrote: >> I would like to fix this issue because it is difficult for testers to understand why the test failed. >> There is no risk as I just added an assertion message instead of exit code error. >> I would appreciate it if someone could review the fix. > > Kimura Yukihiro has updated the pull request incrementally with one additional commit since the last revision: > > 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError I fixed the code again, because in the failure reported in JBS the message is printed to stderr. The message goes to stdout if the VM has not finished initializing, but in this test a failure to allocate adapters always throws VirtualMachineError, so the message is printed to stderr.
    if (!is_init_completed()) {
      // Don't throw exceptions during VM initialization because java.lang.* classes
      // might not have been initialized, causing problems when constructing the
      // Java exception object.
      vm_exit_during_initialization("Out of space in CodeCache for adapters");
    } else {
      THROW_MSG_NULL(vmSymbols::java_lang_VirtualMachineError(), "Out of space in CodeCache for adapters");
    }

------------- PR Comment: https://git.openjdk.org/jdk/pull/15329#issuecomment-1683700133 From thartmann at openjdk.org Fri Aug 18 10:31:27 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 18 Aug 2023 10:31:27 GMT Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError [v3] In-Reply-To: References: Message-ID: <-JbzqehKWFsHYKlWfSAMSCvYl2znLnBtT-lRNwmsZnU=.96ce6bdf-805c-4284-84ce-3c5b34fb730a@github.com> On Fri, 18 Aug 2023 10:19:13 GMT, Kimura Yukihiro wrote: >> I would like to fix this issue because it is difficult for testers to understand why the test failed. >> There is no risk as I just added an assertion message instead of exit code error. >> I would appreciate it if someone could review the fix. > > Kimura Yukihiro has updated the pull request incrementally with one additional commit since the last revision: > > 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java line 63:

> 61:             // Ignore adapter creation failures
> 62:             if (!oa.getStderr().contains("Out of space in CodeCache for adapters")) {
> 63:                 throw new Exception("VM finished with exit code " + oa.getExitValue());

Please use `RuntimeException` like the code below does. I think you could also merge the two ifs.
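The reviewer's suggestion (one merged condition, throwing `RuntimeException`) can be sketched in isolation. This is a hypothetical, self-contained version and not the actual test: `AdapterFailureCheck` and `checkExit` are made-up names, the `OutputAnalyzer` result is replaced by plain `exitValue`/`stderr` parameters, and the outer condition (a non-zero exit value), which is not visible in the quoted lines, is an assumption.

```java
// Hypothetical sketch of the merged check suggested in the review; not the
// actual CodeCacheFullCountTest code. Exit value and stderr are passed in
// directly instead of coming from an OutputAnalyzer.
final class AdapterFailureCheck {
    static final String ADAPTER_MSG = "Out of space in CodeCache for adapters";

    // Assumption: stderr is only inspected when the VM exited abnormally.
    // Single merged condition: fail unless the exit was clean or the failure
    // was an adapter allocation failure, which the test deliberately ignores.
    static void checkExit(int exitValue, String stderr) {
        if (exitValue != 0 && !stderr.contains(ADAPTER_MSG)) {
            throw new RuntimeException("VM finished with exit code " + exitValue);
        }
    }

    public static void main(String[] args) {
        checkExit(0, "");           // clean exit: accepted
        checkExit(1, ADAPTER_MSG);  // adapter allocation failure: ignored
        try {
            checkExit(1, "unrelated crash");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage()); // VM finished with exit code 1
        }
    }
}
```

Merging the two conditions keeps the "ignore adapter failures" rule in one place, so a future reader does not have to reason about nested branches to see when the test fails.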
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15329#discussion_r1298293757 From fbredberg at openjdk.org Fri Aug 18 12:14:30 2023 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 18 Aug 2023 12:14:30 GMT Subject: RFR: 8313419: Template interpreter produces no safepoint check for return bytecodes In-Reply-To: References: Message-ID: On Thu, 17 Aug 2023 16:57:17 GMT, Patricio Chilano Mateo wrote: >> The template interpreter produces a safepoint check for return bytecodes (TemplateTable::_return(TosState state)) on x86, ppc64le and s390, but not on aarch64, arm32, and riscv64. >> >> This PR adds the missing safepoint check to aarch64, arm32, and riscv64. >> >> Tested tier1-tier7 on aarch64. Both arm32, and riscv64 was sanity tested using Qemu. > > src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2206: > >> 2204: __ push(state); >> 2205: __ push_cont_fastpath(rthread); >> 2206: __ call_VM(noreg, CAST_FROM_FN_PTR(address, InterpreterRuntime::at_safepoint)); > > Looking at the code generated for the existing safepoint poll (`TemplateInterpreterGenerator::generate_safept_entry_for()`) I see we add a full memory barrier after the return from `InterpreterRuntime::at_safepoint()`. That would call for adding it here too although I don't see why we need that. The SafepointMechanism logic already executes the proper barriers after we process pending operations. Same thing for riscv. That's an interesting find. I had a discussion with Erik (@fisk), and like you, he didn't see any reason why it should be needed. He also very much doubted the need for the membar in `TemplateInterpreterGenerator::generate_safept_entry_for()`. In my view, x86 is quite forgiving if you forget to add a membar, but PowerPC tends not to be. Since there's no membar in `generate_safept_entry_for()` on PowerPC and it still works OK, it does seem like it's not needed. So, for this reason, I will not add any additional membar.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15248#discussion_r1298378973 From fbredberg at openjdk.org Fri Aug 18 12:34:26 2023 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 18 Aug 2023 12:34:26 GMT Subject: RFR: 8313419: Template interpreter produces no safepoint check for return bytecodes In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 13:22:19 GMT, Fredrik Bredberg wrote: > The template interpreter produces a safepoint check for return bytecodes (TemplateTable::_return(TosState state)) on x86, ppc64le and s390, but not on aarch64, arm32, and riscv64. > > This PR adds the missing safepoint check to aarch64, arm32, and riscv64. > > Tested tier1-tier7 on aarch64. Both arm32, and riscv64 was sanity tested using Qemu. Thank you guys for review comments, and the help with testing. If no one else has anything to add, I'll integrate (as soon as I can convince a sponsor). ------------- PR Comment: https://git.openjdk.org/jdk/pull/15248#issuecomment-1683852442 From duke at openjdk.org Fri Aug 18 13:08:28 2023 From: duke at openjdk.org (Ilya Gavrilin) Date: Fri, 18 Aug 2023 13:08:28 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v2] In-Reply-To: References: Message-ID: On Thu, 17 Aug 2023 06:49:26 GMT, Fei Yang wrote: >> Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: >> >> Change fsgnj_d(dst, src, src) to fmv_d(dst, src) > > src/hotspot/cpu/riscv/riscv.ad line 7706: > >> 7704: match(Set dst (RoundDoubleMode src rmode)); >> 7705: ins_cost(2 * XFER_COST + BRANCH_COST); >> 7706: effect(TEMP_DEF dst, TEMP tmp1, TEMP tmp2, TEMP tmp3, KILL cr); > > Do we kill `cr` anywhere in the assembly code? 
According to the documentation, there are situations in which a conversion instruction can set an error flag in the status register:

> All floating-point conversion instructions set the Inexact exception flag if the rounded result differs from the operand value and the Invalid exception flag is not set. [1]

[1] https://five-embeddev.com/riscv-isa-manual/latest/f.html#single-precision-floating-point-conversion-and-move-instructions ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1298430830 From tholenstein at openjdk.org Fri Aug 18 13:40:06 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 18 Aug 2023 13:40:06 GMT Subject: RFR: JDK-8313626: C2 crash due to unexpected exception control flow Message-ID:

# Problem
The following JASM code:

    static Method test1:"()V" stack 1 {
        try t;
        invokestatic m:"()V";
        return;

        catch t java/lang/Throwable;
        stack_map class java/lang/Throwable;
        athrow;
        endtry t;
    }

produces this Java bytecode:

    static void m();
      Code:
         0: return

    static void test1();
      Code:
         0: invokestatic  #4   // Method m:()V
         3: return
         4: athrow
      Exception table:
         from    to  target type
             0     5     4   Class java/lang/Throwable

From https://docs.oracle.com/javase/specs/jvms/se20/jvms20.pdf _exception_table[] (p.116)_:

> The values of the two items start_pc and end_pc indicate the ranges in the code array at which the exception handler is active. The value of start_pc must be a valid index into the code array of the opcode of an instruction. The value of end_pc either must be a valid index into the code array of the opcode of an instruction or must be equal to code_length, the length of the code array. The value of start_pc must be less than the value of end_pc.
> The start_pc is inclusive and end_pc is exclusive; that is, the exception handler must be active while the program counter is within the interval [start_pc, end_pc).
>
> handler_pc
> The value of the handler_pc item indicates the start of the exception handler.
> The value of the item must be a valid index into the code array and must be the index of the opcode of an instruction.

and from _§athrow (p.420)_:

> The objectref must be of type reference and must refer to an object that is an instance of class Throwable or of a subclass of Throwable. It is popped from the operand stack. The objectref is then thrown by searching the current method (§2.6) for the first exception handler that matches the class of objectref, as given by the algorithm in §2.10.
> If an exception handler that matches objectref is found, it contains the location of the code intended to handle this exception. The pc register is reset to that location, the operand stack of the current frame is cleared, objectref is pushed back onto the operand stack, and execution continues.

In our case: **[start_pc=0, end_pc=5)**, **handler_pc=4** and **objectref=Class java/lang/Throwable**.

By this definition, `test1()` is indeed valid bytecode. Because the `athrow` at pc 4 lies inside [0, 5), the exception it throws is caught by the handler at pc 4, which is the `athrow` itself. Therefore we would expect C2 to create an infinite loop for

    4: athrow

The C2 graph indeed shows an infinite loop 92/81: [image: graph1]

During IGVN the graph degenerates:
1) [image: graph2]
2) [image: graph3]
3) [image: graph4]

And in the end we get an `assert(false) failed: malformed control flow`.

# Solution
We usually have a safepoint in infinite loops. The edge case in which an exception causes an infinite loop was not handled. With normal Java it is not possible to create such an infinite loop with try-catch, but with JASM/bytecode it is allowed.

Fix: By adding a safepoint to the backedge [image: safepoint1] we prevent the infinite loop from being removed during IGVN [image: safepoint2]

### Testing
We also found some other test cases that are very similar; `test2` is similar to `test1`.
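The exception-table rules quoted above can be modeled in a few lines. The sketch below is a minimal illustration under stated assumptions: every entry has an explicit catch class (no `catch_type` of 0), and `ExceptionTableDemo.findHandler` is a hypothetical helper. It is not how HotSpot or C2 implement the lookup; it only shows why the `athrow` at pc 4 in `test1()` dispatches to itself.

```java
import java.util.List;

// Simplified model of the exception-table search described in JVMS §2.10.
// Entries are tried in order; an entry matches when start_pc <= pc < end_pc
// and the thrown class is the catch type or one of its subclasses.
final class ExceptionTableDemo {
    record Entry(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {}

    // Returns handler_pc of the first matching entry, or -1 if the exception
    // propagates out of the method.
    static int findHandler(List<Entry> table, int pc, Class<? extends Throwable> thrown) {
        for (Entry e : table) {
            boolean inRange = e.startPc() <= pc && pc < e.endPc(); // [start_pc, end_pc)
            if (inRange && e.catchType().isAssignableFrom(thrown)) {
                return e.handlerPc();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Exception table of test1(): from 0, to 5, target 4, Class java/lang/Throwable
        List<Entry> test1 = List.of(new Entry(0, 5, 4, Throwable.class));
        // The athrow is at pc 4, inside [0, 5), so it is its own handler.
        System.out.println(findHandler(test1, 4, Throwable.class)); // 4
    }
}
```

Applying the same lookup to the `test2` table (entries `0 3 4 Exception` and `0 6 5 Throwable`) for a `Throwable` thrown at pc 5 returns 5, which is the same kind of self-loop.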
The endless loop is from/to `5: athrow`:

    static void test2();
      Code:
         0: invokestatic  #6   // Method m:()V
         3: return
         4: return
         5: athrow
      Exception table:
         from    to  target type
             0     3     4   Class java/lang/Exception
             0     6     5   Class java/lang/Throwable

- In `test3` and `test4`, `th()` gets inlined. Its `athrow` then has a backedge to `new`, which creates an infinite loop that is missing a safepoint:

    public static void th() throws java.lang.Exception;
      Code:
         0: new           #9   // class java/lang/Throwable
         3: dup
         4: invokespecial #3   // Method java/lang/Throwable."<init>":()V
         7: athrow

    static void test3();
      Code:
         0: invokestatic  #6   // Method m:()V
         3: iconst_1
         4: istore_0
         5: iconst_0
         6: istore_1
         7: return
         8: invokestatic  #4   // Method th:()V
        11: return
      Exception table:
         from    to  target type
             0    12     8   Class java/lang/Throwable

    static void test4();
      Code:
         0: invokestatic  #6   // Method m:()V
         3: iconst_1
         4: istore_0
         5: iconst_0
         6: istore_1
         7: return
         8: iconst_1
         9: istore_0
        10: invokestatic  #4   // Method th:()V
        13: return
      Exception table:
         from    to  target type
             0    14     8   Class java/lang/Throwable

------------- Commit messages: - Update TestMissingSafepointOnTryCatch - JDK-8313626: C2 crash due to unexpected exception control flow Changes: https://git.openjdk.org/jdk/pull/15292/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15292&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8313626 Stats: 186 lines in 3 files changed: 186 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15292.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15292/head:pull/15292 PR: https://git.openjdk.org/jdk/pull/15292 From tholenstein at openjdk.org Fri Aug 18 13:49:59 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 18 Aug 2023 13:49:59 GMT Subject: RFR: JDK-8313626: C2 crash due to unexpected exception control flow [v2] In-Reply-To: References: Message-ID: > # Problem > The following JASM code: > > static Method test1:"()V" stack 1 { > try t; > invokestatic m:"()V"; > return; > > catch t java/lang/Throwable; >
stack_map class java/lang/Throwable; > athrow; > endtry t; > } > > produces this java bytecode > > static void m(); > Code: > 0: return > > static void test1(); > Code: > 0: invokestatic #4 // Method m:()V > 3: return > 4: athrow > Exception table: > from to target type > 0 5 4 Class java/lang/Throwable > > > from https://docs.oracle.com/javase/specs/jvms/se20/jvms20.pdf _exception_table[] (p.116)_ > >> The values of the two items start_pc and end_pc indicate the ranges in the code array at which the exception handler is active. The value of start_pc must be a valid index into the code array of the opcode of an instruction. The value of end_pc either must be a valid index into the code array of the opcode of an instruction or must be equal to code_length, the length of the code array. The value of start_pc must be less than the value of end_pc. >> The start_pc is inclusive and end_pc is exclusive; that is, the exception handler must be active while the program counter is within the interval [start_pc, end_pc). >> >> handler_pc >> The value of the handler_pc item indicates the start of the exception handler. The value of the item must be a valid index into the code array and must be the index of the opcode of an instruction. > > and from _§athrow (p.420)_ > >> The objectref must be of type reference and must refer to an object that is an instance of class Throwable or of a subclass of Throwable. It is popped from the operand stack. The objectref is then thrown by searching the current method (§2.6) for the first exception handler that matches the class of objectref, as given by the algorithm in §2.10. >> If an exception handler that matches objectref is found, it contains the location of the code intended to handle this exception. The pc register is reset to that location, the operand stack of the current frame is cleared, objectref is pushed back onto the operand stack, and execution continues.
> > In out case: **[start_pc=0, end_pc=5)** and **handler_pc=4** and **objectref=Class java/lang/Throwable** > > By this definition we have indeed valid bytecode for `test1()`. Therefore we would expect C2 to create an infinite loop for > > 4: athrow > > > The C2 graph indeed shows an infinite loop 92/81: > graph1> try t; >> invokestatic m:"()V"; >> return; >> >> catch t java/lang/Throwable; >> stack_map class java/lang/Throwable; >> athrow; >> endtry t; >> } >> >> produces this java bytecode >> >> static void m(); >> Code: >> 0: return >> >> static void test1(); >> Code: >> 0: invokestatic #4 // Method m:()V >> 3: return >> 4: athrow >> Exception table: >> from to target type >> 0 5 4 Class java/lang/Throwable >> >> >> from https://docs.oracle.com/javase/specs/jvms/se20/jvms20.pdf _exception_table[] (p.116)_ >> >>> The values of the two items start_pc and end_pc indicate the ranges in the code array at which the exception handler is active. The value of start_pc must be a valid index into the code array of the opcode of an instruction. The value of end_pc either must be a valid index into the code array of the opcode of an instruction or must be equal to code_length, the length of the code array. The value of start_pc must be less than the value of end_pc. >>> The start_pc is inclusive and end_pc is exclusive; that is, the exception handler must be active while the program counter is within the interval [start_pc, end_pc). >>> >>> handler_pc >>> The value of the handler_pc item indicates the start of the exception handler. The value of the item must be a valid index into the code array and must be the index of the opcode of an instruction. >> >> and from _§athrow (p.420)_ >> >>> The objectref must be of type reference and must refer to an object that is an instance of class Throwable or of a subclass of Throwable. It is popped from the operand stack.
The objectref is then thrown by searching the current method (§2.6) for the first exception handler that matches the class of objectref, as given by the algorithm in §2.10. >>> If an exception handler that matches objectref is found, it contains the location of the code intended to handle this exception. The pc register is reset to that location, the operand stack of the current frame is cleared, objectref is pushed back onto the operand stack, and execution continues. >> >> In out case: **[start_pc=0, end_pc=5)** and **handler_pc=4** and **objectref=Class java/lang/Throwable** >> >> By this definition we have indeed valid bytecode for `test1()`. Therefore we would expect C2 to create an infinite loop for >> >> 4: athrow >> >> >> The C2 graph indeed shows a... > > Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: > > - remove newlines > - remove newlines Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15292#pullrequestreview-1584618203 From tholenstein at openjdk.org Fri Aug 18 14:06:38 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 18 Aug 2023 14:06:38 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v5] In-Reply-To: <7RAXRCktzo9NArTyV_NBwxxLl7zgCaxurRLwwwKzeAM=.91d9abbe-12e0-43af-8e5c-b13052629a56@github.com> References: <7RAXRCktzo9NArTyV_NBwxxLl7zgCaxurRLwwwKzeAM=.91d9abbe-12e0-43af-8e5c-b13052629a56@github.com> Message-ID: On Thu, 17 Aug 2023 14:17:24 GMT, emmyyin wrote: >> ### Purpose >> >> IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time.
Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, it makes each layout more readable. But on the other hand, the layout for two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that has occurred to the graph and can damage the internal understanding of the graph that the viewer has obtained. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. >> >> The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". >> >> The difference between the algorithms is illustrated in the following video: >> >> >> https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 >> >> >> This work is the result of my Master's thesis which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). >> >> >> ### Implementation >> >> The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appears in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager` has been added which holds the core algorithm. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. >> >> If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions on. >> >> The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, that holds the positions of the nodes and edges along with other relevant information such as ID, name and size. 
These are updated, added and removed in accordance with the update actions. >> >> Since `HierarchicalStableLayoutManager` tries to preserve the node positi... > > emmyyin has updated the pull request incrementally with one additional commit since the last revision: > > fixing ws error

Looks good to me, besides a few minor comments.

src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 131:

> 129:      * been inserted at that layer
> 130:      *
> 131:      * @param newNode

The `newNode` is not needed, right?

src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 1255:

> 1253:             found = false;
> 1254:
> 1255:             if (n.vertex == null && n.succs.size() <= 1 && n.preds.size() <= 1) {

I think `n.vertex == null` is always true and can be omitted.

src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 1543:

> 1541:     private boolean onSegment(Point p, Point q, Point r) {
> 1542:         if (q.x <= Math.max(p.x, r.x) && q.x >= Math.min(p.x, r.x) &&
> 1543:                 q.y <= Math.max(p.y, r.y) && q.y >= Math.min(p.y, r.y))

You can directly `return (q.x <= Math.max(p.x, r.x) && q.x >= Math.min(p.x, r.x) && q.y <= Math.max(p.y, r.y) && q.y >= Math.min(p.y, r.y))`.

src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 1561:

> 1559:
> 1560:         if (val == 0)
> 1561:             return 0; // collinear

We usually use {} even for one-line blocks.

src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 1586:

> 1584:         // General case
> 1585:         if (o1 != o2 && o3 != o4)
> 1586:             return true;

We usually use {} even for one-line blocks.
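The comments above concern a classic orientation-based segment-intersection test. The sketch below shows that technique written in the style the reviewer asks for: braces on one-line blocks and a direct boolean `return` in `onSegment`. It is a standalone illustration, not the code in `HierarchicalStableLayoutManager`, and the small `Pt` class is a hypothetical stand-in for `java.awt.Point`.

```java
// Orientation-based test for whether two closed line segments intersect.
final class SegmentIntersection {
    // Minimal stand-in for java.awt.Point (int fields x and y).
    static final class Pt {
        final int x;
        final int y;

        Pt(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Given that p, q and r are collinear, checks whether q lies on segment pr.
    static boolean onSegment(Pt p, Pt q, Pt r) {
        return q.x <= Math.max(p.x, r.x) && q.x >= Math.min(p.x, r.x)
                && q.y <= Math.max(p.y, r.y) && q.y >= Math.min(p.y, r.y);
    }

    // 0 = collinear; 1 and 2 are the two opposite turn directions of p -> q -> r.
    static int orientation(Pt p, Pt q, Pt r) {
        long val = (long) (q.y - p.y) * (r.x - q.x) - (long) (q.x - p.x) * (r.y - q.y);
        if (val == 0) {
            return 0; // collinear
        }
        return (val > 0) ? 1 : 2;
    }

    static boolean doIntersect(Pt p1, Pt q1, Pt p2, Pt q2) {
        int o1 = orientation(p1, q1, p2);
        int o2 = orientation(p1, q1, q2);
        int o3 = orientation(p2, q2, p1);
        int o4 = orientation(p2, q2, q1);

        // General case: neither segment has both endpoints on the same side of the other
        if (o1 != o2 && o3 != o4) {
            return true;
        }
        // Special cases: an endpoint is collinear with, and lies on, the other segment
        if (o1 == 0 && onSegment(p1, p2, q1)) {
            return true;
        }
        if (o2 == 0 && onSegment(p1, q2, q1)) {
            return true;
        }
        if (o3 == 0 && onSegment(p2, p1, q2)) {
            return true;
        }
        return o4 == 0 && onSegment(p2, q1, q2);
    }
}
```

The collinear special cases are why `onSegment` only needs a bounding-box check. It is called only after `orientation` has already returned 0.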
src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 1711:

> 1709:      * average
> 1710:      *
> 1711:      * @return

I would leave out the `@return` here.

------------- PR Review: https://git.openjdk.org/jdk/pull/14349#pullrequestreview-1584618859 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1298481362 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1298486238 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1298489193 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1298489669 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1298489797 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1298492081 From chagedorn at openjdk.org Fri Aug 18 14:10:29 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 18 Aug 2023 14:10:29 GMT Subject: RFR: JDK-8313626: C2 crash due to unexpected exception control flow [v2] In-Reply-To: References: Message-ID: On Fri, 18 Aug 2023 13:49:59 GMT, Tobias Holenstein wrote: >> # Problem >> The following JASM code: >> >> static Method test1:"()V" stack 1 { >> try t; >> invokestatic m:"()V"; >> return; >> >> catch t java/lang/Throwable; >> stack_map class java/lang/Throwable; >> athrow; >> endtry t; >> } >> >> produces this java bytecode >> >> static void m(); >> Code: >> 0: return >> >> static void test1(); >> Code: >> 0: invokestatic #4 // Method m:()V >> 3: return >> 4: athrow >> Exception table: >> from to target type >> 0 5 4 Class java/lang/Throwable >> >> >> from https://docs.oracle.com/javase/specs/jvms/se20/jvms20.pdf _exception_table[] (p.116)_ >> >>> The values of the two items start_pc and end_pc indicate the ranges in the code array at which the exception handler is active. The value of start_pc must be a valid index into the code array of the opcode of an instruction.
The value of end_pc either must be a valid index into the code array of the opcode of an instruction or must be equal to code_length, the length of the code array. The value of start_pc must be less than the value of end_pc. >>> The start_pc is inclusive and end_pc is exclusive; that is, the exception handler must be active while the program counter is within the interval [start_pc, end_pc). >>> >>> handler_pc >>> The value of the handler_pc item indicates the start of the exception handler. The value of the item must be a valid index into the code array and must be the index of the opcode of an instruction. >> >> and from _§athrow (p.420)_ >> >>> The objectref must be of type reference and must refer to an object that is an instance of class Throwable or of a subclass of Throwable. It is popped from the operand stack. The objectref is then thrown by searching the current method (§2.6) for the first exception handler that matches the class of objectref, as given by the algorithm in §2.10. >>> If an exception handler that matches objectref is found, it contains the location of the code intended to handle this exception. The pc register is reset to that location, the operand stack of the current frame is cleared, objectref is pushed back onto the operand stack, and execution continues. >> >> In out case: **[start_pc=0, end_pc=5)** and **handler_pc=4** and **objectref=Class java/lang/Throwable** >> >> By this definition we have indeed valid bytecode for `test1()`. Therefore we would expect C2 to create an infinite loop for >> >> 4: athrow >> >> >> The C2 graph indeed shows a... > > Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: > > - remove newlines > - remove newlines That looks good to me, too! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15292#pullrequestreview-1584644083 From duke at openjdk.org Fri Aug 18 14:12:51 2023 From: duke at openjdk.org (Kimura Yukihiro) Date: Fri, 18 Aug 2023 14:12:51 GMT Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError [v4] In-Reply-To: References: Message-ID: > I would like to fix this issue because it is difficult for testers to understand why the test failed. > There is no risk as I just added an assertion message instead of exit code error. > I would appreciate it if someone could review the fix. Kimura Yukihiro has updated the pull request incrementally with one additional commit since the last revision: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15329/files - new: https://git.openjdk.org/jdk/pull/15329/files/6b958a95..2e631363 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15329&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15329&range=02-03 Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15329/head:pull/15329 PR: https://git.openjdk.org/jdk/pull/15329 From duke at openjdk.org Fri Aug 18 14:12:51 2023 From: duke at openjdk.org (Kimura Yukihiro) Date: Fri, 18 Aug 2023 14:12:51 GMT Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError [v3] In-Reply-To: <-JbzqehKWFsHYKlWfSAMSCvYl2znLnBtT-lRNwmsZnU=.96ce6bdf-805c-4284-84ce-3c5b34fb730a@github.com> References: <-JbzqehKWFsHYKlWfSAMSCvYl2znLnBtT-lRNwmsZnU=.96ce6bdf-805c-4284-84ce-3c5b34fb730a@github.com> Message-ID: <5aqMHvMv4slaMjFNnR_YvPB9pUZUvA_LeDv5dHdT3yQ=.938ebee4-970e-49f7-b1c3-88147bd701a9@github.com> On Fri, 18 Aug 2023 
10:29:07 GMT, Tobias Hartmann wrote: >> Kimura Yukihiro has updated the pull request incrementally with one additional commit since the last revision: >> >> 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError > > test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java line 63:
>
>> 61:             // Ignore adapter creation failures
>> 62:             if (!oa.getStderr().contains("Out of space in CodeCache for adapters")) {
>> 63:                 throw new Exception("VM finished with exit code " + oa.getExitValue());
>
> Please use `RuntimeException` like the code below does. I think you could also merge the two ifs.

Hello Tobias,

Thank you for pointing that out. I have made the changes you mentioned.

Thanks,
Kimura Yukihiro ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15329#discussion_r1298497653 From thartmann at openjdk.org Fri Aug 18 14:18:28 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 18 Aug 2023 14:18:28 GMT Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError [v4] In-Reply-To: References: Message-ID: On Fri, 18 Aug 2023 14:12:51 GMT, Kimura Yukihiro wrote: >> I would like to fix this issue because it is difficult for testers to understand why the test failed. >> There is no risk as I just added an assertion message instead of exit code error. >> I would appreciate it if someone could review the fix. > > Kimura Yukihiro has updated the pull request incrementally with one additional commit since the last revision: > > 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError Thanks, the latest version looks good to me! ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15329#pullrequestreview-1584658763 From fbredberg at openjdk.org Fri Aug 18 14:36:37 2023 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 18 Aug 2023 14:36:37 GMT Subject: Integrated: 8313419: Template interpreter produces no safepoint check for return bytecodes In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 13:22:19 GMT, Fredrik Bredberg wrote: > The template interpreter produces a safepoint check for return bytecodes (TemplateTable::_return(TosState state)) on x86, ppc64le and s390, but not on aarch64, arm32, and riscv64. > > This PR adds the missing safepoint check to aarch64, arm32, and riscv64. > > Tested tier1-tier7 on aarch64. Both arm32, and riscv64 was sanity tested using Qemu. This pull request has now been integrated. Changeset: bcba5e97 Author: Fredrik Bredberg Committer: Patricio Chilano Mateo URL: https://git.openjdk.org/jdk/commit/bcba5e97857fd57ea4571341ad40194bb823cd0b Stats: 35 lines in 3 files changed: 35 ins; 0 del; 0 mod 8313419: Template interpreter produces no safepoint check for return bytecodes Reviewed-by: pchilanomate ------------- PR: https://git.openjdk.org/jdk/pull/15248 From duke at openjdk.org Fri Aug 18 15:47:32 2023 From: duke at openjdk.org (emmyyin) Date: Fri, 18 Aug 2023 15:47:32 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v5] In-Reply-To: References: <7RAXRCktzo9NArTyV_NBwxxLl7zgCaxurRLwwwKzeAM=.91d9abbe-12e0-43af-8e5c-b13052629a56@github.com> Message-ID: On Fri, 18 Aug 2023 13:57:41 GMT, Tobias Holenstein wrote: >> emmyyin has updated the pull request incrementally with one additional commit since the last revision: >> >> fixing ws error > > src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 1255: > >> 1253: found = false; >> 1254: >> 1255: if (n.vertex == null && n.succs.size() <= 1 && n.preds.size() <= 1) { > > I think `n.vertex == null` is always 
true and can be omitted

This is to make the loop break once we reach the non-dummy node that the edge starts from. That is, for an edge (u, v) with many dummy nodes between u and v, we only want to remove the dummy nodes and then break out of the loop as soon as we reach node u. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1298602092 From duke at openjdk.org Fri Aug 18 16:00:12 2023 From: duke at openjdk.org (emmyyin) Date: Fri, 18 Aug 2023 16:00:12 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v6] In-Reply-To: References: Message-ID: <5vxaxCfnd8iSTg8EO0_RlMC5WVze1xC0ETKephzR2L4=.29e8bdee-4d7e-42d9-9e67-31911b5c23d2@github.com> > ### Purpose > > IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, it makes each layout more readable. But on the other hand, the layout for two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that has occurred to the graph and can damage the internal understanding of the graph that the viewer has obtained. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. > > The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes".
> > The difference between the algorithms is illustrated in the following video: > > > https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 > > > This work is the result of my Master's thesis, which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). > > > ### Implementation > > The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager`, which holds the core algorithm, has been added. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. > > If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions to. > > The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. > > Since `HierarchicalStableLayoutManager` tries to preserve the node positions, the layouts might get unreadable after a fe...
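The central idea of the implementation description, deriving the next layout from the previous one so that nodes present in both graphs keep their positions, can be illustrated with a toy Java sketch. This is not IGV's `HierarchicalStableLayoutManager`: the `Point` record, the two update actions, and the placement rule for new nodes (one layer below an already placed predecessor, otherwise a fresh bottom layer) are hypothetical simplifications, and no overlap or crossing minimization is attempted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model of a "stable" layout update (hypothetical; not IGV's HierarchicalStableLayoutManager).
public final class StableLayoutSketch {
    /** x = position within a layer, y = layer index. */
    record Point(int x, int y) {}

    /**
     * Derives the next layout from the previous one instead of laying out from scratch.
     * previous:      node positions of the previous graph
     * nextNodes:     nodes of the next graph
     * predecessorOf: for new nodes, an optional predecessor to place them under
     */
    static Map<String, Point> applyUpdates(Map<String, Point> previous,
                                           Set<String> nextNodes,
                                           Map<String, String> predecessorOf) {
        Map<String, Point> next = new LinkedHashMap<>();
        // Update action 1: nodes present in both graphs keep their old positions;
        // nodes absent from the next graph are simply not copied (removed).
        for (Map.Entry<String, Point> e : previous.entrySet()) {
            if (nextNodes.contains(e.getKey())) {
                next.put(e.getKey(), e.getValue());
            }
        }
        // Update action 2: a new node goes one layer below an already placed predecessor,
        // otherwise into a fresh layer at the bottom. No overlap resolution is attempted.
        int freshLayer = previous.values().stream().mapToInt(Point::y).max().orElse(-1) + 1;
        for (String node : nextNodes) {
            if (next.containsKey(node)) {
                continue;
            }
            String pred = predecessorOf.get(node);
            Point anchor = (pred == null) ? null : next.get(pred);
            Point placed = (anchor != null)
                    ? new Point(anchor.x() + 1, anchor.y() + 1)
                    : new Point(0, freshLayer++);
            next.put(node, placed);
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, Point> previous = new LinkedHashMap<>();
        previous.put("A", new Point(0, 0));
        previous.put("B", new Point(0, 1));
        // Next graph: B was removed, C (a successor of A) was added.
        System.out.println(applyUpdates(previous, Set.of("A", "C"), Map.of("C", "A")));
    }
}
```

Running `main` prints `{A=Point[x=0, y=0], C=Point[x=1, y=1]}`: `A` stays where it was, `B` disappears, and only the new node `C` is placed. A real layout manager would additionally have to resolve overlaps and route edges, which is where the readability trade-off mentioned in the description comes from.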
emmyyin has updated the pull request incrementally with one additional commit since the last revision: Fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14349/files - new: https://git.openjdk.org/jdk/pull/14349/files/9d803146..900ac074 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=04-05 Stats: 13 lines in 1 file changed: 2 ins; 6 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/14349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14349/head:pull/14349 PR: https://git.openjdk.org/jdk/pull/14349 From duke at openjdk.org Fri Aug 18 17:26:18 2023 From: duke at openjdk.org (emmyyin) Date: Fri, 18 Aug 2023 17:26:18 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v7] In-Reply-To: References: Message-ID:
emmyyin has updated the pull request incrementally with one additional commit since the last revision: fixing trailing ws ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14349/files - new: https://git.openjdk.org/jdk/pull/14349/files/900ac074..97036439 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14349/head:pull/14349 PR: https://git.openjdk.org/jdk/pull/14349 From dlong at openjdk.org Fri Aug 18 19:39:35 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 18 Aug 2023 19:39:35 GMT Subject: RFR: JDK-8313626: C2 crash due to unexpected exception control flow [v2] In-Reply-To: References: Message-ID: On Fri, 18 Aug 2023 13:49:59 GMT, Tobias Holenstein wrote: >> # Problem >> The following JASM code: >> >> static Method test1:"()V" stack 1 { >> try t; >> invokestatic m:"()V"; >> return; >> >> catch t java/lang/Throwable; >> stack_map class java/lang/Throwable; >> athrow; >> endtry t; >> } >> >> produces this java bytecode >> >> static void m(); >> Code: >> 0: return >> >> static void test1(); >> Code: >> 0: invokestatic #4 // Method m:()V >> 3: return >> 4: athrow >> Exception table: >> from to target type >> 0 5 4 Class java/lang/Throwable >> >> >> from https://docs.oracle.com/javase/specs/jvms/se20/jvms20.pdf _exception_table[] (p.116)_ >> >>> The values of the two items start_pc and end_pc indicate the ranges in the code array at which the exception handler is active. The value of start_pc must be a valid index into the code array of the opcode of an instruction. The value of end_pc either must be a valid index into the code array of the opcode of an instruction or must be equal to code_length, the length of the code array. The value of start_pc must be less than the value of end_pc. 
>>> The start_pc is inclusive and end_pc is exclusive; that is, the exception handler must be active while the program counter is within the interval [start_pc, end_pc). >>> >>> handler_pc >>> The value of the handler_pc item indicates the start of the exception handler. The value of the item must be a valid index into the code array and must be the index of the opcode of an instruction. >> >> and from _athrow (p.420)_ >> >>> The objectref must be of type reference and must refer to an object that is an instance of class Throwable or of a subclass of Throwable. It is popped from the operand stack. The objectref is then thrown by searching the current method (§2.6) for the first exception handler that matches the class of objectref, as given by the algorithm in §2.10. >>> If an exception handler that matches objectref is found, it contains the location of the code intended to handle this exception. The pc register is reset to that location, the operand stack of the current frame is cleared, objectref is pushed back onto the operand stack, and execution continues. >> >> In our case: **[start_pc=0, end_pc=5)** and **handler_pc=4** and **objectref=Class java/lang/Throwable** >> >> By this definition, `test1()` indeed has valid bytecode. Therefore, we would expect C2 to create an infinite loop for >> >> 4: athrow >> >> >> The C2 graph indeed shows a... > > Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: > > - remove newlines > - remove newlines Can you add a JASM test that doesn't put the exception handler in unreachable code?
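The self-referencing handler in `test1()` follows directly from the half-open `[start_pc, end_pc)` rule quoted from the JVMS: `handler_pc = 4` lies inside `[0, 5)`, so an `athrow` at pc 4 is dispatched back to pc 4. The minimal Java model of the lookup below makes this visible. The class and method names are hypothetical, it needs Java 16+ for records, and subclass matching is simplified so that a `java/lang/Throwable` entry catches everything and any other catch type matches only the identical class name.

```java
import java.util.List;

// Minimal model of the JVMS exception-table lookup (hypothetical names, not HotSpot code).
public final class ExceptionTableDemo {
    record Entry(int startPc, int endPc, int handlerPc, String catchType) {
        boolean covers(int pc) {
            return startPc <= pc && pc < endPc;   // [start_pc, end_pc): end_pc is exclusive
        }

        boolean catches(String thrownType) {
            // Simplification: Throwable catches every exception; otherwise exact match only.
            return catchType.equals("java/lang/Throwable") || catchType.equals(thrownType);
        }
    }

    /** Returns the handler_pc of the first matching entry, or -1 if the exception escapes. */
    static int findHandler(List<Entry> table, int pc, String thrownType) {
        for (Entry e : table) {
            if (e.covers(pc) && e.catches(thrownType)) {
                return e.handlerPc();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Exception table of test1(): from 0, to 5, target 4, type java/lang/Throwable
        List<Entry> table = List.of(new Entry(0, 5, 4, "java/lang/Throwable"));
        int pc = 4; // the athrow instruction lies inside its own protected range
        for (int i = 0; i < 3; i++) {
            int handler = findHandler(table, pc, "java/lang/RuntimeException");
            System.out.println("athrow at pc " + pc + " -> handler_pc " + handler);
            pc = handler; // the handler is the same athrow, so control never leaves pc 4
        }
    }
}
```

Running `main` prints `athrow at pc 4 -> handler_pc 4` three times. The demo stops after three iterations, but real execution never would, which matches the infinite loop that the analysis expects C2 to produce.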
------------- PR Comment: https://git.openjdk.org/jdk/pull/15292#issuecomment-1684357437 From mdoerr at openjdk.org Fri Aug 18 20:24:44 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 18 Aug 2023 20:24:44 GMT Subject: RFR: 8299658: C1 compilation crashes in LinearScan::resolve_exception_edge Message-ID: This is a quick fix for the C1 problem described in the JBS issue. When we find an illegal operand (modelled by nullptr) while resolving an exception edge, we can propagate this state to the phi function and skip the edge. If somebody finds a better way to propagate the "illegal" state to the phi function, I can change or close this PR. Please review. A regression test would be nice, but it is probably not easy to write. ------------- Commit messages: - 8299658: C1 compilation crashes in LinearScan::resolve_exception_edge Changes: https://git.openjdk.org/jdk/pull/15348/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15348&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8299658 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15348.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15348/head:pull/15348 PR: https://git.openjdk.org/jdk/pull/15348 From qamai at openjdk.org Sat Aug 19 08:09:21 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 19 Aug 2023 08:09:21 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v17] In-Reply-To: References: Message-ID: > This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplications and shifts. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32.
> > In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that: > > floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1) > ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2) > > The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant may overflow and we need to add back the dividend, as in the `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result. > > For unsigned division, the situation is somewhat trickier because the condition needs to be twice as strong (conditions (1) and (2) above are mostly the same). As a result, the magic constant `c` calculated with the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow a uintN. For int division, we can rely on the theorem by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either: > > c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3) > c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4) > > which means that either `x * c1` never overflows a uint64 or `(x + 1) * c2` never overflows a uint64, so we can perform a full multiplication. > > For longs, there is no way to do a full multiplication, so we do some basic transformations to achieve a computable formula. I have written the details as comments in the overflow case. > > More tests are added to cover the possible patterns. > > Please take a look and review. Thank you very much. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 56 commits: - Merge branch 'master' into unsignedDiv - refactor calculations - consolidate constant calculations - address comments - Merge branch 'master' into unsignedDiv - Merge branch 'master' into unsignedDiv - missing java_negate - Merge branch 'master' into unsignedDiv - whitespace - move asserts to use sites - ... and 46 more: https://git.openjdk.org/jdk/compare/58f5826f...c48d96be ------------- Changes: https://git.openjdk.org/jdk/pull/9947/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=16 Stats: 2392 lines in 13 files changed: 1750 ins; 464 del; 178 mod Patch: https://git.openjdk.org/jdk/pull/9947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/9947/head:pull/9947 PR: https://git.openjdk.org/jdk/pull/9947 From qamai at openjdk.org Sat Aug 19 08:22:23 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 19 Aug 2023 08:22:23 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v18] In-Reply-To: References: Message-ID: <2GLfWpwzObvaPasLEim5TPbClAGinuJHQOp89Vdgt10=.7f0899a4-dc91-4a5b-8d9a-9a0d8125c2f4@github.com>
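For signed int32 dividends, conditions (1) and (2) in the 8282365 description come down to one 64-bit multiply, an arithmetic shift, and a `+ 1` correction for negative dividends. The following is a hedged Java sketch for `d = 7`. It assumes the constants `c = 0x92492493` (2454267027) and `2**m` with `m = 34`, for which `c * 7 = 2^34 + 5`. The class and method names are hypothetical, and this is not the code in `divnode.cpp`.

```java
// Hypothetical sketch (not HotSpot code): signed int32 division by the constant 7 using
// a single 64-bit product, following conditions (1) and (2).
// Assumed constants: C = 0x92492493 (2454267027) and S = 34, with C * 7 = 2^34 + 5.
public final class SignedMagicDiv7 {
    static final long C = 0x92492493L;
    static final int S = 34;

    static int div7(int x) {
        // |x| <= 2^31 and C < 2^32, so x * C cannot overflow a long.
        long q = (x * C) >> S;   // arithmetic shift = floor(x * C / 2^S)
        // (1) x >= 0: floor(x / 7) == q
        // (2) x <  0: ceil(x / 7) == q + 1; Java's '/' truncates toward zero, i.e. ceil for x < 0
        return (int) (x < 0 ? q + 1 : q);
    }

    public static void main(String[] args) {
        int[] samples = {0, 6, 7, -1, -7, -8, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            System.out.println(x + " / 7 = " + div7(x) + " (Java: " + (x / 7) + ")");
        }
    }
}
```

The `+ 1` is needed because `c * d` is strictly greater than `2^m`. This pushes `x * c / 2^m` slightly below `x / d` for negative `x`. For example, at `x = -7` the shifted product is -2, and only the correction brings the result back to -1.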
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9947/files - new: https://git.openjdk.org/jdk/pull/9947/files/c48d96be..1ae865f0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=16-17 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/9947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/9947/head:pull/9947 PR: https://git.openjdk.org/jdk/pull/9947 From qamai at openjdk.org Sat Aug 19 08:42:39 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 19 Aug 2023 08:42:39 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v15] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 11:00:15 GMT, Emanuel Peter wrote: >> @eme64 Thanks a lot for taking a look at this patch, I will address your remaining comments soon. >> >> The basic idea of the transformation in `javaArithmetic.hpp` is to find `M` and `s` such that `x / c = floor(x * M / 2**s)` for every interesting value of `x`. The remaining transformation in `divnode.cpp` is to convert this calculation from integer arithmetic to modular arithmetic. This is easy if the representative in the congruence class of an operand is always equal to itself, in which case we can do the calculation directly. For other cases, we have to do additional calculation to take into consideration the difference between arithmetic calculations in 2 domains. > > @merykitty I'm mostly out of the office until September 9 (FYI). > > It would be really cool if this made it in. I'm currently playing with `MethodHandles.constant`, and it is really easy to have "random" compile time constants. 
@eme64 Thanks a lot for your support, and sorry for the delay. I came across [this article](https://arxiv.org/abs/2012.12369), which motivated me to come up with a more general and optimal algorithm. This also consolidates the magic constant calculation across different division operations. The underlying theory is given in the function `magic_divide_constant`, which I will restate here.

Given positive integers `d <= N`, call `v` the largest nonnegative integer not larger than `N` such that `v + 1` is divisible by `d`. Then:

For all nonnegative integers `c`, `m` such that:

    m <= c * d < m + m / v

we have:

    floor(x / d) = floor(x * c / m)        for all integers x in [0, N]     (1)

For all nonnegative integers `c`, `m` such that:

    m < c * d <= m + m / v

we have:

    ceil(x / d) = floor(x * c / m) + 1     for all integers x in [-N, 0)    (2)

As a result, to calculate the constants `c`, `m` corresponding to a division `x / d` with `x` in `[lo, hi]`, we split the dividend range into the negative and nonnegative intervals `[lo, 0)` and `[0, hi]`. Then call `v_neg` the largest integer not larger than `-lo` such that `v_neg + 1` is divisible by `d`, and `v_pos` the largest integer not larger than `hi` such that `v_pos + 1` is divisible by `d`. We then need to find constants `c`, `m` such that:

    m <= c * d < m + m / v_pos
    m < c * d <= m + m / v_neg

This applies to both signed and unsigned types (for unsigned types we do not need to consider the negative range). Substituting `x = v` and `x = v - d + 1` into (1), as well as `x = -v` and `x = -v + d - 1` into (2), shows that these bounds are indeed optimal.
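Condition (1) can be turned into a small constant search. The following Java sketch is an illustration only, not the PR's `magic_divide_constant`. It assumes power-of-two divisors `m = 2^s` and nonnegative `int` dividends (`N = Integer.MAX_VALUE`), and its names (`MagicDivide`, `magic`, `divide`) are hypothetical. For each `s` it takes the smallest `c` with `c * d >= m` and accepts the first `s` for which `c * d < m + m / v`, rewritten in integer arithmetic as `(c * d - m) * v < m`.

```java
// Sketch: search for magic constants under condition (1),
//   m <= c*d < m + m/v  =>  floor(x / d) == floor(x * c / m) for 0 <= x <= N,
// restricted to m = 2^s and N = Integer.MAX_VALUE. Hypothetical names; this is not
// the PR's magic_divide_constant.
public final class MagicDivide {
    static final long N = Integer.MAX_VALUE;

    /** Returns {c, s} with floor(x / d) == (x * c) >>> s for all 0 <= x <= N. */
    static long[] magic(int d) {
        if (d <= 0) throw new IllegalArgumentException("d must be positive");
        // v: the largest integer <= N such that v + 1 is divisible by d (v >= 1 here)
        long v = ((N + 1) / d) * d - 1;
        for (int s = 0; s <= 62; s++) {
            long m = 1L << s;
            long c = (m + d - 1) / d;   // smallest c with c * d >= m
            long excess = c * d - m;    // 0 <= excess < d
            // For v > 0: c * d < m + m / v  <=>  excess * v < m
            if (excess * v < m) {
                return new long[] {c, s};
            }
        }
        throw new AssertionError("unreachable for 1 <= d <= N");
    }

    /** Divides a nonnegative int with the magic pair; c < 2^32, so x * c < 2^63. */
    static int divide(int x, long c, int s) {
        return (int) ((x * c) >>> s);
    }

    public static void main(String[] args) {
        long[] cs = magic(7);
        System.out.println("d = 7: c = " + cs[0] + ", s = " + cs[1]);
        System.out.println(Integer.MAX_VALUE + " / 7 = " + divide(Integer.MAX_VALUE, cs[0], (int) cs[1]));
    }
}
```

For `d = 7`, the search stops at `s = 34` with `c = 2454267027` (`0x92492493`). This matches the commonly tabulated signed magic number for 7, which is used as a 32-bit high multiply followed by a shift of 2. Because the minimal `c` stays below 2^32 for every `1 <= d <= Integer.MAX_VALUE`, the product `x * c` fits in a signed `long`, and no add-back step is needed for nonnegative dividends.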
------------- PR Comment: https://git.openjdk.org/jdk/pull/9947#issuecomment-1684894863 From qamai at openjdk.org Sat Aug 19 08:51:42 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 19 Aug 2023 08:51:42 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v18] In-Reply-To: <2GLfWpwzObvaPasLEim5TPbClAGinuJHQOp89Vdgt10=.7f0899a4-dc91-4a5b-8d9a-9a0d8125c2f4@github.com> References: <2GLfWpwzObvaPasLEim5TPbClAGinuJHQOp89Vdgt10=.7f0899a4-dc91-4a5b-8d9a-9a0d8125c2f4@github.com> Message-ID: On Sat, 19 Aug 2023 08:22:23 GMT, Quan Anh Mai wrote: > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > whitespace I am working on adding more division tests on the Java side, as you suggested. I think checking the result with different value ranges of the dividend is necessary. I am thinking of using `Min` and `Max` nodes for this, but there are currently no intrinsics for `Math::min(long, long)` and `Math::max(long, long)`. Do you think I should add those to make testing this patch easier? ------------- PR Comment: https://git.openjdk.org/jdk/pull/9947#issuecomment-1684896485 From qamai at openjdk.org Sat Aug 19 09:03:14 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 19 Aug 2023 09:03:14 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v19] In-Reply-To: References: Message-ID: <1LdwV-tLo31b5zfkCnxrP0G3C6CIFRqneBrfWuQyzT8=.116e0954-7519-4acf-9a2b-2e56566f2df3@github.com>
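Variant (4) of Robison's theorem, as quoted in the 8282365 description, can be checked on the classic case `d = 7`, whose round-up unsigned magic constant needs 33 bits. The following is a hedged Java sketch. It assumes `m2 = 34` and `c2 = floor(2^34 / 7) = 2454267026` (`0x92492492`), which fits in 32 bits. The class and method names are hypothetical, and this is not the PR's code. Because `x + 1 <= 2^32` and `c2 < 2^32`, the product is below 2^64. Java's wrapping `long` multiply therefore holds the exact unsigned value, and `>>>` reads it back correctly.

```java
// Hypothetical sketch (not HotSpot code) of variant (4) for d = 7:
//   floor(x / 7) == floor((x + 1) * c2 / 2^m2)  for every unsigned 32-bit x,
// assuming m2 = 34 and c2 = floor(2^34 / 7) = 2454267026 (0x92492492).
public final class UnsignedDiv7 {
    static final int M2 = 34;
    static final long C2 = (1L << M2) / 7;

    static int divideUnsigned7(int x) {
        long xPlus1 = Integer.toUnsignedLong(x) + 1;   // in [1, 2^32]
        // xPlus1 * C2 < 2^64, so the wrapping long product is the exact unsigned value;
        // the logical shift '>>>' interprets it as unsigned.
        return (int) ((xPlus1 * C2) >>> M2);
    }

    public static void main(String[] args) {
        int[] samples = {0, 6, 7, Integer.MAX_VALUE, Integer.MIN_VALUE, -1};
        for (int x : samples) {
            System.out.println(Integer.toUnsignedString(x) + " / 7 = "
                    + Integer.toUnsignedString(divideUnsigned7(x))
                    + " (Integer.divideUnsigned: "
                    + Integer.toUnsignedString(Integer.divideUnsigned(x, 7)) + ")");
        }
    }
}
```

The `+ 1` compensates for rounding `c2` down. Since `c2 * 7 = 2^34 - 2`, the scaled value `(x + 1) * c2 / 2^34` undershoots `(x + 1) / 7` by less than `1/14` over the whole 32-bit range. This is too little to cross an integer boundary, except when `x + 1` is a multiple of 7, and then it yields exactly `floor(x / 7)`.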
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: do not return old node ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9947/files - new: https://git.openjdk.org/jdk/pull/9947/files/1ae865f0..6f8eea31 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=17-18 Stats: 14 lines in 2 files changed: 7 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/9947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/9947/head:pull/9947 PR: https://git.openjdk.org/jdk/pull/9947 From qamai at openjdk.org Sat Aug 19 17:21:56 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 19 Aug 2023 17:21:56 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v20] In-Reply-To: References: Message-ID: Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9947/files - new: https://git.openjdk.org/jdk/pull/9947/files/6f8eea31..07343562 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=18-19 Stats: 8 lines in 4 files changed: 1 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/9947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/9947/head:pull/9947 PR: https://git.openjdk.org/jdk/pull/9947 From qamai at openjdk.org Sat Aug 19 17:38:57 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 19 Aug 2023 17:38:57 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v21] In-Reply-To: References: Message-ID:
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: fix types ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9947/files - new: https://git.openjdk.org/jdk/pull/9947/files/07343562..fac857c6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=19-20 Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/9947/head:pull/9947 PR: https://git.openjdk.org/jdk/pull/9947 From duke at openjdk.org Sun Aug 20 07:44:25 2023 From: duke at openjdk.org (Kimura Yukihiro) Date: Sun, 20 Aug 2023 07:44:25 GMT Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError [v4] In-Reply-To: References: Message-ID: On Fri, 18 Aug 2023 14:12:51 GMT, Kimura Yukihiro wrote: >> I would like to fix this issue because it is difficult for testers to understand why the test failed. >> There is no risk, as I only added an assertion message instead of an exit-code error. >> I would appreciate it if someone could review the fix. > > Kimura Yukihiro has updated the pull request incrementally with one additional commit since the last revision: > > 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError May I type "integrate" now, or should I wait until some other work is done?
Thanks, Kimura Yukihiro ------------- PR Comment: https://git.openjdk.org/jdk/pull/15329#issuecomment-1685212271 From qamai at openjdk.org Sun Aug 20 10:07:01 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 20 Aug 2023 10:07:01 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v22] In-Reply-To: References: Message-ID: Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 61 commits: - Merge branch 'master' into unsignedDiv - fix types - fixes - do not return old node - whitespace - Merge branch 'master' into unsignedDiv - refactor calculations - consolidate constant calculations - address comments - Merge branch 'master' into unsignedDiv - ...
and 51 more: https://git.openjdk.org/jdk/compare/ed0f75f2...62943d1f ------------- Changes: https://git.openjdk.org/jdk/pull/9947/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=21 Stats: 2383 lines in 13 files changed: 1745 ins; 455 del; 183 mod Patch: https://git.openjdk.org/jdk/pull/9947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/9947/head:pull/9947 PR: https://git.openjdk.org/jdk/pull/9947 From thartmann at openjdk.org Mon Aug 21 05:24:34 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 21 Aug 2023 05:24:34 GMT Subject: RFR: JDK-8313626: C2 crash due to unexpected exception control flow [v2] In-Reply-To: References: Message-ID: On Fri, 18 Aug 2023 13:49:59 GMT, Tobias Holenstein wrote: >> # Problem >> The following JASM code: >> >> static Method test1:"()V" stack 1 { >> try t; >> invokestatic m:"()V"; >> return; >> >> catch t java/lang/Throwable; >> stack_map class java/lang/Throwable; >> athrow; >> endtry t; >> } >> >> produces this java bytecode >> >> static void m(); >> Code: >> 0: return >> >> static void test1(); >> Code: >> 0: invokestatic #4 // Method m:()V >> 3: return >> 4: athrow >> Exception table: >> from to target type >> 0 5 4 Class java/lang/Throwable >> >> >> from https://docs.oracle.com/javase/specs/jvms/se20/jvms20.pdf _exception_table[] (p.116)_ >> >>> The values of the two items start_pc and end_pc indicate the ranges in the code array at which the exception handler is active. The value of start_pc must be a valid index into the code array of the opcode of an instruction. The value of end_pc either must be a valid index into the code array of the opcode of an instruction or must be equal to code_length, the length of the code array. The value of start_pc must be less than the value of end_pc. >>> The start_pc is inclusive and end_pc is exclusive; that is, the exception handler must be active while the program counter is within the interval [start_pc, end_pc). 
>>> >>> handler_pc >>> The value of the handler_pc item indicates the start of the exception handler. The value of the item must be a valid index into the code array and must be the index of the opcode of an instruction. >> >> and from _athrow (p.420)_ >> >>> The objectref must be of type reference and must refer to an object that is an instance of class Throwable or of a subclass of Throwable. It is popped from the operand stack. The objectref is then thrown by searching the current method (§2.6) for the first exception handler that matches the class of objectref, as given by the algorithm in §2.10. >>> If an exception handler that matches objectref is found, it contains the location of the code intended to handle this exception. The pc register is reset to that location, the operand stack of the current frame is cleared, objectref is pushed back onto the operand stack, and execution continues. >> >> In our case: **[start_pc=0, end_pc=5)** and **handler_pc=4** and **objectref=Class java/lang/Throwable** >> >> By this definition, we indeed have valid bytecode for `test1()`. Therefore we would expect C2 to create an infinite loop for >> >> 4: athrow >> >> >> The C2 graph indeed shows a... > > Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: > > - remove newlines > - remove newlines Isn't `MissingSafepointOnTryCatch::testInfinite` such an example? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15292#issuecomment-1685659888 From dlong at openjdk.org Mon Aug 21 05:43:26 2023 From: dlong at openjdk.org (Dean Long) Date: Mon, 21 Aug 2023 05:43:26 GMT Subject: RFR: JDK-8313626: C2 crash due to unexpected exception control flow [v2] In-Reply-To: References: Message-ID: On Mon, 21 Aug 2023 05:21:15 GMT, Tobias Hartmann wrote: > Isn't `MissingSafepointOnTryCatch::testInfinite` such an example? Yes, my mistake.
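The exception-table rules quoted in this thread can be modeled in a few lines. The Java sketch below is a simplified model, not HotSpot's lookup code: `ExceptionTableModel` and its `ExceptionEntry` record are hypothetical names, and every entry is treated as a catch-all for `java/lang/Throwable`. It shows why the `athrow` at pc 4 of `test1()` keeps landing on itself: pc 4 lies in the half-open range [0, 5), and that entry's handler is pc 4.

```java
import java.util.List;

/**
 * Minimal model of the JVMS exception-table lookup quoted above. The
 * ExceptionEntry record is hypothetical (not a JDK or HotSpot API), and every
 * entry is treated as a catch-all for java/lang/Throwable, so the class
 * matching step of JVMS §2.10 is deliberately left out.
 */
final class ExceptionTableModel {
    record ExceptionEntry(int startPc, int endPc, int handlerPc) {
        /** A handler is active while startPc <= pc < endPc (end_pc is exclusive). */
        boolean covers(int pc) {
            return startPc <= pc && pc < endPc;
        }
    }

    /** Returns handler_pc of the first entry covering throwingPc, or -1 if none does. */
    static int findHandler(List<ExceptionEntry> table, int throwingPc) {
        for (ExceptionEntry e : table) {
            if (e.covers(throwingPc)) {
                return e.handlerPc();
            }
        }
        return -1; // no handler: the exception would propagate to the caller
    }

    public static void main(String[] args) {
        // Exception table of test1(): from 0, to 5, target 4, type Throwable.
        List<ExceptionEntry> table = List.of(new ExceptionEntry(0, 5, 4));
        int pc = 4; // the athrow instruction
        for (int step = 0; step < 3; step++) {
            // pc 4 lies in [0, 5), so athrow transfers control to pc 4 again.
            pc = findHandler(table, pc);
            System.out.println("athrow at pc 4 -> handler at pc " + pc);
        }
    }
}
```

Because the handler range covers the handler itself, the lookup never reaches the "no handler" case, which is the infinite loop the thread expects C2 to model.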
------------- PR Comment: https://git.openjdk.org/jdk/pull/15292#issuecomment-1685682659 From tholenstein at openjdk.org Mon Aug 21 06:42:46 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 21 Aug 2023 06:42:46 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v7] In-Reply-To: References: Message-ID: <_yzQ8GLHdGaW-__O2xZk9p5B5SaqbkoO3fbhOreGYF4=.57efcb63-9be7-4bc8-9d43-553f9a519341@github.com> On Fri, 18 Aug 2023 17:26:18 GMT, emmyyin wrote: >> ### Purpose >> >> IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On the one hand, this makes each layout more readable. But on the other hand, the layouts of two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can damage the understanding of the graph that the viewer has built up. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. >> >> The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". >> >> The difference between the algorithms is illustrated in the following video: >> >> >> https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 >> >> >> This work is the result of my Master's thesis which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf).
>> >> >> ### Implementation >> >> The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager` has been added which holds the core algorithm. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. >> >> If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions on. >> >> The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. >> >> Since `HierarchicalStableLayoutManager` tries to preserve the node positi... > > emmyyin has updated the pull request incrementally with one additional commit since the last revision: > > fixing trailing ws looks good to me. Thanks for the work! :) ------------- Marked as reviewed by tholenstein (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14349#pullrequestreview-1586296001 From roland at openjdk.org Mon Aug 21 08:46:28 2023 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 21 Aug 2023 08:46:28 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions In-Reply-To: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: On Wed, 21 Jun 2023 12:47:26 GMT, Roland Westrelin wrote: > This change adds a new loop opts pass to optimize redundant conditions > such as the second one in: > > > if (i < 10) { > if (i < 42) { > > > In the branch of the first if, the type of i can be narrowed down to > [min_jint, 9] which can then be used to constant fold the second > condition. > > The compiler already keeps track of type[n] for every node in the > current compilation unit. That's not sufficient to optimize the > snippet above though because the type of i can only be narrowed in > some sections of the control flow (that is, a subset of all > controls). The solution is to build a new table that tracks the type > of n at every control c > > > type'[n, root] = type[n] // initialized from igvn's type table > type'[n, c] = type'[n, idom(c)] > > > This pass iterates over the CFG looking for conditions such as: > > > if (i < 10) { > > > that allow narrowing the type of i, and updates the type' table > accordingly. > > At a region r: > > > type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...) > > > For a Phi phi at a region r: > > > type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...) > > > Once a type is narrowed, uses are enqueued and their types are > computed by calling the Value() methods. If a use's type is narrowed, > it's recorded at c in the type' table. Value() methods retrieve types > from the type table, not the type' table.
To address that issue while > leaving Value() methods unchanged, before calling Value() at c, the > type table is updated so: > > > type[n] = type'[n, c] > > > An exception is Phi::Value, which needs to retrieve the types of > nodes at various controls: there, a new type(Node* n, Node* c) > method is used. > > For most n and c, type'[n, c] is likely the same as type[n], the type > recorded in the global igvn table (that is, there should be only a few > nodes, at a few controls, for which we can narrow the type down). As > a consequence, the type'[n, c] table is implemented with: > > - At c, narrowed down types are stored in a GrowableArray. Each entry > records the previous type at idom(c) and the narrowed down type at > c. > > - The GrowableArray of type updates is recorded in a hash table > indexed by c. If there's no update at c, there's no entry in the > hash table. > > This pass operates in 2 steps: > > - it first iterates over the graph looking for conditions that narrow > the types of some nodes and propagates type updates to uses until a > fixed point is reached. > > - it transforms the graph so newly found constant nodes are folded. > > > The new pass is run on every round of loop opts. There are a couple rea... Comment to keep alive.
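The narrowing step described for 8275202 can be illustrated with plain integer ranges. This is a toy Java sketch under stated assumptions, not the pass itself: `IntRange` is a hypothetical stand-in for C2's int types, `narrowLessThan` models the type of `i` on the taken side of `if (i < 10)`, and `merge` plays the role of the `meet` at a region (the smallest range covering all inputs).

```java
/**
 * Toy model (not C2's TypeInt or the new pass) of how a branch narrows an
 * int range. IntRange is a hypothetical stand-in for C2's int type lattice.
 */
final class RangeNarrowing {
    record IntRange(int lo, int hi) {
        static final IntRange INT = new IntRange(Integer.MIN_VALUE, Integer.MAX_VALUE);

        /** Range on the taken side of `x < c`, or null if that side is dead. */
        IntRange narrowLessThan(int c) {
            if (c == Integer.MIN_VALUE) return null; // x < MIN_VALUE never holds
            int newHi = Math.min(hi, c - 1);
            return newHi < lo ? null : new IntRange(lo, newHi);
        }

        /** TRUE/FALSE if `x < c` has the same outcome for every value in range, else null. */
        Boolean foldLessThan(int c) {
            if (hi < c) return Boolean.TRUE;
            if (lo >= c) return Boolean.FALSE;
            return null;
        }

        /** At a merge point, the result must cover the ranges of all inputs. */
        IntRange merge(IntRange other) {
            return new IntRange(Math.min(lo, other.lo), Math.max(hi, other.hi));
        }
    }

    public static void main(String[] args) {
        IntRange global = IntRange.INT;                // type[i]: nothing known about i
        IntRange inBranch = global.narrowLessThan(10); // type'[i, c] inside `if (i < 10)`
        System.out.println("in branch: " + inBranch);
        System.out.println("i < 42 folds to: " + inBranch.foldLessThan(42));
        System.out.println("i < 42 globally: " + global.foldLessThan(42));
        // Merging with a path where nothing was learned loses the narrowing again.
        System.out.println("after merge: " + inBranch.merge(global));
    }
}
```

Inside the branch the second condition folds to `true`, while the global type alone cannot fold it, which is why a per-control table is needed in the first place.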
------------- PR Comment: https://git.openjdk.org/jdk/pull/14586#issuecomment-1685907129 From roland at openjdk.org Mon Aug 21 08:47:32 2023 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 21 Aug 2023 08:47:32 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v9] In-Reply-To: <-ssaBgw9bGq2MyUaNq_LfEONlBAhkOedksLfu1J0Jbo=.bce452bf-3953-4242-91ba-c7a4baf3bdf4@github.com> References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> <-ssaBgw9bGq2MyUaNq_LfEONlBAhkOedksLfu1J0Jbo=.bce452bf-3953-4242-91ba-c7a4baf3bdf4@github.com> Message-ID: On Wed, 19 Jul 2023 13:36:27 GMT, Roland Westrelin wrote: >> In this simple micro benchmark: >> >> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 >> >> Performance drops sharply with a polluted profile: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us >> >> >> to: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 2.280 ops/us >> >> >> The test has 2 type checks against 2 different interfaces, so caching with >> `secondary_super_cache` doesn't help. >> >> The micro-benchmark only uses 2 different concrete classes >> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded >> in profile data at the type checks. But c2 only takes advantage of >> profile data at type checks if they report a single class. >> >> What I propose is that the full-blown type check expanded in >> `Phase::gen_subtype_check()` takes advantage of profile data. So in >> the case of the micro benchmark, before checking the >> `secondary_super_cache`, generated code checks whether the object >> being type checked is a `DuplicatedContext` or a >> `NonDuplicatedContext`.
>> >> This works fairly well on this micro benchmark: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> It also scales much better if there are multiple threads running the >> same test (`secondary_super_cache` doesn't scale well: see >> JDK-8180450). >> >> Now if the micro-benchmark is changed according to the comment: >> >> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 >> >> so the type check hits in the `secondary_super_cache`, the current >> code performs much better: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> but leveraging profiling as explained above performs even better: >> >> >> Benchmark ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - riscv support > - improvements to test > - Merge branch 'master' into JDK-8308869 > - never common SubTypeCheckNode nodes > - keep both ways of doing profile > - whitespace > - reworked change > - Merge branch 'master' into JDK-8308869 > - more test failures > - Merge branch 'master' into JDK-8308869 > - ... and 6 more: https://git.openjdk.org/jdk/compare/207e1637...8d9a08d1 Anyone else for a review?
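A Java-level sketch of the idea proposed for 8308869 may help: compare the receiver's exact class against the profiled classes first, and only fall back to the general subtype check on a miss. This is a conceptual illustration only. `ProfiledTypeCheck` is hypothetical, C2 emits the equivalent logic as IR inside `Phase::gen_subtype_check()` rather than as Java code, and the nested types merely reuse the benchmark's class names.

```java
import java.util.List;

/**
 * Conceptual sketch (hypothetical, not C2 code): a type check that consults
 * the classes recorded in the profile before doing a full subtype check.
 */
final class ProfiledTypeCheck {
    interface Context {}
    interface Duplicated {}
    static final class DuplicatedContext implements Context, Duplicated {}
    static final class NonDuplicatedContext implements Context {}

    private final Class<?> target;          // type being checked against
    private final List<Class<?>> profiled;  // receiver classes seen in the profile
    private final boolean[] profiledResult; // precomputed answer per profiled class

    ProfiledTypeCheck(Class<?> target, List<Class<?>> profiled) {
        this.target = target;
        this.profiled = List.copyOf(profiled);
        this.profiledResult = new boolean[this.profiled.size()];
        for (int idx = 0; idx < profiledResult.length; idx++) {
            // Resolved once, as a compiler would at compile time.
            profiledResult[idx] = target.isAssignableFrom(this.profiled.get(idx));
        }
    }

    /** Like instanceof: null is never an instance. */
    boolean isInstance(Object o) {
        if (o == null) return false;
        Class<?> k = o.getClass();
        for (int idx = 0; idx < profiledResult.length; idx++) {
            if (k == profiled.get(idx)) {
                return profiledResult[idx]; // fast path: exact class seen in the profile
            }
        }
        return target.isInstance(o);        // slow path: full subtype check
    }

    public static void main(String[] args) {
        ProfiledTypeCheck check = new ProfiledTypeCheck(
                Duplicated.class, List.of(DuplicatedContext.class, NonDuplicatedContext.class));
        System.out.println(check.isInstance(new DuplicatedContext()));
        System.out.println(check.isInstance(new NonDuplicatedContext()));
        System.out.println(check.isInstance("not in the profile"));
    }
}
```

The fast path is a pointer comparison per profiled class, so it never touches `secondary_super_cache`; only classes absent from the profile pay for the full check.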
------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1685908797 From shade at openjdk.org Mon Aug 21 09:38:30 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 21 Aug 2023 09:38:30 GMT Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError [v4] In-Reply-To: References: Message-ID: On Fri, 18 Aug 2023 14:12:51 GMT, Kimura Yukihiro wrote: >> I would like to fix this issue because it is difficult for testers to understand why the test failed. >> There is no risk as I just added an assertion message instead of exit code error. >> I would appreciate it if someone could review the fix. > > Kimura Yukihiro has updated the pull request incrementally with one additional commit since the last revision: > > 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError Changes requested by shade (Reviewer). test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java line 62: > 60: // Ignore adapter creation failures > 61: if (oa.getExitValue() != 0 && !oa.getStderr().contains("Out of space in CodeCache for adapters")) { > 62: throw new RuntimeException("VM finished with exit code " + oa.getExitValue()); We also need `oa.reportDiagnosticSummary();` in this block. Otherwise we lose debugging information when throwing this exception. 
------------- PR Review: https://git.openjdk.org/jdk/pull/15329#pullrequestreview-1586604012 PR Review Comment: https://git.openjdk.org/jdk/pull/15329#discussion_r1299866761 From thartmann at openjdk.org Mon Aug 21 09:41:31 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 21 Aug 2023 09:41:31 GMT Subject: RFR: 8312749: Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 12:43:56 GMT, Roberto Castañeda Lozano wrote: > This changeset ensures that the array copy stub underlying the intrinsic implementation of `Object.clone` only copies its (double-word aligned) payload, excluding the remaining object alignment padding words, when a non-default `ObjectAlignmentInBytes` value is used. This prevents the specialized ZGC stubs for `Object[]` array copy from processing undefined object alignment padding words as valid object pointers. For further details about the specific failure, see [initial analysis](https://bugs.openjdk.org/browse/JDK-8312749?focusedCommentId=14600658&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14600658) by Erik Österlund and Stefan Karlsson and comments in the regression test included in this changeset. > > As a side-benefit, the changeset simplifies the array size computation logic in `GraphKit::new_array()` by decoupling computation of header size and alignment padding size.
> > #### Testing > > ##### Functionality > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64) > - tier4-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; ZGC-specific tests only) > - tier6-9 (linux-x64; ZGC-specific tests only) > - tier1-3, and a few custom examples, applying [JDK-8139457](https://github.com/openjdk/jdk/pull/11044) (under review) on top of this changeset > > ##### Performance > > Tested performance on the following set of OpenJDK micro-benchmarks, on linux-x64 (for both G1 and ZGC, using different ObjectAlignmentInBytes values): > > - `openjdk.bench.java.lang.ArrayClone.byteClone` > - `openjdk.bench.java.lang.ArrayClone.intClone` > - `openjdk.bench.java.lang.ArrayFiddle.simple_clone` > - `openjdk.bench.java.lang.Clone.cloneLarge` > - `openjdk.bench.java.lang.Clone.cloneThreeDifferent` > > No significant regression was observed. Looks good to me! src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 686: > 684: } > 685: payload_size = kit->gvn().transform(new URShiftXNode(payload_size, kit->intcon(LogBytesPerLong))); > 686: ArrayCopyNode* ac = ArrayCopyNode::make(kit, false, src_base, offset, dst_base, offset, payload_size, true, false); Suggestion: ArrayCopyNode* ac = ArrayCopyNode::make(kit, false, src_base, offset, dst_base, offset, payload_size, true, false); ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15288#pullrequestreview-1586603759 PR Review Comment: https://git.openjdk.org/jdk/pull/15288#discussion_r1299866583 From thartmann at openjdk.org Mon Aug 21 09:48:25 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 21 Aug 2023 09:48:25 GMT Subject: RFR: 8314452: Explicitly indicate inlining success/failure in PrintInlining In-Reply-To: References: Message-ID: <68cnReOy5vHWDIXC2XLw-uKOQkrPHw7FtggrkAh74ik=.86c263b6-c757-49bc-8dd3-6e00fc381436@github.com> On Wed, 16 Aug 2023 17:42:42 GMT, Jorn Vernee wrote: > This patch proposes to add a `+` or `-` to messages produced by `PrintInlining`, to indicate whether inlining succeeded or failed. This makes it easier to find inlining failures in an inlining trace, without having to rely on the message to figure out whether inlining succeeded or failed. Looking at inlining failures is often useful for diagnosing the results of benchmarks, but it can be hard to find inlining failures in lengthy traces. 
> > A sample of what this looks like: > > > +@ 0 java.lang.foreign.Arena::ofConfined (10 bytes) inline (hot) > +@ 0 java.lang.Thread::currentThread (0 bytes) (intrinsic) > +@ 3 jdk.internal.foreign.MemorySessionImpl::createConfined (9 bytes) inline (hot) > +@ 5 jdk.internal.foreign.ConfinedSession:: (18 bytes) inline (hot) > +@ 6 jdk.internal.foreign.ConfinedSession$ConfinedResourceList:: (5 bytes) inline (hot) > +@ 1 jdk.internal.foreign.MemorySessionImpl$ResourceList:: (5 bytes) inline (hot) > +@ 1 java.lang.Object:: (1 bytes) inline (hot) > +@ 9 jdk.internal.foreign.MemorySessionImpl:: (20 bytes) inline (hot) > +@ 1 java.lang.Object:: (1 bytes) inline (hot) > +@ 6 jdk.internal.foreign.MemorySessionImpl::asArena (9 bytes) inline (hot) > +@ 5 jdk.internal.foreign.MemorySessionImpl$1:: (10 bytes) inline (hot) > +@ 6 java.lang.Object:: (1 bytes) inline (hot) > -@ 8 java.lang.foreign.SegmentAllocator::allocate (24 bytes) already compiled into a big method > > > Using `grep`/`sls` to find inlining failures: > > >> Get-Content inlining_trace.txt | sls '-@' > -@ 8 java.lang.foreign.SegmentAllocator::allocate (24 bytes) already compiled into a big method > -@ 34 java.lang.foreign.SegmentAllocator::allocate (24 bytes) already compiled into a big method > -@ 19 java.lang.invoke.MethodHandle::linkToNative(JJJL)D (0 bytes) native call > -@ 95 java.lang.foreign.Arena::close (0 bytes) virtual call > ... Why not simply add a "failed to inline:" message? 
Something like: @ 8 java.lang.foreign.SegmentAllocator::allocate (24 bytes) failed to inline: already compiled into a big method @ 34 java.lang.foreign.SegmentAllocator::allocate (24 bytes) failed to inline: already compiled into a big method @ 19 java.lang.invoke.MethodHandle::linkToNative(JJJL)D (0 bytes) failed to inline: native call @ 95 java.lang.foreign.Arena::close (0 bytes) failed to inline: virtual call @ 107 jdk.internal.foreign.MemorySessionImpl::release0 (0 bytes) failed to inline: virtual call @ 14 jdk.internal.misc.Unsafe::freeMemory0 (0 bytes) failed to inline: native method ------------- PR Comment: https://git.openjdk.org/jdk/pull/15315#issuecomment-1686002253 From thartmann at openjdk.org Mon Aug 21 09:52:30 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 21 Aug 2023 09:52:30 GMT Subject: RFR: 8299658: C1 compilation crashes in LinearScan::resolve_exception_edge In-Reply-To: References: Message-ID: On Fri, 18 Aug 2023 20:17:52 GMT, Martin Doerr wrote: > This is a quick fix for the C1 problem described in the JBS issue. > When we find an illegal operand (modelled by nullptr) while resolving an exception edge we can propagate this state to the phi function and skip the edge. > > If somebody finds a better way to propagate the "illegal" state to the phi function, I can change or close this PR. > > Please review. A nice regression test would be a good thing, but probably not easy to write. That looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15348#pullrequestreview-1586629742 From ayang at openjdk.org Mon Aug 21 10:25:28 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 21 Aug 2023 10:25:28 GMT Subject: RFR: 8312749: Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 12:43:56 GMT, Roberto Castañeda Lozano wrote: > This changeset ensures that the array copy stub underlying the intrinsic implementation of `Object.clone` only copies its (double-word aligned) payload, excluding the remaining object alignment padding words, when a non-default `ObjectAlignmentInBytes` value is used. This prevents the specialized ZGC stubs for `Object[]` array copy from processing undefined object alignment padding words as valid object pointers. For further details about the specific failure, see [initial analysis](https://bugs.openjdk.org/browse/JDK-8312749?focusedCommentId=14600658&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14600658) by Erik Österlund and Stefan Karlsson and comments in the regression test included in this changeset. > > As a side-benefit, the changeset simplifies the array size computation logic in `GraphKit::new_array()` by decoupling computation of header size and alignment padding size.
> > #### Testing > > ##### Functionality > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64) > - tier4-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; ZGC-specific tests only) > - tier6-9 (linux-x64; ZGC-specific tests only) > - tier1-3, and a few custom examples, applying [JDK-8139457](https://github.com/openjdk/jdk/pull/11044) (under review) on top of this changeset > > ##### Performance > > Tested performance on the following set of OpenJDK micro-benchmarks, on linux-x64 (for both G1 and ZGC, using different ObjectAlignmentInBytes values): > > - `openjdk.bench.java.lang.ArrayClone.byteClone` > - `openjdk.bench.java.lang.ArrayClone.intClone` > - `openjdk.bench.java.lang.ArrayFiddle.simple_clone` > - `openjdk.bench.java.lang.Clone.cloneLarge` > - `openjdk.bench.java.lang.Clone.cloneThreeDifferent` > > No significant regression was observed. If I understand it correctly, much of the diff is to ensure that `ArrayCopyNode::make` (in `BarrierSetC2::clone`) gets the correct value for the `length` arg, calculated as `align_up(array-length * elem-size, word-size) / word-size`. I wonder if it's possible to pass the actual array length (#slots) as `length` and move the merge-bytes-to-words-copying optimization to a lower level, e.g. inside `conjoint_jbytes`. Ofc, `BarrierSetC2::clone_at_expansion` and its derived siblings need to be adjusted accordingly, e.g. to use the actual elem-type. (Preexisting: having `ArrayCopyNode` to cover both array and instance cloning hinders the readability, IMO.) 
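The size computation discussed for 8312749 can be checked with a little arithmetic. The Java sketch below is back-of-the-envelope only, under assumptions not stated in the thread: 8-byte words, a 16-byte array header (typical of 64-bit HotSpot with compressed class pointers), and elements starting right after the header. For an `Object[1]` with 4-byte compressed references, the aligned payload is 1 word, while copying up to the aligned object end covers 1, 2, 2 and 6 words for `ObjectAlignmentInBytes` values 8, 16, 32 and 64. The extra words are the padding that the ZGC `Object[]` stubs must not treat as references.

```java
/**
 * Back-of-the-envelope arithmetic (a sketch, not HotSpot code) for the clone
 * copy size. Assumptions: 8-byte words, HEADER_BYTES = 16, and elements that
 * start immediately after the header.
 */
final class ClonePayload {
    static final int WORD = 8;
    static final int HEADER_BYTES = 16; // assumption, see above

    /** Rounds value up to a multiple of alignment (a power of two). */
    static long alignUp(long value, long alignment) {
        return (value + alignment - 1) & -alignment;
    }

    /** Words copied when only the double-word aligned payload is copied. */
    static long payloadWords(long length, int elemSize) {
        return alignUp(length * elemSize, WORD) / WORD;
    }

    /** Words after the header when copying up to the aligned end of the object. */
    static long wordsToObjectEnd(long length, int elemSize, int objectAlignment) {
        long objectSize = alignUp(HEADER_BYTES + length * elemSize, objectAlignment);
        return (objectSize - HEADER_BYTES) / WORD;
    }

    public static void main(String[] args) {
        // Object[1] with 4-byte compressed references.
        for (int align : new int[] {8, 16, 32, 64}) {
            long payload = payloadWords(1, 4);
            long toEnd = wordsToObjectEnd(1, 4, align);
            System.out.printf("ObjectAlignmentInBytes=%d: payload=%d words, padding=%d words%n",
                    align, payload, toEnd - payload);
        }
    }
}
```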
------------- PR Review: https://git.openjdk.org/jdk/pull/15288#pullrequestreview-1586685925 From rcastanedalo at openjdk.org Mon Aug 21 11:32:42 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 21 Aug 2023 11:32:42 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v7] In-Reply-To: References: Message-ID: On Fri, 18 Aug 2023 17:26:18 GMT, emmyyin wrote: >> ### Purpose >> >> IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On the one hand, this makes each layout more readable. But on the other hand, the layouts of two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can damage the understanding of the graph that the viewer has built up. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. >> >> The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". >> >> The difference between the algorithms is illustrated in the following video: >> >> >> https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 >> >> >> This work is the result of my Master's thesis which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf).
>> >> >> ### Implementation >> >> The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager` has been added which holds the core algorithm. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. >> >> If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions on. >> >> The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. >> >> Since `HierarchicalStableLayoutManager` tries to preserve the node positi... > > emmyyin has updated the pull request incrementally with one additional commit since the last revision: > > fixing trailing ws Looks good! I have tested the patch and checked that no major performance regression is introduced in the existing sea-of-nodes view. I only have a few minor comments about the code. src/utils/IdealGraphVisualizer/Graph/src/main/java/com/sun/hotspot/igv/graph/Figure.java line 394: > 392: return getInputNode().equals(((Figure)o).getInputNode()); > 393: } > 394: @Override Nit: please insert an empty line between these two methods.
src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 39: > 37: public static final int X_OFFSET = 8; > 38: public static final int LAYER_OFFSET = 8; > 39: // Algorithm global datastructures Suggestion: // Algorithm global data structures src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 125: > 123: return values[values.length / 2]; > 124: } > 125: } This method is duplicated from `HierarchicalLayoutManager`, could you make it static and extract it into some class (or possibly a new one, e.g. `Math.java` or `Statistics.java`) in the `Util` module? src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 131: > 129: * been inserted at that layer > 130: * > 131: * @param layer No need to list the parameters if there is no associated description of each of them, in my opinion. The same holds for all other cases in this file. 
src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 156: > 154: > 155: /** > 156: * Ensure that the datastructures nodes and layerNodes are consistent Suggestion: * Ensure that the data structures nodes and layerNodes are consistent src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 542: > 540: private class BuildDatastructure { > 541: > 542: // In case there are changes in the node size, it's layer must be updated Suggestion: // In case there are changes in the node size, its layer must be updated src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 669: > 667: if (layers.keySet().contains(layer - 1)) { > 668: List predNodes = layers.get(layer - 1); > 669: // For each link with an end point in vertex, check how many edges crosses it Suggestion: // For each link with an end point in vertex, check how many edges cross it src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 706: > 704: if (layers.keySet().contains(layer + 1)) { > 705: List succsNodes = layers.get(layer + 1); > 706: // For each link with an end point in vertex, check how many edges crosses it Suggestion: // For each link with an end point in vertex, check how many edges cross it src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 1104: > 1102: * Calculate which layer the given vertex should be inserted at to minimize > 1103: * reversed edges and edge lengths > 1104: * If there are multiple options, choose the bottom most layer Suggestion: * If there are multiple options, choose the bottom-most layer 
src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 1122: > 1120: int layer = -1; > 1121: for (int i = 0; i < layers.keySet().size(); i++) { > 1122: // System.out.println("Testing layer " + i); Please remove this line. src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 1533: > 1531: private class ComputeLayoutScore { > 1532: /** > 1533: * https://www.geeksforgeeks.org/check-if-two-given-line-segments-intersect/ Please replace this reference with a standalone comment describing the method. src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 1547: > 1545: /** > 1546: * To find orientation of ordered triplet (p, q, r). > 1547: * https://www.geeksforgeeks.org/check-if-two-given-line-segments-intersect/ Please remove this reference. src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 1564: > 1562: > 1563: /** > 1564: * https://www.geeksforgeeks.org/check-if-two-given-line-segments-intersect/ Please remove this reference. src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java line 200: > 198: > 199: diagramViewModel.getGraphChangedEvent().addListener(model -> { > 200: // HierarchicalStableLayoutManager is not stable for difference graphs Perhaps use a different term than "stable" here to avoid confusion (e.g. "reliable"). ------------- Changes requested by rcastanedalo (Reviewer). 
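The reviewer asks for the external links in `ComputeLayoutScore` to be replaced with comments that describe the methods themselves. As an illustration of what such self-describing code could look like, here is a sketch of the standard orientation-based test for whether two line segments intersect. The names (`SegmentIntersection`, `Pt`, `orientation`, `onSegment`, `intersect`) are hypothetical, and this is not the code from the PR.

```java
/** Hypothetical helper illustrating the orientation-based segment intersection test. */
final class SegmentIntersection {

    /** Minimal immutable integer point, used to keep the sketch self-contained. */
    static final class Pt {
        final int x;
        final int y;

        Pt(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    private SegmentIntersection() {
    }

    /**
     * Orientation of the ordered triplet (p, q, r): 0 if the points are collinear,
     * otherwise 1 or 2 for the two turning directions. Callers only compare
     * results for equality, so it does not matter whether the y axis points up or down.
     */
    static int orientation(Pt p, Pt q, Pt r) {
        // Sign of the cross product of (q - p) and (r - q), in long to avoid overflow.
        long val = ((long) q.y - p.y) * ((long) r.x - q.x)
                 - ((long) q.x - p.x) * ((long) r.y - q.y);
        if (val == 0) {
            return 0;
        }
        return val > 0 ? 1 : 2;
    }

    /** For collinear p, q, r: returns true if q lies within the bounding box of segment pr. */
    static boolean onSegment(Pt p, Pt q, Pt r) {
        return q.x <= Math.max(p.x, r.x) && q.x >= Math.min(p.x, r.x)
            && q.y <= Math.max(p.y, r.y) && q.y >= Math.min(p.y, r.y);
    }

    /** Returns true if segment p1-q1 and segment p2-q2 share at least one point. */
    static boolean intersect(Pt p1, Pt q1, Pt p2, Pt q2) {
        int o1 = orientation(p1, q1, p2);
        int o2 = orientation(p1, q1, q2);
        int o3 = orientation(p2, q2, p1);
        int o4 = orientation(p2, q2, q1);
        // General case: each segment's endpoints lie on opposite sides of the other segment.
        if (o1 != o2 && o3 != o4) {
            return true;
        }
        // Special cases: an endpoint is collinear with the other segment and lies on it.
        return (o1 == 0 && onSegment(p1, p2, q1))
            || (o2 == 0 && onSegment(p1, q2, q1))
            || (o3 == 0 && onSegment(p2, p1, q2))
            || (o4 == 0 && onSegment(p2, q1, q2));
    }
}
```

Because each method carries its own short description, the file no longer depends on an external page to explain what the code does.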
PR Review: https://git.openjdk.org/jdk/pull/14349#pullrequestreview-1586749389 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1299961862 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1299968734 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1299973675 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1299975429 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1299975759 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1299976753 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1299977675 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1299978015 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1299979189 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1299979475 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1299981492 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1299981974 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1299982124 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1299967688 From duke at openjdk.org Mon Aug 21 14:26:51 2023 From: duke at openjdk.org (Kimura Yukihiro) Date: Mon, 21 Aug 2023 14:26:51 GMT Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError [v5] In-Reply-To: References: Message-ID: > I would like to fix this issue because it is difficult for testers to understand why the test failed. > There is no risk as I just added an assertion message instead of exit code error. > I would appreciate it if someone could review the fix. 
Kimura Yukihiro has updated the pull request incrementally with one additional commit since the last revision: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15329/files - new: https://git.openjdk.org/jdk/pull/15329/files/2e631363..9a409727 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15329&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15329&range=03-04 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15329/head:pull/15329 PR: https://git.openjdk.org/jdk/pull/15329 From shade at openjdk.org Mon Aug 21 14:26:52 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 21 Aug 2023 14:26:52 GMT Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError [v5] In-Reply-To: References: Message-ID: <0gmlQgoEv1W0xW4cLP58iZLssDzI8o_i_pfrvREV0wM=.11f59018-0881-47cc-96a5-4e07ee90efcb@github.com> On Mon, 21 Aug 2023 14:21:51 GMT, Kimura Yukihiro wrote: >> I would like to fix this issue because it is difficult for testers to understand why the test failed. >> There is no risk as I just added an assertion message instead of exit code error. >> I would appreciate it if someone could review the fix. > > Kimura Yukihiro has updated the pull request incrementally with one additional commit since the last revision: > > 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError Looks fine. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15329#pullrequestreview-1587113611 From duke at openjdk.org Mon Aug 21 14:26:53 2023 From: duke at openjdk.org (Kimura Yukihiro) Date: Mon, 21 Aug 2023 14:26:53 GMT Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError [v4] In-Reply-To: References: Message-ID: On Mon, 21 Aug 2023 09:35:43 GMT, Aleksey Shipilev wrote: >> Kimura Yukihiro has updated the pull request incrementally with one additional commit since the last revision: >> >> 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError > > test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java line 62: > >> 60: // Ignore adapter creation failures >> 61: if (oa.getExitValue() != 0 && !oa.getStderr().contains("Out of space in CodeCache for adapters")) { >> 62: throw new RuntimeException("VM finished with exit code " + oa.getExitValue()); > > We also need `oa.reportDiagnosticSummary();` in this block. Otherwise we lose debugging information when throwing this exception. Thank you for your advice. I have modified the point you had mentioned. Thanks, Kimura Yukihiro ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15329#discussion_r1300179444 From thartmann at openjdk.org Mon Aug 21 15:26:30 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 21 Aug 2023 15:26:30 GMT Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError [v5] In-Reply-To: References: Message-ID: On Mon, 21 Aug 2023 14:26:51 GMT, Kimura Yukihiro wrote: >> I would like to fix this issue because it is difficult for testers to understand why the test failed. >> There is no risk as I just added an assertion message instead of exit code error. >> I would appreciate it if someone could review the fix. 
> > Kimura Yukihiro has updated the pull request incrementally with one additional commit since the last revision: > > 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15329#pullrequestreview-1587273214 From cslucas at openjdk.org Mon Aug 21 18:29:26 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 21 Aug 2023 18:29:26 GMT Subject: RFR: JDK-8313689 : C2: compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java fails intermittently with -XX:-TieredCompilation Message-ID: <7HK_tchvEKiw1Ig6FVvN_9ABONpcG2jd9PsPCBTE1oE=.67e77ff4-4eb4-4073-a21f-c4e36fdab83e@github.com> Please see the JBS work item for more context. These adjustments are necessary to make the IR graph shape more stable across executions of the tests. Also, tries to force inline of some important methods. Tested locally on Linux x64 and Mac AArch64. 
------------- Commit messages: - Improvements to RAM IR Tests Changes: https://git.openjdk.org/jdk/pull/15367/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15367&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8313689 Stats: 11 lines in 1 file changed: 7 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/15367.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15367/head:pull/15367 PR: https://git.openjdk.org/jdk/pull/15367 From kvn at openjdk.org Mon Aug 21 19:00:25 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 21 Aug 2023 19:00:25 GMT Subject: RFR: JDK-8313689 : C2: compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java fails intermittently with -XX:-TieredCompilation In-Reply-To: <7HK_tchvEKiw1Ig6FVvN_9ABONpcG2jd9PsPCBTE1oE=.67e77ff4-4eb4-4073-a21f-c4e36fdab83e@github.com> References: <7HK_tchvEKiw1Ig6FVvN_9ABONpcG2jd9PsPCBTE1oE=.67e77ff4-4eb4-4073-a21f-c4e36fdab83e@github.com> Message-ID: On Mon, 21 Aug 2023 18:21:38 GMT, Cesar Soares Lucas wrote: > Please see the JBS work item for more context. > > These adjustments are necessary to make the IR graph shape more stable across executions of the tests. Also, tries to force inline of some important methods. > > Tested locally on Linux x64 and Mac AArch64. Good ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15367#pullrequestreview-1587676067 From xliu at openjdk.org Tue Aug 22 03:01:49 2023 From: xliu at openjdk.org (Xin Liu) Date: Tue, 22 Aug 2023 03:01:49 GMT Subject: RFR: 8314319: LogCompilation doesn't reset lateInlining when it encounters a failure. Message-ID: <63KaTi-Lq_cMNUNr25_oi_iUHVG3tC7kTxmtb0h_Q1s=.cfc158e8-7c3f-49a5-a97e-4efd248da9f6@github.com> This patch fixed a bug in LogCompilation. A compilation may encounter a failure after it processes '' tag. Sometimes, C2 compiler would retry after tweaking options. In this case, it would retry it without subsume_load. 
If we don't reset lateInlining, we may run into trouble in the retry run. We also developed a unit test to verify that; a stripped-down jit.xml is placed in the test/resources/ directory. It's worth noting that 'mvn test' reports that the 2 tests pass even without this patch, although we can see the stack traces of the exceptions. This isn't an accident. There are 2 reasons: 1. LogParser::parse swallows any throwable in its exception handler. 2. surefire runs in parallel and can't capture the failure. I am not sure whether they are by design. I managed to fix those 2 problems, but fixing them is beyond the scope of this patch. I would like to hear reviewers' feedback first. ------------- Commit messages: - remove machine-specific information from jit log - 8314319: LogCompilation doesn't reset lateInlining when it encounters a failure. Changes: https://git.openjdk.org/jdk/pull/15375/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15375&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314319 Stats: 614 lines in 3 files changed: 614 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15375/head:pull/15375 PR: https://git.openjdk.org/jdk/pull/15375 From duke at openjdk.org Tue Aug 22 03:38:47 2023 From: duke at openjdk.org (xpbob) Date: Tue, 22 Aug 2023 03:38:47 GMT Subject: RFR: 8314688: VM build without C1 fails after JDK-8313372 Message-ID: [8314688: VM build without C1 fails after JDK-8313372] ./configure --with-jvm-features=-compiler1 --with-debug-level=release make images JOBS=32 jvmciCompilerToVMInit.o:make/hotspot/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp:252: more undefined references to `Compiler::is_intrinsic_supported(vmIntrinsicID)' follow ------------- Commit messages: - 8314688: VM build without C1 fails after JDK-8313372 Changes: https://git.openjdk.org/jdk/pull/15376/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15376&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314688 Stats: 18 lines in 1
file changed: 16 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15376.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15376/head:pull/15376 PR: https://git.openjdk.org/jdk/pull/15376 From duke at openjdk.org Tue Aug 22 03:47:51 2023 From: duke at openjdk.org (xpbob) Date: Tue, 22 Aug 2023 03:47:51 GMT Subject: RFR: 8314688: VM build without C1 fails after JDK-8313372 [v2] In-Reply-To: References: Message-ID: > [8314688: VM build without C1 fails after JDK-8313372] > > ./configure --with-jvm-features=-compiler1 --with-debug-level=release > make images JOBS=32 > > > jvmciCompilerToVMInit.o:make/hotspot/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp:252: more undefined references to `Compiler::is_intrinsic_supported(vmIntrinsicID)' follow xpbob has updated the pull request incrementally with one additional commit since the last revision: code format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15376/files - new: https://git.openjdk.org/jdk/pull/15376/files/97efc583..6937badb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15376&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15376&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15376.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15376/head:pull/15376 PR: https://git.openjdk.org/jdk/pull/15376 From yzheng at openjdk.org Tue Aug 22 06:30:27 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 22 Aug 2023 06:30:27 GMT Subject: RFR: 8314688: VM build without C1 fails after JDK-8313372 [v2] In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 03:47:51 GMT, xpbob wrote: >> [8314688: VM build without C1 fails after JDK-8313372] >> >> ./configure --with-jvm-features=-compiler1 --with-debug-level=release >> make images JOBS=32 >> >> >> jvmciCompilerToVMInit.o:make/hotspot/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp:252: more undefined references to 
`Compiler::is_intrinsic_supported(vmIntrinsicID)' follow > > xpbob has updated the pull request incrementally with one additional commit since the last revision: > > code format Looks good to me. Thanks for fixing this! ------------- Marked as reviewed by yzheng (no project role). PR Review: https://git.openjdk.org/jdk/pull/15376#pullrequestreview-1588504298 From haosun at openjdk.org Tue Aug 22 06:34:27 2023 From: haosun at openjdk.org (Hao Sun) Date: Tue, 22 Aug 2023 06:34:27 GMT Subject: RFR: 8314688: VM build without C1 fails after JDK-8313372 [v2] In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 03:47:51 GMT, xpbob wrote: >> [8314688: VM build without C1 fails after JDK-8313372] >> >> ./configure --with-jvm-features=-compiler1 --with-debug-level=release >> make images JOBS=32 >> >> >> jvmciCompilerToVMInit.o:make/hotspot/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp:252: more undefined references to `Compiler::is_intrinsic_supported(vmIntrinsicID)' follow > > xpbob has updated the pull request incrementally with one additional commit since the last revision: > > code format src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp line 230: > 228: static jboolean is_c1_supported(vmIntrinsics::ID id){ > 229: jboolean supported = false; > 230: #ifdef COMPILER1 Header `c1/c1_Compiler.hpp` at line 25 was introduced in JDK-8313372. I suggest adding an `#ifdef COMPILER1` directive for this header as well. The same applies to the C2 case, i.e. header `opto/c2compiler.hpp`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15376#discussion_r1301035371 From duke at openjdk.org Tue Aug 22 06:59:58 2023 From: duke at openjdk.org (xpbob) Date: Tue, 22 Aug 2023 06:59:58 GMT Subject: RFR: 8314688: VM build without C1 fails after JDK-8313372 [v3] In-Reply-To: References: Message-ID: > [8314688: VM build without C1 fails after JDK-8313372] > > ./configure --with-jvm-features=-compiler1 --with-debug-level=release > make images JOBS=32 > > > jvmciCompilerToVMInit.o:make/hotspot/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp:252: more undefined references to `Compiler::is_intrinsic_supported(vmIntrinsicID)' follow xpbob has updated the pull request incrementally with one additional commit since the last revision: add ifdef for header ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15376/files - new: https://git.openjdk.org/jdk/pull/15376/files/6937badb..584aa573 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15376&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15376&range=01-02 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15376.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15376/head:pull/15376 PR: https://git.openjdk.org/jdk/pull/15376 From duke at openjdk.org Tue Aug 22 07:07:28 2023 From: duke at openjdk.org (xpbob) Date: Tue, 22 Aug 2023 07:07:28 GMT Subject: RFR: 8314688: VM build without C1 fails after JDK-8313372 [v2] In-Reply-To: References: Message-ID: <-0DMjS0b2mu1NEai0S_i5AF19RlPgPucejSJVx_e-qw=.639c8c71-45f4-4c88-a601-3f242201a62e@github.com> On Tue, 22 Aug 2023 06:27:15 GMT, Yudi Zheng wrote: >> xpbob has updated the pull request incrementally with one additional commit since the last revision: >> >> code format > > Looks good to me. Thanks for fixing this! 
@mur47x111 @shqking Thanks for the review. The code has been updated to add #ifdef for the header. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15376#issuecomment-1687593620 From roland at openjdk.org Tue Aug 22 07:36:57 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 22 Aug 2023 07:36:57 GMT Subject: RFR: 8313262: C2: Sinking node may cause required cast to be dropped Message-ID: When a node is sunk out of a loop, a cast node is created to pin the node out of the loop. When a chain of nodes is sunk, we don't want a cast node per node in the chain but rather one to pin the last of the chain. So the logic for sinking nodes looks for unneeded cast nodes. The test for what makes a cast unneeded is incorrect and causes a cast to not null to be wrongly removed. ------------- Commit messages: - test - fix Changes: https://git.openjdk.org/jdk/pull/15380/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15380&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8313262 Stats: 69 lines in 2 files changed: 68 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15380.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15380/head:pull/15380 PR: https://git.openjdk.org/jdk/pull/15380 From rcastanedalo at openjdk.org Tue Aug 22 07:53:03 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 22 Aug 2023 07:53:03 GMT Subject: RFR: 8312749: Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) [v2] In-Reply-To: References: Message-ID: > This changeset ensures that the array copy stub underlying the intrinsic implementation of `Object.clone` only copies its (double-word aligned) payload, excluding the remaining object alignment padding words, when a non-default `ObjectAlignmentInBytes` value is used. This prevents the specialized ZGC stubs for `Object[]` array copy from processing undefined object alignment padding words as valid object pointers.
For further details about the specific failure, see [initial analysis](https://bugs.openjdk.org/browse/JDK-8312749?focusedCommentId=14600658&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14600658) by Erik Österlund and Stefan Karlsson and comments in the regression test included in this changeset. > > As a side-benefit, the changeset simplifies the array size computation logic in `GraphKit::new_array()` by decoupling computation of header size and alignment padding size. > > #### Testing > > ##### Functionality > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64) > - tier4-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; ZGC-specific tests only) > - tier6-9 (linux-x64; ZGC-specific tests only) > - tier1-3, and a few custom examples, applying [JDK-8139457](https://github.com/openjdk/jdk/pull/11044) (under review) on top of this changeset > > ##### Performance > > Tested performance on the following set of OpenJDK micro-benchmarks, on linux-x64 (for both G1 and ZGC, using different ObjectAlignmentInBytes values): > > - `openjdk.bench.java.lang.ArrayClone.byteClone` > - `openjdk.bench.java.lang.ArrayClone.intClone` > - `openjdk.bench.java.lang.ArrayFiddle.simple_clone` > - `openjdk.bench.java.lang.Clone.cloneLarge` > - `openjdk.bench.java.lang.Clone.cloneThreeDifferent` > > No significant regression was observed.
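The size arithmetic behind this description can be made concrete with a small example. The sketch below uses hypothetical names and is not HotSpot code. It assumes a 64-bit VM with a 16-byte array header and 4-byte elements (compressed oops), and it takes "double-word aligned" to mean rounded up to 8 bytes. Under those assumptions, a one-element `Object[]` has a payload that ends at byte 24. With `-XX:ObjectAlignmentInBytes=32`, however, the object occupies 32 bytes, which leaves one 8-byte padding word whose contents are undefined and which should not be treated as an object pointer.

```java
/** Hypothetical illustration of payload size versus padded object size; not HotSpot code. */
final class ClonePaddingExample {

    private ClonePaddingExample() {
    }

    /** Rounds size up to a multiple of alignment; alignment must be a power of two. */
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & -alignment;
    }

    /** End of the payload (header plus elements), rounded up to 8 bytes. */
    static long payloadBytes(long headerBytes, long elementBytes, long length) {
        return alignUp(headerBytes + elementBytes * length, 8);
    }

    /** Full object footprint, padded to ObjectAlignmentInBytes. */
    static long objectBytes(long headerBytes, long elementBytes, long length,
                            long objectAlignmentInBytes) {
        return alignUp(headerBytes + elementBytes * length, objectAlignmentInBytes);
    }
}
```

With the default 8-byte alignment, the two sizes coincide. With larger alignments, the difference `objectBytes - payloadBytes` is the padding that a payload-only copy skips.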
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Remove extra whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15288/files - new: https://git.openjdk.org/jdk/pull/15288/files/5c56a5e5..9b60e679 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15288&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15288&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15288.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15288/head:pull/15288 PR: https://git.openjdk.org/jdk/pull/15288 From rcastanedalo at openjdk.org Tue Aug 22 07:53:04 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 22 Aug 2023 07:53:04 GMT Subject: RFR: 8312749: Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) In-Reply-To: References: Message-ID: <7QMMwkGUvN7YnPhwa8y5cb6lruJJnCWeR6D50_GWdXY=.e8f1936f-e4e4-4464-877c-9c045415823e@github.com> On Tue, 15 Aug 2023 12:43:56 GMT, Roberto Castañeda Lozano wrote: > This changeset ensures that the array copy stub underlying the intrinsic implementation of `Object.clone` only copies its (double-word aligned) payload, excluding the remaining object alignment padding words, when a non-default `ObjectAlignmentInBytes` value is used. This prevents the specialized ZGC stubs for `Object[]` array copy from processing undefined object alignment padding words as valid object pointers. For further details about the specific failure, see [initial analysis](https://bugs.openjdk.org/browse/JDK-8312749?focusedCommentId=14600658&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14600658) by Erik Österlund and Stefan Karlsson and comments in the regression test included in this changeset.
> > As a side-benefit, the changeset simplifies the array size computation logic in `GraphKit::new_array()` by decoupling computation of header size and alignment padding size. > > #### Testing > > ##### Functionality > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64) > - tier4-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; ZGC-specific tests only) > - tier6-9 (linux-x64; ZGC-specific tests only) > - tier1-3, and a few custom examples, applying [JDK-8139457](https://github.com/openjdk/jdk/pull/11044) (under review) on top of this changeset > > ##### Performance > > Tested performance on the following set of OpenJDK micro-benchmarks, on linux-x64 (for both G1 and ZGC, using different ObjectAlignmentInBytes values): > > - `openjdk.bench.java.lang.ArrayClone.byteClone` > - `openjdk.bench.java.lang.ArrayClone.intClone` > - `openjdk.bench.java.lang.ArrayFiddle.simple_clone` > - `openjdk.bench.java.lang.Clone.cloneLarge` > - `openjdk.bench.java.lang.Clone.cloneThreeDifferent` > > No significant regression was observed. Thanks for reviewing, Tobias! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15288#issuecomment-1687653525 From thartmann at openjdk.org Tue Aug 22 08:01:35 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 22 Aug 2023 08:01:35 GMT Subject: RFR: JDK-8313689 : C2: compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java fails intermittently with -XX:-TieredCompilation In-Reply-To: <7HK_tchvEKiw1Ig6FVvN_9ABONpcG2jd9PsPCBTE1oE=.67e77ff4-4eb4-4073-a21f-c4e36fdab83e@github.com> References: <7HK_tchvEKiw1Ig6FVvN_9ABONpcG2jd9PsPCBTE1oE=.67e77ff4-4eb4-4073-a21f-c4e36fdab83e@github.com> Message-ID: <2U_B7MLXaJi_ORsnt4BOnFfYRc4ugakaNASYJTPM7uE=.2b5aea45-2776-4143-8ad2-bde9196f2e85@github.com> On Mon, 21 Aug 2023 18:21:38 GMT, Cesar Soares Lucas wrote: > Please see the JBS work item for more context. 
> > These adjustments are necessary to make the IR graph shape more stable across executions of the tests. Also, tries to force inline of some important methods. > > Tested locally on Linux x64 and Mac AArch64. Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15367#pullrequestreview-1588711297 From cslucas at openjdk.org Tue Aug 22 08:01:36 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 22 Aug 2023 08:01:36 GMT Subject: Integrated: JDK-8313689 : C2: compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java fails intermittently with -XX:-TieredCompilation In-Reply-To: <7HK_tchvEKiw1Ig6FVvN_9ABONpcG2jd9PsPCBTE1oE=.67e77ff4-4eb4-4073-a21f-c4e36fdab83e@github.com> References: <7HK_tchvEKiw1Ig6FVvN_9ABONpcG2jd9PsPCBTE1oE=.67e77ff4-4eb4-4073-a21f-c4e36fdab83e@github.com> Message-ID: On Mon, 21 Aug 2023 18:21:38 GMT, Cesar Soares Lucas wrote: > Please see the JBS work item for more context. > > These adjustments are necessary to make the IR graph shape more stable across executions of the tests. Also, tries to force inline of some important methods. > > Tested locally on Linux x64 and Mac AArch64. This pull request has now been integrated. 
Changeset: 02ef859f Author: Cesar Soares Lucas Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/02ef859f79cbc2e6225998001af299ba36fe991b Stats: 11 lines in 1 file changed: 7 ins; 0 del; 4 mod 8313689: C2: compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java fails intermittently with -XX:-TieredCompilation Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/15367 From dnsimon at openjdk.org Tue Aug 22 08:40:28 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 22 Aug 2023 08:40:28 GMT Subject: RFR: 8314688: VM build without C1 fails after JDK-8313372 [v3] In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 06:59:58 GMT, xpbob wrote: >> [8314688: VM build without C1 fails after JDK-8313372] >> >> ./configure --with-jvm-features=-compiler1 --with-debug-level=release >> make images JOBS=32 >> >> >> jvmciCompilerToVMInit.o:make/hotspot/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp:252: more undefined references to `Compiler::is_intrinsic_supported(vmIntrinsicID)' follow > > xpbob has updated the pull request incrementally with one additional commit since the last revision: > > add ifdef for header Marked as reviewed by dnsimon (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/15376#pullrequestreview-1588825458 From haosun at openjdk.org Tue Aug 22 08:51:31 2023 From: haosun at openjdk.org (Hao Sun) Date: Tue, 22 Aug 2023 08:51:31 GMT Subject: RFR: 8314688: VM build without C1 fails after JDK-8313372 [v3] In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 06:59:58 GMT, xpbob wrote: >> [8314688: VM build without C1 fails after JDK-8313372] >> >> ./configure --with-jvm-features=-compiler1 --with-debug-level=release >> make images JOBS=32 >> >> >> jvmciCompilerToVMInit.o:make/hotspot/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp:252: more undefined references to `Compiler::is_intrinsic_supported(vmIntrinsicID)' follow > > xpbob has updated the pull request incrementally with one additional commit since the last revision: > > add ifdef for header Thanks for your update. LGTM. ------------- Marked as reviewed by haosun (Committer). PR Review: https://git.openjdk.org/jdk/pull/15376#pullrequestreview-1588854306 From duke at openjdk.org Tue Aug 22 08:51:31 2023 From: duke at openjdk.org (xpbob) Date: Tue, 22 Aug 2023 08:51:31 GMT Subject: RFR: 8314688: VM build without C1 fails after JDK-8313372 [v3] In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 08:37:29 GMT, Doug Simon wrote: >> xpbob has updated the pull request incrementally with one additional commit since the last revision: >> >> add ifdef for header > > Marked as reviewed by dnsimon (Reviewer). @dougxc Thanks for the review ------------- PR Comment: https://git.openjdk.org/jdk/pull/15376#issuecomment-1687752636 From gbarany at openjdk.org Tue Aug 22 09:10:56 2023 From: gbarany at openjdk.org (Gergö Barany) Date: Tue, 22 Aug 2023 09:10:56 GMT Subject: RFR: 8313530: VM build without C2 fails after JDK-8312579 Message-ID: The EnableVectorSupport flag is declared in `opto/c2_globals.hpp`, which is not included if `COMPILER2` is not set.
But after my changes for [JDK-8312579](https://bugs.openjdk.org/browse/JDK-8312579) we try to access this flag in some places guarded by `#if COMPILER2_OR_JVMCI`. This PR moves some flags from `c2_globals.hpp` to the shared `compiler_globals.hpp`, so that they are accessible even if C2 is disabled but JVMCI is enabled. ------------- Commit messages: - 8313530: VM build without C2 fails after JDK-8312579 Changes: https://git.openjdk.org/jdk/pull/15384/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15384&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8313530 Stats: 26 lines in 2 files changed: 14 ins; 12 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15384.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15384/head:pull/15384 PR: https://git.openjdk.org/jdk/pull/15384 From duke at openjdk.org Tue Aug 22 09:24:36 2023 From: duke at openjdk.org (xpbob) Date: Tue, 22 Aug 2023 09:24:36 GMT Subject: Integrated: 8314688: VM build without C1 fails after JDK-8313372 In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 03:33:03 GMT, xpbob wrote: > [8314688: VM build without C1 fails after JDK-8313372] > > ./configure --with-jvm-features=-compiler1 --with-debug-level=release > make images JOBS=32 > > > jvmciCompilerToVMInit.o:make/hotspot/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp:252: more undefined references to `Compiler::is_intrinsic_supported(vmIntrinsicID)' follow This pull request has now been integrated. 
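Changeset: 3e1b1bf9 Author: bobpengxie Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/3e1b1bf94e7acf9717b837085e61fc05a7765de4 Stats: 22 lines in 1 file changed: 20 ins; 0 del; 2 mod 8314688: VM build without C1 fails after JDK-8313372 Reviewed-by: yzheng, dnsimon, haosun ------------- PR: https://git.openjdk.org/jdk/pull/15376 From shade at openjdk.org Tue Aug 22 10:49:28 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 22 Aug 2023 10:49:28 GMT Subject: RFR: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError [v4] In-Reply-To: References: Message-ID: <1Ve6CmBtg7L_zLvy3SzB3rMA8M7O9IbpctljGSqenok=.af14332c-b9a6-4177-9a91-302f32ff0ca9@github.com> On Sun, 20 Aug 2023 07:41:46 GMT, Kimura Yukihiro wrote: >> Kimura Yukihiro has updated the pull request incrementally with one additional commit since the last revision: >> >> 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError > > Let me ask you if I can type "integrate". > Should I wait until some other work is done? > > Thanks, > Kimura Yukihiro @yukikimmura, you can `/integrate` now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15329#issuecomment-1687950485 From fjiang at openjdk.org Tue Aug 22 12:31:30 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 22 Aug 2023 12:31:30 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v2] In-Reply-To: References: Message-ID: On Fri, 18 Aug 2023 13:06:05 GMT, Ilya Gavrilin wrote: >> src/hotspot/cpu/riscv/riscv.ad line 7706: >> >>> 7704: match(Set dst (RoundDoubleMode src rmode)); >>> 7705: ins_cost(2 * XFER_COST + BRANCH_COST); >>> 7706: effect(TEMP_DEF dst, TEMP tmp1, TEMP tmp2, TEMP tmp3, KILL cr); >> >> Do we kill `cr` anywhere in the assembly code? > > According to documentation we have situations when convert instruction can set an error flag in the status register: >> All floating-point conversion instructions set the Inexact exception flag if the rounded result differs from the operand value and the Invalid exception flag is not set. [1] > > [1] https://five-embeddev.com/riscv-isa-manual/latest/f.html#single-precision-floating-point-conversion-and-move-instructions There is no dedicated flag register on RISC-V. We chose `t1` as the flag register to bridge the RegFlag semantics in shared code and opto. Killing `cr` here is not needed, since `t1` is not used as a temporary register anywhere. https://github.com/openjdk/jdk/blob/3e1b1bf94e7acf9717b837085e61fc05a7765de4/src/hotspot/cpu/riscv/riscv.ad#L404-L407 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1301571670 From dnsimon at openjdk.org Tue Aug 22 13:14:29 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 22 Aug 2023 13:14:29 GMT Subject: RFR: 8313530: VM build without C2 fails after JDK-8312579 In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 09:04:32 GMT, Gergö Barany wrote: > The EnableVectorSupport flag is declared in `opto/c2_globals.hpp`, which is not included if `COMPILER2` is not set. But after my changes for [JDK-8312579](https://bugs.openjdk.org/browse/JDK-8312579) we try to access this flag in some places guarded by `#if COMPILER2_OR_JVMCI`. > > This PR moves some flags from `c2_globals.hpp` to the shared `compiler_globals.hpp`, so that they are accessible even if C2 is disabled but JVMCI is enabled. Marked as reviewed by dnsimon (Reviewer).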
> > According to the documentation, there are situations where a conversion instruction can set an error flag in the status register: >> All floating-point conversion instructions set the Inexact exception flag if the rounded result differs from the operand value and the Invalid exception flag is not set. [1] > > [1] https://five-embeddev.com/riscv-isa-manual/latest/f.html#single-precision-floating-point-conversion-and-move-instructions There is no dedicated flag register on RISC-V. We chose `t1` as the flag register to bridge the RegFlag semantics in shared code and opto. Killing `cr` is not needed here since nothing uses `t1` as a temporary register. https://github.com/openjdk/jdk/blob/3e1b1bf94e7acf9717b837085e61fc05a7765de4/src/hotspot/cpu/riscv/riscv.ad#L404-L407 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1301571670 From dnsimon at openjdk.org Tue Aug 22 13:14:29 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 22 Aug 2023 13:14:29 GMT Subject: RFR: 8313530: VM build without C2 fails after JDK-8312579 In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 09:04:32 GMT, Gergö Barany wrote: > The EnableVectorSupport flag is declared in `opto/c2_globals.hpp`, which is not included if `COMPILER2` is not set. But after my changes for [JDK-8312579](https://bugs.openjdk.org/browse/JDK-8312579) we try to access this flag in some places guarded by `#if COMPILER2_OR_JVMCI`. > > This PR moves some flags from `c2_globals.hpp` to the shared `compiler_globals.hpp`, so that they are accessible even if C2 is disabled but JVMCI is enabled. Marked as reviewed by dnsimon (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/15384#pullrequestreview-1589412470 From chagedorn at openjdk.org Tue Aug 22 13:14:44 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 22 Aug 2023 13:14:44 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v7] In-Reply-To: References: Message-ID: On Fri, 18 Aug 2023 17:26:18 GMT, emmyyin wrote: >> ### Purpose >> >> IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, it makes each layout more readable. But on the other hand, the layout for two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can damage the internal understanding of the graph that the viewer has obtained. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. >> >> The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". >> >> The difference between the algorithms is illustrated in the following video: >> >> >> https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 >> >> >> This work is the result of my Master's thesis, which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf).
>> >> >> ### Implementation >> >> The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager` has been added which holds the core algorithm. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. >> >> If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions on. >> >> The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. >> >> Since `HierarchicalStableLayoutManager` tries to preserve the node positi... > > emmyyin has updated the pull request incrementally with one additional commit since the last revision: > > fixing trailing ws Great work Emmy! I've played around with some graphs and it works quite well for small sets of nodes. While clicking around in IGV, I've noticed that sometimes it takes a long time to open a new graph. Example: 1. Open IGV 2. Run: `java -XX:CompileOnly=*::putUTF8 -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=3 HelloWorld.java` 3. Open the graph `PhaseCCP1` of the `putUTF8` compilation. 4. Switch to `stable sea of nodes` layout. 5. Open the graph `Before RemoveUseless` -> hangs for ~1min. I've had a closer look at that example with a profiler and it seems that over 90% of the time is spent in `sanityCheckNodesAndLayerNodes()` (~66%) and `sanityCheckEdges()` (~25%), which seems quite a lot for just sanity checks. @robcasloz double-checked the example and then ran the same steps from above with the two sanity check methods disabled.
On his machine, the time went down from 64s to 2s, which is a huge improvement. We therefore suggest either completely disabling sanity checking with these two methods or limiting it to a few places (not sure how easy that is). If we disable it, we could still think about sanity checking with them in IGV unit tests only. I'm not very familiar with the IGV code base, so I only have some more general code comments. Thanks, Christian src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 77: > 75: } > 76: > 77: private class LinkAction { Can be made static: Suggestion: private static class LinkAction { src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 97: > 95: } > 96: > 97: private int calculateOptimalBoth(LayoutNode n) { `calculateOptimalBoth()` in `HierarchicalLayoutManager` is now unused and can be removed. src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 98: > 96: > 97: private int calculateOptimalBoth(LayoutNode n) { > 98: if (n.preds.size() == 0 && n.succs.size() == 0) { Suggestion: if (n.preds.isEmpty() && n.succs.isEmpty()) { There are also other locations where you can replace `size() >/== 0` by `!isEmpty()/isEmpty()`. src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 189: > 187: assert e.to.layer == n.layer + 1; > 188: } else { > 189: n.succs.remove(e); This removal and the one in the next loop seem unexpected in a sanity check method, where the expectation would be to only query and not modify. Do we really need these removals for the correctness of the algorithm?
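The first few review suggestions above (a static nested class, `isEmpty()` instead of `size() == 0`, and sanity checks that only query the graph and stay out of the normal path) can be sketched as follows. This is a minimal sketch under assumptions: `StableLayoutSketch`, `sanityCheckRuns` and `applyUpdate` are hypothetical names, the classes are reduced stand-ins rather than IGV's actual `LayoutNode`/`HierarchicalStableLayoutManager`, and gating the check behind `assert` (so it only runs under `java -ea`) is one possible way to limit its cost, not what the reviewers prescribed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for IGV's layout classes, reduced to what the
// review comments discuss. Not the actual HierarchicalStableLayoutManager code.
class StableLayoutSketch {

    // A static nested class holds no hidden reference to the enclosing
    // StableLayoutSketch instance ("Can be made static").
    static class LayoutNode {
        final List<LayoutNode> preds = new ArrayList<>();
        final List<LayoutNode> succs = new ArrayList<>();
        int layer;
    }

    final List<LayoutNode> nodes = new ArrayList<>();
    int sanityCheckRuns = 0; // how many times the expensive check has run

    // isEmpty() expresses the intent more directly than size() == 0.
    static boolean isIsolated(LayoutNode n) {
        return n.preds.isEmpty() && n.succs.isEmpty();
    }

    // Query-only check: it reports a problem but never modifies the graph.
    boolean sanityCheckEdges() {
        sanityCheckRuns++;
        for (LayoutNode n : nodes) {
            for (LayoutNode s : n.succs) {
                if (s.layer != n.layer + 1) {
                    return false;
                }
            }
        }
        return true;
    }

    void applyUpdate() {
        // (incremental layout work would go here)
        // The check is evaluated only when assertions are enabled (java -ea),
        // so normal interactive use does not pay for it.
        assert sanityCheckEdges() : "edge does not connect adjacent layers";
    }
}
```

Keeping the check side-effect free also matters for the `assert` approach: whatever the check does must not change the layout, because it is skipped entirely when assertions are disabled.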
src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 265: > 263: if (!addedVertices.contains(to) && !addedVertices.contains(from)) { > 264: linkActions.add(a); > 265: } This code looks identical to the code on L269-281. Could this be shared? src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 454: > 452: */ > 453: private void copyOldNodes() { > 454: oldNodes.clear(); Is `oldNodes` a leftover? It does not seem to be used. src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 511: > 509: if (shouldComputeLayoutScore) { > 510: new ComputeLayoutScore().run(); > 511: } Since `shouldComputeLayoutScore` is always false, do we still need this code? src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 757: > 755: */ > 756: private void insertNode(LayoutNode node, int layer) { > 757: assert layers.keySet().contains(layer) || layer == 0; Suggestion: assert layers.containsKey(layer) || layer == 0; There are also other locations where you could replace `keySet().contains()` with `containsKey()`. src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 807: > 805: n.preds.add(result); > 806: result.to = n; > 807: result.relativeTo = n.width / 2; `n.width` equals `DUMMY_WIDTH` here, which is 1. With integer division, `n.width / 2` is therefore always zero. Is it intended to set `result.relativeTo` and `e.relativeFrom` to zero? Same further down in `expandNewLayerBeneath()`.
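The `relativeTo` question comes down to Java's integer division: `int / int` truncates toward zero, so a width of 1 gives an offset of 0. A small sketch that shows this together with the suggested `containsKey()` form; the class and method names are hypothetical, and only the value `DUMMY_WIDTH = 1` is taken from the comment above.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper names; DUMMY_WIDTH = 1 is the value quoted in the review.
class DummyWidthSketch {
    static final int DUMMY_WIDTH = 1;

    // int / int in Java truncates toward zero: 1 / 2 == 0, 5 / 2 == 2.
    static int centerOffset(int width) {
        return width / 2;
    }

    // containsKey() queries the map directly instead of going through keySet().
    static boolean hasLayer(Map<Integer, List<String>> layers, int layer) {
        return layers.containsKey(layer);
    }
}
```

If a centered offset were actually intended for dummy nodes, the width would have to be at least 2 for `width / 2` to be non-zero.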
src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 1450: > 1448: } > 1449: > 1450: public void insert(LayoutNode n, int pos) { This method and also other code is duplicated from `HierarchicalLayoutManager`. Could the code be shared somehow? ------------- PR Review: https://git.openjdk.org/jdk/pull/14349#pullrequestreview-1589076106 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1301423595 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1301428472 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1301424349 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1301476336 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1301421264 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1301419240 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1301416659 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1301514857 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1301513317 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1301532721 From chagedorn at openjdk.org Tue Aug 22 13:14:46 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 22 Aug 2023 13:14:46 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v5] In-Reply-To: References: <7RAXRCktzo9NArTyV_NBwxxLl7zgCaxurRLwwwKzeAM=.91d9abbe-12e0-43af-8e5c-b13052629a56@github.com> Message-ID: On Fri, 18 Aug 2023 15:44:50 GMT, emmyyin wrote: >> src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 1255: >> >>> 1253: found = false; >>> 1254: >>> 1255: if (n.vertex == null && n.succs.size() <= 1 && n.preds.size() <= 1) { >> >> I think `n.vertex == null` is always true and can be omitted > > This is to make the loop break 
once we hit the non-dummy node that the edge goes from. I.e., for the edge (u,v) with lots of dummy nodes in between nodes u and v, we only want to remove the dummy nodes and then break the loop as soon as we are at node u. I still don't understand why you need `n.vertex == null` here. If `n.vertex != null`, then the loop continuation test `n.vertex == null && found` will be false and we will not perform another iteration. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1301527867 From duke at openjdk.org Tue Aug 22 17:24:24 2023 From: duke at openjdk.org (Swati Sharma) Date: Tue, 22 Aug 2023 17:24:24 GMT Subject: RFR: 8314085: Fixing scope from benchmark to thread for JMH tests having shared state In-Reply-To: References: Message-ID: <5K5iYGm0ryQwxnOj-e0NlMsCPYMpzZtAiVXYubl7pLM=.65ae1221-e573-4666-911a-ef4e3a271927@github.com> On Thu, 10 Aug 2023 15:30:19 GMT, Swati Sharma wrote: > In addition to the issue [JDK-8311178](https://bugs.openjdk.org/browse/JDK-8311178), this logically fixes the scope from benchmark to thread for the benchmark files below that have shared state, which also fixes scalability problems in a few of the benchmarks.
> > org/openjdk/bench/java/io/DataInputStreamTest.java > org/openjdk/bench/java/lang/ArrayClone.java > org/openjdk/bench/java/lang/StringCompareToDifferentLength.java > org/openjdk/bench/java/lang/StringCompareToIgnoreCase.java > org/openjdk/bench/java/lang/StringComparisons.java > org/openjdk/bench/java/lang/StringEquals.java > org/openjdk/bench/java/lang/StringFormat.java > org/openjdk/bench/java/lang/StringReplace.java > org/openjdk/bench/java/lang/StringSubstring.java > org/openjdk/bench/java/lang/StringTemplateFMT.java > org/openjdk/bench/java/lang/constant/MethodTypeDescFactories.java > org/openjdk/bench/java/lang/constant/ReferenceClassDescResolve.java > org/openjdk/bench/java/lang/invoke/MethodHandlesConstant.java > org/openjdk/bench/java/lang/invoke/MethodHandlesIdentity.java > org/openjdk/bench/java/lang/invoke/MethodHandlesThrowException.java > org/openjdk/bench/java/lang/invoke/MethodTypeAppendParams.java > org/openjdk/bench/java/lang/invoke/MethodTypeChangeParam.java > org/openjdk/bench/java/lang/invoke/MethodTypeChangeReturn.java > org/openjdk/bench/java/lang/invoke/MethodTypeDropParams.java > org/openjdk/bench/java/lang/invoke/MethodTypeGenerify.java > org/openjdk/bench/java/lang/invoke/MethodTypeInsertParams.java > org/openjdk/bench/java/security/CipherSuiteBench.java > org/openjdk/bench/java/time/GetYearBench.java > org/openjdk/bench/java/time/InstantBench.java > org/openjdk/bench/java/time/format/DateTimeFormatterWithPaddingBench.java > org/openjdk/bench/java/util/ListArgs.java > org/openjdk/bench/java/util/LocaleDefaults.java > org/openjdk/bench/java/util/TestAdler32.java > org/openjdk/bench/java/util/TestCRC32.java > org/openjdk/bench/java/util/TestCRC32C.java > org/openjdk/bench/java/util/regex/Exponential.java > org/openjdk/bench/java/util/regex/Primality.java > org/openjdk/bench/java/util/regex/Trim.java > org/openjdk/bench/javax/crypto/AESReinit.java > org/openjdk/bench/jdk/incubator/vector/LoadMaskedIOOBEBenchmark.java > 
org/openjdk/bench/vm/compiler/Rotation.java > org/openjdk/bench/vm/compiler/x86/ConvertF2I.java > org/openjdk/bench/vm/compiler/x86/BasicRules.java > > Please review and provide your feedback. > > Thanks, > Swati Could you please review this? @ericcaspole ------------- PR Comment: https://git.openjdk.org/jdk/pull/15230#issuecomment-1688597226 From epeter at openjdk.org Tue Aug 22 17:29:49 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Aug 2023 17:29:49 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v18] In-Reply-To: References: <2GLfWpwzObvaPasLEim5TPbClAGinuJHQOp89Vdgt10=.7f0899a4-dc91-4a5b-8d9a-9a0d8125c2f4@github.com> Message-ID: On Sat, 19 Aug 2023 08:48:35 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> whitespace > > I am working on adding more division tests on the Java side as you suggested. I think checking the result with different value ranges of the dividend is necessary. I am thinking of using `Min` and `Max` nodes for this, but there are currently no intrinsics for `Math::min(long, long)` and `Math::max(long, long)`. Do you think I should add those to have an easier time testing this patch? @merykitty Why do you need the intrinsic for `Math::min/max(long,long)`? I would just write two methods: one that is compiled (maybe add an IR rule that checks that the division is present in an earlier compile phase, but not present later), and another method that is excluded from compilation. Then you can just generate random inputs to the division and compare outputs. But sure, adding the intrinsics for long min/max is on my list, because I want it to vectorize with SuperWord. So go ahead with it in a separate RFE if you want to use it for testing.
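One way to follow the compare-against-a-reference suggestion above, sketched under assumptions: the class and method names are hypothetical, and instead of a second copy excluded from compilation (for example with `-XX:CompileCommand=exclude,<class>::<method>`), the reference result comes from `BigInteger`, which does not go through the divide-by-constant transform at all. The IR-rule part of the suggestion is not shown.

```java
import java.math.BigInteger;
import java.util.Random;

// Hypothetical test sketch; class and method names are illustrative only.
class UnsignedDivCheck {
    private static final BigInteger TWO_64 = BigInteger.ONE.shiftLeft(64);

    // Kernels under test: the divisor is a constant, which is the case the
    // PR optimizes once these methods are JIT-compiled.
    static long divBy7(long x) {
        return Long.divideUnsigned(x, 7L);
    }

    static long remBy7(long x) {
        return Long.remainderUnsigned(x, 7L);
    }

    // Reference path: reinterpret the 64 bits as an unsigned value in BigInteger.
    static BigInteger toUnsigned(long x) {
        BigInteger b = BigInteger.valueOf(x);
        return x < 0 ? b.add(TWO_64) : b;
    }

    static long referenceDiv(long x, long d) {
        return toUnsigned(x).divide(toUnsigned(d)).longValue();
    }

    static long referenceRem(long x, long d) {
        return toUnsigned(x).remainder(toUnsigned(d)).longValue();
    }

    // Counts mismatches over a few edge cases followed by random inputs.
    static int check(long seed, int iterations) {
        long[] edges = {0L, 1L, 6L, 7L, Long.MAX_VALUE, Long.MIN_VALUE, -1L};
        Random rnd = new Random(seed);
        int mismatches = 0;
        for (int i = 0; i < edges.length + iterations; i++) {
            long x = i < edges.length ? edges[i] : rnd.nextLong();
            if (divBy7(x) != referenceDiv(x, 7L) || remBy7(x) != referenceRem(x, 7L)) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

A large iteration count gives the JIT a chance to compile `divBy7` and `remBy7` partway through the loop, so both interpreted and compiled results end up being compared against the same reference.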
------------- PR Comment: https://git.openjdk.org/jdk/pull/9947#issuecomment-1688591232 From qamai at openjdk.org Tue Aug 22 17:29:50 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 22 Aug 2023 17:29:50 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v18] In-Reply-To: References: <2GLfWpwzObvaPasLEim5TPbClAGinuJHQOp89Vdgt10=.7f0899a4-dc91-4a5b-8d9a-9a0d8125c2f4@github.com> Message-ID: On Tue, 22 Aug 2023 17:02:28 GMT, Emanuel Peter wrote: >> I am working on adding more division tests on the Java side as you suggested. I think checking the result with different value ranges of the dividend is necessary. I am thinking of using `Min` and `Max` nodes for this, but there are currently no intrinsics for `Math::min(long, long)` and `Math::max(long, long)`. Do you think I should add those to have an easier time testing this patch? > > @merykitty Why do you need the intrinsic for `Math::min/max(long,long)`? I would just write two methods: one that is compiled (maybe add an IR rule that checks that the division is present in an earlier compile phase, but not present later), and another method that is excluded from compilation. Then you can just generate random inputs to the division and compare outputs. > > But sure, adding the intrinsics for long min/max is on my list, because I want it to vectorize with SuperWord. So go ahead with it in a separate RFE if you want to use it for testing. @eme64 The division is transformed before it even appears on the graph, so the simplest way to get an arbitrary value range for the dividend is to use a pair of min/max nodes to clamp the input and inform the compiler about the value set of the dividend. Thanks.
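A sketch of the clamping approach described above, with the bounds `[0, 1000]` and the divisor 3 chosen purely for illustration (both are assumptions, as are the names). `Math.max`/`Math.min` pin the dividend into a known range before the constant division. Whether C2 actually derives a narrowed type from these calls depends on the long min/max intrinsics discussed in this thread, so the sketch only shows the Java-level shape of such a test kernel.

```java
// Hypothetical kernel; the bounds [0, 1000] and divisor 3 are illustrative.
class ClampedDivSketch {
    static final long LO = 0L;
    static final long HI = 1000L;

    // Math.min/Math.max pin x into [LO, HI]; with long min/max intrinsics the
    // compiler could derive that range for the dividend.
    static long clamp(long x) {
        return Math.max(LO, Math.min(HI, x));
    }

    static long divClamped(long x) {
        return Long.divideUnsigned(clamp(x), 3L);
    }

    static long remClamped(long x) {
        return Long.remainderUnsigned(clamp(x), 3L);
    }
}
```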
------------- PR Comment: https://git.openjdk.org/jdk/pull/9947#issuecomment-1688606053 From kvn at openjdk.org Tue Aug 22 17:39:53 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 22 Aug 2023 17:39:53 GMT Subject: RFR: 8313530: VM build without C2 fails after JDK-8312579 In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 09:04:32 GMT, Gergö Barany wrote: > The EnableVectorSupport flag is declared in `opto/c2_globals.hpp`, which is not included if `COMPILER2` is not set. But after my changes for [JDK-8312579](https://bugs.openjdk.org/browse/JDK-8312579) we try to access this flag in some places guarded by `#if COMPILER2_OR_JVMCI`. > > This PR moves some flags from `c2_globals.hpp` to the shared `compiler_globals.hpp`, so that they are accessible even if C2 is disabled but JVMCI is enabled. Instead of putting these flags into shared code, consider handling them similarly to the C2 intrinsic flags in jvmci_globals.hpp: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/jvmci/jvmci_globals.hpp#L165C31-L165C56 ------------- PR Comment: https://git.openjdk.org/jdk/pull/15384#issuecomment-1688615217 From epeter at openjdk.org Tue Aug 22 17:45:21 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 22 Aug 2023 17:45:21 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v18] In-Reply-To: References: <2GLfWpwzObvaPasLEim5TPbClAGinuJHQOp89Vdgt10=.7f0899a4-dc91-4a5b-8d9a-9a0d8125c2f4@github.com> Message-ID: On Tue, 22 Aug 2023 17:13:42 GMT, Quan Anh Mai wrote: >> @merykitty Why do you need the intrinsic for `Math::min/max(long,long)`? I would just write two methods: one that is compiled (maybe add an IR rule that checks that the division is present in an earlier compile phase, but not present later), and another method that is excluded from compilation. Then you can just generate random inputs to the division and compare outputs.
>> >> But sure, adding the intrinsics for long min/max is on my list, because I want it to vectorize with SuperWord. So go ahead with it in a separate RFE if you want to use it for testing. > > @eme64 The division is transformed before it even appears on the graph, so the simplest way to get an arbitrary value range for the dividend is to use a pair of min/max nodes to clamp the input and inform the compiler about the value set of the dividend. Thanks. @merykitty Another way to easily get a value range is to use a `Phi` node which merges two constants. Have you tried that? long x; if (flag) { x = 10; } else { x = 100; } // Phi for x should have range long:10..100 ------------- PR Comment: https://git.openjdk.org/jdk/pull/9947#issuecomment-1688618393 From dnsimon at openjdk.org Tue Aug 22 21:46:17 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 22 Aug 2023 21:46:17 GMT Subject: RFR: 8314819: [JVMCI] HotSpotJVMCIRuntime.lookupType throws unexpected ClassNotFoundException Message-ID: This PR restores the expected behavior prior to [JDK-8313421](https://bugs.openjdk.org/browse/JDK-8313421) whereby `HotSpotJVMCIRuntime.lookupType` throws `NoClassDefFoundError` instead of `ClassNotFoundException`.
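The exception contract restored by this PR can be illustrated at the Java level. The helper below is hypothetical (`lookupOrFail` is not a JVMCI API, and the real `HotSpotJVMCIRuntime.lookupType` goes through `CompilerToVM` into the VM rather than calling `Class.forName`); it only shows a checked `ClassNotFoundException` being surfaced as a `NoClassDefFoundError` while the original exception is kept as the cause.

```java
// Hypothetical helper, not a JVMCI API: it only mirrors the exception contract.
class LookupSketch {
    static Class<?> lookupOrFail(String name, ClassLoader loader) {
        try {
            // initialize = false: load the class without running its static initializer.
            return Class.forName(name, false, loader);
        } catch (ClassNotFoundException e) {
            // Surface the failure as NoClassDefFoundError (a LinkageError),
            // keeping the original exception as the cause.
            NoClassDefFoundError error = new NoClassDefFoundError(name);
            error.initCause(e);
            throw error;
        }
    }
}
```

Because `NoClassDefFoundError` is unchecked, callers that were written against the pre-JDK-8313421 behavior do not need a `throws ClassNotFoundException` clause.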
------------- Commit messages: - CompilerToVM.lookupType must throw NoClassDefFoundError instead of ClassNotFoundException Changes: https://git.openjdk.org/jdk/pull/15393/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15393&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314819 Stats: 5 lines in 2 files changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15393.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15393/head:pull/15393 PR: https://git.openjdk.org/jdk/pull/15393 From duke at openjdk.org Tue Aug 22 23:38:47 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 22 Aug 2023 23:38:47 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v23] In-Reply-To: References: Message-ID: <1yCo55YXhhweh_3xXTORwBCZNnQjYneqD3xMxV_SbQE=.b1e286fa-a5fd-4236-84d8-255b62f1b627@github.com> > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. > > An explanation for the path chosen in the PR to bring in the SIMD Arrays.sort at the top level instead of only bringing in the smaller components from the algorithm is as follows: the key components of Arrays.sort are pivot selection, partitioning, and partition sort. Among these, the two hottest components are partitioning and partition sort. Both could be individually accelerated using SIMD implementations. However, what we noticed was that just bringing in these two individual optimizations gave us half the performance gain versus bringing in the entire AVX512 SIMD sort.
AVX512 SIMD sort implements a single-pivot quicksort algorithm (SPQS) by selecting a single pivot and then recursively partitioning the array into two smaller partitions using SIMD instructions. When the partition size becomes less than or equal to 128, it sorts that partition with a SIMD bitonic sort built on x86 AVX512 intrinsics. However, the default implementation of Arrays.sort() in Java is the dual-pivot quicksort (DPQS), not SPQS. If the partitioning in the DPQS is implemented using AVX512, it needs two passes of the single-pivot AVX512 partitioning function (instead of just one in the case of SPQS), thereby leading to a 50% loss in performance.
>
>
> **Arrays.sort performance data using JMH benchmarks**
>
> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
> | --- | --- | --- | --- | --- |
> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x |
> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x |
> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** |
> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** |
> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x |
> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x |
> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** |
> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** |
> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x |
> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x |
> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** |
> | ArraysSort.intSort | 100000 | 4484.94 | 381....

Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Decomposed DPQS using AVX512 partitioning and AVX512 sort (for small arrays). Works for serial and parallel sort.
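A scalar sketch of the SPQS control flow described above, under stated assumptions: it uses a Hoare partition around a middle-element pivot, and plain insertion sort for partitions of at most 128 elements stands in for the AVX512 partitioning and bitonic network. The class name is hypothetical, and the sketch makes no claim about the PR's actual code or performance.

```java
// Scalar stand-in for the single-pivot quicksort (SPQS) structure; the real
// PR uses AVX512 partitioning and a bitonic network instead of these loops.
class SinglePivotSortSketch {
    static final int SMALL_PARTITION = 128;

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int lo, int hi) {
        while (hi - lo + 1 > SMALL_PARTITION) {
            int p = partition(a, lo, hi);
            // Recurse into the smaller side and loop on the larger one to bound stack depth.
            if (p - lo < hi - p) {
                sort(a, lo, p);
                lo = p + 1;
            } else {
                sort(a, p + 1, hi);
                hi = p;
            }
        }
        insertionSort(a, lo, hi); // stand-in for the small-partition bitonic sort
    }

    // Hoare partition: afterwards every element of a[lo..j] is <= every element of a[j+1..hi].
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo - 1;
        int j = hi + 1;
        while (true) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j) {
                return j;
            }
            int t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int v = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > v) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = v;
        }
    }
}
```

The point of the sketch is the single partitioning pass per level: a dual-pivot scheme built from a single-pivot SIMD partition kernel would need two such passes per level, which is the cost the PR description attributes to DPQS.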
------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/07349ec3..9153059a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=21-22 Stats: 1487 lines in 20 files changed: 853 ins; 337 del; 297 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From never at openjdk.org Wed Aug 23 00:23:21 2023 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 23 Aug 2023 00:23:21 GMT Subject: RFR: 8314819: [JVMCI] HotSpotJVMCIRuntime.lookupType throws unexpected ClassNotFoundException In-Reply-To: References: Message-ID: <763VIkdu5hSOm3jAGR0qs3NDdbOCiCn43lDgqKBa2F8=.3d2f9c00-6474-43bf-84a3-b4c92049b1b4@github.com> On Tue, 22 Aug 2023 21:01:26 GMT, Doug Simon wrote: > This PR restores the expected behavior prior to [JDK-8313421](https://bugs.openjdk.org/browse/JDK-8313421) whereby `HotSpotJVMCIRuntime.lookupType` throws `NoClassDefFoundError` instead of `ClassNotFoundException`. Looks good. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15393#pullrequestreview-1590518452 From dnsimon at openjdk.org Wed Aug 23 09:11:33 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 23 Aug 2023 09:11:33 GMT Subject: RFR: 8314819: [JVMCI] HotSpotJVMCIRuntime.lookupType throws unexpected ClassNotFoundException [v2] In-Reply-To: References: Message-ID: > This PR restores the expected behavior prior to [JDK-8313421](https://bugs.openjdk.org/browse/JDK-8313421) whereby `HotSpotJVMCIRuntime.lookupType` throws `NoClassDefFoundError` instead of `ClassNotFoundException`. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: fixed and expanded testing related to CompilerToVM.lookupType ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15393/files - new: https://git.openjdk.org/jdk/pull/15393/files/c62ba346..adf985d4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15393&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15393&range=00-01 Stats: 64 lines in 3 files changed: 41 ins; 4 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/15393.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15393/head:pull/15393 PR: https://git.openjdk.org/jdk/pull/15393 From thartmann at openjdk.org Wed Aug 23 09:11:38 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 23 Aug 2023 09:11:38 GMT Subject: RFR: 8314819: [JVMCI] HotSpotJVMCIRuntime.lookupType throws unexpected ClassNotFoundException [v2] In-Reply-To: References: Message-ID: On Wed, 23 Aug 2023 08:26:31 GMT, Doug Simon wrote: >> This PR restores the expected behavior prior to [JDK-8313421](https://bugs.openjdk.org/browse/JDK-8313421) whereby `HotSpotJVMCIRuntime.lookupType` throws `NoClassDefFoundError` instead of `ClassNotFoundException`. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fixed and expanded testing related to CompilerToVM.lookupType Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15393#pullrequestreview-1590780972 From thartmann at openjdk.org Wed Aug 23 09:11:46 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 23 Aug 2023 09:11:46 GMT Subject: RFR: 8314819: [JVMCI] HotSpotJVMCIRuntime.lookupType throws unexpected ClassNotFoundException In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 21:01:26 GMT, Doug Simon wrote: > This PR restores the expected behavior prior to [JDK-8313421](https://bugs.openjdk.org/browse/JDK-8313421) whereby `HotSpotJVMCIRuntime.lookupType` throws `NoClassDefFoundError` instead of `ClassNotFoundException`. Just wondering, should there be a regression test? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15393#issuecomment-1689308605 From dnsimon at openjdk.org Wed Aug 23 09:11:50 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 23 Aug 2023 09:11:50 GMT Subject: RFR: 8314819: [JVMCI] HotSpotJVMCIRuntime.lookupType throws unexpected ClassNotFoundException In-Reply-To: References: Message-ID: On Wed, 23 Aug 2023 05:34:57 GMT, Tobias Hartmann wrote: > Just wondering, should there be a regression test? Yes. The existing tests for `CompilerToVM.lookupType` needed to be adjusted and I expanded them to also test `HotSpotJVMCIRuntime.lookupType`: adf985d4c9ba35d9d9d586fad503e0660eee9f91 ------------- PR Comment: https://git.openjdk.org/jdk/pull/15393#issuecomment-1689512846 From gbarany at openjdk.org Wed Aug 23 09:11:49 2023 From: gbarany at openjdk.org (Gergö Barany) Date: Wed, 23 Aug 2023 09:11:49 GMT Subject: RFR: 8313530: VM build without C2 fails after JDK-8312579 [v2] In-Reply-To: References: Message-ID: > The EnableVectorSupport flag is declared in `opto/c2_globals.hpp`, which is not included if `COMPILER2` is not set. But after my changes for [JDK-8312579](https://bugs.openjdk.org/browse/JDK-8312579) we try to access this flag in some places guarded by `#if COMPILER2_OR_JVMCI`.
> > This PR moves some flags from `c2_globals.hpp` to the shared `compiler_globals.hpp`, so that they are accessible even if C2 is disabled but JVMCI is enabled. Gergö Barany has updated the pull request incrementally with two additional commits since the last revision: - Add copies of Vector API flags in jvmci_globals.hpp - Revert "8313530: VM build without C2 fails after JDK-8312579" This reverts commit d82e89c469e91f78f9c2e5b28c725b0e1ba0fb8c. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15384/files - new: https://git.openjdk.org/jdk/pull/15384/files/d82e89c4..278e5e51 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15384&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15384&range=00-01 Stats: 39 lines in 3 files changed: 24 ins; 14 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15384.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15384/head:pull/15384 PR: https://git.openjdk.org/jdk/pull/15384 From haosun at openjdk.org Wed Aug 23 09:11:57 2023 From: haosun at openjdk.org (Hao Sun) Date: Wed, 23 Aug 2023 09:11:57 GMT Subject: RFR: 8313530: VM build without C2 fails after JDK-8312579 [v2] In-Reply-To: References: Message-ID: On Wed, 23 Aug 2023 08:24:08 GMT, Gergö Barany wrote: >> The EnableVectorSupport flag is declared in `opto/c2_globals.hpp`, which is not included if `COMPILER2` is not set. But after my changes for [JDK-8312579](https://bugs.openjdk.org/browse/JDK-8312579) we try to access this flag in some places guarded by `#if COMPILER2_OR_JVMCI`. >> >> This PR moves some flags from `c2_globals.hpp` to the shared `compiler_globals.hpp`, so that they are accessible even if C2 is disabled but JVMCI is enabled. > > Gergö
Barany has updated the pull request incrementally with two additional commits since the last revision: > > - Add copies of Vector API flags in jvmci_globals.hpp > - Revert "8313530: VM build without C2 fails after JDK-8312579" > > This reverts commit d82e89c469e91f78f9c2e5b28c725b0e1ba0fb8c. Verified on Linux/AArch64 and Linux/x86_64 that the VM build without C2 now passes. Marked as reviewed by haosun (Committer). src/hotspot/share/jvmci/jvmci_globals.hpp line 185: > 183: \ > 184: NOT_COMPILER2(product(bool, UseVectorStubs, false, EXPERIMENTAL, \ > 185: "Use stubs for vector transcendental operations")) \ nit: remove the backslash? Suggestion: "Use stubs for vector transcendental operations")) ------------- Marked as reviewed by haosun (Committer). PR Review: https://git.openjdk.org/jdk/pull/15384#pullrequestreview-1590961843 PR Review: https://git.openjdk.org/jdk/pull/15384#pullrequestreview-1590987883 PR Review Comment: https://git.openjdk.org/jdk/pull/15384#discussion_r1302617993 From jiefu at openjdk.org Wed Aug 23 09:12:04 2023 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 23 Aug 2023 09:12:04 GMT Subject: RFR: 8313530: VM build without C2 fails after JDK-8312579 [v2] In-Reply-To: References: Message-ID: <_k17BuvlRCMbYs5kASctKz6scGvrlL_1drHR-FEKC00=.596c2fca-7b3b-453c-bda2-24cd42d62994@github.com> On Wed, 23 Aug 2023 08:24:08 GMT, Gergö Barany wrote: >> The EnableVectorSupport flag is declared in `opto/c2_globals.hpp`, which is not included if `COMPILER2` is not set. But after my changes for [JDK-8312579](https://bugs.openjdk.org/browse/JDK-8312579) we try to access this flag in some places guarded by `#if COMPILER2_OR_JVMCI`. >> >> This PR moves some flags from `c2_globals.hpp` to the shared `compiler_globals.hpp`, so that they are accessible even if C2 is disabled but JVMCI is enabled. > > Gergö
Barany has updated the pull request incrementally with two additional commits since the last revision: > > - Add copies of Vector API flags in jvmci_globals.hpp > - Revert "8313530: VM build without C2 fails after JDK-8312579" > > This reverts commit d82e89c469e91f78f9c2e5b28c725b0e1ba0fb8c. LGTM ------------- Marked as reviewed by jiefu (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15384#pullrequestreview-1591016279 From dnsimon at openjdk.org Wed Aug 23 09:12:10 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 23 Aug 2023 09:12:10 GMT Subject: RFR: 8313530: VM build without C2 fails after JDK-8312579 [v2] In-Reply-To: References: Message-ID: On Wed, 23 Aug 2023 08:24:08 GMT, Gergö Barany wrote: >> The EnableVectorSupport flag is declared in `opto/c2_globals.hpp`, which is not included if `COMPILER2` is not set. But after my changes for [JDK-8312579](https://bugs.openjdk.org/browse/JDK-8312579) we try to access this flag in some places guarded by `#if COMPILER2_OR_JVMCI`. >> >> This PR moves some flags from `c2_globals.hpp` to the shared `compiler_globals.hpp`, so that they are accessible even if C2 is disabled but JVMCI is enabled. > > Gergö Barany has updated the pull request incrementally with two additional commits since the last revision: > > - Add copies of Vector API flags in jvmci_globals.hpp > - Revert "8313530: VM build without C2 fails after JDK-8312579" > > This reverts commit d82e89c469e91f78f9c2e5b28c725b0e1ba0fb8c. Marked as reviewed by dnsimon (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/15384#pullrequestreview-1591033917 From gbarany at openjdk.org Wed Aug 23 09:12:21 2023 From: gbarany at openjdk.org (Gergö Barany) Date: Wed, 23 Aug 2023 09:12:21 GMT Subject: RFR: 8313530: VM build without C2 fails after JDK-8312579 In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 17:20:58 GMT, Vladimir Kozlov wrote: > Instead of putting these flags to shared code consider doing it similar to C2 intrinsic flags in jvmci_globals.hpp: Done, please take another look. @shqking could you confirm that this fixes your build problem? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15384#issuecomment-1689440441 From gbarany at openjdk.org Wed Aug 23 09:12:35 2023 From: gbarany at openjdk.org (Gergö Barany) Date: Wed, 23 Aug 2023 09:12:35 GMT Subject: RFR: 8313530: VM build without C2 fails after JDK-8312579 [v2] In-Reply-To: References: Message-ID: On Wed, 23 Aug 2023 07:45:52 GMT, Hao Sun wrote: >> Gergö Barany has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add copies of Vector API flags in jvmci_globals.hpp >> - Revert "8313530: VM build without C2 fails after JDK-8312579" >> >> This reverts commit d82e89c469e91f78f9c2e5b28c725b0e1ba0fb8c. > > src/hotspot/share/jvmci/jvmci_globals.hpp line 185: > >> 183: \ >> 184: NOT_COMPILER2(product(bool, UseVectorStubs, false, EXPERIMENTAL, \ >> 185: "Use stubs for vector transcendental operations")) \ > > nit: remove the backslash? > Suggestion: > > "Use stubs for vector transcendental operations")) I added the backslash for consistency with C2: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/c2_globals.hpp#L776-L778. It also makes future changes nicer, since appending more flags will not need to touch unrelated lines.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15384#discussion_r1302631905 From haosun at openjdk.org Wed Aug 23 09:12:42 2023 From: haosun at openjdk.org (Hao Sun) Date: Wed, 23 Aug 2023 09:12:42 GMT Subject: RFR: 8313530: VM build without C2 fails after JDK-8312579 [v2] In-Reply-To: References: Message-ID: <3Cu73MPcDkwI58GvO2FAg4zzGD_rjxCUybwPCZ0UZ6M=.95cfee0c-a31e-46dc-b799-60b8f58e9f7e@github.com> On Wed, 23 Aug 2023 07:57:09 GMT, Gergö Barany wrote: >> src/hotspot/share/jvmci/jvmci_globals.hpp line 185: >> >>> 183: \ >>> 184: NOT_COMPILER2(product(bool, UseVectorStubs, false, EXPERIMENTAL, \ >>> 185: "Use stubs for vector transcendental operations")) \ >> >> nit: remove the backslash? >> Suggestion: >> >> "Use stubs for vector transcendental operations")) > > I added the backslash for consistency with C2: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/c2_globals.hpp#L776-L778. It also makes future changes nicer, since appending more flags will not need to touch unrelated lines. Yes. Agree. Thanks.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15384#discussion_r1302634925 From thartmann at openjdk.org Wed Aug 23 09:13:49 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 23 Aug 2023 09:13:49 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v9] In-Reply-To: <-ssaBgw9bGq2MyUaNq_LfEONlBAhkOedksLfu1J0Jbo=.bce452bf-3953-4242-91ba-c7a4baf3bdf4@github.com> References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> <-ssaBgw9bGq2MyUaNq_LfEONlBAhkOedksLfu1J0Jbo=.bce452bf-3953-4242-91ba-c7a4baf3bdf4@github.com> Message-ID: On Wed, 19 Jul 2023 13:36:27 GMT, Roland Westrelin wrote: >> In this simple micro benchmark: >> >> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 >> >> Performance drops sharply with a polluted profile: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us >> >> >> to: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 2.280 ops/us >> >> >> The test has 2 type checks to 2 different interfaces so caching with >> `secondary_super_cache` doesn't help. >> >> The micro-benchmark only uses 2 different concrete classes >> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded >> in profile data at the type checks. But C2 only takes advantage of >> profile data at type checks if they report a single class. >> >> What I propose is that the full-blown type check expanded in >> `Phase::gen_subtype_check()` takes advantage of profile data. So in
So in >> the case of the micro benchmark, before checking the >> `secondary_super_cache`, generated code checks whether the object >> being type checked is a `DuplicatedContext` or a >> `NonDuplicatedContext`. >> >> This works fairly well on this micro benchmark: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ? 20.750 ops/us >> >> >> It also scales much better if there are multiple threads running the >> same test (`secondary_super_cache` doesn't scale well: see >> JDK-8180450). >> >> Now if the micro-benchmark is changed according to the comment: >> >> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 >> >> so the type check hits in the `secondary_super_cache`, the current >> code performs much better: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ? 20.750 ops/us >> >> >> but leveraging profiling as explained above performs even better: >> >> >> Benchmark ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - riscv support > - improvements to test > - Merge branch 'master' into JDK-8308869 > - never common SubTypeCheckNode nodes > - keep both ways of doing profile > - whitespace > - reworked change > - Merge branch 'master' into JDK-8308869 > - more test failures > - Merge branch 'master' into JDK-8308869 > - ... and 6 more: https://git.openjdk.org/jdk/compare/674d5f17...8d9a08d1 I didn't get to review this yet but I plan to - probably only after a short vacation next week. I did run some performance and correctness testing though. 
Performance looks good (neutral). Correctness testing looks good too but found this single failure: `ProfileAtTypeCheck` fails IR verification with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`: Failed IR Rules (11) of Methods (9) ----------------------------------- 1) Method "public static void compiler.c2.irTests.ProfileAtTypeCheck.test10(boolean)" - [Failed IR rules: 1]: * @IR rule 2: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={ITER_GVN1}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={"_#SUBTYPE_CHECK#_", "1"}, failOn={}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > Phase "Iter GVN 1": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(SubTypeCheck.*)+(\\s){2}===.*)" - Failed comparison: [found] 2 = 1 [given] - Matched nodes (2): * 45 SubTypeCheck === _ 36 27 [[ 130 ]] profiled at: compiler.c2.irTests.ProfileAtTypeCheck::test10:3 !jvms: ProfileAtTypeCheck::test10 @ bci:3 (line 270) * 90 SubTypeCheck === _ 82 27 [[ 126 ]] profiled at: compiler.c2.irTests.ProfileAtTypeCheck::test10:16 !jvms: ProfileAtTypeCheck::test10 @ bci:16 (line 272) 2) Method "public static void compiler.c2.irTests.ProfileAtTypeCheck.test12()" - [Failed IR rules: 2]: * @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={AFTER_PARSING}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={"_#SUBTYPE_CHECK#_", "3"}, failOn={}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > Phase "After Parsing": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(SubTypeCheck.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 3 [given] - No nodes matched! 
* @IR rule 2: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={PHASEIDEALLOOP_ITERATIONS}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={"_#SUBTYPE_CHECK#_", "1"}, failOn={}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > Phase "PhaseIdealLoop iterations": - NO compilation output found for this phase! Make sure this phase is emitted or remove it from the list of compile phases in the @IR rule to match on. 3) Method "public static void compiler.c2.irTests.ProfileAtTypeCheck.test15(java.lang.Object)" - [Failed IR rules: 1]: * @IR rule 3: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={MACRO_EXPANSION}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={"_#CMP_P#_", "5", "_#LOAD_KLASS#_", "1", "_#LOAD_NKLASS#_", "1", "_#PARTIAL_SUBTYPE_CHECK#_", "1"}, failOn={}, applyIfAnd={"UseCompressedClassPointers", "true", "UseParallelGC", "true"}, applyIfOr={}, applyIfNot={})" > Phase "Macro expand": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(CmpP.*)+(\\s){2}===.*)" - Failed comparison: [found] 3 = 5 [given] - Matched nodes (3): * 39 CmpP === _ 10 38 [[ 40 ]] !jvms: ProfileAtTypeCheck::test15 @ bci:5 (line 429) * 117 CmpP === _ 116 38 [[ 118 ]] * 123 CmpP === _ 97 35 [[ 125 ]] !orig=[100] 4) Method "public static void compiler.c2.irTests.ProfileAtTypeCheck.test2(java.lang.Object)" - [Failed IR rules: 2]: * @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={AFTER_PARSING}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, failOn={"_#SUBTYPE_CHECK#_"}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > Phase "After Parsing": - failOn: Graph contains forbidden nodes: * Constraint 1: "(\\d+(\\s){2}(SubTypeCheck.*)+(\\s){2}===.*)" - Matched forbidden node: * 39 SubTypeCheck === _ 30 21 [[ 53 ]] profiled at: compiler.c2.irTests.ProfileAtTypeCheck::test2:1 !jvms: ProfileAtTypeCheck::test2 @ bci:1 (line 102) * @IR rule 3: 
"@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={AFTER_PARSING}, applyIf={"UseCompressedClassPointers", "true"}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={"_#CMP_P#_", "2", "_#LOAD_NKLASS#_", "1"}, failOn={}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > Phase "After Parsing": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(CmpP.*)+(\\s){2}===.*)" - Failed comparison: [found] 1 = 2 [given] - Matched node: * 25 CmpP === _ 10 24 [[ 26 ]] !jvms: ProfileAtTypeCheck::test2 @ bci:1 (line 102) * Constraint 2: "(\\d+(\\s){2}(LoadNKlass.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 5) Method "public static void compiler.c2.irTests.ProfileAtTypeCheck.test3(java.lang.Object)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={AFTER_PARSING}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={"_#SUBTYPE_CHECK#_", "1"}, failOn={}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > Phase "After Parsing": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(SubTypeCheck.*)+(\\s){2}===.*)" - Failed comparison: [found] 2 = 1 [given] - Matched nodes (2): * 40 SubTypeCheck === _ 30 21 [[ 46 ]] profiled at: compiler.c2.irTests.ProfileAtTypeCheck::test3:1 !jvms: ProfileAtTypeCheck::test3 @ bci:1 (line 120) * 62 SubTypeCheck === _ 30 21 [[ 67 ]] profiled at: compiler.c2.irTests.ProfileAtTypeCheck::test3:8 !jvms: ProfileAtTypeCheck::test3 @ bci:8 (line 121) 6) Method "public static void compiler.c2.irTests.ProfileAtTypeCheck.test5(java.lang.Object)" - [Failed IR rules: 1]: * @IR rule 3: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={MACRO_EXPANSION}, applyIf={"UseCompressedClassPointers", "true"}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={"_#CMP_P#_", "5", "_#LOAD_KLASS#_", "1", "_#LOAD_NKLASS#_", "1", "_#PARTIAL_SUBTYPE_CHECK#_", "1"}, failOn={}, applyIfAnd={}, applyIfOr={}, 
applyIfNot={})" > Phase "Macro expand": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(CmpP.*)+(\\s){2}===.*)" - Failed comparison: [found] 3 = 5 [given] - Matched nodes (3): * 25 CmpP === _ 10 24 [[ 26 ]] !jvms: ProfileAtTypeCheck::test5 @ bci:1 (line 161) * 108 CmpP === _ 107 24 [[ 109 ]] * 114 CmpP === _ 88 21 [[ 116 ]] !orig=[91] 7) Method "public static boolean compiler.c2.irTests.ProfileAtTypeCheck.test7(java.lang.Object)" - [Failed IR rules: 1]: * @IR rule 3: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={MACRO_EXPANSION}, applyIf={"UseCompressedClassPointers", "true"}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={"_#CMP_P#_", "5", "_#LOAD_KLASS#_", "1", "_#LOAD_NKLASS#_", "1", "_#PARTIAL_SUBTYPE_CHECK#_", "1"}, failOn={}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > Phase "Macro expand": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(CmpP.*)+(\\s){2}===.*)" - Failed comparison: [found] 3 = 5 [given] - Matched nodes (3): * 26 CmpP === _ 10 25 [[ 27 ]] !jvms: ProfileAtTypeCheck::test7 @ bci:1 (line 198) * 86 CmpP === _ 85 25 [[ 87 ]] * 92 CmpP === _ 66 22 [[ 94 ]] !orig=[69] 8) Method "public static void compiler.c2.irTests.ProfileAtTypeCheck.test8(java.lang.Object)" - [Failed IR rules: 1]: * @IR rule 3: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={MACRO_EXPANSION}, applyIf={"UseCompressedClassPointers", "true"}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={"_#CMP_P#_", "5", "_#LOAD_KLASS#_", "1", "_#LOAD_NKLASS#_", "1", "_#PARTIAL_SUBTYPE_CHECK#_", "1"}, failOn={}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > Phase "Macro expand": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(CmpP.*)+(\\s){2}===.*)" - Failed comparison: [found] 3 = 5 [given] - Matched nodes (3): * 25 CmpP === _ 10 24 [[ 26 ]] !jvms: ProfileAtTypeCheck::test8 @ bci:1 (line 216) * 108 CmpP === _ 107 24 [[ 109 ]] * 114 CmpP === _ 88 21 [[ 116 
]] !orig=[91] 9) Method "public static void compiler.c2.irTests.ProfileAtTypeCheck.test9(boolean,boolean,java.lang.Object,java.lang.Object)" - [Failed IR rules: 1]: * @IR rule 2: "@compiler.lib.ir_framework.IR(applyIfCPUFeatureAnd={}, phase={PHASEIDEALLOOP1}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={"_#SUBTYPE_CHECK#_", "2"}, failOn={}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > Phase "PhaseIdealLoop 1": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(SubTypeCheck.*)+(\\s){2}===.*)" - Failed comparison: [found] 1 = 2 [given] - Matched node: * 104 SubTypeCheck === _ 166 87 [[ 140 ]] profiled at: compiler.c2.irTests.ProfileAtTypeCheck::test9:56 !jvms: ProfileAtTypeCheck::test9 @ bci:56 (line 253) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1689324669 From duke at openjdk.org Wed Aug 23 09:17:16 2023 From: duke at openjdk.org (Kimura Yukihiro) Date: Wed, 23 Aug 2023 09:17:16 GMT Subject: Integrated: 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError In-Reply-To: References: Message-ID: On Thu, 17 Aug 2023 10:48:49 GMT, Kimura Yukihiro wrote: > I would like to fix this issue because it is difficult for testers to understand why the test failed. > There is no risk as I just added an assertion message instead of exit code error. > I would appreciate it if someone could review the fix. This pull request has now been integrated. 
Changeset: d1de3d08 Author: Kimura Yukihiro Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/d1de3d082ef9b83aaa68664e653ab09feb8bad87 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod 8313901: [TESTBUG] test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java fails with java.lang.VirtualMachineError Reviewed-by: shade, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/15329 From thartmann at openjdk.org Wed Aug 23 09:17:08 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 23 Aug 2023 09:17:08 GMT Subject: RFR: 8313262: C2: Sinking node may cause required cast to be dropped In-Reply-To: References: Message-ID: <23AyOBpeG-s_6LOj535F4sVvjMiaFVHYveWENigEBdQ=.a3d2ab4c-9cf3-4ab7-a6d1-a679937dc0ea@github.com> On Tue, 22 Aug 2023 07:29:13 GMT, Roland Westrelin wrote: > When a node is sunk out of a loop a cast node is created to pin the > node out of the loop. When a chain of nodes is sunk, we don't want a > cast node per node in the chain but rather one to pin the last of the > chain. So the logic for sinking nodes looks for unneeded cast > nodes. The test for what makes a cast unneeded is incorrect and causes > a cast to not null to be wrongly removed. Looks good to me too. All tests passed. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15380#pullrequestreview-1590984275 From tholenstein at openjdk.org Wed Aug 23 09:17:18 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 23 Aug 2023 09:17:18 GMT Subject: Integrated: JDK-8313626: C2 crash due to unexpected exception control flow In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 15:41:50 GMT, Tobias Holenstein wrote: > # Problem > The following JASM code: > > static Method test1:"()V" stack 1 { > try t; > invokestatic m:"()V"; > return; > > catch t java/lang/Throwable; > stack_map class java/lang/Throwable; > athrow; > endtry t; > } > > produces this Java bytecode > > static void m(); > Code: > 0: return > > static void test1(); > Code: > 0: invokestatic #4 // Method m:()V > 3: return > 4: athrow > Exception table: > from to target type > 0 5 4 Class java/lang/Throwable > > > from https://docs.oracle.com/javase/specs/jvms/se20/jvms20.pdf _exception_table[] (p.116)_ > >> The values of the two items start_pc and end_pc indicate the ranges in the code array at which the exception handler is active. The value of start_pc must be a valid index into the code array of the opcode of an instruction. The value of end_pc either must be a valid index into the code array of the opcode of an instruction or must be equal to code_length, the length of the code array. The value of start_pc must be less than the value of end_pc. >> The start_pc is inclusive and end_pc is exclusive; that is, the exception handler must be active while the program counter is within the interval [start_pc, end_pc). >> >> handler_pc >> The value of the handler_pc item indicates the start of the exception handler. The value of the item must be a valid index into the code array and must be the index of the opcode of an instruction. > > and from _athrow (p.420)_ > >> The objectref must be of type reference and must refer to an object that is an instance of class Throwable or of a subclass of Throwable. It
It is popped from the operand stack. The objectref is then thrown by searching the current method (?2.6) for the first exception handler that matches the class of objectref, as given by the algorithm in ?2.10. >> If an exception handler that matches objectref is found, it contains the location of the code intended to handle this exception. The pc register is reset to that location, the operand stack of the current frame is cleared, objectref is pushed back onto the operand stack, and execution continues. > > In out case: **[start_pc=0, end_pc=5)** and **handler_pc=4** and **objectref=Class java/lang/Throwable** > > By this definition we have indeed valid bytecode for `test1()`. Therefore we would expect C2 to create an infinite loop for > > 4: athrow > > > The C2 graph indeed shows an infinite loop 92/81: > graph1 > This reverts commit d82e89c469e91f78f9c2e5b28c725b0e1ba0fb8c. Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15384#pullrequestreview-1592116436 From kvn at openjdk.org Wed Aug 23 17:52:20 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 23 Aug 2023 17:52:20 GMT Subject: RFR: 8314024: SIGSEGV in PhaseIdealLoop::build_loop_late_post_work due to bad immediate dominator info In-Reply-To: References: Message-ID: <-TuLBE3P2ZiYFLrIqOs1XlktXjcEnXkzR_0BH2OVuN4=.21592b78-7dac-465b-ae16-b8daf326f130@github.com> On Wed, 23 Aug 2023 09:15:38 GMT, Roland Westrelin wrote: > A node is sunk from the pre loop into the main loop. That node, in the > main loop, feeds into a test. When the node is sunk it is pinned > between the main and pre loop. The test it feeds into is then > eliminated by range check elimination: the sunk node becomes input to > an expression that computes the new bound of the pre loop. The > resulting graph is broken because the sunk node is pinned below the > pre loop but used by the exit test of the pre loop. 
> > The fix I propose is in `PhaseIdealLoop::try_sink_out_of_loop()`, to > skip nodes in pre loops that have a use in the companion main loop. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15399#pullrequestreview-1592119387 From kvn at openjdk.org Wed Aug 23 17:53:28 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 23 Aug 2023 17:53:28 GMT Subject: RFR: 8312749: Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) [v2] In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 07:53:03 GMT, Roberto Castañeda Lozano wrote: >> This changeset ensures that the array copy stub underlying the intrinsic implementation of `Object.clone` only copies its (double-word aligned) payload, excluding the remaining object alignment padding words, when a non-default `ObjectAlignmentInBytes` value is used. This prevents the specialized ZGC stubs for `Object[]` array copy from processing undefined object alignment padding words as valid object pointers. For further details about the specific failure, see [initial analysis](https://bugs.openjdk.org/browse/JDK-8312749?focusedCommentId=14600658&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14600658) by Erik Österlund and Stefan Karlsson and comments in the regression test included in this changeset. >> >> As a side-benefit, the changeset simplifies the array size computation logic in `GraphKit::new_array()` by decoupling computation of header size and alignment padding size.
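To make the padding concrete, here is a hypothetical calculation of how much trailing padding an `Object[]` carries once `ObjectAlignmentInBytes` exceeds 8. The 16-byte array header and 4-byte element size assume compressed class pointers and compressed oops, and reading "double-word aligned" as 8-byte aligned is also an assumption; `CloneSizeSketch` and its methods are illustrative names, not HotSpot code.

```java
/**
 * Hypothetical size arithmetic for cloning an Object[]: bytes between the
 * 8-byte-aligned end of the elements and the end of the allocation.
 */
final class CloneSizeSketch {
    /** Rounds size up to a multiple of alignment (alignment must be a power of two). */
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & -alignment;
    }

    /** Bytes the object occupies on the heap, padded to ObjectAlignmentInBytes. */
    static long allocatedBytes(long headerBytes, long length, long elemBytes, long objectAlignment) {
        return alignUp(headerBytes + length * elemBytes, objectAlignment);
    }

    /** End of the copied range if copying stops at the (assumed 8-byte) double-word boundary. */
    static long copiedBytes(long headerBytes, long length, long elemBytes) {
        return alignUp(headerBytes + length * elemBytes, 8);
    }

    /** Trailing padding that holds no elements and so must not be treated as object pointers. */
    static long paddingBytes(long headerBytes, long length, long elemBytes, long objectAlignment) {
        return allocatedBytes(headerBytes, length, elemBytes, objectAlignment)
                - copiedBytes(headerBytes, length, elemBytes);
    }
}
```

Under these assumptions, five 4-byte elements after a 16-byte header end at byte 36, so the copy covers 40 bytes; the allocation is 40, 48, or 64 bytes for an object alignment of 8, 16, or 32, leaving 0, 8, or 24 bytes of padding that an element-wise `Object[]` copy would otherwise visit.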
>> >> #### Testing >> >> ##### Functionality >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64) >> - tier4-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; ZGC-specific tests only) >> - tier6-9 (linux-x64; ZGC-specific tests only) >> - tier1-3, and a few custom examples, applying [JDK-8139457](https://github.com/openjdk/jdk/pull/11044) (under review) on top of this changeset >> >> ##### Performance >> >> Tested performance on the following set of OpenJDK micro-benchmarks, on linux-x64 (for both G1 and ZGC, using different ObjectAlignmentInBytes values): >> >> - `openjdk.bench.java.lang.ArrayClone.byteClone` >> - `openjdk.bench.java.lang.ArrayClone.intClone` >> - `openjdk.bench.java.lang.ArrayFiddle.simple_clone` >> - `openjdk.bench.java.lang.Clone.cloneLarge` >> - `openjdk.bench.java.lang.Clone.cloneThreeDifferent` >> >> No significant regression was observed. > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Remove extra whitespace Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15288#pullrequestreview-1592121689 From duke at openjdk.org Wed Aug 23 19:20:31 2023 From: duke at openjdk.org (iaroslavski) Date: Wed, 23 Aug 2023 19:20:31 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v24] In-Reply-To: <6rPlyMxOjFDokiGkgh19dqzEqLak_g2yvEHECBcABm0=.4d751045-6e0c-48b6-a442-96b46b32bb64@github.com> References: <6rPlyMxOjFDokiGkgh19dqzEqLak_g2yvEHECBcABm0=.4d751045-6e0c-48b6-a442-96b46b32bb64@github.com> Message-ID: <5zAmfd1sE_GUThVqykIxfFLHWL8YqmL9b52Fr7r06j0=.9b01b427-2d89-408f-86fc-681b3b05b933@github.com> On Wed, 23 Aug 2023 12:57:19 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions.
This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. >> >> An explanation for the path chosen in the PR to bring in the SIMD Arrays.sort at the top level instead of only bringing in the smaller components from the algorithm is as follows: the key components of Arrays.sort are pivot selection, partitioning, and partition sort. Among these, the two hottest components are partitioning and partition sort. Both could be individually accelerated using SIMD implementations. However, what we noticed was that just bringing in these two individual optimizations gave us half the performance gain versus bringing in the entire AVX512 SIMD sort. AVX512 SIMD sort implements a single-pivot quicksort algorithm (SPQS) by selecting a single pivot and then recursively partitioning the array into two smaller partitions using SIMD instructions. When the partition size becomes less than or equal to 128, it uses a SIMD bitonic sort using x86 AVX512 intrinsics to sort that partition. However, the default implementation of Arrays.sort() in Java is the dual pivot quick sort (DPQS), not the SPQS. If the partitioning in the DPQS is implemented using AVX512, it needs two passes of the single-pivot AVX512 partitioning function (instead of just one in the case of SPQS), thereby leading to loss of 50% performance.
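The overall shape of the SPQS approach, with small partitions handed to a bitonic network, can be sketched in scalar Java to show the control flow. This is only an illustration under stated assumptions: there is no SIMD here, the Lomuto partition around the middle element and the padding with `Integer.MAX_VALUE` to a power-of-two length are choices made for the sketch, and only the 128-element threshold comes from the description. The PR's actual implementation is native AVX512 code, and `SpqsBitonicSketch` is a hypothetical name.

```java
import java.util.Arrays;

/**
 * Scalar sketch (no SIMD) of single-pivot quicksort that hands partitions of
 * at most 128 elements to a bitonic sorting network.
 */
final class SpqsBitonicSketch {
    private static final int SMALL_PARTITION = 128; // threshold taken from the description

    static void sort(int[] a) {
        quicksort(a, 0, a.length - 1);
    }

    private static void quicksort(int[] a, int lo, int hi) {
        while (hi - lo + 1 > SMALL_PARTITION) {
            int p = partition(a, lo, hi); // one pass splits [lo, hi] around a single pivot
            // Recurse on the smaller side and loop on the larger one to bound stack depth.
            if (p - lo < hi - p) {
                quicksort(a, lo, p - 1);
                lo = p + 1;
            } else {
                quicksort(a, p + 1, hi);
                hi = p - 1;
            }
        }
        if (lo < hi) {
            bitonicSort(a, lo, hi - lo + 1);
        }
    }

    /** Lomuto partition using the middle element as pivot; returns the pivot's final index. */
    private static int partition(int[] a, int lo, int hi) {
        swap(a, lo + ((hi - lo) >>> 1), hi);
        int pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) {
                swap(a, i, store++);
            }
        }
        swap(a, store, hi);
        return store;
    }

    /** Sorts a[from, from + n) for 2 <= n <= 128 with a bitonic network on a padded copy. */
    private static void bitonicSort(int[] a, int from, int n) {
        int m = Integer.highestOneBit(n - 1) << 1; // next power of two >= n
        int[] buf = new int[m];
        Arrays.fill(buf, Integer.MAX_VALUE);       // padding sorts to the end
        System.arraycopy(a, from, buf, 0, n);
        for (int k = 2; k <= m; k <<= 1) {         // length of the bitonic runs being merged
            for (int j = k >> 1; j > 0; j >>= 1) { // compare-exchange distance
                for (int i = 0; i < m; i++) {
                    int l = i ^ j;
                    if (l > i) {
                        boolean ascending = (i & k) == 0;
                        if ((buf[i] > buf[l]) == ascending) {
                            swap(buf, i, l);
                        }
                    }
                }
            }
        }
        System.arraycopy(buf, 0, a, from, n);      // the n smallest values are the real ones
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

Lomuto partitioning degrades on inputs with many equal keys; it is used here only because it keeps the single-pivot step short.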
>> >> >> **Arrays.sort performance data using JMH benchmarks** >> >>
>> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
>> | --- | --- | --- | --- | --- |
>> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x |
>> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x |
>> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** |
>> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** |
>> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x |
>> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x |
>> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** |
>> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** |
>> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x |
>> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x |
>> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** |
>> ... > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Update avx512-common-qsort.h And please add already sorted arrays (ascending and descending) to benchmarking ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1690508956 From duke at openjdk.org Wed Aug 23 23:05:55 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 23 Aug 2023 23:05:55 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v25] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
> > An explanation for the path chosen in the PR to bring in the SIMD Arrays.sort at the top level instead of only bringing in the smaller components from the algorithm is as follows: the key components of Arrays.sort are pivot selection, partitioning, partition sort. Among these, the two hottest components are partitioning and partition sort. Both could be individually accelerated using SIMD implementations. However, what we noticed was that just bringing in these two individual optimizations gave us half the performance gain versus bringing in the entire AVX512 SIMD sort. AVX512 SIMD sort implements a single-pivot quicksort algorithm (SPQS) by selecting a single pivot and then recursively partitioning the array into two smaller partitions using SIMD instructions. When the partition size becomes less than or equal to 128, it uses a SIMD bitonic sort using x86 AVX512 intrinsics to sort that partition. However, the default implementation of Arrays.sort() in Java is the dual pivot quick sort (DPQS) not the SPQS. If the partitioning in the DPQS is implemented using AVX512, it needs two passes of the single-pivot AVX512 partitioning function (instead of just one in the case of SPQS), thereby leading to loss of 50% performance. 
> > > **Arrays.sort performance data using JMH benchmarks** > >
> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
> | --- | --- | --- | --- | --- |
> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x |
> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x |
> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** |
> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** |
> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x |
> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x |
> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** |
> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** |
> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x |
> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x |
> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** |
> | ArraysSort.intSort | 100000 | 4484.94 | 381....
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Update copyright for DPQS.java; replace avx512 pivot calculation with scalar version ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/8b80b80b..96cdd190 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=23-24 Stats: 82 lines in 5 files changed: 17 ins; 45 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From duke at openjdk.org Wed Aug 23 23:05:55 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 23 Aug 2023 23:05:55 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v23] In-Reply-To: References: <1yCo55YXhhweh_3xXTORwBCZNnQjYneqD3xMxV_SbQE=.b1e286fa-a5fd-4236-84d8-255b62f1b627@github.com>
Message-ID: On Wed, 23 Aug 2023 11:42:44 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Decomposed DPQS using AVX512 partitioning and AVX512 sort (for small arrays). Works for serial and parallel sort. > > src/java.base/share/classes/java/util/DualPivotQuicksort.java line 27: > >> 25: >> 26: package java.util; >> 27: > > Please update copyright header for this file. Please see the updated copyright. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1303623870 From duke at openjdk.org Wed Aug 23 23:29:47 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 23 Aug 2023 23:29:47 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v26] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows up to ~7x improvement for 32-bit datatypes (int, float) and up to ~4.5x improvement for 64-bit datatypes (long, double), as shown in the performance data below. 
> > > **Arrays.sort performance data using JMH benchmarks for arrays with random data** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | > | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | > | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | > | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | > | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | > | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | > | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | > | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | > | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | > | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | > | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | > | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | > | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | > | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | > | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | > | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** | > | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | > | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | > | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | > | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | > | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | > | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | > | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | > | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | > | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | > | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** | > | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | > | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | > | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | > | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | > | ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 
| > | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | > | ArraysSort.longSort | 1000 | 10.449 | 6.239 | 1.7 | > | ArraysSort.longSort | 10000 | 307.074 | 70.284 | **4.4** | > | ArraysSor... Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: add parallelSort benchmarking ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/96cdd190..51738491 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=24-25 Stats: 24 lines in 1 file changed: 24 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From cslucas at openjdk.org Wed Aug 23 23:51:26 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 23 Aug 2023 23:51:26 GMT Subject: RFR: 8314452: Explicitly indicate inlining success/failure in PrintInlining In-Reply-To: <68cnReOy5vHWDIXC2XLw-uKOQkrPHw7FtggrkAh74ik=.86c263b6-c757-49bc-8dd3-6e00fc381436@github.com> References: <68cnReOy5vHWDIXC2XLw-uKOQkrPHw7FtggrkAh74ik=.86c263b6-c757-49bc-8dd3-6e00fc381436@github.com> Message-ID: <6WVM06Xu2TAVhgYi6ElUTMXJCtqNCH6GIOUVxQBwRuY=.23376624-dd8b-47f9-8f1c-ed760bd312b7@github.com> On Mon, 21 Aug 2023 09:45:16 GMT, Tobias Hartmann wrote: > Why not simply add a "failed to inline:" message? Something like: +1 to this. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15315#issuecomment-1690785151 From duke at openjdk.org Thu Aug 24 06:26:42 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 24 Aug 2023 06:26:42 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v22] In-Reply-To: References: Message-ID: On Thu, 17 Aug 2023 17:23:01 GMT, Srinivas Vamsi Parasa wrote: > Improvements are nice but it would not pay off if you have big regressions. I can accept 0.9x but 0.4x - 0.8x regressions should be investigated and implementation adjusted to avoid them. Hello Vladimir (@vnkozlov), as per your suggestion, the implementation was adjusted to address the regressions for the STAGGER and REPEATED input data patterns. Please see the new JMH performance data for the adjusted implementation below. In the new implementation, we don't call the AVX512 sort intrinsic at the top level (`Arrays.sort()`). Instead, we take a decomposed, or 'building blocks', approach where we intrinsify (using AVX512 instructions) only the partitioning and small-array sort functions used in the current baseline (`DualPivotQuicksort.sort()`). Since the current baseline has logic to detect and sort special input patterns like STAGGER, for those patterns we fall back to the current baseline instead of using AVX512 partitioning and sorting (which works best for RANDOM, REPEATED and SHUFFLE patterns). The data below compares `Arrays.sort()` performance between the current **Java baseline (DPQS)** and 
**AVX512 sort** (this PR) using the `ArraysSort.java` JMH [benchmark](https://github.com/openjdk/jdk/pull/13568/files#diff-dee51b13bd1872ff455cec2f29255cfd25014a5dd33dda55a2fc68638c3dd4b2) provided in the PR for [JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)](https://github.com/openjdk/jdk/pull/13568/files#top) ( #13568) - The following command line was used to run the benchmarks: ` java -jar $JDK_HOME/build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -jvmArgs "-XX:CompileThreshold=1 -XX:-TieredCompilation" ArraysSort` - The scores shown are the average time (us/op), thus lower is better. The last column towards the right shows the speedup. | Benchmark | Mode | Size | Baseline DPQS (us/op) | AVX512 partitioning & sort (us/op) | Speedup | | --- | --- | --- | --- | --- | --- | | ArraysSort.Double.testSort | RANDOM | 800 | 6.7 | 4.8 | 1.39x | | ArraysSort.Double.testSort | RANDOM | 7000 | 234.1 | 51.5 | **4.55x** | | ArraysSort.Double.testSort | RANDOM | 50000 | 2155.9 | 470.0 | **4.59x** | | ArraysSort.Double.testSort | RANDOM | 300000 | 15076.3 | 3391.3 | **4.45x** | | ArraysSort.Double.testSort | RANDOM | 2000000 | 116445.5 | 27491.7 | **4.24x** | | ArraysSort.Double.testSort | REPEATED | 800 | 2.3 | 1.7 | 1.35x | | ArraysSort.Double.testSort | REPEATED | 7000 | 23.3 | 12.1 | **1.92x** | | ArraysSort.Double.testSort | REPEATED | 50000 | 460.9 | 151.9 | **3.03x** | | ArraysSort.Double.testSort | REPEATED | 300000 | 2935.1 | 1082.5 | **2.71x** | | ArraysSort.Double.testSort | REPEATED | 2000000 | 19533.8 | 8158.6 | **2.39x** | | ArraysSort.Double.testSort | SHUFFLE | 800 | 4.6 | 4.4 | 1.04x | | ArraysSort.Double.testSort | SHUFFLE | 7000 | 86.7 | 48.7 | **1.78x** | | ArraysSort.Double.testSort | SHUFFLE | 50000 | 839.0 | 436.4 | **1.92x** | | ArraysSort.Double.testSort | SHUFFLE | 300000 | 5761.0 | 3036.9 | **1.90x** | | ArraysSort.Double.testSort | SHUFFLE | 2000000 | 38044.3 | 23917.2 | 1.59x | | ArraysSort.Double.testSort | STAGGER | 
800 | 2.5 | 2.6 | 0.96x | | ArraysSort.Double.testSort | STAGGER | 7000 | 28.6 | 22.7 | 1.26x | | ArraysSort.Double.testSort | STAGGER | 50000 | 141.4 | 142.6 | 0.99x | | ArraysSort.Double.testSort | STAGGER | 300000 | 1069.9 | 892.6 | 1.20x | | ArraysSort.Double.testSort | STAGGER | 2000000 | 6693.0 | 6686.1 | 1.00x | | ArraysSort.Float.testSort | RANDOM | 800 | 6.8 | 4.1 | 1.66x | | ArraysSort.Float.testSort | RANDOM | 7000 | 233.0 | 38.7 | **6.02x** | | ArraysSort.Float.testSort | RANDOM | 50000 | 2202.2 | 353.9 | **6.22x** | | ArraysSort.Float.testSort | RANDOM | 300000 | 15084.3 | 2374.9 | **6.35x** | | ArraysSort.Float.testSort | RANDOM | 2000000 | 117961.2 | 17431.5 | **6.77x** | | ArraysSort.Float.testSort | REPEATED | 800 | 2.3 | 1.5 | 1.53x | | ArraysSort.Float.testSort | REPEATED | 7000 | 28.2 | 8.7 | **3.24x** | | ArraysSort.Float.testSort | REPEATED | 50000 | 467.5 | 118.4 | **3.95x** | | ArraysSort.Float.testSort | REPEATED | 300000 | 2976.0 | 974.8 | **3.05x** | | ArraysSort.Float.testSort | REPEATED | 2000000 | 18910.0 | 6574.2 | **2.88x** | | ArraysSort.Float.testSort | SHUFFLE | 800 | 4.6 | 3.5 | 1.30x | | ArraysSort.Float.testSort | SHUFFLE | 7000 | 84.2 | 36.3 | **2.32x** | | ArraysSort.Float.testSort | SHUFFLE | 50000 | 838.9 | 323.9 | **2.59x** | | ArraysSort.Float.testSort | SHUFFLE | 300000 | 5704.2 | 2246.8 | **2.54x** | | ArraysSort.Float.testSort | SHUFFLE | 2000000 | 37341.8 | 16043.9 | **2.33x** | | ArraysSort.Float.testSort | STAGGER | 800 | 2.2 | 2.3 | 0.99x | | ArraysSort.Float.testSort | STAGGER | 7000 | 20.5 | 20.9 | 0.98x | | ArraysSort.Float.testSort | STAGGER | 50000 | 132.1 | 130.4 | 1.01x | | ArraysSort.Float.testSort | STAGGER | 300000 | 802.9 | 836.3 | 0.96x | | ArraysSort.Float.testSort | STAGGER | 2000000 | 5584.2 | 5587.9 | 1.00x | | ArraysSort.Int.testSort | RANDOM | 800 | 6.2 | 3.4 | 1.84x | | ArraysSort.Int.testSort | RANDOM | 7000 | 210.0 | 31.9 | **6.59x** | | ArraysSort.Int.testSort | RANDOM | 50000 | 2068.5 | 297.9 
| **6.94x** | | ArraysSort.Int.testSort | RANDOM | 300000 | 14058.4 | 2104.9 | **6.68x** | | ArraysSort.Int.testSort | RANDOM | 2000000 | 114645.1 | 15266.0 | **7.51x** | | ArraysSort.Int.testSort | REPEATED | 800 | 1.6 | 0.9 | 1.76x | | ArraysSort.Int.testSort | REPEATED | 7000 | 25.2 | 3.5 | **7.15x** | | ArraysSort.Int.testSort | REPEATED | 50000 | 332.4 | 26.8 | **12.39x** | | ArraysSort.Int.testSort | REPEATED | 300000 | 2012.2 | 147.5 | **13.64x** | | ArraysSort.Int.testSort | REPEATED | 2000000 | 11870.5 | 1099.9 | **10.79x** | | ArraysSort.Int.testSort | SHUFFLE | 800 | 4.4 | 2.9 | 1.53x | | ArraysSort.Int.testSort | SHUFFLE | 7000 | 79.5 | 30.4 | **2.61x** | | ArraysSort.Int.testSort | SHUFFLE | 50000 | 771.2 | 275.6 | **2.80x** | | ArraysSort.Int.testSort | SHUFFLE | 300000 | 5140.2 | 1995.7 | **2.58x** | | ArraysSort.Int.testSort | SHUFFLE | 2000000 | 34605.7 | 14190.4 | **2.44x** | | ArraysSort.Int.testSort | STAGGER | 800 | 1.7 | 1.7 | 0.99x | | ArraysSort.Int.testSort | STAGGER | 7000 | 15.8 | 15.9 | 1.00x | | ArraysSort.Int.testSort | STAGGER | 50000 | 97.3 | 96.9 | 1.00x | | ArraysSort.Int.testSort | STAGGER | 300000 | 588.9 | 596.9 | 0.99x | | ArraysSort.Int.testSort | STAGGER | 2000000 | 3940.4 | 4006.4 | 0.98x | | ArraysSort.Long.testSort | RANDOM | 800 | 6.4 | 4.9 | 1.30x | | ArraysSort.Long.testSort | RANDOM | 7000 | 205.4 | 53.3 | **3.85x** | | ArraysSort.Long.testSort | RANDOM | 50000 | 2015.6 | 483.1 | **4.17x** | | ArraysSort.Long.testSort | RANDOM | 300000 | 14100.0 | 3485.8 | **4.04x** | | ArraysSort.Long.testSort | RANDOM | 2000000 | 108740.4 | 27978.6 | **3.89x** | | ArraysSort.Long.testSort | REPEATED | 800 | 1.6 | 1.2 | 1.33x | | ArraysSort.Long.testSort | REPEATED | 7000 | 15.9 | 7.4 | **2.13x** | | ArraysSort.Long.testSort | REPEATED | 50000 | 287.8 | 44.9 | **6.41x** | | ArraysSort.Long.testSort | REPEATED | 300000 | 1969.6 | 287.2 | **6.86x** | | ArraysSort.Long.testSort | REPEATED | 2000000 | 12095.4 | 2821.4 | **4.29x** | | 
ArraysSort.Long.testSort | SHUFFLE | 800 | 4.2 | 4.5 | 0.95x | | ArraysSort.Long.testSort | SHUFFLE | 7000 | 82.7 | 51.2 | 1.62x | | ArraysSort.Long.testSort | SHUFFLE | 50000 | 757.1 | 443.3 | 1.71x | | ArraysSort.Long.testSort | SHUFFLE | 300000 | 5196.6 | 3140.8 | 1.65x | | ArraysSort.Long.testSort | SHUFFLE | 2000000 | 34367.7 | 24646.1 | 1.39x | | ArraysSort.Long.testSort | STAGGER | 800 | 1.9 | 2.0 | 0.98x | | ArraysSort.Long.testSort | STAGGER | 7000 | 20.4 | 21.2 | 0.96x | | ArraysSort.Long.testSort | STAGGER | 50000 | 111.0 | 109.3 | 1.02x | | ArraysSort.Long.testSort | STAGGER | 300000 | 681.4 | 699.8 | 0.97x | | ArraysSort.Long.testSort | STAGGER | 2000000 | 5127.1 | 5040.3 | 1.02x | Thanks, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1691073640 From gbarany at openjdk.org Thu Aug 24 07:25:30 2023 From: gbarany at openjdk.org (Gergö Barany) Date: Thu, 24 Aug 2023 07:25:30 GMT Subject: RFR: 8313530: VM build without C2 fails after JDK-8312579 [v2] In-Reply-To: References: Message-ID: On Wed, 23 Aug 2023 09:11:49 GMT, Gergö Barany wrote: >> The EnableVectorSupport flag is declared in `opto/c2_globals.hpp`, which is not included if `COMPILER2` is not set. But after my changes for [JDK-8312579](https://bugs.openjdk.org/browse/JDK-8312579) we try to access this flag in some places guarded by `#if COMPILER2_OR_JVMCI`. >> >> This PR moves some flags from `c2_globals.hpp` to the shared `compiler_globals.hpp`, so that they are accessible even if C2 is disabled but JVMCI is enabled. > > Gergö Barany has updated the pull request incrementally with two additional commits since the last revision: > > - Add copies of Vector API flags in jvmci_globals.hpp > - Revert "8313530: VM build without C2 fails after JDK-8312579" > > This reverts commit d82e89c469e91f78f9c2e5b28c725b0e1ba0fb8c. Thanks everyone for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15384#issuecomment-1691142787 From roland at openjdk.org Thu Aug 24 08:03:52 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 24 Aug 2023 08:03:52 GMT Subject: RFR: 8314580: PhaseIdealLoop::transform_long_range_checks fails with assert "was tested before" Message-ID: For long counted loops, `PhaseIdealLoop::create_loop_nest()` first goes over the loop body to collect range checks, then transforms the long counted loop into a loop nest, and finally goes over the list of range checks it collected to transform them. For that last step, `PhaseIdealLoop::transform_long_range_checks()` needs to extract the parameters of each range check from its range check expression. It must still recognize that expression even though the loop has been transformed in the meantime, and that is what fails here. The reason is that the range check expression uses the long loop increment as input, and during creation of the loop nest that increment is transformed to `outer phi + inner incr`, which breaks pattern matching of the range check expression. I propose removing the transformation `incr => (outer_phi + inner_incr)` entirely. After looking at this code again, I don't think it's needed: the transformation `phi => (outer_phi + inner_phi)` should be all that's needed to correctly transform the loop. 
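The loop-nest shape that Roland's description relies on can be sketched at the source level. This is only an illustration: C2 performs the transformation on its IR, not on Java source, and the names `LongLoopNest`, `sumFlat`, `sumNested` and the `maxChunk` parameter are hypothetical. The long-indexed loop becomes an outer loop over a long `base` and an inner loop with an `int` counter `j`. Every use of the original induction variable `i` becomes `base + j`, which mirrors the `phi => (outer_phi + inner_phi)` rewrite.

```java
// Source-level sketch of a long counted loop turned into a loop nest; illustrative only.
final class LongLoopNest {

    // Original shape: one counted loop with a long induction variable i.
    static long sumFlat(long start, long stop) {
        long sum = 0;
        for (long i = start; i < stop; i++) {
            sum += i;
        }
        return sum;
    }

    // Nested shape: the outer loop advances a long base, the inner loop runs an int counter j,
    // and each use of i is expressed as base + j (cf. phi => outer_phi + inner_phi).
    // maxChunk is a hypothetical cap on the inner trip count and must be positive;
    // assumes stop - start does not overflow a long.
    static long sumNested(long start, long stop, int maxChunk) {
        long sum = 0;
        for (long base = start; base < stop; ) {
            int chunk = (int) Math.min(stop - base, (long) maxChunk);
            for (int j = 0; j < chunk; j++) {
                sum += base + j;
            }
            base += chunk;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumFlat(-5, 23) + " " + sumNested(-5, 23, 4));
    }
}
```

In this form, a range check on `i` inside the body would appear as a check on `base + j`. That is the kind of expression `transform_long_range_checks()` has to pattern-match after the nest is built.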
------------- Commit messages: - fix & test Changes: https://git.openjdk.org/jdk/pull/15411/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15411&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314580 Stats: 52 lines in 2 files changed: 46 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15411.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15411/head:pull/15411 PR: https://git.openjdk.org/jdk/pull/15411 From gbarany at openjdk.org Thu Aug 24 08:08:39 2023 From: gbarany at openjdk.org (Gergö Barany) Date: Thu, 24 Aug 2023 08:08:39 GMT Subject: Integrated: 8313530: VM build without C2 fails after JDK-8312579 In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 09:04:32 GMT, Gergö Barany wrote: > The EnableVectorSupport flag is declared in `opto/c2_globals.hpp`, which is not included if `COMPILER2` is not set. But after my changes for [JDK-8312579](https://bugs.openjdk.org/browse/JDK-8312579) we try to access this flag in some places guarded by `#if COMPILER2_OR_JVMCI`. > > This PR moves some flags from `c2_globals.hpp` to the shared `compiler_globals.hpp`, so that they are accessible even if C2 is disabled but JVMCI is enabled. This pull request has now been integrated. Changeset: c418933d Author: Gergö
Barany Committer: Jie Fu URL: https://git.openjdk.org/jdk/commit/c418933d32a4e158f0e526d1be27b4b00f0c08a6 Stats: 13 lines in 1 file changed: 12 ins; 0 del; 1 mod 8313530: VM build without C2 fails after JDK-8312579 Reviewed-by: dnsimon, haosun, jiefu, kvn ------------- PR: https://git.openjdk.org/jdk/pull/15384 From duke at openjdk.org Thu Aug 24 08:31:32 2023 From: duke at openjdk.org (emmyyin) Date: Thu, 24 Aug 2023 08:31:32 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v7] In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 11:38:40 GMT, Christian Hagedorn wrote: >> emmyyin has updated the pull request incrementally with one additional commit since the last revision: >> >> fixing trailing ws > > src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 807: > >> 805: n.preds.add(result); >> 806: result.to = n; >> 807: result.relativeTo = n.width / 2; > > `n.width` equals `DUMMY_WIDTH` here which is 1. `n.width / 2` is therefore always zero. Is it intended to set `result.relativeTo` and `e.relativeFrom` to zero? Same further down in `expandNewLayerBeneath()`. Yes since `relativeTo` and `relativeFrom` refers to the ports on the node, which is irrelevant for the dummy nodes. Could definitely be done differently, but this is how it is in `HierarchicalLayoutManager` and I thought it would be better to be consistent across the layout managers ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1303987350 From duke at openjdk.org Thu Aug 24 08:50:28 2023 From: duke at openjdk.org (emmyyin) Date: Thu, 24 Aug 2023 08:50:28 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v8] In-Reply-To: References: Message-ID: > ### Purpose > > IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. 
Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On the one hand, this makes each layout more readable. On the other hand, the layouts of two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can disrupt the understanding of the graph that the viewer has built up. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. > > The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". > > The difference between the algorithms is illustrated in the following video: > > > https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 > > > This work is the result of my Master's thesis, which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). > > > ### Implementation > > The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs keep their relative positions. A new layout manager, `HierarchicalStableLayoutManager`, which holds the core algorithm, has been added. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. > > If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. 
This is because the dynamic algorithm needs an initial layout to apply the update actions to. > > The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. > > Since `HierarchicalStableLayoutManager` tries to preserve the node positions, the layouts might get unreadable after a fe... emmyyin has updated the pull request incrementally with one additional commit since the last revision: Code fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14349/files - new: https://git.openjdk.org/jdk/pull/14349/files/97036439..4b933be0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=06-07 Stats: 379 lines in 6 files changed: 48 ins; 307 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/14349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14349/head:pull/14349 PR: https://git.openjdk.org/jdk/pull/14349 From duke at openjdk.org Thu Aug 24 09:00:22 2023 From: duke at openjdk.org (emmyyin) Date: Thu, 24 Aug 2023 09:00:22 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v9] In-Reply-To: References: Message-ID: > ### Purpose > > IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On the one hand, this makes each layout more readable. 
On the other hand, the layouts of two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can disrupt the understanding of the graph that the viewer has built up. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. > > The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". > > The difference between the algorithms is illustrated in the following video: > > > https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 > > > This work is the result of my Master's thesis, which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). > > > ### Implementation > > The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs keep their relative positions. A new layout manager, `HierarchicalStableLayoutManager`, which holds the core algorithm, has been added. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. > > If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions to. > > The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. 
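The core idea behind the update actions, deriving the next layout from the previous one instead of computing it from scratch, can be sketched in a few lines of Java. This is a hypothetical toy, not `HierarchicalStableLayoutManager`. It works on a map from node name to position instead of `LayoutNode`/`LayoutEdge` objects, and it ignores edges, layers and dummy nodes. Its rule for placing new nodes (in a fresh row below the kept ones) is an assumption made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of incremental ("stable") layout updates; all names are hypothetical.
final class StableLayoutSketch {

    record Pos(int x, int y) {}

    // Derives the next layout from the previous one:
    //  - nodes present in both graphs keep their previous position ("update"),
    //  - nodes missing from the next graph are dropped ("remove"),
    //  - new nodes go into a fresh row below everything kept ("add", placeholder policy).
    // Assumes non-negative y coordinates.
    static Map<String, Pos> update(Map<String, Pos> previous, Set<String> nextNodes,
                                   int rowHeight, int colWidth) {
        Map<String, Pos> next = new LinkedHashMap<>();
        int maxY = 0;
        for (Map.Entry<String, Pos> e : previous.entrySet()) {
            if (nextNodes.contains(e.getKey())) {
                next.put(e.getKey(), e.getValue());
                maxY = Math.max(maxY, e.getValue().y());
            }
        }
        int newRowY = next.isEmpty() ? 0 : maxY + rowHeight;
        int x = 0;
        for (String node : new TreeSet<>(nextNodes)) {  // sorted for a deterministic order
            if (!next.containsKey(node)) {
                next.put(node, new Pos(x, newRowY));
                x += colWidth;
            }
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, Pos> before = new LinkedHashMap<>();
        before.put("A", new Pos(0, 0));
        before.put("B", new Pos(100, 0));
        before.put("C", new Pos(0, 50));
        System.out.println(update(before, Set.of("A", "C", "D", "E"), 50, 100));
    }
}
```

In the example, A and C stay exactly where they were, B disappears, and D and E are appended in a new row. This preservation of surviving positions is what the description means by nodes keeping their relative positions. The real algorithm also has to reconcile edges and layering, which this sketch does not attempt.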
> > Since `HierarchicalStableLayoutManager` tries to preserve the node positions, the layouts might get unreadable after a fe... emmyyin has updated the pull request incrementally with two additional commits since the last revision: - removing trailing ws - removing redundant code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14349/files - new: https://git.openjdk.org/jdk/pull/14349/files/4b933be0..c0eec085 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=07-08 Stats: 21 lines in 2 files changed: 0 ins; 20 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14349/head:pull/14349 PR: https://git.openjdk.org/jdk/pull/14349 From duke at openjdk.org Thu Aug 24 09:02:35 2023 From: duke at openjdk.org (emmyyin) Date: Thu, 24 Aug 2023 09:02:35 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v5] In-Reply-To: References: <7RAXRCktzo9NArTyV_NBwxxLl7zgCaxurRLwwwKzeAM=.91d9abbe-12e0-43af-8e5c-b13052629a56@github.com> Message-ID: On Tue, 22 Aug 2023 11:52:46 GMT, Christian Hagedorn wrote: >> This is to make the loop break once we hit the non-dummy node where the edge goes from. I.e. for the edge (u,v) with lots of dummy nodes in between nodes u and v, we only want to remove the dummy nodes and then break the loop as soon as we are at node u. > > I still don't understand why you need `n.vertex == null` here. If `n.vertex != null`, then the loop continuation test `n.vertex == null && found` will be false and we will not perform another iteration. Exactly, if `n.vertex == null` we want to break the loop. There are two cases we need to consider when removing the dummy nodes: 1) there is a long chain of dummy nodes between node u and v, and 2) the edge is part of an edge concentration of multiple edges. 
In case 1) we just traverse the edge with all the dummy nodes and remove them as we go until we hit node u (`n.vertex != null`). In case 2) we traverse along the edge until we find the anchor node (which is a dummy node with one ingoing edge and multiple outgoing edges), and break the loop when we reach the anchor node. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1304025731 From chagedorn at openjdk.org Thu Aug 24 09:06:30 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 24 Aug 2023 09:06:30 GMT Subject: RFR: 8314024: SIGSEGV in PhaseIdealLoop::build_loop_late_post_work due to bad immediate dominator info In-Reply-To: References: Message-ID: On Wed, 23 Aug 2023 09:15:38 GMT, Roland Westrelin wrote: > A node is sunk from the pre loop into the main loop. That node, in the > main loop, feeds into a test. When the node is sunk it is pinned > between the main and pre loop. The test it feeds into is then > eliminated by range check elimination: the sunk node becomes input to > an expression that computes the new bound of the pre loop. The > resulting graph is broken because the sunk node is pinned below the > pre loop but used by the exit test of the pre loop. > > The fix I propose is in `PhaseIdealLoop::try_sink_out_of_loop()`, to > skip nodes in pre loops that have a use in the companion main loop. Looks good but I'm wondering if we could also bail out in Range Check Elimination instead, if we find that `get_ctrl()` of one of the involved data nodes does not dominate the pre loop exit test. What do you think? test/hotspot/jtreg/compiler/loopopts/TestNodeSunkFromPreLoop.java line 28: > 26: * @bug 8314024 > 27: * @summary Node used in check in main loop sunk from pre loop before RC elimination > 28: * @run main/othervm -XX:-BackgroundCompilation -XX:-UseLoopPredicate TestNodeSunkFromPreLoop You should add a `@requires vm.compiler2.enabled` since `UseLoopPredicate` is a C2 only flag. 
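Applied to the test header quoted above, the suggested change would look roughly like the following sketch. Only the `@bug`, `@summary` and `@run` lines come from the quote. The `@test` tag is assumed to be present in the real file, and any other tags it may contain are not shown.

```java
/*
 * @test
 * @bug 8314024
 * @requires vm.compiler2.enabled
 * @summary Node used in check in main loop sunk from pre loop before RC elimination
 * @run main/othervm -XX:-BackgroundCompilation -XX:-UseLoopPredicate TestNodeSunkFromPreLoop
 */
```

With this tag, jtreg skips the test on VM builds without C2. Without it, those builds would fail at startup because they do not recognize the C2-only `-XX:-UseLoopPredicate` option.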
------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15399#pullrequestreview-1593102702 PR Review Comment: https://git.openjdk.org/jdk/pull/15399#discussion_r1304014502 From rcastanedalo at openjdk.org Thu Aug 24 09:13:30 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 24 Aug 2023 09:13:30 GMT Subject: RFR: 8312749: Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) [v2] In-Reply-To: References: Message-ID: On Wed, 23 Aug 2023 17:50:44 GMT, Vladimir Kozlov wrote: > Looks good. Thanks for reviewing, Vladimir! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15288#issuecomment-1691308343 From dnsimon at openjdk.org Thu Aug 24 10:15:37 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 24 Aug 2023 10:15:37 GMT Subject: RFR: 8314819: [JVMCI] HotSpotJVMCIRuntime.lookupType throws unexpected ClassNotFoundException [v2] In-Reply-To: References: Message-ID: <50C6i8OX3DkF74vH7x09KUDVK_OiAziFsl1GpYl3EYo=.4817de99-a8e3-47cf-a4d4-6bccbaa8159b@github.com> On Wed, 23 Aug 2023 09:11:33 GMT, Doug Simon wrote: >> This PR restores the expected behavior prior to [JDK-8313421](https://bugs.openjdk.org/browse/JDK-8313421) whereby `HotSpotJVMCIRuntime.lookupType` throws `NoClassDefFoundError` instead of `ClassNotFoundException`. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fixed and expanded testing related to CompilerToVM.lookupType Thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15393#issuecomment-1691398807 From dnsimon at openjdk.org Thu Aug 24 10:15:39 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 24 Aug 2023 10:15:39 GMT Subject: Integrated: 8314819: [JVMCI] HotSpotJVMCIRuntime.lookupType throws unexpected ClassNotFoundException In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 21:01:26 GMT, Doug Simon wrote: > This PR restores the expected behavior prior to [JDK-8313421](https://bugs.openjdk.org/browse/JDK-8313421) whereby `HotSpotJVMCIRuntime.lookupType` throws `NoClassDefFoundError` instead of `ClassNotFoundException`. This pull request has now been integrated. Changeset: 75e19e0d Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/75e19e0d5e6a705bcd10a9f9afbb6fdc3939adbb Stats: 69 lines in 5 files changed: 41 ins; 7 del; 21 mod 8314819: [JVMCI] HotSpotJVMCIRuntime.lookupType throws unexpected ClassNotFoundException Reviewed-by: never, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/15393 From chagedorn at openjdk.org Thu Aug 24 10:55:34 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 24 Aug 2023 10:55:34 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v7] In-Reply-To: References: Message-ID: <92KNVrJzmYZRK8wdxR487MYQmKotzFwAA0BucY9ERfw=.f1ced558-d440-4c66-af7f-a40a8a09ed03@github.com> On Thu, 24 Aug 2023 08:28:44 GMT, emmyyin wrote: >> src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 807: >> >>> 805: n.preds.add(result); >>> 806: result.to = n; >>> 807: result.relativeTo = n.width / 2; >> >> `n.width` equals `DUMMY_WIDTH` here which is 1. `n.width / 2` is therefore always zero. Is it intended to set `result.relativeTo` and `e.relativeFrom` to zero? Same further down in `expandNewLayerBeneath()`. 
> > Yes since `relativeTo` and `relativeFrom` refers to the ports on the node, which is irrelevant for the dummy nodes. Could definitely be done differently, but this is how it is in `HierarchicalLayoutManager` and I thought it would be better to be consistent across the layout managers Okay, thanks for the explanation. The code in `HierarchicalLayoutManager` looks very similar. So, you could also change `relativeTo` and `relativeFrom` to zero there. Or even better share the code somehow (if I see that correctly, the only difference is how to insert the node - `nodes.add(n)` vs. `insertNode(n, layer)`). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1304147633 From chagedorn at openjdk.org Thu Aug 24 10:55:36 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 24 Aug 2023 10:55:36 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v5] In-Reply-To: References: <7RAXRCktzo9NArTyV_NBwxxLl7zgCaxurRLwwwKzeAM=.91d9abbe-12e0-43af-8e5c-b13052629a56@github.com> Message-ID: On Thu, 24 Aug 2023 08:59:58 GMT, emmyyin wrote: >> I still don't understand why you need `n.vertex == null` here. If `n.vertex != null`, then the loop continuation test `n.vertex == null && found` will be false and we will not perform another iteration. > > Exactly, if `n.vertex == null` we want to break the loop. There are two cases we need to consider when removing the dummy nodes: 1) there is a long chain of dummy nodes between node u and v, and 2) the edge is part of an edge concentration of multiple edges. In case 1) we just traverse the edge with all the dummy nodes and remove them as we go until we hit node u (`n.vertex != null`). In case 2) we traverse along the edge until we find the anchor node (which is a dummy node with one ingoing edge and multiple outgoing edges), and break the loop when we reach the anchor node. The `found` variable is to ensure the dummy node is actually connected to something. 
Not sure if that part is actually needed. Thanks for explaining it in more detail. We probably had a misunderstanding here, though. Toby and I were referring to this `n.vertex == null` here: `if (n.vertex == null && n.succs.size() <= 1 && n.preds.size() <= 1)` and not the one in `while (n.vertex == null && found)`. From your explanation it makes sense to keep the one in the `while`, but the other one in the `if` always seems to be true and could thus be removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1304150782 From rcastanedalo at openjdk.org Thu Aug 24 11:45:28 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 24 Aug 2023 11:45:28 GMT Subject: RFR: 8312749: Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) [v2] In-Reply-To: References: Message-ID: On Mon, 21 Aug 2023 10:23:06 GMT, Albert Mingkun Yang wrote: > If I understand it correctly, much of the diff is to ensure that `ArrayCopyNode::make` (in `BarrierSetC2::clone`) gets the correct value for the `length` arg, calculated as `align_up(array-length * elem-size, word-size) / word-size`. > > I wonder if it's possible to pass the actual array length (#slots) as `length` and move the merge-bytes-to-words-copying optimization to a lower level, e.g. inside `conjoint_jbytes`. Ofc, `BarrierSetC2::clone_at_expansion` and its derived siblings need to be adjusted accordingly, e.g. to use the actual elem-type. > > (Preexisting: having `ArrayCopyNode` to cover both array and instance cloning hinders the readability, IMO.) Thanks for looking at this, Albert! I agree that the code could benefit from some clean-up, and postponing the merge-bytes-to-words-copying optimization to at least BarrierSetC2::clone_at_expansion() is worth exploring. However, your suggested refactoring would not be trivial, so I suggest to integrate this fix as-is and address the simplification in a separate RFE.
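The length formula quoted in this exchange, `align_up(array-length * elem-size, word-size) / word-size`, can be worked through numerically. The sketch below is illustrative only and assumes 8-byte words and a 16-byte array header; `ClonePayloadWords` and its methods are hypothetical names, not HotSpot code. It shows that with a non-default object alignment the object can end more words after the header than the payload occupies, which is the padding this changeset keeps out of the copy.

```java
// Worked example: payload words vs. words up to the end of the aligned object.
// Assumptions (illustrative, not taken from HotSpot): 8-byte words and a
// 16-byte array header (mark word, compressed klass pointer, length field).
final class ClonePayloadWords {
    // Round value up to a multiple of alignment (alignment must be a power of two).
    static long alignUp(long value, long alignment) {
        return (value + alignment - 1) & -alignment;
    }

    // Words holding array elements: align_up(length * elemSize, wordSize) / wordSize.
    static long payloadWords(long length, long elemSize, long wordSize) {
        return alignUp(length * elemSize, wordSize) / wordSize;
    }

    // Words between the end of the header and the end of the aligned object.
    static long wordsAfterHeader(long headerBytes, long length, long elemSize,
                                 long wordSize, long objectAlignment) {
        long objectBytes = alignUp(headerBytes + length * elemSize, objectAlignment);
        return (objectBytes - headerBytes) / wordSize;
    }

    public static void main(String[] args) {
        // byte[3]: the payload fits in one word.
        System.out.println(payloadWords(3, 1, 8));             // 1
        // 8-byte object alignment: nothing beyond the payload word.
        System.out.println(wordsAfterHeader(16, 3, 1, 8, 8));  // 1
        // 32-byte object alignment: one extra padding word after the payload.
        System.out.println(wordsAfterHeader(16, 3, 1, 8, 32)); // 2
    }
}
```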
Please let me know if you agree, and, if so, I will create a separate RFE for your suggestion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15288#issuecomment-1691520338 From chagedorn at openjdk.org Thu Aug 24 11:57:53 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 24 Aug 2023 11:57:53 GMT Subject: RFR: 8314513: [IR Framework] Some internal IR Framework tests are failing after JDK-8310308 on PPC and Cascade Lake Message-ID: This patch fixes some internal IR framework failures after [JDK-8310308](https://bugs.openjdk.org/browse/JDK-8310308): - `testlibrary_tests/ir_framework/tests/TestBadFormat.java` on Linux ppc64le: - `applyIfCPUFeature` clauses are false and the rule is not run. We will therefore not hit the format violations which the test expects to find. The fix here is to remove the `applyIfCPUFeature` constraints as the test is only interested in properly reporting format violations. - `testlibrary_tests/ir_framework/examples/IRExample.java` on Cascade Lake x86_64: - On Cascade Lake, `failOn` constraints need the same "always true" handling as `counts` constraints. This was missed in JDK-8310308. I've added the same `try-catch` handling as in `RawCountsConstraint::parse()`: https://github.com/openjdk/jdk/blob/97b94cb1cdeba00f4bba7326a300c0336950f3ec/test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java#L97-L104 Thanks to @MBaesken and @TheRealMDoerr for reporting this and helping with some pre-PR testing. Would you like to rerun your testing on PPC and Cascade Lake again? 
Thanks, Christian ------------- Commit messages: - 8314513: [IR Framework] Some internal IR Framework tests are failing after JDK-8310308 on PPC and Cascade Lake Changes: https://git.openjdk.org/jdk/pull/15415/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15415&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314513 Stats: 30 lines in 2 files changed: 5 ins; 12 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/15415.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15415/head:pull/15415 PR: https://git.openjdk.org/jdk/pull/15415 From mdoerr at openjdk.org Thu Aug 24 14:05:43 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 24 Aug 2023 14:05:43 GMT Subject: RFR: 8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API Message-ID: I've found a way to solve the remaining FFI problem on linux PPC64 Big Endian. Large structs (>8 Bytes) which are passed in registers or on stack require shifting the Bytes in the last slot if the size is not a multiple of 8. This PR adds the required functionality to the Java code. Please review and provide feedback. There may be better ways to implement it. I just found one which works and makes the tests pass: Test summary ============================== TEST TOTAL PASS FAIL ERROR jtreg:test/jdk/java/foreign 88 88 0 0 Note: This PR should be considered as preparation work for AIX which also uses ABIv1. 
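A minimal sketch of why the last slot needs a shift on a big-endian target. It assumes 8-byte argument slots and a hypothetical 12-byte struct, and it uses `java.nio.ByteBuffer` instead of the FFM linker internals. `PartialSlotShift` and its methods are illustrative names only. If the 4 trailing bytes are loaded as an `int` and simply widened, they land in the low-order half of the slot. A big-endian memory layout puts them in the high-order half, so a left shift by 32 is needed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: models the last 8-byte slot of a 12-byte struct on a big-endian target.
final class PartialSlotShift {

    // Reference layout: the struct copied into two zero-padded 8-byte slots,
    // with the second slot read back as a big-endian long.
    static long lastSlotFromMemory(byte[] structBytes) {
        ByteBuffer slots = ByteBuffer.allocate(16).order(ByteOrder.BIG_ENDIAN);
        slots.put(structBytes);          // bytes 12..15 stay zero
        return slots.getLong(8);
    }

    // Widening the 4-byte tail alone would fill the low-order half,
    // so move it into the high-order 32 bits (sign-extension bits are shifted out).
    static long packTail(int tail) {
        return ((long) tail) << 32;
    }

    // Reverse direction: shift right and narrow back to int.
    static int unpackTail(long slot) {
        return (int) (slot >>> 32);
    }

    public static void main(String[] args) {
        byte[] struct = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
        int tail = ByteBuffer.wrap(struct).getInt(8);             // big-endian by default
        System.out.printf("%016x%n", lastSlotFromMemory(struct)); // 090a0b0c00000000
        System.out.printf("%016x%n", packTail(tail));             // 090a0b0c00000000
        System.out.printf("%016x%n", (long) tail);                // 00000000090a0b0c
    }
}
```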
------------- Commit messages: - 8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API Changes: https://git.openjdk.org/jdk/pull/15417/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15417&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314949 Stats: 235 lines in 10 files changed: 225 ins; 5 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/15417.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15417/head:pull/15417 PR: https://git.openjdk.org/jdk/pull/15417 From jiefu at openjdk.org Thu Aug 24 14:24:39 2023 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 24 Aug 2023 14:24:39 GMT Subject: RFR: 8314951: VM build without C2 still fails after JDK-8313530 Message-ID: JDK-8313530 fixed the release VM build but the debug VM build would still fail. This patch fixes the debug VM build failure. Thanks. ------------- Commit messages: - 8314951: VM build without C2 still fails after JDK-8313530 Changes: https://git.openjdk.org/jdk/pull/15419/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15419&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314951 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15419.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15419/head:pull/15419 PR: https://git.openjdk.org/jdk/pull/15419 From jiefu at openjdk.org Thu Aug 24 14:26:38 2023 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 24 Aug 2023 14:26:38 GMT Subject: RFR: 8313530: VM build without C2 fails after JDK-8312579 [v2] In-Reply-To: References: Message-ID: On Wed, 23 Aug 2023 07:50:37 GMT, Hao Sun wrote: > Verified with Linux/AArch64 and Linux/x86_64 that VM build without C2 passes now. The debug build still fails: https://github.com/openjdk/jdk/pull/15419 . So I assume the verified builds were all release builds, right?
------------- PR Comment: https://git.openjdk.org/jdk/pull/15384#issuecomment-1691784655 From dnsimon at openjdk.org Thu Aug 24 14:35:28 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 24 Aug 2023 14:35:28 GMT Subject: RFR: 8314951: VM build without C2 still fails after JDK-8313530 In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 14:16:38 GMT, Jie Fu wrote: > JDK-8313530 fixed the release VM build but the debug VM build would still fail. > This patch fixes the debug VM build failure. > Thanks. Marked as reviewed by dnsimon (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15419#pullrequestreview-1593762549 From qamai at openjdk.org Thu Aug 24 16:02:57 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 24 Aug 2023 16:02:57 GMT Subject: RFR: 8312547: Max/Min nodes Value implementation could be improved [v4] In-Reply-To: References: Message-ID: > Hi, > > This patch removes the early return in `AddNode::Value` in case one of the inputs is a bottom, which may affect the value calculation of nodes such as `Min/MaxNode`. > > Please kindly review, thanks very much. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
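The intuition behind removing the early return can be shown with a simplified interval model. This is a hypothetical sketch, not C2's `TypeInt` or the actual `Value` implementation. Even when one input is completely unknown ("bottom"), the other input still bounds one side of `max`/`min`, so returning bottom early discards information.

```java
// Simplified interval model (not C2's TypeInt): a value known to lie in [lo, hi].
record IntRange(int lo, int hi) {
    // "Bottom" in this model: nothing is known about the value.
    static final IntRange FULL = new IntRange(Integer.MIN_VALUE, Integer.MAX_VALUE);

    // Range of max(a, b) for any a in this range and any b in other.
    IntRange max(IntRange other) {
        return new IntRange(Math.max(lo, other.lo), Math.max(hi, other.hi));
    }

    // Range of min(a, b) for any a in this range and any b in other.
    IntRange min(IntRange other) {
        return new IntRange(Math.min(lo, other.lo), Math.min(hi, other.hi));
    }

    public static void main(String[] args) {
        IntRange known = new IntRange(5, 10);
        // Returning FULL early whenever an input is FULL would lose these bounds.
        System.out.println(FULL.max(known)); // IntRange[lo=5, hi=2147483647]
        System.out.println(FULL.min(known)); // IntRange[lo=-2147483648, hi=10]
    }
}
```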
The pull request contains five additional commits since the last revision: - Merge branch 'master' into values - address review - fix min/maxfp nodes - AddNode::Value should not return early - AddNode::Value should not return early ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15021/files - new: https://git.openjdk.org/jdk/pull/15021/files/4ae1ad36..0a117343 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15021&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15021&range=02-03 Stats: 36909 lines in 1458 files changed: 18937 ins; 7251 del; 10721 mod Patch: https://git.openjdk.org/jdk/pull/15021.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15021/head:pull/15021 PR: https://git.openjdk.org/jdk/pull/15021 From kvn at openjdk.org Thu Aug 24 20:26:08 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 24 Aug 2023 20:26:08 GMT Subject: RFR: 8314951: VM build without C2 still fails after JDK-8313530 In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 14:16:38 GMT, Jie Fu wrote: > JDK-8313530 fixed the release VM build but the debug VM build would still fail. > This patch fixes the debug VM build failure. > Thanks. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15419#pullrequestreview-1594399395 From ayang at openjdk.org Thu Aug 24 20:42:11 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 24 Aug 2023 20:42:11 GMT Subject: RFR: 8312749: Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) [v2] In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 07:53:03 GMT, Roberto Castañeda Lozano wrote: >> This changeset ensures that the array copy stub underlying the intrinsic implementation of `Object.clone` only copies its (double-word aligned) payload, excluding the remaining object alignment padding words, when a non-default `ObjectAlignmentInBytes` value is used.
This prevents the specialized ZGC stubs for `Object[]` array copy from processing undefined object alignment padding words as valid object pointers. For further details about the specific failure, see [initial analysis](https://bugs.openjdk.org/browse/JDK-8312749?focusedCommentId=14600658&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14600658) by Erik Österlund and Stefan Karlsson and comments in the regression test included in this changeset. >> >> As a side-benefit, the changeset simplifies the array size computation logic in `GraphKit::new_array()` by decoupling computation of header size and alignment padding size. >> >> #### Testing >> >> ##### Functionality >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64) >> - tier4-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; ZGC-specific tests only) >> - tier6-9 (linux-x64; ZGC-specific tests only) >> - tier1-3, and a few custom examples, applying [JDK-8139457](https://github.com/openjdk/jdk/pull/11044) (under review) on top of this changeset >> >> ##### Performance >> >> Tested performance on the following set of OpenJDK micro-benchmarks, on linux-x64 (for both G1 and ZGC, using different ObjectAlignmentInBytes values): >> >> - `openjdk.bench.java.lang.ArrayClone.byteClone` >> - `openjdk.bench.java.lang.ArrayClone.intClone` >> - `openjdk.bench.java.lang.ArrayFiddle.simple_clone` >> - `openjdk.bench.java.lang.Clone.cloneLarge` >> - `openjdk.bench.java.lang.Clone.cloneThreeDifferent` >> >> No significant regression was observed. > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Remove extra whitespace > I suggest to integrate this fix as-is and address the simplification in a separate RFE Sounds reasonable. ------------- Marked as reviewed by ayang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15288#pullrequestreview-1594421310 From sviswanathan at openjdk.org Thu Aug 24 22:04:19 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 24 Aug 2023 22:04:19 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v26] In-Reply-To: References: Message-ID: On Wed, 23 Aug 2023 23:29:47 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows up to ~7x improvement for 32-bit datatypes (int, float) and up to ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below. >> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random data** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | >> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | >> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | >> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | >> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | >> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | >> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | >> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | >> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | >> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | >> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | >> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | >> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | >> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | >> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | >> | ArraysSort.floatSort | 10000 | 323.955 | 50.547
| **6.4** | >> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | >> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | >> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | >> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | >> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | >> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | >> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | >> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | >> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | >> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** | >> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | >> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | >> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | >> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | >> | ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 | >> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | >> | ArraysSort.longSort | 1000 | 10.449 | 6.... > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > add parallelSort benchmarking src/java.base/share/classes/java/util/Arrays.java line 102: > 100: static void arraySort(Class elemType, Object array, long offset, int fromIndex, int toIndex, int end) { > 101: DualPivotQuicksort.smallArraySort(array, fromIndex, toIndex, end); > 102: } The arraySort and arrayPartition methods could be part of DualPivotQuicksort.java now. Then we can also remove the extra indirection from arraySort to smallArraySort and from arrayPartition to partitionSinglePivot and partitionDualPivot. src/java.base/share/classes/java/util/Arrays.java line 398: > 396: */ > 397: public static void sort(double[] a) { > 398: DualPivotQuicksort.sort(a, 0, 0, a.length); Extra blank space before DualPivotQuicksort. 
src/java.base/share/classes/java/util/DualPivotQuicksort.java line 2801: > 2799: Arrays.arrayPartition(float.class, a, baseOffset, low, high, pivotIndices, Unsafe.ARRAY_INT_BASE_OFFSET, isDualPivot); > 2800: lower = pivotIndices[0]; > 2801: upper = pivotIndices[1]; `lower` and `upper` are not used and are overwritten on lines 2822-2823. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1304694940 PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1304672283 PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1304838101 From duke at openjdk.org Thu Aug 24 23:34:53 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 24 Aug 2023 23:34:53 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v27] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows up to ~7x improvement for 32-bit datatypes (int, float) and up to ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
> > > **Arrays.sort performance data using JMH benchmarks for arrays with random data** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | > | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | > | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | > | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | > | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | > | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | > | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | > | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | > | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | > | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | > | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | > | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | > | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | > | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | > | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | > | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** | > | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | > | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | > | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | > | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | > | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | > | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | > | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | > | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | > | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | > | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** | > | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | > | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | > | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | > | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | > | ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 
| > | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | > | ArraysSort.longSort | 1000 | 10.449 | 6.239 | 1.7 | > | ArraysSort.longSort | 10000 | 307.074 | 70.284 | **4.4** | > | ArraysSor... Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Fix unused assignment in DPQS.java and space in Arrays.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/51738491..df17b3e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=25-26 Stats: 9 lines in 2 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From duke at openjdk.org Thu Aug 24 23:34:58 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 24 Aug 2023 23:34:58 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v26] In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 17:45:00 GMT, Sandhya Viswanathan wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> add parallelSort benchmarking > > src/java.base/share/classes/java/util/Arrays.java line 398: > >> 396: */ >> 397: public static void sort(double[] a) { >> 398: DualPivotQuicksort.sort(a, 0, 0, a.length); > > Extra blank space before DualPivotQuicksort. Please see this fixed in the latest commit. > src/java.base/share/classes/java/util/DualPivotQuicksort.java line 2801: > >> 2799: Arrays.arrayPartition(float.class, a, baseOffset, low, high, pivotIndices, Unsafe.ARRAY_INT_BASE_OFFSET, isDualPivot); >> 2800: lower = pivotIndices[0]; >> 2801: upper = pivotIndices[1]; > > lower and upper are not used and overwritten on line 2822-2823. 
Please see this fixed in the latest commit. Unlike the baseline, the variables `low` and `end` don't have to be initialized in this implementation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1304964322 PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1304963795 From sviswanathan at openjdk.org Thu Aug 24 23:35:32 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 24 Aug 2023 23:35:32 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v26] In-Reply-To: References: Message-ID: <7A2PZv2zQQIKsK21NB7EbotGOdHfSN4EO_ahEHmmtag=.3b036891-60f6-4585-8a9b-f2b176768f78@github.com> On Wed, 23 Aug 2023 23:29:47 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows up to ~7x improvement for 32-bit datatypes (int, float) and up to ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random data** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | >> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | >> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | >> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | >> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | >> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | >> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | >> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | >> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | >> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | >> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | >> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | >> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | >> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | >> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | >> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** | >> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | >> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | >> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | >> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | >> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | >> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | >> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | >> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | >> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | >> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** | >> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | >> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | >> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | >> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | >> | 
ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 | >> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | >> | ArraysSort.longSort | 1000 | 10.449 | 6.... > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > add parallelSort benchmarking src/java.base/share/classes/java/util/DualPivotQuicksort.java line 389: > 387: */ > 388: pivotIndices = new int[] {e1, e5}; > 389: Arrays.arrayPartition(int.class, a, baseOffset, low, high, pivotIndices, Unsafe.ARRAY_INT_BASE_OFFSET, isDualPivot); The Unsafe.ARRAY_INT_BASE_OFFSET parameter after pivotIndices is not needed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1304932474 From mcimadamore at openjdk.org Thu Aug 24 23:36:10 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 24 Aug 2023 23:36:10 GMT Subject: RFR: 8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API In-Reply-To: References: Message-ID: <4zrTfeu8tY86-yVJXBoqQ-gsWxGvdogZLTxfScBR7wU=.4f27a1f5-8334-497b-8a7d-756c95ad33ea@github.com> On Thu, 24 Aug 2023 13:56:12 GMT, Martin Doerr wrote: > I've found a way to solve the remaining FFI problem on linux PPC64 Big Endian. Large structs (>8 Bytes) which are passed in registers or on stack require shifting the Bytes in the last slot if the size is not a multiple of 8. This PR adds the required functionality to the Java code. > > Please review and provide feedback. There may be better ways to implement it. I just found one which works and makes the tests pass: > > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/jdk/java/foreign 88 88 0 0 > > > Note: This PR should be considered as preparation work for AIX which also uses ABIv1. src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/ABIv1CallArranger.java line 33: > 31: * PPC64 CallArranger specialized for ABI v1. 
> 32: */ > 33: public class ABIv1CallArranger extends CallArranger { Wouldn't it be more natural for CallArranger to have an abstract method (or even a kind() accessor for the different kinds of ABI supported) and then have these specialized subclasses return the correct kind? It seems to me that setting the `useXYZAbi` flag using an instanceof test is a little dirty. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15417#discussion_r1304966196 From mcimadamore at openjdk.org Thu Aug 24 23:41:10 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 24 Aug 2023 23:41:10 GMT Subject: RFR: 8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API In-Reply-To: References: Message-ID: <-H8TLXCvHxTIMGSl0vfnuPByyVX1olhlQzNtKut6aa8=.1094aa6b-60c9-430c-99f2-9af72f994191@github.com> On Thu, 24 Aug 2023 13:56:12 GMT, Martin Doerr wrote: > I've found a way to solve the remaining FFI problem on linux PPC64 Big Endian. Large structs (>8 Bytes) which are passed in registers or on stack require shifting the Bytes in the last slot if the size is not a multiple of 8. This PR adds the required functionality to the Java code. > > Please review and provide feedback. There may be better ways to implement it. I just found one which works and makes the tests pass: > > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/jdk/java/foreign 88 88 0 0 > > > Note: This PR should be considered as preparation work for AIX which also uses ABIv1. Overall these changes look good - as commented I'd like to learn a bit more of the underlying ABI, to get a sense of whether adding a new binding is ok. But overall it's great to see support for a big-endian ABI - apart from the linker, I am pleased to see that you did not encounter too many issues in the memory-side of the FFM API. 
src/java.base/share/classes/jdk/internal/foreign/abi/Binding.java line 695: > 693: * Negative [shiftAmount] shifts right and converts to int if needed. > 694: */ > 695: record ShiftLeft(int shiftAmount, Class type) implements Binding { Given the situation you are facing, perhaps adding the new binding here is unavoidable. Let's wait to hear from @JornVernee. In the meantime, can you point me to a document which explains this behavior? I'm curious and I'd like to know more :-) ------------- PR Review: https://git.openjdk.org/jdk/pull/15417#pullrequestreview-1594603192 PR Review Comment: https://git.openjdk.org/jdk/pull/15417#discussion_r1304967438 From mcimadamore at openjdk.org Thu Aug 24 23:58:08 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 24 Aug 2023 23:58:08 GMT Subject: RFR: 8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API In-Reply-To: <-H8TLXCvHxTIMGSl0vfnuPByyVX1olhlQzNtKut6aa8=.1094aa6b-60c9-430c-99f2-9af72f994191@github.com> References: <-H8TLXCvHxTIMGSl0vfnuPByyVX1olhlQzNtKut6aa8=.1094aa6b-60c9-430c-99f2-9af72f994191@github.com> Message-ID: On Thu, 24 Aug 2023 23:36:22 GMT, Maurizio Cimadamore wrote: >> I've found a way to solve the remaining FFI problem on linux PPC64 Big Endian. Large structs (>8 Bytes) which are passed in registers or on stack require shifting the Bytes in the last slot if the size is not a multiple of 8. This PR adds the required functionality to the Java code. >> >> Please review and provide feedback. There may be better ways to implement it. I just found one which works and makes the tests pass: >> >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/jdk/java/foreign 88 88 0 0 >> >> >> Note: This PR should be considered as preparation work for AIX which also uses ABIv1. > > src/java.base/share/classes/jdk/internal/foreign/abi/Binding.java line 695: > >> 693: * Negative [shiftAmount] shifts right and converts to int if needed. 
>> 694: */ >> 695: record ShiftLeft(int shiftAmount, Class type) implements Binding { > > Given the situation you are facing, perhaps adding the new binding here is unavoidable. Let's wait to hear from @JornVernee. In the meantime, can you point me to a document which explains this behavior? I'm curious and I'd like to know more :-) Maybe I'm starting to see it - it's not a special rule, as much as it is a consequence of the endianness. E.g. if you have a struct that is 64 + 32 bits, you can store the first 64 bits as a long. Then, there's an issue as we have to fill another long, but we have only 32 bits of value. Is it the problem that if we just copy the value into the long word "as is" it will be stored in the "wrong" 32 bits? So the shift takes care of that, I guess? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15417#discussion_r1304981242 From mcimadamore at openjdk.org Fri Aug 25 00:01:10 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 25 Aug 2023 00:01:10 GMT Subject: RFR: 8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API In-Reply-To: References: <-H8TLXCvHxTIMGSl0vfnuPByyVX1olhlQzNtKut6aa8=.1094aa6b-60c9-430c-99f2-9af72f994191@github.com> Message-ID: On Thu, 24 Aug 2023 23:55:28 GMT, Maurizio Cimadamore wrote: >> src/java.base/share/classes/jdk/internal/foreign/abi/Binding.java line 695: >> >>> 693: * Negative [shiftAmount] shifts right and converts to int if needed. >>> 694: */ >>> 695: record ShiftLeft(int shiftAmount, Class type) implements Binding { >> >> Given the situation you are facing, perhaps adding the new binding here is unavoidable. Let's wait to hear from @JornVernee. In the meantime, can you point me to a document which explains this behavior? I'm curious and I'd like to know more :-) > > Maybe I'm starting to see it - it's not a special rule, as much as it is a consequence of the endianness. E.g.
if you have a struct that is 64 + 32 bits, you can store the first 64 bits as a long. Then, there's an issue as we have to fill another long, but we have only 32 bits of value. Is it the problem that if we just copy the value into the long word "as is" it will be stored in the "wrong" 32 bits? So the shift takes care of that, I guess? If my assumption above is correct, then maybe another way to solve the problem would be, instead of adding a new shift binding, to generalize the VM store binding we have to allow writing a smaller value into a bigger storage, with an offset. Correct? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15417#discussion_r1304982593 From mcimadamore at openjdk.org Fri Aug 25 00:13:09 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 25 Aug 2023 00:13:09 GMT Subject: RFR: 8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 13:56:12 GMT, Martin Doerr wrote: > I've found a way to solve the remaining FFI problem on linux PPC64 Big Endian. Large structs (>8 Bytes) which are passed in registers or on stack require shifting the Bytes in the last slot if the size is not a multiple of 8. This PR adds the required functionality to the Java code. > > Please review and provide feedback. There may be better ways to implement it. I just found one which works and makes the tests pass: > > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/jdk/java/foreign 88 88 0 0 > > > Note: This PR should be considered as preparation work for AIX which also uses ABIv1. src/java.base/share/classes/jdk/internal/foreign/abi/Binding.java line 717: > 715: public void interpret(Deque stack, StoreFunc storeFunc, > 716: LoadFunc loadFunc, SegmentAllocator allocator) { > 717: if (shiftAmount > 0) { Why do we assume we can only deal with ints or longs?
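The alternative raised here, writing the smaller value into the larger slot at an offset instead of shifting, produces the same slot contents on a big-endian layout. The `ByteBuffer` sketch below checks that equivalence. `OffsetStoreVsShift` and its methods are illustrative names, not the FFM `Binding` API. Storing a 4-byte `int` at byte offset 0 of a zeroed 8-byte big-endian slot gives the same `long` as widening the value and shifting it left by 32.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative comparison of two ways to place a 4-byte value in an 8-byte big-endian slot.
final class OffsetStoreVsShift {
    // Offset-store approach: write the int at byte offset 0 of a zeroed 8-byte slot.
    static long viaOffsetStore(int value) {
        ByteBuffer slot = ByteBuffer.allocate(8).order(ByteOrder.BIG_ENDIAN);
        slot.putInt(0, value);
        return slot.getLong(0);
    }

    // Shift approach: widen to long, then move into the high-order 32 bits.
    static long viaShift(int value) {
        return ((long) value) << 32;
    }

    public static void main(String[] args) {
        for (int v : new int[] {0x090a0b0c, -1, 0}) {
            System.out.println(viaOffsetStore(v) == viaShift(v)); // true for each value
        }
    }
}
```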
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15417#discussion_r1304987417 From qamai at openjdk.org Fri Aug 25 01:12:15 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 25 Aug 2023 01:12:15 GMT Subject: RFR: 8312547: Max/Min nodes Value implementation could be improved [v4] In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 16:02:57 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch removes the early return in `AddNode::Value` in case one of the inputs is a bottom, which may affect the value calculation of nodes such as `Min/MaxNode`. >> >> Please kindly review, thanks very much. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into values > - address review > - fix min/maxfp nodes > - AddNode::Value should not return early > - AddNode::Value should not return early Hi, may I have a second review, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15021#issuecomment-1692612410 From kvn at openjdk.org Fri Aug 25 01:36:15 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 25 Aug 2023 01:36:15 GMT Subject: RFR: 8312547: Max/Min nodes Value implementation could be improved [v4] In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 16:02:57 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch removes the early return in `AddNode::Value` in case one of the inputs is a bottom, which may affect the value calculation of nodes such as `Min/MaxNode`. >> >> Please kindly review, thanks very much. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into values > - address review > - fix min/maxfp nodes > - AddNode::Value should not return early > - AddNode::Value should not return early I am fine with fix and cleaning you did. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15021#pullrequestreview-1594675974 From kvn at openjdk.org Fri Aug 25 01:38:16 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 25 Aug 2023 01:38:16 GMT Subject: RFR: 8314513: [IR Framework] Some internal IR Framework tests are failing after JDK-8310308 on PPC and Cascade Lake In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 11:51:11 GMT, Christian Hagedorn wrote: > This patch fixes some internal IR framework failures after [JDK-8310308](https://bugs.openjdk.org/browse/JDK-8310308): > - `testlibrary_tests/ir_framework/tests/TestBadFormat.java` on Linux ppc64le: > - `applyIfCPUFeature` clauses are false and the rule is not run. We will therefore not hit the format violations which the test expects to find. The fix here is to remove the `applyIfCPUFeature` constraints as the test is only interested in properly reporting format violations. > - `testlibrary_tests/ir_framework/examples/IRExample.java` on Cascade Lake x86_64: > - On Cascade Lake, `failOn` constraints need the same "always true" handling as `counts` constraints. This was missed in JDK-8310308. I've added the same `try-catch` handling as in `RawCountsConstraint::parse()`: > https://github.com/openjdk/jdk/blob/97b94cb1cdeba00f4bba7326a300c0336950f3ec/test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java#L97-L104 > > Thanks to @MBaesken and @TheRealMDoerr for reporting this and helping with some pre-PR testing. Would you like to rerun your testing on PPC and Cascade Lake again? > > Thanks, > Christian Okay ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15415#pullrequestreview-1594677164 From duke at openjdk.org Fri Aug 25 01:51:12 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 25 Aug 2023 01:51:12 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v28] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below. > > > **Arrays.sort performance data using JMH benchmarks for arrays with random data** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | > | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | > | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | > | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | > | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | > | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | > | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | > | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | > | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | > | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | > | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | > | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | > | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | > | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | > | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | > | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** | > | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | > | 
ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | > | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | > | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | > | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | > | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | > | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | > | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | > | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | > | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** | > | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | > | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | > | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | > | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | > | ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 | > | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | > | ArraysSort.longSort | 1000 | 10.449 | 6.239 | 1.7 | > | ArraysSort.longSort | 10000 | 307.074 | 70.284 | **4.4** | > | ArraysSor... Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Move sort and partition intrinsics from Arrays.java to DPQS.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/df17b3e2..f3b5fcf5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=26-27 Stats: 132 lines in 4 files changed: 48 ins; 63 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From duke at openjdk.org Fri Aug 25 01:51:15 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 25 Aug 2023 01:51:15 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v26] In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 18:08:44 GMT, 
Sandhya Viswanathan wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> add parallelSort benchmarking > > src/java.base/share/classes/java/util/Arrays.java line 102: > >> 100: static void arraySort(Class elemType, Object array, long offset, int fromIndex, int toIndex, int end) { >> 101: DualPivotQuicksort.smallArraySort(array, fromIndex, toIndex, end); >> 102: } > > The arraySort and arrayPartition methods could be part of DualPivotQuicksort.java now. > Then we can also remove the extra indirection from arraySort to smallArraySort and from arrayPartition to partitionSinglePivot and partitionDualPivot. Please see arraySort and arrayPartition methods moved to DualPivotQuicksort.java in the latest commit. > src/java.base/share/classes/java/util/DualPivotQuicksort.java line 389: > >> 387: */ >> 388: pivotIndices = new int[] {e1, e5}; >> 389: Arrays.arrayPartition(int.class, a, baseOffset, low, high, pivotIndices, Unsafe.ARRAY_INT_BASE_OFFSET, isDualPivot); > > The Unsafe.ARRAY_INT_BASE_OFFSET parameter after pivotIndices is not needed. pivotIndices array is being passed as a parameter to the partition intrinsic as it is updated in-place with the new pivot indices after partitioning. The Unsafe.ARRAY_INT_BASE_OFFSET is being used in library_call.cpp to get the address of pivotIndices. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1305026970 PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1304970521 From duke at openjdk.org Fri Aug 25 01:57:41 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 25 Aug 2023 01:57:41 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v29] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions.
This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below. > > > **Arrays.sort performance data using JMH benchmarks for arrays with random data** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | > | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | > | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | > | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | > | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | > | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | > | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | > | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | > | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | > | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | > | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | > | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | > | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | > | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | > | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | > | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** | > | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | > | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | > | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | > | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | > | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | > | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | > | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | > | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | > | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | > | ArraysSort.intSort | 100000 | 4130.917 | 
573.956 | **7.2** | > | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | > | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | > | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | > | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | > | ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 | > | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | > | ArraysSort.longSort | 1000 | 10.449 | 6.239 | 1.7 | > | ArraysSort.longSort | 10000 | 307.074 | 70.284 | **4.4** | > | ArraysSor... Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Remove unnecessary import in Arrays.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/f3b5fcf5..e44f11a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=27-28 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From sviswanathan at openjdk.org Fri Aug 25 01:57:41 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 25 Aug 2023 01:57:41 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v26] In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 23:43:36 GMT, Srinivas Vamsi Parasa wrote: >> src/java.base/share/classes/java/util/DualPivotQuicksort.java line 389: >> >>> 387: */ >>> 388: pivotIndices = new int[] {e1, e5}; >>> 389: Arrays.arrayPartition(int.class, a, baseOffset, low, high, pivotIndices, Unsafe.ARRAY_INT_BASE_OFFSET, isDualPivot); >> >> The Unsafe.ARRAY_INT_BASE_OFFSET parameter after pivotIndices is not needed. 
> > pivotIndices array is being passed as a parameter to the partition intrinsic as it is updated in-place with the new pivot indices after partitioning. The Unsafe.ARRAY_INT_BASE_OFFSET is being used in library_call.cpp to get the address of pivotIndices. As pivotIndices is local to DualPivotQuicksort and is always going to be an int array, there are other ways to compute the address in library_call.cpp without having to pass an additional argument. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1305037445 From kvn at openjdk.org Fri Aug 25 02:00:18 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 25 Aug 2023 02:00:18 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v22] In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 06:23:29 GMT, Srinivas Vamsi Parasa wrote: >>> Improvements are nice but it would not pay off if you have big regressions. I can accept 0.9x but 0.4x - 0.8x regressions should be investigated and implementation adjusted to avoid them. >> >> Hi Vladimir, >> >> Thank you for the suggestion! >> Currently, AVX512sort is doing well for Random, Repeated and Shuffle patterns of input data. The regressions are observed for Staggered (Wave) pattern of input data. >> Will investigate the regressions and adjust the implementations to address them. >> >> Thanks, >> Vamsi > >> Improvements are nice but it would not pay off if you have big regressions. I can accept 0.9x but 0.4x - 0.8x regressions should be investigated and implementation adjusted to avoid them. > > Hello Vladimir (@vnkozlov), > > As per your suggestion, the implementation was adjusted to address the regressions caused for STAGGER and REPEATED type of input data patterns. > Please see below the new JMH performance data using the adjusted implementation. > > In the new implementation, we don't call the AVX512 sort intrinsic at the top level (`Arrays.sort()`).
Instead, we take a decomposed or a 'building blocks' approach where we only intrinsify (using AVX512 instructions) the partitioning and small array sort functions used in the current baseline ( `DualPivotQuickSort.sort()` ). Since the current baseline has logic to detect and sort special input patterns like STAGGER, we fallback to the current baseline instead of using AVX512 partitioning and sorting (which works best for RANDOM, REPEATED and SHUFFLE patterns). > > Data below shows `Arrays.sort()` performance comparison between the current **Java baseline (DPQS)** vs. **AVX512 sort** (this PR) using the `ArraysSort.java` JMH [benchmark](https://github.com/openjdk/jdk/pull/13568/files#diff-dee51b13bd1872ff455cec2f29255cfd25014a5dd33dda55a2fc68638c3dd4b2) provided in the PR for [JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)](https://github.com/openjdk/jdk/pull/13568/files#top) ( #13568) > > - The following command line was used to run the benchmarks: ` java -jar $JDK_HOME/build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -jvmArgs "-XX:CompileThreshold=1 -XX:-TieredCompilation" ArraysSort` > - The scores shown are the average time (us/op), thus lower is better. The last column towards the right shows the speedup. > > > | Benchmark | Mode | Size | Baseline DPQS (us/op) | AVX512 partitioning & sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | --- | > | ArraysSort.Double.testSort | RANDOM | 800 | 6.7 | 4.8 | 1.39x | > | ArraysSort.Double.testSort | RANDOM | 7000 | 234.1 | 51.5 | **4.55x** | > | ArraysSort.Double.testSort | RANDOM | 50000 | 2155.9 | 470.0 | **4.59x** | > | ArraysSort.Double.testSort | RANDOM | 300000 | 15076.3 | 3391.3 | **4.45x** | > | ArraysSort.Double.testSort | RANDOM | 2000000 | 116445.5 | 27491.7 | **4.24x** | > | ArraysSort.Double.testSort | REPEATED | 800 | 2.3 | 1.7 | 1.35x | > | ArraysSort.Double.testSort | REPEATED | 7000 | 23.3 | 12.1 | **1.92x** | > | ArraysSort.Double.testSort |... 
@vamsi-parasa Thank you for addressing the performance issues I asked about. First, since you are adding a new public Java API to the Arrays class, this has to be reviewed by Core Libs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1692644591 From kvn at openjdk.org Fri Aug 25 02:00:20 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 25 Aug 2023 02:00:20 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v28] In-Reply-To: References: Message-ID: On Fri, 25 Aug 2023 01:51:12 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random data** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | >> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | >> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | >> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | >> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | >> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | >> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | >> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | >> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | >> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | >> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | >> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | >> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | >> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | >> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | >> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** | >> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | >> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | >> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | >> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | >> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | >> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | >> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | >> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | >> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | >> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** | >> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | >> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | >> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | >> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | >> | 
ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 | >> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | >> | ArraysSort.longSort | 1000 | 10.449 | 6.... > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Move sort and partition intrinsics from Arrays.java to DPQS.java Also by build group. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1692645418 From sviswanathan at openjdk.org Fri Aug 25 02:11:24 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 25 Aug 2023 02:11:24 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v23] In-Reply-To: <1yCo55YXhhweh_3xXTORwBCZNnQjYneqD3xMxV_SbQE=.b1e286fa-a5fd-4236-84d8-255b62f1b627@github.com> References: <1yCo55YXhhweh_3xXTORwBCZNnQjYneqD3xMxV_SbQE=.b1e286fa-a5fd-4236-84d8-255b62f1b627@github.com> Message-ID: On Tue, 22 Aug 2023 23:38:47 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below. 
>> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random data** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | >> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | >> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | >> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | >> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | >> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | >> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | >> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | >> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | >> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | >> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | >> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | >> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | >> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | >> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | >> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** | >> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | >> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | >> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | >> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | >> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | >> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | >> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | >> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | >> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | >> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** | >> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | >> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | >> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | >> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | >> | 
ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 | >> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | >> | ArraysSort.longSort | 1000 | 10.449 | 6.... > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Decomposed DPQS using AVX512 partitioning and AVX512 sort (for small arrays). Works for serial and parallel sort. src/java.base/linux/native/libx86_64/avx512-common-qsort.h line 29: > 27: > 28: // This implementation is based on x86-simd-sort(https://github.com/intel/x86-simd-sort) > 29: #include Is the include iostream needed? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1305047404 From kvn at openjdk.org Fri Aug 25 02:53:17 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 25 Aug 2023 02:53:17 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v29] In-Reply-To: References: Message-ID: On Fri, 25 Aug 2023 01:57:41 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below. 
>> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random data** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | >> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | >> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | >> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | >> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | >> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | >> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | >> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | >> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | >> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | >> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | >> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | >> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | >> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | >> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | >> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** | >> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | >> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | >> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | >> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | >> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | >> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | >> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | >> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | >> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | >> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** | >> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | >> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | >> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | >> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | >> | 
ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 | >> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | >> | ArraysSort.longSort | 1000 | 10.449 | 6.... > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Remove unnecessary import in Arrays.java Second. We already have a precedent for generating a separate dynamic library (and loading it into the JVM) for intrinsics: [8265783](https://bugs.openjdk.org/browse/JDK-8265783). But I consider that an exception. Having a second such library gives me concerns, especially with C++ code: we can't control what vector code a particular version of the C++ compiler produces. Are `_mm512_set1_*` part of the C++ standard, or are they a dependency on another tool? In the 8265783 case we had assembler code, which is why we accepted it after some discussion. And I don't see (I may be missing it somewhere) any check in the JVM that the CPU on which you use this library code actually supports AVX512. Is it possible to identify the hottest code in the Java implementation and look at why C2 does not produce good vectorized code for it? Even then you may find that the performance comes from some core code which you can then implement in the VM in the stub generator. We had a similar issue back with CRC32. What we ended up doing: we looked at the assembler code generated by the C compiler and duplicated it in the stub generator.
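The "building blocks" split discussed in this thread can be illustrated in plain Java. The two blocks are a partition step that reports the new pivot positions by overwriting the `int[]` it was given, and a small-array sort. This is a minimal sketch that uses classic dual-pivot partitioning and insertion sort. It is neither the JDK's `DualPivotQuicksort` nor the AVX512 intrinsic. The names `PartitionSketch`, `partitionDualPivot`, `insertionSort`, and the threshold `SMALL` are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

public class PartitionSketch {
    static final int SMALL = 16; // hypothetical small-array threshold

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    // Small-array building block: insertion sort of a[low..high).
    static void insertionSort(int[] a, int low, int high) {
        for (int i = low + 1; i < high; i++) {
            int x = a[i];
            int j = i - 1;
            while (j >= low && a[j] > x) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = x;
        }
    }

    // Partition building block. On entry pivotIndices holds two distinct
    // indices in [low, high). On exit it holds the final pivot positions
    // lt < gt with a[low..lt) < a[lt] <= a(lt..gt) <= a[gt] < a(gt..high).
    static void partitionDualPivot(int[] a, int low, int high, int[] pivotIndices) {
        int end = high - 1;
        int p1 = pivotIndices[0], p2 = pivotIndices[1];
        swap(a, p1, low);          // first pivot to the left end
        if (p2 == low) p2 = p1;    // it was just moved by the previous swap
        swap(a, p2, end);          // second pivot to the right end
        if (a[low] > a[end]) swap(a, low, end);
        int pv1 = a[low], pv2 = a[end];

        int lt = low + 1, gt = end - 1, k = lt;
        while (k <= gt) {
            if (a[k] < pv1) {
                swap(a, k, lt++);
            } else if (a[k] > pv2) {
                while (a[gt] > pv2 && k < gt) gt--;
                swap(a, k, gt--);
                if (a[k] < pv1) swap(a, k, lt++);
            }
            k++;
        }
        swap(a, low, --lt);   // place the pivots between their regions
        swap(a, end, ++gt);
        pivotIndices[0] = lt; // report the new positions in place
        pivotIndices[1] = gt;
    }

    static void sort(int[] a, int low, int high) {
        if (high - low <= SMALL) {
            insertionSort(a, low, high);
            return;
        }
        int third = (high - low) / 3;
        int[] pivots = {low + third, high - 1 - third}; // naive pivot choice
        partitionDualPivot(a, low, high, pivots);
        sort(a, low, pivots[0]);
        if (a[pivots[0]] < a[pivots[1]]) { // equal pivots: middle is uniform
            sort(a, pivots[0] + 1, pivots[1]);
        }
        sort(a, pivots[1] + 1, high);
    }

    public static void main(String[] args) {
        int[] a = new Random(42).ints(1000, 0, 100).toArray();
        int[] expected = a.clone();
        Arrays.sort(expected);
        sort(a, 0, a.length);
        System.out.println(Arrays.equals(a, expected));
    }
}
```

The pivot choice is deliberately naive, and there is none of the pattern detection the baseline performs for inputs such as STAGGER. The sketch only shows the contract: the caller reads the pivot positions back from the array it passed in.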
------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1692680714 From duke at openjdk.org Fri Aug 25 03:09:29 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 25 Aug 2023 03:09:29 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v23] In-Reply-To: References: <1yCo55YXhhweh_3xXTORwBCZNnQjYneqD3xMxV_SbQE=.b1e286fa-a5fd-4236-84d8-255b62f1b627@github.com> Message-ID: On Fri, 25 Aug 2023 02:07:54 GMT, Sandhya Viswanathan wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Decomposed DPQS using AVX512 partitioning and AVX512 sort (for small arrays). Works for serial and parallel sort. > > src/java.base/linux/native/libx86_64/avx512-common-qsort.h line 29: > >> 27: >> 28: // This implementation is based on x86-simd-sort(https://github.com/intel/x86-simd-sort) >> 29: #include > > Is the include iostream needed? That was from an earlier commit and was removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1305080141 From jiefu at openjdk.org Fri Aug 25 05:02:19 2023 From: jiefu at openjdk.org (Jie Fu) Date: Fri, 25 Aug 2023 05:02:19 GMT Subject: RFR: 8314951: VM build without C2 still fails after JDK-8313530 In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 14:32:14 GMT, Doug Simon wrote: >> JDK-8313530 fixed the release VM build but the debug VM build would still fail. >> This patch fix the debug VM build failure. >> Thanks. > > Marked as reviewed by dnsimon (Reviewer). Thanks @dougxc and @vnkozlov for your review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15419#issuecomment-1692758604 From jiefu at openjdk.org Fri Aug 25 05:02:20 2023 From: jiefu at openjdk.org (Jie Fu) Date: Fri, 25 Aug 2023 05:02:20 GMT Subject: Integrated: 8314951: VM build without C2 still fails after JDK-8313530 In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 14:16:38 GMT, Jie Fu wrote: > JDK-8313530 fixed the release VM build but the debug VM build would still fail. > This patch fix the debug VM build failure. > Thanks. This pull request has now been integrated. Changeset: d0240591 Author: Jie Fu URL: https://git.openjdk.org/jdk/commit/d02405917406a355a11741bb278ea58c3a4642fb Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod 8314951: VM build without C2 still fails after JDK-8313530 Reviewed-by: dnsimon, kvn ------------- PR: https://git.openjdk.org/jdk/pull/15419 From qamai at openjdk.org Fri Aug 25 06:24:28 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 25 Aug 2023 06:24:28 GMT Subject: RFR: 8312547: Max/Min nodes Value implementation could be improved [v4] In-Reply-To: References: Message-ID: <4uWbVFySPC093PrDVGRD4YsHzdlZETAXUeYj6HTnLeI=.ebe48c98-22c8-43f5-a4d0-46a59dcb3154@github.com> On Fri, 25 Aug 2023 01:33:33 GMT, Vladimir Kozlov wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Merge branch 'master' into values >> - address review >> - fix min/maxfp nodes >> - AddNode::Value should not return early >> - AddNode::Value should not return early > > I am fine with fix and cleaning you did. @vnkozlov Thanks a lot for your reviews, I will integrate the change. 
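A simplified interval model shows why returning early when one input is bottom can cost precision in nodes such as `MinI`/`MaxI`. This is a sketch that assumes, as the PR description suggests, that "bottom" for an int node means the full `int` range. It loosely stands in for C2's integer range types and does not reproduce HotSpot code. `MinMaxRangeSketch` and `IntRange` are hypothetical names. Suppose `x` can be any `int` and `y` is known to lie in `[0, 5]`. Then `min(x, y)` is still bounded above by 5, but a shortcut that returns the full range as soon as either input is the full range discards that bound.

```java
public class MinMaxRangeSketch {

    // Hypothetical closed interval [lo, hi] of possible int values,
    // loosely modeling a C2 integer type.
    record IntRange(int lo, int hi) {
        static final IntRange FULL = new IntRange(Integer.MIN_VALUE, Integer.MAX_VALUE);

        boolean isFull() {
            return lo == Integer.MIN_VALUE && hi == Integer.MAX_VALUE;
        }
    }

    // min(x, y) is no larger than either upper bound and no smaller
    // than the smaller lower bound.
    static IntRange min(IntRange x, IntRange y) {
        return new IntRange(Math.min(x.lo(), y.lo()), Math.min(x.hi(), y.hi()));
    }

    static IntRange max(IntRange x, IntRange y) {
        return new IntRange(Math.max(x.lo(), y.lo()), Math.max(x.hi(), y.hi()));
    }

    // Models the kind of early return the PR removes: give up as soon as
    // either input is the full range.
    static IntRange minWithEarlyReturn(IntRange x, IntRange y) {
        if (x.isFull() || y.isFull()) {
            return IntRange.FULL;
        }
        return min(x, y);
    }

    public static void main(String[] args) {
        IntRange y = new IntRange(0, 5);
        System.out.println("early return: " + minWithEarlyReturn(IntRange.FULL, y));
        System.out.println("min:          " + min(IntRange.FULL, y));
        System.out.println("max:          " + max(IntRange.FULL, y));
    }
}
```

The early-return variant keeps the full range. Computing the bounds instead narrows `min` to `[Integer.MIN_VALUE, 5]` and `max` to `[0, Integer.MAX_VALUE]`.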
------------- PR Comment: https://git.openjdk.org/jdk/pull/15021#issuecomment-1692829176 From chagedorn at openjdk.org Fri Aug 25 06:53:09 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 25 Aug 2023 06:53:09 GMT Subject: RFR: 8314513: [IR Framework] Some internal IR Framework tests are failing after JDK-8310308 on PPC and Cascade Lake In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 11:51:11 GMT, Christian Hagedorn wrote: > This patch fixes some internal IR framework failures after [JDK-8310308](https://bugs.openjdk.org/browse/JDK-8310308): > - `testlibrary_tests/ir_framework/tests/TestBadFormat.java` on Linux ppc64le: > - `applyIfCPUFeature` clauses are false and the rule is not run. We will therefore not hit the format violations which the test expects to find. The fix here is to remove the `applyIfCPUFeature` constraints as the test is only interested in properly reporting format violations. > - `testlibrary_tests/ir_framework/examples/IRExample.java` on Cascade Lake x86_64: > - On Cascade Lake, `failOn` constraints need the same "always true" handling as `counts` constraints. This was missed in JDK-8310308. I've added the same `try-catch` handling as in `RawCountsConstraint::parse()`: > https://github.com/openjdk/jdk/blob/97b94cb1cdeba00f4bba7326a300c0336950f3ec/test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java#L97-L104 > > Thanks to @MBaesken and @TheRealMDoerr for reporting this and helping with some pre-PR testing. Would you like to rerun your testing on PPC and Cascade Lake again? > > Thanks, > Christian Thanks Vladimir for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15415#issuecomment-1692857456 From rcastanedalo at openjdk.org Fri Aug 25 07:18:11 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 25 Aug 2023 07:18:11 GMT Subject: RFR: 8312749: Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) [v2] In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 20:39:45 GMT, Albert Mingkun Yang wrote: > Sounds reasonable. Thanks, Albert! I created [JDK-8314994](https://bugs.openjdk.org/browse/JDK-8314994) to capture your suggestions; please feel free to edit or extend the description if needed to better reflect your idea. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15288#issuecomment-1692883168 From rcastanedalo at openjdk.org Fri Aug 25 07:21:28 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 25 Aug 2023 07:21:28 GMT Subject: Integrated: 8312749: Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) In-Reply-To: References: Message-ID: On Tue, 15 Aug 2023 12:43:56 GMT, Roberto Castañeda Lozano wrote: > This changeset ensures that the array copy stub underlying the intrinsic implementation of `Object.clone` only copies its (double-word aligned) payload, excluding the remaining object alignment padding words, when a non-default `ObjectAlignmentInBytes` value is used. This prevents the specialized ZGC stubs for `Object[]` array copy from processing undefined object alignment padding words as valid object pointers. For further details about the specific failure, see [initial analysis](https://bugs.openjdk.org/browse/JDK-8312749?focusedCommentId=14600658&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14600658) by Erik Österlund and Stefan Karlsson and comments in the regression test included in this changeset.
> > As a side-benefit, the changeset simplifies the array size computation logic in `GraphKit::new_array()` by decoupling computation of header size and alignment padding size. > > #### Testing > > ##### Functionality > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64) > - tier4-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; ZGC-specific tests only) > - tier6-9 (linux-x64; ZGC-specific tests only) > - tier1-3, and a few custom examples, applying [JDK-8139457](https://github.com/openjdk/jdk/pull/11044) (under review) on top of this changeset > > ##### Performance > > Tested performance on the following set of OpenJDK micro-benchmarks, on linux-x64 (for both G1 and ZGC, using different ObjectAlignmentInBytes values): > > - `openjdk.bench.java.lang.ArrayClone.byteClone` > - `openjdk.bench.java.lang.ArrayClone.intClone` > - `openjdk.bench.java.lang.ArrayFiddle.simple_clone` > - `openjdk.bench.java.lang.Clone.cloneLarge` > - `openjdk.bench.java.lang.Clone.cloneThreeDifferent` > > No significant regression was observed. This pull request has now been integrated. 
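The effect of this fix can be illustrated with some size arithmetic. The sketch below is hypothetical (the class and method names are not HotSpot code) and assumes a 64-bit VM with compressed oops and compressed class pointers, where an `Object[]` header is 16 bytes and each element is a 4-byte compressed reference, running with `-XX:ObjectAlignmentInBytes=64`. For a three-element array, the payload rounded to a double word is 32 bytes, but the object occupies 64 bytes, so its last four words are alignment padding that the copy stub must not treat as object pointers.

```java
// Hypothetical size arithmetic, not HotSpot code: compares the double-word aligned
// payload that the clone/array-copy stub should copy with the full allocated size.
class ClonePayloadSketch {

    // Rounds size up to a multiple of alignment (alignment must be a power of two).
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & -alignment;
    }

    // Bytes covered by the header plus the elements, rounded to a double word (8 bytes).
    static long payloadBytes(long headerBytes, long elementBytes) {
        return alignUp(headerBytes + elementBytes, 8);
    }

    // Bytes the object occupies in the heap, including object alignment padding.
    static long allocatedBytes(long headerBytes, long elementBytes, long objectAlignmentInBytes) {
        return alignUp(headerBytes + elementBytes, objectAlignmentInBytes);
    }

    public static void main(String[] args) {
        long header = 16;      // assumed Object[] header with compressed class pointers
        long elements = 3 * 4; // three compressed references
        long payload = payloadBytes(header, elements);
        long allocated = allocatedBytes(header, elements, 64);
        System.out.println("copy " + payload / 8 + " words, skip "
                + (allocated - payload) / 8 + " padding words");
    }
}
```

Running `main` prints `copy 4 words, skip 4 padding words`. With the default 8-byte alignment, the two sizes coincide, which is consistent with the failure only appearing for non-default `ObjectAlignmentInBytes` values.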
Changeset: 002b5948 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/002b59487094f98d9805997b5d1122c1a411b391 Stats: 115 lines in 4 files changed: 89 ins; 9 del; 17 mod 8312749: Generational ZGC: Tests crash with assert(index == 0 || is_power_of_2(index)) Co-authored-by: Stefan Karlsson Co-authored-by: Erik Österlund Reviewed-by: thartmann, ayang, kvn ------------- PR: https://git.openjdk.org/jdk/pull/15288 From mdoerr at openjdk.org Fri Aug 25 07:39:09 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 25 Aug 2023 07:39:09 GMT Subject: RFR: 8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API In-Reply-To: <4zrTfeu8tY86-yVJXBoqQ-gsWxGvdogZLTxfScBR7wU=.4f27a1f5-8334-497b-8a7d-756c95ad33ea@github.com> References: <4zrTfeu8tY86-yVJXBoqQ-gsWxGvdogZLTxfScBR7wU=.4f27a1f5-8334-497b-8a7d-756c95ad33ea@github.com> Message-ID: <3TcrDO2J1wAFDG5UaHKeh6tSKSKrR4ZSDC6n5-7pT20=.a53854a3-835d-4738-b6d3-54fdf13f70e9@github.com> On Thu, 24 Aug 2023 23:33:23 GMT, Maurizio Cimadamore wrote: >> I've found a way to solve the remaining FFI problem on linux PPC64 Big Endian. Large structs (>8 Bytes) which are passed in registers or on stack require shifting the Bytes in the last slot if the size is not a multiple of 8. This PR adds the required functionality to the Java code. >> >> Please review and provide feedback. There may be better ways to implement it. I just found one which works and makes the tests pass: >> >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/jdk/java/foreign 88 88 0 0 >> >> >> Note: This PR should be considered as preparation work for AIX which also uses ABIv1. > > src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/ABIv1CallArranger.java line 33: > >> 31: * PPC64 CallArranger specialized for ABI v1.
>> 32: */ >> 33: public class ABIv1CallArranger extends CallArranger { > > Wouldn't it be more natural for CallArranger to have an abstract method (or even a kind() accessor for the different kinds of ABI supported) and then have these specialized subclasses return the correct kind? It seems to me that setting the `useXYZAbi` flag using an instanceof test is a little dirty. I had something like that, but another reviewer didn't like it, either. Originally, I had thought that the v1 and v2 CallArrangers would get more content, but they're still empty. Would it be better to remove these special CallArrangers and distinguish in the base class? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15417#discussion_r1305300539 From mdoerr at openjdk.org Fri Aug 25 07:51:09 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 25 Aug 2023 07:51:09 GMT Subject: RFR: 8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API In-Reply-To: References: <-H8TLXCvHxTIMGSl0vfnuPByyVX1olhlQzNtKut6aa8=.1094aa6b-60c9-430c-99f2-9af72f994191@github.com> Message-ID: <9eDbDxSzdUMYrRDZh8XMiSd2Np4JrwAy2l6jSyykEVA=.1d009a67-a8e9-4868-8244-d244b8653729@github.com> On Thu, 24 Aug 2023 23:58:35 GMT, Maurizio Cimadamore wrote: >> Maybe I'm starting to see it - it's not a special rule, as much as it is a consequence of the endianness. E.g. if you have a struct that is 64 + 32 bits, you can store the first 64 bits as a long. Then, there's an issue as we have to fill another long, but we have only 32 bits of value. Is it the problem that if we just copy the value into the long word "as is" it will be stored in the "wrong" 32 bits? So the shift takes care of that, I guess? > > If my assumption above is correct, then maybe another way to solve the problem would be, instead of adding a new shift binding, to generalize the VM store binding we have so that it allows writing a smaller value into a bigger storage, with an offset. Correct?
The ABI says: "An aggregate or union smaller than one doubleword in size is padded so that it appears in the least significant bits of the doubleword. All others are padded, if necessary, at their tail." [https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#PARAM-PASS]. I have written examples which pass 9 and 15 Bytes. In the first case, we need to get 0x0001020304050607 in the first argument and 0x08XXXXXXXXXXXXXX into the second argument (X is "don't care"). Shift amount is 7. In the second case, we need to get 0x0001020304050607 in the first argument and 0x08090a0b0c0d0eXX into the second argument. Shift amount is 1. In other words, we need shift amounts between 1 and 7. Stack slots and registers are always 64 bit on PPC64. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15417#discussion_r1305313810 From mdoerr at openjdk.org Fri Aug 25 07:57:09 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 25 Aug 2023 07:57:09 GMT Subject: RFR: 8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API In-Reply-To: References: Message-ID: On Fri, 25 Aug 2023 00:10:11 GMT, Maurizio Cimadamore wrote: >> I've found a way to solve the remaining FFI problem on linux PPC64 Big Endian. Large structs (>8 Bytes) which are passed in registers or on stack require shifting the Bytes in the last slot if the size is not a multiple of 8. This PR adds the required functionality to the Java code. >> >> Please review and provide feedback. There may be better ways to implement it. I just found one which works and makes the tests pass: >> >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/jdk/java/foreign 88 88 0 0 >> >> >> Note: This PR should be considered as preparation work for AIX which also uses ABIv1. 
> > src/java.base/share/classes/jdk/internal/foreign/abi/Binding.java line 717: > >> 715: public void interpret(Deque stack, StoreFunc storeFunc, >> 716: LoadFunc loadFunc, SegmentAllocator allocator) { >> 717: if (shiftAmount > 0) { > > Why do we assume we can only deal with ints or longs? I have inserted casts into `public Binding.Builder shiftLeft(int shiftAmount, Class type)` (similar to other bindings). The VM handles integral types smaller than `int` like `int` and uses 4 Bytes for arithmetic operations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15417#discussion_r1305321446 From mdoerr at openjdk.org Fri Aug 25 08:04:10 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 25 Aug 2023 08:04:10 GMT Subject: RFR: 8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API In-Reply-To: <-H8TLXCvHxTIMGSl0vfnuPByyVX1olhlQzNtKut6aa8=.1094aa6b-60c9-430c-99f2-9af72f994191@github.com> References: <-H8TLXCvHxTIMGSl0vfnuPByyVX1olhlQzNtKut6aa8=.1094aa6b-60c9-430c-99f2-9af72f994191@github.com> Message-ID: On Thu, 24 Aug 2023 23:38:42 GMT, Maurizio Cimadamore wrote: >> I've found a way to solve the remaining FFI problem on linux PPC64 Big Endian. Large structs (>8 Bytes) which are passed in registers or on stack require shifting the Bytes in the last slot if the size is not a multiple of 8. This PR adds the required functionality to the Java code. >> >> Please review and provide feedback. There may be better ways to implement it. I just found one which works and makes the tests pass: >> >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/jdk/java/foreign 88 88 0 0 >> >> >> Note: This PR should be considered as preparation work for AIX which also uses ABIv1. > > Overall these changes look good - as commented I'd like to learn a bit more of the underlying ABI, to get a sense of whether adding a new binding is ok. 
But overall it's great to see support for a big-endian ABI - apart from the linker, I am pleased to see that you did not encounter too many issues on the memory side of the FFM API. @mcimadamore: Thanks for your feedback! Jorn and I had already resolved the other issues when we worked on the Linux little-endian part. It already contains some ABIv1 code. Note that we already have one big-endian platform: s390. But that one doesn't pass structs >8 Bytes in registers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15417#issuecomment-1692938709 From duke at openjdk.org Fri Aug 25 08:46:17 2023 From: duke at openjdk.org (emmyyin) Date: Fri, 25 Aug 2023 08:46:17 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v5] In-Reply-To: References: <7RAXRCktzo9NArTyV_NBwxxLl7zgCaxurRLwwwKzeAM=.91d9abbe-12e0-43af-8e5c-b13052629a56@github.com> Message-ID: On Thu, 24 Aug 2023 10:52:26 GMT, Christian Hagedorn wrote: >> Exactly, if `n.vertex == null` we want to break the loop. There are two cases we need to consider when removing the dummy nodes: 1) there is a long chain of dummy nodes between node u and v, and 2) the edge is part of an edge concentration of multiple edges. In case 1) we just traverse the edge with all the dummy nodes and remove them as we go until we hit node u (`n.vertex != null`). In case 2) we traverse along the edge until we find the anchor node (which is a dummy node with one ingoing edge and multiple outgoing edges), and break the loop when we reach the anchor node. The `found` variable is to ensure the dummy node is actually connected to something. Not sure if that part is actually needed. > > Thanks for explaining it in more detail. We probably had a misunderstanding here, though.
Toby and I were referring to this `n.vertex == null` here: > > if (n.vertex == null && n.succs.size() <= 1 && n.preds.size() <= 1) > > and not the one in > > while (n.vertex == null && found) > > From your explanation it makes sense to keep the one in the `while` but the other one in the `if` always seems to be true and could thus be removed. Oh okay, thanks for clarifying, that makes sense! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1305376237 From duke at openjdk.org Fri Aug 25 08:49:16 2023 From: duke at openjdk.org (emmyyin) Date: Fri, 25 Aug 2023 08:49:16 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v7] In-Reply-To: <92KNVrJzmYZRK8wdxR487MYQmKotzFwAA0BucY9ERfw=.f1ced558-d440-4c66-af7f-a40a8a09ed03@github.com> References: <92KNVrJzmYZRK8wdxR487MYQmKotzFwAA0BucY9ERfw=.f1ced558-d440-4c66-af7f-a40a8a09ed03@github.com> Message-ID: <2LHjEzRXIFOCuCOYgeJxC0av3uDWSewmhk9aOSlVYeE=.e0da5b8d-3d66-4fe1-a519-c8c137a22c27@github.com> On Thu, 24 Aug 2023 10:49:16 GMT, Christian Hagedorn wrote: >> Yes, since `relativeTo` and `relativeFrom` refer to the ports on the node, which are irrelevant for the dummy nodes. Could definitely be done differently, but this is how it is in `HierarchicalLayoutManager` and I thought it would be better to be consistent across the layout managers. > > Okay, thanks for the explanation. The code in `HierarchicalLayoutManager` looks very similar. So, you could also change `relativeTo` and `relativeFrom` to zero there. Or even better share the code somehow (if I see that correctly, the only difference is how to insert the node - `nodes.add(n)` vs. `insertNode(n, layer)`). @robcasloz has created a suggestion list of how to improve `HierarchicalLayoutManager`. Could we perhaps add this to that list instead?
I think there must have been some reason to set `n.width/2` (maybe just in case you want to make the dummy nodes visible with a larger width), so it could be worth working through it some more. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1305379806 From duke at openjdk.org Fri Aug 25 09:00:05 2023 From: duke at openjdk.org (emmyyin) Date: Fri, 25 Aug 2023 09:00:05 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v10] In-Reply-To: References: Message-ID: <98mqhD2GfkWXqbg8zuJ-I8uDKx0mOIM83jLqC1LlDJ4=.da37c5a7-08cf-4a1f-a0a4-cf6ac6b3e3af@github.com> > ### Purpose > > IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, it makes each layout more readable. But on the other hand, the layout for two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can damage the internal understanding of the graph that the viewer has obtained. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. > > The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes".
> > The difference between the algorithms is illustrated in the following video: > > > https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 > > > This work is the result of my Master's thesis which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). > > > ### Implementation > > The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager` has been added which holds the core algorithm. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. > > If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions on. > > The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. > > Since `HierarchicalStableLayoutManager` tries to preserve the node positions, the layouts might get unreadable after a fe...
emmyyin has updated the pull request incrementally with one additional commit since the last revision: removing stuff ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14349/files - new: https://git.openjdk.org/jdk/pull/14349/files/c0eec085..1908baa5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=08-09 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14349/head:pull/14349 PR: https://git.openjdk.org/jdk/pull/14349 From duke at openjdk.org Fri Aug 25 09:00:08 2023 From: duke at openjdk.org (emmyyin) Date: Fri, 25 Aug 2023 09:00:08 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v7] In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 11:04:39 GMT, Christian Hagedorn wrote: >> emmyyin has updated the pull request incrementally with one additional commit since the last revision: >> >> fixing trailing ws > > src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 189: > >> 187: assert e.to.layer == n.layer + 1; >> 188: } else { >> 189: n.succs.remove(e); > > This removal and the one in the next loop seem unexpected being in a sanity check method where the expectation would be to only query and not modify. Do we really need these removals for the correctness of the algorithm? Yes, it breaks often if removed. 
It's mostly sanity checking but also ensuring the state of the graph is correct by fixing weird edge connections ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1305387010 From mcimadamore at openjdk.org Fri Aug 25 09:04:10 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 25 Aug 2023 09:04:10 GMT Subject: RFR: 8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API In-Reply-To: <3TcrDO2J1wAFDG5UaHKeh6tSKSKrR4ZSDC6n5-7pT20=.a53854a3-835d-4738-b6d3-54fdf13f70e9@github.com> References: <4zrTfeu8tY86-yVJXBoqQ-gsWxGvdogZLTxfScBR7wU=.4f27a1f5-8334-497b-8a7d-756c95ad33ea@github.com> <3TcrDO2J1wAFDG5UaHKeh6tSKSKrR4ZSDC6n5-7pT20=.a53854a3-835d-4738-b6d3-54fdf13f70e9@github.com> Message-ID: On Fri, 25 Aug 2023 07:36:47 GMT, Martin Doerr wrote: >> src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/ABIv1CallArranger.java line 33: >> >>> 31: * PPC64 CallArranger specialized for ABI v1. >>> 32: */ >>> 33: public class ABIv1CallArranger extends CallArranger { >> >> Wouldn't it be more natural for CallArranger to have an abstract method (or even a kind() accessor for the different kinds of ABI supported) and then have these specialized subclasses return the correct kind? It seems to me that setting the `useXYZAbi` flag using an instanceof test is a little dirty. > > I had something like that, but another reviewer didn't like it, either. Originally, I had thought that the v1 and v2 CallArrangers would get more content, but they're still empty. Would it be better to remove these special CallArrangers and distinguish in the base class? 
It seems to me that what you are doing is similar to what was done for aarch64, which was dealt with using very simple subclasses: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/foreign/abi/aarch64/linux/LinuxAArch64CallArranger.java https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/foreign/abi/aarch64/macos/MacOsAArch64CallArranger.java https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/foreign/abi/aarch64/windows/WindowsAArch64CallArranger.java In your case there's less difference, but I think we should follow the same idiom for both. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15417#discussion_r1305396592 From mcimadamore at openjdk.org Fri Aug 25 09:27:09 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 25 Aug 2023 09:27:09 GMT Subject: RFR: 8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API In-Reply-To: <9eDbDxSzdUMYrRDZh8XMiSd2Np4JrwAy2l6jSyykEVA=.1d009a67-a8e9-4868-8244-d244b8653729@github.com> References: <-H8TLXCvHxTIMGSl0vfnuPByyVX1olhlQzNtKut6aa8=.1094aa6b-60c9-430c-99f2-9af72f994191@github.com> <9eDbDxSzdUMYrRDZh8XMiSd2Np4JrwAy2l6jSyykEVA=.1d009a67-a8e9-4868-8244-d244b8653729@github.com> Message-ID: On Fri, 25 Aug 2023 07:48:19 GMT, Martin Doerr wrote: >> If my assumption above is correct, then maybe another way to solve the problem, would be to, instead of adding a new shift binding, to generalize the VM store binding we have to allow writing a smaller value into a bigger storage, with an offset. Correct? > > The ABI says: "An aggregate or union smaller than one doubleword in size is padded so that it appears in the least significant bits of the doubleword. All others are padded, if necessary, at their tail." [https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#PARAM-PASS]. > I have written examples which pass 9 and 15 Bytes. 
> In the first case, we need to get 0x0001020304050607 in the first argument and 0x08XXXXXXXXXXXXXX into the second argument (X is "don't care"). Shift amount is 7. > In the second case, we need to get 0x0001020304050607 in the first argument and 0x08090a0b0c0d0eXX into the second argument. Shift amount is 1. > In other words, we need shift amounts between 1 and 7. Stack slots and registers are always 64 bit on PPC64. Got it - I found these representations: https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi-1.7.html#BYTEORDER Very helpful. So you have e.g. a `short` value (loaded from somewhere) and you have to store it on a double-word. Now, if you just stored it at offset 0, you will write the bits 0-15, which are the "most" significant bits in big-endian representation. So, it's backwards. I believe FFM will take care of endianness, so that the bytes 0-7 and 8-15 will be "swapped" when writing into the double-word (right?) but their base offset (0) is still off, as they should really start at offset 48. Hence the shift. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15417#discussion_r1305420316 From mcimadamore at openjdk.org Fri Aug 25 09:33:09 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 25 Aug 2023 09:33:09 GMT Subject: RFR: 8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API In-Reply-To: References: Message-ID: On Fri, 25 Aug 2023 07:54:51 GMT, Martin Doerr wrote: >> src/java.base/share/classes/jdk/internal/foreign/abi/Binding.java line 717: >> >>> 715: public void interpret(Deque stack, StoreFunc storeFunc, >>> 716: LoadFunc loadFunc, SegmentAllocator allocator) { >>> 717: if (shiftAmount > 0) { >> >> Why do we assume we can only deal with ints or longs? > > I have inserted casts into `public Binding.Builder shiftLeft(int shiftAmount, Class type)` (similar to other bindings). 
The VM handles integral types smaller than `int` like `int` and uses 4 Bytes for arithmetic operations. Ah, I see that now - it's done in the binding "builder". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15417#discussion_r1305426640 From chagedorn at openjdk.org Fri Aug 25 09:55:19 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 25 Aug 2023 09:55:19 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v7] In-Reply-To: References: Message-ID: On Fri, 25 Aug 2023 08:53:25 GMT, emmyyin wrote: >> src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 189: >> >>> 187: assert e.to.layer == n.layer + 1; >>> 188: } else { >>> 189: n.succs.remove(e); >> >> This removal and the one in the next loop seem unexpected being in a sanity check method where the expectation would be to only query and not modify. Do we really need these removals for the correctness of the algorithm? > > Yes, it breaks often if removed. It's mostly sanity checking but also ensuring the state of the graph is correct by fixing weird edge connections. As mentioned below, this method is still a bottleneck and we spend quite some time inside it. Overall, it accounts for around 90% of the total time to open the graph in the example mentioned earlier. Is there another way to fix these breakages instead of looping over all nodes and edges here?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1305452104 From chagedorn at openjdk.org Fri Aug 25 09:55:21 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 25 Aug 2023 09:55:21 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v7] In-Reply-To: <2LHjEzRXIFOCuCOYgeJxC0av3uDWSewmhk9aOSlVYeE=.e0da5b8d-3d66-4fe1-a519-c8c137a22c27@github.com> References: <92KNVrJzmYZRK8wdxR487MYQmKotzFwAA0BucY9ERfw=.f1ced558-d440-4c66-af7f-a40a8a09ed03@github.com> <2LHjEzRXIFOCuCOYgeJxC0av3uDWSewmhk9aOSlVYeE=.e0da5b8d-3d66-4fe1-a519-c8c137a22c27@github.com> Message-ID: On Fri, 25 Aug 2023 08:46:39 GMT, emmyyin wrote: >> Okay, thanks for the explanation. The code in `HierarchicalLayoutManager` looks very similar. So, you could also change `relativeTo` and `relativeFrom` to zero there. Or even better share the code somehow (if I see that correctly, the only difference is how to insert the node - `nodes.add(n)` vs. `insertNode(n, layer)`). > > @robcasloz has created a suggestion list of how to improve `HierarchicalLayoutManager`. Could we perhaps add this to that list instead? I think there must have been some reason to set `n.width/2` (maybe just in case you want to make the dummy nodes visible with a larger width) so it could be worth to work it through some more I'm fine with moving this and my other suggested clean-ups to this list. However, even after removing `sanityCheckNodesAndLayerNodes()` and additionally removing the `assert` checks (which are currently still in the code), the example graph mentioned earlier still needs around 18s to load on my machine. The bottleneck is now `sanityCheckEdges()`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1305451093 From mdoerr at openjdk.org Fri Aug 25 10:41:04 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 25 Aug 2023 10:41:04 GMT Subject: RFR: 8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API [v2] In-Reply-To: References: Message-ID: > I've found a way to solve the remaining FFI problem on linux PPC64 Big Endian. Large structs (>8 Bytes) which are passed in registers or on stack require shifting the Bytes in the last slot if the size is not a multiple of 8. This PR adds the required functionality to the Java code. > > Please review and provide feedback. There may be better ways to implement it. I just found one which works and makes the tests pass: > > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/jdk/java/foreign 88 88 0 0 > > > Note: This PR should be considered as preparation work for AIX which also uses ABIv1. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Implement ABI version selection by virtual method instead of instanceof check. 
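The idiom adopted in this revision, as suggested in the review, replaces an `instanceof` test in shared code with a virtual method that each ABI-specific subclass overrides. The following is a minimal, self-contained sketch of that pattern. The class names follow the discussion, but the method name `useABIv2()` and the `abiName()` helper are assumptions for illustration, not the actual JDK code.

```java
// Hypothetical sketch of selecting the ABI variant via a virtual method
// instead of `this instanceof ABIv1CallArranger` checks in the base class.
class AbiSelectionSketch {

    abstract static class CallArranger {
        // Each subclass states which ABI variant it implements (assumed name).
        protected abstract boolean useABIv2();

        // Stand-in for shared arranging logic that branches on the ABI variant.
        String abiName() {
            return useABIv2() ? "ABIv2" : "ABIv1";
        }
    }

    static final class ABIv1CallArranger extends CallArranger {
        @Override
        protected boolean useABIv2() { return false; }
    }

    static final class ABIv2CallArranger extends CallArranger {
        @Override
        protected boolean useABIv2() { return true; }
    }

    public static void main(String[] args) {
        System.out.println(new ABIv1CallArranger().abiName()); // ABIv1
        System.out.println(new ABIv2CallArranger().abiName()); // ABIv2
    }
}
```

Compared with an `instanceof` test, the base class no longer needs to know its subclasses, and a further ABI variant only has to override the method. This mirrors the small per-platform subclasses used for the AArch64 linkers.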
------------- Changes: - all: https://git.openjdk.org/jdk/pull/15417/files - new: https://git.openjdk.org/jdk/pull/15417/files/5d7b0e1d..50144b14 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15417&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15417&range=00-01 Stats: 16 lines in 3 files changed: 13 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15417.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15417/head:pull/15417 PR: https://git.openjdk.org/jdk/pull/15417 From mdoerr at openjdk.org Fri Aug 25 10:41:05 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 25 Aug 2023 10:41:05 GMT Subject: RFR: 8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API [v2] In-Reply-To: References: <4zrTfeu8tY86-yVJXBoqQ-gsWxGvdogZLTxfScBR7wU=.4f27a1f5-8334-497b-8a7d-756c95ad33ea@github.com> <3TcrDO2J1wAFDG5UaHKeh6tSKSKrR4ZSDC6n5-7pT20=.a53854a3-835d-4738-b6d3-54fdf13f70e9@github.com> Message-ID: On Fri, 25 Aug 2023 09:01:43 GMT, Maurizio Cimadamore wrote: >> I had something like that, but another reviewer didn't like it, either. Originally, I had thought that the v1 and v2 CallArrangers would get more content, but they're still empty. Would it be better to remove these special CallArrangers and distinguish in the base class? > > It seems to me that what you are doing is similar to what was done for aarch64, which was dealt with using very simple subclasses: > https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/foreign/abi/aarch64/linux/LinuxAArch64CallArranger.java > https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/foreign/abi/aarch64/macos/MacOsAArch64CallArranger.java > https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/foreign/abi/aarch64/windows/WindowsAArch64CallArranger.java > > In your case there's less difference, but I think we should follow the same idiom for both. Makes sense. 
I've changed it with the 2nd commit.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15417#discussion_r1305495063

From mdoerr at openjdk.org Fri Aug 25 10:59:45 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 25 Aug 2023 10:59:45 GMT
Subject: RFR: 8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API [v3]
In-Reply-To:
References:
Message-ID: <9Kiu1jOvcK84xyXL6IlXsgHZSBWyWe35HQU9ZdG0bZ8=.20665eea-7e0b-4ec8-94d1-f926a9f0f38b@github.com>

> I've found a way to solve the remaining FFI problem on linux PPC64 Big Endian. Large structs (>8 bytes) which are passed in registers or on the stack require shifting the bytes in the last slot if the size is not a multiple of 8. This PR adds the required functionality to the Java code.
>
> Please review and provide feedback. There may be better ways to implement it. I just found one which works and makes the tests pass:
>
> Test summary
> ==============================
>    TEST                          TOTAL  PASS  FAIL ERROR
>    jtreg:test/jdk/java/foreign      88    88     0     0
>
>
> Note: This PR should be considered as preparation work for AIX which also uses ABIv1.

Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:

  Remove unnecessary imports.
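The review thread above settles on selecting the ABI variant through a virtual method overridden in very small subclasses (the idiom used by the aarch64 `CallArranger`s) instead of `instanceof` checks in the base class. The following is a minimal, hypothetical Java sketch of that pattern; the class names and the `useABIv2()` hook are illustrative only and are not claimed to match the JDK sources.

```java
// Hypothetical sketch: shared logic lives in the base class and asks a
// virtual method which ABI variant applies, instead of testing
// `this instanceof SomeConcreteArranger`.
abstract class PpcArrangerSketch {

    // Each concrete arranger answers this once; no type checks elsewhere.
    protected abstract boolean useABIv2();

    // Example of base-class code that branches on the ABI variant.
    final String abiName() {
        return useABIv2() ? "ABIv2" : "ABIv1";
    }
}

// ABIv1: linux PPC64 Big Endian here (the PR notes that AIX also uses ABIv1).
final class ABIv1ArrangerSketch extends PpcArrangerSketch {
    @Override
    protected boolean useABIv2() {
        return false;
    }
}

final class ABIv2ArrangerSketch extends PpcArrangerSketch {
    @Override
    protected boolean useABIv2() {
        return true;
    }
}
```

With this shape, supporting another platform that shares one of the ABIs means adding one more tiny subclass, and the base class never needs to know any concrete type.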
-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/15417/files
 - new: https://git.openjdk.org/jdk/pull/15417/files/50144b14..430fa018

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=15417&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15417&range=01-02

  Stats: 5 lines in 3 files changed: 0 ins; 5 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/15417.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15417/head:pull/15417

PR: https://git.openjdk.org/jdk/pull/15417

From erikj at openjdk.org Fri Aug 25 13:23:23 2023
From: erikj at openjdk.org (Erik Joelsson)
Date: Fri, 25 Aug 2023 13:23:23 GMT
Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v29]
In-Reply-To:
References:
Message-ID:

On Fri, 25 Aug 2023 01:57:41 GMT, Srinivas Vamsi Parasa wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>>
>> This PR shows up to ~7x improvement for 32-bit datatypes (int, float) and up to ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>>
>>
>> **Arrays.sort performance data using JMH benchmarks for arrays with random data**
>>
>> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
>> | --- | --- | --- | --- | --- |
>> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 |
>> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 |
>> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 |
>> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 |
>> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 |
>> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 |
>> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** |
>> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** |
>> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** |
>> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 |
>> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 |
>> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 |
>> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 |
>> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 |
>> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 |
>> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** |
>> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** |
>> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** |
>> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 |
>> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 |
>> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 |
>> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 |
>> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 |
>> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 |
>> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** |
>> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** |
>> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** |
>> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 |
>> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 |
>> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 |
>> | ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 |
>> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 |
>> | ArraysSort.longSort | 1000 | 10.449 | 6....
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove unnecessary import in Arrays.java

make/modules/java.base/Lib.gmk line 239:

> 237: ################################################################################
> 238:
> 239: ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, x86_64)+$(INCLUDE_COMPILER2), true+true+true)

Is there a reason for this to only be supported on Linux?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1305652232

From rcastanedalo at openjdk.org Fri Aug 25 15:09:45 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 25 Aug 2023 15:09:45 GMT
Subject: RFR: 8315033: Problemlist java/lang/template/StringTemplateTest.java
Message-ID:

This changeset problem-lists `java/lang/template/StringTemplateTest.java`, which fails intermittently after the integration of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749), to reduce CI pipeline noise while [JDK-8315029](https://bugs.openjdk.org/browse/JDK-8315029) is investigated.
-------------

Commit messages:
 - ProblemList java/lang/template/StringTemplateTest.java

Changes: https://git.openjdk.org/jdk/pull/15430/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15430&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8315033
  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/15430.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15430/head:pull/15430

PR: https://git.openjdk.org/jdk/pull/15430

From chagedorn at openjdk.org Fri Aug 25 15:09:45 2023
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 25 Aug 2023 15:09:45 GMT
Subject: RFR: 8315033: Problemlist java/lang/template/StringTemplateTest.java
In-Reply-To:
References:
Message-ID:

On Fri, 25 Aug 2023 14:51:57 GMT, Roberto Castañeda Lozano wrote:

> This changeset problem-lists `java/lang/template/StringTemplateTest.java`, which fails intermittently after the integration of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749), to reduce CI pipeline noise while [JDK-8315029](https://bugs.openjdk.org/browse/JDK-8315029) is investigated.

Looks good and trivial!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15430#pullrequestreview-1595911479

From mdoerr at openjdk.org Fri Aug 25 15:18:08 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 25 Aug 2023 15:18:08 GMT
Subject: RFR: 8314513: [IR Framework] Some internal IR Framework tests are failing after JDK-8310308 on PPC and Cascade Lake
In-Reply-To:
References:
Message-ID:

On Thu, 24 Aug 2023 11:51:11 GMT, Christian Hagedorn wrote:

> This patch fixes some internal IR framework failures after [JDK-8310308](https://bugs.openjdk.org/browse/JDK-8310308):
> - `testlibrary_tests/ir_framework/tests/TestBadFormat.java` on Linux ppc64le:
>   - `applyIfCPUFeature` clauses are false and the rule is not run. We will therefore not hit the format violations which the test expects to find. The fix here is to remove the `applyIfCPUFeature` constraints as the test is only interested in properly reporting format violations.
> - `testlibrary_tests/ir_framework/examples/IRExample.java` on Cascade Lake x86_64:
>   - On Cascade Lake, `failOn` constraints need the same "always true" handling as `counts` constraints. This was missed in JDK-8310308. I've added the same `try-catch` handling as in `RawCountsConstraint::parse()`:
> https://github.com/openjdk/jdk/blob/97b94cb1cdeba00f4bba7326a300c0336950f3ec/test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java#L97-L104
>
> Thanks to @MBaesken and @TheRealMDoerr for reporting this and helping with some pre-PR testing. Would you like to rerun your testing on PPC and Cascade Lake again?
>
> Thanks,
> Christian

Test results look good on our side. Thanks for fixing it!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15415#issuecomment-1693525517

From rcastanedalo at openjdk.org Fri Aug 25 15:22:09 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 25 Aug 2023 15:22:09 GMT
Subject: RFR: 8315033: Problemlist java/lang/template/StringTemplateTest.java
In-Reply-To:
References:
Message-ID:

On Fri, 25 Aug 2023 14:56:45 GMT, Christian Hagedorn wrote:

> Looks good and trivial!

Thanks, Christian!
-------------

PR Comment: https://git.openjdk.org/jdk/pull/15430#issuecomment-1693531477

From rcastanedalo at openjdk.org Fri Aug 25 15:30:19 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 25 Aug 2023 15:30:19 GMT
Subject: Integrated: 8315033: Problemlist java/lang/template/StringTemplateTest.java
In-Reply-To:
References:
Message-ID:

On Fri, 25 Aug 2023 14:51:57 GMT, Roberto Castañeda Lozano wrote:

> This changeset problem-lists `java/lang/template/StringTemplateTest.java`, which fails intermittently after the integration of [JDK-8312749](https://bugs.openjdk.org/browse/JDK-8312749), to reduce CI pipeline noise while [JDK-8315029](https://bugs.openjdk.org/browse/JDK-8315029) is investigated.

This pull request has now been integrated.

Changeset: f139f306
Author:    Roberto Castañeda Lozano
URL:       https://git.openjdk.org/jdk/commit/f139f30695d9c9a79e1426949a130f24e0b240fc
Stats:     1 line in 1 file changed: 1 ins; 0 del; 0 mod

8315033: Problemlist java/lang/template/StringTemplateTest.java

Reviewed-by: chagedorn

-------------

PR: https://git.openjdk.org/jdk/pull/15430

From qamai at openjdk.org Fri Aug 25 17:51:22 2023
From: qamai at openjdk.org (Quan Anh Mai)
Date: Fri, 25 Aug 2023 17:51:22 GMT
Subject: Integrated: 8312547: Max/Min nodes Value implementation could be improved
In-Reply-To:
References:
Message-ID:

On Tue, 25 Jul 2023 15:02:00 GMT, Quan Anh Mai wrote:

> Hi,
>
> This patch removes the early return in `AddNode::Value` in case one of the inputs is a bottom, which may affect the value calculation of nodes such as `Min/MaxNode`.
>
> Please kindly review, thanks very much.

This pull request has now been integrated.
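To see why an early return on an unknown input can lose information for `Min/MaxNode`, consider a simplified model (not C2's actual type lattice) in which each int value is described by a range `[lo, hi]` and a completely unknown value is the full int range. The hypothetical `IntRange` record below computes the result range of `max` and `min` from the operand ranges: even when one operand is unknown, the other operand still bounds the result on one side.

```java
// Simplified model of value-range computation for Max/Min; not C2's TypeInt.
record IntRange(int lo, int hi) {

    // Models an input about which nothing is known.
    static final IntRange FULL = new IntRange(Integer.MIN_VALUE, Integer.MAX_VALUE);

    IntRange {
        if (lo > hi) throw new IllegalArgumentException("empty range");
    }

    // max(a, b) is at least the larger lower bound and at most the larger upper bound.
    static IntRange max(IntRange a, IntRange b) {
        return new IntRange(Math.max(a.lo(), b.lo()), Math.max(a.hi(), b.hi()));
    }

    // min(a, b) is at least the smaller lower bound and at most the smaller upper bound.
    static IntRange min(IntRange a, IntRange b) {
        return new IntRange(Math.min(a.lo(), b.lo()), Math.min(a.hi(), b.hi()));
    }

    public static void main(String[] args) {
        IntRange x = new IntRange(5, 10);
        System.out.println(max(x, FULL)); // IntRange[lo=5, hi=2147483647]
        System.out.println(min(x, FULL)); // IntRange[lo=-2147483648, hi=10]
    }
}
```

In this model, an early return that answers "unknown" whenever either input is unknown would yield `FULL` for both calls, discarding the lower bound 5 for `max` and the upper bound 10 for `min`.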
Changeset: 837cf85f
Author:    Quan Anh Mai
URL:       https://git.openjdk.org/jdk/commit/837cf85f7d5917f03c61c9bb4b8efe021de92b77
Stats:     63 lines in 2 files changed: 34 ins; 5 del; 24 mod

8312547: Max/Min nodes Value implementation could be improved

Reviewed-by: thartmann, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/15021

From kvn at openjdk.org Fri Aug 25 18:50:19 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 25 Aug 2023 18:50:19 GMT
Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v22]
In-Reply-To:
References:
Message-ID:

On Thu, 24 Aug 2023 06:23:29 GMT, Srinivas Vamsi Parasa wrote:

>>> Improvements are nice but it would not pay off if you have big regressions. I can accept 0.9x but 0.4x - 0.8x regressions should be investigated and implementation adjusted to avoid them.
>>
>> Hi Vladimir,
>>
>> Thank you for the suggestion!
>> Currently, AVX512sort is doing well for Random, Repeated and Shuffle patterns of input data. The regressions are observed for Staggered (Wave) pattern of input data.
>> Will investigate the regressions and adjust the implementations to address them.
>>
>> Thanks,
>> Vamsi
>
>> Improvements are nice but it would not pay off if you have big regressions. I can accept 0.9x but 0.4x - 0.8x regressions should be investigated and implementation adjusted to avoid them.
>
> Hello Vladimir (@vnkozlov),
>
> As per your suggestion, the implementation was adjusted to address the regressions observed for the STAGGER and REPEATED input data patterns.
> Please see below the new JMH performance data using the adjusted implementation.
>
> In the new implementation, we don't call the AVX512 sort intrinsic at the top level (`Arrays.sort()`). Instead, we take a decomposed or 'building blocks' approach where we only intrinsify (using AVX512 instructions) the partitioning and small array sort functions used in the current baseline (`DualPivotQuickSort.sort()`). Since the current baseline has logic to detect and sort special input patterns like STAGGER, we fall back to the current baseline instead of using AVX512 partitioning and sorting (which works best for RANDOM, REPEATED and SHUFFLE patterns).
>
> Data below shows `Arrays.sort()` performance comparison between the current **Java baseline (DPQS)** vs. **AVX512 sort** (this PR) using the `ArraysSort.java` JMH [benchmark](https://github.com/openjdk/jdk/pull/13568/files#diff-dee51b13bd1872ff455cec2f29255cfd25014a5dd33dda55a2fc68638c3dd4b2) provided in the PR for [JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)](https://github.com/openjdk/jdk/pull/13568/files#top) (#13568)
>
> - The following command line was used to run the benchmarks: `java -jar $JDK_HOME/build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -jvmArgs "-XX:CompileThreshold=1 -XX:-TieredCompilation" ArraysSort`
> - The scores shown are the average time (us/op), thus lower is better. The last column towards the right shows the speedup.
>
>
> | Benchmark | Mode | Size | Baseline DPQS (us/op) | AVX512 partitioning & sort (us/op) | Speedup |
> | --- | --- | --- | --- | --- | --- |
> | ArraysSort.Double.testSort | RANDOM | 800 | 6.7 | 4.8 | 1.39x |
> | ArraysSort.Double.testSort | RANDOM | 7000 | 234.1 | 51.5 | **4.55x** |
> | ArraysSort.Double.testSort | RANDOM | 50000 | 2155.9 | 470.0 | **4.59x** |
> | ArraysSort.Double.testSort | RANDOM | 300000 | 15076.3 | 3391.3 | **4.45x** |
> | ArraysSort.Double.testSort | RANDOM | 2000000 | 116445.5 | 27491.7 | **4.24x** |
> | ArraysSort.Double.testSort | REPEATED | 800 | 2.3 | 1.7 | 1.35x |
> | ArraysSort.Double.testSort | REPEATED | 7000 | 23.3 | 12.1 | **1.92x** |
> | ArraysSort.Double.testSort |...

@vamsi-parasa I submitted our testing of the latest v28 version. It found an issue in the new benchmark file `ArraysSort.java`: you missed the `,` after the year in the copyright line:

     * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved.
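The 'building blocks' structure described in the quoted reply, where only the partitioning step and the small-array sort are intrinsified while the surrounding driver stays in Java, can be pictured with a plain-Java quicksort whose two hot helpers are separate methods. This is a hypothetical single-pivot sketch for illustration only: it is not the JDK's dual-pivot quicksort, and the threshold value is an arbitrary assumption.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: a sort driver that delegates to two replaceable
// "building blocks", partition() and smallSort().
final class BuildingBlocksSort {

    static final int SMALL_SORT_THRESHOLD = 16; // assumed cut-off

    static void sort(int[] a) {
        sort(a, 0, a.length);
    }

    // Driver: sorts a[low, high). Only partition() and smallSort() would be
    // candidates for replacement by an intrinsic.
    static void sort(int[] a, int low, int high) {
        while (high - low > SMALL_SORT_THRESHOLD) {
            int p = partition(a, low, high);
            // Recurse into the smaller side and loop on the larger one,
            // which bounds the recursion depth by O(log n).
            if (p - low < high - (p + 1)) {
                sort(a, low, p);
                low = p + 1;
            } else {
                sort(a, p + 1, high);
                high = p;
            }
        }
        smallSort(a, low, high);
    }

    // Building block 1: Lomuto partition of a[low, high) around the middle
    // element; returns the pivot's final index.
    static int partition(int[] a, int low, int high) {
        int last = high - 1;
        swap(a, (low + high) >>> 1, last);
        int pivot = a[last];
        int store = low;
        for (int i = low; i < last; i++) {
            if (a[i] < pivot) {
                swap(a, i, store++);
            }
        }
        swap(a, store, last);
        return store;
    }

    // Building block 2: insertion sort for short ranges a[low, high).
    static void smallSort(int[] a, int low, int high) {
        for (int i = low + 1; i < high; i++) {
            int x = a[i];
            int j = i - 1;
            while (j >= low && a[j] > x) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = x;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(1_000, -500, 500).toArray();
        int[] expected = data.clone();
        Arrays.sort(expected);
        sort(data);
        System.out.println(Arrays.equals(data, expected)); // true
    }
}
```

Because the driver only calls the two helpers through fixed signatures, either helper can be swapped for a faster implementation without touching the control flow, which is the property the quoted approach relies on when it falls back to the Java baseline for special input patterns.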
-------------

PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1693790589

From kvn at openjdk.org Fri Aug 25 18:50:21 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 25 Aug 2023 18:50:21 GMT
Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v29]
In-Reply-To:
References:
Message-ID:

On Fri, 25 Aug 2023 01:57:41 GMT, Srinivas Vamsi Parasa wrote:

> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove unnecessary import in Arrays.java

After I fixed it, Tier1 passed and I submitted other tiers.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1693791251

From xliu at openjdk.org Fri Aug 25 18:59:10 2023
From: xliu at openjdk.org (Xin Liu)
Date: Fri, 25 Aug 2023 18:59:10 GMT
Subject: RFR: 8314452: Explicitly indicate inlining success/failure in PrintInlining
In-Reply-To:
References:
Message-ID:

On Wed, 16 Aug 2023 17:42:42 GMT, Jorn Vernee wrote:

> This patch proposes to add a `+` or `-` to messages produced by `PrintInlining`, to indicate whether inlining succeeded or failed. This makes it easier to find inlining failures in an inlining trace, without having to rely on the message to figure out whether inlining succeeded or failed.
> Looking at inlining failures is often useful for diagnosing the results of benchmarks, but it can be hard to find inlining failures in lengthy traces.
>
> A sample of what this looks like:
>
>
> +@ 0 java.lang.foreign.Arena::ofConfined (10 bytes) inline (hot)
> +@ 0 java.lang.Thread::currentThread (0 bytes) (intrinsic)
> +@ 3 jdk.internal.foreign.MemorySessionImpl::createConfined (9 bytes) inline (hot)
> +@ 5 jdk.internal.foreign.ConfinedSession:: (18 bytes) inline (hot)
> +@ 6 jdk.internal.foreign.ConfinedSession$ConfinedResourceList:: (5 bytes) inline (hot)
> +@ 1 jdk.internal.foreign.MemorySessionImpl$ResourceList:: (5 bytes) inline (hot)
> +@ 1 java.lang.Object:: (1 bytes) inline (hot)
> +@ 9 jdk.internal.foreign.MemorySessionImpl:: (20 bytes) inline (hot)
> +@ 1 java.lang.Object:: (1 bytes) inline (hot)
> +@ 6 jdk.internal.foreign.MemorySessionImpl::asArena (9 bytes) inline (hot)
> +@ 5 jdk.internal.foreign.MemorySessionImpl$1:: (10 bytes) inline (hot)
> +@ 6 java.lang.Object:: (1 bytes) inline (hot)
> -@ 8 java.lang.foreign.SegmentAllocator::allocate (24 bytes) already compiled into a big method
>
>
> Using `grep`/`sls` to find inlining failures:
>
>
>> Get-Content inlining_trace.txt | sls '-@'
> -@ 8 java.lang.foreign.SegmentAllocator::allocate (24 bytes) already compiled into a big method
> -@ 34 java.lang.foreign.SegmentAllocator::allocate (24 bytes) already compiled into a big method
> -@ 19 java.lang.invoke.MethodHandle::linkToNative(JJJL)D (0 bytes) native call
> -@ 95 java.lang.foreign.Arena::close (0 bytes) virtual call
> ...

I also feel an explicit message such as 'fail to inline' is better than +/-. The sigil here is essentially the value of a tree node: '+' denotes that inlining succeeded. The problem is that it increases the cognitive load for Java developers.

I think we can establish a general rule: a failed inline emits a message starting with "fail to inline"; the reason is optional. Everything else is a successful inline.
I think this still meets your goal of easily telling inlined and non-inlined methods apart. To get all successful inlines, we just use an inverted grep: grep -v "fail to inline"

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15315#issuecomment-1693801289

From duke at openjdk.org Fri Aug 25 19:02:19 2023
From: duke at openjdk.org (Srinivas Vamsi Parasa)
Date: Fri, 25 Aug 2023 19:02:19 GMT
Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v22]
In-Reply-To:
References:
Message-ID:

> @vamsi-parasa I submitted our testing of the latest v28 version. It found an issue in the new benchmark file `ArraysSort.java`: you missed the `,` after the year in the copyright line:
>
> ```
> * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved.
> ```

Thank you, Vladimir!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1693804429

From dnsimon at openjdk.org Fri Aug 25 19:04:21 2023
From: dnsimon at openjdk.org (Doug Simon)
Date: Fri, 25 Aug 2023 19:04:21 GMT
Subject: RFR: 8313430: [JVMCI] fatal error: Never compilable: in JVMCI shutdown
Message-ID:

VM shutdown involves calling Java code which can schedule further compilations by the CompileBroker. With `UseJVMCICompiler`, all compilations started once VM shutdown has begun are abandoned since they are unnecessary and can delay VM shutdown.

This PR makes `-XX:+AbortVMOnCompilationFailure` ignore such abandoned compilations.

-------------

Commit messages:
 - make AbortVMOnCompilationFailure play nice with JVMCI

Changes: https://git.openjdk.org/jdk/pull/15433/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15433&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8313430
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/15433.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15433/head:pull/15433

PR: https://git.openjdk.org/jdk/pull/15433

From never at openjdk.org Fri Aug 25 19:04:22 2023
From: never at openjdk.org (Tom Rodriguez)
Date: Fri, 25 Aug 2023 19:04:22 GMT
Subject: RFR: 8313430: [JVMCI] fatal error: Never compilable: in JVMCI shutdown
In-Reply-To:
References:
Message-ID:

On Fri, 25 Aug 2023 18:58:31 GMT, Doug Simon wrote:

> VM shutdown involves calling Java code which can schedule further compilations by the CompileBroker. With `UseJVMCICompiler`, all compilations started once VM shutdown has begun are abandoned since they are unnecessary and can delay VM shutdown.
>
> This PR makes `-XX:+AbortVMOnCompilationFailure` ignore such abandoned compilations.

Looks good.

-------------

Marked as reviewed by never (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15433#pullrequestreview-1596341943

From kvn at openjdk.org Fri Aug 25 19:12:09 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 25 Aug 2023 19:12:09 GMT
Subject: RFR: 8313430: [JVMCI] fatal error: Never compilable: in JVMCI shutdown
In-Reply-To:
References:
Message-ID: <9UNLLmNDXjiREePlGKjAU9JX60BQ6a1ZbEe_mAbxUlI=.bb82417c-66b0-4234-889e-081e7242f88f@github.com>

On Fri, 25 Aug 2023 18:58:31 GMT, Doug Simon wrote:

> VM shutdown involves calling Java code which can schedule further compilations by the CompileBroker. With `UseJVMCICompiler`, all compilations started once VM shutdown has begun are abandoned since they are unnecessary and can delay VM shutdown.
>
> This PR makes `-XX:+AbortVMOnCompilationFailure` ignore such abandoned compilations.

Good.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15433#pullrequestreview-1596354079

From shade at openjdk.org Fri Aug 25 19:20:08 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 25 Aug 2023 19:20:08 GMT
Subject: RFR: 8313430: [JVMCI] fatal error: Never compilable: in JVMCI shutdown
In-Reply-To:
References:
Message-ID:

On Fri, 25 Aug 2023 18:58:31 GMT, Doug Simon wrote:

> VM shutdown involves calling Java code which can schedule further compilations by the CompileBroker. With `UseJVMCICompiler`, all compilations started once VM shutdown has begun are abandoned since they are unnecessary and can delay VM shutdown.
>
> This PR makes `-XX:+AbortVMOnCompilationFailure` ignore such abandoned compilations.

Marked as reviewed by shade (Reviewer).

src/hotspot/share/compiler/compileBroker.cpp line 2228:

> 2226: }
> 2227: }
> 2228: if (!task->is_success() && !JVMCI::in_shutdown()) {

Nit: There is a double space in here.
-------------

PR Review: https://git.openjdk.org/jdk/pull/15433#pullrequestreview-1596371772
PR Review Comment: https://git.openjdk.org/jdk/pull/15433#discussion_r1306066075

From dnsimon at openjdk.org Fri Aug 25 19:59:46 2023
From: dnsimon at openjdk.org (Doug Simon)
Date: Fri, 25 Aug 2023 19:59:46 GMT
Subject: RFR: 8313430: [JVMCI] fatal error: Never compilable: in JVMCI shutdown [v2]
In-Reply-To:
References:
Message-ID:

> VM shutdown involves calling Java code which can schedule further compilations by the CompileBroker. With `UseJVMCICompiler`, all compilations started once VM shutdown has begun are abandoned since they are unnecessary and can delay VM shutdown.
>
> This PR makes `-XX:+AbortVMOnCompilationFailure` ignore such abandoned compilations.

Doug Simon has updated the pull request incrementally with one additional commit since the last revision:

  remove extra space

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/15433/files
 - new: https://git.openjdk.org/jdk/pull/15433/files/8900b1c0..94981bbf

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=15433&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15433&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/15433.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15433/head:pull/15433

PR: https://git.openjdk.org/jdk/pull/15433

From kvn at openjdk.org Fri Aug 25 22:07:22 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 25 Aug 2023 22:07:22 GMT
Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v29]
In-Reply-To:
References:
Message-ID:

On Fri, 25 Aug 2023 01:57:41 GMT, Srinivas Vamsi Parasa wrote:

> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove unnecessary import in Arrays.java

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4143:

> 4141: log_info(library)("Loaded library %s, handle " INTPTR_FORMAT, JNI_LIB_PREFIX "x86_64" JNI_LIB_SUFFIX, p2i(libx86_64));
> 4142:
> 4143: if (UseAVX > 2 && VM_Version::supports_avx512dq()) {

This check should be done before you locate and load the library.

src/hotspot/share/opto/library_call.cpp line 5218:

> 5216: BasicType bt = elem_type->basic_type();
> 5217: stubAddr = StubRoutines::select_array_partition_function(bt);
> 5218: if (stubAddr == nullptr) return false;

I see now how you check for AVX512 support. You bail out here if the stub address is not set, and I see that you have the `if (UseAVX > 2 && VM_Version::supports_avx512dq())` check in stubGenerator.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1306180258
PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1306179926

From sviswanathan at openjdk.org Fri Aug 25 23:20:15 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 25 Aug 2023 23:20:15 GMT
Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v29]
In-Reply-To:
References:
Message-ID:

On Fri, 25 Aug 2023 18:46:53 GMT, Vladimir Kozlov wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Remove unnecessary import in Arrays.java
>
> After I fixed it, Tier1 passed and I submitted other tiers.
@vnkozlov The _mm512_set1_* functions are all C/C++ intrinsics for Intel instructions, documented at https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html. Both GCC and Microsoft C implement them (see https://learn.microsoft.com/en-us/cpp/intrinsics/x64-amd64-intrinsics-list). ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1694025189 From dnsimon at openjdk.org Sat Aug 26 10:16:29 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 26 Aug 2023 10:16:29 GMT Subject: RFR: 8313430: [JVMCI] fatal error: Never compilable: in JVMCI shutdown [v2] In-Reply-To: References: Message-ID: On Fri, 25 Aug 2023 19:59:46 GMT, Doug Simon wrote: >> VM shutdown involves calling Java code which can schedule further compilations by the CompileBroker. With `UseJVMCICompiler`, all compilations started once VM shutdown has begun are abandoned since they are unnecessary and can delay VM shutdown. >> >> This PR makes `-XX:+AbortVMOnCompilationFailure` ignore such abandoned compilations. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > remove extra space Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15433#issuecomment-1694249600 From dnsimon at openjdk.org Sat Aug 26 10:16:30 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 26 Aug 2023 10:16:30 GMT Subject: Integrated: 8313430: [JVMCI] fatal error: Never compilable: in JVMCI shutdown In-Reply-To: References: Message-ID: <27CU_I-oJMV3vnez-doVlatnqa2_6WKhWWuPBNioKtQ=.51d6cb63-cdc0-4047-aa86-1063a039da10@github.com> On Fri, 25 Aug 2023 18:58:31 GMT, Doug Simon wrote: > VM shutdown involves calling Java code which can schedule further compilations by the CompileBroker. With `UseJVMCICompiler`, all compilations started once VM shutdown has begun are abandoned since they are unnecessary and can delay VM shutdown.
> > This PR makes `-XX:+AbortVMOnCompilationFailure` ignore such abandoned compilations. This pull request has now been integrated. Changeset: acd93102 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/acd93102348f592d6f2e77a4bff6037edf708d55 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8313430: [JVMCI] fatal error: Never compilable: in JVMCI shutdown Reviewed-by: never, kvn, shade ------------- PR: https://git.openjdk.org/jdk/pull/15433 From fjiang at openjdk.org Sat Aug 26 12:02:40 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Sat, 26 Aug 2023 12:02:40 GMT Subject: RFR: 8315070: RSIC-V: Clean up platform dependent inline headers Message-ID: <6KZojI_U1KT3lQq8wJ0XZuNDC69ArpTWOr_LmIXAV80=.c993bbb2-bf80-4e8f-9d77-f7de952a51d5@github.com> Hi team, please review this small clean-up change. Inspired by [JDK-8267464](https://bugs.openjdk.org/browse/JDK-8267464): the riscv port still has one place that includes the platform-dependent inline header `assembler_riscv.inline.hpp`, which can be replaced with the platform-independent header `asm/assembler.inline.hpp`.
Testing: - [x] release build on linux-riscv64 - [x] tier1 on linux-riscv64 with release build ------------- Commit messages: - update copyright - include platform independent file Changes: https://git.openjdk.org/jdk/pull/15437/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15437&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315070 Stats: 3 lines in 1 file changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15437.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15437/head:pull/15437 PR: https://git.openjdk.org/jdk/pull/15437 From fyang at openjdk.org Sun Aug 27 03:57:07 2023 From: fyang at openjdk.org (Fei Yang) Date: Sun, 27 Aug 2023 03:57:07 GMT Subject: RFR: 8315070: RISC-V: Clean up platform dependent inline headers In-Reply-To: <6KZojI_U1KT3lQq8wJ0XZuNDC69ArpTWOr_LmIXAV80=.c993bbb2-bf80-4e8f-9d77-f7de952a51d5@github.com> References: <6KZojI_U1KT3lQq8wJ0XZuNDC69ArpTWOr_LmIXAV80=.c993bbb2-bf80-4e8f-9d77-f7de952a51d5@github.com> Message-ID: On Sat, 26 Aug 2023 11:55:57 GMT, Feilong Jiang wrote: > Hi team, please review this small clean-up change. > Inspired by [JDK-8267464](https://bugs.openjdk.org/browse/JDK-8267464): the riscv port still has one place that includes the platform-dependent inline header `assembler_riscv.inline.hpp`, which can be replaced with the platform-independent header `asm/assembler.inline.hpp`. > > Testing: > - [x] release build on linux-riscv64 > - [x] tier1 on linux-riscv64 with release build Looks like there is a typo in the PR title. It should be: "8315070: RISC-V: Clean up platform dependent inline headers".
------------- PR Comment: https://git.openjdk.org/jdk/pull/15437#issuecomment-1694562156 From fjiang at openjdk.org Sun Aug 27 03:57:08 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Sun, 27 Aug 2023 03:57:08 GMT Subject: RFR: 8315070: RISC-V: Clean up platform dependent inline headers In-Reply-To: References: <6KZojI_U1KT3lQq8wJ0XZuNDC69ArpTWOr_LmIXAV80=.c993bbb2-bf80-4e8f-9d77-f7de952a51d5@github.com> Message-ID: On Sun, 27 Aug 2023 03:52:46 GMT, Fei Yang wrote: > Looks like there is a typo in the PR title. It should be: "8315070: RISC-V: Clean up platform dependent inline headers". Thanks! Typo fixed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15437#issuecomment-1694562275 From qamai at openjdk.org Sun Aug 27 12:41:42 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 27 Aug 2023 12:41:42 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long Message-ID: Hi, This patch adds unsigned bounds and known bits constraints to `TypeInt` and `TypeLong`. This opens more transformation opportunities in an elegant manner and helps avoid some ad-hoc rules in HotSpot. The new constraints are applied to the identity and value calls of the common nodes (Add, Sub, L/R/URShift, And, Or, Xor, bit counting, Cmp, Bool, ConvI2L/L2I); the detailed ideas for each node will be presented below. In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0`. These constraints are not independent, e.g. an int that lies in [0, 3] in the signed domain must also lie in [0, 3] in the unsigned domain and have all bits except the lowest 2 unset. As a result, we must normalize the constraints (tighten them so that they are optimal) before constructing a `TypeInt/Long` instance. Please kindly review, thanks very much.
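The membership predicate quoted above translates directly into code. The Java sketch below is only an illustration under stated assumptions: the class `IntConstraints` and its methods are hypothetical and are not HotSpot's `TypeInt`, values are 32-bit ints, and `fromNonNegativeRange` performs a single tightening step for non-negative signed ranges (where every member shares the bits of `lo` and `hi` above their highest differing bit), not the general normalization the patch describes.

```java
// Hypothetical sketch, not HotSpot's TypeInt: class and method names are illustrative.
final class IntConstraints {
    final int lo, hi;     // signed bounds
    final int ulo, uhi;   // unsigned bounds, compared with Integer.compareUnsigned
    final int zeros;      // bits known to be 0 in every member
    final int ones;       // bits known to be 1 in every member

    IntConstraints(int lo, int hi, int ulo, int uhi, int zeros, int ones) {
        this.lo = lo;
        this.hi = hi;
        this.ulo = ulo;
        this.uhi = uhi;
        this.zeros = zeros;
        this.ones = ones;
    }

    // x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0
    boolean contains(int x) {
        return x >= lo && x <= hi
                && Integer.compareUnsigned(x, ulo) >= 0
                && Integer.compareUnsigned(x, uhi) <= 0
                && (x & zeros) == 0
                && (~x & ones) == 0;
    }

    // Tightening for a non-negative signed range only: there the signed and unsigned
    // orders agree, and all members share the bits of lo and hi above the highest
    // bit position in which lo and hi differ.
    static IntConstraints fromNonNegativeRange(int lo, int hi) {
        if (lo < 0 || lo > hi) {
            throw new IllegalArgumentException("expected 0 <= lo <= hi");
        }
        int diff = lo ^ hi;
        // diff == 0 is special-cased: Java masks shift distances to 5 bits, so -1 >>> 32 == -1.
        int prefixMask = (diff == 0) ? -1 : ~(-1 >>> Integer.numberOfLeadingZeros(diff));
        int knownOnes = lo & prefixMask;
        int knownZeros = ~lo & prefixMask;
        return new IntConstraints(lo, hi, lo, hi, knownZeros, knownOnes);
    }

    // Known-bits transfer for x & y: a result bit is known 0 if it is known 0 in
    // either input, and known 1 only if it is known 1 in both inputs.
    static int[] andKnownBits(IntConstraints x, IntConstraints y) {
        return new int[] {x.zeros | y.zeros, x.ones & y.ones};
    }
}
```

With `x` in [0, 3], `andKnownBits` marks the sign bit of `x & y` as known zero for any `y`, which shows how a known-bits representation can subsume an ad-hoc rule such as "the And of a positive is a positive".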
Testing - [x] GHA - [x] Linux x64, tier 1-3 - [ ] Linux x64, tier 4 ------------- Commit messages: - whitespace - fix tests for x86_32 - fix widen of ConvI2L - problem lists - format - comment - use properly_contains - fix cast nodes - fix Cast nodes - fix Boolean test - ... and 12 more: https://git.openjdk.org/jdk/compare/837cf85f...5fc5cf3a Changes: https://git.openjdk.org/jdk/pull/15440/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15440&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315066 Stats: 3851 lines in 36 files changed: 2069 ins; 1189 del; 593 mod Patch: https://git.openjdk.org/jdk/pull/15440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15440/head:pull/15440 PR: https://git.openjdk.org/jdk/pull/15440 From qamai at openjdk.org Sun Aug 27 12:56:09 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 27 Aug 2023 12:56:09 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long In-Reply-To: References: Message-ID: On Sun, 27 Aug 2023 12:26:22 GMT, Quan Anh Mai wrote: > Hi, > > This patch adds unsigned bounds and known bits constraints to `TypeInt` and `TypeLong`. This opens more transformation opportunities in an elegant manner and helps avoid some ad-hoc rules in HotSpot. The new constraints are applied to the identity and value calls of the common nodes (Add, Sub, L/R/URShift, And, Or, Xor, bit counting, Cmp, Bool, ConvI2L/L2I); the detailed ideas for each node will be presented below. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0`. These constraints are not independent, e.g. an int that lies in [0, 3] in the signed domain must also lie in [0, 3] in the unsigned domain and have all bits except the lowest 2 unset. As a result, we must normalize the constraints (tighten them so that they are optimal) before constructing a `TypeInt/Long` instance.
> > Please kindly review, thanks very much. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-3 > - [ ] Linux x64, tier 4 Regarding duality, I don't really understand the concept of duality used in the type system. From the usage of `dual` in `join(x, y) = dual(meet(dual(x), dual(y)))`, I kind of understand that the dual of a type is the complement of that type, but I don't see why we need to represent it using the same representation. Furthermore, having an invalid type instance seems to be dangerous. This concept is only used to calculate the join of 2 types, so I use a `_dual` field to indicate whether we are calculating the union or the intersection of 2 sets, and `make` of invalid types (e.g. the intersection of 2 non-overlapping types, or `make` with contradictory constraints) will result in `Type::TOP`. I'm not sure whether any other component relies on the old behaviour. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15440#issuecomment-1694661397 From qamai at openjdk.org Sun Aug 27 13:17:06 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 27 Aug 2023 13:17:06 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long In-Reply-To: References: Message-ID: On Sun, 27 Aug 2023 12:26:22 GMT, Quan Anh Mai wrote: > Hi, > > This patch adds unsigned bounds and known bits constraints to `TypeInt` and `TypeLong`. This opens more transformation opportunities in an elegant manner and helps avoid some ad-hoc rules in HotSpot. The new constraints are applied to the identity and value calls of the common nodes (Add, Sub, L/R/URShift, And, Or, Xor, bit counting, Cmp, Bool, ConvI2L/L2I); the detailed ideas for each node will be presented below. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0`. These constraints are not independent, e.g.
an int that lies in [0, 3] in the signed domain must also lie in [0, 3] in the unsigned domain and have all bits except the lowest 2 unset. As a result, we must normalize the constraints (tighten them so that they are optimal) before constructing a `TypeInt/Long` instance. > > Please kindly review, thanks very much. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-3 > - [ ] Linux x64, tier 4 The details regarding the transformation in each node: - `And/Or/Xor`: Previously, ad-hoc rules were used, e.g. the And of a positive is a positive, the Or of 2 bools is a bool, etc. These rules are naturally generalised using the bit information. - `L/R/URShift`: Since these operations only use the lowest bits of the shift amount, bit information is a natural generalisation; the bounds calculations are mostly the same, and the information about the shifted-in bits can be used. - `CountLeadingZeros`, `CountTrailingZeros`, `PopCount`: Previously, these nodes could only do constant propagation; otherwise they returned the typed bottom. The type can now be greatly sharpened. - `Add/Sub`: Previously, bounds could only be inferred if no overflow occurred. I expand this a little so that bounds can also be inferred if the upper and lower bounds overflow in the same manner.
Unsigned bounds are added, and bits are calculated manually; this can help in situations such as: for (int i = 0; i < max; i += 4) { boolean b = (i & 3) == 0; // (i & 3) is constant zero, so b is always true } - `CmpI/L`: Unsigned bounds help us represent `TypeInt::CC_NE` and help narrow the comparison results when the inputs do not overlap. - `CmpU/UL`: An ad-hoc rule regarding the comparison of an `Add/Sub` and a constant was used; this is replaced by the usage of unsigned bounds. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15440#issuecomment-1694666619 From lucy at openjdk.org Sun Aug 27 15:42:07 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Sun, 27 Aug 2023 15:42:07 GMT Subject: RFR: 8299658: C1 compilation crashes in LinearScan::resolve_exception_edge In-Reply-To: References: Message-ID: On Fri, 18 Aug 2023 20:17:52 GMT, Martin Doerr wrote: > This is a quick fix for the C1 problem described in the JBS issue. > When we find an illegal operand (modelled by nullptr) while resolving an exception edge, we can propagate this state to the phi function and skip the edge. > > If somebody finds a better way to propagate the "illegal" state to the phi function, I can change or close this PR. > > Please review. A nice regression test would be a good thing, but probably not easy to write. Looks reasonable. For sure you can't continue if from_value isn't a valid operand. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15348#pullrequestreview-1597155658 From haosun at openjdk.org Sun Aug 27 22:21:19 2023 From: haosun at openjdk.org (Hao Sun) Date: Sun, 27 Aug 2023 22:21:19 GMT Subject: RFR: 8313530: VM build without C2 fails after JDK-8312579 [v2] In-Reply-To: References: Message-ID: On Wed, 23 Aug 2023 09:11:49 GMT, Gergő Barany wrote: >> The EnableVectorSupport flag is declared in `opto/c2_globals.hpp`, which is not included if `COMPILER2` is not set.
But after my changes for [JDK-8312579](https://bugs.openjdk.org/browse/JDK-8312579), we try to access this flag in some places guarded by `#if COMPILER2_OR_JVMCI`. >> >> This PR moves some flags from `c2_globals.hpp` to the shared `compiler_globals.hpp`, so that they are accessible even if C2 is disabled but JVMCI is enabled. > > Gergő Barany has updated the pull request incrementally with two additional commits since the last revision: > > - Add copies of Vector API flags in jvmci_globals.hpp > - Revert "8313530: VM build without C2 fails after JDK-8312579" > > This reverts commit d82e89c469e91f78f9c2e5b28c725b0e1ba0fb8c. > > Verified with Linux/AArch64 and Linux/x86_64 that the VM build without C2 now passes. > > The debug build still fails: #15419. So I assume the verified builds were all release builds, right? Sorry for the late reply. Yes. I only verified the release builds. And thanks for your fix in https://github.com/openjdk/jdk/pull/15419.
------------- PR Comment: https://git.openjdk.org/jdk/pull/15384#issuecomment-1694784670 From fyang at openjdk.org Mon Aug 28 02:48:16 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 28 Aug 2023 02:48:16 GMT Subject: RFR: 8315070: RISC-V: Clean up platform dependent inline headers In-Reply-To: <6KZojI_U1KT3lQq8wJ0XZuNDC69ArpTWOr_LmIXAV80=.c993bbb2-bf80-4e8f-9d77-f7de952a51d5@github.com> References: <6KZojI_U1KT3lQq8wJ0XZuNDC69ArpTWOr_LmIXAV80=.c993bbb2-bf80-4e8f-9d77-f7de952a51d5@github.com> Message-ID: <1J2XGGmHHJxadoDm5_xIzH6jQzeE0RbNdaMWoM5O-Yk=.4c12139d-dfc0-43d6-809d-5c5f2277b687@github.com> On Sat, 26 Aug 2023 11:55:57 GMT, Feilong Jiang wrote: > Hi team, please review this small clean-up changes. > Inspired by [JDK-8267464](https://bugs.openjdk.org/browse/JDK-8267464), riscv port still has one place that includes platform-dependent inline header `assembler_riscv.inline.hpp`, it could be replaced with platform-independent header `asm/assembler.inline.hpp`. > > Testing: > - [x] release build on linux-riscv64 > - [x] tier1 on linux-riscv64 with release build LGTM. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15437#pullrequestreview-1597385345 From chagedorn at openjdk.org Mon Aug 28 06:16:17 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Aug 2023 06:16:17 GMT Subject: RFR: 8314513: [IR Framework] Some internal IR Framework tests are failing after JDK-8310308 on PPC and Cascade Lake In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 11:51:11 GMT, Christian Hagedorn wrote: > This patch fixes some internal IR framework failures after [JDK-8310308](https://bugs.openjdk.org/browse/JDK-8310308): > - `testlibrary_tests/ir_framework/tests/TestBadFormat.java` on Linux ppc64le: > - `applyIfCPUFeature` clauses are false and the rule is not run. We will therefore not hit the format violations which the test expects to find. 
The fix here is to remove the `applyIfCPUFeature` constraints as the test is only interested in properly reporting format violations. > - `testlibrary_tests/ir_framework/examples/IRExample.java` on Cascade Lake x86_64: > - On Cascade Lake, `failOn` constraints need the same "always true" handling as `counts` constraints. This was missed in JDK-8310308. I've added the same `try-catch` handling as in `RawCountsConstraint::parse()`: > https://github.com/openjdk/jdk/blob/97b94cb1cdeba00f4bba7326a300c0336950f3ec/test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java#L97-L104 > > Thanks to @MBaesken and @TheRealMDoerr for reporting this and helping with some pre-PR testing. Would you like to rerun your testing on PPC and Cascade Lake again? > > Thanks, > Christian Thanks Martin for testing it again! Do you or @MBaesken also want to approve the PR? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15415#issuecomment-1695079597 From duke at openjdk.org Mon Aug 28 07:24:20 2023 From: duke at openjdk.org (emmyyin) Date: Mon, 28 Aug 2023 07:24:20 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v7] In-Reply-To: References: Message-ID: On Fri, 25 Aug 2023 09:52:03 GMT, Christian Hagedorn wrote: >> Yes, it breaks often if removed. It's mostly sanity checking but also ensuring the state of the graph is correct by fixing weird edge connections > > As mentioned below, this method is still a bottleneck and we spend quite some time inside it. Overall, it accounts for around 90% of the total time to open the graph in the example mentioned earlier. Is there another way to fix these breakages instead of looping over all nodes and edges here? I am not sure. I've tried to find other solutions but have not managed to. The function can be called less often, but it cannot be removed entirely. Could we file this as a future enhancement?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1307011222 From duke at openjdk.org Mon Aug 28 07:48:20 2023 From: duke at openjdk.org (emmyyin) Date: Mon, 28 Aug 2023 07:48:20 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v11] In-Reply-To: References: Message-ID: > ### Purpose > > IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic given how IGV is used. Most often, the graphs that are visualized are dynamic, meaning they change over time. A dynamic graph can be thought of as a sequence of graphs, where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On the one hand, this makes each individual layout more readable. On the other hand, the layouts of two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can disrupt the understanding of the graph that the viewer has built up. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. > > The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". > > The difference between the algorithms is illustrated in the following video: > > > https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 > > > This work is the result of my Master's thesis, which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). > > > ### Implementation > > The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence.
By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager`, which holds the core algorithm, has been added. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. > > If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions to. > > The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. > > Since `HierarchicalStableLayoutManager` tries to preserve the node positions, the layouts might get unreadable after a fe... emmyyin has updated the pull request incrementally with one additional commit since the last revision: fixing shared code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14349/files - new: https://git.openjdk.org/jdk/pull/14349/files/1908baa5..40ba69a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=09-10 Stats: 71 lines in 2 files changed: 0 ins; 62 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/14349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14349/head:pull/14349 PR: https://git.openjdk.org/jdk/pull/14349 From duke at openjdk.org Mon Aug 28 07:48:23 2023 From: duke at openjdk.org (emmyyin) Date: Mon, 28 Aug 2023 07:48:23 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v7] In-Reply-To: References: Message-ID: On Tue, 22 Aug 2023 11:57:16 GMT, Christian Hagedorn wrote: >> emmyyin has updated the pull request incrementally with one additional commit since the last revision: >> >> fixing trailing ws > >
src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 1450: > >> 1448: } >> 1449: >> 1450: public void insert(LayoutNode n, int pos) { > > This method and other code are duplicated from `HierarchicalLayoutManager`. Could the code be shared somehow? I have made the shared parts public in `HierarchicalLayoutManager` and reused them in `HierarchicalStableLayoutManager`. Is this the way to go, or should we move them out to a `LayoutManagerUtils` class instead? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1307031812 From duke at openjdk.org Mon Aug 28 07:53:12 2023 From: duke at openjdk.org (emmyyin) Date: Mon, 28 Aug 2023 07:53:12 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v12] In-Reply-To: References: Message-ID: > ### Purpose > > IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic given how IGV is used. Most often, the graphs that are visualized are dynamic, meaning they change over time. A dynamic graph can be thought of as a sequence of graphs, where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On the one hand, this makes each individual layout more readable. On the other hand, the layouts of two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can disrupt the understanding of the graph that the viewer has built up. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR.
> > The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". > > The difference between the algorithms is illustrated in the following video: > > > https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 > > > This work is the result of my Master's thesis, which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). > > > ### Implementation > > The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager`, which holds the core algorithm, has been added. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. > > If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions to. > > The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. > > Since `HierarchicalStableLayoutManager` tries to preserve the node positions, the layouts might get unreadable after a fe...
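The core idea of the implementation section, applying update actions to the previous layout instead of laying out each graph in the sequence from scratch, can be illustrated with a deliberately simplified Java sketch. `StableLayoutSketch` and its methods are hypothetical names and are not part of IGV's `HierarchicalStableLayoutManager`. A position is reduced to a layer number plus an index within that layer, and edges, node sizes and crossing reduction are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, heavily simplified model of a "stable" layout update; not IGV code.
// A node's position is just (layer, index within layer); edges and sizes are ignored.
final class StableLayoutSketch {
    // layer number -> node IDs in left-to-right order
    private final Map<Integer, List<Integer>> layers = new TreeMap<>();

    void addNode(int id, int layer) {
        layers.computeIfAbsent(layer, k -> new ArrayList<>()).add(id);
    }

    // Apply the update actions for the next graph in the sequence: drop removed
    // nodes, keep surviving nodes in their existing relative order, and append
    // new nodes at the end of their layer instead of recomputing the whole layout.
    void apply(Set<Integer> removed, Map<Integer, Integer> addedToLayer) {
        for (List<Integer> nodes : layers.values()) {
            nodes.removeIf(removed::contains);
        }
        for (Map.Entry<Integer, Integer> e : addedToLayer.entrySet()) {
            addNode(e.getKey(), e.getValue());
        }
    }

    List<Integer> layer(int layer) {
        return layers.getOrDefault(layer, List.of());
    }
}
```

A real stable layout would also choose sensible positions for new nodes (for example near their neighbours) and reroute edges; the sketch only shows that nodes present in both graphs keep their relative order.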
emmyyin has updated the pull request incrementally with one additional commit since the last revision: only do sanityCheckEdges where necessary ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14349/files - new: https://git.openjdk.org/jdk/pull/14349/files/40ba69a4..c9ea772c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=10-11 Stats: 14 lines in 1 file changed: 0 ins; 14 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14349/head:pull/14349 PR: https://git.openjdk.org/jdk/pull/14349 From pli at openjdk.org Mon Aug 28 08:30:07 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 28 Aug 2023 08:30:07 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v5] In-Reply-To: References: Message-ID: <8axqn5CwX2h1-lA3zpDuuWluj70sYV6NwY8f5iEyXW0=.b88c04c0-ff31-41a8-a94c-cbdf5b3fdfb2@github.com> > This patch removes `@requires vm.flagless` annotations from HotSpot jtreg tests in `compiler/vectorization/runner`. All jtreg cases in this folder are invoked by the test driver `VectorizationTestRunner.java`, which checks both correctness and vectorizability (IR) for each test method. We added the flagless requirement before because extra flags may interfere with compiler control in the test driver's correctness check. But `flagless` has the side effect that tests run with extra flags are skipped. So we propose to get rid of it now. > > To accommodate the removal of `@requires vm.flagless`, a few checks are added in the test driver to skip the correctness check if extra flags prevent the compiler control from working. This patch also moves the flag `-XX:-OptimizeFill`, previously hard-coded in the test driver, into conditions in the IR checks. > > Tested various compiler-control-related VM flags on x86 and AArch64. Pengfei Li has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into flagless - Remove useless conditions and imports - Revert to the 1st commit and re-address comments - Re-work correctness check to allow "-Xbatch" - 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests This patch removes `@requires vm.flagless` annotations from HotSpot jtreg tests in `compiler/vectorization/runner`. All jtreg cases in this folder are invoked by the test driver `VectorizationTestRunner.java`, which checks both correctness and vectorizability (IR) for each test method. We added the flagless requirement before because extra flags may interfere with compiler control in the test driver's correctness check. But `flagless` has the side effect that tests run with extra flags are skipped. So we propose to get rid of it now. To accommodate the removal of `@requires vm.flagless`, a few checks are added in the test driver to skip the correctness check if extra flags prevent the compiler control from working. This patch also moves the flag `-XX:-OptimizeFill`, previously hard-coded in the test driver, into conditions in the IR checks. Tested various compiler-control-related VM flags on x86 and AArch64.
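One way to picture the checks described above, which skip the correctness comparison when extra VM flags would defeat the driver's compiler control, is the Java sketch below. The class `CorrectnessCheckGuard` and its flag list are hypothetical: the actual checks in `VectorizationTestRunner.java` are not shown in this thread, and `-Xint` and `-XX:-UseCompiler` are only assumed examples of flags that stop methods from being JIT-compiled. The only real JDK API used is `ManagementFactory.getRuntimeMXBean().getInputArguments()`, which returns the JVM's own input arguments.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical guard, not the logic in VectorizationTestRunner.java.
final class CorrectnessCheckGuard {
    // Illustrative assumption: flags under which the driver could not force test
    // methods to be JIT-compiled, so comparing interpreted and compiled results
    // would not test anything.
    private static final List<String> BLOCKING_FLAGS = List.of("-Xint", "-XX:-UseCompiler");

    static boolean shouldSkipCorrectnessCheck(List<String> vmArgs) {
        return vmArgs.stream().anyMatch(BLOCKING_FLAGS::contains);
    }

    // Checks the flags the current JVM was started with.
    static boolean shouldSkipForCurrentVm() {
        return shouldSkipCorrectnessCheck(
                ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

Passing the argument list in explicitly keeps the decision testable without launching a new JVM; flags such as `-Xbatch`, which the commits above explicitly allow, fall through and the check still runs.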
------------- Changes: - all: https://git.openjdk.org/jdk/pull/15011/files - new: https://git.openjdk.org/jdk/pull/15011/files/00d48cc8..42de3da5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15011&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15011&range=03-04 Stats: 41171 lines in 1597 files changed: 22101 ins; 8029 del; 11041 mod Patch: https://git.openjdk.org/jdk/pull/15011.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15011/head:pull/15011 PR: https://git.openjdk.org/jdk/pull/15011 From pli at openjdk.org Mon Aug 28 08:49:27 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 28 Aug 2023 08:49:27 GMT Subject: RFR: 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE [v2] In-Reply-To: References: Message-ID: <0h18a9nnmEqd5KAlYsBLZzhG09pPRmRZ6q_z1eB2igI=.d81b14bd-5e08-4cb5-ae3d-b20d45a78a44@github.com> > Hotspot jtreg `compiler/loopopts/superword/TestDependencyOffsets.java` fails on AArch64 CPUs with 512-bit SVE. The reason is that many test loops in the code cannot be vectorized due to data dependence, but the IR tests assume they can. > > On AArch64, these IR tests just check the CPU feature of `asimd` and incorrectly assume AArch64 vectors are at most 256 bits. But actually, `asimd` on AArch64 only represents NEON vectors which are at most 128 bits. AArch64 CPUs may have another feature of `sve` which represents scalable vectors of at most 2048 bits. The vectorization won't succeed on 512-bit SVE CPUs if the memory offset between some read and write is less than 512 bits. > > As this jtreg is auto-generated by a python script, we have updated the script and re-generated this jtreg. In this new version, we checked the auto-vectorization on both NEON-only and NEON+SVE platforms. Below is the diff of the generator script. We have also attached the new script to the JBS page.
>
> @@ -321,7 +321,8 @@ class Type:
>              p.append(Platform("avx512", ["avx512", "true"], 64))
>          else:
>              assert False, "type not implemented" + self.name
> -        p.append(Platform("asimd", ["asimd", "true"], 32))
> +        p.append(Platform("asimd", ["asimd", "true", "sve", "false"], 16))
> +        p.append(Platform("sve", ["sve", "true"], 256))
>          return p
>
>  class Test:
> @@ -457,7 +458,7 @@ class Generator:
>          lines.append(" * and various MaxVectorSize values, and +- AlignVector.")
>          lines.append(" *")
>          lines.append(" * Note: this test is auto-generated. Please modify / generate with script:")
> -        lines.append(" * https://bugs.openjdk.org/browse/JDK-8308606")
> +        lines.append(" * https://bugs.openjdk.org/browse/JDK-8312570")
>          lines.append(" *")
>          lines.append(" * Types: " + ", ".join([t.name for t in self.types]))
>          lines.append(" * Offsets: " + ", ".join([str(o) for o in self.offsets]))
> @@ -598,7 +599,8 @@ class Generator:
>          # IR rules
>          for p in test.t.platforms():
>              elements = p.vector_width // test.t.size
> -            lines.append(f" // CPU: {p.name} -> vector_width: {p.vector_width} -> elements in vector: {elements}")
> +            max_pre = "max " if p.name == "sve" else ""
> +            lines.append(f" // CPU: {p.name} -> {max_pre}vector_width: {p.vector_width} -> {max_pre}elements in vector: {elements}")
>              ############### -Align...

Pengfei Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:

 - Merge branch 'master' into deptest
 - 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE

   HotSpot jtreg test `compiler/loopopts/superword/TestDependencyOffsets.java` fails on AArch64 CPUs with 512-bit SVE. The reason is that many test loops in the code cannot be vectorized due to data dependence, but the IR tests assume they can.

   On AArch64, these IR tests only check the CPU feature `asimd` and incorrectly assume AArch64 vectors are at most 256 bits. But actually, `asimd` on AArch64 only represents NEON vectors, which are at most 128 bits. AArch64 CPUs may have another feature, `sve`, which represents scalable vectors of at most 2048 bits. The vectorization won't succeed on 512-bit SVE CPUs if the memory offset between some read and write is less than 512 bits.

   As this jtreg test is auto-generated by a Python script, we have updated the script and re-generated the test. In this new version, we checked the auto-vectorization on both NEON-only and NEON+SVE platforms. Below is the diff of the generator script. We have also attached the new script to the JBS page.

   ```
   @@ -321,7 +321,8 @@ class Type:
                p.append(Platform("avx512", ["avx512", "true"], 64))
            else:
                assert False, "type not implemented" + self.name
   -        p.append(Platform("asimd", ["asimd", "true"], 32))
   +        p.append(Platform("asimd", ["asimd", "true", "sve", "false"], 16))
   +        p.append(Platform("sve", ["sve", "true"], 256))
            return p
    
    class Test:
   @@ -457,7 +458,7 @@ class Generator:
            lines.append(" * and various MaxVectorSize values, and +- AlignVector.")
            lines.append(" *")
            lines.append(" * Note: this test is auto-generated. Please modify / generate with script:")
   -        lines.append(" * https://bugs.openjdk.org/browse/JDK-8308606")
   +        lines.append(" * https://bugs.openjdk.org/browse/JDK-8312570")
            lines.append(" *")
            lines.append(" * Types: " + ", ".join([t.name for t in self.types]))
            lines.append(" * Offsets: " + ", ".join([str(o) for o in self.offsets]))
   @@ -598,7 +599,8 @@ class Generator:
            # IR rules
            for p in test.t.platforms():
                elements = p.vector_width // test.t.size
   -            lines.append(f" // CPU: {p.name} -> vector_width: {p.vector_width} -> elements in vector: {elements}")
   +            max_pre = "max " if p.name == "sve" else ""
   +            lines.append(f" // CPU: {p.name} -> {max_pre}vector_width: {p.vector_width} -> {max_pre}elements in vector: {elements}")
                ############### -AlignVector
                rule = PlatformIRRule(p)
                rule.add_pre_constraint("AlignVector", IRBool.makeFalse())
   @@ -694,8 +696,8 @@ class Generator:
    def main():
        g = Generator()
        g.generate("TestDependencyOffsets",
   -               "/home/emanuel/Documents/fork7-jdk/open/test/hotspot/jtreg/compiler/loopopts/superword",
   -               "8298935 8308606", # Big ID
   +               "test/hotspot/jtreg/compiler/loopopts/superword",
   +               "8298935 8308606 8312570", # Bug ID
                   "compiler.loopopts.superword", # package
                   )
   ```

   We tested this on various AArch64 CPUs.

-------------

Changes: https://git.openjdk.org/jdk/pull/15010/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15010&range=01
  Stats: 2062 lines in 1 file changed: 1422 ins; 0 del; 640 mod
  Patch: https://git.openjdk.org/jdk/pull/15010.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15010/head:pull/15010

PR: https://git.openjdk.org/jdk/pull/15010

From pli at openjdk.org Mon Aug 28 08:50:10 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 28 Aug 2023 08:50:10 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v2] In-Reply-To: References: Message-ID:

On Fri, 4 Aug 2023 20:10:19 GMT, Vladimir Kozlov wrote:

>> Hi @vnkozlov,
>>
>> Thanks for your reply. But it still has problems.
>>
>>> About your change to allow -Xbatch. Let me clarify: if you exclude -Xcomp mode (which I agree with) by checking that the UseInterpreter flag is true, then a method could always be executed in the Interpreter to get the reference result (even with -XX:CompileThreshold=100) by calling the method once first (we do that in other tests).
>>
>>> You don't need to call WB.lockCompilation() if you exclude -Xcomp mode. There will be no compilation requests for the called method when you call it the first time, because the compilation threshold will not be reached, so it is guaranteed that the method will be executed in the Interpreter. And you have the assert to verify that.
>>
>> These tests are a bit different because we test loops. If the loop iteration count reaches some threshold, the loop will be *OSR compiled* even if the test method is called only once. I just did an experiment according to your suggestion. After removing `WB.lockCompilation()` and updating the loop iteration count to 100,000, I got an assertion failure that tells me the test method is NOT running in the interpreter.
>>
>>     STDERR:
>>     java.lang.AssertionError
>>         at compiler.vectorization.runner.VectorizationTestRunner.runTestOnMethod(VectorizationTestRunner.java:131)
>>         at compiler.vectorization.runner.VectorizationTestRunner.run(VectorizationTestRunner.java:73)
>>         at compiler.vectorization.runner.VectorizationTestRunner.main(VectorizationTestRunner.java:215)
>>         at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
>>         at java.base/java.lang.reflect.Method.invoke(Method.java:580)
>>         at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138)
>>         at java.base/java.lang.Thread.run(Thread.java:1570)
>>
>> A solution to this may be adding one more check that `CICompileOSR` is OFF if we still want to use interpreted execution for the reference result.
>>
>> Now the question is, which verification approach do you think is better? "C2 vs. interpreted" or "C2 vs. C1"?
>
>> A solution to this may be adding one more check that `CICompileOSR` is OFF if we still want to use interpreted execution for the reference result.
>
> I would suggest using `WB.setBooleanVMFlag("CICompileOSR", false);`. But it is a debug flag, which can only be set in a debug VM. There may be other product flags you can temporarily set to avoid compilation without locking.
>
>>
>> Now the question is, which verification approach do you think is better? "C2 vs. interpreted" or "C2 vs. C1"?
>
> We usually use the Interpreter as the gold standard.

Thanks @vnkozlov @TobiHartmann @eme64 for your review. The patch is currently merged with the latest master and re-tested. I will integrate this if there are no further comments.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15011#issuecomment-1695293144

From pli at openjdk.org Mon Aug 28 09:01:14 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 28 Aug 2023 09:01:14 GMT Subject: RFR: 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE In-Reply-To: References: <-McZdVKFHZcQcCJhosf7KVw34o6ZcAHr0hqGH7QIqsw=.eafa3a9e-f521-4097-aa6c-b00d5302b63d@github.com> Message-ID:

On Tue, 15 Aug 2023 12:55:35 GMT, Emanuel Peter wrote:

>>> We had this running on AArch64 machines with asimd but without sve. Why do you think that this even passed with my 32 byte assumption (256 bit)? You say it should only have 128 bit.
>>
>> Assuming NEON has a larger vector size (256 bit, which is wrong) won't result in any failure on NEON-only machines. But it results in running fewer IR checks on 256-bit SVE. Let's take the IR condition change below as an example.
>>
>>     - applyIfAnd = {"AlignVector", "false", "MaxVectorSize", ">= 8", "MaxVectorSize", "<= 16"},
>>     + applyIfAnd = {"AlignVector", "false", "MaxVectorSize", ">= 8"},
>>
>> Before this patch, the existence of vector IRs won't be checked on 256-bit SVE as we have `MaxVectorSize <= 16`. After this patch, it will be checked.
>> The main reason for the failures on 512-bit SVE is the missing `sve == false` check, so the IR tests will run on machines with vector length > 256 bits.
>>
>>> What is the max_pre for? Is it necessary?
>>
>> It just adds a prefix to make the comment more precise, as SVE uses scalable vectors and the vector length ranges from 128 bits to 2048 bits.
>
> @pfustc you will have to merge the changes from [JDK-8310308](https://bugs.openjdk.org/browse/JDK-8310308).

Thanks @eme64 @vnkozlov for the review. The patch is currently merged with master. The Python generator file is also updated and attached to the JBS page.

> Ah. Just one more idea: Since you now have even longer vector widths with 2048 bits: Should we not add some cases with even larger dependency offsets? We should go further than -196, 196. We could consider adding 255, 256, 511, 512, 1024, 1536 (positive and negative). Of course the question is if that increases the runtime too much, what do you think?

This is a good suggestion, but I don't think it's necessary now or in the near future. Although the Arm architecture documentation says the SVE vector length can be at most 2048 bits, so far the largest SVE vector length of real AArch64 CPUs is 512 bits. AFAIK, there will not be real CPUs with 1024-bit SVE in the short term. So I prefer keeping the current offset values for now.
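The width arithmetic behind the generator change can be illustrated in Java. The platform entries in the diff give vector widths in bytes (16 for NEON-only `asimd`, up to 256 for `sve`), and the generated comment divides that by the element size. The class name and the `fullWidthPossible` helper are hypothetical; the helper is a deliberately simplified model of the "offset smaller than the vector width" rule described in the PR, and it ignores dependence direction, alignment, and the test's MaxVectorSize constraints.

```java
public class VectorWidthSketch {

    // Mirrors the comment text built by the generator's IR-rule loop:
    // elements = vector_width // type_size, with a "max " prefix for SVE because
    // SVE vectors are scalable and vector_width is only an upper bound.
    static String cpuComment(String platform, int vectorWidthBytes, int typeSizeBytes) {
        int elements = vectorWidthBytes / typeSizeBytes;
        String maxPre = platform.equals("sve") ? "max " : "";
        return "// CPU: " + platform + " -> " + maxPre + "vector_width: " + vectorWidthBytes
                + " -> " + maxPre + "elements in vector: " + elements;
    }

    // Simplified model (an assumption, not the test's exact rule): a loop whose
    // read and write are |offset| elements apart can use full-width vectors only
    // if that distance in bytes is at least the hardware vector width.
    static boolean fullWidthPossible(int offsetElements, int typeSizeBytes, int vectorWidthBytes) {
        return Math.abs(offsetElements) * typeSizeBytes >= vectorWidthBytes;
    }

    public static void main(String[] args) {
        System.out.println(cpuComment("asimd", 16, 4)); // NEON: 128-bit vectors, int elements
        System.out.println(cpuComment("sve", 256, 4));  // SVE: up to 2048-bit vectors
        // An int offset of 8 is 32 bytes: enough for 256-bit vectors, not for 512-bit SVE.
        System.out.println(fullWidthPossible(8, 4, 32));
        System.out.println(fullWidthPossible(8, 4, 64));
    }
}
```

Under this model, the 512-bit SVE failures correspond to offsets whose byte distance falls between 32 and 64 bytes, which the old `asimd`-only rules treated as vectorizable.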
-------------

PR Comment: https://git.openjdk.org/jdk/pull/15010#issuecomment-1695308307

From vkempik at openjdk.org Mon Aug 28 09:18:15 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Mon, 28 Aug 2023 09:18:15 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v2] In-Reply-To: References: Message-ID:

On Thu, 17 Aug 2023 07:07:17 GMT, Fei Yang wrote:

>> Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Change fsgnj_d(dst, src, src) to fmv_d(dst, src)
>
> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4258:
>
>> 4256:
>> 4257: // setting roundig mode to double->long (rm_direct) and long->double (rm_back) conversions
>> 4258: RoundingMode rm_direct, rm_back;
>
> Can we use the same rounding mode for conversions in both directions? Say `rup` for `ceil`, and `rdn` for `floor`.
> I see this policy is used for both glibc [1] and V8.
>
> [1] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/riscv/rv64/rvd/s_ceil.c;h=6c355cd72691c45c97201fe8947683287982ade9;hb=41d8c3bc33bcae1ebb8077b0442caef4917f763a

Basically, for this case the backward rounding mode doesn't matter at all. It would only matter if the intermediate integer contained something not representable in double.
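The point about the backward rounding mode can be checked with a little arithmetic. A double has a 53-bit significand, so every integer with magnitude at most 2^53 converts from long to double exactly, and every double with magnitude at least 2^52 is already an integer. For ceil/floor the intermediate long is therefore either small (exactly representable) or equal to the input (also exact), so the long -> double rounding mode never comes into play. The Java sketch below (class name hypothetical) demonstrates this; note that Java's `(double)` cast from long always rounds to nearest, which is why the tie case 2^53 + 1 lands on 2^53.

```java
public class BackwardConversionSketch {

    // 2^53: every integer with magnitude up to this value is exactly representable.
    static final long TWO_POW_53 = 1L << 53;

    // True if converting n to double and back yields n again, i.e. the
    // long -> double step needed no rounding. The range check avoids the
    // saturating double -> long cast hiding a rounded-up Long.MAX_VALUE.
    static boolean exactAsDouble(long n) {
        double d = (double) n;
        return d >= -0x1p63 && d < 0x1p63 && (long) d == n;
    }

    public static void main(String[] args) {
        System.out.println(exactAsDouble(TWO_POW_53));            // representable
        System.out.println(exactAsDouble(TWO_POW_53 + 1));        // needs rounding
        System.out.println((double) (TWO_POW_53 + 1) == 0x1p53);  // tie rounds to even
        // The long that ceil() produces for a finite input round-trips exactly.
        double x = 4503599627370495.5;                             // 2^52 - 0.5
        System.out.println(exactAsDouble((long) Math.ceil(x)));
    }
}
```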
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1307167492

From rcastanedalo at openjdk.org Mon Aug 28 09:18:22 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 28 Aug 2023 09:18:22 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v7] In-Reply-To: References: Message-ID: <-3-AhsGoBus4BpZg_cxOiko_thOo74AtRenIoWVn1HA=.9a8702fe-7afd-46f3-be58-ee081f4e60c7@github.com>

On Mon, 28 Aug 2023 07:42:09 GMT, emmyyin wrote:

>> src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 1450:
>>
>>> 1448: }
>>> 1449:
>>> 1450: public void insert(LayoutNode n, int pos) {
>>
>> This method and also other code is duplicated from `HierarchicalLayoutManager`. Could the code be shared somehow?
>
> I have made the shared parts public in `HierarchicalLayoutManager` and reused them in `HierarchicalStableLayoutManager`. Is this the way to go, or should we move them out to a `LayoutManagerUtils` class instead?

Reusing directly from `HierarchicalLayoutManager` looks good to me, since `HierarchicalStableLayoutManager` already depends on it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1307168854

From mdoerr at openjdk.org Mon Aug 28 09:31:08 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 28 Aug 2023 09:31:08 GMT Subject: RFR: 8314513: [IR Framework] Some internal IR Framework tests are failing after JDK-8310308 on PPC and Cascade Lake In-Reply-To: References: Message-ID:

On Thu, 24 Aug 2023 11:51:11 GMT, Christian Hagedorn wrote:

> This patch fixes some internal IR framework failures after [JDK-8310308](https://bugs.openjdk.org/browse/JDK-8310308):
> - `testlibrary_tests/ir_framework/tests/TestBadFormat.java` on Linux ppc64le:
>   - `applyIfCPUFeature` clauses are false and the rule is not run. We will therefore not hit the format violations which the test expects to find. The fix here is to remove the `applyIfCPUFeature` constraints as the test is only interested in properly reporting format violations.
> - `testlibrary_tests/ir_framework/examples/IRExample.java` on Cascade Lake x86_64:
>   - On Cascade Lake, `failOn` constraints need the same "always true" handling as `counts` constraints. This was missed in JDK-8310308. I've added the same `try-catch` handling as in `RawCountsConstraint::parse()`:
>     https://github.com/openjdk/jdk/blob/97b94cb1cdeba00f4bba7326a300c0336950f3ec/test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java#L97-L104
>
> Thanks to @MBaesken and @TheRealMDoerr for reporting this and helping with some pre-PR testing. Would you like to rerun your testing on PPC and Cascade Lake again?
>
> Thanks,
> Christian

Marked as reviewed by mdoerr (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/15415#pullrequestreview-1597905004

From duke at openjdk.org Mon Aug 28 09:49:24 2023 From: duke at openjdk.org (Ilya Gavrilin) Date: Mon, 28 Aug 2023 09:49:24 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v3] In-Reply-To: References: Message-ID:

> Please review these changes to the RISC-V double rounding intrinsics.
>
> On RISC-V, intrinsics for rounding doubles with a rounding mode (like Math.ceil/floor/rint) were missing. RISC-V has no dedicated instruction for such a conversion, so a two-step conversion is used: double -> long int -> double (using fcvt.l.d, fcvt.d.l).
>
> Also, we need to provide a rounding mode to the fcvt.x.x instructions.
>
> Rounding mode selection for ceil (similar for floor and rint), according to the Math.ceil requirements:
>
>> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475).
>
> For double -> long int, we choose the rup (round towards +inf) mode to get an integer that is greater than or equal to the input value.
> For long int -> double, we choose the rdn (round towards -inf) mode to get the smallest (closest to -inf) representation of the integer obtained from the first conversion.
>
> For inf, NaN, or values whose magnitude exceeds 2^63, the input value is returned (such a large double is guaranteed to be an integer).
> Also, when storing the result, we copy the sign from the input value (needed because ceil must return -0.0 for inputs in (-1.0, 0.0)).
>
> We have observed a significant improvement on HiFive and T-Head boards.
>
> Testing: tier1, tier2 and hotspot:tier3 on HiFive
>
> Performance results on HiFive (FpRoundingBenchmark.testceil/floor/rint):
>
> Without intrinsic:
>
> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.testceil         1024  thrpt   25  39.297 ± 0.037  ops/ms
> FpRoundingBenchmark.testfloor        1024  thrpt   25  39.398 ± 0.018  ops/ms
> FpRoundingBenchmark.testrint         1024  thrpt   25  36.388 ± 0.844  ops/ms
>
> With intrinsic:
>
> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.testceil         1024  thrpt   25  80.560 ± 0.053  ops/ms
> FpRoundingBenchmark.testfloor        1024  thrpt   25  80.541 ± 0.081  ops/ms
> FpRoundingBenchmark.testrint         1024  thrpt   25  80.603 ± 0.071  ops/ms

Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision:

  Fixes in double rounding intrinsic

  Use the same rounding mode for both conversions instead of different ones.
  Change pipe_class to default and remove effect on cr.
  Indentation fixes, variables renaming, add more comments.
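The two-step conversion described above can be modeled in plain Java to check its edge cases (negative zero, NaN, infinities, large magnitudes). Java cannot select a RISC-V rounding mode for a cast, so this sketch emulates the directed double -> long conversion (rup for ceil, rdn for floor) with truncation plus a one-step correction; it illustrates the logic only and is not the intrinsic's assembly. The class and method names are hypothetical, and rint is omitted because its round-half-to-even rule needs a different correction.

```java
public class RoundViaLongSketch {

    // 2^63: doubles at or beyond this magnitude (and NaN/infinity) do not fit in a
    // long; every such finite value is already an integer, so it is returned as is.
    private static final double TWO_POW_63 = 0x1p63;

    // Emulates double -> long with round-up (RISC-V "rup"); Java's (long) cast
    // truncates toward zero, so a truncated value below x is bumped by one.
    static long toLongRoundUp(double x) {
        long t = (long) x;
        return (t < x) ? t + 1 : t;
    }

    // Emulates double -> long with round-down (RISC-V "rdn").
    static long toLongRoundDown(double x) {
        long t = (long) x;
        return (t > x) ? t - 1 : t;
    }

    static double ceilViaLong(double x) {
        if (Double.isNaN(x) || Math.abs(x) >= TWO_POW_63) { // also catches infinities
            return x;
        }
        // The long -> double step is exact here; copySign restores -0.0
        // for inputs in (-1.0, -0.0], as Math.ceil requires.
        return Math.copySign((double) toLongRoundUp(x), x);
    }

    static double floorViaLong(double x) {
        if (Double.isNaN(x) || Math.abs(x) >= TWO_POW_63) {
            return x;
        }
        // floor never changes sign, so copySign only matters for -0.0 itself.
        return Math.copySign((double) toLongRoundDown(x), x);
    }

    public static void main(String[] args) {
        double[] xs = {-0.5, -0.0, 0.5, -1.5, 2.5, Double.NaN, 1e300};
        for (double x : xs) {
            System.out.println(x + ": ceil=" + ceilViaLong(x) + " floor=" + floorViaLong(x));
        }
    }
}
```

Comparing these results against `Math.ceil`/`Math.floor` with `Double.compare` (which distinguishes -0.0 from 0.0 and treats NaN as equal to itself) is a convenient way to exercise the sign-copy step.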
-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/14991/files
 - new: https://git.openjdk.org/jdk/pull/14991/files/1c43b040..47a7fb2b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=01-02

  Stats: 45 lines in 3 files changed: 6 ins; 13 del; 26 mod
  Patch: https://git.openjdk.org/jdk/pull/14991.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14991/head:pull/14991

PR: https://git.openjdk.org/jdk/pull/14991

From duke at openjdk.org Mon Aug 28 09:56:45 2023 From: duke at openjdk.org (Ilya Gavrilin) Date: Mon, 28 Aug 2023 09:56:45 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v4] In-Reply-To: References: Message-ID: <0AxOOGlNcodBw74WDB0a9WOE12izVxssdTzJ8NkfGy8=.bb194dad-5a06-4af9-aa71-3972a66d6df3@github.com>

> Please review these changes to the RISC-V double rounding intrinsics.
>
> On RISC-V, intrinsics for rounding doubles with a rounding mode (like Math.ceil/floor/rint) were missing. RISC-V has no dedicated instruction for such a conversion, so a two-step conversion is used: double -> long int -> double (using fcvt.l.d, fcvt.d.l).
>
> Also, we need to provide a rounding mode to the fcvt.x.x instructions.
>
> Rounding mode selection for ceil (similar for floor and rint), according to the Math.ceil requirements:
>
>> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475).
>
> For double -> long int, we choose the rup (round towards +inf) mode to get an integer that is greater than or equal to the input value.
> For long int -> double, we choose the rdn (round towards -inf) mode to get the smallest (closest to -inf) representation of the integer obtained from the first conversion.
>
> For inf, NaN, or values whose magnitude exceeds 2^63, the input value is returned (such a large double is guaranteed to be an integer).
> Also, when storing the result, we copy the sign from the input value (needed because ceil must return -0.0 for inputs in (-1.0, 0.0)).
>
> We have observed a significant improvement on HiFive and T-Head boards.
>
> Testing: tier1, tier2 and hotspot:tier3 on HiFive
>
> Performance results on HiFive (FpRoundingBenchmark.testceil/floor/rint):
>
> Without intrinsic:
>
> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.testceil         1024  thrpt   25  39.297 ± 0.037  ops/ms
> FpRoundingBenchmark.testfloor        1024  thrpt   25  39.398 ± 0.018  ops/ms
> FpRoundingBenchmark.testrint         1024  thrpt   25  36.388 ± 0.844  ops/ms
>
> With intrinsic:
>
> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.testceil         1024  thrpt   25  80.560 ± 0.053  ops/ms
> FpRoundingBenchmark.testfloor        1024  thrpt   25  80.541 ± 0.081  ops/ms
> FpRoundingBenchmark.testrint         1024  thrpt   25  80.603 ± 0.071  ops/ms

Ilya Gavrilin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:

 - Merge branch 'openjdk:master' into JDK-8312569
 - Fixes in double rounding intrinsic

   Use the same rounding mode for both conversions instead of different ones.
   Change pipe_class to default and remove effect on cr.
   Indentation fixes, variables renaming, add more comments.
 - Change fsgnj_d(dst, src, src) to fmv_d(dst, src)
 - Fix comments style
 - Add missing intrinsic for double rounding

   Add intrinsics for rounding with mode like Math.ceil/floor/rint.
   Improve whitespace from the previous PR.
-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/14991/files
 - new: https://git.openjdk.org/jdk/pull/14991/files/47a7fb2b..f4a9dd75

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=02-03

  Stats: 42765 lines in 1650 files changed: 23201 ins; 8068 del; 11496 mod
  Patch: https://git.openjdk.org/jdk/pull/14991.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14991/head:pull/14991

PR: https://git.openjdk.org/jdk/pull/14991

From mdoerr at openjdk.org Mon Aug 28 10:17:16 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 28 Aug 2023 10:17:16 GMT Subject: RFR: 8299658: C1 compilation crashes in LinearScan::resolve_exception_edge In-Reply-To: References: Message-ID: <8Az4QIw_psvPb9W1IrPK-TacyQ3vlt6oD-7ZbN_XvXk=.c8979849-a27f-48b9-8335-97ee8a307dbe@github.com>

On Fri, 18 Aug 2023 20:17:52 GMT, Martin Doerr wrote:

> This is a quick fix for the C1 problem described in the JBS issue.
> When we find an illegal operand (modelled by nullptr) while resolving an exception edge, we can propagate this state to the phi function and skip the edge.
>
> If somebody finds a better way to propagate the "illegal" state to the phi function, I can change or close this PR.
>
> Please review. A nice regression test would be a good thing, but probably not easy to write.

Thanks for the reviews! I think it's time to verify it in the field. Let's ship it with an EA build.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15348#issuecomment-1695418297

From mdoerr at openjdk.org Mon Aug 28 10:17:16 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 28 Aug 2023 10:17:16 GMT Subject: Integrated: 8299658: C1 compilation crashes in LinearScan::resolve_exception_edge In-Reply-To: References: Message-ID:

On Fri, 18 Aug 2023 20:17:52 GMT, Martin Doerr wrote:

> This is a quick fix for the C1 problem described in the JBS issue.
> When we find an illegal operand (modelled by nullptr) while resolving an exception edge, we can propagate this state to the phi function and skip the edge.
>
> If somebody finds a better way to propagate the "illegal" state to the phi function, I can change or close this PR.
>
> Please review. A nice regression test would be a good thing, but probably not easy to write.

This pull request has now been integrated.

Changeset: cf2d33ca
Author:    Martin Doerr
URL:       https://git.openjdk.org/jdk/commit/cf2d33ca2ee08c61596ab10b7602500a6931fa31
Stats:     8 lines in 1 file changed: 8 ins; 0 del; 0 mod

8299658: C1 compilation crashes in LinearScan::resolve_exception_edge

Reviewed-by: thartmann, lucy

-------------

PR: https://git.openjdk.org/jdk/pull/15348

From chagedorn at openjdk.org Mon Aug 28 10:31:10 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Aug 2023 10:31:10 GMT Subject: RFR: 8314513: [IR Framework] Some internal IR Framework tests are failing after JDK-8310308 on PPC and Cascade Lake In-Reply-To: References: Message-ID:

On Thu, 24 Aug 2023 11:51:11 GMT, Christian Hagedorn wrote:

> This patch fixes some internal IR framework failures after [JDK-8310308](https://bugs.openjdk.org/browse/JDK-8310308):
> - `testlibrary_tests/ir_framework/tests/TestBadFormat.java` on Linux ppc64le:
>   - `applyIfCPUFeature` clauses are false and the rule is not run. We will therefore not hit the format violations which the test expects to find. The fix here is to remove the `applyIfCPUFeature` constraints as the test is only interested in properly reporting format violations.
> - `testlibrary_tests/ir_framework/examples/IRExample.java` on Cascade Lake x86_64:
>   - On Cascade Lake, `failOn` constraints need the same "always true" handling as `counts` constraints. This was missed in JDK-8310308. I've added the same `try-catch` handling as in `RawCountsConstraint::parse()`:
>     https://github.com/openjdk/jdk/blob/97b94cb1cdeba00f4bba7326a300c0336950f3ec/test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java#L97-L104
>
> Thanks to @MBaesken and @TheRealMDoerr for reporting this and helping with some pre-PR testing. Would you like to rerun your testing on PPC and Cascade Lake again?
>
> Thanks,
> Christian

Thanks Martin!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15415#issuecomment-1695439917

From duke at openjdk.org Mon Aug 28 10:32:55 2023 From: duke at openjdk.org (Ilya Gavrilin) Date: Mon, 28 Aug 2023 10:32:55 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v5] In-Reply-To: References: Message-ID: <8vCtVR7KyiI-3e_oaNz5IR7e2_Hzl-lQ7LxWxlD7cPs=.4a4e417b-ea95-400a-8653-db04ed041e28@github.com>

> Please review these changes to the RISC-V double rounding intrinsics.
>
> On RISC-V, intrinsics for rounding doubles with a rounding mode (like Math.ceil/floor/rint) were missing. RISC-V has no dedicated instruction for such a conversion, so a two-step conversion is used: double -> long int -> double (using fcvt.l.d, fcvt.d.l).
>
> Also, we need to provide a rounding mode to the fcvt.x.x instructions.
>
> Rounding mode selection for ceil (similar for floor and rint), according to the Math.ceil requirements:
>
>> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475).
>
> For double -> long int, we choose the rup (round towards +inf) mode to get an integer that is greater than or equal to the input value.
> For long int -> double, we choose the rdn (round towards -inf) mode to get the smallest (closest to -inf) representation of the integer obtained from the first conversion.
>
> For inf, NaN, or values whose magnitude exceeds 2^63, the input value is returned (such a large double is guaranteed to be an integer).
> Also, when storing the result, we copy the sign from the input value (needed because ceil must return -0.0 for inputs in (-1.0, 0.0)).
>
> We have observed a significant improvement on HiFive and T-Head boards.
>
> Testing: tier1, tier2 and hotspot:tier3 on HiFive
>
> Performance results on HiFive (FpRoundingBenchmark.testceil/floor/rint):
>
> Without intrinsic:
>
> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.testceil         1024  thrpt   25  39.297 ± 0.037  ops/ms
> FpRoundingBenchmark.testfloor        1024  thrpt   25  39.398 ± 0.018  ops/ms
> FpRoundingBenchmark.testrint         1024  thrpt   25  36.388 ± 0.844  ops/ms
>
> With intrinsic:
>
> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.testceil         1024  thrpt   25  80.560 ± 0.053  ops/ms
> FpRoundingBenchmark.testfloor        1024  thrpt   25  80.541 ± 0.081  ops/ms
> FpRoundingBenchmark.testrint         1024  thrpt   25  80.603 ± 0.071  ops/ms

Ilya Gavrilin has updated the pull request incrementally with three additional commits since the last revision:

 - Remove unused cr flag in node
 - Merge branch 'JDK-8312569' of github.com:Ilyagavrilin/jdk into JDK-8312569
 - Move round intrinsic to c2_MacroAssembler_riscv

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/14991/files
 - new: https://git.openjdk.org/jdk/pull/14991/files/f4a9dd75..f6dd7b16

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=03-04

  Stats: 150 lines in 5 files changed: 63 ins; 84 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/14991.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14991/head:pull/14991

PR: https://git.openjdk.org/jdk/pull/14991

From chagedorn at openjdk.org Mon Aug 28 10:34:18 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Aug 2023 10:34:18 GMT Subject: Integrated: 8314513: [IR Framework] Some internal IR Framework tests are failing after JDK-8310308 on PPC and Cascade Lake In-Reply-To: References: Message-ID: <5gWktDX8v-T6mEf6R6mEkzs21SL_kQjU2RKlO_L4k3U=.59731871-cb99-40ea-b9a1-b269f804973c@github.com>

On Thu, 24 Aug 2023 11:51:11 GMT, Christian Hagedorn wrote:

> This patch fixes some internal IR framework failures after [JDK-8310308](https://bugs.openjdk.org/browse/JDK-8310308):
> - `testlibrary_tests/ir_framework/tests/TestBadFormat.java` on Linux ppc64le:
>   - `applyIfCPUFeature` clauses are false and the rule is not run. We will therefore not hit the format violations which the test expects to find. The fix here is to remove the `applyIfCPUFeature` constraints as the test is only interested in properly reporting format violations.
> - `testlibrary_tests/ir_framework/examples/IRExample.java` on Cascade Lake x86_64:
>   - On Cascade Lake, `failOn` constraints need the same "always true" handling as `counts` constraints. This was missed in JDK-8310308. I've added the same `try-catch` handling as in `RawCountsConstraint::parse()`:
>     https://github.com/openjdk/jdk/blob/97b94cb1cdeba00f4bba7326a300c0336950f3ec/test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java#L97-L104
>
> Thanks to @MBaesken and @TheRealMDoerr for reporting this and helping with some pre-PR testing. Would you like to rerun your testing on PPC and Cascade Lake again?
>
> Thanks,
> Christian

This pull request has now been integrated.

Changeset: 5c4f1dc4
Author:    Christian Hagedorn
URL:       https://git.openjdk.org/jdk/commit/5c4f1dc43ebd1ad699923e0082cfed72ba414982
Stats:     30 lines in 2 files changed: 5 ins; 12 del; 13 mod

8314513: [IR Framework] Some internal IR Framework tests are failing after JDK-8310308 on PPC and Cascade Lake

Reviewed-by: kvn, mdoerr

-------------

PR: https://git.openjdk.org/jdk/pull/15415

From rehn at openjdk.org Mon Aug 28 10:57:13 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 28 Aug 2023 10:57:13 GMT Subject: RFR: 8315070: RISC-V: Clean up platform dependent inline headers In-Reply-To: <6KZojI_U1KT3lQq8wJ0XZuNDC69ArpTWOr_LmIXAV80=.c993bbb2-bf80-4e8f-9d77-f7de952a51d5@github.com> References: <6KZojI_U1KT3lQq8wJ0XZuNDC69ArpTWOr_LmIXAV80=.c993bbb2-bf80-4e8f-9d77-f7de952a51d5@github.com> Message-ID:

On Sat, 26 Aug 2023 11:55:57 GMT, Feilong Jiang wrote:

> Hi team, please review this small clean-up change.
> Inspired by [JDK-8267464](https://bugs.openjdk.org/browse/JDK-8267464), the riscv port still has one place that includes the platform-dependent inline header `assembler_riscv.inline.hpp`; it could be replaced with the platform-independent header `asm/assembler.inline.hpp`.
>
> Testing:
> - [x] release build on linux-riscv64
> - [x] tier1 on linux-riscv64 with release build

Thank you!

-------------

Marked as reviewed by rehn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15437#pullrequestreview-1598057453

From duke at openjdk.org Mon Aug 28 11:01:48 2023 From: duke at openjdk.org (Ilya Gavrilin) Date: Mon, 28 Aug 2023 11:01:48 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v6] In-Reply-To: References: Message-ID: <4qYfHNw3G57_Mfqo6R5YNR8wwijGYZCke4_VwxNNsOI=.d3a3844a-781b-444f-8de6-4e3109b0d499@github.com>

> Please review these changes to the RISC-V double rounding intrinsics.
>
> On RISC-V, intrinsics for rounding doubles with a rounding mode (like Math.ceil/floor/rint) were missing. RISC-V has no dedicated instruction for such a conversion, so a two-step conversion is used: double -> long int -> double (using fcvt.l.d, fcvt.d.l).
>
> Also, we need to provide a rounding mode to the fcvt.x.x instructions.
>
> Rounding mode selection for ceil (similar for floor and rint), according to the Math.ceil requirements:
>
>> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475).
>
> For double -> long int, we choose the rup (round towards +inf) mode to get an integer that is greater than or equal to the input value.
> For long int -> double, we choose the rdn (round towards -inf) mode to get the smallest (closest to -inf) representation of the integer obtained from the first conversion.
>
> For inf, NaN, or values whose magnitude exceeds 2^63, the input value is returned (such a large double is guaranteed to be an integer).
> Also, when storing the result, we copy the sign from the input value (needed because ceil must return -0.0 for inputs in (-1.0, 0.0)).
>
> We have observed a significant improvement on HiFive and T-Head boards.
>
> Testing: tier1, tier2 and hotspot:tier3 on HiFive
>
> Performance results on HiFive (FpRoundingBenchmark.testceil/floor/rint):
>
> Without intrinsic:
>
> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.testceil         1024  thrpt   25  39.297 ± 0.037  ops/ms
> FpRoundingBenchmark.testfloor        1024  thrpt   25  39.398 ± 0.018  ops/ms
> FpRoundingBenchmark.testrint         1024  thrpt   25  36.388 ± 0.844  ops/ms
>
> With intrinsic:
>
> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.testceil         1024  thrpt   25  80.560 ± 0.053  ops/ms
> FpRoundingBenchmark.testfloor        1024  thrpt   25  80.541 ± 0.081  ops/ms
> FpRoundingBenchmark.testrint         1024  thrpt   25  80.603 ± 0.071  ops/ms

Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision:

  Fix whitespaces in c2_MacroAssembler

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/14991/files
 - new: https://git.openjdk.org/jdk/pull/14991/files/f6dd7b16..492fb25c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=04-05

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/14991.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14991/head:pull/14991

PR: https://git.openjdk.org/jdk/pull/14991

From epeter at openjdk.org Mon Aug 28 11:06:11 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 28 Aug 2023 11:06:11 GMT Subject: RFR: 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE In-Reply-To: References: <-McZdVKFHZcQcCJhosf7KVw34o6ZcAHr0hqGH7QIqsw=.eafa3a9e-f521-4097-aa6c-b00d5302b63d@github.com> Message-ID:

On Mon, 28 Aug 2023 08:58:13 GMT, Pengfei Li wrote:

>> @pfustc you will have to merge the changes from [JDK-8310308](https://bugs.openjdk.org/browse/JDK-8310308).
>
> Thanks @eme64 @vnkozlov for the review. The patch is currently merged with master. The Python generator file is also updated and attached to the JBS page.
>
>> Ah. Just one more idea: Since you now have even longer vector widths with 2048 bits: Should we not add some cases with even larger dependency offsets? We should go further than -196, 196. We could consider adding 255, 256, 511, 512, 1024, 1536 (positive and negative). Of course the question is if that increases the runtime too much, what do you think?
>
> This is a good suggestion, but I don't think it's necessary now or in the near future. Although the Arm architecture documentation says the SVE vector length can be at most 2048 bits, so far the largest SVE vector length of real AArch64 CPUs is 512 bits. AFAIK, there will not be real CPUs with 1024-bit SVE in the short term. So I prefer keeping the current offset values for now.
We could consider adding 255, 256, 511, 512, 1024, 1536 (positive and negative). Of course the question is if that increases the runtime too much, what do you think? > > This is a good suggestion but I don't think it's necessary now and in the near future. Although Arm architecture document says SVE vector length can be at most 2048 bits, so far, the largest SVE vector length of real AArch64 CPUs in the world is 512 bits. AFAIK, there will not be real CPUs with 1024-bit SVE in the short term. So I prefer keeping current offset values for now. @pfustc ok, fair enough. If there is no real hardware, and not coming anytime soon, then we can leave the offsets as is. Thanks again for the fix! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15010#issuecomment-1695495355 From chagedorn at openjdk.org Mon Aug 28 11:09:15 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Aug 2023 11:09:15 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v5] In-Reply-To: <8axqn5CwX2h1-lA3zpDuuWluj70sYV6NwY8f5iEyXW0=.b88c04c0-ff31-41a8-a94c-cbdf5b3fdfb2@github.com> References: <8axqn5CwX2h1-lA3zpDuuWluj70sYV6NwY8f5iEyXW0=.b88c04c0-ff31-41a8-a94c-cbdf5b3fdfb2@github.com> Message-ID: <5USj_vkpPwgmp0LZO1u8S_udGf67ry53CZP2PRtecpY=.6906b667-18ca-46cd-8179-e7ad204a9981@github.com> On Mon, 28 Aug 2023 08:30:07 GMT, Pengfei Li wrote: >> This patch removes `@require vm.flagless` annotations from HotSpot jtreg tests in `compiler/vectorization/runner`. All jtreg cases in this folder are invoked by test driver `VectorizationTestRunner.java` which checks both correctness and vectorizability (IR) for each test method. We added flagless requirement before because extra flags may mess with compiler control in the test driver for correctness check. But `flagless` has a side effect that it makes tests with extra flags skipped. So we propose to get rid of it now. 
>> >> To adapt the removal of `@require vm.flagless`, a few checks are added in the test driver to skip the correctness check if extra flags make the compiler control not work. This patch also moves previously hard-coded flag `-XX:-OptimizeFill` in the test driver to conditions in IR checks. >> >> Tested various of compiler control related VM flags on x86 and AArch64. > > Pengfei Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into flagless > - Remove useless conditions and imports > - Revert to the 1st commit and re-address comments > - Re-work correctness check to allow "-Xbatch" > - 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests > > This patch removes `@require vm.flagless` annotations from HotSpot jtreg > tests in `compiler/vectorization/runner`. All jtreg cases in this folder > are invoked by test driver `VectorizationTestRunner.java` which checks > both correctness and vectorizability (IR) for each test method. We added > flagless requirement before because extra flags may mess with compiler > control in the test driver for correctness check. But `flagless` has a > side effect that it makes tests with extra flags skipped. So we propose > to get rid of it now. > > To adapt the removal of `@require vm.flagless`, a few checks are added > in the test driver to skip the correctness check if extra flags make the > compiler control not work. This patch also moves previously hard-coded > flag `-XX:-OptimizeFill` in the test driver to conditions in IR checks. > > Tested various of compiler control related VM flags on x86 and AArch64. I will also give this another spin in our testing. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15011#issuecomment-1695499328 From rcastanedalo at openjdk.org Mon Aug 28 12:05:17 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 28 Aug 2023 12:05:17 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v12] In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 07:53:12 GMT, emmyyin wrote: >> ### Purpose >> >> IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, it makes each layout more readable. But on the other hand, the layout for two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that has occurred to the graph and can damage the internal understanding of the graph that the viewer has obtained. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. >> >> The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". >> >> The difference between the algorithms is illustrated in the following video: >> >> >> https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 >> >> >> This work is the result of my Master's thesis which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf).
>> >> >> ### Implementation >> >> The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appears in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager` has been added which holds the core algorithm. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. >> >> If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions on. >> >> The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, that holds the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. >> >> Since `HierarchicalStableLayoutManager` tries to preserve the node positi... > > emmyyin has updated the pull request incrementally with one additional commit since the last revision: > > only do sanityCheckEdges where necessary

As far as I understand, the only issue left to be addressed in this PR is the performance overhead caused by `sanityCheckEdges()`. I (together with @tobiasholenstein and @chhagedorn) suggest to replace it with a simpler method that only contains the logic strictly necessary for layout correctness, something like:

    private void ensureNeighborEdgeConsistency() {
        for (LayoutNode n : nodes) {
            n.succs.removeIf(e -> !nodes.contains(e.to));
            n.preds.removeIf(e -> !nodes.contains(e.from));
        }
    }

This does not entirely remove the overhead but at least mitigates it to an acceptable level. While at it, could you remove the unused `sanityCheck...()` methods? Thanks!

------------- Changes requested by rcastanedalo (Reviewer).
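The suggested `ensureNeighborEdgeConsistency()` can be exercised in isolation. The sketch below is a minimal, self-contained model, not IGV code: `LayoutNode`, `LayoutEdge`, and `EdgeConsistency` are hypothetical stand-ins that keep only the fields the method touches (`succs`, `preds`, `from`, `to`). One assumption is added on top of the suggestion: if `nodes` is a `List`, every `contains()` call is a linear scan, so the sketch copies it into a `HashSet` once. That keeps the pass linear in the number of edges, which may matter given the overhead concern above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal stand-ins for IGV's LayoutNode/LayoutEdge.
final class LayoutNode {
    final String name;
    final List<LayoutEdge> succs = new ArrayList<>();
    final List<LayoutEdge> preds = new ArrayList<>();

    LayoutNode(String name) {
        this.name = name;
    }
}

final class LayoutEdge {
    final LayoutNode from;
    final LayoutNode to;

    // Registers the edge with both endpoints, as a layout graph would.
    LayoutEdge(LayoutNode from, LayoutNode to) {
        this.from = from;
        this.to = to;
        from.succs.add(this);
        to.preds.add(this);
    }
}

final class EdgeConsistency {
    // Drops every edge whose opposite endpoint is no longer in the layout,
    // mirroring the suggested ensureNeighborEdgeConsistency().
    static void ensureNeighborEdgeConsistency(List<LayoutNode> nodes) {
        Set<LayoutNode> live = new HashSet<>(nodes); // O(1) membership checks
        for (LayoutNode n : nodes) {
            n.succs.removeIf(e -> !live.contains(e.to));
            n.preds.removeIf(e -> !live.contains(e.from));
        }
    }

    public static void main(String[] args) {
        LayoutNode a = new LayoutNode("a");
        LayoutNode b = new LayoutNode("b");
        LayoutNode c = new LayoutNode("c");
        new LayoutEdge(a, b);
        new LayoutEdge(b, c);
        // Node c has been removed from the layout; only a and b remain.
        ensureNeighborEdgeConsistency(List.of(a, b));
        System.out.println(a.succs.size() + " " + b.succs.size() + " " + b.preds.size());
    }
}
```

Running `main` prints `1 0 1`. The edge `a -> b` survives, and the dangling edge `b -> c` is dropped from `b.succs`.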
PR Review: https://git.openjdk.org/jdk/pull/14349#pullrequestreview-1598153971 From chagedorn at openjdk.org Mon Aug 28 12:46:40 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Aug 2023 12:46:40 GMT Subject: RFR: 8314997: Missing optimization opportunities due to missing try_clean_mem_phi() calls Message-ID: While working on a Valhalla bug, I've noticed that we sometimes miss `RegionNode::try_clean_mem_phi()` calls to remove a useless diamond

      If
  True  False
     Region

with only a single memory phi. This blocks further optimizations like converting a loop into a counted one. The code in Valhalla looks slightly different but the problem is also reproducible in mainline.

**Problem**

In the test case, a region is transformed in IGVN such that it merges a diamond without any dependencies on both paths. The region has two phis. One of them is a memory phi which could be transformed by `RegionNode::try_clean_mem_phi()`. But when processing the region with its two phis in IGVN, we do not optimize the memory phi away because `has_unique_phi()` is false and we bail out: https://github.com/openjdk/jdk/blob/725ec0ce1b463b21cd4c5287cf4ccbee53ec7349/src/hotspot/share/opto/cfgnode.cpp#L450-L471 Later in IGVN, the second phi dies and we only have the single memory phi left. But the region will not be added to the IGVN worklist again to re-apply `try_clean_mem_phi()`. We therefore miss the removal of the diamond and we fail to apply further optimizations. In the test case, we fail to convert the loop into a counted loop.

**Proposed Fix**

The fix I propose is to try to apply `try_clean_mem_phi()` whenever a region is merging a diamond with the assumption that the transformation of a memory phi does not hurt when being applied without being able to remove the region with the diamond (because there are other phis left that cannot be removed). Another option would be to re-add the region to the IGVN worklist when the second last phi dies.
But the first approach seems simpler and less invasive. I've also applied some clean-ups and added an IR test. Thanks, Christian ------------- Commit messages: - 8314997: Missing optimizations due to missing try_clean_mem_phi() calls Changes: https://git.openjdk.org/jdk/pull/15445/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15445&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314997 Stats: 263 lines in 3 files changed: 205 ins; 34 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/15445.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15445/head:pull/15445 PR: https://git.openjdk.org/jdk/pull/15445 From duke at openjdk.org Mon Aug 28 13:54:05 2023 From: duke at openjdk.org (emmyyin) Date: Mon, 28 Aug 2023 13:54:05 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v13] In-Reply-To: References: Message-ID: > ### Purpose > > IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, it makes each layout more readable. But on the other hand, the layout for two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that has occurred to the graph and can damage the internal understanding of the graph that the viewer has obtained. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. 
> > The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". > > The difference between the algorithms is illustrated in the following video: > > > https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 > > > This work is the result of my Master's thesis which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). > > > ### Implementation > > The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appears in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager` has been added which holds the core algorithm. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. > > If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions on. > > The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, that holds the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. > > Since `HierarchicalStableLayoutManager` tries to preserve the node positions, the layouts might get unreadable after a fe... 
emmyyin has updated the pull request incrementally with one additional commit since the last revision: redoing edge sanity check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14349/files - new: https://git.openjdk.org/jdk/pull/14349/files/c9ea772c..3418f070 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=11-12 Stats: 49 lines in 1 file changed: 0 ins; 43 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/14349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14349/head:pull/14349 PR: https://git.openjdk.org/jdk/pull/14349 From chagedorn at openjdk.org Mon Aug 28 14:19:24 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Aug 2023 14:19:24 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v13] In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 13:54:05 GMT, emmyyin wrote: >> ### Purpose >> >> IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, it makes each layout more readable. But on the other hand, the layout for two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that has occurred to the graph and can damage the internal understanding of the graph that the viewer has obtained. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. 
>> >> The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". >> >> The difference between the algorithms is illustrated in the following video: >> >> >> https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 >> >> >> This work is the result of my Master's thesis which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). >> >> >> ### Implementation >> >> The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appears in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager` has been added which holds the core algorithm. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. >> >> If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions on. >> >> The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, that holds the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. >> >> Since `HierarchicalStableLayoutManager` tries to preserve the node positi... > > emmyyin has updated the pull request incrementally with one additional commit since the last revision: > > redoing edge sanity check Thanks for the updates! There are only some easy clean-up suggestions left and a copyright year to be updated. Otherwise, it looks good to me. Thanks for your work! As discussed, we can move everything else to follow-up tasks. 
src/utils/IdealGraphVisualizer/HierarchicalLayout/pom.xml line 51:

> 49:
> 50: com.sun.hotspot.igv
> 51: Layout

You should also update the copyright year of this file.

src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 283:

> 281: // Whether the node has non-self reversed edges going downwards.
> 282: // If so, reversed edges going upwards are drawn to the left.
> 283: boolean hasReversedDown = reversedDown.size() > 0;

Suggestion:

boolean hasReversedDown = !reversedDown.isEmpty();

src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 285:

> 283: boolean hasReversedDown = reversedDown.size() > 0;
> 284:
> 285: SortedSet reversedUp = null;

Suggestion:

SortedSet reversedUp;

src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 388:

> 386:
> 387: node.xOffset = -minX;
> 388: node.width += -minX;

Somewhat confusing. Rather use:

Suggestion:

node.width -= minX;

src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 425:

> 423:
> 424: // Only apply updates if there are any
> 425: if (linkActions.size() > 0 || vertexActions.size() > 0) {

Suggestion:

if (!linkActions.isEmpty() || !vertexActions.isEmpty()) {

src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java line 1361:

> 1359: LayoutNode cur = e.from;
> 1360: LayoutEdge curEdge = e;
> 1361: while (cur.vertex == null && cur.preds.size() != 0) {

Suggestion:

while (cur.vertex == null && !cur.preds.isEmpty()) {

------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14349#pullrequestreview-1598362810 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1307467333 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1307473308 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1307473606 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1307474584 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1307475421 PR Review Comment: https://git.openjdk.org/jdk/pull/14349#discussion_r1307480632 From duke at openjdk.org Mon Aug 28 14:29:47 2023 From: duke at openjdk.org (Ilya Gavrilin) Date: Mon, 28 Aug 2023 14:29:47 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v7] In-Reply-To: References: Message-ID: > Please review these changes to the risc-v double rounding intrinsics. > > On risc-v, intrinsics for rounding doubles with a mode (like Math.ceil/floor/rint) were missing. On risc-v we don't have a special instruction for such a conversion, so a two-step conversion was used: double -> long int -> double (using fcvt.l.d, fcvt.d.l). > > Also, we should provide a rounding mode to the fcvt.x.x instruction. > > Rounding mode selection on ceil (similar for floor and rint): according to Math.ceil requirements: > >> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475). > > For double -> long int we choose rup (round towards +inf) mode to get an integer that is greater than or equal to the input value. > For long int -> double we choose rdn (round towards -inf) mode to get the smallest (closest to -inf) representation of the integer we got after the conversion. > > For inf, NaN, or a value greater than 2^63, the input value is returned (a double greater than 2^63 is guaranteed to be an integer).
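The double -> long -> double scheme above can be mimicked in plain Java to check its edge cases against `Math.ceil`/`Math.floor`. The cases covered are the NaN/infinity/large-value pass-through and copying the input's sign so that, for example, ceil(-0.5) yields -0.0. This is a minimal sketch under stated assumptions, not the RISC-V intrinsic. Java's `(long)` cast always truncates toward zero, so instead of selecting the `rup`/`rdn` rounding modes the sketch corrects the truncated value by one. The class name `RoundTripRounding` is hypothetical, and the pass-through test on `|x| >= 2^63` is an assumption that also covers large negative inputs.

```java
// Hypothetical sketch: ceil/floor through a double -> long -> double round
// trip, with a pass-through guard and a final sign copy.
final class RoundTripRounding {
    private static final double TWO_POW_63 = 0x1.0p63;

    // NaN, infinities and |x| >= 2^63 cannot round-trip through a long;
    // such doubles are already integral (or NaN/inf), so return them as-is.
    private static boolean passThrough(double x) {
        return Double.isNaN(x) || Math.abs(x) >= TWO_POW_63;
    }

    static double ceil(double x) {
        if (passThrough(x)) {
            return x;
        }
        long t = (long) x;          // truncates toward zero
        if (t < x) {
            t++;                    // smallest integer >= x
        }
        // Copying the sign of x turns ceil(-0.5) into -0.0, as Math.ceil does.
        return Math.copySign((double) t, x);
    }

    static double floor(double x) {
        if (passThrough(x)) {
            return x;
        }
        long t = (long) x;          // truncates toward zero
        if (t > x) {
            t--;                    // largest integer <= x
        }
        // Keeps -0.0 as -0.0, matching Math.floor.
        return Math.copySign((double) t, x);
    }

    public static void main(String[] args) {
        double[] samples = {1.2, -1.2, -0.5, 0.5, -0.0, 3.0, 1e300, Double.NaN};
        for (double x : samples) {
            System.out.println(x + " -> ceil " + ceil(x) + ", floor " + floor(x));
        }
    }
}
```

The sign copy never changes a non-zero result. For positive inputs both results are >= 0, and for negative inputs ceil yields a value <= 0 while floor yields a value < 0. The copy therefore only matters when the rounded value is zero.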
> When storing the result, we also copy the sign from the input value (needed because ceil must return -0.0 for inputs in (-1.0, 0.0)). > > We have observed significant improvement on hifive and thead boards. > > testing: tier1, tier2 and hotspot:tier3 on hifive > > Performance results on hifive (FpRoundingBenchmark.testceil/floor/rint):
>
> Without intrinsic:
>
> Benchmark                     (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.testceil        1024  thrpt   25  39.297 ± 0.037  ops/ms
> FpRoundingBenchmark.testfloor       1024  thrpt   25  39.398 ± 0.018  ops/ms
> FpRoundingBenchmark.testrint        1024  thrpt   25  36.388 ± 0.844  ops/ms
>
> With intrinsic:
>
> Benchmark                     (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.testceil        1024  thrpt   25  80.560 ± 0.053  ops/ms
> FpRoundingBenchmark.testfloor       1024  thrpt   25  80.541 ± 0.081  ops/ms
> FpRoundingBenchmark.testrint        1024  thrpt   25  80.603 ± 0.071  ops/ms

Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: Fix intrinsic round_node parameter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14991/files - new: https://git.openjdk.org/jdk/pull/14991/files/492fb25c..82b5e593 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=05-06 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14991.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14991/head:pull/14991 PR: https://git.openjdk.org/jdk/pull/14991 From chagedorn at openjdk.org Mon Aug 28 14:46:26 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Aug 2023 14:46:26 GMT Subject: RFR: 8305637: Remove Opaque1 nodes for Parse Predicates and clean up useless predicate elimination Message-ID: This is the last clean-up PR before the complete fix for Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)).
This patch includes:
- Removal of `ConI`->`Opaque1`->`Conv2B` input nodes for `ParsePredicateNodes` with the following additional changes:
  - Adjusting `ParsePredicateNode` to block unwanted optimizations (added empty `ParsePredicateNode::Ideal()`).
  - Changing `Compile::_parse_predicate_opaqs` to not store `Opaque1Nodes` to keep track of Parse Predicates but instead storing `ParsePredicateNodes` directly. Renamed to `Compile::_parse_predicates` and adjusted related methods.
  - Removed asserts matching `Opaque1` -> `Conv2B` shape.
- Cleaning up `eliminate_useless_predicates()`:
  - Adjust code to find useful/useless Parse Predicates with the new `Compile::_parse_predicates` list with `ParsePredicateNodes` instead of `Opaque1Nodes`.
  - Changing `ParsePredicateNode` to carry a `_useless` state which simplifies the elimination of useless predicates with `eliminate_useless_predicates()` and during IGVN (added `ParsePredicateNode::Value()` for that which also removes the predicate once we are in post loop opts IGVN).
- Some refactoring/clean-ups of involved code.
Testing: tier1-7 + some fuzzer testing Thanks, Christian ------------- Commit messages: - 8305637: Remove Opaque1 nodes for Parse Predicates and clean up useless predicate elimination Changes: https://git.openjdk.org/jdk/pull/15449/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15449&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305637 Stats: 319 lines in 13 files changed: 164 ins; 70 del; 85 mod Patch: https://git.openjdk.org/jdk/pull/15449.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15449/head:pull/15449 PR: https://git.openjdk.org/jdk/pull/15449 From chagedorn at openjdk.org Mon Aug 28 15:05:16 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 28 Aug 2023 15:05:16 GMT Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests [v5] In-Reply-To: <8axqn5CwX2h1-lA3zpDuuWluj70sYV6NwY8f5iEyXW0=.b88c04c0-ff31-41a8-a94c-cbdf5b3fdfb2@github.com> References: <8axqn5CwX2h1-lA3zpDuuWluj70sYV6NwY8f5iEyXW0=.b88c04c0-ff31-41a8-a94c-cbdf5b3fdfb2@github.com> Message-ID: On Mon, 28 Aug 2023 08:30:07 GMT, Pengfei Li wrote: >> This patch removes `@require vm.flagless` annotations from HotSpot jtreg tests in `compiler/vectorization/runner`. All jtreg cases in this folder are invoked by test driver `VectorizationTestRunner.java` which checks both correctness and vectorizability (IR) for each test method. We added flagless requirement before because extra flags may mess with compiler control in the test driver for correctness check. But `flagless` has a side effect that it makes tests with extra flags skipped. So we propose to get rid of it now. >> >> To adapt the removal of `@require vm.flagless`, a few checks are added in the test driver to skip the correctness check if extra flags make the compiler control not work. This patch also moves previously hard-coded flag `-XX:-OptimizeFill` in the test driver to conditions in IR checks. 
>> >> Tested various of compiler control related VM flags on x86 and AArch64. > > Pengfei Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into flagless > - Remove useless conditions and imports > - Revert to the 1st commit and re-address comments > - Re-work correctness check to allow "-Xbatch" > - 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests > > This patch removes `@require vm.flagless` annotations from HotSpot jtreg > tests in `compiler/vectorization/runner`. All jtreg cases in this folder > are invoked by test driver `VectorizationTestRunner.java` which checks > both correctness and vectorizability (IR) for each test method. We added > flagless requirement before because extra flags may mess with compiler > control in the test driver for correctness check. But `flagless` has a > side effect that it makes tests with extra flags skipped. So we propose > to get rid of it now. > > To adapt the removal of `@require vm.flagless`, a few checks are added > in the test driver to skip the correctness check if extra flags make the > compiler control not work. This patch also moves previously hard-coded > flag `-XX:-OptimizeFill` in the test driver to conditions in IR checks. > > Tested various of compiler control related VM flags on x86 and AArch64. Testing looked good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15011#pullrequestreview-1598486980 From roland at openjdk.org Mon Aug 28 15:21:19 2023 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 28 Aug 2023 15:21:19 GMT Subject: RFR: 8305637: Remove Opaque1 nodes for Parse Predicates and clean up useless predicate elimination In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 14:38:34 GMT, Christian Hagedorn wrote: > This is the last clean-up PR before the complete fix for Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > This patch includes: > - Removal of `ConI`->`Opaque1`->`Conv2B` input nodes for `ParsePredicateNodes` with the following additional changes: > - Adjusting `ParsePredicateNode` to block unwanted optimizations (added empty `ParsePredicateNode::Ideal()`). > - Changing `Compile::_parse_predicate_opaqs` to not store `Opaque1Nodes` to keep track of Parse Predicates but instead storing `ParsePredicateNodes` directly. Renamed to `Compile::_parse_predicates` and adjusted related methods. > - Removed asserts matching `Opaque1` -> `Conv2B` shape. > - Cleaning up `eliminate_useless_predicates()`: > - Adjust code to find useful/useless Parse Predicates with the new `Compile::_parse_predicates` list with `ParsePredicateNodes` instead of `Opaque1Nodes`. > - Changing `ParsePredicateNode` to carry a `_useless` state which simplifies the elimination of useless predicates with `eliminate_useless_predicates()` and during IGVN (added `ParsePredicateNode::Value()` for that which also removes the predicate once we are in post loop opts IGVN). > - Some refactoring/clean-ups of involved code. > > Testing: tier1-7 + some fuzzer testing > > Thanks, > Christian Opaque1 nodes fold away after loop opts which guarantees the parse predicate are removed too after loop opts. In the new code, without the Opaque1 nodes, what causes the parse predicate to be removed after loop opts? 
src/hotspot/share/opto/loopPredicate.cpp line 314:

> 312: assert(new_predicate_proj->is_IfTrue(), "the success projection of a Parse Predicate is a true projection");
> 313: ParsePredicateNode* parse_predicate = new_predicate_proj->in(0)->as_ParsePredicate();
> 314: _igvn.hash_delete(parse_predicate);

That looks strange. Wasn't the reason for the `hash_delete` in the previous version of the code that the `iff` was then modified? Is it still needed?

------------- PR Review: https://git.openjdk.org/jdk/pull/15449#pullrequestreview-1598472992 PR Review Comment: https://git.openjdk.org/jdk/pull/15449#discussion_r1307538018 From roland at openjdk.org Mon Aug 28 15:27:09 2023 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 28 Aug 2023 15:27:09 GMT Subject: RFR: 8314997: Missing optimization opportunities due to missing try_clean_mem_phi() calls In-Reply-To: References: Message-ID: <1jdayacZ4fejQ6KJ7EBWU3uVjlIkL35hF-heWKegFsU=.150bb15b-4b75-4b2a-b993-589f57f15d51@github.com> On Mon, 28 Aug 2023 12:38:43 GMT, Christian Hagedorn wrote:

> While working on a Valhalla bug, I've noticed that we sometimes miss `RegionNode::try_clean_mem_phi()` calls to remove a useless diamond
>
>       If
>   True  False
>      Region
>
> with only a single memory phi. This blocks further optimizations like converting a loop into a counted one. The code in Valhalla looks slightly different but the problem is also reproducible in mainline. > > **Problem** > > In the test case, a region is transformed in IGVN such that it merges a diamond without any dependencies on both paths. The region has two phis. One of them is a memory phi which could be transformed by `RegionNode::try_clean_mem_phi()`.
But when processing the region with its two phis in IGVN, we do not optimize the memory phi away because `has_unique_phi()` is false and we bail out: > https://github.com/openjdk/jdk/blob/725ec0ce1b463b21cd4c5287cf4ccbee53ec7349/src/hotspot/share/opto/cfgnode.cpp#L450-L471 > > Later in IGVN, the second phi dies and we only have the single memory phi left. But the region will not be added to the IGVN worklist again to re-apply `try_clean_mem_phi()`. We therefore miss the removal of the diamond and we fail to apply further optimizations. In the test case, we fail to convert the loop into a counted loop. > > **Proposed Fix** > > The fix I propose is to try to apply `try_clean_mem_phi()` whenever a region is merging a diamond with the assumption that the transformation of a memory phi does not hurt when being applied without being able to remove the region with the diamond (because there are other phis left that cannot be removed). Another option would be to re-add the region to the IGVN worklist when the second last phi dies. But the first approach seems simpler and less invasive. > > I've also applied some clean-ups and added an IR test. > > Thanks, > Christian Looks reasonable to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15445#pullrequestreview-1598528051 From qamai at openjdk.org Mon Aug 28 15:37:53 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 28 Aug 2023 15:37:53 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v2] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to `TypeInt` and `TypeLong`. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. 
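As a rough illustration of the value set this PR describes (signed bounds, unsigned bounds, known-zero bits and known-one bits), the C++ sketch below models the five constraints, a membership test, one direction of normalization (deriving unsigned bounds and known bits from signed bounds) and the textbook known-bits rule for AND. All names (`IntConstraints`, `from_signed_range`, `and_known_bits`) are hypothetical; this is not HotSpot's `TypeInt` code, and the PR's own normalization and per-node rules may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the constraint set (NOT HotSpot's TypeInt class):
// x is a member iff lo <= x <= hi (signed), ulo <= x <= uhi (unsigned),
// (x & zeros) == 0 and (~x & ones) == 0.
struct IntConstraints {
  int32_t lo, hi;     // signed bounds
  uint32_t ulo, uhi;  // unsigned bounds
  uint32_t zeros;     // bits known to be 0
  uint32_t ones;      // bits known to be 1

  bool contains(int32_t x) const {
    uint32_t u = static_cast<uint32_t>(x);
    return x >= lo && x <= hi && u >= ulo && u <= uhi &&
           (u & zeros) == 0 && (~u & ones) == 0;
  }
};

// One direction of normalization only: derive unsigned bounds and known bits
// from signed bounds. A complete normalization would also tighten lo/hi from
// the unsigned bounds and known bits, repeating until nothing changes.
IntConstraints from_signed_range(int32_t lo, int32_t hi) {
  IntConstraints c{lo, hi, 0u, UINT32_MAX, 0u, 0u};
  if (lo >= 0 || hi < 0) {
    // The range does not cross zero, so signed and unsigned order agree on it.
    c.ulo = static_cast<uint32_t>(lo);
    c.uhi = static_cast<uint32_t>(hi);
  }
  // Every value in [ulo, uhi] shares the bits above the highest bit in which
  // ulo and uhi differ; those bits are therefore known.
  uint32_t diff = c.ulo ^ c.uhi;
  uint32_t low_mask = 0;
  while (diff != 0) {
    low_mask = (low_mask << 1) | 1u;
    diff >>= 1;
  }
  uint32_t prefix = ~low_mask;
  c.zeros = prefix & ~c.ulo;
  c.ones = prefix & c.ulo;
  return c;
}

// Standard known-bits rule for AND: a result bit is known 0 if it is known 0
// in either input, and known 1 only if it is known 1 in both.
void and_known_bits(const IntConstraints& a, const IntConstraints& b,
                    uint32_t* zeros, uint32_t* ones) {
  *zeros = a.zeros | b.zeros;
  *ones = a.ones & b.ones;
}
```

For example, `from_signed_range(0, 3)` gives `ulo = 0`, `uhi = 3`, `zeros = 0xFFFFFFFC` and `ones = 0`, which is the [0, 3] example from the description: every bit except the lowest two is known to be zero.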
The new constraints are applied to identity and value calls of the common nodes (Add, Sub, L/R/URShift, And, Or, Xor, bit counting, Cmp, Bool, ConvI2L/L2I), the detailed ideas for each node will be presented below. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > Please kindly review, thanks very much. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15440/files - new: https://git.openjdk.org/jdk/pull/15440/files/5fc5cf3a..ae7cc260 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15440&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15440&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15440/head:pull/15440 PR: https://git.openjdk.org/jdk/pull/15440 From rcastanedalo at openjdk.org Mon Aug 28 17:37:19 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 28 Aug 2023 17:37:19 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v13] In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 13:54:05 GMT, emmyyin wrote: >> ### Purpose >> >> IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time.
A dynamic graph can be thought of as a sequence of graphs, where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On the one hand, this makes each layout more readable. On the other hand, the layouts of two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can disrupt the mental picture of the graph that the viewer has built up. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. >> >> The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". >> >> The difference between the algorithms is illustrated in the following video: >> >> >> https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 >> >> >> This work is the result of my Master's thesis, which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). >> >> >> ### Implementation >> >> The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager`, which holds the core algorithm, has been added. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. >> >> If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used, because the dynamic algorithm needs an initial layout to apply the update actions to.
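The update-action idea can be sketched generically: from two consecutive graph snapshots, compute which nodes and edges must be removed and which must be added, and leave everything else untouched so that its position can be preserved. The C++ below is a hypothetical illustration (the `Snapshot`, `Action` and `diff` names are invented here); the actual `HierarchicalStableLayoutManager` is Java code in IGV, and its action set and ordering may differ.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical snapshot of one graph in the sequence: node ids plus directed edges.
struct Snapshot {
  std::set<int> nodes;
  std::set<std::pair<int, int>> edges;
};

enum class Kind { RemoveEdge, RemoveNode, AddNode, AddEdge };

struct Action {
  Kind kind;
  int a;  // node id, or edge source
  int b;  // edge target; -1 for node actions
};

// Computes the update actions that turn `prev` into `next`. Nodes and edges
// present in both snapshots produce no action, so a layout that applies these
// actions can leave them where they are. Removals are emitted before additions.
std::vector<Action> diff(const Snapshot& prev, const Snapshot& next) {
  std::vector<Action> out;
  for (const auto& e : prev.edges)
    if (next.edges.count(e) == 0) out.push_back({Kind::RemoveEdge, e.first, e.second});
  for (int n : prev.nodes)
    if (next.nodes.count(n) == 0) out.push_back({Kind::RemoveNode, n, -1});
  for (int n : next.nodes)
    if (prev.nodes.count(n) == 0) out.push_back({Kind::AddNode, n, -1});
  for (const auto& e : next.edges)
    if (prev.edges.count(e) == 0) out.push_back({Kind::AddEdge, e.first, e.second});
  return out;
}
```

Going from nodes {1, 2} with edge 1->2 to nodes {2, 3} with edge 2->3 yields four actions (remove edge 1->2, remove node 1, add node 3, add edge 2->3), and node 2 is never touched.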
>> >> The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, that holds the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. >> >> Since `HierarchicalStableLayoutManager` tries to preserve the node positi... > > emmyyin has updated the pull request incrementally with one additional commit since the last revision: > > redoing edge sanity check Looks good, thanks again for your work and for patiently addressing our feedback, Emmy! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14349#pullrequestreview-1598746571 From jbhateja at openjdk.org Mon Aug 28 18:59:20 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 28 Aug 2023 18:59:20 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v22] In-Reply-To: References: Message-ID: <2NFjfo8C9unam7w-JOyVthNbkmTlAJH8tt56THmx6o8=.6858a6c3-dc83-46fa-adda-8ea1be6e02a0@github.com> On Fri, 25 Aug 2023 18:59:47 GMT, Srinivas Vamsi Parasa wrote: >>> Improvements are nice but it would not pay off if you have big regressions. I can accept 0.9x but 0.4x - 0.8x regressions should be investigated and implementation adjusted to avoid them. >> >> Hello Vladimir (@vnkozlov) , >> >> As per your suggestion, the implementation was adjusted to address the regressions caused for STAGGER and REPEATED type of input data patterns. >> Please see below the new JMH performance data using the adjusted implementation. >> >> In the new implementation, we don't call the AVX512 sort intrinsic at the top level (`Arrays.sort()`) . Instead, we take a decomposed or a 'building blocks' approach where we only intrinsify (using AVX512 instructions) the partitioning and small array sort functions used in the current baseline ( `DualPivotQuickSort.sort()` ). 
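To make the term concrete: a dual-pivot partition step splits a range into elements below the first pivot, between the two pivots, and above the second pivot, and reports the pivots' final positions so that the caller can recurse on the three parts. The scalar C++ sketch below is a textbook version written for illustration only. It is not the code of `DualPivotQuicksort` or of the AVX-512 intrinsic; it simply takes the first and last elements as pivots (the JDK chooses pivots from several sampled candidates), and the function name and the two-element `pivot_indices` output are assumptions.

```cpp
#include <cassert>
#include <utility>

// Partitions a[low, high) (requires high - low >= 2) around p1 = a[low] and
// p2 = a[high - 1], swapped first so that p1 <= p2. Afterwards:
//   a[low, k1) < p1, a[k1] == p1, p1 <= a(k1, k2) <= p2, a[k2] == p2, a(k2, high) > p2
// and the final pivot positions k1, k2 are written to pivot_indices[0..1].
void dual_pivot_partition(int* a, int low, int high, int pivot_indices[2]) {
  int last = high - 1;
  if (a[low] > a[last]) std::swap(a[low], a[last]);
  const int p1 = a[low];
  const int p2 = a[last];
  int lt = low + 1;   // a[low + 1, lt) < p1
  int gt = last - 1;  // a(gt, last) > p2
  int i = low + 1;    // a[lt, i) is between the pivots; a[i, gt] is unclassified
  while (i <= gt) {
    if (a[i] < p1) {
      std::swap(a[i], a[lt]);
      ++lt;
      ++i;
    } else if (a[i] > p2) {
      std::swap(a[i], a[gt]);  // the element swapped into a[i] is examined next
      --gt;
    } else {
      ++i;
    }
  }
  // Move the pivots into their final slots between the three groups.
  --lt;
  ++gt;
  std::swap(a[low], a[lt]);
  std::swap(a[last], a[gt]);
  pivot_indices[0] = lt;
  pivot_indices[1] = gt;
}
```

For `{5, 1, 9, 3, 7, 2, 8}` the pivots are 5 and 8; the range becomes `{3, 1, 2, 5, 7, 8, 9}` with pivot indices 3 and 5.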
Since the current baseline has logic to detect and sort special input patterns like STAGGER, we fallback to the current baseline instead of using AVX512 partitioning and sorting (which works best for RANDOM, REPEATED and SHUFFLE patterns). >> >> Data below shows `Arrays.sort()` performance comparison between the current **Java baseline (DPQS)** vs. **AVX512 sort** (this PR) using the `ArraysSort.java` JMH [benchmark](https://github.com/openjdk/jdk/pull/13568/files#diff-dee51b13bd1872ff455cec2f29255cfd25014a5dd33dda55a2fc68638c3dd4b2) provided in the PR for [JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)](https://github.com/openjdk/jdk/pull/13568/files#top) ( #13568) >> >> - The following command line was used to run the benchmarks: ` java -jar $JDK_HOME/build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -jvmArgs "-XX:CompileThreshold=1 -XX:-TieredCompilation" ArraysSort` >> - The scores shown are the average time (us/op), thus lower is better. The last column towards the right shows the speedup. >> >> >> | Benchmark | Mode | Size | Baseline DPQS (us/op) | AVX512 partitioning & sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | --- | >> | ArraysSort.Double.testSort | RANDOM | 800 | 6.7 | 4.8 | 1.39x | >> | ArraysSort.Double.testSort | RANDOM | 7000 | 234.1 | 51.5 | **4.55x** | >> | ArraysSort.Double.testSort | RANDOM | 50000 | 2155.9 | 470.0 | **4.59x** | >> | ArraysSort.Double.testSort | RANDOM | 300000 | 15076.3 | 3391.3 | **4.45x** | >> | ArraysSort.Double.testSort | RANDOM | 2000000 | 116445.5 | 27491.7 | **4.24x** | >> | ArraysSort.Double.testSort | REPEATED | 800 | 2.3 | 1.7 | 1.35x | >> | ArraysSort.Double.testSort | REPEATED | 7000 | 23.3 | 12... > >> @vamsi-parasa I submitted our testing of latest v28 version. It found issue in `ArraysSort.java` new benchmark file. You missed `,`after year in copyright line: >> >> ``` >> * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. 
>> ``` > > Thank you, Vladimir! Hi @vamsi-parasa , I did some runs on linux by disabling intrinsification of _arraySort and _arrayPartition since these are not handled for windows and non-x86 targets, just to measure the impact of java side code re-factoring. ![image](https://github.com/openjdk/jdk/assets/59989778/2866e684-955d-4902-af61-e08ac3f548d9) ![image](https://github.com/openjdk/jdk/assets/59989778/0a5e3265-b821-4a43-874e-6f1501423a07) ![image](https://github.com/openjdk/jdk/assets/59989778/f192f259-36e6-4e3e-8082-5cb0c05c9f10) Please find the results, may I request you to kindly verify performance number on windows to be sure that there are no performance degradation. Best Regards, Jatin ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1696209731 From duke at openjdk.org Mon Aug 28 21:27:25 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 28 Aug 2023 21:27:25 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: References: Message-ID: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below. 
> > > **Arrays.sort performance data using JMH benchmarks for arrays with random data** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | > | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | > | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | > | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | > | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | > | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | > | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | > | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | > | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | > | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | > | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | > | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | > | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | > | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | > | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | > | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** | > | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | > | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | > | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | > | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | > | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | > | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | > | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | > | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | > | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | > | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** | > | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | > | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | > | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | > | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | > | ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 
| > | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | > | ArraysSort.longSort | 1000 | 10.449 | 6.239 | 1.7 | > | ArraysSort.longSort | 10000 | 307.074 | 70.284 | **4.4** | > | ArraysSor... Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Clean up parameters passed to arrayPartition; update the check to load library ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/e44f11a6..9642d852 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=28-29 Stats: 56 lines in 7 files changed: 17 ins; 19 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From duke at openjdk.org Mon Aug 28 21:27:30 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 28 Aug 2023 21:27:30 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v29] In-Reply-To: References: Message-ID: On Fri, 25 Aug 2023 22:04:45 GMT, Vladimir Kozlov wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unnecessary import in Arrays.java > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4143: > >> 4141: log_info(library)("Loaded library %s, handle " INTPTR_FORMAT, JNI_LIB_PREFIX "x86_64" JNI_LIB_SUFFIX, p2i(libx86_64)); >> 4142: >> 4143: if (UseAVX > 2 && VM_Version::supports_avx512dq()) { > > This check should be done before you locate and load library Please see the check moved to before loading the library in the latest commit. 
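The general pattern behind checking CPU support before using AVX-512 code can be sketched with GCC/Clang features: compile a single function for AVX-512F via `__attribute__((target("avx512f")))` (the function-level counterpart of `#pragma GCC target`), test the CPU once with `__builtin_cpu_supports`, and otherwise fall back to portable code. This is a hypothetical example (an array sum with invented names), not HotSpot's mechanism, which relies on `VM_Version` checks and a separately built library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_X86_DISPATCH 1
#endif

// Portable fallback, usable on every CPU.
static int64_t sum_scalar(const int32_t* a, size_t n) {
  int64_t total = 0;
  for (size_t i = 0; i < n; ++i) total += a[i];
  return total;
}

#ifdef SKETCH_X86_DISPATCH
// Only this function is compiled for AVX-512F; the rest of the file keeps the
// baseline ISA, so merely loading the code is safe on older CPUs.
__attribute__((target("avx512f")))
static int64_t sum_avx512(const int32_t* a, size_t n) {
  __m512i acc = _mm512_setzero_si512();  // 8 x 64-bit partial sums
  size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
    acc = _mm512_add_epi64(acc, _mm512_cvtepi32_epi64(v));  // sign-extend 8 ints
  }
  int64_t total = _mm512_reduce_add_epi64(acc);
  for (; i < n; ++i) total += a[i];  // scalar tail
  return total;
}
#endif

// Checks the CPU once and only then routes calls to the AVX-512 code.
int64_t sum_dispatch(const int32_t* a, size_t n) {
#ifdef SKETCH_X86_DISPATCH
  static const bool has_avx512 = __builtin_cpu_supports("avx512f");
  if (has_avx512) return sum_avx512(a, n);
#endif
  return sum_scalar(a, n);
}
```

On non-x86_64 targets, or on CPUs without AVX-512F, every call takes the scalar path, so the same binary runs everywhere.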
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1307962504 From duke at openjdk.org Mon Aug 28 21:27:31 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 28 Aug 2023 21:27:31 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v26] In-Reply-To: References: Message-ID: On Fri, 25 Aug 2023 01:52:32 GMT, Sandhya Viswanathan wrote: >> pivotIndices array is being passed as a parameter to the partition intrinsic as it is updated in-place with the new pivot indices after partitioning. The Unsafe.ARRAY_INT_BASE_OFFSET is being used in libary_call.cpp to get the address of pivotIndices. > > As PivotIndices is local to the DualPivotQuickSort and is always going to be int array, there are other ways to compute the address in library_call.cpp without having to pass an additional argument. As suggested, the code was updated to no longer pass the offset(Unsafe.ARRAY_INT_BASE_OFFSET) for pivotIndices array. Please see the code in the latest commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1307964768 From duke at openjdk.org Mon Aug 28 22:45:21 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 28 Aug 2023 22:45:21 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v29] In-Reply-To: References: Message-ID: On Fri, 25 Aug 2023 13:20:09 GMT, Erik Joelsson wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unnecessary import in Arrays.java > > make/modules/java.base/Lib.gmk line 239: > >> 237: ################################################################################ >> 238: >> 239: ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, x86_64)+$(INCLUDE_COMPILER2), true+true+true) > > Is there a reason for this to only be supported on Linux? 
Hi Erik, The reason this PR is focused on Linux is that the AVX512 sort and partitioning routines are based on Intel's x86-simd-sort library (https://github.com/intel/x86-simd-sort), which was originally developed with GCC as the target compiler. Thus, this PR has restricted itself to Linux, as the code was tested using GCC/Linux platforms. Additionally, the x86_64 library is compiled for AVX512 using file-specific compilation pragmas (`#pragma GCC target("avx512dq", "avx512f")`). This feature is absent from the Windows/MSVC++ compiler. Thanks, Vamsi ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1308027470 From sviswanathan at openjdk.org Mon Aug 28 23:19:21 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 28 Aug 2023 23:19:21 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: On Mon, 28 Aug 2023 21:27:25 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random data** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | >> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | >> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | >> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | >> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | >> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | >> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | >> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | >> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | >> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | >> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | >> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | >> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | >> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | >> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | >> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** | >> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | >> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | >> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | >> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | >> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | >> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | >> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | >> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | >> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | >> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** | >> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | >> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | >> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | >> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | >> | 
ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 | >> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | >> | ArraysSort.longSort | 1000 | 10.449 | 6.... > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Clean up parameters passed to arrayPartition; update the check to load library Thanks for considering all the review comments and fixing them. The PR looks good to me. @PaulSandoz Could you please review the Java part? ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14227#pullrequestreview-1599251153 PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1696551426 From erikj at openjdk.org Mon Aug 28 23:48:25 2023 From: erikj at openjdk.org (Erik Joelsson) Date: Mon, 28 Aug 2023 23:48:25 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: On Mon, 28 Aug 2023 21:27:25 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below. 
>> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random data** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | >> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | >> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | >> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | >> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | >> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | >> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | >> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | >> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | >> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | >> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | >> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | >> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | >> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | >> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | >> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** | >> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | >> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | >> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | >> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | >> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | >> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | >> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | >> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | >> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | >> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** | >> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | >> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | >> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | >> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | >> | 
ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 | >> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | >> | ArraysSort.longSort | 1000 | 10.449 | 6.... > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Clean up parameters passed to arrayPartition; update the check to load library make/modules/java.base/Lib.gmk line 240: > 238: > 239: ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, x86_64)+$(INCLUDE_COMPILER2), true+true+true) > 240: $(eval $(call SetupJdkLibrary, BUILD_LIB_X86_64, \ As this is a C++ lib, consider using g++ for linking by setting: TOOLCHAIN := TOOLCHAIN_LINK_CXX make/modules/java.base/Lib.gmk line 241: > 239: ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, x86_64)+$(INCLUDE_COMPILER2), true+true+true) > 240: $(eval $(call SetupJdkLibrary, BUILD_LIB_X86_64, \ > 241: NAME := x86_64, \ This looks like a rather generic name for a library. I would expect something a bit more descriptive. I also noted that @vnkozlov questioned needing a separate library for this and I didn't really find an answer. What do we gain from separating this into a separate dynamic library? make/modules/java.base/Lib.gmk line 247: > 245: LDFLAGS := $(LDFLAGS_JDKLIB) \ > 246: $(call SET_SHARED_LIBRARY_ORIGIN), \ > 247: LDFLAGS_linux := -Wl$(COMMA)--no-as-needed, \ This is set by default since JDK-8314554. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1308053768 PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1308051118 PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1308052384 From erikj at openjdk.org Mon Aug 28 23:48:20 2023 From: erikj at openjdk.org (Erik Joelsson) Date: Mon, 28 Aug 2023 23:48:20 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v29] In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 22:42:50 GMT, Srinivas Vamsi Parasa wrote: >> make/modules/java.base/Lib.gmk line 239: >> >>> 237: ################################################################################ >>> 238: >>> 239: ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, x86_64)+$(INCLUDE_COMPILER2), true+true+true) >> >> Is there a reason for this to only be supported on Linux? > > Hi Erik, > > The reason this PR is focused on Linux is because the AVX512 sort and partitioning routines are based on Intel?s x86-simd-library (https://github.com/intel/x86-simd-sort) which was originally developed with GCC as the target compiler. Thus, this PR has restricted itself to Linux as the code was tested using GCC/Linux platforms. > Additionally, the x86_64 library is compiled for AVX512 using file specific compilation pragmas (`#pragma GCC target("avx512dq", "avx512f")`). This feature is absent for Windows/MSVC++ compiler.? > > Thanks, > Vamsi If it's tied to GCC as well, then we should probably include that in the condition here unless it's also expected to work with Clang. 
(`TOOLCHAIN_TYPE` = `gcc`) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1308050702 From pli at openjdk.org Tue Aug 29 01:35:44 2023 From: pli at openjdk.org (Pengfei Li) Date: Tue, 29 Aug 2023 01:35:44 GMT Subject: RFR: 8312332: C2: Refactor SWPointer out from SuperWord [v2] In-Reply-To: References: Message-ID: > As discussed in JDK-8308994, we should first do some refactoring work before proceeding with the new post loop vectorization. In this patch, we have done the following. > > 1) We have created new C2 source files `vectorization.[cpp|hpp]` for shared logics and utilities for C2's auto-vectorization. So far we have moved class `SWPointer` and `VectorElementSizeStats` here from `superword.[cpp|hpp]`. > > 2) We have decoupled `SWPointer` from class `SuperWord` and renamed it to `VPointer` as it will be used by vectorizers other than SuperWord. The original class `SWPointer` and its inner class `Tracer` both have a `_slp` field initialized in their constructors. In this patch, we have replaced them by other fields and re-written the constructors for the same functionality. Original `SWPointer::invariant()` calls function `SuperWord::find_pre_loop_end()` for loop invariant checks. To help decoupling, we moved function `find_pre_loop_end()` to class `CountedLoopNode`. As function `SWPointer::Tracer::invariant_1()` is tightly coupled with `SuperWord` but only prints some debug messages, we temporarily removed it in this patch. We will consider adding it back after later refactoring of `SuperWord` so we added a `TODO` at its call site in this patch. > > 3) We have a lot of memory phi node checks in loop optimizations. So we added a utility function `is_memory_phi()` in `node.hpp`. > > Tested tier1~3 on x86 and AArch64. Also manually verified that option `VectorizeDebug` in compiler directives still works well. Pengfei Li has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - Merge branch 'master' into swpointer - 8312332: C2: Refactor SWPointer out from SuperWord As discussed in JDK-8308994, we should first do some refactoring work before proceeding with the new post loop vectorization. In this patch, we have done the following. 1) We have created new C2 source files `vectorization.[cpp|hpp]` for shared logics and utilities for C2's auto-vectorization. So far we have moved class `SWPointer` and `VectorElementSizeStats` here from `superword.[cpp|hpp]`. 2) We have decoupled `SWPointer` from class `SuperWord` and renamed it to `VPointer` as it will be used by vectorizers other than SuperWord. The original class `SWPointer` and its inner class `Tracer` both have a `_slp` field initialized in their constructors. In this patch, we have replaced them by other fields and re-written the constructors for the same functionality. Original `SWPointer::invariant()` calls function `SuperWord::find_pre_loop_end()` for loop invariant checks. To help decoupling, we moved function `find_pre_loop_end()` to class `CountedLoopNode`. As function `SWPointer::Tracer::invariant_1()` is tightly coupled with `SuperWord` but only prints some debug messages, we temporarily removed it in this patch. We will consider adding it back after later refactoring of `SuperWord` so we added a `TODO` at its call site in this patch. 3) We have a lot of memory phi node checks in loop optimizations. So we added a utility function `is_memory_phi()` in `node.hpp`. Tested tier1~3 on x86 and AArch64. Also manually verified that option `VectorizeDebug` in compiler directives still works well. 
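For context on what is being moved: `SWPointer` (now `VPointer`) describes a memory access roughly as `base + invar + scale * iv + offset`, and two accesses can only be ordered when everything except the constant offset matches. The C++ below is a simplified, hypothetical model of that comparison. The names and integer ids are invented, and HotSpot compares IR nodes and handles many more cases.

```cpp
#include <cassert>

// Hypothetical model of an access  base + invar + scale * iv + offset  (bytes).
// base and invar are identified by integer ids here; HotSpot compares IR nodes.
struct AccessModel {
  int base_id;   // which array/object
  int invar_id;  // which loop-invariant term (0 = none)
  int scale;     // bytes advanced per induction-variable step
  int offset;    // constant byte offset
  int size;      // bytes accessed
};

enum class Order { Less, Greater, Equal, NotComparable };

// Accesses are comparable only if base, invariant and scale all match; then
// overlapping byte ranges are reported as Equal (a possible dependence).
Order compare(const AccessModel& p, const AccessModel& q) {
  if (p.base_id != q.base_id || p.invar_id != q.invar_id || p.scale != q.scale) {
    return Order::NotComparable;
  }
  bool overlap = q.offset < p.offset + p.size && p.offset < q.offset + q.size;
  if (overlap) return Order::Equal;
  return p.offset < q.offset ? Order::Less : Order::Greater;
}
```

Because the comparison only needs the decomposed address, such a class can in principle be used by any loop vectorizer, not just one particular pass.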
------------- Changes: https://git.openjdk.org/jdk/pull/15013/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15013&range=01 Stats: 1924 lines in 7 files changed: 967 ins; 909 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/15013.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15013/head:pull/15013 PR: https://git.openjdk.org/jdk/pull/15013 From pli at openjdk.org Tue Aug 29 01:38:19 2023 From: pli at openjdk.org (Pengfei Li) Date: Tue, 29 Aug 2023 01:38:19 GMT Subject: Integrated: 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE In-Reply-To: References: Message-ID: On Tue, 25 Jul 2023 07:42:59 GMT, Pengfei Li wrote: > Hotspot jtreg `compiler/loopopts/superword/TestDependencyOffsets.java` fails on AArch64 CPUs with 512-bit SVE. The reason is that many test loops in the code cannot be vectorized due to data dependence but IR tests assume they can. > > On AArch64, these IR tests just check the CPU feature of `asimd` and incorrectly assumes AArch64 vectors are at most 256 bits. But actually, `asimd` on AArch64 only represents NEON vectors which are at most 128 bits. AArch64 CPUs may have another feature of `sve` which represents scalable vectors of at most 2048 bits. The vectorization won't succeed on 512-bit SVE CPUs if the memory offset between some read and write is less than 512 bits. > > As this jtreg is auto-generated by a python script, we have updated the script and re-generated this jtreg. In this new version, we checked the auto-vectorization on both NEON-only and NEON+SVE platforms. Below is the diff of the generator script. We have also attached the new script to the JBS page. 
> > > @@ -321,7 +321,8 @@ class Type: > p.append(Platform("avx512", ["avx512", "true"], 64)) > else: > assert False, "type not implemented" + self.name > - p.append(Platform("asimd", ["asimd", "true"], 32)) > + p.append(Platform("asimd", ["asimd", "true", "sve", "false"], 16)) > + p.append(Platform("sve", ["sve", "true"], 256)) > return p > > class Test: > @@ -457,7 +458,7 @@ class Generator: > lines.append(" * and various MaxVectorSize values, and +- AlignVector.") > lines.append(" *") > lines.append(" * Note: this test is auto-generated. Please modify / generate with script:") > - lines.append(" * https://bugs.openjdk.org/browse/JDK-8308606") > + lines.append(" * https://bugs.openjdk.org/browse/JDK-8312570") > lines.append(" *") > lines.append(" * Types: " + ", ".join([t.name for t in self.types])) > lines.append(" * Offsets: " + ", ".join([str(o) for o in self.offsets])) > @@ -598,7 +599,8 @@ class Generator: > # IR rules > for p in test.t.platforms(): > elements = p.vector_width // test.t.size > - lines.append(f" // CPU: {p.name} -> vector_width: {p.vector_width} -> elements in vector: {elements}") > + max_pre = "max " if p.name == "sve" else "" > + lines.append(f" // CPU: {p.name} -> {max_pre}vector_width: {p.vector_width} -> {max_pre}elements in vector: {elements}") > ############### -Align... This pull request has now been integrated. 
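Why the old IR rules broke can be modeled with the arithmetic from the generator diff (elements in vector = vector width / type size) plus a deliberately simplified dependence rule. `Platform`, `elements_in_vector` and `dependence_allows_vectorization` are hypothetical names, and the rule that a positive offset must span a whole vector is an assumption for illustration, not SuperWord's exact legality check.

```cpp
#include <string>

// Hypothetical C++ mirror of the generator script's Platform entries:
// a CPU feature name and the (maximum) vector width in bytes.
struct Platform {
  std::string name;
  int vector_width_bytes;
};

// Same arithmetic as the script: elements in vector = vector_width // type size.
int elements_in_vector(const Platform& p, int type_size_bytes) {
  return p.vector_width_bytes / type_size_bytes;
}

// Simplified legality model (an assumption) for a loop of the form
// a[i + offset] = f(a[i]): a positive offset is a loop-carried true dependence,
// so packing `lanes` iterations into one vector is only safe when the offset
// covers at least a full vector. Non-positive offsets are treated as safe here.
bool dependence_allows_vectorization(int offset_elems, int lanes) {
  return offset_elems <= 0 || offset_elems >= lanes;
}
```

With an int offset of 8 (256 bits), this model permits vectorization with 4 lanes (128-bit NEON) or 8 lanes (256-bit vectors) but not with 16 lanes (512-bit SVE), which is the situation the old `asimd`-only IR rules did not anticipate.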
Changeset: e5ea9aa9 Author: Pengfei Li URL: https://git.openjdk.org/jdk/commit/e5ea9aa9aa446503fd92cdba0a9653593c958597 Stats: 2062 lines in 1 file changed: 1422 ins; 0 del; 640 mod 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE Reviewed-by: epeter, kvn ------------- PR: https://git.openjdk.org/jdk/pull/15010 From pli at openjdk.org Tue Aug 29 01:43:20 2023 From: pli at openjdk.org (Pengfei Li) Date: Tue, 29 Aug 2023 01:43:20 GMT Subject: Integrated: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests In-Reply-To: References: Message-ID: <0IykzlU0Q3dpYsYgLVsItV5_CJ1t2RG67rJvYzm62_s=.b025f558-aa5a-4525-9b36-ce71f08636f7@github.com> On Tue, 25 Jul 2023 08:35:11 GMT, Pengfei Li wrote: > This patch removes `@requires vm.flagless` annotations from HotSpot jtreg tests in `compiler/vectorization/runner`. All jtreg cases in this folder are invoked by the test driver `VectorizationTestRunner.java`, which checks both correctness and vectorizability (IR) for each test method. We previously added the flagless requirement because extra flags may interfere with compiler control in the test driver's correctness check. But `flagless` has the side effect that tests are skipped whenever extra flags are passed, so we propose to remove it now. > > To adapt to the removal of `@requires vm.flagless`, a few checks are added to the test driver to skip the correctness check if extra flags prevent compiler control from working. This patch also moves the previously hard-coded flag `-XX:-OptimizeFill` in the test driver into conditions of the IR checks. > > Tested various compiler-control-related VM flags on x86 and AArch64. This pull request has now been integrated.
Changeset: a03954e6 Author: Pengfei Li URL: https://git.openjdk.org/jdk/commit/a03954e6c57369446ef77136966662780e4b1c4e Stats: 77 lines in 23 files changed: 31 ins; 16 del; 30 mod 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests Reviewed-by: kvn, thartmann, epeter, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/15011 From fjiang at openjdk.org Tue Aug 29 02:46:18 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 29 Aug 2023 02:46:18 GMT Subject: RFR: 8315070: RISC-V: Clean up platform dependent inline headers In-Reply-To: References: <6KZojI_U1KT3lQq8wJ0XZuNDC69ArpTWOr_LmIXAV80=.c993bbb2-bf80-4e8f-9d77-f7de952a51d5@github.com> Message-ID: On Sun, 27 Aug 2023 03:52:46 GMT, Fei Yang wrote: >> Hi team, please review this small clean-up change. >> Inspired by [JDK-8267464](https://bugs.openjdk.org/browse/JDK-8267464), the riscv port still has one place that includes the platform-dependent inline header `assembler_riscv.inline.hpp`; it could be replaced with the platform-independent header `asm/assembler.inline.hpp`. >> >> Testing: >> - [x] release build on linux-riscv64 >> - [x] tier1 on linux-riscv64 with release build > > Looks like there is a typo in the PR title. It should be: "8315070: RISC-V: Clean up platform dependent inline headers". @RealFYang @robehn -- Thanks for the reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/15437#issuecomment-1696685708 From fjiang at openjdk.org Tue Aug 29 02:46:19 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 29 Aug 2023 02:46:19 GMT Subject: Integrated: 8315070: RISC-V: Clean up platform dependent inline headers In-Reply-To: <6KZojI_U1KT3lQq8wJ0XZuNDC69ArpTWOr_LmIXAV80=.c993bbb2-bf80-4e8f-9d77-f7de952a51d5@github.com> References: <6KZojI_U1KT3lQq8wJ0XZuNDC69ArpTWOr_LmIXAV80=.c993bbb2-bf80-4e8f-9d77-f7de952a51d5@github.com> Message-ID: <9nNE2QZSecjg_c1uH_XjAnhyWtGt26PB5ya6BlJ4iG0=.e6d994a8-527e-47e1-a129-e014ad908e3d@github.com> On Sat, 26 Aug 2023 11:55:57 GMT, Feilong Jiang wrote: > Hi team, please review this small clean-up change. > Inspired by [JDK-8267464](https://bugs.openjdk.org/browse/JDK-8267464), the riscv port still has one place that includes the platform-dependent inline header `assembler_riscv.inline.hpp`; it could be replaced with the platform-independent header `asm/assembler.inline.hpp`. > > Testing: > - [x] release build on linux-riscv64 > - [x] tier1 on linux-riscv64 with release build This pull request has now been integrated. Changeset: 3dc266c5 Author: Feilong Jiang URL: https://git.openjdk.org/jdk/commit/3dc266c58bf92b8f072ad5bcc3ac6962c06c35a9 Stats: 3 lines in 1 file changed: 1 ins; 1 del; 1 mod 8315070: RISC-V: Clean up platform dependent inline headers Reviewed-by: fyang, rehn ------------- PR: https://git.openjdk.org/jdk/pull/15437 From fjiang at openjdk.org Tue Aug 29 02:56:23 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 29 Aug 2023 02:56:23 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v7] In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 14:29:47 GMT, Ilya Gavrilin wrote: >> Please review these changes adding RISC-V double rounding intrinsics. >> >> On RISC-V, intrinsics for rounding doubles with a rounding mode (Math.ceil/floor/rint) were missing.
RISC-V has no dedicated instruction for this kind of rounding, so a two-step conversion is used: double -> long int -> double (using fcvt.l.d, fcvt.d.l). >> >> Also, we need to provide a rounding mode to the fcvt.x.x instructions. >> >> Rounding mode selection for ceil (floor and rint are similar), following the Math.ceil specification: >> >>> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475). >> >> For double -> long int we choose rup (round towards +inf) mode to get the integer that is greater than or equal to the input value. >> For long int -> double we choose rdn (round towards -inf) mode to get the smallest (closest to -inf) representation of the integer obtained from the first conversion. >> >> For inf, NaN, or a value whose magnitude exceeds 2^63, the input value is returned unchanged (any double of that magnitude is guaranteed to be an integer). >> When storing the result, we also copy the sign from the input value (needed because, for inputs in (-1.0, 0.0), ceil must return -0.0). >> >> We have observed significant improvements on HiFive and T-Head boards. >> >> Testing: tier1, tier2 and hotspot:tier3 on HiFive >> >> Performance results on HiFive (FpRoundingBenchmark.testceil/floor/rint): >> >> Without intrinsic: >> >> Benchmark (TESTSIZE) Mode Cnt Score Error Units >> FpRoundingBenchmark.testceil 1024 thrpt 25 39.297 ± 0.037 ops/ms >> FpRoundingBenchmark.testfloor 1024 thrpt 25 39.398 ± 0.018 ops/ms >> FpRoundingBenchmark.testrint 1024 thrpt 25 36.388 ± 0.844 ops/ms >> >> With intrinsic: >> >> Benchmark (TESTSIZE) Mode Cnt Score Error Units >> FpRoundingBenchmark.testceil 1024 thrpt 25 80.560 ± 0.053 ops/ms >> FpRoundingBenchmark.testfloor 1024 thrpt 25 80.541 ± 0.081 ops/ms >> FpRoundingBenchmark.testrint 1024 thrpt 25 80.603 ±
0.071 ops/ms > > Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: > > Fix intrinsic round_node parameter Changes requested by fjiang (Committer). src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1381: > 1379: // handling is needed by NaN, +/-Infinity, +/-0. > 1380: void C2_MacroAssembler::round_double_mode(FloatRegister dst, FloatRegister src, int round_mode, Register tmp1, Register tmp2, Register tmp3) > 1381: { Suggestion: void C2_MacroAssembler::round_double_mode(FloatRegister dst, FloatRegister src, int round_mode, Register tmp1, Register tmp2, Register tmp3) { ------------- PR Review: https://git.openjdk.org/jdk/pull/14991#pullrequestreview-1599388699 PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1308136130 From duke at openjdk.org Tue Aug 29 07:58:48 2023 From: duke at openjdk.org (Ilya Gavrilin) Date: Tue, 29 Aug 2023 07:58:48 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v8] In-Reply-To: References: Message-ID: > Please review these changes adding RISC-V double rounding intrinsics. > > On RISC-V, intrinsics for rounding doubles with a rounding mode (Math.ceil/floor/rint) were missing. RISC-V has no dedicated instruction for this kind of rounding, so a two-step conversion is used: double -> long int -> double (using fcvt.l.d, fcvt.d.l). > > Also, we need to provide a rounding mode to the fcvt.x.x instructions. > > Rounding mode selection for ceil (floor and rint are similar), following the Math.ceil specification: > >> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475). > > For double -> long int we choose rup (round towards +inf) mode to get the integer that is greater than or equal to the input value.
> For long int -> double we choose rdn (round towards -inf) mode to get the smallest (closest to -inf) representation of the integer obtained from the first conversion. > > For inf, NaN, or a value whose magnitude exceeds 2^63, the input value is returned unchanged (any double of that magnitude is guaranteed to be an integer). > When storing the result, we also copy the sign from the input value (needed because, for inputs in (-1.0, 0.0), ceil must return -0.0). > > We have observed significant improvements on HiFive and T-Head boards. > > Testing: tier1, tier2 and hotspot:tier3 on HiFive > > Performance results on HiFive (FpRoundingBenchmark.testceil/floor/rint): > > Without intrinsic: > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.testceil 1024 thrpt 25 39.297 ± 0.037 ops/ms > FpRoundingBenchmark.testfloor 1024 thrpt 25 39.398 ± 0.018 ops/ms > FpRoundingBenchmark.testrint 1024 thrpt 25 36.388 ± 0.844 ops/ms > > With intrinsic: > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.testceil 1024 thrpt 25 80.560 ± 0.053 ops/ms > FpRoundingBenchmark.testfloor 1024 thrpt 25 80.541 ± 0.081 ops/ms > FpRoundingBenchmark.testrint 1024 thrpt 25 80.603 ±
0.071 ops/ms Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp Co-authored-by: Feilong Jiang ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14991/files - new: https://git.openjdk.org/jdk/pull/14991/files/82b5e593..6216e38f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14991.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14991/head:pull/14991 PR: https://git.openjdk.org/jdk/pull/14991 From roland at openjdk.org Tue Aug 29 08:06:50 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 29 Aug 2023 08:06:50 GMT Subject: RFR: 8314024: SIGSEGV in PhaseIdealLoop::build_loop_late_post_work due to bad immediate dominator info [v2] In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 09:03:39 GMT, Christian Hagedorn wrote: > Looks good but I'm wondering if we could also bail out in Range Check Elimination instead, if we find that `get_ctrl()` of one of the involved data nodes does not dominate the pre loop exit test. What do you think? We could, but it seems unfortunate to bail out of a major optimization when it's fairly straightforward to avoid it. I added asserts in RCE to catch that issue earlier instead. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15399#issuecomment-1696959086 From roland at openjdk.org Tue Aug 29 08:06:50 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 29 Aug 2023 08:06:50 GMT Subject: RFR: 8314024: SIGSEGV in PhaseIdealLoop::build_loop_late_post_work due to bad immediate dominator info [v2] In-Reply-To: References: Message-ID: <84wtKPDZHpZwUVE6zK9XGJZ4nqqgynshnxaOuPdu7Js=.3394d670-3950-4be8-9298-36d7d784966d@github.com> > A node is sunk from the pre loop into the main loop. 
That node, in the > main loop, feeds into a test. When the node is sunk it is pinned > between the main and pre loop. The test it feeds into is then > eliminated by range check elimination: the sunk node becomes input to > an expression that computes the new bound of the pre loop. The > resulting graph is broken because the sunk node is pinned below the > pre loop but used by the exit test of the pre loop. > > The fix I propose is in `PhaseIdealLoop::try_sink_out_of_loop()`, to > skip nodes in pre loops that have a use in the companion main loop. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15399/files - new: https://git.openjdk.org/jdk/pull/15399/files/7d0ad243..d6e2c8b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15399&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15399&range=00-01 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15399.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15399/head:pull/15399 PR: https://git.openjdk.org/jdk/pull/15399 From duke at openjdk.org Tue Aug 29 08:09:50 2023 From: duke at openjdk.org (Ilya Gavrilin) Date: Tue, 29 Aug 2023 08:09:50 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v9] In-Reply-To: References: Message-ID: > Please review these changes adding RISC-V double rounding intrinsics. > > On RISC-V, intrinsics for rounding doubles with a rounding mode (Math.ceil/floor/rint) were missing. RISC-V has no dedicated instruction for this kind of rounding, so a two-step conversion is used: double -> long int -> double (using fcvt.l.d, fcvt.d.l). > > Also, we need to provide a rounding mode to the fcvt.x.x instructions.
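The double -> long -> double rounding scheme from this PR can be modeled in portable C++. This is a hypothetical software sketch, not the RISC-V assembly emitted by `C2_MacroAssembler::round_double_mode`: `RoundMode`, `to_long` and `round_double_mode_model` are illustrative names, `to_long` stands in for fcvt.l.d with a static rounding mode (rup for ceil, rdn for floor, rne for rint), and the cutoff at 2^63 plus the final sign copy follow the PR description.

```cpp
#include <cmath>
#include <cstdint>

// Which Math.* method is emulated; comments name the matching fcvt mode.
enum class RoundMode { Rint /* rne */, Floor /* rdn */, Ceil /* rup */ };

// Software stand-in for fcvt.l.d with a static rounding mode.
// Precondition: |x| < 2^63, so the truncating cast is well defined.
static int64_t to_long(double x, RoundMode mode) {
  int64_t t = static_cast<int64_t>(x);       // round toward zero first
  double diff = x - static_cast<double>(t);  // exact fractional part, in (-1, 1)
  switch (mode) {
    case RoundMode::Ceil:
      if (diff > 0) t += 1;
      break;
    case RoundMode::Floor:
      if (diff < 0) t -= 1;
      break;
    case RoundMode::Rint:  // round half to even
      if (diff > 0.5 || (diff == 0.5 && t % 2 != 0)) t += 1;
      else if (diff < -0.5 || (diff == -0.5 && t % 2 != 0)) t -= 1;
      break;
  }
  return t;
}

// NaN, infinities and values of magnitude >= 2^63 are returned unchanged;
// otherwise convert to long and back, then copy the input's sign so that,
// for example, ceil(-0.5) yields -0.0.
double round_double_mode_model(double x, RoundMode mode) {
  const double two63 = 9223372036854775808.0;  // 2^63
  if (std::isnan(x) || std::fabs(x) >= two63) return x;
  double r = static_cast<double>(to_long(x, mode));
  return std::copysign(r, x);
}
```

In this model the long -> double step is always exact (values that large are already integers and are never adjusted), so it cannot show the effect of the rdn mode the PR selects for fcvt.d.l. The sign copy is what produces -0.0 for inputs such as ceil(-0.5) or rint(-0.4).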
> > Rounding mode selection for ceil (floor and rint are similar), following the Math.ceil specification: > >> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475). > > For double -> long int we choose rup (round towards +inf) mode to get the integer that is greater than or equal to the input value. > For long int -> double we choose rdn (round towards -inf) mode to get the smallest (closest to -inf) representation of the integer obtained from the first conversion. > > For inf, NaN, or a value whose magnitude exceeds 2^63, the input value is returned unchanged (any double of that magnitude is guaranteed to be an integer). > When storing the result, we also copy the sign from the input value (needed because, for inputs in (-1.0, 0.0), ceil must return -0.0). > > We have observed significant improvements on HiFive and T-Head boards. > > Testing: tier1, tier2 and hotspot:tier3 on HiFive > > Performance results on HiFive (FpRoundingBenchmark.testceil/floor/rint): > > Without intrinsic: > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.testceil 1024 thrpt 25 39.297 ± 0.037 ops/ms > FpRoundingBenchmark.testfloor 1024 thrpt 25 39.398 ± 0.018 ops/ms > FpRoundingBenchmark.testrint 1024 thrpt 25 36.388 ± 0.844 ops/ms > > With intrinsic: > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.testceil 1024 thrpt 25 80.560 ± 0.053 ops/ms > FpRoundingBenchmark.testfloor 1024 thrpt 25 80.541 ± 0.081 ops/ms > FpRoundingBenchmark.testrint 1024 thrpt 25 80.603 ±
0.071 ops/ms Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: Fix typo in c2_MacroAssembler_riscv.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14991/files - new: https://git.openjdk.org/jdk/pull/14991/files/6216e38f..16900449 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14991.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14991/head:pull/14991 PR: https://git.openjdk.org/jdk/pull/14991 From duke at openjdk.org Tue Aug 29 08:13:24 2023 From: duke at openjdk.org (emmyyin) Date: Tue, 29 Aug 2023 08:13:24 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v14] In-Reply-To: References: Message-ID: > ### Purpose > > IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, this makes each layout more readable. But on the other hand, the layouts of two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can damage the viewer's mental model of the graph. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR.
> > The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". > > The difference between the algorithms is illustrated in the following video: > > > https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 > > > This work is the result of my Master's thesis, which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). > > > ### Implementation > > The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager` has been added, which holds the core algorithm. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. > > If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions to.
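The incremental idea can be sketched in C++: derive update actions from the difference between two consecutive graphs, then apply them to the previous layout so that nodes present in both graphs keep their positions. This is a hypothetical, heavily simplified model, not IGV's `HierarchicalStableLayoutManager`. It handles only node additions and removals and ignores edges, which the PR says are updated as well. The `place` callback is an assumed stand-in for however new nodes are actually positioned.

```cpp
#include <functional>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical model: a layout maps node ids to (x, y) positions.
using Pos = std::pair<int, int>;
using Layout = std::map<int, Pos>;

enum class ActionKind { RemoveNode, AddNode };
struct UpdateAction { ActionKind kind; int node; };

// Derive the update actions that turn the previous graph's node set into the next one.
std::vector<UpdateAction> diff_nodes(const Layout& prev, const std::set<int>& next) {
  std::vector<UpdateAction> actions;
  for (const auto& entry : prev)
    if (!next.count(entry.first)) actions.push_back({ActionKind::RemoveNode, entry.first});
  for (int id : next)
    if (!prev.count(id)) actions.push_back({ActionKind::AddNode, id});
  return actions;
}

// Apply the actions incrementally: surviving nodes keep their positions, and
// `place` (a hypothetical callback) chooses a position for each new node.
Layout apply_actions(Layout layout, const std::vector<UpdateAction>& actions,
                     const std::function<Pos(const Layout&, int)>& place) {
  for (const UpdateAction& a : actions) {
    if (a.kind == ActionKind::RemoveNode) layout.erase(a.node);
    else layout[a.node] = place(layout, a.node);
  }
  return layout;
}
```

A static layout would recompute every position from scratch. This model leaves untouched nodes exactly where they were, which is the stability property the PR aims for, at the cost of possibly less tidy layouts over time.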
emmyyin has updated the pull request incrementally with five additional commits since the last revision: - Update src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java Co-authored-by: Christian Hagedorn - Update src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java Co-authored-by: Christian Hagedorn - Update src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java Co-authored-by: Christian Hagedorn - Update src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java Co-authored-by: Christian Hagedorn - Update src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalStableLayoutManager.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14349/files - new: https://git.openjdk.org/jdk/pull/14349/files/3418f070..0f17ea42 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=12-13 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/14349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14349/head:pull/14349 PR: https://git.openjdk.org/jdk/pull/14349 From duke at openjdk.org Tue Aug 29 08:19:01 2023 From: duke at openjdk.org (emmyyin) Date: Tue, 29 Aug 2023 08:19:01 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v15] In-Reply-To: References: Message-ID: > ### Purpose > > IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. 
A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, this makes each layout more readable. But on the other hand, the layouts of two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can damage the viewer's mental model of the graph. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. > > The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". > > The difference between the algorithms is illustrated in the following video: > > > https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 > > > This work is the result of my Master's thesis, which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). > > > ### Implementation > > The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager` has been added, which holds the core algorithm. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. > > If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions to.
> > The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. > > Since `HierarchicalStableLayoutManager` tries to preserve the node positions, the layouts might get unreadable after a fe... emmyyin has updated the pull request incrementally with one additional commit since the last revision: update copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14349/files - new: https://git.openjdk.org/jdk/pull/14349/files/0f17ea42..2be0ff27 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14349&range=13-14 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14349/head:pull/14349 PR: https://git.openjdk.org/jdk/pull/14349 From duke at openjdk.org Tue Aug 29 08:28:42 2023 From: duke at openjdk.org (Ilya Gavrilin) Date: Tue, 29 Aug 2023 08:28:42 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v10] In-Reply-To: References: Message-ID: > Please review these changes adding RISC-V double rounding intrinsics. > > On RISC-V, intrinsics for rounding doubles with a rounding mode (Math.ceil/floor/rint) were missing. RISC-V has no dedicated instruction for this kind of rounding, so a two-step conversion is used: double -> long int -> double (using fcvt.l.d, fcvt.d.l). > > Also, we need to provide a rounding mode to the fcvt.x.x instructions. > > Rounding mode selection for ceil (floor and rint are similar), following the Math.ceil specification: > >> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475).
> > For double -> long int we choose rup (round towards +inf) mode to get the integer that is greater than or equal to the input value. > For long int -> double we choose rdn (round towards -inf) mode to get the smallest (closest to -inf) representation of the integer obtained from the first conversion. > > For inf, NaN, or a value whose magnitude exceeds 2^63, the input value is returned unchanged (any double of that magnitude is guaranteed to be an integer). > When storing the result, we also copy the sign from the input value (needed because, for inputs in (-1.0, 0.0), ceil must return -0.0). > > We have observed significant improvements on HiFive and T-Head boards. > > Testing: tier1, tier2 and hotspot:tier3 on HiFive > > Performance results on HiFive (FpRoundingBenchmark.testceil/floor/rint): > > Without intrinsic: > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.testceil 1024 thrpt 25 39.297 ± 0.037 ops/ms > FpRoundingBenchmark.testfloor 1024 thrpt 25 39.398 ± 0.018 ops/ms > FpRoundingBenchmark.testrint 1024 thrpt 25 36.388 ± 0.844 ops/ms > > With intrinsic: > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.testceil 1024 thrpt 25 80.560 ± 0.053 ops/ms > FpRoundingBenchmark.testfloor 1024 thrpt 25 80.541 ± 0.081 ops/ms > FpRoundingBenchmark.testrint 1024 thrpt 25 80.603 ±
0.071 ops/ms Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: Fix typo in c2_MacroAssembler_riscv.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14991/files - new: https://git.openjdk.org/jdk/pull/14991/files/16900449..09ad14aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14991&range=08-09 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14991.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14991/head:pull/14991 PR: https://git.openjdk.org/jdk/pull/14991 From roland at openjdk.org Tue Aug 29 08:28:50 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 29 Aug 2023 08:28:50 GMT Subject: RFR: 8314024: SIGSEGV in PhaseIdealLoop::build_loop_late_post_work due to bad immediate dominator info [v3] In-Reply-To: References: Message-ID: <08hhWxI_fkzFY9PRiYwgCW5j96FGkVJmTH0bk8v6yaQ=.efb69989-ea24-4a3e-8240-3d569fc7f131@github.com> > A node is sunk from the pre loop into the main loop. That node, in the > main loop, feeds into a test. When the node is sunk it is pinned > between the main and pre loop. The test it feeds into is then > eliminated by range check elimination: the sunk node becomes input to > an expression that computes the new bound of the pre loop. The > resulting graph is broken because the sunk node is pinned below the > pre loop but used by the exit test of the pre loop. > > The fix I propose is in `PhaseIdealLoop::try_sink_out_of_loop()`, to > skip nodes in pre loops that have a use in the companion main loop. 
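The proposed rule can be restated as a predicate over a toy graph. In this sketch, `ToyNode`, `LoopRole`, `loop_nest` and `skip_sinking` are hypothetical. In C2 the check lives in `PhaseIdealLoop::try_sink_out_of_loop()` and works on real loop and control information, which the sketch does not attempt to model.

```cpp
#include <vector>

// Hypothetical toy model (not C2's data structures): each node records the
// role of the loop its control belongs to.
enum class LoopRole { PreLoop, MainLoop, Other };

struct ToyNode {
  LoopRole role;
  int loop_nest;  // hypothetical id shared by a pre loop and its companion main loop
  std::vector<const ToyNode*> uses;
};

// Mirrors the proposed rule: a node in a pre loop with a use in its companion
// main loop is not sunk, because range check elimination may later make that
// node an input to the pre loop's new exit bound while it is pinned below the
// pre loop.
bool skip_sinking(const ToyNode& n) {
  if (n.role != LoopRole::PreLoop) return false;
  for (const ToyNode* use : n.uses) {
    if (use->role == LoopRole::MainLoop && use->loop_nest == n.loop_nest) return true;
  }
  return false;
}
```

The check is deliberately conservative. It gives up on a sinking opportunity rather than bailing out of range check elimination, which matches the trade-off Roland describes in the review thread.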
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: test fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15399/files - new: https://git.openjdk.org/jdk/pull/15399/files/d6e2c8b7..412f899f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15399&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15399&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15399.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15399/head:pull/15399 PR: https://git.openjdk.org/jdk/pull/15399 From chagedorn at openjdk.org Tue Aug 29 08:39:39 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 29 Aug 2023 08:39:39 GMT Subject: RFR: 8305637: Remove Opaque1 nodes for Parse Predicates and clean up useless predicate elimination [v2] In-Reply-To: References: Message-ID: > This is the last clean-up PR before the complete fix for Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > This patch includes: > - Removal of `ConI`->`Opaque1`->`Conv2B` input nodes for `ParsePredicateNodes` with the following additional changes: > - Adjusting `ParsePredicateNode` to block unwanted optimizations (added empty `ParsePredicateNode::Ideal()`). > - Changing `Compile::_parse_predicate_opaqs` to keep track of Parse Predicates by storing `ParsePredicateNodes` directly instead of `Opaque1Nodes`. Renamed to `Compile::_parse_predicates` and adjusted related methods. > - Removed asserts matching the `Opaque1` -> `Conv2B` shape. > - Cleaning up `eliminate_useless_predicates()`: > - Adjust code to find useful/useless Parse Predicates with the new `Compile::_parse_predicates` list with `ParsePredicateNodes` instead of `Opaque1Nodes`.
> - Changing `ParsePredicateNode` to carry a `_useless` state which simplifies the elimination of useless predicates with `eliminate_useless_predicates()` and during IGVN (added `ParsePredicateNode::Value()` for that which also removes the predicate once we are in post loop opts IGVN). > - Some refactoring/clean-ups of involved code. > > Testing: tier1-7 + some fuzzer testing > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: remove hash_delete() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15449/files - new: https://git.openjdk.org/jdk/pull/15449/files/d50356e1..763a6b4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15449&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15449&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15449.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15449/head:pull/15449 PR: https://git.openjdk.org/jdk/pull/15449 From chagedorn at openjdk.org Tue Aug 29 08:39:39 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 29 Aug 2023 08:39:39 GMT Subject: RFR: 8305637: Remove Opaque1 nodes for Parse Predicates and clean up useless predicate elimination [v2] In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 14:55:25 GMT, Roland Westrelin wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> remove hash_delete() > > src/hotspot/share/opto/loopPredicate.cpp line 314: > >> 312: assert(new_predicate_proj->is_IfTrue(), "the success projection of a Parse Predicate is a true projection"); >> 313: ParsePredicateNode* parse_predicate = new_predicate_proj->in(0)->as_ParsePredicate(); >> 314: _igvn.hash_delete(parse_predicate); > > That looks strange. Wasn't the reason for the `hash_delete` in the previous version of the code that the `iff` was then modified. 
Is it still needed? You're right. That `hash_delete()` is not needed anymore. Removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15449#discussion_r1308358506 From chagedorn at openjdk.org Tue Aug 29 08:45:09 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 29 Aug 2023 08:45:09 GMT Subject: RFR: 8305637: Remove Opaque1 nodes for Parse Predicates and clean up useless predicate elimination [v2] In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 15:18:03 GMT, Roland Westrelin wrote: > Opaque1 nodes fold away after loop opts, which guarantees that the parse predicates are removed after loop opts too. In the new code, without the Opaque1 nodes, what causes the parse predicate to be removed after loop opts? When creating a new `ParsePredicateNode`, we register it for post loop opts IGVN: https://github.com/openjdk/jdk/blob/d50356e14877b4d4376fc18ecdec29f7d98d77bd/src/hotspot/share/opto/ifnode.cpp#L1978-L1984 During post loop opts IGVN (or earlier if `_useless` is true), the `ParsePredicateNode` is folded when `ParsePredicateNode::Value()` is called: https://github.com/openjdk/jdk/blob/d50356e14877b4d4376fc18ecdec29f7d98d77bd/src/hotspot/share/opto/ifnode.cpp#L2004-L2011 I don't think that _useful_ `ParsePredicateNodes` should ever be cloned. If a loop is removed, then either major progress is true and `eliminate_useless_predicates()` will mark them useless (i.e. set `_useless` to true) in the next round of loop opts, or major progress was false and `Compile::mark_parse_predicate_nodes_useless()` has already marked them useless. In both cases, they will be removed by the next round of IGVN. Otherwise, I don't think that useful `ParsePredicateNodes` should ever be split (i.e. be part of a loop body that is cloned) or cloned in any other way (e.g. by split-if).
When cloning them to unswitched loops, I create new ones by calling the constructor, so they end up on the post loop opts IGVN list: https://github.com/openjdk/jdk/blob/d50356e14877b4d4376fc18ecdec29f7d98d77bd/src/hotspot/share/opto/loopPredicate.cpp#L310-L311 However, there are some cases where useless `ParsePredicateNodes` are cloned. For example, parsing could have added them to Ifs inside a loop, but then `eliminate_useless_predicates()` will mark them useless. When the loop is split and these `ParsePredicateNodes` are cloned, they are still marked useless, and they are all cleaned up during the next round of IGVN. So, I think registering `ParsePredicateNodes` for post loop opts once, in the constructor, should be enough. Nevertheless, I still register them in the `Compile::_parse_predicates` list in `Node::clone()` so that none are missed: https://github.com/openjdk/jdk/blob/d50356e14877b4d4376fc18ecdec29f7d98d77bd/src/hotspot/share/opto/node.cpp#L511-L513 After post loop opts IGVN, I assert that we got rid of all the `ParsePredicateNodes`: https://github.com/openjdk/jdk/blob/d50356e14877b4d4376fc18ecdec29f7d98d77bd/src/hotspot/share/opto/compile.cpp#L1856-L1858 ------------- PR Comment: https://git.openjdk.org/jdk/pull/15449#issuecomment-1697016855 From chagedorn at openjdk.org Tue Aug 29 08:53:10 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 29 Aug 2023 08:53:10 GMT Subject: RFR: 8314024: SIGSEGV in PhaseIdealLoop::build_loop_late_post_work due to bad immediate dominator info [v3] In-Reply-To: <08hhWxI_fkzFY9PRiYwgCW5j96FGkVJmTH0bk8v6yaQ=.efb69989-ea24-4a3e-8240-3d569fc7f131@github.com> References: <08hhWxI_fkzFY9PRiYwgCW5j96FGkVJmTH0bk8v6yaQ=.efb69989-ea24-4a3e-8240-3d569fc7f131@github.com> Message-ID: On Tue, 29 Aug 2023 08:28:50 GMT, Roland Westrelin wrote: >> A node is sunk from the pre loop into the main loop. That node, in the >> main loop, feeds into a test. When the node is sunk it is pinned >> between the main and pre loop.
The test it feeds into is then >> eliminated by range check elimination: the sunk node becomes input to >> an expression that computes the new bound of the pre loop. The >> resulting graph is broken because the sunk node is pinned below the >> pre loop but used by the exit test of the pre loop. >> >> The fix I propose is in `PhaseIdealLoop::try_sink_out_of_loop()`, to >> skip nodes in pre loops that have a use in the companion main loop. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > test fix > > Looks good but I'm wondering if we could also bail out in Range Check Elimination instead, if we find that `get_ctrl()` of one of the involved data nodes does not dominate the pre loop exit test. What do you think? > > We could, but it seems unfortunate to bail out of a major optimization when it's fairly straightforward to avoid it. I added asserts in RCE to catch that issue earlier instead. I agree with that. Thanks for adding additional asserts to catch such cases earlier. Looks good! I'll run some more testing with the new asserts in place. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15399#pullrequestreview-1599848081 From chagedorn at openjdk.org Tue Aug 29 09:14:11 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 29 Aug 2023 09:14:11 GMT Subject: RFR: 8314997: Missing optimization opportunities due to missing try_clean_mem_phi() calls In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 12:38:43 GMT, Christian Hagedorn wrote: > While working on a Valhalla bug, I've noticed that we sometimes miss `RegionNode::try_clean_mem_phi()` calls to remove a useless diamond > > If > True False > Region > > with only a single memory phi. This blocks further optimizations like converting a loop into a counted one. The code in Valhalla looks slightly different but the problem is also reproducible in mainline. 
> > **Problem** > > In the test case, a region is transformed in IGVN such that it merges a diamond without any dependencies on both paths. The region has two phis. One of them is a memory phi which could be transformed by `RegionNode::try_clean_mem_phi()`. But when processing the region with its two phis in IGVN, we do not optimize the memory phi away because `has_unique_phi()` is false and we bail out: > https://github.com/openjdk/jdk/blob/725ec0ce1b463b21cd4c5287cf4ccbee53ec7349/src/hotspot/share/opto/cfgnode.cpp#L450-L471 > > Later in IGVN, the second phi dies and we only have the single memory phi left. But the region will not be added to the IGVN worklist again to re-apply `try_clean_mem_phi()`. We therefore miss the removal of the diamond and we fail to apply further optimizations. In the test case, we fail to convert the loop into a counted loop. > > **Proposed Fix** > > The fix I propose is to try to apply `try_clean_mem_phi()` whenever a region is merging a diamond with the assumption that the transformation of a memory phi does not hurt when being applied without being able to remove the region with the diamond (because there are other phis left that cannot be removed). Another option would be to re-add the region to the IGVN worklist when the second last phi dies. But the first approach seems simpler and less invasive. > > I've also applied some clean-ups and added an IR test. > > Thanks, > Christian Thanks Roland for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15445#issuecomment-1697060570 From chagedorn at openjdk.org Tue Aug 29 09:18:21 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 29 Aug 2023 09:18:21 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v15] In-Reply-To: References: Message-ID: On Tue, 29 Aug 2023 08:19:01 GMT, emmyyin wrote: >> ### Purpose >> >> IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. 
Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, this makes each layout more readable. But on the other hand, the layouts of two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can damage the mental model of the graph that the viewer has built up. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. >> >> The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". >> >> The difference between the algorithms is illustrated in the following video: >> >> >> https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 >> >> >> This work is the result of my Master's thesis, which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). >> >> >> ### Implementation >> >> The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager`, which holds the core algorithm, has been added. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. >> >> If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used.
This is because the dynamic algorithm needs an initial layout on which to apply the update actions. >> >> The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. >> >> Since `HierarchicalStableLayoutManager` tries to preserve the node positi... > > emmyyin has updated the pull request incrementally with one additional commit since the last revision: > > update copyright Thanks for the updates, looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14349#pullrequestreview-1599895813 From tholenstein at openjdk.org Tue Aug 29 09:46:20 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 29 Aug 2023 09:46:20 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v15] In-Reply-To: References: Message-ID: On Tue, 29 Aug 2023 08:19:01 GMT, emmyyin wrote: >> ### Purpose >> >> IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, this makes each layout more readable. But on the other hand, the layouts of two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can damage the mental model of the graph that the viewer has built up. A dynamic layout algorithm takes the changes into account when visualizing a graph.
To enhance IGV, such an algorithm has been implemented in this PR. >> >> The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". >> >> The difference between the algorithms is illustrated in the following video: >> >> >> https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 >> >> >> This work is the result of my Master's thesis, which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). >> >> >> ### Implementation >> >> The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager`, which holds the core algorithm, has been added. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. >> >> If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout on which to apply the update actions. >> >> The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. >> >> Since `HierarchicalStableLayoutManager` tries to preserve the node positi... > > emmyyin has updated the pull request incrementally with one additional commit since the last revision: > > update copyright Marked as reviewed by tholenstein (Reviewer).
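To make the idea of incremental update actions more concrete, here is a drastically simplified, hypothetical Java model. It is not IGV's `HierarchicalStableLayoutManager`, and the names `StableLayoutSketch`, `Pos` and `update`, as well as the placement rule, are assumptions made only for illustration. A layout is a map from node name to a (layer, index) slot: "remove" actions drop nodes that no longer exist, surviving nodes keep their slots, and "add" actions append each new node to the layer below its deepest already-placed predecessor.

```java
import java.util.*;

class StableLayoutSketch {
    // Hypothetical slot of a node: its layer (row) and its index within that layer.
    record Pos(int layer, int index) {}

    // Applies "remove" and "add" update actions to the previous layout.
    // newNodes is expected in topological order; predecessors that are not
    // placed yet are simply ignored by this toy placement rule.
    static Map<String, Pos> update(Map<String, Pos> oldLayout,
                                   List<String> newNodes,
                                   Map<String, List<String>> preds) {
        Set<String> alive = new HashSet<>(newNodes);
        Map<String, Pos> layout = new LinkedHashMap<>();
        // Remove actions: drop vanished nodes; survivors keep their old slot.
        for (Map.Entry<String, Pos> e : oldLayout.entrySet()) {
            if (alive.contains(e.getKey())) {
                layout.put(e.getKey(), e.getValue());
            }
        }
        // Next free index per layer, so new nodes never displace survivors.
        Map<Integer, Integer> nextIndex = new HashMap<>();
        for (Pos p : layout.values()) {
            nextIndex.merge(p.layer(), p.index() + 1, Math::max);
        }
        // Add actions: one layer below the deepest placed predecessor (layer 0 if none).
        for (String n : newNodes) {
            if (layout.containsKey(n)) {
                continue;
            }
            int layer = 0;
            for (String pred : preds.getOrDefault(n, List.of())) {
                Pos pp = layout.get(pred);
                if (pp != null) {
                    layer = Math.max(layer, pp.layer() + 1);
                }
            }
            int idx = nextIndex.merge(layer, 1, Integer::sum) - 1;
            layout.put(n, new Pos(layer, idx));
        }
        return layout;
    }
}
```

For example, starting from `A(0,0)`, `B(1,0)`, `C(1,1)` and moving to a graph with nodes `A`, `C`, `D` (where `D` has predecessor `C`), `A` and `C` keep their slots, `B` disappears, and `D` is placed at `(2,0)`. A real implementation also has to route edges and reduce crossings, which this sketch ignores.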
------------- PR Review: https://git.openjdk.org/jdk/pull/14349#pullrequestreview-1599950224 From tholenstein at openjdk.org Tue Aug 29 09:55:27 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 29 Aug 2023 09:55:27 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm In-Reply-To: References: Message-ID: On Fri, 11 Aug 2023 07:35:55 GMT, emmyyin wrote: >> ### Purpose >> >> IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, this makes each layout more readable. But on the other hand, the layouts of two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can damage the mental model of the graph that the viewer has built up. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. >> >> The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". >> >> The difference between the algorithms is illustrated in the following video: >> >> >> https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 >> >> >> This work is the result of my Master's thesis, which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf).
>> >> >> ### Implementation >> >> The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager`, which holds the core algorithm, has been added. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. >> >> If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout on which to apply the update actions. >> >> The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions. >> >> Since `HierarchicalStableLayoutManager` tries to preserve the node positi... > > Note: this does not handle self-edges at the moment; is that something that should be considered? Thank you for your work @emmyyin! Looks good now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14349#issuecomment-1697122112 From duke at openjdk.org Tue Aug 29 09:55:29 2023 From: duke at openjdk.org (emmyyin) Date: Tue, 29 Aug 2023 09:55:29 GMT Subject: Integrated: 8309463: IGV: Dynamic graph layout algorithm In-Reply-To: References: Message-ID: On Wed, 7 Jun 2023 08:33:12 GMT, emmyyin wrote: > ### Purpose > > IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time.
Static layout algorithms do not account for the rest of the sequence when visualizing a given graph. On one hand, this makes each layout more readable. But on the other hand, the layouts of two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can damage the mental model of the graph that the viewer has built up. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. > > The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". > > The difference between the algorithms is illustrated in the following video: > > > https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 > > > This work is the result of my Master's thesis, which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). > > > ### Implementation > > The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager`, which holds the core algorithm, has been added. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. > > If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout on which to apply the update actions. > > The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size.
These are updated, added and removed in accordance with the update actions. > > Since `HierarchicalStableLayoutManager` tries to preserve the node positions, the layouts might become unreadable after a fe... This pull request has now been integrated. Changeset: 5cc64cc2 Author: Emmy Committer: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/5cc64cc27a58e824a6b0e5a331e30544847f50d8 Stats: 1826 lines in 14 files changed: 1736 ins; 59 del; 31 mod 8309463: IGV: Dynamic graph layout algorithm Reviewed-by: tholenstein, rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14349 From jbhateja at openjdk.org Tue Aug 29 12:55:12 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 29 Aug 2023 12:55:12 GMT Subject: RFR: JDK-8314056 Remove runtime platform check from frem/drem [v3] In-Reply-To: References: Message-ID: On Wed, 23 Aug 2023 23:17:00 GMT, Scott Gibbons wrote: >> Remove platform check and move code to stubGenerator. This fix increases performance by ~4.5%. >> >> UPDATE: Subsequent commits increase the performance gain to ~2x for AVX2, with no significant change to AVX512. >> >> Tested tier1. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > mxcsr fix; address review comments The changes look good to me. You may also include the results of the following benchmark: ./test/micro/org/openjdk/bench/vm/compiler/pea/Blender.java src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 473: > 471: __ ucomisd(xmm0, xmm1); > 472: __ movapd(xmm4, xmm0); > 473: __ jccb(Assembler::aboveEqual, L_117f); This is a jump to a bound label, so Assembler::jcc should automatically optimize it with a short jump encoding, but it's not a blocker.
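For readers unfamiliar with the short/near distinction behind this comment: on x86, a conditional jump has a 2-byte short form (opcode `0x70+cc` plus an 8-bit displacement) and a 6-byte near form (`0x0F 0x80+cc` plus a 32-bit displacement), and in both cases the displacement is measured from the end of the instruction. When the target label is already bound, the assembler knows the distance and can choose the short form on its own, which is the behavior the reviewer expects from `Assembler::jcc`. As we understand it, `jccb` instead commits to the short form up front. The hypothetical Java helper below only models that size decision; it is not HotSpot code, and its names are illustrative.

```java
class JccEncodingSketch {
    static final int SHORT_JCC_SIZE = 2; // 0x70+cc, rel8
    static final int NEAR_JCC_SIZE = 6;  // 0x0F, 0x80+cc, rel32

    // Size of a jcc emitted at address pc that jumps to an already-bound target.
    static int jccSizeForBoundTarget(long pc, long target) {
        // The rel8 displacement is relative to the address after the 2-byte instruction.
        long shortDisp = target - (pc + SHORT_JCC_SIZE);
        boolean fitsInRel8 = shortDisp >= Byte.MIN_VALUE && shortDisp <= Byte.MAX_VALUE;
        return fitsInRel8 ? SHORT_JCC_SIZE : NEAR_JCC_SIZE;
    }
}
```

For a backward jump from address 200, a target at 74 gives a displacement of -128 and still fits the short form, while a target at 73 (displacement -129) needs the near form.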
------------- PR Review: https://git.openjdk.org/jdk/pull/15210#pullrequestreview-1600272296 PR Review Comment: https://git.openjdk.org/jdk/pull/15210#discussion_r1308729157 From fjiang at openjdk.org Tue Aug 29 14:43:11 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 29 Aug 2023 14:43:11 GMT Subject: RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v10] In-Reply-To: References: Message-ID: <3PcE05aE3Ti8SkcG6PZmPgq8SZcPHThe6iP2MsiY4Oo=.b27d01fd-cb37-404c-80af-c5cc9cfd03b2@github.com> On Tue, 29 Aug 2023 08:28:42 GMT, Ilya Gavrilin wrote: >> Please review these changes to the RISC-V double rounding intrinsics. >> >> On RISC-V, intrinsics for rounding doubles with a mode (like Math.ceil/floor/rint) were missing. RISC-V has no dedicated instruction for such a conversion, so a two-step conversion is used: double -> long int -> double (using fcvt.l.d, fcvt.d.l). >> >> We also have to provide a rounding mode to the fcvt.x.x instructions. >> >> Rounding mode selection for ceil (similar for floor and rint), according to the Math.ceil requirements: >> >>> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475). >> >> For double -> long int we choose the rup (round towards +inf) mode to get an integer that is greater than or equal to the input value. >> For long int -> double we choose the rdn (round towards -inf) mode to get the smallest (closest to -inf) representation of the integer obtained from the first conversion. >> >> For inf, NaN, or a value larger than 2^63 in magnitude, the input value is returned (such a double is guaranteed to be an integer). >> Additionally, when storing the result we copy the sign from the input value (needed because ceil must return -0.0 for inputs in (-1.0, 0.0)). >> >> We have observed significant improvements on HiFive and T-Head boards.
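The ceil strategy described above can be modeled in plain Java as a sanity check. Java has no way to choose the rounding mode of a conversion, so this hypothetical sketch (the `CeilModel` name is illustrative) emulates round-up (`rup`) by truncating with a `(long)` cast and adding one when the truncated value is below the input. It models the described semantics and is not the RISC-V intrinsic itself. In this model the long-to-double step is always exact, so the rounding mode of that step does not affect the result.

```java
final class CeilModel {
    // Models ceil(d) as double -> long (rounding up) -> double, plus a sign copy.
    static double ceil(double d) {
        // NaN, +/-Infinity and |d| >= 2^63 are returned unchanged; such finite
        // values are already integers (and may not fit in a long).
        if (Double.isNaN(d) || Math.abs(d) >= 0x1p63) {
            return d;
        }
        long l = (long) d;          // Java's cast truncates toward zero
        if ((double) l < d) {
            l++;                    // emulate round-toward-+inf for positive fractions
        }
        double r = (double) l;      // exact for every l produced above
        return Math.copySign(r, d); // keeps -0.0 for inputs in (-1.0, 0.0) and for -0.0
    }
}
```

Comparing the results with `Math.ceil` through `Double.doubleToLongBits` also checks that the sign of zero matches, for example `CeilModel.ceil(-0.5)` yields `-0.0`.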
>> >> testing: tier1, tier2 and hotspot:tier3 on HiFive >> >> Performance results on HiFive (FpRoundingBenchmark.testceil/floor/rint): >> >> Without intrinsic: >> >> Benchmark (TESTSIZE) Mode Cnt Score Error Units >> FpRoundingBenchmark.testceil 1024 thrpt 25 39.297 ± 0.037 ops/ms >> FpRoundingBenchmark.testfloor 1024 thrpt 25 39.398 ± 0.018 ops/ms >> FpRoundingBenchmark.testrint 1024 thrpt 25 36.388 ± 0.844 ops/ms >> >> With intrinsic: >> >> Benchmark (TESTSIZE) Mode Cnt Score Error Units >> FpRoundingBenchmark.testceil 1024 thrpt 25 80.560 ± 0.053 ops/ms >> FpRoundingBenchmark.testfloor 1024 thrpt 25 80.541 ± 0.081 ops/ms >> FpRoundingBenchmark.testrint 1024 thrpt 25 80.603 ± 0.071 ops/ms > > Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in c2_MacroAssembler_riscv.cpp Marked as reviewed by fjiang (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14991#pullrequestreview-1600611526 From chagedorn at openjdk.org Tue Aug 29 15:42:10 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 29 Aug 2023 15:42:10 GMT Subject: RFR: 8314024: SIGSEGV in PhaseIdealLoop::build_loop_late_post_work due to bad immediate dominator info [v3] In-Reply-To: <08hhWxI_fkzFY9PRiYwgCW5j96FGkVJmTH0bk8v6yaQ=.efb69989-ea24-4a3e-8240-3d569fc7f131@github.com> References: <08hhWxI_fkzFY9PRiYwgCW5j96FGkVJmTH0bk8v6yaQ=.efb69989-ea24-4a3e-8240-3d569fc7f131@github.com> Message-ID: On Tue, 29 Aug 2023 08:28:50 GMT, Roland Westrelin wrote: >> A node is sunk from the pre loop into the main loop. That node, in the >> main loop, feeds into a test. When the node is sunk it is pinned >> between the main and pre loop. The test it feeds into is then >> eliminated by range check elimination: the sunk node becomes input to >> an expression that computes the new bound of the pre loop.
The >> resulting graph is broken because the sunk node is pinned below the >> pre loop but used by the exit test of the pre loop. >> >> The fix I propose is in `PhaseIdealLoop::try_sink_out_of_loop()`, to >> skip nodes in pre loops that have a use in the companion main loop. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > test fix Testing looked good! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15399#issuecomment-1697696792 From jbhateja at openjdk.org Tue Aug 29 15:52:23 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 29 Aug 2023 15:52:23 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: On Mon, 28 Aug 2023 21:27:25 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows up to ~7x improvement for 32-bit datatypes (int, float) and up to ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random data** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | >> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | >> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | >> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | >> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | >> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | >> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | >> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | >> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | >> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | >> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | >> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | >> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | >> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | >> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | >> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** | >> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | >> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | >> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | >> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | >> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | >> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | >> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | >> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | >> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | >> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** | >> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | >> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | >> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | >> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | >> | 
ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 | >> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | >> | ArraysSort.longSort | 1000 | 10.449 | 6.... > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Clean up parameters passed to arrayPartition; update the check to load library > Hi @vamsi-parasa, Please find below the perf data collected on Linux with the following JMH options. java -jar target/benchmarks.jar -p builder=RANDOM -f 1 -wi 1 -i 10 -w 30 -jvmArgs "-XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_arraySort,_arrayPartition" ArraysSort.Long.testSort Baseline numbers are with the stock JDK. ![image](https://github.com/openjdk/jdk/assets/59989778/d3bf2591-38bb-4924-b77d-6889c5dbc3c0) Best Regards, Jatin ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1697715820 From kvn at openjdk.org Tue Aug 29 16:08:28 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Aug 2023 16:08:28 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: <45lBRB5Jf3jkviwJnCUuiub6BDr-qqwNal1Bbr982ik=.6ce497c0-d2bb-4d2c-b700-28dc4842bf7c@github.com> On Mon, 28 Aug 2023 21:27:25 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows up to ~7x improvement for 32-bit datatypes (int, float) and up to ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random data** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | >> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | >> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | >> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | >> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | >> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | >> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | >> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | >> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | >> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | >> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | >> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | >> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | >> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | >> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | >> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** | >> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | >> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | >> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | >> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | >> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | >> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | >> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | >> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | >> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | >> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** | >> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | >> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | >> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | >> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | >> | 
ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 | >> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | >> | ArraysSort.longSort | 1000 | 10.449 | 6.... > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Clean up parameters passed to arrayPartition; update the check to load library My testing passed. But I am not sure correctness of code is fully tested. For now we have only JMH benchmark for this new code. Do we have JDK test which can check correctness of this code? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1697743981 From kvn at openjdk.org Tue Aug 29 16:08:29 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Aug 2023 16:08:29 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v29] In-Reply-To: References: Message-ID: <2s4hZgw7KRQ5AoYzTZ3-8BS5V6JAslioJXeil_jvfQA=.fb1a1a31-8726-494b-b77c-a66565a963da@github.com> On Mon, 28 Aug 2023 23:28:44 GMT, Erik Joelsson wrote: >> Hi Erik, >> >> The reason this PR is focused on Linux is because the AVX512 sort and partitioning routines are based on Intel's x86-simd-library (https://github.com/intel/x86-simd-sort) which was originally developed with GCC as the target compiler. Thus, this PR has restricted itself to Linux as the code was tested using GCC/Linux platforms. >> >> Additionally, the x86_64 library is compiled for AVX512 using file specific compilation pragmas (`#pragma GCC target("avx512dq", "avx512f")`). This feature is absent for Windows/MSVC++ compiler. >> >> Thanks, >> Vamsi > > If it's tied to GCC as well, then we should probably include that in the condition here unless it's also expected to work with Clang.
(`TOOLCHAIN_TYPE` = `gcc`) > The reason this PR is focused on Linux is because the AVX512 sort and partitioning routines are based on Intel's x86-simd-library (https://github.com/intel/x86-simd-sort) which was originally developed with GCC as the target compiler. Thus, this PR has restricted itself to Linux as the code was tested using GCC/Linux platforms. Additionally, the x86_64 library is compiled for AVX512 using file specific compilation pragmas (`#pragma GCC target("avx512dq", "avx512f")`). This feature is absent for Windows/MSVC++ compiler. That is why I am questioning this approach to have additional separate C++ code library - too much dependencies on other tools. As I suggested before try to disassemble this library and use assembler code in VM new stubs. You can create specialized stubGenerator_x86_64_array_sort.cpp file for it. Then you don't need to depend on C++ compiler or OS. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1309054538 From kvn at openjdk.org Tue Aug 29 16:35:10 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Aug 2023 16:35:10 GMT Subject: RFR: 8314024: SIGSEGV in PhaseIdealLoop::build_loop_late_post_work due to bad immediate dominator info [v3] In-Reply-To: <08hhWxI_fkzFY9PRiYwgCW5j96FGkVJmTH0bk8v6yaQ=.efb69989-ea24-4a3e-8240-3d569fc7f131@github.com> References: <08hhWxI_fkzFY9PRiYwgCW5j96FGkVJmTH0bk8v6yaQ=.efb69989-ea24-4a3e-8240-3d569fc7f131@github.com> Message-ID: On Tue, 29 Aug 2023 08:28:50 GMT, Roland Westrelin wrote: >> A node is sunk from the pre loop into the main loop. That node, in the >> main loop, feeds into a test. When the node is sunk it is pinned >> between the main and pre loop. The test it feeds into is then >> eliminated by range check elimination: the sunk node becomes input to >> an expression that computes the new bound of the pre loop.
The >> resulting graph is broken because the sunk node is pinned below the >> pre loop but used by the exit test of the pre loop. >> >> The fix I propose is in `PhaseIdealLoop::try_sink_out_of_loop()`, to >> skip nodes in pre loops that have a use in the companion main loop. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > test fix Update looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15399#pullrequestreview-1600849952 From kvn at openjdk.org Tue Aug 29 16:54:11 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Aug 2023 16:54:11 GMT Subject: RFR: 8314997: Missing optimization opportunities due to missing try_clean_mem_phi() calls In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 12:38:43 GMT, Christian Hagedorn wrote: > While working on a Valhalla bug, I've noticed that we sometimes miss `RegionNode::try_clean_mem_phi()` calls to remove a useless diamond > > If > True False > Region > > with only a single memory phi. This blocks further optimizations like converting a loop into a counted one. The code in Valhalla looks slightly different but the problem is also reproducible in mainline. > > **Problem** > > In the test case, a region is transformed in IGVN such that it merges a diamond without any dependencies on both paths. The region has two phis. One of them is a memory phi which could be transformed by `RegionNode::try_clean_mem_phi()`. But when processing the region with its two phis in IGVN, we do not optimize the memory phi away because `has_unique_phi()` is false and we bail out: > https://github.com/openjdk/jdk/blob/725ec0ce1b463b21cd4c5287cf4ccbee53ec7349/src/hotspot/share/opto/cfgnode.cpp#L450-L471 > > Later in IGVN, the second phi dies and we only have the single memory phi left. But the region will not be added to the IGVN worklist again to re-apply `try_clean_mem_phi()`. 
We therefore miss the removal of the diamond and we fail to apply further optimizations. In the test case, we fail to convert the loop into a counted loop. > > **Proposed Fix** > > The fix I propose is to try to apply `try_clean_mem_phi()` whenever a region is merging a diamond with the assumption that the transformation of a memory phi does not hurt when being applied without being able to remove the region with the diamond (because there are other phis left that cannot be removed). Another option would be to re-add the region to the IGVN worklist when the second last phi dies. But the first approach seems simpler and less invasive. > > I've also applied some clean-ups and added an IR test. > > Thanks, > Christian Looks good. Only few small comments. src/hotspot/share/opto/cfgnode.cpp line 503: > 501: if (left_path == nullptr || right_path == nullptr) { > 502: return false; > 503: } So the TOP input will fail next check. May be add comment about that. src/hotspot/share/opto/cfgnode.cpp line 1389: > 1387: return false; > 1388: } > 1389: assert(is_diamond_phi(), "sanity"); Add explicit check `> 0` ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15445#pullrequestreview-1600868239 PR Review Comment: https://git.openjdk.org/jdk/pull/15445#discussion_r1309100004 PR Review Comment: https://git.openjdk.org/jdk/pull/15445#discussion_r1309108117 From duke at openjdk.org Tue Aug 29 17:03:26 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 29 Aug 2023 17:03:26 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: <3MsIs3kNxvxNOftvjnsisc7eWu6CEb-BbBsHJnj9SH4=.64c640da-61c4-49b2-9e19-02de020d2976@github.com> On Mon, 28 Aug 2023 23:35:56 GMT, Erik Joelsson wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Clean up parameters passed to arrayPartition; update the check to load library > > make/modules/java.base/Lib.gmk line 240: > >> 238: >> 239: ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, x86_64)+$(INCLUDE_COMPILER2), true+true+true) >> 240: $(eval $(call SetupJdkLibrary, BUILD_LIB_X86_64, \ > > As this is a C++ lib, consider using g++ for linking by setting: > > TOOLCHAIN := TOOLCHAIN_LINK_CXX Thanks Erik. Will update Lib.gmk to use g++ for linking. > make/modules/java.base/Lib.gmk line 247: > >> 245: LDFLAGS := $(LDFLAGS_JDKLIB) \ >> 246: $(call SET_SHARED_LIBRARY_ORIGIN), \ >> 247: LDFLAGS_linux := -Wl$(COMMA)--no-as-needed, \ > > This is set by default since JDK-8314554. Thanks Erik. Will update Lib.gmk accordingly. 
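Kozlov's question about whether anything beyond the JMH benchmark checks the correctness of the new sort code can be illustrated with a small randomized differential check. The sketch below is hypothetical and is not the JDK's test/jdk/java/util/Arrays/Sorting.java: it sorts random `int[]` and `double[]` arrays with the primitive `Arrays.sort` (the path the intrinsic targets) and compares the result against the boxed `Object[]` sort, which uses a different algorithm (TimSort). The class name `SortCheck`, the sizes, and the seed are assumptions. For `double`, the check relies on the documented `Arrays.sort(double[])` ordering: `-0.0` sorts before `0.0` and all NaNs sort last. Whether C2 actually uses the AVX512 intrinsic during such a run depends on the CPU and on JIT heuristics; running it with `-Xcomp` on an AVX512 machine, as was done in the thread, makes that more likely.

```java
import java.util.Arrays;
import java.util.Random;

public class SortCheck {
    // Sizes taken from the ArraysSort benchmark table, plus 0 and 1 as edge cases.
    static final int[] SIZES = {0, 1, 10, 25, 50, 75, 100, 1000, 10000, 100000};

    // Sorts 'a' with the primitive sort and compares it with the boxed (TimSort) result.
    static boolean checkInt(int[] a) {
        Integer[] ref = new Integer[a.length];
        for (int i = 0; i < a.length; i++) {
            ref[i] = a[i];
        }
        Arrays.sort(ref); // Object sort: ComparableTimSort, not DualPivotQuicksort
        Arrays.sort(a);   // primitive sort: candidate for the intrinsic
        for (int i = 0; i < a.length; i++) {
            if (a[i] != ref[i]) {
                return false;
            }
        }
        return true;
    }

    static boolean checkDouble(double[] a) {
        Double[] ref = new Double[a.length];
        for (int i = 0; i < a.length; i++) {
            ref[i] = a[i];
        }
        Arrays.sort(ref); // Double.compareTo order: -0.0 < 0.0, NaN greater than everything
        Arrays.sort(a);
        for (int i = 0; i < a.length; i++) {
            // Double.compare distinguishes -0.0 from 0.0 and treats all NaNs as equal
            if (Double.compare(a[i], ref[i]) != 0) {
                return false;
            }
        }
        return true;
    }

    // Mostly Gaussian values, with roughly 10% special values mixed in.
    static double[] randomDoubles(int n, Random rnd) {
        double[] specials = {Double.NaN, -0.0, 0.0,
                             Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY};
        double[] a = new double[n];
        for (int i = 0; i < n; i++) {
            a[i] = rnd.nextInt(10) == 0
                    ? specials[rnd.nextInt(specials.length)]
                    : rnd.nextGaussian();
        }
        return a;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        // Several rounds give the JIT a chance to compile the sort methods.
        for (int round = 0; round < 10; round++) {
            for (int n : SIZES) {
                if (!checkInt(rnd.ints(n).toArray())) {
                    throw new AssertionError("int sort mismatch, n=" + n);
                }
                if (!checkDouble(randomDoubles(n, rnd))) {
                    throw new AssertionError("double sort mismatch, n=" + n);
                }
            }
        }
        System.out.println("ok");
    }
}
```

The same pattern extends to `long[]` and `float[]`; using an independent reference sort rather than re-checking sortedness alone also catches bugs that lose or duplicate elements.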
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1309118874 PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1309118373 From duke at openjdk.org Tue Aug 29 17:03:22 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 29 Aug 2023 17:03:22 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: <45lBRB5Jf3jkviwJnCUuiub6BDr-qqwNal1Bbr982ik=.6ce497c0-d2bb-4d2c-b700-28dc4842bf7c@github.com> References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> <45lBRB5Jf3jkviwJnCUuiub6BDr-qqwNal1Bbr982ik=.6ce497c0-d2bb-4d2c-b700-28dc4842bf7c@github.com> Message-ID: On Tue, 29 Aug 2023 16:04:58 GMT, Vladimir Kozlov wrote: > My testing passed. But I am not sure correctness of code is fully tested. For now we have only JMH benchmark for this new code. Do we have JDK test which can check correctness of this code? Hi Vladimir, will add the JDK tests to check for correctness and let you know. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1697820155 From duke at openjdk.org Tue Aug 29 17:36:22 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 29 Aug 2023 17:36:22 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v29] In-Reply-To: <2s4hZgw7KRQ5AoYzTZ3-8BS5V6JAslioJXeil_jvfQA=.fb1a1a31-8726-494b-b77c-a66565a963da@github.com> References: <2s4hZgw7KRQ5AoYzTZ3-8BS5V6JAslioJXeil_jvfQA=.fb1a1a31-8726-494b-b77c-a66565a963da@github.com> Message-ID: On Tue, 29 Aug 2023 16:02:57 GMT, Vladimir Kozlov wrote: >> If it's tied to GCC as well, then we should probably include that in the condition here unless it's also expected to work with Clang. 
(`TOOLCHAIN_TYPE` = `gcc`) > >> The reason this PR is focused on Linux is because the AVX512 sort and partitioning routines are based on Intel's x86-simd-library (https://github.com/intel/x86-simd-sort) which was originally developed with GCC as the target compiler. Thus, this PR has restricted itself to Linux as the code was tested using GCC/Linux platforms. Additionally, the x86_64 library is compiled for AVX512 using file specific compilation pragmas (`#pragma GCC target("avx512dq", "avx512f")`). This feature is absent for Windows/MSVC++ compiler. > > That is why I am questioning this approach to have additional separate C++ code library - too much dependencies on other tools. > > As I suggested before try to disassemble this library and use assembler code in VM new stubs. You can create specialized stubGenerator_x86_64_array_sort.cpp file for it. Then you don't need to depend on C++ compiler or OS. The shared library approach is being followed currently as an initial implementation to demonstrate the value of AVX512 sorting. This will be followed up in future with support for Windows as well. If it is ok with you, the shared library approach could be pursued for now to be later replaced with specialized assembly stubs (which are agnostic to OS and compiler) when AVX512 sort is enabled for Windows. Please let us know. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1309151742 From sgibbons at openjdk.org Tue Aug 29 17:53:12 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 29 Aug 2023 17:53:12 GMT Subject: RFR: JDK-8314056 Remove runtime platform check from frem/drem [v3] In-Reply-To: References: Message-ID: <2pDs2Z-eS0Ut2gea76FkSsNFHH4JF4GetfQLf9QEyIg=.4f05ff23-3bfe-4999-a2d0-d66ab940706b@github.com> On Wed, 23 Aug 2023 23:17:00 GMT, Scott Gibbons wrote: >> Remove platform check and move code to stubGenerator. This fix increases performance by ~4.5%.
>> >> UPDATE: Subsequent commits increase performance gain to ~2x for AVX2, with no significant change to AVX512. >> >> Tested tier1. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > mxcsr fix; address review comments @jddarcy FYI. If you'd like to review, it would be appreciated. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15210#issuecomment-1697886881 From rriggs at openjdk.org Tue Aug 29 18:48:25 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Tue, 29 Aug 2023 18:48:25 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: <3BYkSKH633EJPmI86EavRQeaLmvursnu0VsHWIRvceU=.25709139-26db-4e5d-b07d-cf693a9bc62d@github.com> On Mon, 28 Aug 2023 21:27:25 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below. 
>> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random data** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | >> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | >> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | >> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | >> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | >> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | >> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | >> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | >> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | >> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | >> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | >> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | >> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | >> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | >> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | >> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** | >> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | >> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | >> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | >> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | >> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | >> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | >> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | >> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | >> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | >> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** | >> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | >> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | >> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | >> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | >> | 
ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 | >> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | >> | ArraysSort.longSort | 1000 | 10.449 | 6.... > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Clean up parameters passed to arrayPartition; update the check to load library @mcimadamore Does Panama have anything to offer over hard coded stubs? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1697957043 From duke at openjdk.org Tue Aug 29 19:35:22 2023 From: duke at openjdk.org (iaroslavski) Date: Tue, 29 Aug 2023 19:35:22 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> <45lBRB5Jf3jkviwJnCUuiub6BDr-qqwNal1Bbr982ik=.6ce497c0-d2bb-4d2c-b700-28dc4842bf7c@github.com> Message-ID: On Tue, 29 Aug 2023 16:57:11 GMT, Srinivas Vamsi Parasa wrote: > > My testing passed. But I am not sure correctness of code is fully tested. For now we have only JMH benchmark for this new code. Do we have JDK test which can check correctness of this code? > > Hi Vladimir, will add the JDK tests to check for correctness and let you know Hi, We already have correctness tests. 
See test/jdk/java/util/Arrays/Sorting.java The latest version you can find in PR https://github.com/openjdk/jdk/pull/13568/files ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1698014609 From alanb at openjdk.org Tue Aug 29 19:35:21 2023 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 29 Aug 2023 19:35:21 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: <_yPxZATuaIq3XrDJWKxHuK674VN0QsuSph_5qLlCmKI=.06e3ca81-bdc9-4bd7-9f66-b5a28a71af1b@github.com> On Mon, 28 Aug 2023 21:27:25 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below. 
>> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random data** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | >> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | >> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | >> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | >> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | >> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | >> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | >> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | >> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | >> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | >> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | >> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | >> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | >> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | >> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | >> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** | >> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | >> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | >> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | >> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | >> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | >> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | >> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | >> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | >> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | >> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** | >> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | >> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | >> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | >> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | >> | 
ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 | >> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | >> | ArraysSort.longSort | 1000 | 10.449 | 6.... > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Clean up parameters passed to arrayPartition; update the check to load library The changes to DualPivotQuicksort will need detailed review to ensure that this is understandable and maintainable, there is a lot here to study. The addition of libx86_64 and having the stub generation call out to this library also needs discussion to make sure there is an agreement on how this would be integrated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1698011712 From duke at openjdk.org Tue Aug 29 19:35:23 2023 From: duke at openjdk.org (iaroslavski) Date: Tue, 29 Aug 2023 19:35:23 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: On Mon, 28 Aug 2023 21:27:25 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below. 
>> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random data** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | >> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | >> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | >> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | >> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | >> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | >> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | >> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | >> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | >> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | >> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | >> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | >> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | >> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | >> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | >> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** | >> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | >> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | >> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | >> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | >> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | >> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | >> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | >> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | >> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | >> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** | >> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | >> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | >> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | >> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | >> | 
ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 | >> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | >> | ArraysSort.longSort | 1000 | 10.449 | 6.... > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Clean up parameters passed to arrayPartition; update the check to load library Hi, We already have correctness tests. See test/jdk/java/util/Arrays/Sorting.java The latest version you can find in PR https://github.com/openjdk/jdk/pull/13568/files ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1698016905 From sviswanathan at openjdk.org Tue Aug 29 20:28:21 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 29 Aug 2023 20:28:21 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: <_yPxZATuaIq3XrDJWKxHuK674VN0QsuSph_5qLlCmKI=.06e3ca81-bdc9-4bd7-9f66-b5a28a71af1b@github.com> References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> <_yPxZATuaIq3XrDJWKxHuK674VN0QsuSph_5qLlCmKI=.06e3ca81-bdc9-4bd7-9f66-b5a28a71af1b@github.com> Message-ID: On Tue, 29 Aug 2023 19:28:17 GMT, Alan Bateman wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Clean up parameters passed to arrayPartition; update the check to load library > > The changes to DualPivotQuicksort will need detailed review to ensure that this is understandable and maintainable, there is a lot here to study. > > The addition of libx86_64 and having the stub generation call out to this library also needs discussion to make sure there is an agreement on how this would be integrated. @AlanBateman If it helps, the changes made by @vamsi-parasa to DualPivotQuickSort.java don't change the logic as written in Java. 
They only carve out the functionality into separate Java methods retaining the meaning exactly as before. These Java methods are then optimized through a stub. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1698079501 From kvn at openjdk.org Tue Aug 29 20:28:23 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Aug 2023 20:28:23 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v29] In-Reply-To: References: <2s4hZgw7KRQ5AoYzTZ3-8BS5V6JAslioJXeil_jvfQA=.fb1a1a31-8726-494b-b77c-a66565a963da@github.com> Message-ID: On Tue, 29 Aug 2023 17:32:26 GMT, Srinivas Vamsi Parasa wrote: >>> The reason this PR is focused on Linux is because the AVX512 sort and partitioning routines are based on Intel?s x86-simd-library (https://github.com/intel/x86-simd-sort) which was originally developed with GCC as the target compiler. Thus, this PR has restricted itself to Linux as the code was tested using GCC/Linux platforms. Additionally, the x86_64 library is compiled for AVX512 using file specific compilation pragmas (`#pragma GCC target("avx512dq", "avx512f")`). This feature is absent for Windows/MSVC++ compiler.? >> >> That is why I am questioning this approach to have additional separate C++ code library - too much dependencies on other tools. >> >> As I suggested before try to disassemble this library and use assembler code in VM new stubs. You can create specialized stubGenerator_x86_64_array_sort.cpp file for it. Then you don't need to depend on C++ compiler or OS. > > The shared library approach is being followed currently as an initial implementation to demonstrate the value of AVX512 sorting. This will be followed up in future with support for Windows as well. > If it is ok with you, the shared library approach could be pursued for now to be later replaced with specialized assembly stubs (which are agnostic to OS and compiler) when AVX512 sort is enabled for Windows. 
Please let us know. I am okay with such incremental approach. Please, file RFE to replace library with stubs in a future (it could be still separate library but with assembler code). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1309316767 From kvn at openjdk.org Tue Aug 29 20:39:21 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Aug 2023 20:39:21 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: On Tue, 29 Aug 2023 19:32:44 GMT, iaroslavski wrote: > Hi, We already have correctness tests. See test/jdk/java/util/Arrays/Sorting.java > > The latest version you can find in PR https://github.com/openjdk/jdk/pull/13568/files Does test/jdk/java/util/Arrays/Sorting.java trigger usage of this intrinsic without additional flags? @vamsi-parasa can you check? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1698092741 From duke at openjdk.org Tue Aug 29 20:45:22 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 29 Aug 2023 20:45:22 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: <4b8E1FzKQ4-Q_gCsL9Fn7jJwz21mEnUJJKiLCBeplrg=.54a06fcb-bcce-4866-b9b3-71cb1052f5e9@github.com> On Tue, 29 Aug 2023 20:36:04 GMT, Vladimir Kozlov wrote: > Hi, We already have correctness tests. See test/jdk/java/util/Arrays/Sorting.java > > The latest version you can find in PR https://github.com/openjdk/jdk/pull/13568/files Hello Vladimir (@iaroslavski), Thank you for providing the link to the correctness tests! 
Thanks, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1698101345 From duke at openjdk.org Tue Aug 29 23:32:23 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 29 Aug 2023 23:32:23 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: On Tue, 29 Aug 2023 20:36:04 GMT, Vladimir Kozlov wrote: > > Hi, We already have correctness tests. See test/jdk/java/util/Arrays/Sorting.java > > The latest version you can find in PR https://github.com/openjdk/jdk/pull/13568/files > > Does test/jdk/java/util/Arrays/Sorting.java trigger usage of this intrinsic without additional flags? @vamsi-parasa can you check? Sure Vladimir (@vnkozlov). Will check if test/jdk/java/util/Arrays/Sorting.java is triggering the intrinsic without additional flags and let you know. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1698103318 From kvn at openjdk.org Tue Aug 29 23:32:25 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Aug 2023 23:32:25 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: On Mon, 28 Aug 2023 21:27:25 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. 
>> >> This PR shows up to ~7x improvement for 32-bit datatypes (int, float) and up to ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below. >> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random data** >>
>> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
>> | --- | --- | --- | --- | --- |
>> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 |
>> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 |
>> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 |
>> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 |
>> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 |
>> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 |
>> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** |
>> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** |
>> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** |
>> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 |
>> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 |
>> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 |
>> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 |
>> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 |
>> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 |
>> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** |
>> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** |
>> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** |
>> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 |
>> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 |
>> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 |
>> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 |
>> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 |
>> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 |
>> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** |
>> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** |
>> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** |
>> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 |
>> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 |
>> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 |
>> | ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 |
>> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 |
>> | ArraysSort.longSort | 1000 | 10.449 | 6....
> > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Clean up parameters passed to arrayPartition; update the check to load library I looked at my testing log and I see that this test was run on machines which do not have avx512. I am re-running jdk/util tests with -Xcomp flag on avx512 machines. My testing with -Xcomp flag on avx512 machines passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1698106526 PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1698272873 From duke at openjdk.org Wed Aug 30 00:40:32 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 30 Aug 2023 00:40:32 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: On Tue, 29 Aug 2023 20:44:32 GMT, Srinivas Vamsi Parasa wrote: > > > Hi, We already have correctness tests. See test/jdk/java/util/Arrays/Sorting.java > > > The latest version you can find in PR https://github.com/openjdk/jdk/pull/13568/files > > > > > > Does test/jdk/java/util/Arrays/Sorting.java trigger usage of this intrinsic without additional flags? @vamsi-parasa can you check? > > Sure Vladimir (@vnkozlov). Will check if test/jdk/java/util/Arrays/Sorting.java is triggering the intrinsic without additional flags and let you know.
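The Speedup column in the quoted table appears to be the baseline time divided by the AVX512 time, rounded to one decimal place. The Java sketch below recomputes a few rows under that assumption; the class and method names are illustrative only and are not part of the PR.

```java
// Illustrative sketch (not from the PR): recompute the "Speedup" column
// of the quoted JMH table, assuming Speedup = Baseline / AVX512 Sort.
final class SpeedupSketch {

    // Both arguments in us/op; a result > 1.0 means the AVX512 path is faster.
    static double speedup(double baselineUsPerOp, double avx512UsPerOp) {
        return baselineUsPerOp / avx512UsPerOp;
    }

    // Round to one decimal place, matching the table's presentation.
    static double round1(double x) {
        return Math.round(x * 10.0) / 10.0;
    }

    public static void main(String[] args) {
        // ArraysSort.intSort, 10000 elements: 309.659 / 42.518
        System.out.println(round1(speedup(309.659, 42.518))); // prints 7.3
        // ArraysSort.floatSort, 50 elements: a small regression
        System.out.println(round1(speedup(0.285, 0.33)));     // prints 0.9
    }
}
```

Read this way, rows of 10 to 100 elements stay close to 1.0, and the 4x to 7x speedups only show up from 10,000 elements upward.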
Hi Vladimir, Just verified that the test/jdk/java/util/Arrays/Sorting.java is triggering the intrinsic without additional flags as shown in the output snapshot below: ![image](https://github.com/openjdk/jdk/assets/23087109/a2d4edb1-9377-4f92-bed2-3e40bc5a7654) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1698322922 From jbhateja at openjdk.org Wed Aug 30 01:31:21 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 30 Aug 2023 01:31:21 GMT Subject: RFR: JDK-8314056 Remove runtime platform check from frem/drem [v3] In-Reply-To: References: Message-ID: On Wed, 23 Aug 2023 23:17:00 GMT, Scott Gibbons wrote: >> Remove platform check and move code to stubGenerator. This fix increases performance by ~4.5%. >> >> UPDATE: Subsequent commits increase performance gain to ~2x for AVX2, with no significant change to AVX512. >> >> Tested tier1. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > mxcsr fix; address review comments Marked as reviewed by jbhateja (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15210#pullrequestreview-1601725307 From sgibbons at openjdk.org Wed Aug 30 01:31:22 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 30 Aug 2023 01:31:22 GMT Subject: Integrated: JDK-8314056 Remove runtime platform check from frem/drem In-Reply-To: References: Message-ID: On Wed, 9 Aug 2023 16:48:36 GMT, Scott Gibbons wrote: > Remove platform check and move code to stubGenerator. This fix increases performance by ~4.5%. > > UPDATE: Subsequent commits increase performance gain to ~2x for AVX2, with no significant change to AVX512. > > Tested tier1. This pull request has now been integrated. 
Changeset: ce2a7ea4 Author: Scott Gibbons Committer: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/ce2a7ea40a22c652e5f8559c91d5eea197e2d708 Stats: 196 lines in 8 files changed: 67 ins; 86 del; 43 mod 8314056: Remove runtime platform check from frem/drem Reviewed-by: sviswanathan, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/15210 From kvn at openjdk.org Wed Aug 30 02:04:22 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Aug 2023 02:04:22 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: On Mon, 28 Aug 2023 21:27:25 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows up to ~7x improvement for 32-bit datatypes (int, float) and up to ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random data** >>
>> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
>> | --- | --- | --- | --- | --- |
>> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 |
>> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 |
>> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 |
>> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 |
>> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 |
>> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 |
>> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** |
>> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** |
>> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** |
>> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 |
>> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 |
>> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 |
>> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 |
>> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 |
>> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 |
>> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** |
>> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** |
>> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** |
>> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 |
>> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 |
>> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 |
>> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 |
>> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 |
>> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 |
>> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** |
>> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** |
>> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** |
>> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 |
>> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 |
>> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 |
>> | ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 |
>> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 |
>> | ArraysSort.longSort | 1000 | 10.449 | 6....
> > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Clean up parameters passed to arrayPartition; update the check to load library Good. Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1698380569 From chagedorn at openjdk.org Wed Aug 30 06:38:50 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Aug 2023 06:38:50 GMT Subject: RFR: 8314997: Missing optimization opportunities due to missing try_clean_mem_phi() calls [v2] In-Reply-To: References: Message-ID: > While working on a Valhalla bug, I've noticed that we sometimes miss `RegionNode::try_clean_mem_phi()` calls to remove a useless diamond > > If > True False > Region > > with only a single memory phi. This blocks further optimizations like converting a loop into a counted one. The code in Valhalla looks slightly different but the problem is also reproducible in mainline. > > **Problem** > > In the test case, a region is transformed in IGVN such that it merges a diamond without any dependencies on both paths. The region has two phis. One of them is a memory phi which could be transformed by `RegionNode::try_clean_mem_phi()`. But when processing the region with its two phis in IGVN, we do not optimize the memory phi away because `has_unique_phi()` is false and we bail out: > https://github.com/openjdk/jdk/blob/725ec0ce1b463b21cd4c5287cf4ccbee53ec7349/src/hotspot/share/opto/cfgnode.cpp#L450-L471 > > Later in IGVN, the second phi dies and we only have the single memory phi left. But the region will not be added to the IGVN worklist again to re-apply `try_clean_mem_phi()`. We therefore miss the removal of the diamond and we fail to apply further optimizations. In the test case, we fail to convert the loop into a counted loop.
> > **Proposed Fix** > > The fix I propose is to try to apply `try_clean_mem_phi()` whenever a region is merging a diamond with the assumption that the transformation of a memory phi does not hurt when being applied without being able to remove the region with the diamond (because there are other phis left that cannot be removed). Another option would be to re-add the region to the IGVN worklist when the second last phi dies. But the first approach seems simpler and less invasive. > > I've also applied some clean-ups and added an IR test. > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Vladimir's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15445/files - new: https://git.openjdk.org/jdk/pull/15445/files/8734fefa..1788e3b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15445&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15445&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15445.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15445/head:pull/15445 PR: https://git.openjdk.org/jdk/pull/15445 From chagedorn at openjdk.org Wed Aug 30 06:38:51 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Aug 2023 06:38:51 GMT Subject: RFR: 8314997: Missing optimization opportunities due to missing try_clean_mem_phi() calls In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 12:38:43 GMT, Christian Hagedorn wrote: > While working on a Valhalla bug, I've noticed that we sometimes miss `RegionNode::try_clean_mem_phi()` calls to remove a useless diamond > > If > True False > Region > > with only a single memory phi. This blocks further optimizations like converting a loop into a counted one. The code in Valhalla looks slightly different but the problem is also reproducible in mainline. 
> > **Problem** > > In the test case, a region is transformed in IGVN such that it merges a diamond without any dependencies on both paths. The region has two phis. One of them is a memory phi which could be transformed by `RegionNode::try_clean_mem_phi()`. But when processing the region with its two phis in IGVN, we do not optimize the memory phi away because `has_unique_phi()` is false and we bail out: > https://github.com/openjdk/jdk/blob/725ec0ce1b463b21cd4c5287cf4ccbee53ec7349/src/hotspot/share/opto/cfgnode.cpp#L450-L471 > > Later in IGVN, the second phi dies and we only have the single memory phi left. But the region will not be added to the IGVN worklist again to re-apply `try_clean_mem_phi()`. We therefore miss the removal of the diamond and we fail to apply further optimizations. In the test case, we fail to convert the loop into a counted loop. > > **Proposed Fix** > > The fix I propose is to try to apply `try_clean_mem_phi()` whenever a region is merging a diamond with the assumption that the transformation of a memory phi does not hurt when being applied without being able to remove the region with the diamond (because there are other phis left that cannot be removed). Another option would be to re-add the region to the IGVN worklist when the second last phi dies. But the first approach seems simpler and less invasive. > > I've also applied some clean-ups and added an IR test. > > Thanks, > Christian Thanks Vladimir for your review! I've pushed an update. 
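To make the ordering problem concrete, here is a deliberately simplified Java model of the two conditions described in the PR. `Region`, `triesCleanBefore` and `triesCleanAfter` are hypothetical names: they do not correspond to HotSpot's `RegionNode` API, and the real checks in `cfgnode.cpp` involve much more than these three fields.

```java
// Hypothetical toy model of the condition change described in 8314997.
// This is NOT HotSpot code; it only captures which state triggers an attempt.
final class DiamondCleanupSketch {

    // State of a region at the moment IGVN processes it.
    record Region(boolean mergesDiamond, int phiCount, boolean hasMemoryPhi) {}

    // Before: cleanup of the memory phi was only attempted when the region
    // had a unique phi, so a region with two phis bailed out.
    static boolean triesCleanBefore(Region r) {
        return r.mergesDiamond() && r.hasMemoryPhi() && r.phiCount() == 1;
    }

    // Proposed: attempt the cleanup whenever the region merges a diamond,
    // even if other phis keep the region itself alive for now.
    static boolean triesCleanAfter(Region r) {
        return r.mergesDiamond() && r.hasMemoryPhi();
    }

    public static void main(String[] args) {
        // IGVN visits the region while it still has two phis.
        Region visited = new Region(true, 2, true);
        System.out.println(triesCleanBefore(visited)); // false
        System.out.println(triesCleanAfter(visited));  // true
        // The second phi dies later, but the region is not re-enqueued,
        // so under the old condition no further attempt is made.
    }
}
```

The alternative mentioned in the PR, re-adding the region to the IGVN worklist when its second-to-last phi dies, would instead change *when* the old check runs rather than the check itself.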
------------- PR Comment: https://git.openjdk.org/jdk/pull/15445#issuecomment-1698576913 From chagedorn at openjdk.org Wed Aug 30 06:38:53 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Aug 2023 06:38:53 GMT Subject: RFR: 8314997: Missing optimization opportunities due to missing try_clean_mem_phi() calls [v2] In-Reply-To: References: Message-ID: On Tue, 29 Aug 2023 16:43:44 GMT, Vladimir Kozlov wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Vladimir's review > > src/hotspot/share/opto/cfgnode.cpp line 503: > >> 501: if (left_path == nullptr || right_path == nullptr) { >> 502: return false; >> 503: } > > So the TOP input will fail next check. Maybe add a comment about that. I've added a comment at the next `if`. > src/hotspot/share/opto/cfgnode.cpp line 1389: > >> 1387: return false; >> 1388: } >> 1389: assert(is_diamond_phi(), "sanity"); > > Add explicit check `> 0` Good point, added. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15445#discussion_r1309712341 PR Review Comment: https://git.openjdk.org/jdk/pull/15445#discussion_r1309712015 From rcastanedalo at openjdk.org Wed Aug 30 06:46:35 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 30 Aug 2023 06:46:35 GMT Subject: RFR: 8309463: IGV: Dynamic graph layout algorithm [v15] In-Reply-To: References: Message-ID: On Tue, 29 Aug 2023 08:19:01 GMT, emmyyin wrote: >> ### Purpose >> >> IGV currently uses a static layout algorithm to visualize graphs. However, this is problematic due to the use cases of IGV. Most often, the graphs that are visualized are dynamic, meaning the graphs change over time. A dynamic graph can be thought of as a sequence of graphs where a given graph in the sequence is the state of the dynamic graph at that point in time. Static layout algorithms do not account for the rest of the sequence when visualizing a given graph.
On one hand, it makes each layout more readable. But on the other hand, the layout for two consecutive graphs in the sequence can be vastly different even though the difference between the graphs is small. This makes it difficult to identify the changes that have occurred to the graph and can damage the understanding of the graph that the viewer has built up. A dynamic layout algorithm takes the changes into account when visualizing a graph. To enhance IGV, such an algorithm has been implemented in this PR. >> >> The layout drawn by the static layout algorithm is called "sea of nodes", while the layout drawn by the dynamic algorithm is called "stable sea of nodes". >> >> The difference between the algorithms is illustrated in the following video: >> >> >> https://github.com/openjdk/jdk/assets/52547536/35023362-a191-425e-b066-c7474db631f1 >> >> >> This work is the result of my Master's thesis which can be found [here](https://kth.diva-portal.org/smash/get/diva2:1770643/FULLTEXT01.pdf). >> >> >> ### Implementation >> >> The algorithm is based on update actions that are applied incrementally to a graph layout in order to obtain the layout of the next graph in the sequence. By doing so, the nodes that appear in both graphs remain in their relative positions. A new layout manager called `HierarchicalStableLayoutManager` has been added which holds the core algorithm. The corresponding layout manager with the static layout algorithm is called `HierarchicalLayoutManager`. >> >> If no layouts have been drawn yet, the `HierarchicalLayoutManager` is used. This is because the dynamic algorithm needs an initial layout to apply the update actions on. >> >> The whole graph is represented by `LayoutNode` and `LayoutEdge` objects, which hold the positions of the nodes and edges along with other relevant information such as ID, name and size. These are updated, added and removed in accordance with the update actions.
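As a rough illustration of the update-action idea, the Java sketch below keeps the coordinates of nodes that survive from one graph to the next, drops removed nodes, and places added nodes in a new row below the existing layout. It is a hypothetical toy model: the real `HierarchicalStableLayoutManager` works on `LayoutNode`/`LayoutEdge` objects, layers and edge routing, none of which are modeled here, and the placement rule for new nodes is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of a "stable" layout update (not IGV code).
final class StableLayoutSketch {

    record Point(int x, int y) {}

    static Map<String, Point> update(Map<String, Point> previous, Set<String> nextNodes) {
        // Lowest row used by the previous layout; new nodes go below it.
        int maxY = 0;
        for (Point p : previous.values()) {
            maxY = Math.max(maxY, p.y());
        }
        Map<String, Point> next = new LinkedHashMap<>();
        // Surviving nodes keep their old coordinates.
        // Nodes absent from nextNodes are implicitly removed.
        for (String id : nextNodes) {
            Point old = previous.get(id);
            if (old != null) {
                next.put(id, old);
            }
        }
        // Added nodes are appended in a fresh row.
        int x = 0;
        for (String id : nextNodes) {
            if (!next.containsKey(id)) {
                next.put(id, new Point(x++, maxY + 1));
            }
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, Point> before = new LinkedHashMap<>();
        before.put("Start", new Point(0, 0));
        before.put("AddI", new Point(1, 0));
        Map<String, Point> after = update(before, Set.of("Start", "MulI"));
        System.out.println(after.get("Start"));        // Point[x=0, y=0]
        System.out.println(after.get("MulI"));         // Point[x=0, y=1]
        System.out.println(after.containsKey("AddI")); // false
    }
}
```

A static layout would instead recompute every coordinate from scratch, which is exactly what makes consecutive graphs hard to compare.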
>> >> Since `HierarchicalStableLayoutManager` tries to preserve the node positi... > > emmyyin has updated the pull request incrementally with one additional commit since the last revision: > > update copyright The slowdown caused by `ensureNeighborEdgeConsistency()` is now documented in [JDK-8315316](https://bugs.openjdk.org/browse/JDK-8315316). ------------- PR Comment: https://git.openjdk.org/jdk/pull/14349#issuecomment-1698586974 From roland at openjdk.org Wed Aug 30 07:55:26 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 30 Aug 2023 07:55:26 GMT Subject: RFR: 8314024: SIGSEGV in PhaseIdealLoop::build_loop_late_post_work due to bad immediate dominator info [v3] In-Reply-To: References: <08hhWxI_fkzFY9PRiYwgCW5j96FGkVJmTH0bk8v6yaQ=.efb69989-ea24-4a3e-8240-3d569fc7f131@github.com> Message-ID: On Tue, 29 Aug 2023 15:39:22 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> test fix > > Testing looked good! @chhagedorn @vnkozlov thanks for the reviews and testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15399#issuecomment-1698669371 From roland at openjdk.org Wed Aug 30 07:55:28 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 30 Aug 2023 07:55:28 GMT Subject: Integrated: 8314024: SIGSEGV in PhaseIdealLoop::build_loop_late_post_work due to bad immediate dominator info In-Reply-To: References: Message-ID: <9LJ8zjUkOpFR74-VSPFLX5pngGt2kcNrJh0Tsp8ZOTw=.c938afce-9640-4730-8466-d541f810b472@github.com> On Wed, 23 Aug 2023 09:15:38 GMT, Roland Westrelin wrote: > A node is sunk from the pre loop into the main loop. That node, in the > main loop, feeds into a test. When the node is sunk it is pinned > between the main and pre loop. The test it feeds into is then > eliminated by range check elimination: the sunk node becomes input to > an expression that computes the new bound of the pre loop. 
The > resulting graph is broken because the sunk node is pinned below the > pre loop but used by the exit test of the pre loop. > > The fix I propose is in `PhaseIdealLoop::try_sink_out_of_loop()`, to > skip nodes in pre loops that have a use in the companion main loop. This pull request has now been integrated. Changeset: ed1ea5fe Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/ed1ea5fe7c6fad03ca96e7dece2127eab21a608a Stats: 83 lines in 3 files changed: 83 ins; 0 del; 0 mod 8314024: SIGSEGV in PhaseIdealLoop::build_loop_late_post_work due to bad immediate dominator info Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/15399 From alanb at openjdk.org Wed Aug 30 08:51:28 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 30 Aug 2023 08:51:28 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: On Wed, 30 Aug 2023 00:37:26 GMT, Srinivas Vamsi Parasa wrote: > Hi Vladimir, Just verified that the test/jdk/java/util/Arrays/Sorting.java is triggering the intrinsic without additional flags Just to add that Sorting.java has short and long run modes. The default when running with jtreg or make run-test is the short run so that it doesn't take too long. It might be useful to try it without -shortrun to give the intrinsic a better work out. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1698754202 From rcastanedalo at openjdk.org Wed Aug 30 08:55:44 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 30 Aug 2023 08:55:44 GMT Subject: RFR: 8310220: IGV: dump graph after each IGVN step at level 4 [v2] In-Reply-To: References: Message-ID: > This changeset instruments Iterative GVN (IGVN) in C2 to dump the Ideal graph after each effective step (i.e. 
when the graph is rewritten or the recorded types are refined). This enables fine-grained tracing of IGVN transformation sequences using Ideal Graph Visualizer. This technique has proved useful for the investigation of [JDK-8303513](https://bugs.openjdk.org/browse/JDK-8303513), and can also be useful for educational purposes: > > ![igv-level4](https://github.com/openjdk/jdk/assets/8792647/56dc9729-d5eb-44f3-8614-dc72e17f1bef) > > These new dumps are emitted at print level 4 (`PrintIdealGraphLevel=4`), the highest level of detail. > > Following [feedback](https://bugs.openjdk.org/browse/JDK-8310220?focusedCommentId=14590132&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14590132) and offline discussions with Christian Hagedorn, the changeset also dumps the Ideal graph before and after IGVN at print level 3. This makes it possible to identify the source of graph changes between IGVN and other phases such as loop transformations. The existing phase `PHASE_MACH_ANALYSIS` is also promoted to print level 3, since it prints a single graph per compilation unit only (see print level documentation updates in this changeset). These additional changes increase the number of graph dumps per compilation at print level 3 by around 1.5x: > > ![igv-level3](https://github.com/openjdk/jdk/assets/8792647/9bccc78b-13b8-428d-8c98-ef3f0f769f4c) > > Finally, the verbose and rarely used bytecode parsing dumps are relegated to a new print level 5, which leaves the number of graphs per compilation at level 4 roughly as before the changeset. > > #### Testing > > - tier1-3 (linux-x64; release and debug mode). > > - Verified that thousands of new IGVN graph dumps are correctly opened and visualized with the Ideal Graph Visualizer, at print levels 3 to 5. Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: - Update compile phase list in IR test framework - Fix typo - Move bytecode parse dumping to a new IGV dump level 5 - Merge branch 'master' into JDK-8310220 - Dump graph before IGVN (by popular demand) and after IGVN (for symmetry) - Update IGV's README - Promote PHASE_MACH_ANALYSIS dump to print level 3 (since it runs once per compilation) - Dump Ideal graph after each IGVN step (in print level 4) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14537/files - new: https://git.openjdk.org/jdk/pull/14537/files/35ad9fba..e683d28d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14537&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14537&range=00-01 Stats: 177351 lines in 3531 files changed: 74319 ins; 81139 del; 21893 mod Patch: https://git.openjdk.org/jdk/pull/14537.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14537/head:pull/14537 PR: https://git.openjdk.org/jdk/pull/14537 From chagedorn at openjdk.org Wed Aug 30 08:55:44 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Aug 2023 08:55:44 GMT Subject: RFR: 8310220: IGV: dump graph after each IGVN step at level 4 [v2] In-Reply-To: References: Message-ID: <_QZmMFDCk5I70ppu3p3W7_uc2KynK2Pojzn9Kv_QHdw=.7eae4bcb-05eb-4d9b-8486-afe9d05c9e3c@github.com> On Wed, 30 Aug 2023 08:50:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset instruments Iterative GVN (IGVN) in C2 to dump the Ideal graph after each effective step (i.e. when the graph is rewritten or the recorded types are refined). This enables fine-grained tracing of IGVN transformation sequences using Ideal Graph Visualizer.
This technique has proved useful for the investigation of [JDK-8303513](https://bugs.openjdk.org/browse/JDK-8303513), and can also be useful for educational purposes: >> >> ![igv-level4](https://github.com/openjdk/jdk/assets/8792647/56dc9729-d5eb-44f3-8614-dc72e17f1bef) >> >> These new dumps are emitted at print level 4 (`PrintIdealGraphLevel=4`), the highest level of detail. >> >> Following [feedback](https://bugs.openjdk.org/browse/JDK-8310220?focusedCommentId=14590132&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14590132) and offline discussions with Christian Hagedorn, the changeset also dumps the Ideal graph before and after IGVN at print level 3. This makes it possible to identify the source of graph changes between IGVN and other phases such as loop transformations. The existing phase `PHASE_MACH_ANALYSIS` is also promoted to print level 3, since it prints a single graph per compilation unit only (see print level documentation updates in this changeset). These additional changes increase the number of graph dumps per compilation at print level 3 by around 1.5x: >> >> ![igv-level3](https://github.com/openjdk/jdk/assets/8792647/9bccc78b-13b8-428d-8c98-ef3f0f769f4c) >> >> Finally, the verbose and rarely used bytecode parsing dumps are relegated to a new print level 5, which leaves the number of graphs per compilation at level 4 roughly as before the changeset. >> >> #### Testing >> >> - tier1-3 (linux-x64; release and debug mode). >> >> - Verified that thousands of new IGVN graph dumps are correctly opened and visualized with the Ideal Graph Visualizer, at print levels 3 to 5. > > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: > > - Update compile phase list in IR test framework > - Fix typo > - Move bytecode parse dumping to a new IGV dump level 5 > - Merge branch 'master' into JDK-8310220 > - Dump graph before IGVN (by popular demand) and after IGVN (for symmetry) > - Update IGV's README > - Promote PHASE_MACH_ANALYSIS dump to print level 3 (since it runs once per compilation) > - Dump Ideal graph after each IGVN step (in print level 4) src/hotspot/share/opto/phaseX.cpp line 896: > 894: const Type* newtype = type_or_null(n); > 895: if (nn != n || oldtype != newtype) { > 896: C->print_method(PHASE_AFTER_ITER_GVN_STEP, 4, n); Should we keep this at level 4 and move the parser-generated dumps to a new level 5? I often add dumps for these steps during IGVN but I rarely ever need the parser-generated dumps. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14537#discussion_r1236763830 From rcastanedalo at openjdk.org Wed Aug 30 08:55:44 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 30 Aug 2023 08:55:44 GMT Subject: RFR: 8310220: IGV: dump graph after each IGVN step at level 4 [v2] In-Reply-To: <_QZmMFDCk5I70ppu3p3W7_uc2KynK2Pojzn9Kv_QHdw=.7eae4bcb-05eb-4d9b-8486-afe9d05c9e3c@github.com> References: <_QZmMFDCk5I70ppu3p3W7_uc2KynK2Pojzn9Kv_QHdw=.7eae4bcb-05eb-4d9b-8486-afe9d05c9e3c@github.com> Message-ID: On Wed, 21 Jun 2023 10:26:12 GMT, Christian Hagedorn wrote: >> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: >> >> - Update compile phase list in IR test framework >> - Fix typo >> - Move bytecode parse dumping to a new IGV dump level 5 >> - Merge branch 'master' into JDK-8310220 >> - Dump graph before IGVN (by popular demand) and after IGVN (for symmetry) >> - Update IGV's README >> - Promote PHASE_MACH_ANALYSIS dump to print level 3 (since it runs once per compilation) >> - Dump Ideal graph after each IGVN step (in print level 4) > > src/hotspot/share/opto/phaseX.cpp line 896: > >> 894: const Type* newtype = type_or_null(n); >> 895: if (nn != n || oldtype != newtype) { >> 896: C->print_method(PHASE_AFTER_ITER_GVN_STEP, 4, n); > > Should we keep this at level 4 and move the parser-generated dumps to a new level 5? I often add dumps for these steps during IGVN but I rarely ever need the parser-generated dumps. Thanks for the feedback Christian, your suggestion makes sense. I will address this in a few weeks when I am back from vacation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14537#discussion_r1236825822 From chagedorn at openjdk.org Wed Aug 30 08:55:44 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Aug 2023 08:55:44 GMT Subject: RFR: 8310220: IGV: dump graph after each IGVN step at level 4 [v2] In-Reply-To: References: <_QZmMFDCk5I70ppu3p3W7_uc2KynK2Pojzn9Kv_QHdw=.7eae4bcb-05eb-4d9b-8486-afe9d05c9e3c@github.com> Message-ID: On Wed, 21 Jun 2023 11:22:46 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/share/opto/phaseX.cpp line 896: >> >>> 894: const Type* newtype = type_or_null(n); >>> 895: if (nn != n || oldtype != newtype) { >>> 896: C->print_method(PHASE_AFTER_ITER_GVN_STEP, 4, n); >> >> Should we keep this at level 4 and move the parser-generated dumps to a new level 5? I often add dumps for these steps during IGVN but I rarely ever need the parser-generated dumps. > > Thanks for the feedback Christian, your suggestion makes sense.
I will address this in a few weeks when I am back from vacation. Sounds good, thanks Roberto and enjoy your vacation! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14537#discussion_r1236856172 From rcastanedalo at openjdk.org Wed Aug 30 08:55:44 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 30 Aug 2023 08:55:44 GMT Subject: RFR: 8310220: IGV: dump graph after each IGVN step at level 4 [v2] In-Reply-To: References: <_QZmMFDCk5I70ppu3p3W7_uc2KynK2Pojzn9Kv_QHdw=.7eae4bcb-05eb-4d9b-8486-afe9d05c9e3c@github.com> Message-ID: On Wed, 21 Jun 2023 11:37:02 GMT, Christian Hagedorn wrote: >> Thanks for the feedback Christian, your suggestion makes sense. I will address this in a few weeks when I am back from vacation. > > Sounds good, thanks Roberto and enjoy your vacation! Done! Please re-review. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14537#discussion_r1309922139 From chagedorn at openjdk.org Wed Aug 30 09:03:18 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Aug 2023 09:03:18 GMT Subject: RFR: 8310220: IGV: dump graph after each IGVN step at level 4 [v2] In-Reply-To: References: Message-ID: <5QJNviX9qNLuceEEi421W475nZjHyBosf554zn889mE=.fe1e63d0-ab2e-4fe5-98b6-fce5851c0a61@github.com> On Wed, 30 Aug 2023 08:55:44 GMT, Roberto Castañeda Lozano wrote: >> This changeset instruments Iterative GVN (IGVN) in C2 to dump the Ideal graph after each effective step (i.e. when the graph is rewritten or the recorded types are refined). This enables fine-grained tracing of IGVN transformation sequences using Ideal Graph Visualizer.
This technique has proved useful for the investigation of [JDK-8303513](https://bugs.openjdk.org/browse/JDK-8303513), and can also be useful for educational purposes: >> >> ![igv-level4](https://github.com/openjdk/jdk/assets/8792647/56dc9729-d5eb-44f3-8614-dc72e17f1bef) >> >> These new dumps are emitted at print level 4 (`PrintIdealGraphLevel=4`), the highest level of detail. >> >> Following [feedback](https://bugs.openjdk.org/browse/JDK-8310220?focusedCommentId=14590132&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14590132) and offline discussions with Christian Hagedorn, the changeset also dumps the Ideal graph before and after IGVN at print level 3. This makes it possible to identify the source of graph changes between IGVN and other phases such as loop transformations. The existing phase `PHASE_MACH_ANALYSIS` is also promoted to print level 3, since it prints a single graph per compilation unit only (see print level documentation updates in this changeset). These additional changes increase the number of graph dumps per compilation at print level 3 by around 1.5x: >> >> ![igv-level3](https://github.com/openjdk/jdk/assets/8792647/9bccc78b-13b8-428d-8c98-ef3f0f769f4c) >> >> Finally, the verbose and rarely used bytecode parsing dumps are relegated to a new print level 5, which leaves the number of graphs per compilation at level 4 roughly as before the changeset. >> >> #### Testing >> >> - tier1-3 (linux-x64; release and debug mode). >> >> - Verified that thousands of new IGVN graph dumps are correctly opened and visualized with the Ideal Graph Visualizer, at print levels 3 to 5. > > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: > > - Update compile phase list in IR test framework > - Fix typo > - Move bytecode parse dumping to a new IGV dump level 5 > - Merge branch 'master' into JDK-8310220 > - Dump graph before IGVN (by popular demand) and after IGVN (for symmetry) > - Update IGV's README > - Promote PHASE_MACH_ANALYSIS dump to print level 3 (since it runs once per compilation) > - Dump Ideal graph after each IGVN step (in print level 4) That looks good to me! Thanks for introducing the new level for the parsing dumps. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14537#pullrequestreview-1602240332 From rcastanedalo at openjdk.org Wed Aug 30 09:37:11 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 30 Aug 2023 09:37:11 GMT Subject: RFR: 8310220: IGV: dump graph after each IGVN step at level 4 [v2] In-Reply-To: <5QJNviX9qNLuceEEi421W475nZjHyBosf554zn889mE=.fe1e63d0-ab2e-4fe5-98b6-fce5851c0a61@github.com> References: <5QJNviX9qNLuceEEi421W475nZjHyBosf554zn889mE=.fe1e63d0-ab2e-4fe5-98b6-fce5851c0a61@github.com> Message-ID: On Wed, 30 Aug 2023 09:00:34 GMT, Christian Hagedorn wrote: > That looks good to me! Thanks for introducing the new level for the parsing dumps. Thanks for reviewing, Christian! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14537#issuecomment-1698825453 From aph at openjdk.org Wed Aug 30 09:38:41 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 30 Aug 2023 09:38:41 GMT Subject: RFR: 8314748: 1-10% regressions on Crypto micros Message-ID: Performance improvement. This also reduces the delta between mainline head and the backports in 11u and 8u. ------------- Commit messages: - Remove unused scratch reg.
- 8314748: 1-10% regressions on Crypto micros - 8314748: 1-10% regressions on Crypto micros Changes: https://git.openjdk.org/jdk/pull/15427/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15427&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314748 Stats: 32 lines in 2 files changed: 4 ins; 0 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/15427.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15427/head:pull/15427 PR: https://git.openjdk.org/jdk/pull/15427 From chagedorn at openjdk.org Wed Aug 30 09:38:42 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Aug 2023 09:38:42 GMT Subject: RFR: 8314748: 1-10% regressions on Crypto micros In-Reply-To: References: Message-ID: <3kebbmU9_Pg_lad1wIBtzNNWTu0hOmw6NpuVubJX0e4=.1e536c0f-155a-4334-990a-35a391b3c25c@github.com> On Fri, 25 Aug 2023 09:50:25 GMT, Andrew Haley wrote: > Performance improvement. This also reduces the delta between mainline head and the backports in 11u and 8u. Thanks for following up with this patch. I'll resubmit some benchmark testing over the weekend. Will report back on Monday. src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 1640: > 1638: // Used by aesctr_encrypt. > 1639: void StubGenerator::ev_add128(XMMRegister xmmdst, XMMRegister xmmsrc1, XMMRegister xmmsrc2, > 1640: int vector_len, KRegister ktmp, XMMRegister ones, Register rscratch) { `rscratch` seems unused and can be removed.
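The name `ev_add128`, its `Used by aesctr_encrypt` comment, and its `ktmp`/`ones` parameters suggest that it adds 128-bit counter values lane by lane, using 64-bit vector adds and a mask register to carry overflow into the upper half. That reading is an assumption and is not taken from the patch. The scalar C++ model below shows the carry step such a helper has to perform. The `U128` type and `add128` function are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one 128-bit lane, stored as two unsigned
// 64-bit halves (the way a vector register holds it as two quadwords).
struct U128 {
    uint64_t lo;
    uint64_t hi;
};

// 128-bit addition built from 64-bit additions. The low halves are added
// first. Unsigned wrap-around (a result smaller than an operand) signals a
// carry, which is then added into the high half.
U128 add128(U128 a, U128 b) {
    U128 r;
    r.lo = a.lo + b.lo;                            // wraps modulo 2^64
    const uint64_t carry = (r.lo < a.lo) ? 1 : 0;  // 1 iff the low add wrapped
    r.hi = a.hi + b.hi + carry;                    // also wraps modulo 2^64
    return r;
}
```

For example, `add128({UINT64_MAX, 0}, {1, 0})` yields `{0, 1}`. The low half wraps to zero and the carry moves into the high half, which is the case a CTR-mode counter increment must get right.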
------------- PR Review: https://git.openjdk.org/jdk/pull/15427#pullrequestreview-1595768007 PR Review Comment: https://git.openjdk.org/jdk/pull/15427#discussion_r1305683508 From chagedorn at openjdk.org Wed Aug 30 09:38:43 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Aug 2023 09:38:43 GMT Subject: RFR: 8314748: 1-10% regressions on Crypto micros In-Reply-To: <3kebbmU9_Pg_lad1wIBtzNNWTu0hOmw6NpuVubJX0e4=.1e536c0f-155a-4334-990a-35a391b3c25c@github.com> References: <3kebbmU9_Pg_lad1wIBtzNNWTu0hOmw6NpuVubJX0e4=.1e536c0f-155a-4334-990a-35a391b3c25c@github.com> Message-ID: On Fri, 25 Aug 2023 13:51:37 GMT, Christian Hagedorn wrote: > I'll resubmit some benchmark testing over the weekend. Will report back on Monday. Results look good! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15427#issuecomment-1695427574 From duke at openjdk.org Wed Aug 30 09:55:35 2023 From: duke at openjdk.org (Yi-Fan Tsai) Date: Wed, 30 Aug 2023 09:55:35 GMT Subject: RFR: 8314837: 5 compiled/codecache tests ignore VM flags Message-ID: TestSegmentedCodeCacheOption, TestCodeHeapSizeOptions, and TestPrintCodeCacheOption create processes with various flags. These flags include interpreter, tiered compilation, or segmented code cache, and they may conflict with the additionally specified vm flags. If propagating the flags and overwriting their values, the tests may not run in the intended way. This change adds `@requires vm.flagless` to these tests and keeps them creating processes while ignoring flags. CodeCacheFullCountTest creates a process with specific flags: ReservedCodeCacheSize, UseCodeCacheFlushing, and MethodFlushing. This change requires `vm.flagless` for the same reason. CheckCodeCacheInfo creates a process to print the code cache info while enabling Verbose. Both PrintCodeCache and Verbose are unlikely to conflict with additionally specified vm flags in a significant way, and the info printed stays the same. This change propagates the vm flags. 
CheckCodeCacheInfo passes in fastdebug build:

make test TEST="test/hotspot/jtreg/compiler/codecache/CheckCodeCacheInfo.java" JTREG="JAVA_OPTIONS=-XX:-TieredCompilation"
make test TEST="test/hotspot/jtreg/compiler/codecache/CheckCodeCacheInfo.java" JTREG="JAVA_OPTIONS=-Xint"
make test TEST="test/hotspot/jtreg/compiler/codecache/CheckCodeCacheInfo.java" JTREG="JAVA_OPTIONS=-XX:-PrintCodeCache"

------------- Commit messages: - 8314837: 5 compiled/codecache tests ignore VM flags Changes: https://git.openjdk.org/jdk/pull/15485/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15485&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314837 Stats: 8 lines in 5 files changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/15485.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15485/head:pull/15485 PR: https://git.openjdk.org/jdk/pull/15485 From chagedorn at openjdk.org Wed Aug 30 10:52:10 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Aug 2023 10:52:10 GMT Subject: RFR: 8314748: 1-10% regressions on Crypto micros In-Reply-To: References: Message-ID: On Fri, 25 Aug 2023 09:50:25 GMT, Andrew Haley wrote: > Performance improvement. This also reduces the delta between mainline head and the backports in 11u and 8u. Looks good to me, thanks for the update. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15427#pullrequestreview-1602426362 From pli at openjdk.org Wed Aug 30 11:28:56 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 30 Aug 2023 11:28:56 GMT Subject: RFR: 8312332: C2: Refactor SWPointer out from SuperWord [v3] In-Reply-To: References: Message-ID: <32FTzUEtQbPz1n_94tD82SOBFltzh8wQEmUOaLxLZ58=.110902e4-eab0-486e-af9f-fbd0eb2f2578@github.com> > As discussed in JDK-8308994, we should first do some refactoring work before proceeding with the new post loop vectorization. In this patch, we have done the following.
> > 1) We have created new C2 source files `vectorization.[cpp|hpp]` for shared logic and utilities for C2's auto-vectorization. So far we have moved class `SWPointer` and `VectorElementSizeStats` here from `superword.[cpp|hpp]`. > > 2) We have decoupled `SWPointer` from class `SuperWord` and renamed it to `VPointer` as it will be used by vectorizers other than SuperWord. The original class `SWPointer` and its inner class `Tracer` both have a `_slp` field initialized in their constructors. In this patch, we have replaced them with other fields and re-written the constructors for the same functionality. Original `SWPointer::invariant()` calls function `SuperWord::find_pre_loop_end()` for loop invariant checks. To help with decoupling, we moved function `find_pre_loop_end()` to class `CountedLoopNode`. As function `SWPointer::Tracer::invariant_1()` is tightly coupled with `SuperWord` but only prints some debug messages, we temporarily removed it in this patch. We will consider adding it back after later refactoring of `SuperWord` so we added a `TODO` at its call site in this patch. > > 3) We have a lot of memory phi node checks in loop optimizations. So we added a utility function `is_memory_phi()` in `node.hpp`. > > Tested tier1~3 on x86 and AArch64. Also manually verified that option `VectorizeDebug` in compiler directives still works well.
Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: Move the cache of _pre_loop_end to CountedLoopNode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15013/files - new: https://git.openjdk.org/jdk/pull/15013/files/c6cbec36..0ca709ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15013&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15013&range=01-02 Stats: 61 lines in 5 files changed: 27 ins; 25 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/15013.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15013/head:pull/15013 PR: https://git.openjdk.org/jdk/pull/15013 From pli at openjdk.org Wed Aug 30 11:29:18 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 30 Aug 2023 11:29:18 GMT Subject: RFR: 8312332: C2: Refactor SWPointer out from SuperWord [v3] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 12:23:14 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectorization.cpp line 131: >> >>> 129: bool VPointer::invariant(Node* n) const { >>> 130: NOT_PRODUCT(Tracer::Depth dd;) >>> 131: // TODO: Add more trace output for invariant check after later refactoring >> >> We generally don't like `TODO`s in the code. Best is to just drop it in the code and file an RFE if you think it is really important. >> >> When did this even trace anything? >> `_slp->_lpt->is_member(_slp->_phase->get_loop(n_c)) != (int)_slp->in_bb(n)` >> >> Do you think this tracing is relevant enough? > > If it should never happen: can we add an assert somewhere instead? I don't know when this can happen. This was added in jdk9 without a test case. If this can happen, it should be in some real corner cases. But definitely, the trace message is not important and it's safe to drop that. `TODO` is removed in my latest commit. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15013#discussion_r1310121093 From pli at openjdk.org Wed Aug 30 11:41:11 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 30 Aug 2023 11:41:11 GMT Subject: RFR: 8312332: C2: Refactor SWPointer out from SuperWord [v3] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 11:58:28 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Move the cache of _pre_loop_end to CountedLoopNode > > src/hotspot/share/opto/vectorization.cpp line 145: > >> 143: Node* n_c = phase()->get_ctrl(n); >> 144: return phase()->is_dominator(n_c, pre_loop_end->loopnode()); >> 145: } > > Is `pre_loop_end != nullptr` possible here? Before your patch we always found `_slp->pre_loop_head()`. > I'm just worried that if we do not find it, then we still return `is_not_member`, but `n` is still located in the space between pre and post loop. > What do you think about this? > > And: would it make sense to cache the `pre_loop_head` in the `VPointer`? Yes, I have found that `pre_loop_end` could be null when we construct `VPointer` in `SuperWord::output()` - see the code in `superword.cpp` (L2574, L2580). It's null because `CountedLoopNode::is_canonical_loop_entry()` returns null at this time. Before my patch, it cannot be null as we cached `_pre_loop_end` in the SuperWord class. To address your concern, my latest commit moves the cache of `_pre_loop_end` from `SuperWord` to `CountedLoopNode`. Some more asserts are also added to make sure it's used for main loops only. (Caching it in `VPointer` doesn't help). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15013#discussion_r1310133543 From pli at openjdk.org Wed Aug 30 11:48:10 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 30 Aug 2023 11:48:10 GMT Subject: RFR: 8312332: C2: Refactor SWPointer out from SuperWord [v3] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 12:06:05 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Move the cache of _pre_loop_end to CountedLoopNode > > src/hotspot/share/opto/vectorization.cpp line 50: > >> 48: _nstack(nstack), _analyze_only(analyze_only), _stack_idx(0) >> 49: #ifndef PRODUCT >> 50: , _tracer((phase->C->directive()->VectorizeDebugOption & 2) > 0) > > You should also refactor the accessors for `VectorizeDebugOption`. I would move it from SuperWord to `vectorization.hpp/cpp` somehow. We should only do the "masking" `& 2` in one single place. Right, but doing this requires moving more handles including `phase` to `vectorization.hpp/cpp`. Perhaps we need to create a new class there first. I have created another task (https://bugs.openjdk.org/browse/JDK-8315361) and would like to do more refactoring later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15013#discussion_r1310141583 From pli at openjdk.org Wed Aug 30 11:56:10 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 30 Aug 2023 11:56:10 GMT Subject: RFR: 8312332: C2: Refactor SWPointer out from SuperWord [v3] In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 12:17:42 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Move the cache of _pre_loop_end to CountedLoopNode > > src/hotspot/share/opto/vectorization.cpp line 40: > >> 38: #endif >> 39: >> 40: VPointer::VPointer(MemNode* mem, PhaseIdealLoop* phase, IdealLoopTree* lpt, > > You could also call it `LPointer` or `LoopPointer`. 
`VPointer` sounds like `VectorPointer` - but it is not a pointer of a vector but a scalar memop. That could be confusing. But you could also argue it is a `VectorizationPointer`, and hence `VPointer.` My idea is keeping the name as short as possible. I think it should be `VectorizationPointer` so `VPointer`. In my opinion, `LoopPointer` also sounds like a pointer to a loop but it isn't. I'd like to have more suggestions from other reviewers about the name. BTW: `VPointer` is also used for vector memop in `SuperWord::output()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15013#discussion_r1310151670 From adinn at openjdk.org Wed Aug 30 13:58:09 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 30 Aug 2023 13:58:09 GMT Subject: RFR: 8314748: 1-10% regressions on Crypto micros In-Reply-To: References: Message-ID: On Fri, 25 Aug 2023 09:50:25 GMT, Andrew Haley wrote: > Performance improvement. This also reduces the delta between mainline head and the backportts in 11u and 8u. Also looks good to me. ------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15427#pullrequestreview-1602773867 From roland at openjdk.org Wed Aug 30 14:37:09 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 30 Aug 2023 14:37:09 GMT Subject: RFR: 8305637: Remove Opaque1 nodes for Parse Predicates and clean up useless predicate elimination [v2] In-Reply-To: References: Message-ID: On Tue, 29 Aug 2023 08:41:57 GMT, Christian Hagedorn wrote: > > Opaque1 nodes fold away after loop opts which guarantees the parse predicate are removed too after loop opts. In the new code, without the Opaque1 nodes, what causes the parse predicate to be removed after loop opts? > > When creating a new `ParsePredicateNode`, we are registering it for post loop opts IGVN: Makes sense. Thanks for the details. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15449#issuecomment-1699305470 From roland at openjdk.org Wed Aug 30 14:43:14 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 30 Aug 2023 14:43:14 GMT Subject: RFR: 8305637: Remove Opaque1 nodes for Parse Predicates and clean up useless predicate elimination [v2] In-Reply-To: References: Message-ID: On Tue, 29 Aug 2023 08:39:39 GMT, Christian Hagedorn wrote: >> This is the last clean-up PR before the complete fix for Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). >> >> This patch includes: >> - Removal of `ConI`->`Opaque1`->`Conv2B` input nodes for `ParsePredicateNodes` with the following additional changes: >> - Adjusting `ParsePredicateNode` to block unwanted optimizations (added empty `ParsePredicateNode::Ideal()`). >> - Changing `Compile::_parse_predicate_opaqs` to not store `Opaque1Nodes` to keep track of Parse Predicates but instead storing `ParsePredicateNodes` directly. Renamed to `Compile::_parse_predicates` and adjusted related methods. >> - Removed asserts matching `Opaque1` -> `Conv2B` shape. >> - Cleaning up `eliminate_useless_predicates()`: >> - Adjust code to find useful/useless Parse Predicates with the new `Compile::_parse_predicates` list with `ParsePredicateNodes` instead of `Opaque1Nodes`. >> - Changing `ParsePredicateNode` to carry a `_useless` state which simplifies the elimination of useless predicates with `eliminate_useless_predicates()` and during IGVN (added `ParsePredicateNode::Value()` for that which also removes the predicate once we are in post loop opts IGVN). >> - Some refactoring/clean-ups of involved code. >> >> Testing: tier1-7 + some fuzzer testing >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > remove hash_delete() Looks good to me. 
src/hotspot/share/opto/multnode.cpp line 226: > 224: > 225: // we need a ParsePredicate node for predicate reasons > 226: if (reason != Deoptimization::Reason_none && !iff->is_ParsePredicate()) { Unrelated to your change but this code doesn't seem to do what the comment says. `reason != Deoptimization::Reason_none` is not "predicate reasons". I think this needs to be cleaned up at some point. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15449#pullrequestreview-1602873411 PR Review Comment: https://git.openjdk.org/jdk/pull/15449#discussion_r1310387192 From chagedorn at openjdk.org Wed Aug 30 14:51:10 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Aug 2023 14:51:10 GMT Subject: RFR: 8314580: PhaseIdealLoop::transform_long_range_checks fails with assert "was tested before" In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 07:57:34 GMT, Roland Westrelin wrote: > For long counted loops, `PhaseIdealLoop::create_loop_nest()` first > goes over the loop body to collect range checks, then transforms the > long counted loop into a loop nest and then goes over the list of > range checks it collected to transform them. For that last step, > `PhaseIdealLoop::transform_long_range_checks()` needs to extract the > parameters of the range check from the range check expression. It > should still recognize the range check expression even though the loop > was transformed in the meantime. That's what fails here. The reason is > that the range check expression uses the long loop increment as input > which, in the creation of the loop nest, is transformed to `outer > phi + inner incr`. That breaks pattern matching of the range check > expression. I propose removing the transformation: > > > incr=>(outer_phi+inner_incr) > > > entirely. After looking at this code again, I don't think it's > needed.
The transformation: > > > phi=>(outer_phi+inner_phi) > > > should be all that's needed to correctly transform the loop. Looks reasonable to me. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15411#pullrequestreview-1602892785 From chagedorn at openjdk.org Wed Aug 30 14:57:13 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Aug 2023 14:57:13 GMT Subject: RFR: 8305637: Remove Opaque1 nodes for Parse Predicates and clean up useless predicate elimination [v2] In-Reply-To: References: Message-ID: On Wed, 30 Aug 2023 14:39:11 GMT, Roland Westrelin wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> remove hash_delete() > > src/hotspot/share/opto/multnode.cpp line 226: > >> 224: >> 225: // we need a ParsePredicate node for predicate reasons >> 226: if (reason != Deoptimization::Reason_none && !iff->is_ParsePredicate()) { > > Unrelated to your change but this code doesn't seem to do what the comment says. `reason != Deoptimization::Reason_none` is not "predicate reasons". I think this needs to be cleaned up at some point. You're right. I've had a look at all the usages of `is_uncommon_trap_if_pattern()` and it seems that we only use this method for `Reason_none`. So, I think I can simplify this method. I can do it in this PR as I'm touching this code here.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15449#discussion_r1310409215 From duke at openjdk.org Wed Aug 30 15:06:24 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 30 Aug 2023 15:06:24 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: On Wed, 30 Aug 2023 08:48:09 GMT, Alan Bateman wrote: > > Hi Vladimir, Just verified that the test/jdk/java/util/Arrays/Sorting.java is triggering the intrinsic without additional flags > > Just to add that Sorting.java has short and long run modes. The default when running with jtreg or make run-test is the short run so that it doesn't take too long. It might be useful to try it without -shortrun to give the intrinsic a better work out. Hi Alan, The tests in Sorting.java were run in both short and long modes. The screenshot showing the usage of intrinsics was from long mode run. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1699355402 From duke at openjdk.org Wed Aug 30 15:14:28 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 30 Aug 2023 15:14:28 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: On Wed, 30 Aug 2023 08:48:09 GMT, Alan Bateman wrote: >>> > > Hi, We already have correctness tests. See test/jdk/java/util/Arrays/Sorting.java >>> > > The latest version you can find in PR https://github.com/openjdk/jdk/pull/13568/files >>> > >>> > >>> > Does test/jdk/java/util/Arrays/Sorting.java trigger usage of this intrinsic without additional flags? @vamsi-parasa can you check? >>> >>> Sure Vladimir (@vnkozlov). 
Will check if test/jdk/java/util/Arrays/Sorting.java is triggering the intrinsic without additional flags and let you know. >> >> Hi Vladimir, >> Just verified that the test/jdk/java/util/Arrays/Sorting.java is triggering the intrinsic without additional flags as shown in the output snapshot below: >> ![image](https://github.com/openjdk/jdk/assets/23087109/a2d4edb1-9377-4f92-bed2-3e40bc5a7654) > >> Hi Vladimir, Just verified that the test/jdk/java/util/Arrays/Sorting.java is triggering the intrinsic without additional flags > > Just to add that Sorting.java has short and long run modes. The default when running with jtreg or make run-test is the short run so that it doesn't take too long. It might be useful to try it without -shortrun to give the intrinsic a better workout. > @AlanBateman If it helps, the changes made by @vamsi-parasa to DualPivotQuicksort.java don't change the logic as written in Java. They only carve out the functionality into separate Java methods retaining the meaning exactly as before. These Java methods are then optimized through a stub. Hi Alan, As Sandhya (@sviswa7) mentioned, this PR does not make any logic changes to DualPivotQuicksort.java. The code was refactored to separate out the partitioning logic (dual pivot and single pivot) into separate methods. These methods are being intrinsified using AVX512 to do vectorized partitioning while preserving the logic of the Java side. Similarly, the methods to sort small arrays (insertionSort/mixedInsertionSort) are being intrinsified as well to use AVX512 sort while preserving the logic on the Java side. Could you please go through the changes and provide your comments and suggestions?
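The sketch below shows the dual-pivot partitioning step that the 8309130 patch separates out of `DualPivotQuicksort` and intrinsifies. It is a textbook dual-pivot partition in C++, following Yaroslavskiy's scheme. It is only an illustration. It is not the JDK's Java code or the AVX512 implementation, and `dual_pivot_partition` is a hypothetical name.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Textbook dual-pivot partition of a[lo..hi] (requires lo < hi), using
// a[lo] and a[hi] as pivots p1 <= p2. On return {lt, gt}:
//   a[lo..lt-1] < p1,  a[lt] == p1,
//   p1 <= a[lt+1..gt-1] <= p2,
//   a[gt] == p2,  a[gt+1..hi] > p2.
std::pair<int, int> dual_pivot_partition(std::vector<int>& a, int lo, int hi) {
    assert(lo < hi);
    if (a[lo] > a[hi]) std::swap(a[lo], a[hi]);
    const int p1 = a[lo];
    const int p2 = a[hi];
    int lt = lo + 1;  // a[lo+1..lt-1] holds elements < p1
    int gt = hi - 1;  // a[gt+1..hi-1] holds elements > p2
    int k = lo + 1;   // a[lt..k-1] holds elements in [p1, p2]
    while (k <= gt) {
        if (a[k] < p1) {
            std::swap(a[k], a[lt]);
            ++lt;
        } else if (a[k] > p2) {
            // Skip elements on the right that are already > p2.
            while (a[gt] > p2 && k < gt) --gt;
            std::swap(a[k], a[gt]);
            --gt;
            if (a[k] < p1) {
                std::swap(a[k], a[lt]);
                ++lt;
            }
        }
        ++k;
    }
    // Move the pivots into their final positions.
    --lt;
    ++gt;
    std::swap(a[lo], a[lt]);
    std::swap(a[hi], a[gt]);
    return {lt, gt};
}
```

This scalar loop examines one element at a time. According to the discussion above, the intrinsic performs the same partitioning with AVX512 vector instructions while keeping the Java-side logic unchanged.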
Thanks, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1699367300 From chagedorn at openjdk.org Wed Aug 30 15:20:12 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 30 Aug 2023 15:20:12 GMT Subject: RFR: 8305637: Remove Opaque1 nodes for Parse Predicates and clean up useless predicate elimination [v2] In-Reply-To: References: Message-ID: On Wed, 30 Aug 2023 14:54:37 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/multnode.cpp line 226: >> >>> 224: >>> 225: // we need a ParsePredicate node for predicate reasons >>> 226: if (reason != Deoptimization::Reason_none && !iff->is_ParsePredicate()) { >> >> Unrelated to your change but this code doesn't seem to do what the comment says. `reason != Deoptimization::Reason_none` is not "predicate reasons". I think this needs to be cleaned up at some point. > > You're right. I've had a look at all the usages of `is_uncommon_trap_if_pattern()` and it seems that we only use this method for `Reason_none`. So, I think I can simplify this method. I can do it in this PR as I'm touching this code here. Okay, I've missed one usage which is not `Deoptimization::Reason` - but let me still have a look if I can clean this method up somehow.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15449#discussion_r1310442185 From duke at openjdk.org Wed Aug 30 16:05:26 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 30 Aug 2023 16:05:26 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v29] In-Reply-To: References: <2s4hZgw7KRQ5AoYzTZ3-8BS5V6JAslioJXeil_jvfQA=.fb1a1a31-8726-494b-b77c-a66565a963da@github.com> Message-ID: <13CHm6P2gQzkmUNYGXDlwXtnXjx0_NUnzTKHC2FiMJ4=.af15a026-e777-4510-8f18-3668e5007c88@github.com> On Tue, 29 Aug 2023 20:23:24 GMT, Vladimir Kozlov wrote: >> The shared library approach is being followed currently as an initial implementation to demonstrate the value of AVX512 sorting. This will be followed up in future with support for Windows as well. >> If it is ok with you, the shared library approach could be pursued for now to be later replaced with specialized assembly stubs (which are agnostic to OS and compiler) when AVX512 sort is enabled for Windows. Please let us know. > > I am okay with such incremental approach. Please, file RFE to replace library with stubs in a future (it could be still separate library but with assembler code). Thank you Vladimir! 
Please see the link to Windows RFE to replace library with assembly stubs here : https://bugs.openjdk.org/browse/JDK-8315382 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1310504519 From kvn at openjdk.org Wed Aug 30 16:37:13 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Aug 2023 16:37:13 GMT Subject: RFR: 8314997: Missing optimization opportunities due to missing try_clean_mem_phi() calls [v2] In-Reply-To: References: Message-ID: On Wed, 30 Aug 2023 06:38:50 GMT, Christian Hagedorn wrote: >> While working on a Valhalla bug, I've noticed that we sometimes miss `RegionNode::try_clean_mem_phi()` calls to remove a useless diamond >> >> If >> True False >> Region >> >> with only a single memory phi. This blocks further optimizations like converting a loop into a counted one. The code in Valhalla looks slightly different but the problem is also reproducible in mainline. >> >> **Problem** >> >> In the test case, a region is transformed in IGVN such that it merges a diamond without any dependencies on both paths. The region has two phis. One of them is a memory phi which could be transformed by `RegionNode::try_clean_mem_phi()`. But when processing the region with its two phis in IGVN, we do not optimize the memory phi away because `has_unique_phi()` is false and we bail out: >> https://github.com/openjdk/jdk/blob/725ec0ce1b463b21cd4c5287cf4ccbee53ec7349/src/hotspot/share/opto/cfgnode.cpp#L450-L471 >> >> Later in IGVN, the second phi dies and we only have the single memory phi left. But the region will not be added to the IGVN worklist again to re-apply `try_clean_mem_phi()`. We therefore miss the removal of the diamond and we fail to apply further optimizations. In the test case, we fail to convert the loop into a counted loop. 
>> >> **Proposed Fix** >> >> The fix I propose is to try to apply `try_clean_mem_phi()` whenever a region is merging a diamond with the assumption that the transformation of a memory phi does not hurt when being applied without being able to remove the region with the diamond (because there are other phis left that cannot be removed). Another option would be to re-add the region to the IGVN worklist when the second last phi dies. But the first approach seems simpler and less invasive. >> >> I've also applied some clean-ups and added an IR test. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Vladimir's review Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15445#pullrequestreview-1603101029 From kvn at openjdk.org Wed Aug 30 21:13:39 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Aug 2023 21:13:39 GMT Subject: RFR: 8314748: 1-10% regressions on Crypto micros In-Reply-To: References: Message-ID: On Fri, 25 Aug 2023 09:50:25 GMT, Andrew Haley wrote: > Performance improvement. This also reduces the delta between mainline head and the backportts in 11u and 8u. Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15427#pullrequestreview-1603491358 From kvn at openjdk.org Wed Aug 30 21:21:02 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Aug 2023 21:21:02 GMT Subject: RFR: 8314837: 5 compiled/codecache tests ignore VM flags In-Reply-To: References: Message-ID: On Wed, 30 Aug 2023 09:47:58 GMT, Yi-Fan Tsai wrote: > TestSegmentedCodeCacheOption, TestCodeHeapSizeOptions, and TestPrintCodeCacheOption create processes with various flags. These flags include interpreter, tiered compilation, or segmented code cache, and they may conflict with the additionally specified vm flags. 
If propagating the flags and overwriting their values, the tests may not run in the intended way. This change adds `@requires vm.flagless` to these tests and keeps them creating processes while ignoring flags. > > CodeCacheFullCountTest creates a process with specific flags: ReservedCodeCacheSize, UseCodeCacheFlushing, and MethodFlushing. This change requires `vm.flagless` for the same reason. > > CheckCodeCacheInfo creates a process to print the code cache info while enabling Verbose. Both PrintCodeCache and Verbose are unlikely to conflict with additionally specified vm flags in a significant way, and the info printed stays the same. This change propagates the vm flags. > > CheckCodeCacheInfo passes in fastdebug build. > > make test TEST="test/hotspot/jtreg/compiler/codecache/CheckCodeCacheInfo.java" JTREG="JAVA_OPTIONS=-XX:-TieredCompilation" > make test TEST="test/hotspot/jtreg/compiler/codecache/CheckCodeCacheInfo.java" JTREG="JAVA_OPTIONS=-Xint" > make test TEST="test/hotspot/jtreg/compiler/codecache/CheckCodeCacheInfo.java" JTREG="JAVA_OPTIONS=-XX:-PrintCodeCache" I think your description is not accurate. It sounds like this change will prevent passing external test flags into tests. But the synopsis of this RFE correctly says `tests ignore VM flags`. And the main RFE [8314823](https://bugs.openjdk.org/browse/JDK-8314823) has an even more explicit synopsis: `Update or mark as vm.flagless tests which ignores external VM flags`. So adding `@requires vm.flagless` simply marks tests that ignore external flags, so that they are not run in configurations where external flags are specified. I suggest adding a comment to CheckCodeCacheInfo.java before the `ProcessTools.createTestJvm()` call to say that: ProcessTools.createTestJvm - creates a ProcessBuilder to run java with all the test framework arguments applied. This text is from David H.'s comment in 8314823.
------------- PR Comment: https://git.openjdk.org/jdk/pull/15485#issuecomment-1699856903 From kvn at openjdk.org Wed Aug 30 21:26:06 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Aug 2023 21:26:06 GMT Subject: RFR: 8314580: PhaseIdealLoop::transform_long_range_checks fails with assert "was tested before" In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 07:57:34 GMT, Roland Westrelin wrote: > For long counted loops, `PhaseIdealLoop::create_loop_nest()` first > goes over the loop body to collect range checks, then transforms the > long counted loop into a loop nest and then goes over the list of > range checks it collected to transform them. For that last step, > `PhaseIdealLoop::transform_long_range_checks()` needs to extract the > parameters of the range check from the range check expression. It > should still recognize the range check expression even though the loop > was transformed in the meantime. That's what fails here. The reason is > that the range check expression uses the long loop increment as input > which, in the creation of the loop nest, is transformed to `outer > phi + inner incr`. That breaks pattern matching of the range check > expression. I propose removing the transformation: > > > incr=>(outer_phi+inner_incr) > > > entirely. After looking at this code again, I don't think it's > needed. The transformation: > > > phi=>(outer_phi+inner_phi) > > > should be all that's needed to correctly transform the loop. Looks good to me too. Thank you for explaining the issue in detail. ------------- Marked as reviewed by kvn (Reviewer).
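Roland's `phi=>(outer_phi+inner_phi)` rewrite can be pictured with a source-level analogy. The C++ below is only an illustration under stated assumptions: the function names and the `kChunk` bound are hypothetical, and C2 performs this transformation on its IR, not on source code. A loop with a 64-bit induction variable is split into an outer loop over chunks and an inner loop with a 32-bit induction variable, and the original index is recomputed as `outer + inner`. Only the phi is rewritten here; the increment is not expressed in terms of the outer phi.

```cpp
#include <algorithm>
#include <cstdint>

// Original shape: a counted loop with a 64-bit induction variable.
int64_t sum_long(int64_t start, int64_t stop) {
    int64_t sum = 0;
    for (int64_t i = start; i < stop; i++) {
        sum += i;
    }
    return sum;
}

// Hypothetical inner-loop bound, chosen for illustration only.
constexpr int32_t kChunk = 1 << 10;

// Nested shape: the 64-bit phi `i` is rebuilt as outer + inner, where the
// inner induction variable is a 32-bit int that never exceeds kChunk.
int64_t sum_nested(int64_t start, int64_t stop) {
    int64_t sum = 0;
    for (int64_t outer = start; outer < stop; outer += kChunk) {
        // stop - outer > 0 here, and the min keeps the bound within int32_t.
        const int32_t inner_stop =
            static_cast<int32_t>(std::min<int64_t>(kChunk, stop - outer));
        for (int32_t inner = 0; inner < inner_stop; inner++) {
            const int64_t i = outer + inner;  // phi => (outer_phi + inner_phi)
            sum += i;
        }
    }
    return sum;
}
```

Both functions visit the same indices in the same order, so they compute the same result for any `start <= stop`, including ranges that span several chunks.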
PR Review: https://git.openjdk.org/jdk/pull/15411#pullrequestreview-1603519597 From duke at openjdk.org Wed Aug 30 22:42:48 2023 From: duke at openjdk.org (Yi-Fan Tsai) Date: Wed, 30 Aug 2023 22:42:48 GMT Subject: RFR: 8314837: 5 compiled/codecache tests ignore VM flags [v2] In-Reply-To: References: Message-ID: > TestSegmentedCodeCacheOption, TestCodeHeapSizeOptions, and TestPrintCodeCacheOption create processes with various flags. These flags include interpreter, tiered compilation, or segmented code cache, and they may conflict with the additionally specified vm flags. If propagating the flags and overwriting their values, the tests may not run in the intended way. This change adds `@requires vm.flagless` to these tests and keeps them creating processes while ignoring flags. > > CodeCacheFullCountTest creates a process with specific flags: ReservedCodeCacheSize, UseCodeCacheFlushing, and MethodFlushing. This change requires `vm.flagless` for the same reason. > > CheckCodeCacheInfo creates a process to print the code cache info while enabling Verbose. Both PrintCodeCache and Verbose are unlikely to conflict with additionally specified vm flags in a significant way, and the info printed stays the same. This change propagates the vm flags. > > CheckCodeCacheInfo passes in fastdebug build. 
> > make test TEST="test/hotspot/jtreg/compiler/codecache/CheckCodeCacheInfo.java" JTREG="JAVA_OPTIONS=-XX:-TieredCompilation" > make test TEST="test/hotspot/jtreg/compiler/codecache/CheckCodeCacheInfo.java" JTREG="JAVA_OPTIONS=-Xint" > make test TEST="test/hotspot/jtreg/compiler/codecache/CheckCodeCacheInfo.java" JTREG="JAVA_OPTIONS=-XX:-PrintCodeCache" Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: Mark CheckCodeCacheInfo flagless ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15485/files - new: https://git.openjdk.org/jdk/pull/15485/files/07f394c2..ab642336 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15485&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15485&range=00-01 Stats: 7 lines in 1 file changed: 0 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/15485.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15485/head:pull/15485 PR: https://git.openjdk.org/jdk/pull/15485 From duke at openjdk.org Wed Aug 30 22:57:00 2023 From: duke at openjdk.org (Yi-Fan Tsai) Date: Wed, 30 Aug 2023 22:57:00 GMT Subject: RFR: 8314837: 5 compiled/codecache tests ignore VM flags [v2] In-Reply-To: References: Message-ID: On Wed, 30 Aug 2023 22:42:48 GMT, Yi-Fan Tsai wrote: >> Mark 5 codecache tests which ignore VM flags with `@requires vm.flagless`. These tests specify the code cache flags to create processes and verify their behaviors. There is no need to rerun them with external flags. > > Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: > > Mark CheckCodeCacheInfo flagless I marked CheckCodeCacheInfo with `vm.flagless`. The tested feature doesn't depend on other flags. The irrelevant flag `UnlockDiagnosticVMOptions` is also removed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15485#issuecomment-1699956610 From sviswanathan at openjdk.org Thu Aug 31 00:01:01 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 31 Aug 2023 00:01:01 GMT Subject: RFR: 8314748: 1-10% regressions on Crypto micros In-Reply-To: References: Message-ID: On Fri, 25 Aug 2023 09:50:25 GMT, Andrew Haley wrote: > Performance improvement. This also reduces the delta between mainline head and the backportts in 11u and 8u. Looks good. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15427#pullrequestreview-1603650994 From kvn at openjdk.org Thu Aug 31 02:54:09 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 31 Aug 2023 02:54:09 GMT Subject: RFR: 8314837: 5 compiled/codecache tests ignore VM flags [v2] In-Reply-To: References: Message-ID: On Wed, 30 Aug 2023 22:42:48 GMT, Yi-Fan Tsai wrote: >> Mark 5 codecache tests which ignore VM flags with `@requires vm.flagless`. These tests specify the code cache flags to create processes and verify their behaviors. There is no need to rerun them with external flags. > > Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: > > Mark CheckCodeCacheInfo flagless Good. You need second review. @lmesnik, please look. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15485#pullrequestreview-1603776301 PR Comment: https://git.openjdk.org/jdk/pull/15485#issuecomment-1700294875 From kvn at openjdk.org Thu Aug 31 02:54:10 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 31 Aug 2023 02:54:10 GMT Subject: RFR: 8314837: 5 compiled/codecache tests ignore VM flags [v2] In-Reply-To: References: Message-ID: On Wed, 30 Aug 2023 22:51:55 GMT, Yi-Fan Tsai wrote: > I marked CheckCodeCacheInfo with `vm.flagless`. The tested feature doesn't depend on other flags. 
The irrelevant flag `UnlockDiagnosticVMOptions` is also removed. Okay, this will do. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15485#issuecomment-1700293520 From lmesnik at openjdk.org Thu Aug 31 03:09:00 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 31 Aug 2023 03:09:00 GMT Subject: RFR: 8314837: 5 compiled/codecache tests ignore VM flags [v2] In-Reply-To: References: Message-ID: On Wed, 30 Aug 2023 22:51:55 GMT, Yi-Fan Tsai wrote: >> Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: >> >> Mark CheckCodeCacheInfo flagless > > I marked CheckCodeCacheInfo with `vm.flagless`. The tested feature doesn't depend on other flags. The irrelevant flag `UnlockDiagnosticVMOptions` is also removed. @yftsai The preferred fix for most tests is to use `ProcessTools.createTestJvm` to fork a JVM with the tested flags. It helps to ensure that the corresponding functionality works in different modes, like C2-only or C1-only, the JVMCI compiler, and other modes. I think it would be better to just use it for test/hotspot/jtreg/compiler/codecache/CheckCodeCacheInfo.java and test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java. `vm.flagless` is usually used for tests which are incompatible with all or most external flags. It is fine to use it for CLI tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15485#issuecomment-1700303916 From aph at openjdk.org Thu Aug 31 08:34:10 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 31 Aug 2023 08:34:10 GMT Subject: Integrated: 8314748: 1-10% regressions on Crypto micros In-Reply-To: <7o3-3MX-YW4ZkbMnCy0QApRRiMHbSrBNueg1ze4Pd3E=.131a59ff-e2e0-4a85-b65f-e5540c252440@github.com> References: Message-ID: <7o3-3MX-YW4ZkbMnCy0QApRRiMHbSrBNueg1ze4Pd3E=.131a59ff-e2e0-4a85-b65f-e5540c252440@github.com> On Fri, 25 Aug 2023 09:50:25 GMT, Andrew Haley wrote: > Performance improvement. This also reduces the delta between mainline head and the backportts in 11u and 8u. This pull request has now been integrated.
Changeset: b594f01f Author: Andrew Haley URL: https://git.openjdk.org/jdk/commit/b594f01fe4872d255f0f2fd2b1a908660e39f426 Stats: 32 lines in 2 files changed: 4 ins; 0 del; 28 mod 8314748: 1-10% regressions on Crypto micros Reviewed-by: chagedorn, adinn, kvn, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/15427 From stefank at openjdk.org Thu Aug 31 09:08:09 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 31 Aug 2023 09:08:09 GMT Subject: RFR: 8314748: 1-10% regressions on Crypto micros In-Reply-To: References: Message-ID: <5UhhrgXDor4j4NVCtbYQ8br_0EGYh805QObEzm0ewTY=.0327ddc1-f7ae-475f-9213-a6f038120b0c@github.com> On Fri, 25 Aug 2023 09:50:25 GMT, Andrew Haley wrote: > Performance improvement. This also reduces the delta between mainline head and the backportts in 11u and 8u. @theRealAph This seems to be causing a tier1 build failure in our CI pipeline. The error message is: src/hotspot/cpu/x86/macroAssembler_x86.cpp:2755), pid=7238, tid=7480 assert(rscratch != noreg || always_reachable(src)) failed: missing ... # Problematic frame: # V [libjvm.so+0x12f17b9] MacroAssembler::evmovdquq(XMMRegister, AddressLiteral, int, Register)+0x1a9 ------------- PR Comment: https://git.openjdk.org/jdk/pull/15427#issuecomment-1700649395 From aph at openjdk.org Thu Aug 31 09:33:11 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 31 Aug 2023 09:33:11 GMT Subject: RFR: 8314748: 1-10% regressions on Crypto micros In-Reply-To: <5UhhrgXDor4j4NVCtbYQ8br_0EGYh805QObEzm0ewTY=.0327ddc1-f7ae-475f-9213-a6f038120b0c@github.com> References: <5UhhrgXDor4j4NVCtbYQ8br_0EGYh805QObEzm0ewTY=.0327ddc1-f7ae-475f-9213-a6f038120b0c@github.com> Message-ID: On Thu, 31 Aug 2023 09:04:46 GMT, Stefan Karlsson wrote: > @theRealAph This seems to be causing a tier1 build failure in our CI pipeline. The error message is: > > ``` > src/hotspot/cpu/x86/macroAssembler_x86.cpp:2755), pid=7238, tid=7480 > assert(rscratch != noreg || always_reachable(src)) failed: missing > ... 
> # Problematic frame: > # V [libjvm.so+0x12f17b9] MacroAssembler::evmovdquq(XMMRegister, AddressLiteral, int, Register)+0x1a9 > ``` Hmm, okay. Please create a bugzilla entry for this, with full logs attached. I'll produce a fix today. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15427#issuecomment-1700690138 From stefank at openjdk.org Thu Aug 31 09:49:10 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 31 Aug 2023 09:49:10 GMT Subject: RFR: 8314748: 1-10% regressions on Crypto micros In-Reply-To: References: Message-ID: <8Nr0aV7Mrx0P9fufYiwAYmIPLzV22WBowioebPqonCE=.550e2a97-34ab-4053-a33b-d4125579c791@github.com> On Fri, 25 Aug 2023 09:50:25 GMT, Andrew Haley wrote: > Performance improvement. This also reduces the delta between mainline head and the backportts in 11u and 8u. https://bugs.openjdk.org/browse/JDK-8315445 I can't attach full logs because it contains Oracle internal information. Tell me if something crucial is missing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15427#issuecomment-1700716062 From aph at openjdk.org Thu Aug 31 09:58:10 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 31 Aug 2023 09:58:10 GMT Subject: RFR: 8314748: 1-10% regressions on Crypto micros In-Reply-To: <8Nr0aV7Mrx0P9fufYiwAYmIPLzV22WBowioebPqonCE=.550e2a97-34ab-4053-a33b-d4125579c791@github.com> References: <8Nr0aV7Mrx0P9fufYiwAYmIPLzV22WBowioebPqonCE=.550e2a97-34ab-4053-a33b-d4125579c791@github.com> Message-ID: On Thu, 31 Aug 2023 09:46:04 GMT, Stefan Karlsson wrote: > https://bugs.openjdk.org/browse/JDK-8315445 > > I can't attach full logs because it contains Oracle internal information. Tell me if something crucial is missing. I need as much of the stack trace as you can give me. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15427#issuecomment-1700729798 From aph-open at littlepinkcloud.com Thu Aug 31 10:40:27 2023 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Thu, 31 Aug 2023 11:40:27 +0100 Subject: RFR: 8314748: 1-10% regressions on Crypto micros In-Reply-To: References: <8Nr0aV7Mrx0P9fufYiwAYmIPLzV22WBowioebPqonCE=.550e2a97-34ab-4053-a33b-d4125579c791@github.com> Message-ID: <8760f54c-44b9-05c4-8121-809a7503affb@littlepinkcloud.com> On 8/31/23 10:58, Andrew Haley wrote: > On Thu, 31 Aug 2023 09:46:04 GMT, Stefan Karlsson wrote: > >> https://bugs.openjdk.org/browse/JDK-8315445 >> >> I can't attach full logs because it contains Oracle internal information. Tell me if something crucial is missing. > > I need as much of the stack trace as you can give me. Never mind, I found the mistake. It happened when I made the last change, to remove an "unused" register. I have a fix, and it's ready as soon as I have a bugid. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at openjdk.org Thu Aug 31 11:02:10 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 31 Aug 2023 11:02:10 GMT Subject: RFR: 8314748: 1-10% regressions on Crypto micros In-Reply-To: References: <8Nr0aV7Mrx0P9fufYiwAYmIPLzV22WBowioebPqonCE=.550e2a97-34ab-4053-a33b-d4125579c791@github.com> Message-ID: On Thu, 31 Aug 2023 09:54:55 GMT, Andrew Haley wrote: > > https://bugs.openjdk.org/browse/JDK-8315445 > > I can't attach full logs because it contains Oracle internal information. Tell me if something crucial is missing. > > I need as much of the stack trace as you can give me. Never mind, I found the mistake. It happened when I made the last change, to remove an "unused" register. I have a fix, and it's ready as soon as I have a bugid. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15427#issuecomment-1700824861 From stefank at openjdk.org Thu Aug 31 11:10:11 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 31 Aug 2023 11:10:11 GMT Subject: RFR: 8314748: 1-10% regressions on Crypto micros In-Reply-To: References: <8Nr0aV7Mrx0P9fufYiwAYmIPLzV22WBowioebPqonCE=.550e2a97-34ab-4053-a33b-d4125579c791@github.com> Message-ID: On Thu, 31 Aug 2023 10:59:20 GMT, Andrew Haley wrote: > > > https://bugs.openjdk.org/browse/JDK-8315445 > > > I can't attach full logs because it contains Oracle internal information. Tell me if something crucial is missing. > > > > > > I need as much of the stack trace as you can give me. > > Never mind, I found the mistake. It happened when I made the last change, to remove an "unused" register. I have a fix, and it's ready as soon as I have a bugid. You have the bugid above, right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15427#issuecomment-1700834948 From aph at openjdk.org Thu Aug 31 11:20:16 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 31 Aug 2023 11:20:16 GMT Subject: RFR: JDK-8315445: 8314748 causes crashes in x64 builds Message-ID: The crash happens during this phase: [2023-08-31T08:51:27,258Z] Optimizing the exploded image This is the failure mode: # Internal Error (src/hotspot/cpu/x86/macroAssembler_x86.cpp:2755), pid=20045, tid=20157 # assert(rscratch != noreg || always_reachable(src)) failed: missing # # JRE version: OpenJDK Runtime Environment (22.0+14) (fastdebug build 22-ea+14-963) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-ea+14-963, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x12f17b9] MacroAssembler::evmovdquq(XMMRegister, AddressLiteral, int, Register)+0x1a9 ------------- Commit messages: - JDK-8315445: 8314748 causes crashes in x64 builds Changes: https://git.openjdk.org/jdk/pull/15512/files Webrev: 
https://webrevs.openjdk.org/?repo=jdk&pr=15512&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315445 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15512.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15512/head:pull/15512 PR: https://git.openjdk.org/jdk/pull/15512 From chagedorn at openjdk.org Thu Aug 31 11:27:02 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 31 Aug 2023 11:27:02 GMT Subject: RFR: JDK-8315445: 8314748 causes crashes in x64 builds In-Reply-To: References: Message-ID: <8gPGG3cuc7ZinEVtSsdA_S9BL4t5TLi0rtHTqezELWE=.429ebe80-f738-4a6b-a31e-3db3b2c047a5@github.com> On Thu, 31 Aug 2023 11:11:59 GMT, Andrew Haley wrote: > The crash happens during this phase: > [2023-08-31T08:51:27,258Z] Optimizing the exploded image > > This is the failure mode: > > # Internal Error (src/hotspot/cpu/x86/macroAssembler_x86.cpp:2755), pid=20045, tid=20157 > # assert(rscratch != noreg || always_reachable(src)) failed: missing > # > # JRE version: OpenJDK Runtime Environment (22.0+14) (fastdebug build 22-ea+14-963) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-ea+14-963, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x12f17b9] MacroAssembler::evmovdquq(XMMRegister, AddressLiteral, int, Register)+0x1a9 I've totally missed that it in the review - thanks for fixing it quickly! Looks good and trivial. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15512#pullrequestreview-1604538906 From shade at openjdk.org Thu Aug 31 11:34:01 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 31 Aug 2023 11:34:01 GMT Subject: RFR: JDK-8315445: 8314748 causes crashes in x64 builds In-Reply-To: References: Message-ID: On Thu, 31 Aug 2023 11:11:59 GMT, Andrew Haley wrote: > The crash happens during this phase: > [2023-08-31T08:51:27,258Z] Optimizing the exploded image > > This is the failure mode: > > # Internal Error (src/hotspot/cpu/x86/macroAssembler_x86.cpp:2755), pid=20045, tid=20157 > # assert(rscratch != noreg || always_reachable(src)) failed: missing > # > # JRE version: OpenJDK Runtime Environment (22.0+14) (fastdebug build 22-ea+14-963) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-ea+14-963, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x12f17b9] MacroAssembler::evmovdquq(XMMRegister, AddressLiteral, int, Register)+0x1a9 Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15512#pullrequestreview-1604549920 From aph at openjdk.org Thu Aug 31 12:28:01 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 31 Aug 2023 12:28:01 GMT Subject: RFR: JDK-8315445: 8314748 causes crashes in x64 builds In-Reply-To: <8gPGG3cuc7ZinEVtSsdA_S9BL4t5TLi0rtHTqezELWE=.429ebe80-f738-4a6b-a31e-3db3b2c047a5@github.com> References: <8gPGG3cuc7ZinEVtSsdA_S9BL4t5TLi0rtHTqezELWE=.429ebe80-f738-4a6b-a31e-3db3b2c047a5@github.com> Message-ID: On Thu, 31 Aug 2023 11:24:19 GMT, Christian Hagedorn wrote: > I've totally missed that it in the review - thanks for fixing it quickly! Looks good and trivial. It was very hard to spot. Default arguments [sometimes] considered harmful. 8314748 is only going to fail with certain memory layouts, and I can't reproduce that. 
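The hazard discussed here, an elided trailing register argument that silently takes its default, can be sketched in plain C++. `Register`, `AddressLiteral` and `can_load_literal` below are hypothetical stand-ins, not HotSpot's MacroAssembler API; only the condition mirrors the assert message `rscratch != noreg || always_reachable(src)`.

```cpp
// Hypothetical stand-ins for HotSpot's Register and AddressLiteral types.
enum Register { noreg = -1, r15 = 15 };

struct AddressLiteral {
    // Assumption: true when the literal can be addressed directly,
    // without first materializing its address in a scratch register.
    bool always_reachable;
};

// Mirrors the shape of the failing check: a scratch register may only be
// omitted when the literal is always reachable. Returns false where the
// real code would hit the assert.
bool can_load_literal(const AddressLiteral& src, Register rscratch = noreg) {
    return rscratch != noreg || src.always_reachable;
}
```

Calls with and without the scratch register both compile; the elided argument only matters at run time, and only for literals that happen not to be reachable, which is consistent with the failure depending on memory layout.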
Should I wait for all the auto-tests to complete, or just push, on the grounds that it's not going to make things any worse? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15512#issuecomment-1700939285 From chagedorn at openjdk.org Thu Aug 31 12:40:01 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 31 Aug 2023 12:40:01 GMT Subject: RFR: JDK-8315445: 8314748 causes crashes in x64 builds In-Reply-To: References: <8gPGG3cuc7ZinEVtSsdA_S9BL4t5TLi0rtHTqezELWE=.429ebe80-f738-4a6b-a31e-3db3b2c047a5@github.com> Message-ID: On Thu, 31 Aug 2023 12:25:32 GMT, Andrew Haley wrote: > It was very hard to spot. Default arguments [sometimes] considered harmful. Indeed! I've run a quick sanity tier1 testing which just finished and looked good. So, I guess this is good to go as the original testing before the additional clean-up looked good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15512#issuecomment-1700955452 From chagedorn at openjdk.org Thu Aug 31 12:50:35 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 31 Aug 2023 12:50:35 GMT Subject: RFR: 8305637: Remove Opaque1 nodes for Parse Predicates and clean up useless predicate elimination [v3] In-Reply-To: References: Message-ID: > This is the last clean-up PR before the complete fix for Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > This patch includes: > - Removal of `ConI`->`Opaque1`->`Conv2B` input nodes for `ParsePredicateNodes` with the following additional changes: > - Adjusting `ParsePredicateNode` to block unwanted optimizations (added empty `ParsePredicateNode::Ideal()`). > - Changing `Compile::_parse_predicate_opaqs` to not store `Opaque1Nodes` to keep track of Parse Predicates but instead storing `ParsePredicateNodes` directly. Renamed to `Compile::_parse_predicates` and adjusted related methods. > - Removed asserts matching `Opaque1` -> `Conv2B` shape. 
> - Cleaning up `eliminate_useless_predicates()`: > - Adjust code to find useful/useless Parse Predicates with the new `Compile::_parse_predicates` list with `ParsePredicateNodes` instead of `Opaque1Nodes`. > - Changing `ParsePredicateNode` to carry a `_useless` state which simplifies the elimination of useless predicates with `eliminate_useless_predicates()` and during IGVN (added `ParsePredicateNode::Value()` for that which also removes the predicate once we are in post loop opts IGVN). > - Some refactoring/clean-ups of involved code. > > Testing: tier1-7 + some fuzzer testing > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Clean up is_uncommon_trap_if_pattern(), is_uncommon_trap_proj(), variable names, and types. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15449/files - new: https://git.openjdk.org/jdk/pull/15449/files/763a6b4a..2fe396e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15449&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15449&range=01-02 Stats: 117 lines in 9 files changed: 10 ins; 17 del; 90 mod Patch: https://git.openjdk.org/jdk/pull/15449.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15449/head:pull/15449 PR: https://git.openjdk.org/jdk/pull/15449 From chagedorn at openjdk.org Thu Aug 31 12:50:36 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 31 Aug 2023 12:50:36 GMT Subject: RFR: 8305637: Remove Opaque1 nodes for Parse Predicates and clean up useless predicate elimination [v2] In-Reply-To: References: Message-ID: On Tue, 29 Aug 2023 08:39:39 GMT, Christian Hagedorn wrote: >> This is the last clean-up PR before the complete fix for Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). 
>> >> This patch includes: >> - Removal of `ConI`->`Opaque1`->`Conv2B` input nodes for `ParsePredicateNodes` with the following additional changes: >> - Adjusting `ParsePredicateNode` to block unwanted optimizations (added empty `ParsePredicateNode::Ideal()`). >> - Changing `Compile::_parse_predicate_opaqs` to not store `Opaque1Nodes` to keep track of Parse Predicates but instead storing `ParsePredicateNodes` directly. Renamed to `Compile::_parse_predicates` and adjusted related methods. >> - Removed asserts matching `Opaque1` -> `Conv2B` shape. >> - Cleaning up `eliminate_useless_predicates()`: >> - Adjust code to find useful/useless Parse Predicates with the new `Compile::_parse_predicates` list with `ParsePredicateNodes` instead of `Opaque1Nodes`. >> - Changing `ParsePredicateNode` to carry a `_useless` state which simplifies the elimination of useless predicates with `eliminate_useless_predicates()` and during IGVN (added `ParsePredicateNode::Value()` for that which also removes the predicate once we are in post loop opts IGVN). >> - Some refactoring/clean-ups of involved code. >> >> Testing: tier1-7 + some fuzzer testing >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > remove hash_delete() I've pushed an update with the following changes: - `is_uncommon_trap_if_pattern()` and `is_uncommon_trap_proj()`: - Changed `Deoptimization::Deopt_Reason` into a default argument for `is_uncommon_trap_if_pattern()` and `is_uncommon_trap_proj()` as most calls are only interested if the `If` just represents such a pattern. This allowed us to remove all the `Deoptimization::Reason_none` arguments in the calling code. - Changed them into `const` methods. - Removed/Moved Parse Predicate checking in `is_uncommon_trap_if_pattern()` as this should be the responsibility of the caller. The method simply wants to check if there is such a pattern with the given deopt reason. 
- `create_new_if_for_predicate()`: - The continuation projection is always a Parse Predicate success projection (i.e. true projection). Changed types and variable names accordingly. - Other renaming and removing dead code. - Changed some variable names to better highlight where Parse Predicate success projections are passed around. But I could also split these changes into a separate PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15449#issuecomment-1700971776 From aph at openjdk.org Thu Aug 31 12:54:09 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 31 Aug 2023 12:54:09 GMT Subject: Integrated: JDK-8315445: 8314748 causes crashes in x64 builds In-Reply-To: References: Message-ID: On Thu, 31 Aug 2023 11:11:59 GMT, Andrew Haley wrote: > The crash happens during this phase: > [2023-08-31T08:51:27,258Z] Optimizing the exploded image > > This is the failure mode: > > # Internal Error (src/hotspot/cpu/x86/macroAssembler_x86.cpp:2755), pid=20045, tid=20157 > # assert(rscratch != noreg || always_reachable(src)) failed: missing > # > # JRE version: OpenJDK Runtime Environment (22.0+14) (fastdebug build 22-ea+14-963) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-ea+14-963, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x12f17b9] MacroAssembler::evmovdquq(XMMRegister, AddressLiteral, int, Register)+0x1a9 This pull request has now been integrated.
Changeset: 29ff1e45 Author: Andrew Haley URL: https://git.openjdk.org/jdk/commit/29ff1e45b910c07711c4f4c3d821712dd9a1e3ba Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8315445: 8314748 causes crashes in x64 builds Reviewed-by: chagedorn, shade ------------- PR: https://git.openjdk.org/jdk/pull/15512 From adinn at openjdk.org Thu Aug 31 14:13:10 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 31 Aug 2023 14:13:10 GMT Subject: RFR: 8314748: 1-10% regressions on Crypto micros In-Reply-To: References: Message-ID: On Fri, 25 Aug 2023 09:50:25 GMT, Andrew Haley wrote: > Performance improvement. This also reduces the delta between mainline head and the backportts in 11u and 8u. Yes, it is the elided rscratch (r15) argument to the `evmovdquq` call at stubGenerator_x86_64_aes.cpp:2134 that is causing the problem. Ah, I see this has already been fixed by JDK-8315445. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15427#issuecomment-1701119996 PR Comment: https://git.openjdk.org/jdk/pull/15427#issuecomment-1701125678 From tholenstein at openjdk.org Thu Aug 31 15:10:08 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 31 Aug 2023 15:10:08 GMT Subject: RFR: 8310220: IGV: dump graph after each IGVN step at level 4 [v2] In-Reply-To: References: Message-ID: On Wed, 30 Aug 2023 08:55:44 GMT, Roberto Castañeda Lozano wrote: >> This changeset instruments Iterative GVN (IGVN) in C2 to dump the Ideal graph after each effective step (i.e. when the graph is rewritten or the recorded types are refined). This enables fine-grained tracing of IGVN transformation sequences using Ideal Graph Visualizer.
This technique has proved useful for the investigation of [JDK-8303513](https://bugs.openjdk.org/browse/JDK-8303513), and can be also useful for educational purposes: >> >> ![igv-level4](https://github.com/openjdk/jdk/assets/8792647/56dc9729-d5eb-44f3-8614-dc72e17f1bef) >> >> These new dumps are emitted at print level 4 (`PrintIdealGraphLevel=4`), the highest level of detail. >> >> Following [feedback](https://bugs.openjdk.org/browse/JDK-8310220?focusedCommentId=14590132&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14590132) and offline discussions with Christian Hagedorn, the changeset also dumps the Ideal graph before and after IGVN at print level 3. This makes it possible to identify the source of graph changes between IGVN and other phases such as loop transformations. The existing phase `PHASE_MACH_ANALYSIS` is also promoted to print level 3, since it prints a single graph per compilation unit only (see print level documentation updates in this changeset). These additional changes increase the number of graph dumps per compilation at print level 3 by around 1.5x: >> >> ![igv-level3](https://github.com/openjdk/jdk/assets/8792647/9bccc78b-13b8-428d-8c98-ef3f0f769f4c) >> >> Finally, the verbose and rarely used bytecode parsing dumps are relegated to a new print level 5, which leaves the number of graphs per compilation at level 4 roughly as before the changeset. >> >> #### Testing >> >> - tier1-3 (linux-x64; release and debug mode). >> >> - Verified that thousands of new IGVN graph dumps are correctly opened and visualized with the Ideal Graph Visualizer, at print levels 3 to 5. > > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: > > - Update compile phase list in IR test framework > - Fix typo > - Move bytecode parse dumping to a new IGV dump level 5 > - Merge branch 'master' into JDK-8310220 > - Dump graph before IGVN (by popular demand) and after IGVN (for symmetry) > - Update IGV's README > - Promote PHASE_MACH_ANALYSIS dump to print level 3 (since it runs once per compilation) > - Dump Ideal graph after each IGVN step (in print level 4) Looks good to me. Thanks for adding this @robcasloz ! ------------- Marked as reviewed by tholenstein (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14537#pullrequestreview-1604987911 From roland at openjdk.org Thu Aug 31 16:01:40 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 31 Aug 2023 16:01:40 GMT Subject: RFR: 8315088: C2: assert(wq.size() - before == EMPTY_LOOP_SIZE) failed: expect the EMPTY_LOOP_SIZE nodes of this body if empty Message-ID: The assert fires in code that looks for counted loops that would be empty if not part of a loop strip mining nest (that is, some nodes in the loop body are kept alive because of the safepoint in the outer loop; the loop would otherwise be empty). That logic starts by finding the set of `EMPTY_LOOP_SIZE` nodes that are core to all counted loops (exit test, iv phi, etc.) and storing them in the `empty_loop_nodes` list. Finding the "core" nodes is performed by `IdealLoopTree::collect_loop_core_nodes`. It enqueues the backedge of the loop in `empty_loop_nodes` and then iteratively pushes inputs of nodes in `empty_loop_nodes` that belong to the loop until it can't make progress. There should be `EMPTY_LOOP_SIZE` of those nodes. It's possible that 2 or more loops are involved in the process: a node n in loop 1 could be kept alive by the safepoint of loop 2, so loop 2 needs to be empty for n to become dead and possibly for loop 1 to be empty.
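A minimal C++ sketch of the worklist walk described above, under loud assumptions: `Node`, its `loop` field and `collect_core_nodes` are hypothetical simplifications, not C2's `Node`, `Unique_Node_List` or the real `IdealLoopTree::collect_loop_core_nodes`. The sketch only expands entries appended by the current call (from index `start` onward), which corresponds to the proposed fix of only iterating over loop 2 nodes when loop 2 is processed. Starting the scan at index 0 instead reproduces the failure mode, in which a loop 1 node drags a loop 2 body node into the list.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified node model (not C2's Node class).
struct Node {
    int id;
    int loop;               // id of the loop this node belongs to
    std::vector<Node*> in;  // inputs
};

// Collect the nodes of loop `loop_id` reachable backwards from `backedge`,
// appending them to `worklist`. Only entries appended by this call
// (index >= start) are expanded, so nodes already collected for another
// loop cannot pull unrelated nodes into this loop's set.
void collect_core_nodes(Node* backedge, int loop_id,
                        std::vector<Node*>& worklist,
                        std::unordered_set<Node*>& seen) {
    const std::size_t start = worklist.size();
    if (seen.insert(backedge).second) {
        worklist.push_back(backedge);
    }
    // Index-based loop: push_back may reallocate, so no iterators are kept.
    for (std::size_t i = start; i < worklist.size(); i++) {
        for (Node* input : worklist[i]->in) {
            if (input != nullptr && input->loop == loop_id &&
                seen.insert(input).second) {
                worklist.push_back(input);
            }
        }
    }
}
```

For example, if a loop 1 node has an input that lives in loop 2's body but is not one of loop 2's core nodes, collecting loop 1 and then loop 2 with this version leaves that input out of the list, whereas a scan starting at index 0 would push it.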
When that happens while the logic is in the process of determining whether loop 1 is empty, it also needs to collect the `EMPTY_LOOP_SIZE` nodes that make up the "core" of loop 2. They are enqueued to the `empty_loop_nodes` list too. The failure happens when `empty_loop_nodes` already has the "core" nodes of loop 1 and the "core" nodes for loop 2 are being collected. In the process, `IdealLoopTree::collect_loop_core_nodes` iterates over all of `empty_loop_nodes` (loop 1 nodes included, rather than starting from the loop 2 backedge). That's inefficient (loop 1 nodes can't help find nodes that belong to loop 2) and wrong, because there could be an edge from a node in loop 1 to a node in the body of loop 2. That node in loop 2 is then pushed to the `empty_loop_nodes` list. The fix I propose is to only iterate over loop 2 nodes when loop 2 is processed. ------------- Commit messages: - test - fix Changes: https://git.openjdk.org/jdk/pull/15520/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15520&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315088 Stats: 54 lines in 2 files changed: 53 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15520.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15520/head:pull/15520 PR: https://git.openjdk.org/jdk/pull/15520 From roland at openjdk.org Thu Aug 31 16:02:10 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 31 Aug 2023 16:02:10 GMT Subject: RFR: 8305637: Remove Opaque1 nodes for Parse Predicates and clean up useless predicate elimination [v3] In-Reply-To: References: Message-ID: On Thu, 31 Aug 2023 12:50:35 GMT, Christian Hagedorn wrote: >> This is the last clean-up PR before the complete fix for Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)).
>> >> This patch includes: >> - Removal of `ConI`->`Opaque1`->`Conv2B` input nodes for `ParsePredicateNodes` with the following additional changes: >> - Adjusting `ParsePredicateNode` to block unwanted optimizations (added empty `ParsePredicateNode::Ideal()`). >> - Changing `Compile::_parse_predicate_opaqs` to not store `Opaque1Nodes` to keep track of Parse Predicates but instead storing `ParsePredicateNodes` directly. Renamed to `Compile::_parse_predicates` and adjusted related methods. >> - Removed asserts matching `Opaque1` -> `Conv2B` shape. >> - Cleaning up `eliminate_useless_predicates()`: >> - Adjust code to find useful/useless Parse Predicates with the new `Compile::_parse_predicates` list with `ParsePredicateNodes` instead of `Opaque1Nodes`. >> - Changing `ParsePredicateNode` to carry a `_useless` state which simplifies the elimination of useless predicates with `eliminate_useless_predicates()` and during IGVN (added `ParsePredicateNode::Value()` for that which also removes the predicate once we are in post loop opts IGVN). >> - Some refactoring/clean-ups of involved code. >> >> Testing: tier1-7 + some fuzzer testing >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Clean up is_uncommon_trap_if_pattern(), is_uncommon_trap_proj(), variable names, and types. Update looks good to me. Thanks for making the change. 
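The `_useless` state described in the quoted PR can be read as a mark-then-sweep over the list of registered Parse Predicates. The following is a minimal Java model of that reading, not C2's code: `ParsePredicate`, `eliminate`, and the `predicatesAboveLoops` input (predicates still found above some loop) are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a Parse Predicate node; only the flag matters here.
final class ParsePredicate {
    final String name;
    boolean useless;

    ParsePredicate(String name) {
        this.name = name;
    }
}

final class UselessPredicateElimination {
    // Mark-then-sweep: assume every registered predicate is useless, mark those
    // still found above a loop as useful, then drop the rest from the list.
    // Returns the removed predicates (a later pass would fold them away).
    static List<ParsePredicate> eliminate(List<ParsePredicate> registered,
                                          List<List<ParsePredicate>> predicatesAboveLoops) {
        for (ParsePredicate p : registered) {
            p.useless = true;
        }
        for (List<ParsePredicate> aboveOneLoop : predicatesAboveLoops) {
            for (ParsePredicate p : aboveOneLoop) {
                p.useless = false;
            }
        }
        List<ParsePredicate> removed = new ArrayList<>();
        for (Iterator<ParsePredicate> it = registered.iterator(); it.hasNext(); ) {
            ParsePredicate p = it.next();
            if (p.useless) {
                removed.add(p);
                it.remove();
            }
        }
        return removed;
    }
}
```

Keeping the state on the predicate itself means any later pass can check the flag directly instead of recomputing reachability, which matches the simplification the PR description attributes to the `_useless` field.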
------------- PR Comment: https://git.openjdk.org/jdk/pull/15449#issuecomment-1701312424 From sviswanathan at openjdk.org Thu Aug 31 18:32:02 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 31 Aug 2023 18:32:02 GMT Subject: RFR: 8314085: Fixing scope from benchmark to thread for JMH tests having shared state In-Reply-To: References: Message-ID: On Thu, 10 Aug 2023 15:30:19 GMT, Swati Sharma wrote: > In addition to the issue [JDK-8311178](https://bugs.openjdk.org/browse/JDK-8311178), this PR logically fixes the scope from benchmark to thread for the benchmark files below that have shared state, which also fixes scalability problems in a few of the benchmarks. > > org/openjdk/bench/java/io/DataInputStreamTest.java > org/openjdk/bench/java/lang/ArrayClone.java > org/openjdk/bench/java/lang/StringCompareToDifferentLength.java > org/openjdk/bench/java/lang/StringCompareToIgnoreCase.java > org/openjdk/bench/java/lang/StringComparisons.java > org/openjdk/bench/java/lang/StringEquals.java > org/openjdk/bench/java/lang/StringFormat.java > org/openjdk/bench/java/lang/StringReplace.java > org/openjdk/bench/java/lang/StringSubstring.java > org/openjdk/bench/java/lang/StringTemplateFMT.java > org/openjdk/bench/java/lang/constant/MethodTypeDescFactories.java > org/openjdk/bench/java/lang/constant/ReferenceClassDescResolve.java > org/openjdk/bench/java/lang/invoke/MethodHandlesConstant.java > org/openjdk/bench/java/lang/invoke/MethodHandlesIdentity.java > org/openjdk/bench/java/lang/invoke/MethodHandlesThrowException.java > org/openjdk/bench/java/lang/invoke/MethodTypeAppendParams.java > org/openjdk/bench/java/lang/invoke/MethodTypeChangeParam.java > org/openjdk/bench/java/lang/invoke/MethodTypeChangeReturn.java > org/openjdk/bench/java/lang/invoke/MethodTypeDropParams.java > org/openjdk/bench/java/lang/invoke/MethodTypeGenerify.java > org/openjdk/bench/java/lang/invoke/MethodTypeInsertParams.java > org/openjdk/bench/java/security/CipherSuiteBench.java > 
org/openjdk/bench/java/time/GetYearBench.java > org/openjdk/bench/java/time/InstantBench.java > org/openjdk/bench/java/time/format/DateTimeFormatterWithPaddingBench.java > org/openjdk/bench/java/util/ListArgs.java > org/openjdk/bench/java/util/LocaleDefaults.java > org/openjdk/bench/java/util/TestAdler32.java > org/openjdk/bench/java/util/TestCRC32.java > org/openjdk/bench/java/util/TestCRC32C.java > org/openjdk/bench/java/util/regex/Exponential.java > org/openjdk/bench/java/util/regex/Primality.java > org/openjdk/bench/java/util/regex/Trim.java > org/openjdk/bench/javax/crypto/AESReinit.java > org/openjdk/bench/jdk/incubator/vector/LoadMaskedIOOBEBenchmark.java > org/openjdk/bench/vm/compiler/Rotation.java > org/openjdk/bench/vm/compiler/x86/ConvertF2I.java > org/openjdk/bench/vm/compiler/x86/BasicRules.java > > Please review and provide your feedback. > > Thanks, > Swati Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15230#pullrequestreview-1605442680 From duke at openjdk.org Thu Aug 31 18:45:39 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 31 Aug 2023 18:45:39 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v31] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows up to ~7x improvement for 32-bit datatypes (int, float) and up to ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>
>
> **Arrays.sort performance data using JMH benchmarks for arrays with random data**
>
> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
> | --- | --- | --- | --- | --- |
> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 |
> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 |
> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 |
> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 |
> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 |
> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 |
> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** |
> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** |
> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** |
> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 |
> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 |
> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 |
> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 |
> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 |
> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 |
> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** |
> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** |
> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** |
> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 |
> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 |
> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 |
> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 |
> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 |
> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 |
> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** |
> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** |
> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** |
> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 |
> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 |
> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 |
> | ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 |
> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 |
> | ArraysSort.longSort | 1000 | 10.449 | 6.239 | 1.7 |
> | ArraysSort.longSort | 10000 | 307.074 | 70.284 | **4.4** |
> | ArraysSor...
Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: - update build script - Merge branch 'master' of https://git.openjdk.org/jdk into avx512sort - Clean up parameters passed to arrayPartition; update the check to load library - Remove unnecessary import in Arrays.java - Move sort and partition intrinsics from Arrays.java to DPQS.java - Fix unused assignment in DPQS.java and space in Arrays.java - add parallelSort benchmarking - Update copyright for DPQS.java; replace avx512 pivot calculation with scalar version - Update avx512-common-qsort.h - Decomposed DPQS using AVX512 partitioning and AVX512 sort (for small arrays). Works for serial and parallel sort. - ... and 22 more: https://git.openjdk.org/jdk/compare/c8acab1d...1746eedd ------------- Changes: https://git.openjdk.org/jdk/pull/14227/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=30 Stats: 3502 lines in 21 files changed: 2995 ins; 280 del; 227 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From duke at openjdk.org Thu Aug 31 18:45:39 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 31 Aug 2023 18:45:39 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: <3MsIs3kNxvxNOftvjnsisc7eWu6CEb-BbBsHJnj9SH4=.64c640da-61c4-49b2-9e19-02de020d2976@github.com> References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> <3MsIs3kNxvxNOftvjnsisc7eWu6CEb-BbBsHJnj9SH4=.64c640da-61c4-49b2-9e19-02de020d2976@github.com> Message-ID: On Tue, 29 Aug 2023 17:00:33 GMT,
Srinivas Vamsi Parasa wrote: >> make/modules/java.base/Lib.gmk line 240: >> >>> 238: >>> 239: ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, x86_64)+$(INCLUDE_COMPILER2), true+true+true) >>> 240: $(eval $(call SetupJdkLibrary, BUILD_LIB_X86_64, \ >> >> As this is a C++ lib, consider using g++ for linking by setting: >> >> TOOLCHAIN := TOOLCHAIN_LINK_CXX > > Thanks Erik. Will update Lib.gmk to use g++ for linking. Please see the linking, now using g++, in the latest commit. >> make/modules/java.base/Lib.gmk line 247: >> >>> 245: LDFLAGS := $(LDFLAGS_JDKLIB) \ >>> 246: $(call SET_SHARED_LIBRARY_ORIGIN), \ >>> 247: LDFLAGS_linux := -Wl$(COMMA)--no-as-needed, \ >> >> This is set by default since JDK-8314554. > > Thanks Erik. Will update Lib.gmk accordingly. Based on your suggestion, LDFLAGS_linux has been removed in the latest commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1312061806 PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1312061283 From erikj at openjdk.org Thu Aug 31 19:43:18 2023 From: erikj at openjdk.org (Erik Joelsson) Date: Thu, 31 Aug 2023 19:43:18 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v31] In-Reply-To: References: Message-ID: <7md3I9bm2H5mJ5u4eVOmzAwk30cVVkpQM20OtiJY0CE=.b795aae3-0602-45d1-afee-f3590ba51432@github.com> make/modules/java.base/Lib.gmk line 255: > 253: TARGETS += $(BUILD_LIB_X86_64) > 254: endif > 255: endif Indentation looks off here (https://openjdk.org/groups/build/doc/code-conventions.html). Suggestion:

ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, x86_64)+$(INCLUDE_COMPILER2), true+true+true)
  ifeq ($(TOOLCHAIN_TYPE), gcc)
    $(eval $(call SetupJdkLibrary, BUILD_LIB_X86_64, \
        NAME := x86_64, \
        TOOLCHAIN := TOOLCHAIN_LINK_CXX, \
        OPTIMIZATION := HIGH, \
        CFLAGS := $(CFLAGS_JDKLIB), \
        CXXFLAGS := $(CXXFLAGS_JDKLIB), \
        LDFLAGS := $(LDFLAGS_JDKLIB) \
            $(call SET_SHARED_LIBRARY_ORIGIN), \
        LIBS := $(LIBCXX), \
        LIBS_linux := -lc -lm -ldl, \
    ))
    TARGETS += $(BUILD_LIB_X86_64)
  endif
endif

I'm also still wondering about the library name. It's very generic for something that seems to be rather specific.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1312139279 From duke at openjdk.org Thu Aug 31 20:37:19 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 31 Aug 2023 20:37:19 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v32] In-Reply-To: References: Message-ID: Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Update make/modules/java.base/Lib.gmk Co-authored-by: Erik Joelsson <37597443+erikj79 at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/1746eedd..a0f006b6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=30-31 Stats: 14 lines in 1 file changed: 0 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From xliu at openjdk.org Thu Aug 31 21:07:20 2023 From: xliu at openjdk.org (Xin Liu) Date: Thu, 31 Aug 2023 21:07:20 GMT Subject: RFR: 8314319: LogCompilation doesn't reset lateInlining when it encounters a failure.
In-Reply-To: <63KaTi-Lq_cMNUNr25_oi_iUHVG3tC7kTxmtb0h_Q1s=.cfc158e8-7c3f-49a5-a97e-4efd248da9f6@github.com> References: <63KaTi-Lq_cMNUNr25_oi_iUHVG3tC7kTxmtb0h_Q1s=.cfc158e8-7c3f-49a5-a97e-4efd248da9f6@github.com> Message-ID: On Tue, 22 Aug 2023 02:48:26 GMT, Xin Liu wrote: > This patch fixes a bug in LogCompilation. A compilation may encounter a failure after it processes > the '' tag. Sometimes, the C2 compiler retries after tweaking options; in this case, it retries > without subsume_load. If we don't reset lateInlining, we may have trouble in the retry run. > > We also add a unit test to verify that. A stripped-down jit.xml is placed in the test/resources/ directory. > > It's worth noting that 'mvn test' reports that the 2 tests pass even without this patch, although we can see the stack traces > of the exceptions. This isn't an accident. There are 2 reasons: > 1. LogParser::parse swallows any throwable in its exception handler. > 2. surefire runs in parallel and can't capture the failure. > > I am not sure whether this is by design. I managed to fix those 2 problems, but fixing them is beyond the scope of this > patch. I would like to hear reviewers' feedback first. Hi, can I get a review for this change? Thanks, --lx ------------- PR Comment: https://git.openjdk.org/jdk/pull/15375#issuecomment-1701780766 From duke at openjdk.org Thu Aug 31 21:31:40 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 31 Aug 2023 21:31:40 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v33] In-Reply-To: References: Message-ID: Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Change name of the avxsort library to libx86_64_sort ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/a0f006b6..0ec5f52d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=31-32 Stats: 18 lines in 7 files changed: 0 ins; 0 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From duke at openjdk.org Thu Aug 31 21:34:58 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 31 Aug 2023 21:34:58 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30] In-Reply-To: References: <2pgGFUpvYiaaQ0Oj81o5YPjTHOOYWhE26H0VJzVgyIE=.50633006-98e5-4258-8629-b883652cddc7@github.com> Message-ID: On Mon, 28 Aug 2023 23:29:43 GMT, Erik Joelsson wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Clean up parameters passed to arrayPartition; update the check to load library > > make/modules/java.base/Lib.gmk line 241: > >> 239: ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, x86_64)+$(INCLUDE_COMPILER2), true+true+true) >> 240: $(eval $(call SetupJdkLibrary, BUILD_LIB_X86_64, \ >> 241: NAME := x86_64, \ > > This looks like a rather generic name for a library. I would expect something a bit more descriptive. > > I also noted that @vnkozlov questioned needing a separate library for this and I didn't really find an answer.
What do we gain from separating this into a separate dynamic library? Hi Erik, As per your suggestion, the name of the library is updated to `libx86_64_sort`. Thank you for fixing the indentation in the build script! Please see the changes in the latest commit that was pushed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1312308146 From ecaspole at openjdk.org Thu Aug 31 22:33:47 2023 From: ecaspole at openjdk.org (Eric Caspole) Date: Thu, 31 Aug 2023 22:33:47 GMT Subject: RFR: 8314319: LogCompilation doesn't reset lateInlining when it encounters a failure. In-Reply-To: <63KaTi-Lq_cMNUNr25_oi_iUHVG3tC7kTxmtb0h_Q1s=.cfc158e8-7c3f-49a5-a97e-4efd248da9f6@github.com> References: <63KaTi-Lq_cMNUNr25_oi_iUHVG3tC7kTxmtb0h_Q1s=.cfc158e8-7c3f-49a5-a97e-4efd248da9f6@github.com> Message-ID: Looks good, thanks for figuring this out. ------------- Marked as reviewed by ecaspole (Committer).
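The 8314319 fix described above amounts to clearing a parser flag on failure so that a retried compilation does not begin in the wrong state. The following is a minimal Java sketch of that idea, not LogCompilation's code. It assumes the parser tracks late inlining with a boolean that is set when late-inlining records begin. `CompileLogState` and `LogEvent` are hypothetical names, and the events are abstractions rather than the log's actual XML tag names.

```java
// Hypothetical events; the real log uses XML elements whose names are not
// reproduced here.
enum LogEvent { LATE_INLINE_START, LATE_INLINE_END, FAILURE }

final class CompileLogState {
    // True while the parser is inside late-inlining records.
    boolean lateInlining;

    void onEvent(LogEvent event) {
        switch (event) {
            case LATE_INLINE_START:
                lateInlining = true;
                break;
            case LATE_INLINE_END:
                lateInlining = false;
                break;
            case FAILURE:
                // A failure can arrive before the late-inlining section ends.
                // If C2 then retries the compilation, the retry must not start
                // in late-inlining mode, so the flag is reset here.
                lateInlining = false;
                break;
        }
    }
}
```

Resetting on the failure event, rather than only on the normal end of the section, is what keeps the state consistent for the retry run.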
PR Review: https://git.openjdk.org/jdk/pull/15375#pullrequestreview-1605859361 From duke at openjdk.org Thu Aug 31 23:05:19 2023 From: duke at openjdk.org (Yi-Fan Tsai) Date: Thu, 31 Aug 2023 23:05:19 GMT Subject: RFR: 8314837: 5 compiled/codecache tests ignore VM flags [v3] In-Reply-To: References: Message-ID: <3h5xqxujg6DLhDgx7ROWEuXutZbJHOQKeAMOLjbX93o=.27206bad-6326-4eed-bb69-946542c0f61c@github.com> > Mark 5 codecache tests which ignore VM flags with `@requires vm.flagless`. These tests specify the code cache flags to create processes and verify their behaviors. There is no need to rerun them with external flags. Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: Propagate external vm flags to processes in CheckCodeCacheInfo and CodeCacheFullCountTest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15485/files - new: https://git.openjdk.org/jdk/pull/15485/files/ab642336..c4d936dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15485&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15485&range=01-02 Stats: 6 lines in 2 files changed: 0 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/15485.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15485/head:pull/15485 PR: https://git.openjdk.org/jdk/pull/15485 From kvn at openjdk.org Thu Aug 31 23:34:37 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 31 Aug 2023 23:34:37 GMT Subject: RFR: 8314319: LogCompilation doesn't reset lateInlining when it encounters a failure. In-Reply-To: <63KaTi-Lq_cMNUNr25_oi_iUHVG3tC7kTxmtb0h_Q1s=.cfc158e8-7c3f-49a5-a97e-4efd248da9f6@github.com> References: <63KaTi-Lq_cMNUNr25_oi_iUHVG3tC7kTxmtb0h_Q1s=.cfc158e8-7c3f-49a5-a97e-4efd248da9f6@github.com> Message-ID: Looks good. I ran tier1 testing to make sure it passed our source code validation (for your new test). It passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15375#pullrequestreview-1605904457 From duke at openjdk.org Thu Aug 31 23:44:00 2023 From: duke at openjdk.org (Yi-Fan Tsai) Date: Thu, 31 Aug 2023 23:44:00 GMT Subject: RFR: 8314837: 5 compiled/codecache tests ignore VM flags [v4] In-Reply-To: References: Message-ID: > Mark 5 codecache tests which ignore VM flags with `@requires vm.flagless`. These tests specify the code cache flags to create processes and verify their behaviors. There is no need to rerun them with external flags.
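A test that launches child JVMs can handle external flags in two ways. It can opt out with `@requires vm.flagless`, or it can forward the external flags while keeping its own settings authoritative. The following is a minimal sketch of the forwarding approach. It assumes the harness supplies external options as a whitespace-separated string, as jtreg does through the `test.vm.opts` and `test.java.opts` system properties. It also relies on HotSpot letting a later occurrence of an `-XX` flag override an earlier one. `ChildJvmCommand` is a hypothetical helper, not part of the JDK test library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChildJvmCommand {
    // Builds a child-JVM command line with the external flags first and the
    // test's own flags last, so the test's settings (e.g. code cache sizes)
    // take precedence when both specify the same flag.
    static List<String> build(String javaBin, String externalOpts,
                              List<String> testFlags, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        if (externalOpts != null && !externalOpts.isBlank()) {
            cmd.addAll(Arrays.asList(externalOpts.trim().split("\\s+")));
        }
        cmd.addAll(testFlags);
        cmd.add(mainClass);
        return cmd;
    }

    // Reads the options jtreg forwards to tests; empty when run outside jtreg.
    static String externalOptions() {
        String vmOpts = System.getProperty("test.vm.opts", "");
        String javaOpts = System.getProperty("test.java.opts", "");
        return (vmOpts + " " + javaOpts).trim();
    }
}
```

Placing the test's flags last lets the test keep its code cache configuration while still running with the external flags. A test that cannot work at all under some external mode would still need a separate `@requires` guard.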
Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: Mark CodeCacheFullCountTest incompatible with interpreter-only mode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15485/files - new: https://git.openjdk.org/jdk/pull/15485/files/c4d936dd..f7b81b2d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15485&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15485&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15485.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15485/head:pull/15485 PR: https://git.openjdk.org/jdk/pull/15485