From duke at openjdk.org Sun Oct 1 07:13:44 2023 From: duke at openjdk.org (Michael Felt) Date: Sun, 1 Oct 2023 07:13:44 GMT Subject: RFR: 8314488: Compile the JDK as C++17 In-Reply-To: References: Message-ID: On Mon, 24 Jul 2023 01:41:16 GMT, Julian Waters wrote: > Implementation of [JEP draft: Compile the JDK as C++17](https://bugs.openjdk.org/browse/JDK-8310260) Requiring xlc17 aka openxl means any AIX rte requirements change as well. Binary compatibility, rte , etc. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14988#issuecomment-1741985638 From mdoerr at openjdk.org Sun Oct 1 14:31:34 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Sun, 1 Oct 2023 14:31:34 GMT Subject: RFR: 8314488: Compile the JDK as C++17 In-Reply-To: References: Message-ID: <-rJEmwlkOH--K4APPA1emJ0KM_IOu7uLO5bKqMM0od8=.a77d00e0-fa83-4ab6-835d-d7720a888534@github.com> On Mon, 24 Jul 2023 01:41:16 GMT, Julian Waters wrote: > Implementation of [JEP draft: Compile the JDK as C++17](https://bugs.openjdk.org/browse/JDK-8310260) We're not changing it for existing releases. I don't think non-LTS releases play a significant role regarding such compatibility. Next LTS is supposed to be JDK 25 (2025-09-16, https://www.java.com/releases/). ------------- PR Comment: https://git.openjdk.org/jdk/pull/14988#issuecomment-1742099242 From mdoerr at openjdk.org Sun Oct 1 14:44:42 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Sun, 1 Oct 2023 14:44:42 GMT Subject: RFR: 8316523: Relativize esp in interpreter frames (PowerPC only) In-Reply-To: References: Message-ID: <2QKDcwnCG3vl_Hqy7YyLP-5PRLJEHuga3Sf6LulmxKE=.139f5e77-10a5-498b-8c71-91b8fdafd761@github.com> On Sat, 30 Sep 2023 11:57:49 GMT, Fredrik Bredberg wrote: > Relativize esp (Expression Stack Pointer on PowerPC) in interpreter frames. 
>
> By changing the "esp" member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant.
>
> This subtask only handles "esp" on PowerPC. The relativization of other interpreter frame members is handled in other subtasks of [JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296).
>
> It has been sanity tested on PowerPC using QEMU.

Thanks for taking care of PPC64! Looks correct. It seems that `relativize_one` and `derelativize_one` are no longer used and should be removed?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15999#issuecomment-1742102641

From tschatzl at openjdk.org Mon Oct 2 08:09:50 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Mon, 2 Oct 2023 08:09:50 GMT
Subject: RFR: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration [v4]
In-Reply-To: References: Message-ID:

On Mon, 25 Sep 2023 17:26:56 GMT, Ivan Walulya wrote:

>> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
>>
>> - Merge branch 'master' into 8315503-code-root-scan-imbalance
>> - iwalulya review - more (gtest) cleanup
>> - iwalulya review
>> - initial version that seems to work
>>
>> Contains kludge to avoid modification of currently scanned code root set.
>> Ought to be fixed differently.
>>
>> Contains debug code in table scanners of CodeRootSet/CardSet to find out problems with table growing
>>
>> Hashcode hack for code root set, using copy&paste ZHash
>>
>> Shrink table after clean
>>
>> Bulk removal of nmethods from code root sets after class unloading. From Ivan.
>>
>> Cleanup, resize after bulk delete, hashcode verification
>
> Still LGTM!

Thanks @walulyai @albertnetymk for your reviews

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15811#issuecomment-1742516392

From azafari at openjdk.org Mon Oct 2 08:13:53 2023
From: azafari at openjdk.org (Afshin Zafari)
Date: Mon, 2 Oct 2023 08:13:53 GMT
Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v6]
In-Reply-To: References: <-ALHHMcYPfciG6g2sOT-XIEVTf1pA6XXa93eNXQamD4=.88329bf9-627b-4d78-93fc-299550fc2be0@github.com> Message-ID:

On Thu, 28 Sep 2023 16:51:17 GMT, Serguei Spitsyn wrote:

> The serviceability files look good. By being paranoid I'd suggest to run more tiers, e.g. 3-4.

Tiers 1-4 passed. Thanks for your advice.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1742556855

From tschatzl at openjdk.org Mon Oct 2 08:33:16 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Mon, 2 Oct 2023 08:33:16 GMT
Subject: Integrated: 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration
In-Reply-To: References: Message-ID:

On Tue, 19 Sep 2023 08:04:23 GMT, Thomas Schatzl wrote:

> Hi all,
>
> please review this change that modifies the code root (remembered) set to use the CHT as internal representation.
>
> This removes lots of locking (inhibiting throughput), provides automatic balancing for the code root scan phase, and (parallel) bulk unregistering of nmethods during code cache unloading, improving performance of various pauses that deal with code root sets.
>
> With a stress test that frequently loads and unloads 6000 classes and associated methods from them we could previously see the following issues:
>
> During collection pauses:
>
> [4179,965s][gc,phases ] GC(273) Evacuate Collection Set: 812,18ms
> [..]
> [4179,965s][gc,phases ] GC(273) Code Root Scan (ms): Min: 0,00, Avg: 59,03, Max: 775,12, Diff: 775,12, Sum: 944,44, Workers: 16
> [...]
> [4179,965s][gc,phases ] GC(273) Termination (ms): Min: 0,03, Avg: 643,90, Max: 690,96, Diff: 690,93, Sum: 10302,47, Workers: 16 > > > Code root scan now reduces to ~22ms max on average in this case. > > We have recently seen some imbalances in code root scan and long Remark pauses (thankfully not to that extreme) in other real-world applications too: > > [2466.979s][gc,phases ] GC(131) Code Root Scan (ms): Min: 0.0, Avg: 5.7, Max: 46.4, Diff: 46.4, Sum: 57.0, Workers: 10 > > > Some random comment: > * the mutex for the CHT had to be decreased in priority by one to not conflict with `CodeCache_lock`. This does not seem to be detrimental otherwise. At the same time, I had to move the locks at `nosafepoint-3` to `nosafepoint-4` as well to keep previous ordering. All mutexes with uses of `nosafepoint` as their rank seem to be good now. > > Testing: tier1-5 > > Thanks, > Thomas This pull request has now been integrated. Changeset: 795e5dcc Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/795e5dcc856491031b87a1f2a942681a582673ab Stats: 382 lines in 13 files changed: 218 ins; 114 del; 50 mod 8315503: G1: Code root scan causes long GC pauses due to imbalanced iteration Co-authored-by: Ivan Walulya Reviewed-by: iwalulya, ayang ------------- PR: https://git.openjdk.org/jdk/pull/15811 From epeter at openjdk.org Mon Oct 2 09:07:42 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Oct 2023 09:07:42 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Mon, 4 Sep 2023 01:26:59 GMT, Pengfei Li wrote: >> Thanks @vnkozlov and @eme64, I just created https://github.com/openjdk/jdk/pull/14824 for the legacy code cleanup. > >> @pfustc This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. 
Feel free to ask for assistance if you need help with progressing this pull request towards integration!
>
> This pull request is not dead. I'm currently doing some refactoring and part of this work in separate pull requests. I will come back after those.

@pfustc feel free to ping me when I should re-review! Are you still working on some refactorings? What are your plans?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1742653580

From fbredberg at openjdk.org Mon Oct 2 09:25:06 2023
From: fbredberg at openjdk.org (Fredrik Bredberg)
Date: Mon, 2 Oct 2023 09:25:06 GMT
Subject: RFR: 8316523: Relativize esp in interpreter frames (PowerPC only) [v2]
In-Reply-To: References: Message-ID:

> Relativize esp (Expression Stack Pointer on PowerPC) in interpreter frames.
>
> By changing the "esp" member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant.
>
> This subtask only handles "esp" on PowerPC. The relativization of other interpreter frame members is handled in other subtasks of [JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296).
>
> It has been sanity tested on PowerPC using QEMU.

Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision:

  Updated after review.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/15999/files
  - new: https://git.openjdk.org/jdk/pull/15999/files/7f7c9172..fc665a22

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=15999&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15999&range=00-01

  Stats: 12 lines in 1 file changed: 0 ins; 12 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/15999.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15999/head:pull/15999

PR: https://git.openjdk.org/jdk/pull/15999

From mdoerr at openjdk.org Mon Oct 2 09:53:27 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 2 Oct 2023 09:53:27 GMT
Subject: RFR: 8316523: Relativize esp in interpreter frames (PowerPC only) [v2]
In-Reply-To: References: Message-ID:

On Mon, 2 Oct 2023 09:25:06 GMT, Fredrik Bredberg wrote:

>> Relativize esp (Expression Stack Pointer on PowerPC) in interpreter frames.
>>
>> By changing the "esp" member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant.
>>
>> This subtask only handles "esp" on PowerPC. The relativization of other interpreter frame members is handled in other subtasks of [JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296).
>>
>> It has been sanity tested on PowerPC using QEMU.
>
> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision:
>
>   Updated after review.

Thanks! This looks good. @reinrich: Maybe you can provide a 2nd review?

-------------

Marked as reviewed by mdoerr (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/15999#pullrequestreview-1652433640 From jsjolen at openjdk.org Mon Oct 2 10:26:49 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 2 Oct 2023 10:26:49 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly In-Reply-To: References: Message-ID: On Mon, 18 Sep 2023 07:37:26 GMT, Liming Liu wrote: > As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). > > Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: > > > > > > > > > > > >
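>
> | Kernel | -XX:-TransparentHugePages, Unpatched | -XX:-TransparentHugePages, Patched | -XX:+TransparentHugePages, Unpatched | -XX:+TransparentHugePages, Patched |
> |--------|-------|-------|-------|-------|
> | 4.18   | 11.30 | 11.30 | 0.25  | 0.25  |
> | 5.13   | 0.22  | 0.22  | 3.42  | 3.42  |
> | 6.1    | 0.27  | 0.33  | 3.54  | 0.33  |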
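
For readers following the thread, the idea in the description above (try `madvise` with `MADV_POPULATE_WRITE`, fall back to touching each page when the kernel does not support it) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the pull request: the names `pretouch_sketch` and `pretouch_fallback_sketch` are hypothetical, the range is assumed to be page-aligned, and the `__atomic_fetch_add` builtin assumes GCC or Clang.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>

// Older libc headers may not define the advice value added in Linux 5.14.
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

// Hypothetical fallback: touch one byte per page with an atomic add of 0,
// which faults the page in for writing without changing its contents.
static void pretouch_fallback_sketch(char* start, char* end, size_t page_size) {
  size_t len = static_cast<size_t>(end - start);
  for (size_t off = 0; off < len; off += page_size) {
    __atomic_fetch_add(start + off, 0, __ATOMIC_RELAXED);
  }
}

// Hypothetical entry point. Returns true if the kernel populated the range
// via madvise, false if the per-page fallback was used instead.
// Assumes `start` is page-aligned, as madvise requires.
static bool pretouch_sketch(char* start, char* end, size_t page_size) {
  assert(start <= end && page_size > 0);
  size_t len = static_cast<size_t>(end - start);
  if (len == 0) {
    return true;
  }
  if (::madvise(start, len, MADV_POPULATE_WRITE) == 0) {
    return true;
  }
  // Typically EINVAL on kernels that predate MADV_POPULATE_WRITE.
  pretouch_fallback_sketch(start, end, page_size);
  return false;
}
```

Either path leaves the memory contents unchanged; only the way the pages get faulted in differs.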
Some nits, but looks good to me.

src/hotspot/share/runtime/os.hpp line 230:

> 228:   // Some platforms may have special treatments for pretouch, while most
> 229:   // platforms do the same. So the common part of the code was extract here to
> 230:   // avoid copying it around.

Small grammar fixes:

- treatment in singular
- do the same *thing*
- was extract*ed*
- Missing comma after "So"

```c++
  // Some platforms may have special treatment for pretouch, while most
  // platforms do the same thing. So, the common part of the code was extracted here to
  // avoid copying it around.
```

-------------

PR Review: https://git.openjdk.org/jdk/pull/15781#pullrequestreview-1652473163
PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1342517099

From jsjolen at openjdk.org Mon Oct 2 10:26:51 2023
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Mon, 2 Oct 2023 10:26:51 GMT
Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly
In-Reply-To: References: <0_bb5A_nBfNRkwlnhCv_hNO8fW7Dt5ktqt-Z3DCy-pk=.e9dcd499-7c06-4283-9e27-690f91c70f05@github.com> Message-ID:

On Tue, 19 Sep 2023 09:22:46 GMT, Liming Liu wrote:

>> src/hotspot/os/linux/os_linux.cpp line 2914:
>>
>>> 2912:     // will initially always use small pages.
>>> 2913:     page_size = UseTransparentHugePages ? (size_t)os::vm_page_size() : page_size;
>>> 2914:     pretouch_memory_fallback(start, end, page_size);
>>
>> This assignment should be to a new variable named `pretouch_page_size` since it will only be used by `pretouch_memory_fallback`, and otherwise modifies the function parameter and also shadows the variable from `class os`. Declaring it `const` would get you extra points from readers, but I accept that is not the style here.
>
> The code is moved from the Linux-specific code in pretouchTask.cpp. I think it would be fine here.

The moved code was from a `#ifdef` block, which is why the overwriting of `page_size` was done.
I prefer Peter's style suggestion of declaring a new `const` variable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1342514403 From ayang at openjdk.org Mon Oct 2 13:17:14 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 2 Oct 2023 13:17:14 GMT Subject: RFR: 8317314: Remove unimplemented ObjArrayKlass::oop_oop_iterate_elements_bounded In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 12:34:20 GMT, Albert Mingkun Yang wrote: > Trivial removing dead code. Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15985#issuecomment-1742996512 From ayang at openjdk.org Mon Oct 2 13:20:25 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 2 Oct 2023 13:20:25 GMT Subject: Integrated: 8317314: Remove unimplemented ObjArrayKlass::oop_oop_iterate_elements_bounded In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 12:34:20 GMT, Albert Mingkun Yang wrote: > Trivial removing dead code. This pull request has now been integrated. Changeset: 2637e8dd Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/2637e8ddc4ffe102418139f501fc0be8e9c5317b Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod 8317314: Remove unimplemented ObjArrayKlass::oop_oop_iterate_elements_bounded Reviewed-by: dcubed ------------- PR: https://git.openjdk.org/jdk/pull/15985 From mdoerr at openjdk.org Mon Oct 2 13:21:14 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 2 Oct 2023 13:21:14 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). 
> > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object I've run more tests with the `Parse::load_interpreter_state` experiment and found one which fails: vmTestbase/nsk/jvmti/GetObjectMonitorUsage/objmonusage006/TestDescription.java Interesting that so few tests are sensitive to unlock ordering issues. I still wonder if a scenario like this can happen: - Objects A and B are locked by the interpreter in a method which has balanced monitors. - A Java debugger does anything which causes the interpreter frame monitor stack slots to get exchanged (I don't know what it could be.) That could be considered legal because the slots have no defined order. It is even allowed to reuse empty slots on demand. - A later part of the method gets OSR compiled which is not prevented, because the method has balanced monitors. The JIT compiler expects the interpreter frame monitor slots to be in a certain order which is no longer true. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1743001654 From ngasson at openjdk.org Mon Oct 2 13:53:06 2023 From: ngasson at openjdk.org (Nick Gasson) Date: Mon, 2 Oct 2023 13:53:06 GMT Subject: RFR: 8316880: AArch64: "stop: Header is not fast-locked" with -XX:-UseLSE since JDK-8315880 [v2] In-Reply-To: References: Message-ID: > Building a fastdebug image on a machine without LSE (e.g. 
A72) or explicitly disabling LSE results in: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (0xe0000000), pid=64585, tid=64619 > # stop: Header is not fast-locked > # > # JRE version: OpenJDK Runtime Environment (22.0) (fastdebug build 22-internal-git-a2391a92c) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-git-a2391a92c, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) > # Problematic frame: > # J 1373 c2 sun.nio.ch.NativeThreadSet.add()I java.base (155 bytes) @ 0x0000ffff7ccdf110 [0x0000ffff7ccdef80+0x0000000000000190] > # > > > When UseLSE is false `MacroAssembler::cmpxchg()` uses rscratch1 as a temporary to store the result of the store-exclusive instruction. However rscratch1 may also be one of the registers passed as t1 or t2 to `MacroAssembler::lightweight_lock()` and holding a live value which is then clobbered. Fixed by ensuring rscratch1 is never passed as one of these temporaries. 
Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: Review suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15978/files - new: https://git.openjdk.org/jdk/pull/15978/files/ed2222c8..1608efd3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15978&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15978&range=00-01 Stats: 17 lines in 6 files changed: 7 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/15978.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15978/head:pull/15978 PR: https://git.openjdk.org/jdk/pull/15978 From ngasson at openjdk.org Mon Oct 2 13:53:08 2023 From: ngasson at openjdk.org (Nick Gasson) Date: Mon, 2 Oct 2023 13:53:08 GMT Subject: RFR: 8316880: AArch64: "stop: Header is not fast-locked" with -XX:-UseLSE since JDK-8315880 In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 12:49:57 GMT, Andrew Haley wrote: > ``` > - assert_different_registers(obj, hdr, t1, t2); > + assert_different_registers(obj, hdr, t1, t2, rscratch1, rscratch2); > ``` This is a bit trickier as we'd need to add an additional temporary to `LIR_OpLock` which would affect other platforms (from the call in `C1_MacroAssembler::lock_object()` which passes rscratch2). ------------- PR Comment: https://git.openjdk.org/jdk/pull/15978#issuecomment-1743050687 From jvernee at openjdk.org Mon Oct 2 16:07:09 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 2 Oct 2023 16:07:09 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v33] In-Reply-To: References: Message-ID: <0r5bNt-ez79b7DrOJUuHCPguBQkn3MtEJdoQFqVuWxA=.86265508-9e7d-4fab-a851-35c5c138255d@github.com> > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). 
The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. 
https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is not longer allowed to throw an `UnsupportedOperationException` on ... Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: Remove PIP annotation from jdk.incubator.vector ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/17dacbbd..cc89a519 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=31-32 Stats: 4 lines in 2 files changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From aph at openjdk.org Mon Oct 2 16:26:53 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 2 Oct 2023 16:26:53 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: <5AhJwuTy6zJ4nhANPaeUYTZdQBuc96LefywHhhGA0Rc=.0ad39664-4961-4ea2-8d82-30d801207fa9@github.com> Message-ID: On Fri, 29 Sep 2023 18:48:42 GMT, Patricio Chilano Mateo wrote: >>> If this is a native frame the current FP would point to the current's frame lowest address. The value stored there would be the sender's FP. >> >> Yes to both of those. >> >>> If the sender is also a native frame, then that value would just point to the lowest address of that frame. >> >> Maybe. I think that's the way GCC works, but not the ABI. All the ABI guarantees is that the frame pointers form a chain. And I don't know that GCC always does it this way, e.g. with variable-size arrays. At least this needs a comment. 
> > Ah, is your concern about setting the right _unextended_sp/_sp value when the caller is also a native frame? If that's the case then yes we would need to know where the frame records are stored to know what to do. I guess the assumption was already that they were stored at the lowest address and that's why we pass fr->link() for the sp when constructing the sender frame. But in any case that sp value is not used to get the sender so even if the value is not correct we won't crash. If that was your concern I can add a comment for that. OK, I think I get it now. The `sender_sp` may or may not really be the actual SP in a native frame (although with current GCC it is) but we do know that there is a chain of {frame pointer, PC} pairs. If we find a PC that is in a code buffer somewhere in that chain, we can find the size of the corresponding Java stack frame, and by subtraction the "real" SP of that Java frame. So why is MacOS different? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1342907734 From aph at openjdk.org Mon Oct 2 17:12:58 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 2 Oct 2023 17:12:58 GMT Subject: RFR: 8316880: AArch64: "stop: Header is not fast-locked" with -XX:-UseLSE since JDK-8315880 [v2] In-Reply-To: References: Message-ID: On Mon, 2 Oct 2023 13:53:06 GMT, Nick Gasson wrote: >> Building a fastdebug image on a machine without LSE (e.g. 
A72) or explicitly disabling LSE results in: >> >> >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (0xe0000000), pid=64585, tid=64619 >> # stop: Header is not fast-locked >> # >> # JRE version: OpenJDK Runtime Environment (22.0) (fastdebug build 22-internal-git-a2391a92c) >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-git-a2391a92c, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) >> # Problematic frame: >> # J 1373 c2 sun.nio.ch.NativeThreadSet.add()I java.base (155 bytes) @ 0x0000ffff7ccdf110 [0x0000ffff7ccdef80+0x0000000000000190] >> # >> >> >> When UseLSE is false `MacroAssembler::cmpxchg()` uses rscratch1 as a temporary to store the result of the store-exclusive instruction. However rscratch1 may also be one of the registers passed as t1 or t2 to `MacroAssembler::lightweight_lock()` and holding a live value which is then clobbered. Fixed by ensuring rscratch1 is never passed as one of these temporaries. > > Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: > > Review suggestions Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15978#pullrequestreview-1653162998 From aph at openjdk.org Mon Oct 2 17:13:10 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 2 Oct 2023 17:13:10 GMT Subject: RFR: 8316880: AArch64: "stop: Header is not fast-locked" with -XX:-UseLSE since JDK-8315880 In-Reply-To: References: Message-ID: On Mon, 2 Oct 2023 13:47:39 GMT, Nick Gasson wrote: > > ``` > > - assert_different_registers(obj, hdr, t1, t2); > > + assert_different_registers(obj, hdr, t1, t2, rscratch1, rscratch2); > > ``` > > This is a bit trickier as we'd need to add an additional temporary to `LIR_OpLock` which would affect other platforms (from the call in `C1_MacroAssembler::lock_object()` which passes rscratch2). Argh. OK. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15978#issuecomment-1743425606 From pchilanomate at openjdk.org Mon Oct 2 18:46:50 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 2 Oct 2023 18:46:50 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: <5AhJwuTy6zJ4nhANPaeUYTZdQBuc96LefywHhhGA0Rc=.0ad39664-4961-4ea2-8d82-30d801207fa9@github.com> Message-ID: On Mon, 2 Oct 2023 16:23:26 GMT, Andrew Haley wrote: > OK, I think I get it now. The sender_sp may or may not really be the actual SP in a native frame (although with current GCC it is) but we do know that there is a chain of {frame pointer, PC} pairs. If we find a PC that is in a code buffer somewhere in that chain, we can find the size of the corresponding Java stack frame, and by subtraction the "real" SP of that Java frame. > Exactly. > So why is MacOS different? > Clang saves the frame records at the bottom of the frame (highest address), so using fr->sender_sp() works fine there. I can change it to have the same fix as gcc if we don't want to rely on that assumption. The only reason why I went with that simpler fix is that I think knowing that the sender sp is always two words above the current rfp would allow to walk the stack in some cases whereas with the other fix we would crash. Like if a frame passes the os::is_first_C_frame() check but fr->link() is not really the sender's rfp, doing rfp + 2 would still give a valid sender sp, whereas with the other calculation it would set a wrong value. But maybe that's very unlikely. I still added a new test that will fail if the location of the frame records change. What do you think? 
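
The "chain of {frame pointer, PC} pairs" discussed above can be illustrated with a small simulation. This is a sketch only, not HotSpot's frame code: `FrameRecord`, `WalkedFrame` and `walk_frame_chain` are hypothetical names, and the "sender SP is two words above FP" rule is the Clang-style layout assumption described in this message, which does not hold when the frame record sits at the lowest address of the frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// AArch64 frame record: the saved caller frame pointer followed by the
// saved return address. The frame pointer register points at this pair.
struct FrameRecord {
  const FrameRecord* caller_fp;
  uintptr_t return_pc;
};

// What the walker reports for each frame it visits (hypothetical type).
struct WalkedFrame {
  uintptr_t pc;            // return address stored in the record
  const void* sender_sp;   // guessed SP of the sender frame
};

// Follow the chain of frame records until it ends or max_depth is reached.
// Assumption: the record is stored at the highest address of its frame,
// so the sender's SP is the address just past the two-word record.
std::vector<WalkedFrame> walk_frame_chain(const FrameRecord* fp, size_t max_depth) {
  std::vector<WalkedFrame> out;
  while (fp != nullptr && out.size() < max_depth) {
    WalkedFrame f;
    f.pc = fp->return_pc;
    f.sender_sp = fp + 1;  // two words above fp
    out.push_back(f);
    fp = fp->caller_fp;
  }
  return out;
}
```

The chain itself is all the ABI guarantees; the `sender_sp` value is only as good as the layout assumption, which is why a PC found in a code buffer (with a known Java frame size) is the more reliable way to recover a real SP.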
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1343032146 From eosterlund at openjdk.org Mon Oct 2 19:23:29 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 2 Oct 2023 19:23:29 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v2] In-Reply-To: References: Message-ID: On Wed, 27 Sep 2023 12:38:37 GMT, Andrew Haley wrote: > > > Do we need an ISB on AArch64-specifc code? There, the guard value is data, not an immediate field. > > > > In other words, what instruction has just been patched that we need to make visible? > > > > > > On AArch64 we only use synchronous cross-modifying code, we just hide the expensive in slow paths using a epoch trick that proves that most executions don't need a fence. So that should all be fine. Sometimes I wonder if we should use that trick on x86_64 as well. > > > > I don't understand this reply. On AArch64 we don't patch code, we patch data. So why do we need to add a missing ISB to AArch64? Generational ZGC patches code too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14543#issuecomment-1743618934 From sspitsyn at openjdk.org Mon Oct 2 23:19:14 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 2 Oct 2023 23:19:14 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered Message-ID: The JVMTI VirtualThreadStart events have to follow the ThreadStart events pattern and so, should not be thread-filtered. 

The fix includes:
 - `jvmti.xml`: remove the attribute `filtered="thread"` in the `VirtualThreadStart` event spec
 - `jvmtiEventController.cpp`: remove the `VTHREAD_START_BIT` from the `THREAD_FILTERED_EVENT_BITS` mask and add it to the `NEED_THREAD_LIFE_EVENTS` mask
 - `jvmtiExport.cpp`: rearrangements in the `JvmtiExport::post_vthread_start()` function

The fix also includes a couple of minor unification tweaks:
 - to align `JvmtiExport::post_thread_end()` with `JvmtiExport::post_vthread_end()`, which has a slightly more optimized check for the `JVMTI_PHASE_PRIMORDIAL`.
 - to rename the local variable `cur_thread` as `thread` to follow the common pattern in `JvmtiExport::post_vthread_start()` and `JvmtiExport::post_vthread_end()`

Testing: ran mach5 tiers 1-6. All tests passed.

-------------

Commit messages:
 - 8316233: VirtualThreadStart events should not be thread-filtered

Changes: https://git.openjdk.org/jdk/pull/16019/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16019&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8316233
  Stats: 35 lines in 3 files changed: 9 ins; 11 del; 15 mod
  Patch: https://git.openjdk.org/jdk/pull/16019.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16019/head:pull/16019

PR: https://git.openjdk.org/jdk/pull/16019

From kbarrett at openjdk.org Tue Oct 3 01:34:26 2023
From: kbarrett at openjdk.org (Kim Barrett)
Date: Tue, 3 Oct 2023 01:34:26 GMT
Subject: RFR: 8189088: Add intrusive doubly-linked list utility
Message-ID:

Please review this new facility, providing a general mechanism for intrusive doubly-linked lists. A class supports inclusion in a list by having an IntrusiveListEntry member, and providing structured information about how to access that member. A class supports inclusion in multiple lists by having multiple IntrusiveListEntry members, with different lists specified to use different members.

The IntrusiveList class template provides the list management.
It is modelled on bidirectional containers such as std::list and boost::intrusive::list, providing many of the expected member types and functions. (Note that the member types use the Standard's naming conventions.) (Not all standard container requirements are met; some operations are not presently supported because they haven't been needed yet.) This includes iteration support using (mostly) standard-conforming iterator types (they are presently missing iterator_category member types, pending being able to include `<iterator>` so we can use std::bidirectional_iterator_tag). This change only provides the new facility, and doesn't include any uses of it. It is intended to replace the 4-5 (or maybe more) competing intrusive doubly-linked lists presently in HotSpot. Unlike most (or perhaps all?) of those alternatives, this proposal provides a suite of unit tests. An example of a place that I think might benefit from this is G1's region handling. There are various places where G1 iterates over all regions in order to do something with those which satisfy some property (humongous regions, regions in the collection set, etc.). If it were trivial to create new region sublists (and this facility makes that easy), some of these could be turned into direct iteration over only the regions of interest. Some specific points to consider when reviewing this proposal: (1) This proposal follows Standard Library API conventions, which differ from HotSpot in various ways. (1a) Lists and iterators provide various type members, with names per the Standard Library. There has been discussion of using some parts of the Standard Library eventually, in which case this would be important. But for now some of the naming choices are atypical for HotSpot. (1b) Some of the function signatures follow the Standard Library APIs even though the reasons for that form might not apply to HotSpot. For example, the list pop operations don't return the removed value.
For node-based containers in the Standard Library that would introduce exception safety problems, so the Standard Library API is designed to avoid that. HotSpot doesn't use exceptions, so the exception safety issue isn't relevant. And for an intrusive container like this, there isn't an exception safety problem anyway for pop operations. (2) It has been suggested this class should be named "List" rather than "IntrusiveList", as this should be the "list" one usually reaches for in HotSpot code (because no allocation is involved in list manipulation), and if one wanted the other kind of list then just use std::list (if we permitted use of some of the Standard Library). (3) This proposal uses a pointer-to-data-member to access the embedded list entry in an object. NonblockingQueue and LockFreeStack use a function for that access. There was a version of MSVC where the pointer to data member approach didn't work. It also didn't work on some versions of Solaris Studio. Those are no longer problems for us. We could change those other classes, or we could change this to similarly use a function, or we could have the inconsistency. (4) This proposal provides support for a parameterized IntrusiveList allocation base. If we were using the Standard Library we would need a solution for heap allocation of Standard Library containers. If we had such, it might be simpler to use that same approach for this class, rather than using the allocation base class approach. This of course assumes that there is a need for heap/resource/arena allocated lists. Testing: mach5 tier1, building for all Oracle-supported platforms and running the new unit tests.
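[Editorial sketch, not part of the original message.] To make the idea of an intrusive list selected by a pointer-to-data-member concrete, here is a minimal C++ sketch. All names in it (`ListEntry`, `SimpleIntrusiveList`, `Region`, `all_entry`, `humongous_entry`, `demo_sum_of_humongous_ids`) are hypothetical and chosen for illustration; this is not the `IntrusiveList` API proposed in the PR, and it stores element pointers directly in the entry rather than following whatever layout the real class uses. It only shows how one object can sit on two lists at once with no allocation, and why `pop_front()` can return nothing, as in `std::list`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical link record embedded in each element. It points at
// neighbouring elements directly, so no node is ever allocated.
template <typename T>
struct ListEntry {
  T* prev = nullptr;
  T* next = nullptr;
};

// Hypothetical list type. The second template argument is a
// pointer-to-data-member naming which embedded entry this list uses.
template <typename T, ListEntry<T> T::*Entry>
class SimpleIntrusiveList {
  T* _head = nullptr;
  T* _tail = nullptr;
  std::size_t _size = 0;

  static ListEntry<T>& entry(T& value) { return value.*Entry; }

public:
  bool empty() const { return _head == nullptr; }
  std::size_t size() const { return _size; }

  // Link an element at the end; the caller keeps ownership of it.
  void push_back(T& value) {
    ListEntry<T>& e = entry(value);
    e.prev = _tail;
    e.next = nullptr;
    if (_tail != nullptr) {
      entry(*_tail).next = &value;
    } else {
      _head = &value;
    }
    _tail = &value;
    ++_size;
  }

  // Unlink the first element. Returns nothing, mirroring std::list.
  void pop_front() {
    assert(!empty());
    T* old = _head;
    _head = entry(*old).next;
    if (_head != nullptr) {
      entry(*_head).prev = nullptr;
    } else {
      _tail = nullptr;
    }
    entry(*old) = ListEntry<T>();  // mark the element as unlinked
    --_size;
  }

  template <typename F>
  void for_each(F f) {
    for (T* p = _head; p != nullptr; p = entry(*p).next) {
      f(*p);
    }
  }
};

// Hypothetical element with two entries, so it can be on two lists.
struct Region {
  int id;
  ListEntry<Region> all_entry;
  ListEntry<Region> humongous_entry;
  explicit Region(int i) : id(i) {}
};

using AllRegions = SimpleIntrusiveList<Region, &Region::all_entry>;
using HumongousRegions = SimpleIntrusiveList<Region, &Region::humongous_entry>;

// Walk only the sublist of interest instead of filtering all regions.
int demo_sum_of_humongous_ids() {
  Region r1(1), r2(2), r3(3);
  AllRegions all;
  HumongousRegions humongous;
  all.push_back(r1);
  all.push_back(r2);
  all.push_back(r3);
  humongous.push_back(r1);
  humongous.push_back(r3);

  // Removing r1 from one list leaves its membership in the other intact.
  all.pop_front();
  assert(all.size() == 2 && humongous.size() == 2);

  int sum = 0;
  humongous.for_each([&sum](Region& r) { sum += r.id; });
  return sum;
}
```

In this sketch the sublist walk touches only the two regions linked through `humongous_entry`, which is the kind of direct iteration the G1 example above has in mind.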
------------- Commit messages: - modernize and expand - old prototype Changes: https://git.openjdk.org/jdk/pull/15896/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15896&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8189088 Stats: 3190 lines in 3 files changed: 3190 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15896.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15896/head:pull/15896 PR: https://git.openjdk.org/jdk/pull/15896 From dholmes at openjdk.org Tue Oct 3 02:13:42 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 3 Oct 2023 02:13:42 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 14:00:25 GMT, Martin Doerr wrote: > `for (index = mcnt; ...` Shouldn't that be `for (index = mcnt -1; ...` as index < mcnt? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1744053970 From dholmes at openjdk.org Tue Oct 3 05:24:05 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 3 Oct 2023 05:24:05 GMT Subject: RFR: 8314294: Unsafe::allocateMemory and Unsafe::freeMemory are slower than malloc/free In-Reply-To: References: <5BsY-GjOu3yFCKRa2U_JeJS7b3KncWlt5Jeips2mwP8=.ec4f1e3b-1081-48b4-abac-e4ddb241c02e@github.com> Message-ID: On Fri, 29 Sep 2023 07:13:14 GMT, Aleksey Shipilev wrote: >> To increase performance by avoiding a thread-state transition (native -> in_vm) we change the three "raw" allocation functions in Unsafe to be UNSAFE_LEAF rather than UNSAFE_ENTRY. >> >> It is hard to track through all the related code to prove this is a safe change, but I could not spot anything obvious and testing indicated no issues (my main concern was potential missing WXWrite on macOS Aarch64). >> >> Testing: >> - tiers 1-7 on linux and macos x64 and Aarch64, plus Windows x64 >> >> Thanks > > Looks reasonable. 
I clicked through some of the os::{malloc,realloc,free} implementations, and nothing pops out as requiring the VM mode. Thanks for the reviews @shipilev and @dean-long ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15977#issuecomment-1744172523 From dholmes at openjdk.org Tue Oct 3 05:24:07 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 3 Oct 2023 05:24:07 GMT Subject: Integrated: 8314294: Unsafe::allocateMemory and Unsafe::freeMemory are slower than malloc/free In-Reply-To: <5BsY-GjOu3yFCKRa2U_JeJS7b3KncWlt5Jeips2mwP8=.ec4f1e3b-1081-48b4-abac-e4ddb241c02e@github.com> References: <5BsY-GjOu3yFCKRa2U_JeJS7b3KncWlt5Jeips2mwP8=.ec4f1e3b-1081-48b4-abac-e4ddb241c02e@github.com> Message-ID: <-d-MWrEI6oIdGYnGoCJgwloPUCazv1Hez-pB3bgma18=.b53f5e7d-c1ec-4ea2-a8ea-ad8279d40454@github.com> On Fri, 29 Sep 2023 06:48:05 GMT, David Holmes wrote: > To increase performance by avoiding a thread-state transition (native -> in_vm) we change the three "raw" allocation functions in Unsafe to be UNSAFE_LEAF rather than UNSAFE_ENTRY. > > It is hard to track through all the related code to prove this is a safe change, but I could not spot anything obvious and testing indicated no issues (my main concern was potential missing WXWrite on macOS Aarch64). > > Testing: > - tiers 1-7 on linux and macos x64 and Aarch64, plus Windows x64 > > Thanks This pull request has now been integrated. 
Changeset: 26c21f50 Author: David Holmes URL: https://git.openjdk.org/jdk/commit/26c21f50a39a4ae0425b6e7ae63afbdaf627e710 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8314294: Unsafe::allocateMemory and Unsafe::freeMemory are slower than malloc/free Reviewed-by: shade, dlong ------------- PR: https://git.openjdk.org/jdk/pull/15977 From dholmes at openjdk.org Tue Oct 3 05:24:06 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 3 Oct 2023 05:24:06 GMT Subject: RFR: 8314294: Unsafe::allocateMemory and Unsafe::freeMemory are slower than malloc/free In-Reply-To: References: <5BsY-GjOu3yFCKRa2U_JeJS7b3KncWlt5Jeips2mwP8=.ec4f1e3b-1081-48b4-abac-e4ddb241c02e@github.com> Message-ID: On Fri, 29 Sep 2023 08:17:18 GMT, Maurizio Cimadamore wrote: >> To increase performance by avoiding a thread-state transition (native -> in_vm) we change the three "raw" allocation functions in Unsafe to be UNSAFE_LEAF rather than UNSAFE_ENTRY. >> >> It is hard to track through all the related code to prove this is a safe change, but I could not spot anything obvious and testing indicated no issues (my main concern was potential missing WXWrite on macOS Aarch64). >> >> Testing: >> - tiers 1-7 on linux and macos x64 and Aarch64, plus Windows x64 >> >> Thanks > > Thanks for taking care of this @dholmes-ora. Do you know if Unsafe::copyMemory, or Unsafe::setMemory can also receive same treatment? These are bulk operations, so they are less sensitive to the transition cost - but for small copies it can still be a factor. @mcimadamore setMemory and copyMemory are targeting Java arrays not native memory so they have to be safepoint-aware and so cannot be leaf operations. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15977#issuecomment-1744178613 From thartmann at openjdk.org Tue Oct 3 07:14:16 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Oct 2023 07:14:16 GMT Subject: RFR: JDK-8269393: store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64 In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 14:03:30 GMT, Damon Fenacci wrote: > # Issue > An intermittent _Memory Pool not found_ error has been noticed when running a few tests (_vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java_, _vmTestbase/vm/mlvm/meth/stress/compiler/sequences/Test.java_) on _macosx_aarch64_ (production build) with non-segmented code cache. > > ## Origin > The issue originates from the fact that aarch64 architecture is a weakly ordered memory architecture, i.e. it _permits the observation and completion of memory accesses in a different order from the program order_. > > More precisely: while calling `CodeHeapPool::get_memory_usage`, the `used` and `committed` variables are retrieved > https://github.com/openjdk/jdk/blob/138542de7889e8002df0e15a79e31d824c6a0473/src/hotspot/share/services/memoryPool.cpp#L181-L182 > and these are computed based on different variables saved in memory in `CodeCache::allocate` (during `heap->allocate` and `heap->expand_by` to be precise). https://github.com/openjdk/jdk/blob/138542de7889e8002df0e15a79e31d824c6a0473/src/hotspot/share/code/codeCache.cpp#L535-L537 > The problem happens when first `heap->expand_by` gets called (which _increases_ `committed`) and then `heap->allocate` gets called in a second loop pass (which _increases_ `used`). Although stores in `CodeCache::allocate` happen in this order, when reading from memory in `CodeHeapPool::get_memory_usage` it can happen that `used` has the newly computed value, while `committed` is still "old" (because of ARM's weak memory order).
This is a problem, since `committed` must be greater than `used`. > > # Solution > > To avoid this situation we must ensure that values used to calculate `committed` are actually saved before the values used to calculate `used` and that the opposite be true for reading. To enforce this we acquire a `CodeCache_lock` while reading `used` and `committed` in `CodeHeapPool::get_memory_usage` (which should actually be the convention when accessing CodeCache data). Nice analysis, Damon. The fix looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15819#pullrequestreview-1654484966 From thartmann at openjdk.org Tue Oct 3 07:29:29 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Oct 2023 07:29:29 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Mon, 2 Oct 2023 13:17:55 GMT, Martin Doerr wrote: > Interesting that so few tests are sensitive to unlock ordering issues. Right, I think we definitely need more tests. At least a targeted regression test for this particular issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1744357486 From dholmes at openjdk.org Tue Oct 3 07:38:58 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 3 Oct 2023 07:38:58 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 15:19:03 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/utilities/vmError.cpp line 434: >> >>> 432: return invalid; >>> 433: } >>> 434: if (fr.is_interpreted_frame() || (fr.cb() != nullptr && fr.cb()->frame_size() > 0)) { >> >> This part of the fix is unclear to me. How do the old conditions relate to the new ones?
> > The second part of the condition includes the previous checks for is_compiled_frame(), is_native_frame(), is_runtime_frame() plus any other frame that would use sender_for_compiled_frame() when calling frame::sender(), like the safepoint stub. So what is left that is not covered by that condition? Just wondering if there is a simpler form that makes it somewhat clearer what this actually tests for as the old condition was easily understandable and the new one is obscure. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1343635939 From ngasson at openjdk.org Tue Oct 3 08:20:43 2023 From: ngasson at openjdk.org (Nick Gasson) Date: Tue, 3 Oct 2023 08:20:43 GMT Subject: Integrated: 8316880: AArch64: "stop: Header is not fast-locked" with -XX:-UseLSE since JDK-8315880 In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 08:12:06 GMT, Nick Gasson wrote: > Building a fastdebug image on a machine without LSE (e.g. A72) or explicitly disabling LSE results in: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (0xe0000000), pid=64585, tid=64619 > # stop: Header is not fast-locked > # > # JRE version: OpenJDK Runtime Environment (22.0) (fastdebug build 22-internal-git-a2391a92c) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-git-a2391a92c, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) > # Problematic frame: > # J 1373 c2 sun.nio.ch.NativeThreadSet.add()I java.base (155 bytes) @ 0x0000ffff7ccdf110 [0x0000ffff7ccdef80+0x0000000000000190] > # > > > When UseLSE is false `MacroAssembler::cmpxchg()` uses rscratch1 as a temporary to store the result of the store-exclusive instruction. However rscratch1 may also be one of the registers passed as t1 or t2 to `MacroAssembler::lightweight_lock()` and holding a live value which is then clobbered. Fixed by ensuring rscratch1 is never passed as one of these temporaries. 
This pull request has now been integrated. Changeset: b6a97c07 Author: Nick Gasson URL: https://git.openjdk.org/jdk/commit/b6a97c078043862b20bd8e1d1b8ccb8699995515 Stats: 41 lines in 10 files changed: 15 ins; 0 del; 26 mod 8316880: AArch64: "stop: Header is not fast-locked" with -XX:-UseLSE since JDK-8315880 Reviewed-by: rkennke, aph ------------- PR: https://git.openjdk.org/jdk/pull/15978 From thartmann at openjdk.org Tue Oct 3 08:46:14 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Oct 2023 08:46:14 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges [v2] In-Reply-To: References: Message-ID: On Wed, 27 Sep 2023 23:34:10 GMT, Cesar Soares Lucas wrote: >> ### Description >> >> Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", or objects returned from method calls, etc. Please review this Pull Request that improves Reduce Allocation Merge implementation so that it can reduce at least some of these allocation merges. >> >> Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP. >> >> The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing. >> >> ### Benchmarking >> >> **Note:** In some of these tests no reduction happens. I left them in to validate that no perf. regression happens in that case. >> **Note 2:** Margin of error was negligible.
>>
>> | Benchmark                      | No RAM (ms/op) | Yes RAM (ms/op) |
>> |--------------------------------|----------------|-----------------|
>> | TestTrapAfterMerge             | 19.515         | 13.386          |
>> | TestArgEscape                  | 33.165         | 33.254          |
>> | TestCallTwoSide                | 70.547         | 69.427          |
>> | TestCmpAfterMerge              | 16.400         | 2.984           |
>> | TestCmpMergeWithNull_Second    | 27.204         | 27.293          |
>> | TestCmpMergeWithNull           | 8.248          | 4.920           |
>> | TestCondAfterMergeWithAllocate | 12.890         | 5.252           |
>> | TestCondAfterMergeWithNull     | 6.265          | 5.078           |
>> | TestCondLoadAfterMerge         | 12.713         | 5.163           |
>> | TestConsecutiveSimpleMerge     | 30.863         | 4.068           |
>> | TestDoubleIfElseMerge          | 16.069         | 2.444           |
>> | TestEscapeInCallAfterMerge     | 23.111         | 22.924          |
>> | TestGlobalEscape               | 14.459         | 14.425          |
>> | TestIfElseInLoop               | 246.061        | 42.786          |
>> | TestLoadAfterLoopAlias         | 45.808         | 45.812          |
>> ...
> > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo in test. I didn't look at this in detail yet but submitted testing. I see the following failures.
`compiler/eliminateAutobox/TestByteBoxing.java` with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/workspace/open/src/hotspot/share/opto/loopnode.cpp:2178), pid=951972, tid=951999 # assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed # # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x128082c] LoopNode::verify_strip_mined(int) const+0xcc Current CompileTask: C2: 1438 263 % b compiler.eliminateAutobox.TestByteBoxing::main @ 1358 (1805 bytes) Stack: [0x00007f0efc9cb000,0x00007f0efcacb000], sp=0x00007f0efcac57a0, free space=1001k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x128082c] LoopNode::verify_strip_mined(int) const+0xcc (loopnode.cpp:2178) V [libjvm.so+0x1256ead] PathFrequency::to(Node*)+0x70d (loopPredicate.cpp:988) V [libjvm.so+0x1258b49] PhaseIdealLoop::loop_predication_impl(IdealLoopTree*)+0x8e9 (loopPredicate.cpp:1462) V [libjvm.so+0x125989a] IdealLoopTree::loop_predication(PhaseIdealLoop*)+0x9a (loopPredicate.cpp:1536) V [libjvm.so+0x12a28d7] PhaseIdealLoop::build_and_optimize()+0xf57 (loopnode.cpp:4582) V [libjvm.so+0x9ee7fb] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x3ab (loopnode.hpp:1114) V [libjvm.so+0x9e9db6] Compile::Optimize()+0xdf6 (compile.cpp:2362) `compiler/eliminateAutobox/TestByteBoxing.java` with `-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP 
-XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers`: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (workspace/open/src/hotspot/share/opto/loopnode.cpp:6035), pid=1353611, tid=1353627 # Error: ShouldNotReachHere() # # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x129062c] PhaseIdealLoop::verify_strip_mined_scheduling(Node*, Node*)+0x26c Current CompileTask: C2: 547 68 b compiler.eliminateAutobox.TestDoubleBoxing::sump (48 bytes) Stack: [0x00007f1814966000,0x00007f1814a66000], sp=0x00007f1814a60c20, free space=1003k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x129062c] PhaseIdealLoop::verify_strip_mined_scheduling(Node*, Node*)+0x26c (loopnode.cpp:6035) V [libjvm.so+0x12a0fb0] PhaseIdealLoop::build_loop_late_post_work(Node*, bool)+0x420 (loopnode.cpp:6222) V [libjvm.so+0x12a166d] PhaseIdealLoop::build_loop_late(VectorSet&, Node_List&, Node_Stack&)+0xbd (loopnode.cpp:6045) V [libjvm.so+0x12a1f9d] PhaseIdealLoop::build_and_optimize()+0x61d (loopnode.cpp:4461) V [libjvm.so+0x9ee7fb] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x3ab (loopnode.hpp:1114) V [libjvm.so+0x9e9498] Compile::Optimize()+0x4d8 (compile.cpp:2354) Same failures with other tests in `compiler/eliminateAutobox/` `compiler/intrinsics/unsafe/AllocateUninitializedArray.java` with `-XX:-TieredCompilation -XX:+AlwaysIncrementalInline`: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/workspace/open/src/hotspot/share/opto/narrowptrnode.cpp:84), pid=2114638, tid=2114665 # assert(t != TypeNarrowKlass::NULL_PTR) failed: null klass? 
# # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x140c554] DecodeNKlassNode::Value(PhaseGVN*) const+0x1b4 Current CompileTask: C2: 5582 123 compiler.intrinsics.unsafe.AllocateUninitializedArray::testOK (110 bytes) Stack: [0x00007fbb8b172000,0x00007fbb8b272000], sp=0x00007fbb8b26cce0, free space=1003k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x140c554] DecodeNKlassNode::Value(PhaseGVN*) const+0x1b4 (narrowptrnode.cpp:84) V [libjvm.so+0x12a659a] PhaseIdealLoop::split_thru_phi(Node*, Node*, int)+0x30a (loopopts.cpp:103) V [libjvm.so+0x12aa620] PhaseIdealLoop::split_if_with_blocks_pre(Node*)+0x270 (loopopts.cpp:1165) V [libjvm.so+0x12af47f] PhaseIdealLoop::split_if_with_blocks(VectorSet&, Node_Stack&)+0x15f (loopopts.cpp:1877) V [libjvm.so+0x12a291f] PhaseIdealLoop::build_and_optimize()+0xf9f (loopnode.cpp:4572) V [libjvm.so+0x9ee7fb] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x3ab (loopnode.hpp:1114) V [libjvm.so+0x9e9d51] Compile::Optimize()+0xd91 (compile.cpp:2171) ------------- Changes requested by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15825#pullrequestreview-1654667506 From aph at openjdk.org Tue Oct 3 09:19:44 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 3 Oct 2023 09:19:44 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: <5AhJwuTy6zJ4nhANPaeUYTZdQBuc96LefywHhhGA0Rc=.0ad39664-4961-4ea2-8d82-30d801207fa9@github.com> Message-ID: On Mon, 2 Oct 2023 18:43:52 GMT, Patricio Chilano Mateo wrote: > > So why is MacOS different? 
> > Clang saves the frame records at the bottom of the frame (highest address), so using fr->sender_sp() works fine there. Huh, so it does. I never knew that. > I can change it to have the same fix as gcc if we don't want to rely on that assumption. The only reason why I went with that simpler fix is that I think knowing that the sender sp is always two words above the current rfp would allow to walk the stack in some cases whereas with the other fix we would crash. I guess so, but clang might change that tomorrow. But OK, from what you say it makes sense. > Like if a frame passes the os::is_first_C_frame() check but fr->link() is not really the sender's rfp, We're confident of two things: the frame pointers are a continuous chain through foreign code, and we can unwind frames we create ourselves in the VM. As to where exactly the frame pointer chain is in the stack frame, there are no guarantees: it might be in the middle, and indeed it is in the middle if a function has any local variables that are variable-sized arrays. > doing rfp + 2 would still give a valid sender sp, whereas with the other calculation it would set a wrong value. But maybe that's very unlikely. I still added a new test that will fail if the location of the frame records change. What do you think? That's a good idea. Depending on the internal details of some other open source project is pathological coupling. At the best this is a code smell. Having said that, to fix it properly we'd have to use an unwinder library, and that's a much bigger project. So, OK, I guess this patch is an improvement. One final thing, though. I'm looking at `jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/aarch64/AARCH64Frame.java` and I see `AARCH64Frame::sender` if (cb != null) { return senderForCompiledFrame(map, cb); } // Must be native-compiled frame, i.e. the marshaling code for native // methods that exists in the core system. 
return new AARCH64Frame(getSenderSP(), getLink(), getSenderPC()); We try to keep the agent code and the HotSpot frame code in step. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1343783730 From aph at openjdk.org Tue Oct 3 09:36:38 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 3 Oct 2023 09:36:38 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: <5AhJwuTy6zJ4nhANPaeUYTZdQBuc96LefywHhhGA0Rc=.0ad39664-4961-4ea2-8d82-30d801207fa9@github.com> Message-ID: On Tue, 3 Oct 2023 09:16:46 GMT, Andrew Haley wrote: >>> OK, I think I get it now. The sender_sp may or may not really be the actual SP in a native frame (although with current GCC it is) but we do know that there is a chain of {frame pointer, PC} pairs. If we find a PC that is in a code buffer somewhere in that chain, we can find the size of the corresponding Java stack frame, and by subtraction the "real" SP of that Java frame. >>> >> Exactly. >> >>> So why is MacOS different? >>> >> Clang saves the frame records at the bottom of the frame (highest address), so using fr->sender_sp() works fine there. I can change it to have the same fix as gcc if we don't want to rely on that assumption. The only reason why I went with that simpler fix is that I think knowing that the sender sp is always two words above the current rfp would allow to walk the stack in some cases whereas with the other fix we would crash. Like if a frame passes the os::is_first_C_frame() check but fr->link() is not really the sender's rfp, doing rfp + 2 would still give a valid sender sp, whereas with the other calculation it would set a wrong value. But maybe that's very unlikely. I still added a new test that will fail if the location of the frame records change. What do you think? > >> > So why is MacOS different? 
>> >> Clang saves the frame records at the bottom of the frame (highest address), so using fr->sender_sp() works fine there. > > Huh, so it does. I never knew that. > >> I can change it to have the same fix as gcc if we don't want to rely on that assumption. The only reason why I went with that simpler fix is that I think knowing that the sender sp is always two words above the current rfp would allow to walk the stack in some cases whereas with the other fix we would crash. > > I guess so, but clang might change that tomorrow. But OK, from what you say it makes sense. > >> Like if a frame passes the os::is_first_C_frame() check but fr->link() is not really the sender's rfp, > > We're confident of two things: the frame pointers are a continuous chain through foreign code, and we can unwind frames we create ourselves in the VM. As to where exactly the frame pointer chain is in the stack frame, there are no guarantees: it might be in the middle, and indeed it is in the middle if a function has any local variables that are variable-sized arrays. > >> doing rfp + 2 would still give a valid sender sp, whereas with the other calculation it would set a wrong value. But maybe that's very unlikely. I still added a new test that will fail if the location of the frame records change. What do you think? > > That's a good idea. > > Depending on the internal details of some other open source project is pathological coupling. At the best this is a code smell. Having said that, to fix it properly we'd have to use an unwinder library, and that's a much bigger project. So, OK, I guess this patch is an improvement. > > One final thing, though. I'm looking at `jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/aarch64/AARCH64Frame.java` and I see `AARCH64Frame::sender` > > > if (cb != null) { > return senderForCompiledFrame(map, cb); > } > > // Must be native-compiled frame, i.e. the marshaling code for native > // methods that exists in the core system. 
> return new AARCH64Frame(getSenderSP(), getLink(), getSenderPC()); > > > We try to keep the agent code and the HotSpot frame code in step. NB, I'm not suggesting you should fix AARCH64Frame.java in this, just noting that this looks like the same bug. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1343809003 From aph at openjdk.org Tue Oct 3 11:05:34 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 3 Oct 2023 11:05:34 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 16:35:18 GMT, Patricio Chilano Mateo wrote: >> Please review the following patch. As explained in the bug comments the problem is that os::get_sender_for_C_frame() always constructs a frame as if the sender is also a native C/C++ frame. Setting a correct value for _unextended_sp is important to avoid crashes if this value is later used to get that frame's caller, which will happen if we end up calling frame::sender_for_compiled_frame(). >> >> The issue exists on aarch64 for both linux and macos but the fix for linux is different. The "Procedure Call Standard for the Arm 64-bit Architecture" doesn't specify a location for the frame record within a stack frame (6.4.6), and gcc happens to choose to save it at the top of the frame (lowest address) rather than the bottom. This means that changing fr->link() for fr->sender_sp() won't work. The fix is to use the value of fr->link() but adjusted using the code blob frame size before setting it as the _unextended_sp of the sender frame. While working on this fix I realized the issue is not only when the sender is a native nmethod but with all frames associated with a CodeBlob with a frame size > 0 (runtime stub, safepoint stub, etc) so the check takes that into account. I also made a small fix to next_frame() since these mentioned frames should also use frame::sender().
>> >> I created a new test to verify that walking the stack over a native nmethod or runtime stub now works okay. I'll try to add a reliable test case for walking over a safepoint stub too. I tested the fix by running the new test and also running tiers1-4 in mach5. I'll run the upper tiers too. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - add comment to tests > - use driver + @requires vm.flagless Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15972#pullrequestreview-1654932547 From tschatzl at openjdk.org Tue Oct 3 13:04:03 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 3 Oct 2023 13:04:03 GMT Subject: RFR: 8317350: Move code cache purging out of CodeCache::UnloadingScope Message-ID: Hi all, please review this refactoring that moves actual code cache flushing/purging out of `CodeCache::UnloadingScope`. Reasons: * I prefer that a destructor does not do anything substantial - in some cases, 90% of time is spent in the destructor in that extracted method (due to https://bugs.openjdk.org/browse/JDK-8316959) * imho it does not fit the class which does nothing but sets/resets some code cache unloading behavior (probably should be renamed to `UnloadingBehaviorScope` too in a separate CR). * other existing methods at that level are placed out of that (or any other) scope object too - which is already the case for when doing concurrent unloading. * putting it there makes future logging of the various phases a little bit easier, not having `GCTraceTimer` et al. in various places. 
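[Editorial sketch, not part of the original message.] The first two reasons above can be illustrated with a small C++ sketch: a scope object whose constructor and destructor only set and restore a behavior, with the expensive purge issued as an explicit call. Every name here (`BehaviorScope`, `current_behavior`, `unload_dead_code`, `purge_code_cache`, `run_unloading_cycle`) is a hypothetical stand-in, not HotSpot code, and placing the purge call after the scope ends is an assumption made for the sketch rather than something the message states.

```cpp
#include <string>

// Records the order of events so the sketch can be checked.
static std::string event_log;

static void log_event(const char* name) {
  if (!event_log.empty()) {
    event_log += ",";
  }
  event_log += name;
}

// Hypothetical global standing in for the current unloading behavior.
static const char* current_behavior = "default";

// Scope object that only swaps the behavior in and out.
// Its destructor does no substantial work.
class BehaviorScope {
  const char* _saved;

public:
  explicit BehaviorScope(const char* behavior) : _saved(current_behavior) {
    current_behavior = behavior;
    log_event("set");
  }
  ~BehaviorScope() {
    current_behavior = _saved;
    log_event("reset");
  }
  BehaviorScope(const BehaviorScope&) = delete;
  BehaviorScope& operator=(const BehaviorScope&) = delete;
};

// Hypothetical placeholders for the real work.
static void unload_dead_code() { log_event("unload"); }
static void purge_code_cache() { log_event("purge"); }

std::string run_unloading_cycle() {
  event_log.clear();
  {
    BehaviorScope scope("gc-specific");
    unload_dead_code();
  }
  // The purge is an ordinary call at the caller's level, so a timer or
  // log statement can be wrapped around it without touching the scope.
  purge_code_cache();
  return event_log;
}
```

With this shape, the time spent purging shows up under its own call rather than being hidden inside a destructor.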
Testing: gha

Thanks,
  Thomas

-------------

Commit messages:
 - 8317350 move codecache purging out of unloading scope

Changes: https://git.openjdk.org/jdk/pull/16011/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16011&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8317350
  Stats: 60 lines in 7 files changed: 26 ins; 6 del; 28 mod
  Patch: https://git.openjdk.org/jdk/pull/16011.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16011/head:pull/16011

PR: https://git.openjdk.org/jdk/pull/16011

From kbarrett at openjdk.org Tue Oct 3 14:31:25 2023
From: kbarrett at openjdk.org (Kim Barrett)
Date: Tue, 3 Oct 2023 14:31:25 GMT
Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v2]
In-Reply-To:
References:
Message-ID:

> Please review this new facility, providing a general mechanism for intrusive
> doubly-linked lists. A class supports inclusion in a list by having an
> IntrusiveListEntry member, and providing structured information about how to
> access that member. A class supports inclusion in multiple lists by having
> multiple IntrusiveListEntry members, with different lists specified to use
> different members.
>
> The IntrusiveList class template provides the list management. It is modelled
> on bidirectional containers such as std::list and boost::intrusive::list,
> providing many of the expected member types and functions. (Note that the
> member types use the Standard's naming conventions.) (Not all standard
> container requirements are met; some operations are not presently supported
> because they haven't been needed yet.) This includes iteration support using
> (mostly) standard-conforming iterator types (they are presently missing
> iterator_category member types, pending being able to include <iterator> so we
> can use std::bidirectional_iterator_tag).
>
> This change only provides the new facility, and doesn't include any uses of
> it. It is intended to replace the 4-5 (or maybe more) competing intrusive
> doubly-linked lists presently in HotSpot. Unlike most (or perhaps all?) of
> those alternatives, this proposal provides a suite of unit tests.
>
> An example of a place that I think might benefit from this is G1's region
> handling. There are various places where G1 iterates over all regions in order
> to do something with those which satisfy some property (humongous regions,
> regions in the collection set, &etc). If it were trivial to create new region
> sublists (and this facility makes that easy), some of these could be turned
> into direct iteration over only the regions of interest.
>
> Some specific points to consider when reviewing this proposal:
>
> (1) This proposal follows Standard Library API conventions, which differ from
> HotSpot in various ways.
>
> (1a) Lists and iterators provide various type members, with names per the
> Standard Library. There has been discussion of using some parts of the
> Standard Library eventually, in which case this would be important. But for
> now some of the naming choices are atypical for HotSpot.
>
> (1b) Some of the function signatures follow the Standard Library APIs even
> though the reasons for that form might not apply to HotSpot. For example, the
> list pop operations don't return the removed value. For node-based containers
> in Standard Library that would introduce exception...
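
To make the idea in the description above concrete, here is a minimal sketch of an intrusive doubly-linked list whose elements can sit on several lists at once, as in the G1 region sublist example. It is an illustration under stated assumptions, not the IntrusiveList API from the pull request: every name in it (Hook, HookList, Region, AllTag, HumongousTag, sum_humongous_indices) is hypothetical, and it swaps the PR's IntrusiveListEntry member for tagged base-class hooks, so that getting from a link back to its element is a plain static_cast.

```cpp
#include <cassert>
#include <cstddef>

// All names in this sketch are hypothetical; this is not the API proposed in
// the pull request.

// Link fields, embedded in each element by inheritance. The Tag parameter
// lets one element type carry several independent hooks.
template <typename Tag>
struct Hook {
  Hook* prev = nullptr;
  Hook* next = nullptr;
  // An element that is on no list has null links.
  bool is_attached() const { return next != nullptr; }
};

// Circular doubly-linked list with a sentinel node. The list never
// allocates: it only rewires the hooks that live inside the elements.
template <typename T, typename Tag>
class HookList {
  using H = Hook<Tag>;
  H _head;                // sentinel; _head.next is the first element
  std::size_t _size = 0;

 public:
  HookList() { _head.prev = _head.next = &_head; }
  HookList(const HookList&) = delete;
  HookList& operator=(const HookList&) = delete;

  bool empty() const { return _head.next == &_head; }
  std::size_t size() const { return _size; }

  void push_back(T& value) {
    H* h = &value;        // implicit upcast selects this list's hook
    assert(!h->is_attached());
    h->prev = _head.prev;
    h->next = &_head;
    _head.prev->next = h;
    _head.prev = h;
    ++_size;
  }

  // O(1) removal without searching: the element carries its own links.
  void erase(T& value) {
    H* h = &value;
    assert(h->is_attached());
    h->prev->next = h->next;
    h->next->prev = h->prev;
    h->prev = h->next = nullptr;
    --_size;
  }

  T& front() {
    assert(!empty());
    return *static_cast<T*>(_head.next);
  }

  // Like std::list, pop_front() removes the element but does not return it.
  void pop_front() { erase(front()); }

  // Visits every element in list order; the sentinel is never downcast.
  template <typename F>
  void for_each(F f) {
    for (H* h = _head.next; h != &_head; h = h->next) {
      f(*static_cast<T*>(h));
    }
  }
};

struct AllTag {};
struct HumongousTag {};

// A stand-in for a heap region that can be on two lists at the same time.
struct Region : Hook<AllTag>, Hook<HumongousTag> {
  int index;
  bool humongous;
  Region(int i, bool h) : index(i), humongous(h) {}
};

// Builds a list of all regions plus a sublist of the humongous ones, then
// sums the indices reachable through the sublist only.
inline int sum_humongous_indices() {
  Region r0(0, false), r1(1, true), r2(2, false), r3(3, true);
  Region* regions[] = {&r0, &r1, &r2, &r3};
  HookList<Region, AllTag> all;
  HookList<Region, HumongousTag> humongous;
  for (Region* r : regions) {
    all.push_back(*r);
    if (r->humongous) humongous.push_back(*r);
  }
  int sum = 0;
  humongous.for_each([&](Region& r) { sum += r.index; });
  return sum;
}
```

In this sketch the humongous sublist is walked directly, so sum_humongous_indices() visits only regions 1 and 3 rather than filtering all four. Base-class hooks were chosen here only to keep the example free of offset arithmetic; a member-entry design like the one described in the pull request needs some other way to map an entry back to its enclosing object.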
Kim Barrett has updated the pull request incrementally with one additional commit since the last revision:

  comment tweaks

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/15896/files
  - new: https://git.openjdk.org/jdk/pull/15896/files/689a26a3..3ae62f6c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=15896&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15896&range=00-01

  Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/15896.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15896/head:pull/15896

PR: https://git.openjdk.org/jdk/pull/15896

From stefank at openjdk.org Tue Oct 3 14:33:51 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Tue, 3 Oct 2023 14:33:51 GMT
Subject: RFR: 8316880: AArch64: "stop: Header is not fast-locked" with -XX:-UseLSE since JDK-8315880 [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 2 Oct 2023 13:53:06 GMT, Nick Gasson wrote:

>> Building a fastdebug image on a machine without LSE (e.g. A72) or explicitly disabling LSE results in:
>>
>>
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # Internal Error (0xe0000000), pid=64585, tid=64619
>> # stop: Header is not fast-locked
>> #
>> # JRE version: OpenJDK Runtime Environment (22.0) (fastdebug build 22-internal-git-a2391a92c)
>> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-git-a2391a92c, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
>> # Problematic frame:
>> # J 1373 c2 sun.nio.ch.NativeThreadSet.add()I java.base (155 bytes) @ 0x0000ffff7ccdf110 [0x0000ffff7ccdef80+0x0000000000000190]
>> #
>>
>>
>> When UseLSE is false `MacroAssembler::cmpxchg()` uses rscratch1 as a temporary to store the result of the store-exclusive instruction. However rscratch1 may also be one of the registers passed as t1 or t2 to `MacroAssembler::lightweight_lock()` and holding a live value which is then clobbered. Fixed by ensuring rscratch1 is never passed as one of these temporaries.
>
> Nick Gasson has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review suggestions

Hi, while poking around in the locking code we also found this usage of rscratch1, which seems to be problematic:

    cmpxchg(tmp, zr, rthread, Assembler::xword, /*acquire*/ true, /*release*/ true, /*weak*/ false, rscratch1); // Sets flags for result

    if (LockingMode != LM_LIGHTWEIGHT) {
      // Store a non-null value into the box to avoid looking like a re-entrant
      // lock. The fast-path monitor unlock code checks for
      // markWord::monitor_value so use markWord::unused_mark which has the
      // relevant bit set, and also matches ObjectSynchronizer::enter.
      mov(tmp, (address)markWord::unused_mark().value());
      str(tmp, Address(box, BasicLock::displaced_header_offset_in_bytes()));
    }
    br(Assembler::EQ, cont); // CAS success means locking succeeded

    cmp(rscratch1, rthread);
    br(Assembler::NE, cont); // Check for recursive locking

I think we can use the new `tmp3Reg` instead of `rscratch1` here.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15978#issuecomment-1745100334

From cslucas at openjdk.org Tue Oct 3 16:40:42 2023
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Tue, 3 Oct 2023 16:40:42 GMT
Subject: RFR: JDK-8316991: Reduce nullable allocation merges [v2]
In-Reply-To:
References:
Message-ID: <5Wj8SVRwRqlVyO2I1Os9_3WvW476UMPh8KsbDrJOwEo=.5565c026-f723-43f7-ab7e-910aeef95cbe@github.com>

On Tue, 3 Oct 2023 08:43:46 GMT, Tobias Hartmann wrote:

>> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix typo in test.
>
> I didn't look at this in detail yet but submitted testing. I see the following failures.
> > `compiler/eliminateAutobox/TestByteBoxing.java` with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`: > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/workspace/open/src/hotspot/share/opto/loopnode.cpp:2178), pid=951972, tid=951999 > # assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed > # > # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x128082c] LoopNode::verify_strip_mined(int) const+0xcc > > Current CompileTask: > C2: 1438 263 % b compiler.eliminateAutobox.TestByteBoxing::main @ 1358 (1805 bytes) > > Stack: [0x00007f0efc9cb000,0x00007f0efcacb000], sp=0x00007f0efcac57a0, free space=1001k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x128082c] LoopNode::verify_strip_mined(int) const+0xcc (loopnode.cpp:2178) > V [libjvm.so+0x1256ead] PathFrequency::to(Node*)+0x70d (loopPredicate.cpp:988) > V [libjvm.so+0x1258b49] PhaseIdealLoop::loop_predication_impl(IdealLoopTree*)+0x8e9 (loopPredicate.cpp:1462) > V [libjvm.so+0x125989a] IdealLoopTree::loop_predication(PhaseIdealLoop*)+0x9a (loopPredicate.cpp:1536) > V [libjvm.so+0x12a28d7] PhaseIdealLoop::build_and_optimize()+0xf57 (loopnode.cpp:4582) > V [libjvm.so+0x9ee7fb] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x3ab (loopnode.hpp:1114) > V [libjvm.so+0x9e9db6] Compile::Optimize()+0xdf6 (compile.cpp:2362) > > > `compiler/eliminateAutobox/TestByteBoxing.java` with `-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM 
-XX:+StressIGVN -XX:+StressCCP -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers`: > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (workspace/open/src/hotspot/share/opto/loopnode.cpp:6035), pid=1353611, tid=1353627 > # Error: ShouldNotReachHere() > # > # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-03-0709344.tobias.hartma... Thank you @TobiHartmann . I'll take a look into the failures. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15825#issuecomment-1745344848 From mdoerr at openjdk.org Tue Oct 3 16:41:45 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Oct 2023 16:41:45 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Tue, 3 Oct 2023 02:10:34 GMT, David Holmes wrote: > > `for (index = mcnt; ...` > > Shouldn't that be `for (index = mcnt -1; ...` as index < mcnt? I'm using pre-decrement, so the 1st iteration starts with mcnt-1. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1745346623 From pchilanomate at openjdk.org Tue Oct 3 19:27:40 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 3 Oct 2023 19:27:40 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: <5AhJwuTy6zJ4nhANPaeUYTZdQBuc96LefywHhhGA0Rc=.0ad39664-4961-4ea2-8d82-30d801207fa9@github.com> Message-ID: On Tue, 3 Oct 2023 09:34:12 GMT, Andrew Haley wrote: >>> > So why is MacOS different? >>> >>> Clang saves the frame records at the bottom of the frame (highest address), so using fr->sender_sp() works fine there. >> >> Huh, so it does. I never knew that. >> >>> I can change it to have the same fix as gcc if we don't want to rely on that assumption. 
The only reason why I went with that simpler fix is that I think knowing that the sender sp is always two words above the current rfp would allow to walk the stack in some cases whereas with the other fix we would crash. >> >> I guess so, but clang might change that tomorrow. But OK, from what you say it makes sense. >> >>> Like if a frame passes the os::is_first_C_frame() check but fr->link() is not really the sender's rfp, >> >> We're confident of two things: the frame pointers are a continuous chain through foreign code, and we can unwind frames we create ourselves in the VM. As to where exactly the frame pointer chain is in the stack frame, there are no guarantees: it might be in the middle, and indeed it is in the middle if a function has any local variables that are variable-sized arrays. >> >>> doing rfp + 2 would still give a valid sender sp, whereas with the other calculation it would set a wrong value. But maybe that's very unlikely. I still added a new test that will fail if the location of the frame records change. What do you think? >> >> That's a good idea. >> >> Depending on the internal details of some other open source project is pathological coupling. At the best this is a code smell. Having said that, to fix it properly we'd have to use an unwinder library, and that's a much bigger project. So, OK, I guess this patch is an improvement. >> >> One final thing, though. I'm looking at `jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/aarch64/AARCH64Frame.java` and I see `AARCH64Frame::sender` >> >> >> if (cb != null) { >> return senderForCompiledFrame(map, cb); >> } >> >> // Must be native-compiled frame, i.e. the marshaling code for native >> // methods that exists in the core system. >> return new AARCH64Frame(getSenderSP(), getLink(), getSenderPC()); >> >> >> We try to keep the agent code and the HotSpot frame code in step. > > NB, I'm not suggesting you should fix AARCH64Frame.java in this, just noting that this looks like the same bug. 
Yes, it's also the same code we have in frame::sender_raw(). I'm not sure how we can get to that native frame case when we use frame::sender() though. So if we start from the last Java frame we shouldn't find a native frame in the chain. And when we start from os::current_frame() we use VMError::print_native_stack() which uses os::get_sender_for_C_frame() for native frames. The only case I see is the debug utility ps(), which in case there is no last Java frame, creates a frame with os::current_frame() and then uses frame::sender(). So seems either we should change that last line in sender_raw() to be os::get_sender_for_C_frame() or replace it with an assert(false, "") and fix ps(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1344619813 From pchilanomate at openjdk.org Tue Oct 3 19:32:39 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 3 Oct 2023 19:32:39 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: Message-ID: On Tue, 3 Oct 2023 07:36:22 GMT, David Holmes wrote: >> The second part of the condition includes the previous checks for is_compiled_frame(), is_native_frame(), is_runtime_frame() plus any other frame that would use sender_for_compiled_frame() when calling frame::sender(), like the safepoint stub. > > So what is left that is not covered by that condition? Just wondering if there is a simpler form that makes it somewhat clearer what this actually tests for as the old condition was easily understandable and the new one is obscure. The entry frame and upcall stub frame cases, which do not expect a call to frame::sender() unless there are more frames to walk (see those respective cases in frame::sender_raw()). The native frame case. 
And then I guess all cases where we crash in some stub routine where the pc belongs to the CodeCache but doesn't match any particular frame (interpreter, compiled, runtime stub, etc), like if we crash in the generate_cont_thaw() stub. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1344625821 From yzheng at openjdk.org Tue Oct 3 20:25:06 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 3 Oct 2023 20:25:06 GMT Subject: RFR: 8317452: [JVMCI] Export symbols used by lightweight locking to JVMCI compilers. Message-ID: Export JavaThread::_lock_stack, LockStack::_top, LockStack::_end_offset, and ObjectMonitor::ANONYMOUS_OWNER to JVMCI compilers. Deprecate JVMCIUseFastLocking ------------- Commit messages: - [JVMCI] Export symbols used by lightweight locking to JVMCI compilers. Changes: https://git.openjdk.org/jdk/pull/16032/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16032&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8317452 Stats: 14 lines in 5 files changed: 11 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16032.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16032/head:pull/16032 PR: https://git.openjdk.org/jdk/pull/16032 From mdoerr at openjdk.org Tue Oct 3 21:01:03 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Oct 2023 21:01:03 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision:
>
>  - Pass may_be_unordered information to lightweight_unlock.
>  - Merge remote-tracking branch 'origin' into 8316746_lock_stack
>  - Add x86_64 and aarch64 implementation.
>  - 8316746: Top of lock-stack does not match the unlocked object

I have tried to test on x86 with this patch:

diff --git a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
index 2154601f2f2..3666d1490fc 100644
--- a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
@@ -863,7 +863,7 @@ void C2_MacroAssembler::fast_unlock(Register objReg, Register boxReg, Register t
     jccb (Assembler::notZero, CheckSucc);
     // Without cast to int32_t this style of movptr will destroy r10 which is typically obj.
     movptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), NULL_WORD);
-    jmpb (DONE_LABEL);
+    jmp (DONE_LABEL);
 
     // Try to avoid passing control into the slow_path ...
     bind (CheckSucc);
diff --git a/src/hotspot/cpu/x86/macroAssembler_x86.cpp b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
index 26135c65418..a95149c2be5 100644
--- a/src/hotspot/cpu/x86/macroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
@@ -9836,6 +9836,15 @@ void MacroAssembler::lightweight_unlock(Register obj, Register hdr, Register tmp
   assert(hdr == rax, "header must be in rax for cmpxchg");
   assert_different_registers(obj, hdr, tmp);
 
+  if (UseNewCode) {
+    Label tos_ok;
+    movl(tmp, Address(r15_thread, JavaThread::lock_stack_top_offset()));
+    cmpptr(obj, Address(r15_thread, tmp, Address::times_1, -oopSize));
+    jcc(Assembler::equal, tos_ok);
+    STOP("Top of lock-stack does not match the unlocked object");
+    bind(tos_ok);
+  }
+
   // Mark-word must be lock_mask now, try to swing it back to unlocked_value.
   movptr(tmp, hdr); // The expected old value
   orptr(tmp, markWord::unlocked_value);

The assertion fires in C1 compiled methods and prevents me from getting far enough to run the same test.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1745715640

From jkarthikeyan at openjdk.org Tue Oct 3 23:41:25 2023
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Tue, 3 Oct 2023 23:41:25 GMT
Subject: RFR: 8316918: Optimize conversions duplicated across phi nodes
Message-ID:

Hi all,

I've created this changeset which introduces a minor optimization that de-duplicates primitive type conversion nodes when behind a phi, by replacing it with a single conversion that follows after the phi. In addition, it cleans up the conversion node classes by introducing a common superclass to host shared behavior.

This transformation is beneficial as it reduces the size of the IR and the generated code, and is a fairly frequent pattern. Most notably, array creation with a non-constant size parameter contains a duplicated ConvI2L in a branch, and when this transformation is applied the entire branch is able to be removed as the transformation has allowed it to realize that there is only one unique input.

In the future, I would like to do this transformation more generally with other types of pure operations with shared inputs, but I figured that this is a good starting point.

Here are some performance benchmarks from my (Zen 3) machine:

                                                     Baseline                  Patch                    Improvement
Benchmark                                 Mode  Cnt    Score    Error  Units     Score    Error  Units
PhiDuplicatedConversion.testDouble2Float  avgt   12  679.987 ± 29.678  ns/op / 592.162 ± 11.354  ns/op  + 12.9%
PhiDuplicatedConversion.testDouble2Int    avgt   12  737.388 ± 24.690  ns/op / 651.517 ± 12.950  ns/op  + 11.6%
PhiDuplicatedConversion.testDouble2Long   avgt   12  685.582 ± 24.236  ns/op / 662.577 ± 16.498  ns/op  +  3.3%
PhiDuplicatedConversion.testFloat2Double  avgt   12  670.812 ± 22.945  ns/op / 641.940 ± 15.954  ns/op  +  4.3%
PhiDuplicatedConversion.testFloat2Int     avgt   12  703.796 ± 21.627  ns/op / 652.882 ± 14.300  ns/op  +  7.2%
PhiDuplicatedConversion.testFloat2Long    avgt   12  682.821 ± 22.023  ns/op / 651.343 ± 13.281  ns/op  +  4.6%
PhiDuplicatedConversion.testInt2Double    avgt   12  694.062 ± 15.567  ns/op / 637.920 ±  8.959  ns/op  +  8.0%
PhiDuplicatedConversion.testInt2Float     avgt   12  709.544 ± 20.454  ns/op / 637.696 ±  7.011  ns/op  + 10.1%
PhiDuplicatedConversion.testInt2Long      avgt   12  660.117 ± 22.712  ns/op / 637.106 ± 10.776  ns/op  +  3.4%
PhiDuplicatedConversion.testLong2Double   avgt   12  666.147 ± 18.828  ns/op / 635.747 ±  6.524  ns/op  +  4.5%
PhiDuplicatedConversion.testLong2Float    avgt   12  675.239 ± 16.210  ns/op / 640.328 ±  6.551  ns/op  +  5.1%
PhiDuplicatedConversion.testLong2Int      avgt   12  665.644 ± 13.507  ns/op / 637.952 ± 10.948  ns/op  +  4.1%

Testing: tier1-3, linux x86_64

Reviews and comments would be appreciated!

-------------

Commit messages:
 - Push duplicated convert nodes through phis

Changes: https://git.openjdk.org/jdk/pull/16036/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16036&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8316918
  Stats: 610 lines in 8 files changed: 512 ins; 35 del; 63 mod
  Patch: https://git.openjdk.org/jdk/pull/16036.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16036/head:pull/16036

PR: https://git.openjdk.org/jdk/pull/16036

From jjoo at openjdk.org Tue Oct 3 23:51:11 2023
From: jjoo at openjdk.org (Jonathan Joo)
Date: Tue, 3 Oct 2023 23:51:11 GMT
Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v25]
In-Reply-To:
References:
Message-ID: <_mQlT8Rc9VZ70eMt5A3vz6mWhyZcRaTbPo8WRut4RQs=.657a01e6-ed65-49dc-895a-7355dee5b222@github.com>

> 8315149: Add hsperf counters for CPU time of internal GC threads

Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision:

  Update logic to use cmpxchg rather than add

-------------

Changes: - all:
https://git.openjdk.org/jdk/pull/15082/files
  - new: https://git.openjdk.org/jdk/pull/15082/files/3eae6bba..9b5772fb

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=24
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=23-24

  Stats: 15 lines in 2 files changed: 10 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/15082.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082

PR: https://git.openjdk.org/jdk/pull/15082

From kbarrett at openjdk.org Wed Oct 4 03:36:50 2023
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 4 Oct 2023 03:36:50 GMT
Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v3]
In-Reply-To:
References:
Message-ID:

> Please review this new facility, providing a general mechanism for intrusive
> doubly-linked lists. A class supports inclusion in a list by having an
> IntrusiveListEntry member, and providing structured information about how to
> access that member. A class supports inclusion in multiple lists by having
> multiple IntrusiveListEntry members, with different lists specified to use
> different members.
>
> The IntrusiveList class template provides the list management. It is modelled
> on bidirectional containers such as std::list and boost::intrusive::list,
> providing many of the expected member types and functions. (Note that the
> member types use the Standard's naming conventions.) (Not all standard
> container requirements are met; some operations are not presently supported
> because they haven't been needed yet.) This includes iteration support using
> (mostly) standard-conforming iterator types (they are presently missing
> iterator_category member types, pending being able to include <iterator> so we
> can use std::bidirectional_iterator_tag).
>
> This change only provides the new facility, and doesn't include any uses of
> it. It is intended to replace the 4-5 (or maybe more) competing intrusive
> doubly-linked lists presently in HotSpot. Unlike most (or perhaps all?) of
> those alternatives, this proposal provides a suite of unit tests.
>
> An example of a place that I think might benefit from this is G1's region
> handling. There are various places where G1 iterates over all regions in order
> to do something with those which satisfy some property (humongous regions,
> regions in the collection set, &etc). If it were trivial to create new region
> sublists (and this facility makes that easy), some of these could be turned
> into direct iteration over only the regions of interest.
>
> Some specific points to consider when reviewing this proposal:
>
> (1) This proposal follows Standard Library API conventions, which differ from
> HotSpot in various ways.
>
> (1a) Lists and iterators provide various type members, with names per the
> Standard Library. There has been discussion of using some parts of the
> Standard Library eventually, in which case this would be important. But for
> now some of the naming choices are atypical for HotSpot.
>
> (1b) Some of the function signatures follow the Standard Library APIs even
> though the reasons for that form might not apply to HotSpot. For example, the
> list pop operations don't return the removed value. For node-based containers
> in Standard Library that would introduce exception...
Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: add IntrusiveListEntry::is_attached() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15896/files - new: https://git.openjdk.org/jdk/pull/15896/files/3ae62f6c..e85271eb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15896&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15896&range=01-02 Stats: 13 lines in 2 files changed: 13 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15896.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15896/head:pull/15896 PR: https://git.openjdk.org/jdk/pull/15896 From dholmes at openjdk.org Wed Oct 4 03:37:56 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 4 Oct 2023 03:37:56 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: <6pRhDN9M7oowUPzTc-1M66Xg1r3STQbazbLlqBbB-UI=.15fff1eb-9038-4761-b874-cb073cce6ffb@github.com> On Tue, 3 Oct 2023 16:38:44 GMT, Martin Doerr wrote: > > > `for (index = mcnt; ...` > > > > > > Shouldn't that be `for (index = mcnt -1; ...` as index < mcnt? > > I'm using pre-decrement, so the 1st iteration starts with mcnt-1. Ugghh! a side-effect in the condition check. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1746080248 From adeel.iqbal at hotmail.com Wed Oct 4 06:40:38 2023 From: adeel.iqbal at hotmail.com (Adeel Iqbal) Date: Wed, 4 Oct 2023 06:40:38 +0000 Subject: Merge Operation of iconst_0 and istore_0 Message-ID: hi all, kindly ignore my english and technical writing please. my project is to merge the operation of above mentioned two bytecode instructions and form a new superoperator or you may say superinstruction in openjdk. I'm unable to understand which files need to be modified for this operation, can you please guide me in brief or send me some link / reference for any similar operation details. 
Thanks in advance -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.holmes at oracle.com Wed Oct 4 07:16:30 2023 From: david.holmes at oracle.com (David Holmes) Date: Wed, 4 Oct 2023 17:16:30 +1000 Subject: Merge Operation of iconst_0 and istore_0 In-Reply-To: References: Message-ID: <6d5c9d1e-5348-491d-a25a-458b2ea2401b@oracle.com> On 4/10/2023 4:40 pm, Adeel Iqbal wrote: > hi all, > kindly ignore my english and technical writing please. > my project is to merge the operation of above mentioned two bytecode > instructions and form a new superoperator or you may say > superinstruction in openjdk. > I'm unable to understand which files need to be modified for this > operation, can you please guide me in brief or send me some link / > reference for any similar operation details. Sorry this sounds like a "homework" problem. These mailing lists are not here to help with that. Regards, David > Thanks in advance From yzheng at openjdk.org Wed Oct 4 07:37:03 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 4 Oct 2023 07:37:03 GMT Subject: RFR: 8317452: [JVMCI] Export symbols used by lightweight locking to JVMCI compilers. [v2] In-Reply-To: References: Message-ID: <6EPEOGNeJjBWLU7zk3gosuHv3ZCsnWlXPO1g7JKyByk=.2322e29a-a265-463f-8883-9f1a37360faf@github.com> > Export JavaThread::_lock_stack, LockStack::_top, LockStack::_end_offset, and ObjectMonitor::ANONYMOUS_OWNER to JVMCI compilers. Deprecate JVMCIUseFastLocking Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: remove JVMCIUseFastLocking. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16032/files - new: https://git.openjdk.org/jdk/pull/16032/files/5f37b12a..ed5e31a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16032&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16032&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16032.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16032/head:pull/16032 PR: https://git.openjdk.org/jdk/pull/16032 From dnsimon at openjdk.org Wed Oct 4 07:37:32 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 4 Oct 2023 07:37:32 GMT Subject: RFR: 8317452: [JVMCI] Export symbols used by lightweight locking to JVMCI compilers. [v2] In-Reply-To: <6EPEOGNeJjBWLU7zk3gosuHv3ZCsnWlXPO1g7JKyByk=.2322e29a-a265-463f-8883-9f1a37360faf@github.com> References: <6EPEOGNeJjBWLU7zk3gosuHv3ZCsnWlXPO1g7JKyByk=.2322e29a-a265-463f-8883-9f1a37360faf@github.com> Message-ID: On Wed, 4 Oct 2023 07:37:03 GMT, Yudi Zheng wrote: >> Export JavaThread::_lock_stack, LockStack::_top, LockStack::_end_offset, and ObjectMonitor::ANONYMOUS_OWNER to JVMCI compilers. Deprecate JVMCIUseFastLocking > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > remove JVMCIUseFastLocking. src/hotspot/share/jvmci/jvmci_globals.hpp line 130: > 128: "Exclude JVMCI compiler threads from benchmark counters") \ > 129: \ > 130: develop(bool, JVMCIUseFastLocking, true, \ You should just delete this option - JVMCI is clearly marked as experimental. What you have in this PR is not proper deprecation. What if someone specifies `-XX:-JVMCIUseFastLocking`? They will get no warning or VM exit. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16032#discussion_r1345330993 From yzheng at openjdk.org Wed Oct 4 07:46:35 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 4 Oct 2023 07:46:35 GMT Subject: RFR: 8317452: [JVMCI] Export symbols used by lightweight locking to JVMCI compilers. [v2] In-Reply-To: References: <6EPEOGNeJjBWLU7zk3gosuHv3ZCsnWlXPO1g7JKyByk=.2322e29a-a265-463f-8883-9f1a37360faf@github.com> Message-ID: On Wed, 4 Oct 2023 07:34:48 GMT, Doug Simon wrote: >> Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: >> >> remove JVMCIUseFastLocking. > > src/hotspot/share/jvmci/jvmci_globals.hpp line 130: > >> 128: "Exclude JVMCI compiler threads from benchmark counters") \ >> 129: \ >> 130: develop(bool, JVMCIUseFastLocking, true, \ > > You should just delete this option - JVMCI is clearly marked as experimental. > What you have in this PR is not proper deprecation. What if someone specifies `-XX:-JVMCIUseFastLocking`? They will get no warning or VM exit. Indeed. I have deleted the option. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16032#discussion_r1345341668 From stuefe at openjdk.org Wed Oct 4 08:28:38 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 4 Oct 2023 08:28:38 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v5] In-Reply-To: References: Message-ID: > Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`. > > Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift. 
>
> 8b7b62: 48 8d 05 7f 1b c3 00 lea 0xc31b7f(%rip),%rax # 14e96e8
> 8b7b69: 0f b6 00 movzbl (%rax),%eax
> 8b7b6c: 84 c0 test %al,%al
> 8b7b6e: 0f 84 9c 00 00 00 je 8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
> 8b7b74: 48 8d 15 05 62 c6 00 lea 0xc66205(%rip),%rdx # 151dd80 <_ZN23CompressedKlassPointers6_shiftE>
> 8b7b7b: 8b 7b 08 mov 0x8(%rbx),%edi
> 8b7b7e: 8b 0a mov (%rdx),%ecx
> 8b7b80: 48 8d 15 01 62 c6 00 lea 0xc66201(%rip),%rdx # 151dd88 <_ZN23CompressedKlassPointers5_baseE>
> 8b7b87: 48 d3 e7 shl %cl,%rdi
> 8b7b8a: 48 03 3a add (%rdx),%rdi
>
> Patched: one load loads all three. Since shift occupies the lowest 8 bits, compiled code uses an 8-bit register; ditto the UseCompressedOops flag.
>
> 8ba302: 48 8d 05 97 9c c2 00 lea 0xc29c97(%rip),%rax # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE>
> 8ba309: 48 8b 08 mov (%rax),%rcx
> 8ba30c: f6 c5 01 test $0x1,%ch # use compressed klass pointers?
> 8ba30f: 0f 84 9b 00 00 00 je 8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
> 8ba315: 8b 7b 08 mov 0x8(%rbx),%edi
> 8ba318: 48 d3 e7 shl %cl,%rdi # shift
> 8ba31b: 66 31 c9 xor %cx,%cx # zero out lower 16 bits of base
> 8ba31e: 48 01 cf add %rcx,%rdi # add base
> 8ba321: 8b 4f 08 mov 0x8(%rdi),%ecx
>
> ---
>
> Performance measurements:
>
> G1, doing a full GC over a heap filled with 256 million live j.l.Object instances.
>
> I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ±4%. Still, in general, numbers seemed to go down rather than up.
>
> ---
>
> Future extensions:
>
> This patch uses the fact that the encoding base is aligned to metaspace reserve alignment (16 MB). We only use 16 of those 24 bits of alignment shadow and could us...

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase.
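The single-load decoding described in the quoted text above can be sketched in a few lines of C++. This is only an illustration inferred from the patched disassembly, not the code from the pull request: the type `KlassCombo`, its member names, and the exact bit layout (shift in bits 0..7, the flag in bit 8, base in the remaining upper bits) are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout, read off the patched disassembly quoted above:
//   bits 0..7    shift
//   bit  8       "use compressed class pointers" flag
//   bits 16..63  encoding base (its low 16 bits are zero because of alignment)
struct KlassCombo {
  uint64_t bits;

  static KlassCombo make(uint64_t base, unsigned shift, bool use_ccp) {
    assert((base & 0xFFFF) == 0);  // the base must leave the low 16 bits free
    assert(shift <= 0xFF);         // the shift must fit into the low byte
    return KlassCombo{ base | (uint64_t(use_ccp) << 8) | uint64_t(shift) };
  }

  bool use_compressed() const { return ((bits >> 8) & 1) != 0; }
  unsigned shift() const { return unsigned(bits & 0xFF); }
  // Mirrors "xor %cx,%cx": clear the low 16 bits to recover the base.
  uint64_t base() const { return bits & ~uint64_t(0xFFFF); }

  // Mirrors "shl %cl,%rdi" followed by "add %rcx,%rdi".
  uint64_t decode(uint32_t narrow) const {
    return base() + (uint64_t(narrow) << shift());
  }
};
```

All three values come out of one 64-bit word, so a caller needs a single memory load before testing the flag and decoding, which is the effect the description attributes to the patch.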
The pull request now contains seven commits: - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++ - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++ - APH feedback - Merge branch 'master' into optimize-narrow-klass-decoding-in-c++ - fix -UseCCP case - use 16 bit alignment - with raw bit ops ------------- Changes: https://git.openjdk.org/jdk/pull/15389/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15389&range=04 Stats: 63 lines in 3 files changed: 37 ins; 13 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/15389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15389/head:pull/15389 PR: https://git.openjdk.org/jdk/pull/15389 From stuefe at openjdk.org Wed Oct 4 08:28:40 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 4 Oct 2023 08:28:40 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v4] In-Reply-To: References: Message-ID: On Mon, 4 Sep 2023 09:28:10 GMT, Andrew Haley wrote: > So, I was wondering if there is there some reason to do all this manually? It looks like an obvious candidate for bitfields. The immediate reason is that a static class member cannot be a bitfield. Another reason is that bitfields are not used much in hotspot, manual bitfiddling is the way things are done (see for instance markword element handling). The only cases where bitfields are used seem to be in interface headers used from outside the hotspot, e.g. jmm.hpp. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15389#issuecomment-1746379706 From dnsimon at openjdk.org Wed Oct 4 08:43:54 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 4 Oct 2023 08:43:54 GMT Subject: RFR: 8317452: [JVMCI] Export symbols used by lightweight locking to JVMCI compilers. 
[v2] In-Reply-To: <6EPEOGNeJjBWLU7zk3gosuHv3ZCsnWlXPO1g7JKyByk=.2322e29a-a265-463f-8883-9f1a37360faf@github.com> References: <6EPEOGNeJjBWLU7zk3gosuHv3ZCsnWlXPO1g7JKyByk=.2322e29a-a265-463f-8883-9f1a37360faf@github.com> Message-ID: On Wed, 4 Oct 2023 07:37:03 GMT, Yudi Zheng wrote: >> Export JavaThread::_lock_stack, LockStack::_top, LockStack::_end_offset, and ObjectMonitor::ANONYMOUS_OWNER to JVMCI compilers. Deprecate JVMCIUseFastLocking > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > remove JVMCIUseFastLocking. Marked as reviewed by dnsimon (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16032#pullrequestreview-1656987084 From pli at openjdk.org Wed Oct 4 08:46:05 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 4 Oct 2023 08:46:05 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Mon, 4 Sep 2023 01:26:59 GMT, Pengfei Li wrote: >> Thanks @vnkozlov and @eme64, I just created https://github.com/openjdk/jdk/pull/14824 for the legacy code cleanup. > >> @pfustc This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! > > This pull request is not dead. I'm currently doing some refactoring and part of this work in separate pull requests. I will come back after those. > @pfustc feel free to ping me when I should re-review! Are you still working on some refactorings? What are your plans? I'm temporarily moving to some other projects, but can still follow up JDK patches in part time. My colleague @fg1417 will proceed with the superword refactoring work (probably in separate patches) then. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1746411483 From aph at openjdk.org Wed Oct 4 08:47:39 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 4 Oct 2023 08:47:39 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: <5AhJwuTy6zJ4nhANPaeUYTZdQBuc96LefywHhhGA0Rc=.0ad39664-4961-4ea2-8d82-30d801207fa9@github.com> Message-ID: On Tue, 3 Oct 2023 19:24:06 GMT, Patricio Chilano Mateo wrote: > Yes, it's also the same code we have in frame::sender_raw(). I'm not sure how we can get to that native frame case when we use frame::sender() though. So if we start from the last Java frame we shouldn't find a native frame in the chain. And when we start from os::current_frame() we use VMError::print_native_stack() which uses os::get_sender_for_C_frame() for native frames. The only case I see is the debug utility ps(), which in case there is no last Java frame, creates a frame with os::current_frame() and then uses frame::sender(). So seems either we should change that last line in sender_raw() to be os::get_sender_for_C_frame() or replace it with an assert(false, "") and fix ps(). Yes, I see. OK for now, thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15972#discussion_r1345433837 From eosterlund at openjdk.org Wed Oct 4 09:52:37 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 4 Oct 2023 09:52:37 GMT Subject: RFR: 8316523: Relativize esp in interpreter frames (PowerPC only) [v2] In-Reply-To: References: Message-ID: On Mon, 2 Oct 2023 09:25:06 GMT, Fredrik Bredberg wrote: >> Relativize esp (Expression Stack Pointer on PowerPC) in interpreter frames. >> >> By changing the "esp" member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. 
This is because we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant. >> >> This subtask only handles "esp" on PowerPC. The relativization of other interpreter frame members is handled in other subtasks of [JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296). >> >> It has been sanity tested on PowerPC using QEMU. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Updated after review. Seems reasonable. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15999#pullrequestreview-1657123557 From kbarrett at openjdk.org Wed Oct 4 10:00:38 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 4 Oct 2023 10:00:38 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v3] In-Reply-To: References: Message-ID: On Wed, 4 Oct 2023 03:36:50 GMT, Kim Barrett wrote: >> Please review this new facility, providing a general mechanism for intrusive >> doubly-linked lists. A class supports inclusion in a list by having an >> IntrusiveListEntry member, and providing structured information about how to >> access that member. A class supports inclusion in multiple lists by having >> multiple IntrusiveListEntry members, with different lists specified to use >> different members. >> >> The IntrusiveList class template provides the list management. It is modelled >> on bidirectional containers such as std::list and boost::intrusive::list, >> providing many of the expected member types and functions. (Note that the >> member types use the Standard's naming conventions.) (Not all standard >> container requirements are met; some operations are not presently supported >> because they haven't been needed yet.)
This includes iteration support using >> (mostly) standard-conforming iterator types (they are presently missing >> iterator_category member types, pending being able to include so we >> can use std::bidirectional_iterator_tag). >> >> This change only provides the new facility, and doesn't include any uses of >> it. It is intended to replace the 4-5 (or maybe more) competing intrusive >> doubly-linked lists presently in HotSpot. Unlike most (or perhaps all?) of >> those alternatives, this proposal provides a suite of unit tests. >> >> An example of a place that I think might benefit from this is G1's region >> handling. There are various places where G1 iterates over all regions in order >> to do something with those which satisfy some property (humongous regions, >> regions in the collection set, &etc). If it were trivial to create new region >> sublists (and this facility makes that easy), some of these could be turned >> into direct iteration over only the regions of interest. >> >> Some specific points to consider when reviewing this proposal: >> >> (1) This proposal follows Standard Library API conventions, which differ from >> HotSpot in various ways. >> >> (1a) Lists and iterators provide various type members, with names per the >> Standard Library. There has been discussion of using some parts of the >> Standard Library eventually, in which case this would be important. But for >> now some of the naming choices are atypical for HotSpot. >> >> (1b) Some of the function signatures follow the Standard Library APIs even >> though the reasons for that form might not apply to HotSpot. For example, the >> list pop operations don't return the removed...
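To make the idea in the quoted description concrete, here is a minimal sketch of an intrusive doubly-linked list in which one element type carries two hooks and can therefore sit on two lists at once. It is not the `IntrusiveList` from the pull request: `ListHook`, `SimpleIntrusiveList`, `Region` and all member names are hypothetical, the hook stores an extra `attached` flag for simplicity, and iterators are left out entirely.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical hook embedded in each element; links point at elements directly.
template <typename T>
struct ListHook {
  T* prev = nullptr;
  T* next = nullptr;
  bool attached = false;
};

// The list is told which hook to use through a pointer-to-data-member.
template <typename T, ListHook<T> T::*Hook>
class SimpleIntrusiveList {
  T* _head = nullptr;
  T* _tail = nullptr;
  size_t _size = 0;

  static ListHook<T>& hook(T& e) { return e.*Hook; }

public:
  void push_back(T& e) {
    ListHook<T>& h = hook(e);
    assert(!h.attached);  // an element may be on a given list only once
    h.prev = _tail;
    h.next = nullptr;
    h.attached = true;
    if (_tail != nullptr) { hook(*_tail).next = &e; } else { _head = &e; }
    _tail = &e;
    ++_size;
  }

  // Unlinks e in constant time; no allocation or search is involved.
  void erase(T& e) {
    ListHook<T>& h = hook(e);
    assert(h.attached);
    if (h.prev != nullptr) { hook(*h.prev).next = h.next; } else { _head = h.next; }
    if (h.next != nullptr) { hook(*h.next).prev = h.prev; } else { _tail = h.prev; }
    h = ListHook<T>{};
    --_size;
  }

  T* front() const { return _head; }
  static T* next(T& e) { return hook(e).next; }
  size_t size() const { return _size; }
};

// One element type, two independent list memberships.
struct Region {
  int id;
  ListHook<Region> all_hook;
  ListHook<Region> humongous_hook;
};

using AllRegions = SimpleIntrusiveList<Region, &Region::all_hook>;
using HumongousRegions = SimpleIntrusiveList<Region, &Region::humongous_hook>;
```

With these aliases, a region can be pushed onto both an `AllRegions` list and a `HumongousRegions` list, and removing it from one leaves its membership in the other untouched, which is the kind of cheap sublist the G1 example in the description has in mind.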
> > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > add IntrusiveListEntry::is_attached() Regarding the mechanism for accessing the entry of an element, I'm now thinking a function-based mechanism (like NonblockingQueue and LockFreeStack) is better than the pointer-to-data-member mechanism currently used here. The benefit of a function-based mechanism is that it doesn't require the element type to be complete at the point of list declaration. It also avoids the need for an MSVC workaround. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15896#issuecomment-1746538711 From dnsimon at openjdk.org Wed Oct 4 10:18:35 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 4 Oct 2023 10:18:35 GMT Subject: RFR: 8317452: [JVMCI] Export symbols used by lightweight locking to JVMCI compilers. [v2] In-Reply-To: <6EPEOGNeJjBWLU7zk3gosuHv3ZCsnWlXPO1g7JKyByk=.2322e29a-a265-463f-8883-9f1a37360faf@github.com> References: <6EPEOGNeJjBWLU7zk3gosuHv3ZCsnWlXPO1g7JKyByk=.2322e29a-a265-463f-8883-9f1a37360faf@github.com> Message-ID: <_Oo1jRuAOZav1WHjwo4_EiHn79Cb-VR1gmKge8btjrs=.b61b4e1c-7ccd-4ab1-b8ce-13204446b73e@github.com> On Wed, 4 Oct 2023 07:37:03 GMT, Yudi Zheng wrote: >> Export JavaThread::_lock_stack, LockStack::_top, LockStack::_end_offset, and ObjectMonitor::ANONYMOUS_OWNER to JVMCI compilers. Delete JVMCIUseFastLocking option > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > remove JVMCIUseFastLocking. Marked as reviewed by dnsimon (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16032#pullrequestreview-1657172521 From yzheng at openjdk.org Wed Oct 4 10:21:48 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 4 Oct 2023 10:21:48 GMT Subject: Integrated: 8317452: [JVMCI] Export symbols used by lightweight locking to JVMCI compilers. 
In-Reply-To: References: Message-ID: On Tue, 3 Oct 2023 19:42:28 GMT, Yudi Zheng wrote: > Export JavaThread::_lock_stack, LockStack::_top, LockStack::_end_offset, and ObjectMonitor::ANONYMOUS_OWNER to JVMCI compilers. Delete JVMCIUseFastLocking option This pull request has now been integrated. Changeset: 9718f490 Author: Yudi Zheng Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/9718f490fb76f6712ac8f9c7f5248ca10bf83e6f Stats: 16 lines in 5 files changed: 11 ins; 5 del; 0 mod 8317452: [JVMCI] Export symbols used by lightweight locking to JVMCI compilers. Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/16032 From stuefe at openjdk.org Wed Oct 4 10:49:35 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 4 Oct 2023 10:49:35 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v3] In-Reply-To: References: Message-ID: On Wed, 4 Oct 2023 03:36:50 GMT, Kim Barrett wrote: >> Please review this new facility, providing a general mechanism for intrusive >> doubly-linked lists. A class supports inclusion in a list by having an >> IntrusiveListEntry member, and providing structured information about how to >> access that member. A class supports inclusion in multiple lists by having >> multiple IntrusiveListEntry members, with different lists specified to use >> different members. >> >> The IntrusiveList class template provides the list management. It is modelled >> on bidirectional containers such as std::list and boost::intrusive::list, >> providing many of the expected member types and functions. (Note that the >> member types use the Standard's naming conventions.) (Not all standard >> container requirements are met; some operations are not presently supported >> because they haven't been needed yet.) 
This includes iteration support using >> (mostly) standard-conforming iterator types (they are presently missing >> iterator_category member types, pending being able to include so we >> can use std::bidirectional_iterator_tag). >> >> This change only provides the new facility, and doesn't include any uses of >> it. It is intended to replace the 4-5 (or maybe more) competing intrusive >> doubly-linked lists presently in HotSpot. Unlike most (or perhaps all?) of >> those alternatives, this proposal provides a suite of unit tests. >> >> An example of a place that I think might benefit from this is G1's region >> handling. There are various places where G1 iterates over all regions in order >> to do something with those which satisfy some property (humongous regions, >> regions in the collection set, &etc). If it were trivial to create new region >> sublists (and this facility makes that easy), some of these could be turned >> into direct iteration over only the regions of interest. >> >> Some specific points to consider when reviewing this proposal: >> >> (1) This proposal follows Standard Library API conventions, which differ from >> HotSpot in various ways. >> >> (1a) Lists and iterators provide various type members, with names per the >> Standard Library. There has been discussion of using some parts of the >> Standard Library eventually, in which case this would be important. But for >> now some of the naming choices are atypical for HotSpot. >> >> (1b) Some of the function signatures follow the Standard Library APIs even >> though the reasons for that form might not apply to HotSpot. For example, the >> list pop operations don't return the removed... > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > add IntrusiveListEntry::is_attached() Cursory glance. The list hook element costs 16 bytes on 64-bit. Would it be possible to get a singly-linked variant? For many (most?)
cases, traversing backward or random access deletion is not needed. src/hotspot/share/utilities/intrusiveList.hpp line 69: > 67: * > 68: * * entry_member is a pointer to class member referring to the > 69: * IntrusiveListEntry subobject of T used by this list. nit, "subobject" confused me a little. Maybe "element" or "member" ? ------------- PR Review: https://git.openjdk.org/jdk/pull/15896#pullrequestreview-1657025470 PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1345453689 From epeter at openjdk.org Wed Oct 4 11:45:59 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Oct 2023 11:45:59 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: <5DqcCtzduxkn1rkmtAsYCjRel83ptjDpip2pVxEOdJw=.84a67dc2-050c-464e-9484-d7a0247dde31@github.com> On Wed, 4 Oct 2023 08:42:55 GMT, Pengfei Li wrote: >>> @pfustc This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! >> >> This pull request is not dead. I'm currently doing some refactoring and part of this work in separate pull requests. I will come back after those. > >> @pfustc feel free to ping me when I should re-review! Are you still working on some refactorings? What are your plans? > > I'm temporarily moving to some other projects, but can still follow up JDK patches in part time. My colleague @fg1417 will proceed with the superword refactoring work (probably in separate patches) then. @pfustc @fg1417 Ok, perfect. But please notify me before you start a refactoring. I am considering doing some of my own in the next months. 
It would be nice to avoid doing work twice ;) If you want, we can try to coordinate over this "Umbrella" RFE: [JDK-8317424](https://bugs.openjdk.org/browse/JDK-8317424) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1746704197 From ayang at openjdk.org Wed Oct 4 11:52:41 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 4 Oct 2023 11:52:41 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v12] In-Reply-To: <50VtEqmFaxK4NnbXqU54rQW8R1YrGDa6HukQOuniupE=.5a5365f1-546a-4c48-a763-9248346c6593@github.com> References: <50VtEqmFaxK4NnbXqU54rQW8R1YrGDa6HukQOuniupE=.5a5365f1-546a-4c48-a763-9248346c6593@github.com> Message-ID: On Thu, 28 Sep 2023 07:41:18 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. 
>> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ...
> > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Remove stripe size adaptations and cache potentially expensive start array queries Performed additional performance testing on the latest revision of this pull request and https://github.com/openjdk/jdk/compare/master...albertnetymk:jdk:pgc-precise-obj-arr?expand=1 (made sure they were on top of the same master commit) Couldn't identify significant differences when running micro benchmarks from the JBS ticket with different gc-threads; implemented various tweaks but the distinction between the two approaches remains mostly marginal. Both methods exhibit substantial improvements over the master, as demonstrated earlier. No performance difference observed in pjbb2005 between the master, this pull request, and shadow-card-table. (I had difficulty in running `timefold`, so I asked Thomas for help about it.) The cost of malloc + memset for the shadow-card-table is ~0.26ms per 1G of old-gen (each card being 512 bytes) (raw data: 0.553169 ms for 4395946 cards). Since the shadow-card-table approach doesn't result in any noticeable regression, offers better scalability for large-array-objects, and comes with the lowest implementation complexity, I am inclined to settle for the shadow-card-table approach for now and explore more sophisticated optimizations later on. What are others' thoughts on this direction? 
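The per-gigabyte figure quoted above follows directly from the raw data in the same message. The short sketch below redoes that arithmetic; it assumes that "1G" means 1 GiB and that each card covers 512 bytes of old generation, and the constant and function names are made up for the example.

```cpp
#include <cassert>
#include <cstdio>

// Raw data taken from the message above.
constexpr double kCards      = 4395946.0;  // cards in the shadow card table
constexpr double kCardBytes  = 512.0;      // old-gen bytes covered per card
constexpr double kMeasuredMs = 0.553169;   // measured malloc + memset time
constexpr double kGiB        = 1024.0 * 1024.0 * 1024.0;

// Old-gen size covered by the measured table, in GiB.
constexpr double covered_gib() { return kCards * kCardBytes / kGiB; }

// Cost normalised to one GiB of old gen.
constexpr double ms_per_gib() { return kMeasuredMs / covered_gib(); }

static double report_cost() {
  assert(covered_gib() > 0.0);
  std::printf("%.3f GiB covered, %.3f ms per GiB\n", covered_gib(), ms_per_gib());
  return ms_per_gib();
}
```

The table in the measurement covers roughly 2.1 GiB of old generation, so the 0.553169 ms works out to about 0.26 ms per GiB, matching the figure given in the message.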
------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1746714371 From rrich at openjdk.org Wed Oct 4 11:59:39 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 4 Oct 2023 11:59:39 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v12] In-Reply-To: <50VtEqmFaxK4NnbXqU54rQW8R1YrGDa6HukQOuniupE=.5a5365f1-546a-4c48-a763-9248346c6593@github.com> References: <50VtEqmFaxK4NnbXqU54rQW8R1YrGDa6HukQOuniupE=.5a5365f1-546a-4c48-a763-9248346c6593@github.com> Message-ID: On Thu, 28 Sep 2023 07:41:18 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. 
Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Remove stripe size adaptations and cache potentially expensive start array queries We had a public holiday yesterday. I'm still working on the version without shadow card table. It is more complex, and finding the issues is time consuming, but I'd like to finish the work on it.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1746726305 From stuefe at openjdk.org Wed Oct 4 13:50:36 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 4 Oct 2023 13:50:36 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly In-Reply-To: References: Message-ID: On Mon, 18 Sep 2023 07:37:26 GMT, Liming Liu wrote: > As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). > > Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: > > > > > > > > > > > >
> | Kernel | -XX:-TransparentHugePages | -XX:+TransparentHugePages |
> |        | Unpatched | Patched       | Unpatched | Patched       |
> | 4.18   | 11.30     | 11.30         | 0.25      | 0.25          |
> | 5.13   | 0.22      | 0.22          | 3.42      | 3.42          |
> | 6.1    | 0.27      | 0.33          | 3.54      | 0.33          |
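The approach in the quoted description (ask the kernel to populate the range with `MADV_POPULATE_WRITE`, and fall back to touching each page where that is not supported) can be sketched as follows. This is a Linux-only illustration under stated assumptions, not the code from the pull request: `pretouch_sketch` is a hypothetical name, the fallback uses the GCC/Clang `__atomic_fetch_add` builtin to perform the "atomic add 0" touch, and remembering an `EINVAL` result in a static flag is a design choice made for this example.

```cpp
#include <sys/mman.h>
#include <atomic>
#include <cassert>
#include <cerrno>
#include <cstddef>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23  // value defined by Linux headers since 5.14
#endif

// Hypothetical helper, not the HotSpot function. Returns true if the kernel
// populated the range via madvise, false if the per-page fallback was used.
static bool pretouch_sketch(char* start, size_t len, size_t page_size) {
  assert(page_size > 0 && len % page_size == 0);

  // Remember an EINVAL result so later calls skip the syscall.
  // EINVAL is treated here as "advice not supported by this kernel".
  static std::atomic<bool> populate_unsupported{false};

  if (!populate_unsupported.load(std::memory_order_relaxed)) {
    if (::madvise(start, len, MADV_POPULATE_WRITE) == 0) {
      return true;
    }
    if (errno == EINVAL) {
      populate_unsupported.store(true, std::memory_order_relaxed);
    }
  }

  // Fallback: touch one byte per page with an atomic add of zero, which
  // forces the page in without changing its content.
  for (size_t off = 0; off < len; off += page_size) {
    (void)__atomic_fetch_add(start + off, 0, __ATOMIC_RELAXED);
  }
  return false;
}
```

Either path leaves existing page contents untouched, so data written before the call is still there afterwards; whether the kernel path or the fallback runs depends on the kernel the code happens to run on.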
Hi @limingliu-ampere, good catch. Just to be sure, does this work with concurrent writes to the same pages? As in, it will not break https://bugs.openjdk.org/browse/JDK-8272807 ? Like @kimbarrett, I think this needs a better regression test. Ideally (and probably not that difficult to pull off): start the VM with AlwaysPreTouch, `-Xlog:pagesize`, and +UseTHP. Then, scan smaps to check that the heap is not splintered. Please see https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/runtime/os/TestTracePageSizes.java . It may be that you can just extend that test to include running with UseTHP. I also think a small gtest would be good that tests that a pre-populated page does not lose its content when os::pretouch_memory is called. For examples, see https://github.com/openjdk/jdk/blob/master/test/hotspot/gtest/runtime/test_os.cpp. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15781#issuecomment-1746915692 From stuefe at openjdk.org Wed Oct 4 14:10:42 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 4 Oct 2023 14:10:42 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly In-Reply-To: References: Message-ID: On Mon, 18 Sep 2023 07:37:26 GMT, Liming Liu wrote: > As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). > > Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: > > > > > > > > > > > >
> | Kernel | -XX:-TransparentHugePages | -XX:+TransparentHugePages |
> |        | Unpatched | Patched       | Unpatched | Patched       |
> | 4.18   | 11.30     | 11.30         | 0.25      | 0.25          |
> | 5.13   | 0.22      | 0.22          | 3.42      | 3.42          |
> | 6.1    | 0.27      | 0.33          | 3.54      | 0.33          |
Side note, does anyone know why we pretouch memory for *explicit* large pages? I would have thought that memory is already online and as "live" as it can get once it is mmapped.

src/hotspot/os/linux/os_linux.cpp line 2839:

> 2837: #ifndef MADV_POPULATE_WRITE
> 2838: #define MADV_POPULATE_WRITE 23
> 2839: #endif

Suggestion (we should have done this for other cases too) as a stupid sanity check:

    #ifndef MADV_POPULATE_WRITE
    #define MADV_POPULATE_WRITE 23
    #else
    static_assert(MADV_POPULATE_WRITE == 23);
    #endif

src/hotspot/os/linux/os_linux.cpp line 2902:

> 2900: p2i(first), p2i(last), page_size,
> 2901: os::strerror(err), err);
> 2902: }

I don't think this breakout is necessary, I'd do it inline in pd_pretouch. I'm unsure about `warning` (this will print warnings by default) here. When exactly would this fail? Would UL logging be better, or a native OOM error? If I understand the manpage correctly, one possible error scenario is when this is called for write-protected memory, which would be a case for an assert.

src/hotspot/os/linux/os_linux.cpp line 2905:

> 2903:
> 2904: void os::pd_pretouch_memory(void *first, void *last, size_t page_size) {
> 2905: size_t len = static_cast(last) - static_cast(first) + page_size;

Please use `pointer_delta()` and make len const.

src/hotspot/os/linux/os_linux.cpp line 2911:

> 2909: if (::madvise(first, len, MADV_POPULATE_WRITE) == -1) {
> 2910: int err = errno;
> 2911: if (err == EINVAL) { // Not supported

Would be nice to avoid repeated syscalls to madvise if this fails once; no reason to try again, then.

src/hotspot/share/runtime/os.cpp line 2108:

> 2106: // granularity, so we can touch anywhere in a page. Touch at the
> 2107: // beginning of each page to simplify iteration.
> 2108: void* first = align_down(start, page_size);

minor nit, since you are touching this, could you make it const too?
(void* const) ------------- PR Comment: https://git.openjdk.org/jdk/pull/15781#issuecomment-1746954033 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1345784162 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1345785187 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1345796964 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1345849056 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1345853309 From stuefe at openjdk.org Wed Oct 4 14:10:44 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 4 Oct 2023 14:10:44 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly In-Reply-To: <0oZ06FXPfZ6SEJBdPRrUEiW3tqR7hTWkXUeXdzqVyNo=.e782f692-9152-44db-9847-7292e4afa7a0@github.com> References: <0oZ06FXPfZ6SEJBdPRrUEiW3tqR7hTWkXUeXdzqVyNo=.e782f692-9152-44db-9847-7292e4afa7a0@github.com> Message-ID: On Fri, 29 Sep 2023 07:07:00 GMT, Kim Barrett wrote: >> As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). >> >> Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: >> >> >> >> >> >> >> >> >> >> >> >>
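>> | Kernel | -XX:-TransparentHugePages, Unpatched | -XX:-TransparentHugePages, Patched | -XX:+TransparentHugePages, Unpatched | -XX:+TransparentHugePages, Patched |
>> |--------|--------|--------|--------|--------|
>> | 4.18 | 11.30 | 11.30 | 0.25 | 0.25 |
>> | 5.13 | 0.22 | 0.22 | 3.42 | 3.42 |
>> | 6.1 | 0.27 | 0.33 | 3.54 | 0.33 |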
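
Editorial sketch of the mechanism discussed in this thread: try `madvise(MADV_POPULATE_WRITE)` first, fall back to touching one word per page with an atomic add of 0, and remember an `EINVAL` so that the syscall is not retried. This is not the HotSpot code. The names `pretouch_range` and `populate_write_unsupported` are hypothetical, the constant 23 is taken from the quoted patch, and the fallback uses the GCC/Clang `__atomic_fetch_add` builtin as an assumption about the toolchain.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23   // value from the quoted patch (Linux 5.14+)
#else
static_assert(MADV_POPULATE_WRITE == 23, "unexpected MADV_POPULATE_WRITE value");
#endif

// Hypothetical flag: set once madvise reports EINVAL, so later calls skip the syscall.
static std::atomic<bool> populate_write_unsupported{false};

// Pretouch [first, first + len). `first` must be page-aligned and writable.
// Returns true if the kernel populated the range via madvise,
// false if the manual per-page fallback ran instead.
static bool pretouch_range(void* first, size_t len, size_t page_size) {
  assert(page_size >= sizeof(int));
  if (!populate_write_unsupported.load(std::memory_order_relaxed)) {
    if (::madvise(first, len, MADV_POPULATE_WRITE) == 0) {
      return true;
    }
    if (errno == EINVAL) {
      // Most likely the kernel does not know this advice; do not try again.
      populate_write_unsupported.store(true, std::memory_order_relaxed);
    }
  }
  // Fallback: an atomic add of 0 writes to each page without changing its contents.
  char* p = static_cast<char*>(first);
  for (size_t off = 0; off < len; off += page_size) {
    __atomic_fetch_add(reinterpret_cast<int*>(p + off), 0, __ATOMIC_RELAXED);
  }
  return false;
}
```

Caching the failure in a single flag is one possible answer to the "avoid repeated syscalls" remark; whether a warning, UL logging or an assert is appropriate for other errno values is left open here, as it is in the review.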
>
> src/hotspot/share/gc/shared/pretouchTask.cpp line 75:
>
>> 73: // initially always use small pages.
>> 74: page_size = UseTransparentHugePages ? (size_t)os::vm_page_size() : page_size;
>> 75: #endif
>
> I never liked this, so happy to see it gone.

It was also the wrong place for this fix since it left out naked calls to os::pretouch_memory.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1345851483

From vkempik at openjdk.org Wed Oct 4 16:41:22 2023
From: vkempik at openjdk.org (Vladimir Kempik)
Date: Wed, 4 Oct 2023 16:41:22 GMT
Subject: RFR: 8295382: Implement SHA-256 Intrinsic on RISC-V [v2]
In-Reply-To:
References: <1JWd-CDS_jpIDtfu7HJAVmvViShKzVTrCOxDVBZ9GSo=.904a8e56-794c-43d6-8448-de2a1a856f33@github.com>
Message-ID:

On Mon, 20 Feb 2023 10:53:18 GMT, Ludovic Henry wrote:

>> This has been tested with patches currently being submitted to QEMU to add support for Zvkb and Zvknha extensions.
>>
>> The documentation for the Vector Crypto extension's instructions is available at https://github.com/riscv/riscv-crypto/tree/master/doc/vector/insns
>
> Ludovic Henry has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>
> - Merge branch 'master' into dev/ludovic/vector-crypto-sha
> - 8295382: Implement SHA-256 Intrinsic on RISC-V

Looks like the extension was ratified recently. Ludovic, do you plan to return to these intrinsics?
------------- PR Comment: https://git.openjdk.org/jdk/pull/12208#issuecomment-1747095991 From jjoo at openjdk.org Wed Oct 4 21:23:26 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Wed, 4 Oct 2023 21:23:26 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v26] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Remove header and fix long to jlong ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/9b5772fb..2807c191 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=24-25 Stats: 2 lines in 2 files changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From lmesnik at openjdk.org Wed Oct 4 22:04:11 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 4 Oct 2023 22:04:11 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered In-Reply-To: References: Message-ID: On Mon, 2 Oct 2023 23:11:01 GMT, Serguei Spitsyn wrote: > The JVMTI VirtualThreadStart events have to follow the ThreadStart events pattern and so, should not be thread-filtered. 
> The fix includes:
> - `jvmti.xml`: remove the attribute `filtered="thread"` in the `VirtualThreadStart` event spec
> - `jvmtiEventController.cpp`: remove the `VTHREAD_START_BIT` from the `THREAD_FILTERED_EVENT_BITS` mask and add it to the `NEED_THREAD_LIFE_EVENTS` mask
> - `jvmtiExport.cpp`: rearrangements in the `JvmtiExport::post_vthread_start()` function
>
> The fix also includes a couple of minor unification tweaks:
> - to align `JvmtiExport::post_thread_end()` with `JvmtiExport::post_vthread_end()`, which has a slightly more optimized check for the `JVMTI_PHASE_PRIMORDIAL`.
> - to rename the local variable `cur_thread` as `thread` to follow the common pattern in `JvmtiExport::post_vthread_start()` and `JvmtiExport::post_vthread_end()`
>
> Testing: ran mach5 tiers 1-6. All tests passed.

Changes requested by lmesnik (Reviewer).

src/hotspot/share/prims/jvmtiExport.cpp line 1552:

> 1550: JvmtiEnvThreadStateIterator it(state);
> 1551: for (JvmtiEnvThreadState* ets = it.first(); ets != nullptr; ets = it.next(ets)) {
> 1552: JvmtiEnv *env = ets->get_env();

This change, as well as the renaming of cur_thread, is not related to the main issue. It would be better to separate them: easier to track and backport if needed. They are mentioned in the PR but not in the Jira bug, so the reason is hard to find without GitHub. It might be better to copy them into the bug if you want to keep them.

src/hotspot/share/prims/jvmtiExport.cpp line 1582:

> 1580: // Do not post virtual thread start event for hidden java thread.
> 1581: if (JvmtiEventController::is_enabled(JVMTI_EVENT_VIRTUAL_THREAD_START) &&
> 1582: !thread->is_hidden_from_external_view()) {

Do we need this check? I'm not sure that a JavaThread executing a virtual thread can be hidden. Might it be better to replace it with an assertion?
------------- PR Review: https://git.openjdk.org/jdk/pull/16019#pullrequestreview-1658552952 PR Review Comment: https://git.openjdk.org/jdk/pull/16019#discussion_r1346517489 PR Review Comment: https://git.openjdk.org/jdk/pull/16019#discussion_r1346513930 From jjoo at openjdk.org Thu Oct 5 03:00:36 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 5 Oct 2023 03:00:36 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v27] In-Reply-To: References: Message-ID: <0AHhZD1JncFwC7z5rp1uufF4FAMrKR8mQXdDuwKNL4s=.02114c12-5a6b-4d3e-8b7a-421bb3eb8d47@github.com> > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: add comment and change if defined to ifdef ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/2807c191..590df03b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=25-26 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Thu Oct 5 03:07:24 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 5 Oct 2023 03:07:24 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v27] In-Reply-To: <0AHhZD1JncFwC7z5rp1uufF4FAMrKR8mQXdDuwKNL4s=.02114c12-5a6b-4d3e-8b7a-421bb3eb8d47@github.com> References: <0AHhZD1JncFwC7z5rp1uufF4FAMrKR8mQXdDuwKNL4s=.02114c12-5a6b-4d3e-8b7a-421bb3eb8d47@github.com> Message-ID: On Thu, 5 Oct 2023 03:00:36 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > 
add comment and change if defined to ifdef

Resolved comments and sanity checks pass on all builds: https://github.com/jjoo172/jdk/actions/runs/6411637099

I believe this PR should be RFR once again.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1747965305

From svkamath at openjdk.org Thu Oct 5 05:13:19 2023
From: svkamath at openjdk.org (Smita Kamath)
Date: Thu, 5 Oct 2023 05:13:19 GMT
Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v3]
In-Reply-To:
References:
Message-ID:

> Hi All,
> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations.
>
> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option:
>
> | Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup |
> |-------------|------------|---------------|------------------|-----------|
> | full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 |
> | full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 |
> | small.AESGCMBench.decrypt | 8192 | 527854.353 | 663131.48 | 1.25 |
> | small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 | 1.24 |
> | full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 |
> | full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 | 539235.462 | 1.00 |
> | small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 | 298913.629 | 0.99 |
> | small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 |
> |  |  |  |  |  |
> | full.AESGCMBench.decrypt | 16384 | 307266.364 | 390397.778 | 1.27 |
> | full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 |
> | small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 | 1.27 |
> | small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 |
> | full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 |
> | full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 |
> | small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 | 181019.205 | 1.12 |
> | small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 |
> |  |  |  |  |  |
> | full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 | 1.31 |
> | full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 |
> | small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 |
> | small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 | 1.28 |
> | full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 | 1.12 |
> | full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 | 211701.969 | 1.27 |
> | small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 |
> | small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 | 212085.047 | 1.30 |
> |  |  |  |  |  |
> | full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 |
> | full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 | 1.31 |
> | small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 | 1.32 |
> | small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 |
> | full.AESGCMBench.decryptMultiPart | 65536 | 42649.816 | 47591.587 | 1.11 |
> full.AESGCMBe...
Smita Kamath has updated the pull request incrementally with one additional commit since the last revision:

Reorganized code as per comments, added new instruction addb

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/15410/files
- new: https://git.openjdk.org/jdk/pull/15410/files/2727c199..c92f98ab

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=15410&range=02
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=15410&range=01-02

Stats: 663 lines in 5 files changed: 32 ins; 507 del; 124 mod
Patch: https://git.openjdk.org/jdk/pull/15410.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/15410/head:pull/15410

PR: https://git.openjdk.org/jdk/pull/15410

From stuefe at openjdk.org Thu Oct 5 06:06:01 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 5 Oct 2023 06:06:01 GMT
Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v6]
In-Reply-To:
References:
Message-ID:

> Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`).
>
> Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift.
>
>
> 8b7b62: 48 8d 05 7f 1b c3 00 lea 0xc31b7f(%rip),%rax # 14e96e8
> 8b7b69: 0f b6 00 movzbl (%rax),%eax
> 8b7b6c: 84 c0 test %al,%al
> 8b7b6e: 0f 84 9c 00 00 00 je 8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
> 8b7b74: 48 8d 15 05 62 c6 00 lea 0xc66205(%rip),%rdx # 151dd80 <_ZN23CompressedKlassPointers6_shiftE>
> 8b7b7b: 8b 7b 08 mov 0x8(%rbx),%edi
> 8b7b7e: 8b 0a mov (%rdx),%ecx
> 8b7b80: 48 8d 15 01 62 c6 00 lea 0xc66201(%rip),%rdx # 151dd88 <_ZN23CompressedKlassPointers5_baseE>
> 8b7b87: 48 d3 e7 shl %cl,%rdi
> 8b7b8a: 48 03 3a add (%rdx),%rdi
>
>
> Patched: one load loads all three. Since shift occupies the lowest 8 bits, compiled code uses 8bit register; ditto the UseCompressedOops flag.
>
>
> 8ba302: 48 8d 05 97 9c c2 00 lea 0xc29c97(%rip),%rax # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE>
> 8ba309: 48 8b 08 mov (%rax),%rcx
> 8ba30c: f6 c5 01 test $0x1,%ch # use compressed klass pointers?
> 8ba30f: 0f 84 9b 00 00 00 je 8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
> 8ba315: 8b 7b 08 mov 0x8(%rbx),%edi
> 8ba318: 48 d3 e7 shl %cl,%rdi # shift
> 8ba31b: 66 31 c9 xor %cx,%cx # zero out lower 16 bits of base
> 8ba31e: 48 01 cf add %rcx,%rdi # add base
> 8ba321: 8b 4f 08 mov 0x8(%rdi),%ecx
>
> ---
>
> Performance measurements:
>
> G1, doing a full GC over a heap filled with 256 million live j.l.Object instances.
>
> I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ±4%. Still, in general, numbers seemed to go down rather than up.
>
> ---
>
> Future extensions:
>
> This patch uses the fact that the encoding base is aligned to metaspace reserve alignment (16 Mb). We only use 16 of those 24 bits of alignment shadow and could us...

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase.
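
Editorial sketch: the patched instruction sequence in the quoted description above can be mirrored in portable C++. This is a minimal illustration, not the HotSpot implementation. The bit layout (shift in bits 0..7, the enable flag in bit 8, base in the bits above bit 15) is read off the disassembly, and the names `make_combo`, `uses_compressed_klass` and `decode_klass` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of the single 64-bit word, inferred from the disassembly:
//   bits 0..7   : shift                      (shl %cl,%rdi)
//   bit  8      : "use compressed class pointers" flag   (test $0x1,%ch)
//   bits 16..63 : encoding base, so the base must be 64 KiB aligned
//                                            (xor %cx,%cx clears the low 16 bits)
constexpr int      kFlagBit   = 8;
constexpr uint64_t kShiftMask = 0xFF;
constexpr uint64_t kLow16Mask = 0xFFFF;

// Pack base, shift and flag into one word so a single load fetches all three.
inline uint64_t make_combo(uint64_t base, unsigned shift, bool use_ccp) {
  assert((base & kLow16Mask) == 0 && "base must leave the low 16 bits free");
  assert(shift < 64);
  return base | (static_cast<uint64_t>(use_ccp) << kFlagBit) | shift;
}

inline bool uses_compressed_klass(uint64_t combo) {
  return ((combo >> kFlagBit) & 1) != 0;
}

// Decode a narrow klass value: base + (narrow << shift), all taken from one word.
inline uint64_t decode_klass(uint64_t combo, uint32_t narrow_klass) {
  const unsigned shift = static_cast<unsigned>(combo & kShiftMask);
  const uint64_t base  = combo & ~kLow16Mask;
  return base + (static_cast<uint64_t>(narrow_klass) << shift);
}
```

For example, with a hypothetical base of `0x800000000` and a shift of 3, `decode_klass(make_combo(0x800000000ULL, 3, true), 0x1000)` yields `0x800008000`.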
The pull request now contains eight commits: - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++ - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++ - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++ - APH feedback - Merge branch 'master' into optimize-narrow-klass-decoding-in-c++ - fix -UseCCP case - use 16 bit alignment - with raw bit ops ------------- Changes: https://git.openjdk.org/jdk/pull/15389/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15389&range=05 Stats: 63 lines in 3 files changed: 37 ins; 13 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/15389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15389/head:pull/15389 PR: https://git.openjdk.org/jdk/pull/15389 From djelinski at openjdk.org Thu Oct 5 07:15:16 2023 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Thu, 5 Oct 2023 07:15:16 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v3] In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 05:13:19 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations. 
>> >> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option: >> >> |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup >> |-------------|------------|---------------|------------------|-----------| >> |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 >> full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 >> small.AESGCMBench.decrypt | 8192 | 527854.353 |663131.48 | 1.25 >> small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 |1.24 >> full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 >> full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 |539235.462 | 1.00 >> small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 |298913.629 | 0.99 >> small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 16384 | 307266.364 |390397.778 | 1.27 >> full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 >> small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 |1.27 >> small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 >> full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 >> full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 >> small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 |181019.205 | 1.12 >> small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 >> ? | ? | ? | ? | ? 
>> full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 |1.31 >> full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 >> small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 >> small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 |1.28 >> full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 |1.12 >> full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 |211701.969 | 1.27 >> small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 >> small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 |212085.047 | 1.30 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 >> full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 |1.31 >> small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 |1.32 >> small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 >> full.AES... > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Reorganized code as per comments, added new instruction addb src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 353: > 351: > 352: // Save rbp and rsp > 353: __ push(rbp); Why do you push rbp here? `__ enter` above is an alias for `push rbp; mov rbp, rsp` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1346906924 From luhenry at openjdk.org Thu Oct 5 08:27:29 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 5 Oct 2023 08:27:29 GMT Subject: RFR: 8295382: Implement SHA-256 Intrinsic on RISC-V [v2] In-Reply-To: References: <1JWd-CDS_jpIDtfu7HJAVmvViShKzVTrCOxDVBZ9GSo=.904a8e56-794c-43d6-8448-de2a1a856f33@github.com> Message-ID: <6qpEKQjp9rWUZz2dB5eII4lR_D3JfvCRI0V2bgX6alw=.469de856-290f-4bd2-aef5-de44d4170188@github.com> On Wed, 4 Oct 2023 15:12:18 GMT, Vladimir Kempik wrote: >> Ludovic Henry has updated the pull request with a new target base due to a merge or a rebase. 
>> The pull request now contains two commits:
>>
>> - Merge branch 'master' into dev/ludovic/vector-crypto-sha
>> - 8295382: Implement SHA-256 Intrinsic on RISC-V
>
> Looks like the extension was ratified recently. Ludovic, do you plan to return to these intrinsics?

@VladimirKempik yes, I saw that as well. Planning to get back to it by end of month (this one and sha512).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12208#issuecomment-1748361733

From fbredberg at openjdk.org Thu Oct 5 08:42:12 2023
From: fbredberg at openjdk.org (Fredrik Bredberg)
Date: Thu, 5 Oct 2023 08:42:12 GMT
Subject: RFR: 8316523: Relativize esp in interpreter frames (PowerPC only)
In-Reply-To: <2QKDcwnCG3vl_Hqy7YyLP-5PRLJEHuga3Sf6LulmxKE=.139f5e77-10a5-498b-8c71-91b8fdafd761@github.com>
References: <2QKDcwnCG3vl_Hqy7YyLP-5PRLJEHuga3Sf6LulmxKE=.139f5e77-10a5-498b-8c71-91b8fdafd761@github.com>
Message-ID:

On Sun, 1 Oct 2023 14:41:47 GMT, Martin Doerr wrote:

>> Relativize esp (Expression Stack Pointer on PowerPC) in interpreter frames.
>>
>> By changing the "esp" member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant.
>>
>> This subtask only handles "esp" on PowerPC. The relativization of other interpreter frame members is handled in other subtasks to [JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296).
>>
>> It has been sanity tested on PowerPC using Qemu.
>
> Thanks for taking care of PPC64! Looks correct. Seems like `relativize_one` and `derelativize_one` are no longer used and should better get removed?

Hi @TheRealMDoerr, do you want to wait some more for @reinrich to review this PR, or do you think it's ready to be integrated?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15999#issuecomment-1748382012

From mdoerr at openjdk.org Thu Oct 5 08:42:14 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Thu, 5 Oct 2023 08:42:14 GMT
Subject: RFR: 8316523: Relativize esp in interpreter frames (PowerPC only) [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 2 Oct 2023 09:25:06 GMT, Fredrik Bredberg wrote:

>> Relativize esp (Expression Stack Pointer on PowerPC) in interpreter frames.
>>
>> By changing the "esp" member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant.
>>
>> This subtask only handles "esp" on PowerPC. The relativization of other interpreter frame members is handled in other subtasks to [JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296).
>>
>> It has been sanity tested on PowerPC using Qemu.
>
> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision:
>
> Updated after review.

Ship it!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15999#issuecomment-1748385411

From rrich at openjdk.org Thu Oct 5 08:45:10 2023
From: rrich at openjdk.org (Richard Reingruber)
Date: Thu, 5 Oct 2023 08:45:10 GMT
Subject: RFR: 8316523: Relativize esp in interpreter frames (PowerPC only) [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 5 Oct 2023 08:39:03 GMT, Martin Doerr wrote:

>> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Updated after review.
>
> Ship it!

> Hi @TheRealMDoerr, do you want to wait some more for @reinrich to review this PR, or do you think it's ready to be integrated?

Thanks for doing this!
No need to wait for me after Martin has given his thumbs-up.

Cheers, Richard.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15999#issuecomment-1748390606

From fbredberg at openjdk.org Thu Oct 5 09:18:12 2023
From: fbredberg at openjdk.org (Fredrik Bredberg)
Date: Thu, 5 Oct 2023 09:18:12 GMT
Subject: RFR: 8316523: Relativize esp in interpreter frames (PowerPC only) [v2]
In-Reply-To:
References:
Message-ID: <1jPrc1Nf0HQ9nrLCeTMES5poVCaCNJi45K8rHcckzKA=.377aa43a-b5bb-4449-b002-7a7fc3f4248f@github.com>

On Mon, 2 Oct 2023 09:25:06 GMT, Fredrik Bredberg wrote:

>> Relativize esp (Expression Stack Pointer on PowerPC) in interpreter frames.
>>
>> By changing the "esp" member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant.
>>
>> This subtask only handles "esp" on PowerPC. The relativization of other interpreter frame members is handled in other subtasks to [JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296).
>>
>> It has been sanity tested on PowerPC using Qemu.
>
> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision:
>
> Updated after review.

About taking care of PPC64. If I get something to work on x86 and aarch64, it's often not too much work to get it to run on the other platforms (like ppc), and by doing so I get a much better understanding of the general concepts, since the concepts may be more thoroughly described in some cpu port than in others. And, if you're new to a large code base, any method that makes you understand the concepts better is a good thing.

As usual, I still need a sponsor to get the show on the road.

Cheers guys, Fredrik

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15999#issuecomment-1748455026

From fbredberg at openjdk.org Thu Oct 5 10:17:21 2023
From: fbredberg at openjdk.org (Fredrik Bredberg)
Date: Thu, 5 Oct 2023 10:17:21 GMT
Subject: Integrated: 8316523: Relativize esp in interpreter frames (PowerPC only)
In-Reply-To:
References:
Message-ID:

On Sat, 30 Sep 2023 11:57:49 GMT, Fredrik Bredberg wrote:

> Relativize esp (Expression Stack Pointer on PowerPC) in interpreter frames.
>
> By changing the "esp" member in interpreter frames from being an absolute address into an offset that is relative to the frame pointer, we don't need to change the value as we freeze and thaw frames of virtual threads. This is since we might freeze and thaw from and to different worker threads, so the absolute address to locals might change, but the offset from the frame pointer will be constant.
>
> This subtask only handles "esp" on PowerPC. The relativization of other interpreter frame members is handled in other subtasks to [JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296).
>
> It has been sanity tested on PowerPC using Qemu.

This pull request has now been integrated.

Changeset: 42be2387
Author: Fredrik Bredberg
Committer: Martin Doerr
URL: https://git.openjdk.org/jdk/commit/42be23877cb34055b630f576a6668ca2f46afe40
Stats: 35 lines in 4 files changed: 16 ins; 12 del; 7 mod

8316523: Relativize esp in interpreter frames (PowerPC only)

Reviewed-by: mdoerr, eosterlund

-------------

PR: https://git.openjdk.org/jdk/pull/15999

From rrich at openjdk.org Thu Oct 5 10:22:45 2023
From: rrich at openjdk.org (Richard Reingruber)
Date: Thu, 5 Oct 2023 10:22:45 GMT
Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v13]
In-Reply-To:
References:
Message-ID:

> This PR introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC.
> This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in Gerrit production systems.
>
> The algorithm to share scanning large arrays is supposed to be a straightforward extension of the scheme implemented in `PSCardTable::scavenge_contents_parallel`.
>
> - A worker scans the part of a large array located in its stripe
>
> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe.
>
> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`)
>
> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned.
>
> #### Performance testing
>
> ##### BigArrayInOldGenRR.java
>
> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better).
>
> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads.
>
> Observations
>
> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded.
>
> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded.
>
> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea...

Richard Reingruber has updated the pull request incrementally with two additional commits since the last revision:

- Split work strictly at stripe boundaries
- Reset to master

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/14846/files
- new: https://git.openjdk.org/jdk/pull/14846/files/50737dda..817b164c

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=12
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=11-12

Stats: 505 lines in 8 files changed: 197 ins; 272 del; 36 mod
Patch: https://git.openjdk.org/jdk/pull/14846.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846

PR: https://git.openjdk.org/jdk/pull/14846

From rrich at openjdk.org Thu Oct 5 10:23:13 2023
From: rrich at openjdk.org (Richard Reingruber)
Date: Thu, 5 Oct 2023 10:23:13 GMT
Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v12]
In-Reply-To: <50VtEqmFaxK4NnbXqU54rQW8R1YrGDa6HukQOuniupE=.5a5365f1-546a-4c48-a763-9248346c6593@github.com>
References: <50VtEqmFaxK4NnbXqU54rQW8R1YrGDa6HukQOuniupE=.5a5365f1-546a-4c48-a763-9248346c6593@github.com>
Message-ID:

On Thu, 28 Sep 2023 07:41:18 GMT, Richard Reingruber wrote:

>> This PR introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC.
>> This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in Gerrit production systems.
>>
>> The algorithm to share scanning large arrays is supposed to be a straightforward extension of the scheme implemented in `PSCardTable::scavenge_contents_parallel`.
>>
>> - A worker scans the part of a large array located in its stripe
>>
>> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe.
>>
>> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`)
>>
>> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned.
>>
>> #### Performance testing
>>
>> ##### BigArrayInOldGenRR.java
>>
>> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better).
>>
>> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads.
>>
>> Observations
>>
>> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded.
>>
>> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded.
>>
>> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ...
>
> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove stripe size adaptations and cache potentially expensive start array queries

With the last version the work is strictly limited to stripes.

* Work partitioning happens at stripe boundaries, also splitting non-array objects if necessary.
* A worker thread accesses only card table entries corresponding to the stripe it's working on.
* Object arrays are scanned precisely.
* Non-object arrays are not scanned precisely.
* The implementation is inspired by Albert's work (especially the `ObjStartCache` which prevents `ObjectStartArray` performance issues with very large arrays). It does not duplicate the card table though. Instead it copies imprecise card marks from (non-array) object start to the first card of stripes reached (see `PSCardTable::pre_scavenge`).
* `ObjStartCache` is a class because I need to pass it by value to `find_first_clean_card`.

I haven't seen a significant difference between this version, Albert's work and the baseline running the card_scan* tests (except for the improvement over the baseline of course).

The new test [`card_scan_big_instances.java`](https://bugs.openjdk.org/secure/attachment/106702/card_scan_big_instances.java) fills the old generation with large (32K) non-array instances.
It shows the costs of copying imprecise card marks or the complete card table:

Baseline
--------

$ H=64g; T=16 ; jdk-baseline/bin/java -Xms${H} -Xmx${H} -XX:+UseParallelGC -XX:ParallelGCThreads=$T -Xlog:gc=trace card_scan_big_instances 24
[0.005s][info][gc] Using Parallel
BIG_INSTANCE_SIZE_BYTES:32768 (32K) bigInstancesCount:786432
[6.439s][trace][gc] GC(0) PSYoung generation size at maximum: 22369280K
[6.439s][info ][gc] GC(0) Pause Young (Allocation Failure) 16384M->14095M(62805M) 2965.766ms
### System.gc
[10.871s][trace][gc] GC(1) PSYoung generation size at maximum: 22369280K
[10.871s][info ][gc] GC(1) Pause Young (System.gc()) 24926M->24600M(62805M) 2785.957ms
[11.835s][info ][gc] GC(2) Pause Full (System.gc()) 24600M->24600M(62805M) 963.882ms
### System.gc done
[14.442s][trace][gc] GC(3) PSYoung generation size at maximum: 22369280K
[14.442s][info ][gc] GC(3) Pause Young (Allocation Failure) 40984M->24600M(62805M) 28.970ms
[16.967s][trace][gc] GC(4) PSYoung generation size at maximum: 22369280K
[16.967s][info ][gc] GC(4) Pause Young (Allocation Failure) 40984M->24600M(62805M) 30.074ms
[19.490s][trace][gc] GC(5) PSYoung generation size at maximum: 22369280K
[19.490s][info ][gc] GC(5) Pause Young (Allocation Failure) 40984M->24600M(62805M) 30.643ms

Albert's first draft
--------------------

$ H=64g; T=16 ; jdk-alb/bin/java -Xms${H} -Xmx${H} -XX:+UseParallelGC -XX:ParallelGCThreads=$T -Xlog:gc=trace card_scan_big_instances 24
[0.005s][info][gc] Using Parallel
BIG_INSTANCE_SIZE_BYTES:32768 (32K) bigInstancesCount:786432
[6.410s][trace][gc] GC(0) PSYoung generation size at maximum: 22369280K
[6.410s][info ][gc] GC(0) Pause Young (Allocation Failure) 16384M->14095M(62805M) 2957.403ms
### System.gc
[10.734s][trace][gc] GC(1) PSYoung generation size at maximum: 22369280K
[10.734s][info ][gc] GC(1) Pause Young (System.gc()) 24926M->24600M(62805M) 2656.551ms
[11.713s][info ][gc] GC(2) Pause Full (System.gc()) 24600M->24600M(62805M) 978.694ms
### System.gc done
[14.587s][trace][gc] GC(3) PSYoung generation size at maximum: 22369280K
[14.587s][info ][gc] GC(3) Pause Young (Allocation Failure) 40984M->24600M(62805M) 74.870ms
[17.132s][trace][gc] GC(4) PSYoung generation size at maximum: 22369280K
[17.132s][info ][gc] GC(4) Pause Young (Allocation Failure) 40984M->24600M(62805M) 74.031ms
[19.678s][trace][gc] GC(5) PSYoung generation size at maximum: 22369280K
[19.678s][info ][gc] GC(5) Pause Young (Allocation Failure) 40984M->24600M(62805M) 71.357ms

New
---

$ H=64g; T=16 ; jdk-new/bin/java -Xms${H} -Xmx${H} -XX:+UseParallelGC -XX:ParallelGCThreads=$T -Xlog:gc=trace card_scan_big_instances 24
[0.006s][info][gc] Using Parallel
BIG_INSTANCE_SIZE_BYTES:32768 (32K) bigInstancesCount:786432
[5.643s][trace][gc] GC(0) PSYoung generation size at maximum: 22369280K
[5.643s][info ][gc] GC(0) Pause Young (Allocation Failure) 16384M->14095M(62805M) 2196.615ms
### System.gc
[8.875s][trace][gc] GC(1) PSYoung generation size at maximum: 22369280K
[8.875s][info ][gc] GC(1) Pause Young (System.gc()) 24926M->24601M(62805M) 1432.776ms
[10.303s][info ][gc] GC(2) Pause Full (System.gc()) 24601M->24600M(62805M) 1428.142ms
### System.gc done
[13.464s][trace][gc] GC(3) PSYoung generation size at maximum: 22369280K
[13.464s][info ][gc] GC(3) Pause Young (Allocation Failure) 40984M->24600M(62805M) 106.833ms
[15.929s][trace][gc] GC(4) PSYoung generation size at maximum: 22369280K
[15.929s][info ][gc] GC(4) Pause Young (Allocation Failure) 40984M->24600M(62805M) 103.752ms
[18.397s][trace][gc] GC(5) PSYoung generation size at maximum: 22369280K
[18.397s][info ][gc] GC(5) Pause Young (Allocation Failure) 40984M->24600M(62805M) 106.153ms

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1748600398

From tschatzl at openjdk.org Thu Oct 5 10:42:10 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Thu, 5 Oct 2023 10:42:10 GMT
Subject: RFR: 8317318: Serial: Change GenCollectedHeap to SerialHeap in whitebox [v2]
In-Reply-To:
References: Message-ID: On Sat, 30 Sep 2023 17:27:42 GMT, Albert Mingkun Yang wrote: >> Use more precise type for Serial GC. I also added a ` ShouldNotReachHere()` there, because using serial-heap when serial-gc is not used seems problematic. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > s1-prims lgtm ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15988#pullrequestreview-1659517772 From jsjolen at openjdk.org Thu Oct 5 10:54:18 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 5 Oct 2023 10:54:18 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v3] In-Reply-To: References: Message-ID: On Wed, 4 Oct 2023 03:36:50 GMT, Kim Barrett wrote: >> Please review this new facility, providing a general mechanism for intrusive >> doubly-linked lists. A class supports inclusion in a list by having an >> IntrusiveListEntry member, and providing structured information about how to >> access that member. A class supports inclusion in multiple lists by having >> multiple IntrusiveListEntry members, with different lists specified to use >> different members. >> >> The IntrusiveList class template provides the list management. It is modelled >> on bidirectional containers such as std::list and boost::intrusive::list, >> providing many of the expected member types and functions. (Note that the >> member types use the Standard's naming conventions.) (Not all standard >> container requirements are met; some operations are not presently supported >> because they haven't been needed yet.) This includes iteration support using >> (mostly) standard-conforming iterator types (they are presently missing >> iterator_category member types, pending being able to include so we >> can use std::bidirectional_iterator_tag). 
>> >> This change only provides the new facility, and doesn't include any uses of >> it. It is intended to replace the 4-5 (or maybe more) competing intrusive >> doubly-linked lists presently in HotSpot. Unlike most (or perhaps all?) of >> those alterantives, this proposal provides a suite of unit tests. >> >> An example of a place that I think might benefit from this is G1's region >> handling. There are various places where G1 iterates over all regions in order >> to do something with those which satisfy some property (humongous regions, >> regions in the collection set, &etc). If it were trivial to create new region >> sublists (and this facility makes that easy), some of these could be turned >> into direct iteration over only the regions of interest. >> >> Some specific points to consider when reviewing this proposal: >> >> (1) This proposal follows Standard Library API conventions, which differ from >> HotSpot in various ways. >> >> (1a) Lists and iterators provide various type members, with names per the >> Standard Library. There has been discussion of using some parts of the >> Standard Library eventually, in which case this would be important. But for >> now some of the naming choices are atypical for HotSpot. >> >> (1b) Some of the function signatures follow the Standard Library APIs even >> though the reasons for that form might not apply to HotSpot. For example, the >> list pop operations don't return the removed... > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > add IntrusiveListEntry::is_attached() Hi Kim! I've now read the initial documentation string in the header file. I feel like this is a very well-written and formal text whose audience is someone intimately familiar with C++, but who doesn't know anything about intrusive linked lists. Unfortunately, Hotspot devs are basically the opposite of that. 
We know intrusive linked lists and when to reach for them but are not C++ standard experts. If you provide us with a rich API, short and concise documentation and code samples on how to use the API, then we're likely to be very happy! Also, since we're not only users, but also future maintainers of this code, some explanatory documentation of the internals is often very appreciated.

I'm not asking you to be so informal as to be incorrect, but I think that a lot of this writing can be cut down, especially since we can always read the code if we have any specific questions. This is perhaps more of a taste thing, but I don't mind using some active voice when describing how to use the API.

I'm still going through this, but I wanted to get these thoughts posted as this PR is quite large and I don't want to forget anything. Thank you for your efforts on this.

src/hotspot/share/utilities/intrusiveList.hpp line 39:

> 37: class IntrusiveListImpl;
> 38:
> 39: /**

Most long form comments I see in Hotspot do not use `/** */` and instead use `//`; can the style be changed to this?

src/hotspot/share/utilities/intrusiveList.hpp line 45:

> 43: * when inserting objects into the list or referencing list objects,
> 44: * and removing an object from a list need not involve destroying the
> 45: * object.

> As a result, [...]

We know what an intrusive linked list is, we have at least 5 of them :)!

src/hotspot/share/utilities/intrusiveList.hpp line 63:

> 61: * in a list are externally managed, rather than being embedded values
> 62: * in the list, the actual type of such objects may be more specific
> 63: * than the list's element type.

Okay, is there a reason that this shouldn't be true?
I assume that what you're saying is that we can have:

```c++
struct Super {
  IntrusiveListEntry entry;
};
struct SubA : public Super {};
struct SubB : public Super {};

void foo() {
  IntrusiveList my_list;
  // This my_list may contain SubA, SubB, and Super
}
```

And this seems like it should be true for any reasonable intrusive list in C++.

src/hotspot/share/utilities/intrusiveList.hpp line 66:

> 64: *
> 65: * * T is the class of the elements in the list. Must be a possibly
> 66: * const-qualified class type.

I don't know what it means to be a 'possibly const-qualified class type'.

src/hotspot/share/utilities/intrusiveList.hpp line 73:

> 71: * * has_size determines whether the list has a size()
> 72: * operation, returning the number of elements in the list. If the
> 73: * operation is present, it has constant-time complexity. The default

Surely that depends on the time complexity of the operation?

src/hotspot/share/utilities/intrusiveList.hpp line 78:

> 76: * * Base is the base class for the list. This is typically
> 77: * used to specify the allocation class. The default is void, indicating
> 78: * no allocation class for the list.

What's an allocation class?

src/hotspot/share/utilities/intrusiveList.hpp line 87:

> 85: * iterators and access to const-qualified elements. A const object cannot be
> 86: * added to a list whose value type is not const-qualified, as that would be
> 87: * an implicit casting away of the const qualifier.

Okay, I feel like this can be shortened significantly:

> A const-qualified type can be part of an IntrusiveList, then you only get const iterators and access to const elements. If you use a non-const type, then you can get both const and non-const iterators and access to elements. You can't add const values to a non-const list, as that would be implicitly casting away the const qualifier.

src/hotspot/share/utilities/intrusiveList.hpp line 93:

> 91: * argument, a const reference to a removed element.
This function should > 92: * "dispose" of the argument object when called, such as by deleting the > 93: * object. The result of the call is ignored. A lot of this information is more easily seen by reading the code. Saying something like: >Operations that remove elements from a list take a disposer function as an argument. A disposer should free up the memory associated with the object. Shorter documentation is quicker to read and easier to maintain. src/hotspot/share/utilities/intrusiveList.hpp line 99: > 97: * specialization of the IntrusiveList class, e.g. > 98: * > 99: * We don't need these tags ------------- PR Review: https://git.openjdk.org/jdk/pull/15896#pullrequestreview-1659471376 PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1347183337 PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1347196105 PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1347192667 PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1347193270 PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1347194449 PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1347195267 PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1347200031 PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1347206121 PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1347194735 From dfenacci at openjdk.org Thu Oct 5 11:05:12 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 5 Oct 2023 11:05:12 GMT Subject: RFR: JDK-8269393: store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64 In-Reply-To: References: Message-ID: On Wed, 20 Sep 2023 14:16:44 GMT, Andrew Haley wrote: >>>There is a CodeCache lock, but it seems to be that the memory tracking is accessing data outside the lock. 
I don't know how that's supposed to work
>>
>> Acquiring a CodeCache lock when getting `used` and `committed` here
>> https://github.com/openjdk/jdk/blob/12de9b0225363377e9a76729b11698221d4f29f2/src/hotspot/share/services/memoryPool.cpp#L181-L182
>> was actually the solution in the first commit https://github.com/openjdk/jdk/pull/15819/commits/1060bb01f7ad0ca47c38eb0f68b7f2aab6562ac3 but then, asking about the expensiveness of a lock vs storestore/loadload barriers, and the fact that the origin of the issue was actually the order in which store and load actually happen, made me think barriers would be preferable, thing that I now very much doubt.
>> You are suggesting that it would make more sense (and be cleaner) to acquire a CodeCache lock instead of using barriers, right?
>>
>>> ```
>>> // Caller in JDK is responsible for synchronization -
>>> // acquire the lock for this memory pool before calling VM
>>> MemoryUsage usage = get_memory_usage();
>>> ```
>>>
>>> Oops. I can't see the lock being acquired.
>>
>> I'm wondering if there are more places where this would be needed...
>>
>>
>>> What do you mean by "we'd need a reference to be shared between the 2 places "?
>>
>> What I meant is mainly that in this specific case the _storing_ and _loading_ happens in very different places in the code.
>
>> > There is a CodeCache lock, but it seems to be that the memory tracking is accessing data outside the lock.
I don't know how that's supposed to work
>>
>> Acquiring a CodeCache lock when getting `used` and `committed` here
>>
>> https://github.com/openjdk/jdk/blob/12de9b0225363377e9a76729b11698221d4f29f2/src/hotspot/share/services/memoryPool.cpp#L181-L182
>>
>> was actually the solution in the first commit [1060bb0](https://github.com/openjdk/jdk/commit/1060bb01f7ad0ca47c38eb0f68b7f2aab6562ac3) but then, asking about the expensiveness of a lock vs storestore/loadload barriers, and the fact that the origin of the issue was actually the order in which store and load actually happen, made me think barriers would be preferable, thing that I now very much doubt.
>> You are suggesting that it would make more sense (and be cleaner) to acquire a CodeCache lock instead of using barriers, right?
>
> That seems to be the established convention. And I wonder if there are other problems caused by accessing data in this way, outside the CodeCache lock.
>
>> > ```
>> > // Caller in JDK is responsible for synchronization -
>> > // acquire the lock for this memory pool before calling VM
>> > MemoryUsage usage = get_memory_usage();
>> > ```
>> > Oops. I can't see the lock being acquired.
>>
>> I'm wondering if there are more places where this would be needed...
>
> The comment makes it pretty clear that `get_memory_usage()` really needs to hold the lock for this memory pool. I very strongly suspect this is the real cause of your problem.
>
>> > What do you mean by "we'd need a reference to be shared between the 2 places "?
>>
>> What I meant is mainly that in this specific case the _storing_ and _loading_ happens in very different places in the code.
>
> Sure, it's not best practice. Better than an (apparently) randomly-placed fence, though.

@theRealAph do you think the current solution (with `CodeCache_lock` acquisition) could be OK?
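
For comparison, the barrier-based alternative weighed in this thread can be sketched with standard C++ atomics. This is only an illustration under stated assumptions, not HotSpot code: `PoolCounters`, `Usage`, `expand_by`, `allocate` and `read_usage` are hypothetical names, there is a single writer, and `committed` only ever grows. The writer publishes `committed` before `used` with release stores, and the reader loads `used` with acquire before loading `committed`, so a reader that observes a new `used` value also observes a `committed` value at least as recent as the one written before it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical snapshot type; not the HotSpot MemoryUsage class.
struct Usage {
  std::size_t used;
  std::size_t committed;
};

// Hypothetical model of two counters updated by one writer thread.
class PoolCounters {
  std::atomic<std::size_t> _committed{0};
  std::atomic<std::size_t> _used{0};

 public:
  // Writer: grow the committed size (assumed to be monotonic).
  void expand_by(std::size_t bytes) {
    std::size_t old_value = _committed.load(std::memory_order_relaxed);
    _committed.store(old_value + bytes, std::memory_order_release);
  }

  // Writer: bump `used` only if it still fits into `committed`.
  bool allocate(std::size_t bytes) {
    std::size_t used = _used.load(std::memory_order_relaxed);
    if (used + bytes > _committed.load(std::memory_order_relaxed)) {
      return false;
    }
    // Release: earlier stores to _committed become visible to any
    // reader that acquires this value of _used.
    _used.store(used + bytes, std::memory_order_release);
    return true;
  }

  // Reader: load in the opposite order of the writer's stores.
  Usage read_usage() const {
    std::size_t used = _used.load(std::memory_order_acquire);
    std::size_t committed = _committed.load(std::memory_order_relaxed);
    return Usage{used, committed};
  }
};
```

The ordering has to be maintained in two places that are far apart in the code, which is the maintainability concern raised above in favour of simply taking the lock.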
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15819#discussion_r1347237545 From rrich at openjdk.org Thu Oct 5 15:04:54 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 5 Oct 2023 15:04:54 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v13] In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 10:22:45 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. 
The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better).
>>
>> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads.
>>
>> Observations
>>
>> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded.
>>
>> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded.
>>
>> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ...
>
> Richard Reingruber has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Split work strictly at stripe boundaries
>  - Reset to master

The difference to the baseline in the `card_scan_big_instances.java` test is < 5ms when the card mark copying to the stripes is done in parallel.

It would be possible to improve this further: once a thread has completed its part of the copying it could begin scanning its stripes. Not sure if it's worth it.
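
The stripe-based partitioning discussed in this thread can be pictured roughly as follows. This is a simplified sketch and not the `PSCardTable` code: the stripe size, the card encodings, the round-robin assignment of stripes to workers, and the names `kCardsPerStripe`, `stripes_for_worker` and `count_dirty_in_own_stripes` are all assumptions made for illustration. The point it shows is that each worker derives its own card ranges from its id alone and only reads card table entries inside those ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical values; the real stripe size and card encodings may differ.
constexpr std::size_t kCardsPerStripe = 128;
constexpr std::uint8_t kDirty = 0;
constexpr std::uint8_t kClean = 0xff;

// Half-open range of card indices [first, second).
using CardRange = std::pair<std::size_t, std::size_t>;

// Stripes owned by `worker_id`, assigned round-robin across `num_workers`.
std::vector<CardRange> stripes_for_worker(std::size_t num_cards,
                                          std::size_t worker_id,
                                          std::size_t num_workers) {
  std::vector<CardRange> stripes;
  const std::size_t step = num_workers * kCardsPerStripe;
  for (std::size_t begin = worker_id * kCardsPerStripe; begin < num_cards;
       begin += step) {
    // The last stripe may be shorter than kCardsPerStripe.
    stripes.emplace_back(begin, std::min(begin + kCardsPerStripe, num_cards));
  }
  return stripes;
}

// A worker touches only the card table entries inside its own stripes.
std::size_t count_dirty_in_own_stripes(const std::vector<std::uint8_t>& cards,
                                       std::size_t worker_id,
                                       std::size_t num_workers) {
  std::size_t dirty = 0;
  for (const CardRange& stripe :
       stripes_for_worker(cards.size(), worker_id, num_workers)) {
    for (std::size_t i = stripe.first; i < stripe.second; ++i) {
      if (cards[i] == kDirty) {
        ++dirty;
      }
    }
  }
  return dirty;
}
```

Because the ranges of different workers never overlap, no synchronization on individual cards is needed in this sketch; how objects that cross a stripe boundary are handled is exactly the part the PR has to solve and is not modelled here.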
Testing: langtools:tier1 TEST_VM_OPTS="-XX:+UseParallelGC" hotspot:tier1 ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1749053411 From rrich at openjdk.org Thu Oct 5 15:24:30 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 5 Oct 2023 15:24:30 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v14] In-Reply-To: References: Message-ID: > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. 
The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongues young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc theads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The prize for parallelization is paid without actually doing it. Also ParallelGC will use at lea... 
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Parallel copying of imprecise marks to stripes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/817b164c..22fe8496 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=12-13 Stats: 61 lines in 3 files changed: 32 ins; 23 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From duke at openjdk.org Thu Oct 5 23:01:21 2023 From: duke at openjdk.org (duke) Date: Thu, 5 Oct 2023 23:01:21 GMT Subject: Withdrawn: 8303762: [vectorapi] Intrinsification of Vector.slice In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 18:23:42 GMT, Quan Anh Mai wrote: > `Vector::slice` is a method at the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as utf8 and utf16 validation. x86 starting from SSSE3 has `palignr` which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as intrinsify `Vector::slice` method. > > A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires preparation of the index vector and the blending mask. Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice instrinsics. 
Some have tremendous increases in throughput due to the limitation that a mask of length 2 cannot currently be intrinsified, leading to falling back to the Java implementations. > > Please take a look and have some reviews. Thank you very much. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/12909 From kbarrett at openjdk.org Fri Oct 6 02:30:50 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 6 Oct 2023 02:30:50 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v4] In-Reply-To: References: Message-ID: > Please review this new facility, providing a general mechanism for intrusive > doubly-linked lists. A class supports inclusion in a list by having an > IntrusiveListEntry member, and providing structured information about how to > access that member. A class supports inclusion in multiple lists by having > multiple IntrusiveListEntry members, with different lists specified to use > different members. > > The IntrusiveList class template provides the list management. It is modelled > on bidirectional containers such as std::list and boost::intrusive::list, > providing many of the expected member types and functions. (Note that the > member types use the Standard's naming conventions.) (Not all standard > container requirements are met; some operations are not presently supported > because they haven't been needed yet.) This includes iteration support using > (mostly) standard-conforming iterator types (they are presently missing > iterator_category member types, pending being able to include so we > can use std::bidirectional_iterator_tag). > > This change only provides the new facility, and doesn't include any uses of > it. It is intended to replace the 4-5 (or maybe more) competing intrusive > doubly-linked lists presently in HotSpot. Unlike most (or perhaps all?) of > those alterantives, this proposal provides a suite of unit tests. 
> > An example of a place that I think might benefit from this is G1's region > handling. There are various places where G1 iterates over all regions in order > to do something with those which satisfy some property (humongous regions, > regions in the collection set, &etc). If it were trivial to create new region > sublists (and this facility makes that easy), some of these could be turned > into direct iteration over only the regions of interest. > > Some specific points to consider when reviewing this proposal: > > (1) This proposal follows Standard Library API conventions, which differ from > HotSpot in various ways. > > (1a) Lists and iterators provide various type members, with names per the > Standard Library. There has been discussion of using some parts of the > Standard Library eventually, in which case this would be important. But for > now some of the naming choices are atypical for HotSpot. > > (1b) Some of the function signatures follow the Standard Library APIs even > though the reasons for that form might not apply to HotSpot. For example, the > list pop operations don't return the removed value. For node-based containers > in Standard Library that would introduce exception... 
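
As a rough illustration of the idea in the quoted description (one embedded entry per list, several entries for membership in several lists), here is a deliberately tiny sketch. It is not the `IntrusiveList` API proposed in the PR: `Entry`, `List`, `Region` and all member names are hypothetical, and the entries store plain object pointers to keep the code short.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical link embedded in each element; one per list it can join.
template <typename T>
struct Entry {
  T* prev = nullptr;
  T* next = nullptr;
  bool attached = false;
};

// Hypothetical list; `Link` selects which embedded entry this list uses.
template <typename T, Entry<T> T::*Link>
class List {
  T* _head = nullptr;
  T* _tail = nullptr;
  std::size_t _size = 0;

 public:
  // Links `obj` at the end; no allocation, the object is not copied.
  void push_back(T& obj) {
    Entry<T>& e = obj.*Link;
    assert(!e.attached && "object is already linked through this entry");
    e.prev = _tail;
    e.next = nullptr;
    e.attached = true;
    if (_tail != nullptr) {
      (_tail->*Link).next = &obj;
    } else {
      _head = &obj;
    }
    _tail = &obj;
    ++_size;
  }

  // Unlinks `obj`; the object itself is left alive.
  void remove(T& obj) {
    Entry<T>& e = obj.*Link;
    assert(e.attached && "object is not linked through this entry");
    if (e.prev != nullptr) {
      (e.prev->*Link).next = e.next;
    } else {
      _head = e.next;
    }
    if (e.next != nullptr) {
      (e.next->*Link).prev = e.prev;
    } else {
      _tail = e.prev;
    }
    e = Entry<T>();
    --_size;
  }

  T* front() const { return _head; }
  T* next(const T& obj) const { return (obj.*Link).next; }
  std::size_t size() const { return _size; }
};

// Hypothetical element that can be on two lists at once.
struct Region {
  int id;
  Entry<Region> all_link;
  Entry<Region> humongous_link;
  explicit Region(int i) : id(i) {}
};

using AllRegions = List<Region, &Region::all_link>;
using HumongousRegions = List<Region, &Region::humongous_link>;
```

With two entries per `Region`, a region can sit on the list of all regions and on a sublist of regions of interest at the same time, and removing it from one list leaves its membership in the other untouched.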
Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: entry accessor function ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15896/files - new: https://git.openjdk.org/jdk/pull/15896/files/e85271eb..0ee7d587 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15896&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15896&range=02-03 Stats: 116 lines in 3 files changed: 35 ins; 13 del; 68 mod Patch: https://git.openjdk.org/jdk/pull/15896.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15896/head:pull/15896 PR: https://git.openjdk.org/jdk/pull/15896 From tschatzl at openjdk.org Fri Oct 6 07:43:28 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 6 Oct 2023 07:43:28 GMT Subject: RFR: 8317440: Lock rank checking fails when code root set is modified with the Servicelock held after JDK-8315503 Message-ID: Hi all, please review this change that fixes lock ranking after recent changes to the code root set, now using a CHT. The issue came up because the lock rank of the CHT lock has been larger than the rank of the Servicethread_lock where it is possible that code roots can be added. The suggested solution is to fix up the lock rankings to work; actually this PR contains two variants: 1) one that statically sets the lock ranks of the CHT lock (and the ThreadSMR_lock that can be used during CHT operation) to something smaller than Servicethread_lock. 2) one that allows setting of the CHT lock rank via parameter as well (the last commit changed the code to variant 1). The other lock ranking changes to Metaspace_lock and ContinuationRelativize_lock are simply undos of the respective changes in [JDK-8315503](https://bugs.openjdk.org/browse/JDK-8315503). Testing: tier1-8 for variant 2), tier 1-7 for variant 1) Thanks, Thomas ------------- Commit messages: - undo earlier changes, fix test - Some more ranking problems... 
- fix test compilation - 8317440 initial version Changes: https://git.openjdk.org/jdk/pull/16062/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16062&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8317440 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16062.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16062/head:pull/16062 PR: https://git.openjdk.org/jdk/pull/16062 From djelinski at openjdk.org Fri Oct 6 07:59:18 2023 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Fri, 6 Oct 2023 07:59:18 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v3] In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 05:13:19 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option: >> >> |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup >> |-------------|------------|---------------|------------------|-----------| >> |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 >> full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 >> small.AESGCMBench.decrypt | 8192 | 527854.353 |663131.48 | 1.25 >> small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 |1.24 >> full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 >> full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 |539235.462 | 1.00 >> small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 |298913.629 | 0.99 >> small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 >> ? | ? | ? | ? | ? 
>> full.AESGCMBench.decrypt | 16384 | 307266.364 |390397.778 | 1.27 >> full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 >> small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 |1.27 >> small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 >> full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 >> full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 >> small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 |181019.205 | 1.12 >> small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 |1.31 >> full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 >> small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 >> small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 |1.28 >> full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 |1.12 >> full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 |211701.969 | 1.27 >> small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 >> small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 |212085.047 | 1.30 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 >> full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 |1.31 >> small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 |1.32 >> small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 >> full.AES... > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Reorganized code as per comments, added new instruction addb src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 354: > 352: // Save rbp and rsp > 353: __ push(rbp); > 354: __ movq(rbp, rsp); This line breaks stack walking code, at least on Linux; rbp is supposed to be the frame pointer throughout the stub. 
src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 362: > 360: __ vzeroupper(); > 361: __ movq(rsp, rbp); > 362: __ pop(rbp); If you remove `movq(rbp, rsp)` above, you can replace this with: Suggestion: __ lea(rsp, Address (rbp, WINDOWS_ONLY(-7) NOT_WINDOWS(-5) * wordSize)); src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3342: > 3340: __ movdqu(xmm1, xmm2); > 3341: __ vpslldq(xmm2, xmm2, 8, Assembler::AVX_128bit); > 3342: __ vpsrldq(xmm1, xmm1, 8, Assembler::AVX_128bit); You could save a few bytes here: Suggestion: __ vpsrlq(xmm1, xmm6, 63, Assembler::AVX_128bit); __ vpsllq(xmm6, xmm6, 1, Assembler::AVX_128bit); __ vpslldq(xmm2, xmm1, 8, Assembler::AVX_128bit); __ vpsrldq(xmm1, xmm1, 8, Assembler::AVX_128bit); src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3710: > 3708: > 3709: // Generate 8 constants for htbl > 3710: __ call(generate_htbl_8_blks, relocInfo::none); why didn't you inline `generateHtbl_8_block_avx2` here? This method is only used here as far as I can tell. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1348346370 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1348347451 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1348372462 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1348349329 From aph at openjdk.org Fri Oct 6 08:42:45 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 6 Oct 2023 08:42:45 GMT Subject: RFR: JDK-8269393: store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64 In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 14:03:30 GMT, Damon Fenacci wrote: > # Issue > An intermittent _Memory Pool not found_ error has been noticed when running a few tests (_vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java_, _vmTestbase/vm/mlvm/meth/stress/compiler/sequences/Test.java_) on _macosx_aarch64_ (production build) with non-segmented code cache. 
> > ## Origin > The issue originates from the fact that the aarch64 architecture is a weakly ordered memory architecture, i.e. it _permits the observation and completion of memory accesses in a different order from the program order_. > > More precisely: while calling `CodeHeapPool::get_memory_usage`, the `used` and `committed` variables are retrieved > https://github.com/openjdk/jdk/blob/138542de7889e8002df0e15a79e31d824c6a0473/src/hotspot/share/services/memoryPool.cpp#L181-L182 > and these are computed based on different variables saved in memory in `CodeCache::allocate` (during `heap->allocate` and `heap->expand_by` to be precise). https://github.com/openjdk/jdk/blob/138542de7889e8002df0e15a79e31d824c6a0473/src/hotspot/share/code/codeCache.cpp#L535-L537 > The problem happens when first `heap->expand_by` gets called (which _increases_ `committed`) and then `heap->allocate` gets called in a second loop pass (which _increases_ `used`). Although stores in `CodeCache::allocate` happen in this order, when reading from memory in `CodeHeapPool::get_memory_usage` it can happen that `used` has the newly computed value, while `committed` is still "old" (because of ARM's weak memory order). This is a problem, since `committed` must never be smaller than `used`. > > # Solution > > To avoid this situation we must ensure that the values used to calculate `committed` are actually saved before the values used to calculate `used` and that the opposite is true for reading. To enforce this we acquire a `CodeCache_lock` while reading `used` and `committed` in `CodeHeapPool::get_memory_usage` (which should actually be the convention when accessing CodeCache data). Marked as reviewed by aph (Reviewer).
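For readers following the thread, the locking approach described in the Solution section can be sketched outside HotSpot roughly as follows. This is a minimal illustration in standard C++, not the actual patch: `CodeHeapModel`, its fields, and `usage()` are hypothetical names, and `std::mutex` merely stands in for `CodeCache_lock`. The point is that the writer and the reader use the same lock, so a reader can never observe a new `used` together with a stale `committed`.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical model of a code heap (not HotSpot code). `committed` and
// `used` are plain fields, so every access goes through `lock`.
struct CodeHeapModel {
  std::mutex lock;            // stands in for CodeCache_lock
  std::size_t committed = 0;  // bytes backed by memory
  std::size_t used = 0;       // bytes handed out

  // Writer side: expand the committed size first if needed, then allocate.
  // Both stores happen inside one critical section.
  void allocate(std::size_t bytes) {
    std::lock_guard<std::mutex> guard(lock);
    if (used + bytes > committed) {
      committed = used + bytes;  // models heap->expand_by
    }
    used += bytes;               // models heap->allocate
  }

  struct Usage {
    std::size_t used;
    std::size_t committed;
  };

  // Reader side: take both values under the same lock as the writer, so
  // the pair is always a consistent snapshot (used <= committed).
  Usage usage() {
    std::lock_guard<std::mutex> guard(lock);
    return Usage{used, committed};
  }
};
```

Without the lock in `usage()`, a weakly ordered CPU would be free to make the two stores visible in a different order than they were issued, which is the failure mode the PR describes.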
------------- PR Review: https://git.openjdk.org/jdk/pull/15819#pullrequestreview-1661368504 From aph at openjdk.org Fri Oct 6 08:42:46 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 6 Oct 2023 08:42:46 GMT Subject: RFR: JDK-8269393: store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64 In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 11:02:48 GMT, Damon Fenacci wrote: >>> > There is a CodeCache lock, but it seems to me that the memory tracking is accessing data outside the lock. I don't know how that's supposed to work >>> >>> Acquiring a CodeCache lock when getting `used` and `committed` here >>> >>> https://github.com/openjdk/jdk/blob/12de9b0225363377e9a76729b11698221d4f29f2/src/hotspot/share/services/memoryPool.cpp#L181-L182 >>> >>> was actually the solution in the first commit [1060bb0](https://github.com/openjdk/jdk/commit/1060bb01f7ad0ca47c38eb0f68b7f2aab6562ac3) but then, asking about the cost of a lock vs storestore/loadload barriers, and the fact that the origin of the issue was actually the order in which store and load actually happen, made me think barriers would be preferable, something that I now very much doubt. >>> You are suggesting that it would make more sense (and be cleaner) to acquire a CodeCache lock instead of using barriers, right? >> >> That seems to be the established convention. And I wonder if there are other problems caused by accessing data in this way, outside the CodeCache lock. >> >>> > ``` >>> > // Caller in JDK is responsible for synchronization - >>> > // acquire the lock for this memory pool before calling VM >>> > MemoryUsage usage = get_memory_usage(); >>> > ``` >>> > Oops. I can't see the lock being acquired. >>> >>> I'm wondering if there are more places where this would be needed... >> >> The comment makes it pretty clear that `get_memory_usage()` really needs to hold the lock for this memory pool.
I very strongly suspect this is the real cause of your problem. >> >>> > What do you mean by "we'd need a reference to be shared between the 2 places"? >>> >>> What I meant is mainly that in this specific case the _storing_ and _loading_ happen in very different places in the code. >> >> Sure, it's not best practice. Better than an (apparently) randomly-placed fence, though. > > @theRealAph do you think the current solution (with `CodeCache_lock` acquisition) could be OK? I think so. I am a bit nervous because `MutexLock` isn't recursive, so if this code is ever called from a region that is already locked it'll fail. But there is an assertion in `MutexLock` that will detect that if it ever happens, so OK. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15819#discussion_r1348408521 From dfenacci at openjdk.org Fri Oct 6 11:04:00 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 6 Oct 2023 11:04:00 GMT Subject: RFR: JDK-8269393: store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64 In-Reply-To: References: Message-ID: On Fri, 6 Oct 2023 08:28:27 GMT, Andrew Haley wrote: >> # Issue >> An intermittent _Memory Pool not found_ error has been noticed when running a few tests (_vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java_, _vmTestbase/vm/mlvm/meth/stress/compiler/sequences/Test.java_) on _macosx_aarch64_ (production build) with non-segmented code cache. >> >> ## Origin >> The issue originates from the fact that aarch64 architecture is a weakly ordered memory architecture, i.e. it _permits the observation and completion of memory accesses in a different order from the program order_.
>> >> More precisely: while calling `CodeHeapPool::get_memory_usage`, the `used` and `committed` variables are retrieved >> https://github.com/openjdk/jdk/blob/138542de7889e8002df0e15a79e31d824c6a0473/src/hotspot/share/services/memoryPool.cpp#L181-L182 >> and these are computed based on different variables saved in memory in `CodeCache::allocate` (during `heap->allocate` and `heap->expand_by` to be precise). https://github.com/openjdk/jdk/blob/138542de7889e8002df0e15a79e31d824c6a0473/src/hotspot/share/code/codeCache.cpp#L535-L537 >> The problem happens when first `heap->expand_by` gets called (which _increases_ `committed`) and then `heap->allocate` gets called in a second loop pass (which _increases_ `used`). Although stores in `CodeCache::allocate` happen in this order, when reading from memory in `CodeHeapPool::get_memory_usage` it can happen that `used` has the newly computed value, while `committed` is still "old" (because of ARM's weak memory order). This is a problem, since `committed` must never be smaller than `used`. >> >> # Solution >> >> To avoid this situation we must ensure that the values used to calculate `committed` are actually saved before the values used to calculate `used` and that the opposite is true for reading. To enforce this we acquire a `CodeCache_lock` while reading `used` and `committed` in `CodeHeapPool::get_memory_usage` (which should actually be the convention when accessing CodeCache data). > > Marked as reviewed by aph (Reviewer). Thanks a lot for your reviews @theRealAph @TobiHartmann.
------------- PR Comment: https://git.openjdk.org/jdk/pull/15819#issuecomment-1750244400 From dfenacci at openjdk.org Fri Oct 6 11:04:00 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 6 Oct 2023 11:04:00 GMT Subject: Integrated: JDK-8269393: store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64 In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 14:03:30 GMT, Damon Fenacci wrote: > # Issue > An intermittent _Memory Pool not found_ error has been noticed when running a few tests (_vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java_, _vmTestbase/vm/mlvm/meth/stress/compiler/sequences/Test.java_) on _macosx_aarch64_ (production build) with non-segmented code cache. > > ## Origin > The issue originates from the fact that the aarch64 architecture is a weakly ordered memory architecture, i.e. it _permits the observation and completion of memory accesses in a different order from the program order_. > > More precisely: while calling `CodeHeapPool::get_memory_usage`, the `used` and `committed` variables are retrieved > https://github.com/openjdk/jdk/blob/138542de7889e8002df0e15a79e31d824c6a0473/src/hotspot/share/services/memoryPool.cpp#L181-L182 > and these are computed based on different variables saved in memory in `CodeCache::allocate` (during `heap->allocate` and `heap->expand_by` to be precise). https://github.com/openjdk/jdk/blob/138542de7889e8002df0e15a79e31d824c6a0473/src/hotspot/share/code/codeCache.cpp#L535-L537 > The problem happens when first `heap->expand_by` gets called (which _increases_ `committed`) and then `heap->allocate` gets called in a second loop pass (which _increases_ `used`). Although stores in `CodeCache::allocate` happen in this order, when reading from memory in `CodeHeapPool::get_memory_usage` it can happen that `used` has the newly computed value, while `committed` is still "old" (because of ARM's weak memory order).
This is a problem, since `committed` must never be smaller than `used`. > > # Solution > > To avoid this situation we must ensure that the values used to calculate `committed` are actually saved before the values used to calculate `used` and that the opposite is true for reading. To enforce this we acquire a `CodeCache_lock` while reading `used` and `committed` in `CodeHeapPool::get_memory_usage` (which should actually be the convention when accessing CodeCache data). This pull request has now been integrated. Changeset: 7162624d Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/7162624d70886fc2afc357ab4b0d4ec431e2d1cd Stats: 7 lines in 1 file changed: 5 ins; 0 del; 2 mod 8269393: store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64 Reviewed-by: thartmann, aph ------------- PR: https://git.openjdk.org/jdk/pull/15819 From rrich at openjdk.org Fri Oct 6 11:17:12 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 6 Oct 2023 11:17:12 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v15] In-Reply-To: References: Message-ID: > This PR introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in Gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straightforward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe.
This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea...
Richard Reingruber has updated the pull request incrementally with two additional commits since the last revision: - Missed acquire semantics - Overlap scavenge with pre-scavenge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/22fe8496..d845e650 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=13-14 Stats: 114 lines in 3 files changed: 80 ins; 26 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From rrich at openjdk.org Fri Oct 6 11:17:25 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 6 Oct 2023 11:17:25 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v14] In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 15:24:30 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. 
>> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Parallel copying of imprecise marks to stripes I think it would be possible to combine the two approaches: we would have a read only copy of the card table only for the current stripe.
This would reduce the required extra memory to just `num_cards_in_stripe * active_workers` (128 * active_workers) bytes. The readonly copy could be a local variable (on stack) in `scavenge_contents_parallel`. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1750252198 From coleenp at openjdk.org Fri Oct 6 11:51:46 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 6 Oct 2023 11:51:46 GMT Subject: RFR: 8317440: Lock rank checking fails when code root set is modified with the Servicelock held after JDK-8315503 In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 17:19:35 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that fixes lock ranking after recent changes to the code root set, now using a CHT. > > The issue came up because the lock rank of the CHT lock has been larger than the rank of the Servicethread_lock where it is possible that code roots can be added. > > The suggested solution is to fix up the lock rankings to work; actually this PR contains two variants: > 1) one that statically sets the lock ranks of the CHT lock (and the ThreadSMR_lock that can be used during CHT operation) to something smaller than Servicethread_lock. > 2) one that allows setting of the CHT lock rank via parameter as well (the last commit changed the code to variant 1). > > The other lock ranking changes to Metaspace_lock and ContinuationRelativize_lock are simply undos of the respective changes in [JDK-8315503](https://bugs.openjdk.org/browse/JDK-8315503). > > Testing: tier1-8 for variant 2), tier 1-7 for variant 1) > > Thanks, > Thomas Variant 1 seems ok. Uses of the CHT shouldn't take locks, so having a low lock ranking for CHT lock seems like it'll be fine (I can't find where it takes the ThreadsSMRDelete_lock). If any of this breaks, we can try approach #2 next. ------------- Marked as reviewed by coleenp (Reviewer). 
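The rank problem described in the quoted PR text can be illustrated with a small standalone sketch. It assumes the usual rule that a thread may only acquire a lock whose rank is lower than every lock it already holds; `HeldLocks` and the numeric ranks are hypothetical and are not HotSpot's actual values or API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-thread record of held lock ranks (not HotSpot code).
class HeldLocks {
  std::vector<int> _ranks;  // ranks of the locks currently held

 public:
  // A new lock is acceptable only if its rank is strictly lower than the
  // lowest rank already held.
  bool can_acquire(int rank) const {
    return _ranks.empty() ||
           rank < *std::min_element(_ranks.begin(), _ranks.end());
  }

  // Records the lock if the rank check passes; returns false otherwise.
  bool acquire(int rank) {
    if (!can_acquire(rank)) {
      return false;
    }
    _ranks.push_back(rank);
    return true;
  }

  // Forgets one held lock of the given rank, if any.
  void release(int rank) {
    auto it = std::find(_ranks.begin(), _ranks.end(), rank);
    if (it != _ranks.end()) {
      _ranks.erase(it);
    }
  }
};
```

With a service lock of rank 5 held, taking a table lock of rank 7 is rejected, while the same table lock re-ranked to 3 is accepted. That mirrors variant 1 of the PR, which lowers the CHT lock rank below that of the lock held while code roots are added.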
PR Review: https://git.openjdk.org/jdk/pull/16062#pullrequestreview-1661696681 From iwalulya at openjdk.org Fri Oct 6 11:58:47 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 6 Oct 2023 11:58:47 GMT Subject: RFR: 8317318: Serial: Change GenCollectedHeap to SerialHeap in whitebox [v2] In-Reply-To: References: Message-ID: On Sat, 30 Sep 2023 17:27:42 GMT, Albert Mingkun Yang wrote: >> Use more precise type for Serial GC. I also added a `ShouldNotReachHere()` there, because using serial-heap when serial-gc is not used seems problematic. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > s1-prims Marked as reviewed by iwalulya (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15988#pullrequestreview-1661705568 From aph at openjdk.org Fri Oct 6 12:11:35 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 6 Oct 2023 12:11:35 GMT Subject: RFR: JDK-8269393: store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64 In-Reply-To: References: Message-ID: On Fri, 6 Oct 2023 08:28:05 GMT, Andrew Haley wrote: >> @theRealAph do you think the current solution (with `CodeCache_lock` acquisition) could be OK? > > I think so. I am a bit nervous because `MutexLock` isn't recursive, so if this code is ever called from a region that is already locked it'll fail. But there is an assertion in `MutexLock` that will detect that if it ever happens, so OK. Looks like I was right about the locking, I'm afraid. We need a better way to handle this. Still, at least the assertion detected this before everything went south.
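To make the concern about non-recursive locking concrete, here is one common pattern, sketched in standard C++ under stated assumptions: the lock remembers its owner, asserts on re-entry, and a scoped helper only locks when the calling thread does not already own it. `OwnedMutex` and `ConditionalLocker` are hypothetical names for illustration; this is not HotSpot's locker API and not necessarily how the follow-up was resolved.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical non-recursive mutex that tracks its owning thread.
class OwnedMutex {
  std::mutex _mutex;
  std::atomic<std::thread::id> _owner{std::thread::id()};  // default id: no owner

 public:
  bool owned_by_self() const {
    return _owner.load() == std::this_thread::get_id();
  }

  void lock() {
    // Re-entry would deadlock on std::mutex, so detect it up front.
    assert(!owned_by_self() && "recursive lock attempt");
    _mutex.lock();
    _owner.store(std::this_thread::get_id());
  }

  void unlock() {
    assert(owned_by_self() && "unlock by non-owner");
    _owner.store(std::thread::id());
    _mutex.unlock();
  }
};

// Scoped helper: takes the lock only if the caller does not hold it yet,
// and releases it only if this helper was the one that took it.
class ConditionalLocker {
  OwnedMutex* _locked;  // nullptr when the caller already owned the mutex

 public:
  explicit ConditionalLocker(OwnedMutex& m)
      : _locked(m.owned_by_self() ? nullptr : &m) {
    if (_locked != nullptr) {
      _locked->lock();
    }
  }

  ~ConditionalLocker() {
    if (_locked != nullptr) {
      _locked->unlock();
    }
  }

  ConditionalLocker(const ConditionalLocker&) = delete;
  ConditionalLocker& operator=(const ConditionalLocker&) = delete;
};
```

A function written this way can be called both from code that already holds the lock and from code that does not, at the cost of hiding which callers rely on which case.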
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15819#discussion_r1348636409 From ayang at openjdk.org Fri Oct 6 12:18:07 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 6 Oct 2023 12:18:07 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v15] In-Reply-To: References: Message-ID: On Fri, 6 Oct 2023 11:17:12 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. 
The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with two additional commits since the last revision: > > - Missed acquire semantics > - Overlap scavenge with pre-scavenge I find pre-processing card-table removes much complexity in determining which (part of) obj belongs to current stripe. However, synchronizing with actual scavenging introduces some complexity. The fact that `find_first_clean_card` copies the cached-obj-start is easy to miss and hard to reason about, IMO. > we would have a read only copy of the card table only for the current stripe. It would still require pre-processing card-table, right? Otherwise, I don't see how one can work around the "interference" across stripes. Maybe this can simplify the impl of `find_first_clean_card`. I am not too concerned about the regression observed for "large (32K) non-array instances", because that pattern is not common in Java and the pause-time is still reasonable (<100ms).
The long-term optimization (or the redemption of the extra-mem-requirement) I have in mind is to use 1 bit (instead of 1 byte) for a card -- Parallel requires only a boolean info for a particular card. One can even pre-alloc two card-tables now that each card-table is 1/8 of its original size, to avoid calling malloc inside young-gc-pause. My preference is some simple code without much regression. Ofc, this is quite subjective. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1750541087 From ayang at openjdk.org Fri Oct 6 12:20:49 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 6 Oct 2023 12:20:49 GMT Subject: RFR: 8317318: Serial: Change GenCollectedHeap to SerialHeap in whitebox [v2] In-Reply-To: References: Message-ID: On Sat, 30 Sep 2023 17:27:42 GMT, Albert Mingkun Yang wrote: >> Use more precise type for Serial GC. I also added a ` ShouldNotReachHere()` there, because using serial-heap when serial-gc is not used seems problematic. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: > > s1-prims Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15988#issuecomment-1750541862 From ayang at openjdk.org Fri Oct 6 12:20:50 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 6 Oct 2023 12:20:50 GMT Subject: Integrated: 8317318: Serial: Change GenCollectedHeap to SerialHeap in whitebox In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 14:47:19 GMT, Albert Mingkun Yang wrote: > Use more precise type for Serial GC. I also added a ` ShouldNotReachHere()` there, because using serial-heap when serial-gc is not used seems problematic. This pull request has now been integrated. 
Changeset: b3cc0c84 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/b3cc0c84316dd59f406a6fa23fcaf3d029910843 Stats: 11 lines in 1 file changed: 8 ins; 1 del; 2 mod 8317318: Serial: Change GenCollectedHeap to SerialHeap in whitebox Reviewed-by: tschatzl, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/15988 From ayang at openjdk.org Fri Oct 6 12:33:45 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 6 Oct 2023 12:33:45 GMT Subject: RFR: 8317440: Lock rank checking fails when code root set is modified with the Servicelock held after JDK-8315503 In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 17:19:35 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that fixes lock ranking after recent changes to the code root set, now using a CHT. > > The issue came up because the lock rank of the CHT lock has been larger than the rank of the Servicethread_lock where it is possible that code roots can be added. > > The suggested solution is to fix up the lock rankings to work; actually this PR contains two variants: > 1) one that statically sets the lock ranks of the CHT lock (and the ThreadSMR_lock that can be used during CHT operation) to something smaller than Servicethread_lock. > 2) one that allows setting of the CHT lock rank via parameter as well (the last commit changed the code to variant 1). > > The other lock ranking changes to Metaspace_lock and ContinuationRelativize_lock are simply undos of the respective changes in [JDK-8315503](https://bugs.openjdk.org/browse/JDK-8315503). > > Testing: tier1-8 for variant 2), tier 1-7 for variant 1) > > Thanks, > Thomas Marked as reviewed by ayang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/16062#pullrequestreview-1661761085 From tschatzl at openjdk.org Fri Oct 6 12:51:08 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 6 Oct 2023 12:51:08 GMT Subject: RFR: 8317440: Lock rank checking fails when code root set is modified with the Servicelock held after JDK-8315503 In-Reply-To: References: Message-ID: On Fri, 6 Oct 2023 11:49:21 GMT, Coleen Phillimore wrote: > Variant 1 seems ok. Uses of the CHT shouldn't take locks, so having a low lock ranking for CHT lock seems like it'll be fine (I can't find where it takes the ThreadsSMRDelete_lock). If any of this breaks, we can try approach #2 next. Thread lists synchronization in `GlobalCounter::write_synchronize()` uses the `ThreadsSMRDelete_lock` via `JavaThreadIteratorWithHandle`->`ThreadsListHandle`->`SafeThreadsListPtr` in the destructor ... ------------- PR Comment: https://git.openjdk.org/jdk/pull/16062#issuecomment-1750614008 From mcimadamore at openjdk.org Fri Oct 6 13:22:58 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 6 Oct 2023 13:22:58 GMT Subject: RFR: 8314294: Unsafe::allocateMemory and Unsafe::freeMemory are slower than malloc/free In-Reply-To: References: <5BsY-GjOu3yFCKRa2U_JeJS7b3KncWlt5Jeips2mwP8=.ec4f1e3b-1081-48b4-abac-e4ddb241c02e@github.com> Message-ID: <-XiDO_BGrHX3qucomFmxb7K1Ye41uhoohx5OHVodQ2E=.0185c0fe-3a61-4052-937f-6f0ee977b8f4@github.com> On Fri, 29 Sep 2023 08:17:18 GMT, Maurizio Cimadamore wrote: >> To increase performance by avoiding a thread-state transition (native -> in_vm) we change the three "raw" allocation functions in Unsafe to be UNSAFE_LEAF rather than UNSAFE_ENTRY. >> >> It is hard to track through all the related code to prove this is a safe change, but I could not spot anything obvious and testing indicated no issues (my main concern was potential missing WXWrite on macOS Aarch64). 
>> >> Testing: >> - tiers 1-7 on linux and macos x64 and Aarch64, plus Windows x64 >> >> Thanks > > Thanks for taking care of this @dholmes-ora. Do you know if Unsafe::copyMemory, or Unsafe::setMemory can also receive the same treatment? These are bulk operations, so they are less sensitive to the transition cost - but for small copies it can still be a factor. > @mcimadamore setMemory and copyMemory are targeting Java arrays not native memory so they have to be safepoint-aware and so cannot be leaf operations. Makes sense. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15977#issuecomment-1750672021 From zgu at openjdk.org Fri Oct 6 13:37:00 2023 From: zgu at openjdk.org (Zhengyu Gu) Date: Fri, 6 Oct 2023 13:37:00 GMT Subject: RFR: 8317466: Enable interpreter oopMapCache for concurrent GCs Message-ID: Interpreter oop maps are computed lazily during GC root scan and they are expensive to compute. GCs use a small hash table per instance class to cache computed oop maps during STW root scan, but not for concurrent root scan. This patch is intended to enable `OopMapCache` for concurrent GCs. Test: tier1 and tier2 fastdebug and release on MacOSX, Linux x86_64 and Linux x86_32.
------------- Commit messages: - Fix merge conflicts - Merge - cleanup - Merge branch 'master' into JDK-8317466 - Cleanup - Merge branch 'master' into oopmapcache_for_concurrent_root_scan - v2 - v1 - v0 - 8317240: Promptly free OopMapEntry after fail to insert the entry to OopMapCache Changes: https://git.openjdk.org/jdk/pull/16074/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16074&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8317466 Stats: 56 lines in 10 files changed: 20 ins; 15 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/16074.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16074/head:pull/16074 PR: https://git.openjdk.org/jdk/pull/16074 From jvernee at openjdk.org Fri Oct 6 16:12:57 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 6 Oct 2023 16:12:57 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v34] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. 
https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is not longer allowed to throw an `UnsupportedOperationException` on ... Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 63 commits: - fix failing RestrictedMethods test - Merge branch 'master' into JEP22 - Remove PIP annotation from jdk.incubator.vector - review @enablePreview from java/foreign/TestRestricted test - Merge branch 'master' into JEP22 - drop unneeded @compile tags from jtreg tests - Use IAE instead of UOE for unsupported char sets - Use abort instead of IEA when encountering wrong value for ENA attrib. - Fix visibility issues Reviewed-by: mcimadamore - Review comments - ... and 53 more: https://git.openjdk.org/jdk/compare/b3cc0c84...b4a7b7ab ------------- Changes: https://git.openjdk.org/jdk/pull/15103/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=33 Stats: 4361 lines in 263 files changed: 2211 ins; 1196 del; 954 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From jvernee at openjdk.org Fri Oct 6 16:13:33 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 6 Oct 2023 16:13:33 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v33] In-Reply-To: <0r5bNt-ez79b7DrOJUuHCPguBQkn3MtEJdoQFqVuWxA=.86265508-9e7d-4fab-a851-35c5c138255d@github.com> References: <0r5bNt-ez79b7DrOJUuHCPguBQkn3MtEJdoQFqVuWxA=.86265508-9e7d-4fab-a851-35c5c138255d@github.com> Message-ID: On Mon, 2 Oct 2023 16:07:09 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: >> >> 1. 
https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. >> 8. 
https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > Remove PIP annotation from jdk.incubator.vector I've addressed one more test failure in [b4a7b7a](https://github.com/openjdk/jdk/pull/15103/commits/b4a7b7abe75aee72c8e56c12c0ddbeb597e15f1b) that was found after merging this patch with the most recent master branch. The test in question checks for linter warnings for restricted methods. It depends on the FFM API for that, which is still preview in the master branch, but no longer preview as part of this patch. Hence the warnings the compiler outputs are different, and the test fails. I've fixed the test. The preview flags are no longer needed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15103#issuecomment-1751015355 From shade at openjdk.org Fri Oct 6 17:21:14 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 6 Oct 2023 17:21:14 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v6] In-Reply-To: References: Message-ID: <56RFDm_cBCPZHPHJq_vuSRPb9OLhUe9uwBhI-xhxgqk=.0b06a02e-cd09-431d-9a7a-3aa7fab7d1e9@github.com> On Thu, 5 Oct 2023 06:06:01 GMT, Thomas Stuefe wrote: >> Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`). >> >> Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift.
>>
>>
>> 8b7b62: 48 8d 05 7f 1b c3 00    lea    0xc31b7f(%rip),%rax    # 14e96e8
>> 8b7b69: 0f b6 00                movzbl (%rax),%eax
>> 8b7b6c: 84 c0                   test   %al,%al
>> 8b7b6e: 0f 84 9c 00 00 00       je     8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
>> 8b7b74: 48 8d 15 05 62 c6 00    lea    0xc66205(%rip),%rdx    # 151dd80 <_ZN23CompressedKlassPointers6_shiftE>
>> 8b7b7b: 8b 7b 08                mov    0x8(%rbx),%edi
>> 8b7b7e: 8b 0a                   mov    (%rdx),%ecx
>> 8b7b80: 48 8d 15 01 62 c6 00    lea    0xc66201(%rip),%rdx    # 151dd88 <_ZN23CompressedKlassPointers5_baseE>
>> 8b7b87: 48 d3 e7                shl    %cl,%rdi
>> 8b7b8a: 48 03 3a                add    (%rdx),%rdi
>>
>>
>> Patched: one load loads all three. Since shift occupies the lowest 8 bits, compiled code uses an 8-bit register; ditto the UseCompressedClassPointers flag.
>>
>>
>> 8ba302: 48 8d 05 97 9c c2 00    lea    0xc29c97(%rip),%rax    # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE>
>> 8ba309: 48 8b 08                mov    (%rax),%rcx
>> 8ba30c: f6 c5 01                test   $0x1,%ch               # use compressed klass pointers?
>> 8ba30f: 0f 84 9b 00 00 00       je     8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
>> 8ba315: 8b 7b 08                mov    0x8(%rbx),%edi
>> 8ba318: 48 d3 e7                shl    %cl,%rdi               # shift
>> 8ba31b: 66 31 c9                xor    %cx,%cx                # zero out lower 16 bits of base
>> 8ba31e: 48 01 cf                add    %rcx,%rdi              # add base
>> 8ba321: 8b 4f 08                mov    0x8(%rdi),%ecx
>>
>> ---
>>
>> Performance measurements:
>>
>> G1, doing a full GC over a heap filled with 256 million live j.l.Object instances.
>>
>> I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ±4%. Still, in general, numbers seemed to go down rather than up.
>>
>> ---
>>
>> Future extensions:
>>
>> This patch uses the fact that the encoding base is aligned to metaspace reser...
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains eight commits: > > - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++ > - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++ > - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++ > - APH feedback > - Merge branch 'master' into optimize-narrow-klass-decoding-in-c++ > - fix -UseCCP case > - use 16 bit alignment > - with raw bit ops A few stylistic comments. What is confusing to me is that combo flag initialization is basically conditional on `UseCompressedClassPointers` (i.e. assert in new `set_base_and_shift`). But at the same time, we ask for `CompressedKlassPointers::use_compressed_klass_pointers()` in `oop`methods. This works "only" because the -UseCCP generates the same combo as the initial value of `0`? Seems fragile. I wonder if we want to initialize combo unconditionally. src/hotspot/share/oops/compressedKlass.cpp line 40: > 38: > 39: void CompressedKlassPointers::set_base_and_shift(address thebase, int theshift) { > 40: assert(UseCompressedClassPointers, "Why are we here?"); All other Hotspot asserts have this form: Suggestion: assert(UseCompressedClassPointers, "only for compressed klass code"); src/hotspot/share/oops/compressedKlass.cpp line 46: > 44: > 45: // we keep a composite word, `_combo`, containing base+shift+UseCCP, to load > 46: // all three information with a single 64-bit load. Suggestion: // Encode all three base+shift+UseCCP into a single 64-bit word. // This would allow optimizing the fast-path with a single load. src/hotspot/share/oops/compressedKlass.cpp line 53: > 51: _combo = (uint64_t)_base | (uint64_t)_shift | (1 << bitpos_useccp); > 52: > 53: // validate combo. 
Suggestion: src/hotspot/share/oops/compressedKlass.cpp line 56: > 54: assert(base() == _base, "combo encoding"); > 55: assert(shift() == _shift, "combo encoding"); > 56: assert(use_compressed_class_pointers() == UseCompressedClassPointers, "combo encoding"); Pre-condition assert means `UseCompressedClassPointers` is always `true` here, can simplify the assert. src/hotspot/share/oops/compressedKlass.hpp line 66: > 64: // compiler. > 65: // - Bit [0-7] shift > 66: // - Bit 8 UseCompressedOops Suggestion: // - Bit 8 UseCompressedClassPointers ------------- PR Review: https://git.openjdk.org/jdk/pull/15389#pullrequestreview-1662323705 PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1348986318 PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1348990132 PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1348990824 PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1348989259 PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1348994530 From sspitsyn at openjdk.org Fri Oct 6 18:21:06 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 6 Oct 2023 18:21:06 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered In-Reply-To: References: Message-ID: On Wed, 4 Oct 2023 21:55:21 GMT, Leonid Mesnik wrote: >> The JVMTI VirtualThreadStart events have to follow the ThreadStart events pattern and so, should not be thread-filtered. 
>> The fix includes: >> - `jvmti.xml`: remove the attribute `filtered="thread"` in the `VirtualThreadStart` event spec >> - `jvmtiEventController.cpp`: remove the `VTHREAD_START_BIT` from the `THREAD_FILTERED_EVENT_BITS` mask and add it to the `NEED_THREAD_LIFE_EVENTS` mask >> - `jvmtiExport.cpp`: rearrangements in the `JvmtiExport::post_vthread_start()` function >> >> The fix also includes a couple of minor unification tweaks: >> - to align `JvmtiExport::post_thread_end()` with `JvmtiExport::post_vthread_end()` which has a slightly more optimized check for the `JVMTI_PHASE_PRIMORDIAL`. >> - to rename the local variable `cur_thread` as `thread` to follow the common pattern in `JvmtiExport::post_vthread_start()` and `JvmtiExport::post_vthread_end()` >> >> Testing: ran mach5 tiers 1-6. All tests passed. > > src/hotspot/share/prims/jvmtiExport.cpp line 1582: > >> 1580: // Do not post virtual thread start event for hidden java thread. >> 1581: if (JvmtiEventController::is_enabled(JVMTI_EVENT_VIRTUAL_THREAD_START) && >> 1582: !thread->is_hidden_from_external_view()) { > > Do we need this check? I'm not sure that JavaThread executing a virtual thread. Might be better to replace it with assertion? Good suggestion, thanks! Will make it an assertion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16019#discussion_r1349174060 From sspitsyn at openjdk.org Fri Oct 6 18:32:07 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 6 Oct 2023 18:32:07 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered In-Reply-To: References: Message-ID: On Wed, 4 Oct 2023 21:59:54 GMT, Leonid Mesnik wrote: >> The JVMTI VirtualThreadStart events have to follow the ThreadStart events pattern and so, should not be thread-filtered.
>> The fix includes: >> - `jvmti.xml`: remove the attribute `filtered="thread"` in the `VirtualThreadStart` event spec >> - `jvmtiEventController.cpp`: remove the `VTHREAD_START_BIT` from the `THREAD_FILTERED_EVENT_BITS` mask and add it to the `NEED_THREAD_LIFE_EVENTS` mask >> - `jvmtiExport.cpp`: rearrangements in the `JvmtiExport::post_vthread_start()` function >> >> The fix also includes a couple of minor unification tweaks: >> - to align `JvmtiExport::post_thread_end()` with `JvmtiExport::post_vthread_end()` which has a slightly more optimized check for the `JVMTI_PHASE_PRIMORDIAL`. >> - to rename the local variable `cur_thread` as `thread` to follow the common pattern in `JvmtiExport::post_vthread_start()` and `JvmtiExport::post_vthread_end()` >> >> Testing: ran mach5 tiers 1-6. All tests passed. > > src/hotspot/share/prims/jvmtiExport.cpp line 1552: > >> 1550: JvmtiEnvThreadStateIterator it(state); >> 1551: for (JvmtiEnvThreadState* ets = it.first(); ets != nullptr; ets = it.next(ets)) { >> 1552: JvmtiEnv *env = ets->get_env(); > > This change as well as renaming cur_thread are not related to the main issue. It would be better to separate them. Easier to track and backport if needed. They are mentioned in PR but not in jira bug, hard to find the reason without GitHub. Might be better to copy them in the bug if you want to keep them. Thanks. I agree with you in general. The whole fix is relatively small, so I'd prefer to keep such a minor cleanup in this particular case. I'll update the bug report with this info.
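The mask change listed in the PR description quoted above (take `VTHREAD_START_BIT` out of `THREAD_FILTERED_EVENT_BITS` and put it into `NEED_THREAD_LIFE_EVENTS`) can be pictured with a small standalone C++ sketch. Only the three names just mentioned and `THREAD_START_BIT` echo the discussion; the bit positions, the `EventMasks` struct, `SAMPLE_FILTERED_BIT` and the helper functions are hypothetical, and the sketch does not reproduce the actual contents of `jvmtiEventController.cpp`.

```cpp
#include <cassert>
#include <cstdint>

using EventMask = std::uint64_t;

// Hypothetical bit positions, chosen only for this sketch.
constexpr EventMask SAMPLE_FILTERED_BIT = EventMask{1} << 0;  // some per-thread event
constexpr EventMask THREAD_START_BIT    = EventMask{1} << 1;
constexpr EventMask VTHREAD_START_BIT   = EventMask{1} << 2;

// Stand-ins for the two masks named in the PR description.
struct EventMasks {
  EventMask thread_filtered;   // plays the role of THREAD_FILTERED_EVENT_BITS
  EventMask need_thread_life;  // plays the role of NEED_THREAD_LIFE_EVENTS
};

// True if the event may be enabled or disabled per thread in this model.
constexpr bool is_thread_filtered(const EventMasks& m, EventMask bit) {
  return (m.thread_filtered & bit) != 0;
}

// Returns a copy of `m` in which `bit` is no longer thread-filtered and is
// instead listed among the thread life-cycle events.
inline EventMasks make_unfiltered(EventMasks m, EventMask bit) {
  assert(bit != 0 && (bit & (bit - 1)) == 0 && "expects exactly one bit");
  m.thread_filtered &= ~bit;
  m.need_thread_life |= bit;
  return m;
}
```

Starting from masks in which `VTHREAD_START_BIT` is still thread-filtered, `make_unfiltered(masks, VTHREAD_START_BIT)` yields masks in which the bit sits next to `THREAD_START_BIT`, which is the "follow the ThreadStart pattern" idea in miniature.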
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16019#discussion_r1349184898 From sspitsyn at openjdk.org Fri Oct 6 18:46:42 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 6 Oct 2023 18:46:42 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered [v2] In-Reply-To: References: Message-ID: > The JVMTI VirtualThreadStart events have to follow the ThreadStart events pattern and so, should not be thread-filtered. > The fix includes: > - `jvmti.xml`: remove the attribute `filtered="thread"` in the `VirtualThreadStart` event spec > - `jvmtiEventController.cpp`: remove the `VTHREAD_START_BIT` from the `THREAD_FILTERED_EVENT_BITS` mask and add it to the `NEED_THREAD_LIFE_EVENTS` mask > - `jvmtiExport.cpp`: rearrangements in the `JvmtiExport::post_vthread_start()` function > > The fix also includes a couple of minor unification tweaks: > - to align `JvmtiExport::post_thread_end()` with `JvmtiExport::post_vthread_end()` to have a unified check for the `JVMTI_PHASE_PRIMORDIAL`. > - to rename the local variable `cur_thread` as `thread` to follow the common pattern in `JvmtiExport::post_vthread_start()` and `JvmtiExport::post_vthread_end()` > > Testing: ran mach5 tiers 1-6. All tests passed.
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: convert check for is_hidden_from_external_view check() into assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16019/files - new: https://git.openjdk.org/jdk/pull/16019/files/1b716552..46ce6453 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16019&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16019&range=00-01 Stats: 3 lines in 1 file changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16019/head:pull/16019 PR: https://git.openjdk.org/jdk/pull/16019 From cslucas at openjdk.org Fri Oct 6 18:48:46 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 6 Oct 2023 18:48:46 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges [v3] In-Reply-To: References: Message-ID: <4EB3jcooR9miV6MpOHmX_A_Zp-j-CkmBNXn9QjCC6L0=.59e315e6-9397-48cb-a372-b826a4703231@github.com> > ### Description > > Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", or objects returned from method calls, etc. Please review this Pull Request that improves Reduce Allocation Merge implementation so that it can reduce at least some of these allocation merges. > > Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP. > > The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing. 
>
> ### Benchmarking
>
> **Note:** In some of these tests no reduction happens. I left them in to validate that no perf. regression happens in that case.
> **Note 2:** Margin of error was negligible.
>
> | Benchmark                            | No RAM (ms/op)   | Yes RAM (ms/op)   |
> |--------------------------------------|------------------|-------------------|
> | TestTrapAfterMerge                   | 19.515           | 13.386            |
> | TestArgEscape                        | 33.165           | 33.254            |
> | TestCallTwoSide                      | 70.547           | 69.427            |
> | TestCmpAfterMerge                    | 16.400           | 2.984             |
> | TestCmpMergeWithNull_Second          | 27.204           | 27.293            |
> | TestCmpMergeWithNull                 | 8.248            | 4.920             |
> | TestCondAfterMergeWithAllocate       | 12.890           | 5.252             |
> | TestCondAfterMergeWithNull           | 6.265            | 5.078             |
> | TestCondLoadAfterMerge               | 12.713           | 5.163             |
> | TestConsecutiveSimpleMerge           | 30.863           | 4.068             |
> | TestDoubleIfElseMerge                | 16.069           | 2.444             |
> | TestEscapeInCallAfterMerge           | 23.111           | 22.924            |
> | TestGlobalEscape                     | 14.459           | 14.425            |
> | TestIfElseInLoop                     | 246.061          | 42.786            |
> | TestLoadAfterLoopAlias               | 45.808           | 45.812            |
> | TestLoadAfterTrap                    | 28.370           | ...

Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:

  Refrain from RAM of arrays and Phis controlled by Loop nodes.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/15825/files - new: https://git.openjdk.org/jdk/pull/15825/files/e8e9c13d..257e0447 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15825&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15825&range=01-02 Stats: 7 lines in 1 file changed: 5 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15825.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15825/head:pull/15825 PR: https://git.openjdk.org/jdk/pull/15825 From lmesnik at openjdk.org Fri Oct 6 19:12:06 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 6 Oct 2023 19:12:06 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered [v2] In-Reply-To: References: Message-ID: On Fri, 6 Oct 2023 18:46:42 GMT, Serguei Spitsyn wrote: >> The JVMTI VirtualThreadStart events have to follow the ThreadStart events pattern and so, should not be thread-filtered. >> The fix includes: >> - `jvmti.xml`: remove the attribute `filtered="thread"` in the `VirtualThreadStart` event spec >> - `jvmtiEventController.cpp`: remove the `VTHREAD_START_BIT` from the `THREAD_FILTERED_EVENT_BITS` mask and add it to the `NEED_THREAD_LIFE_EVENTS` mask >> - `jvmtiExport.cpp`: rearrangements in the `JvmtiExport::post_vthread_start()` function >> >> The fix also includes a couple of minor unification tweaks: >> - to align `JvmtiExport::post_thread_end()` with `JvmtiExport::post_vthread_end()` to have a unified check for the `JVMTI_PHASE_PRIMORDIAL`. >> - to rename the local variable `cur_thread` as `thread` to follow the common pattern in `JvmtiExport::post_vthread_start()` and `JvmtiExport::post_vthread_end()` >> >> Testing: ran mach5 tiers 1-6. All tests passed. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: convert check for is_hidden_from_external_view check() into assert Thanks for fixing this.
------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16019#pullrequestreview-1662649766 From hgreule at openjdk.org Fri Oct 6 21:09:35 2023 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 6 Oct 2023 21:09:35 GMT Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 Message-ID: See the bug description for more information. This implementation brings down the time to take a heap dump on the example application in the bug report to <2 seconds on my machine. ------------- Commit messages: - whitespaces - Iterate fields forwards on thread dump Changes: https://git.openjdk.org/jdk/pull/16083/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16083&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8317692 Stats: 110 lines in 2 files changed: 102 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16083.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16083/head:pull/16083 PR: https://git.openjdk.org/jdk/pull/16083 From dlong at openjdk.org Fri Oct 6 22:18:52 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 6 Oct 2023 22:18:52 GMT Subject: RFR: 8314294: Unsafe::allocateMemory and Unsafe::freeMemory are slower than malloc/free In-Reply-To: <-XiDO_BGrHX3qucomFmxb7K1Ye41uhoohx5OHVodQ2E=.0185c0fe-3a61-4052-937f-6f0ee977b8f4@github.com> References: <5BsY-GjOu3yFCKRa2U_JeJS7b3KncWlt5Jeips2mwP8=.ec4f1e3b-1081-48b4-abac-e4ddb241c02e@github.com> <-XiDO_BGrHX3qucomFmxb7K1Ye41uhoohx5OHVodQ2E=.0185c0fe-3a61-4052-937f-6f0ee977b8f4@github.com> Message-ID: On Fri, 6 Oct 2023 13:20:19 GMT, Maurizio Cimadamore wrote: > @mcimadamore setMemory and copyMemory are targeting Java arrays not native memory so they have to be safepoint-aware and so cannot be leaf operations. I think the key requirement is that they not run in the "native" state. If we had a compiler intrinsic then it should be able to run in the in_Java state. 
Offering a safepoint seems optional, depending on how much data is being modified. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15977#issuecomment-1751457450 From cjplummer at openjdk.org Fri Oct 6 23:28:02 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 6 Oct 2023 23:28:02 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered [v2] In-Reply-To: References: Message-ID: On Fri, 6 Oct 2023 18:46:42 GMT, Serguei Spitsyn wrote: >> The JVMTI VirtualThreadStart events have to follow the ThreadStart events pattern and so, should not be thread-filtered. >> The fix includes: >> - `jvmti.xml`: remove the attribute `filtered="thread"` in the `VirtualThreadStart` event spec >> - `jvmtiEventController.cpp`: remove the `VTHREAD_START_BIT` from the `THREAD_FILTERED_EVENT_BITS` mask and add it to the `NEED_THREAD_LIFE_EVENTS` mask >> - `jvmtiExport.cpp`: rearrangements in the `JvmtiExport::post_vthread_start()` function >> >> The fix also includes a couple of minor unification tweaks: >> - to align `JvmtiExport::post_thread_end()` with `JvmtiExport::post_vthread_end()` to have a unified check for the `JVMTI_PHASE_PRIMORDIAL`. >> - to rename the local variable `cur_thread` as `thread` to follow the common pattern in `JvmtiExport::post_vthread_start()` and `JvmtiExport::post_vthread_end()` >> >> Testing: ran mach5 tiers 1-6. All tests passed. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: convert check for is_hidden_from_external_view check() into assert src/hotspot/share/prims/jvmti.xml line 13044: > 13042: > 13043: 13044: id="VirtualThreadStart" const="JVMTI_EVENT_VIRTUAL_THREAD_START" num="87" phase="start" since="21"> Does "filtered" mean that the event can be enabled or disabled on a per thread basis, and therefore by removing this it means the event can only be enabled or disabled globally?
src/hotspot/share/prims/jvmtiExport.cpp line 1581: > 1579: assert(!thread->is_hidden_from_external_view(), "carrier threads can't be hidden"); > 1580: > 1581: // Do not post virtual thread start event for hidden java thread. Why would we ever have a hidden virtual thread? Also, why is this comment here. It is also below, which seems to be the more appropriate location. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16019#discussion_r1349381024 PR Review Comment: https://git.openjdk.org/jdk/pull/16019#discussion_r1349379751 From dlong at openjdk.org Fri Oct 6 23:49:53 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 6 Oct 2023 23:49:53 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object Thinking about this some more, aren't we going to hit this problem every time the monitor is inflated, either because of contention, recursive entry, or running out of lock stack slots? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1751506640 From dlong at openjdk.org Fri Oct 6 23:56:04 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 6 Oct 2023 23:56:04 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object The lock stack has a fixed size, so it cannot contain the complete lock order. Only the C1/C2 BasicObjectLock stack records contain that. In theory we could do sanity checks against the top of that "stack" for C1/C2, but the compilers currently cheat. Instead of passing in the oop for unlock, they grab it from the top of the BasicObjectLock stack, so any sanity check would always pass.
------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1751509158 From dlong at openjdk.org Sat Oct 7 00:03:48 2023 From: dlong at openjdk.org (Dean Long) Date: Sat, 7 Oct 2023 00:03:48 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object I guess all callers of lightweight_unlock already check if the object is inflated, so that should be OK as long as it is guaranteed to stay inflated, right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1751513041 From dlong at openjdk.org Sat Oct 7 00:40:04 2023 From: dlong at openjdk.org (Dean Long) Date: Sat, 7 Oct 2023 00:40:04 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). 
> > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object > I have tried to test on x86 with this patch: > > ```diff > diff --git a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp > index 2154601f2f2..3666d1490fc 100644 > --- a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp > +++ b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp > @@ -863,7 +863,7 @@ void C2_MacroAssembler::fast_unlock(Register objReg, Register boxReg, Register t > jccb (Assembler::notZero, CheckSucc); > // Without cast to int32_t this style of movptr will destroy r10 which is typically obj. > movptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), NULL_WORD); > - jmpb (DONE_LABEL); > + jmp (DONE_LABEL); > > // Try to avoid passing control into the slow_path ... 
> bind (CheckSucc); > diff --git a/src/hotspot/cpu/x86/macroAssembler_x86.cpp b/src/hotspot/cpu/x86/macroAssembler_x86.cpp > index 26135c65418..a95149c2be5 100644 > --- a/src/hotspot/cpu/x86/macroAssembler_x86.cpp > +++ b/src/hotspot/cpu/x86/macroAssembler_x86.cpp > @@ -9836,6 +9836,15 @@ void MacroAssembler::lightweight_unlock(Register obj, Register hdr, Register tmp > assert(hdr == rax, "header must be in rax for cmpxchg"); > assert_different_registers(obj, hdr, tmp); > > + if (UseNewCode) { > + Label tos_ok; > + movl(tmp, Address(r15_thread, JavaThread::lock_stack_top_offset())); > + cmpptr(obj, Address(r15_thread, tmp, Address::times_1, -oopSize)); > + jcc(Assembler::equal, tos_ok); > + STOP("Top of lock-stack does not match the unlocked object"); > + bind(tos_ok); > + } > + > // Mark-word must be lock_mask now, try to swing it back to unlocked_value. > movptr(tmp, hdr); // The expected old value > orptr(tmp, markWord::unlocked_value); > ``` > > The assertion fires in C1 compiled methods and prevents me from getting far enough to run the same test. I don't see x86 callers of lightweight_unlock doing a check for inflation, so there is no guarantee it's on the lock stack. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1751529743 From alanb at openjdk.org Sat Oct 7 06:34:04 2023 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 7 Oct 2023 06:34:04 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered [v2] In-Reply-To: References: Message-ID: On Fri, 6 Oct 2023 23:03:14 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: convert check for is_hidden_from_external_view check() into assert > > src/hotspot/share/prims/jvmti.xml line 13044: > >> 13042: >> 13043: > 13044: id="VirtualThreadStart" const="JVMTI_EVENT_VIRTUAL_THREAD_START" num="87" phase="start" since="21"> > > Does "filtered" mean that the event can be enabled or disabled on a per thread basis, and therefore by removing this it means the event can only be enabled or disabled globally? That's right. The spec for SetEventNotificationMode lists the events that cannot be enabled/disabled at the thread level. Both ThreadStart and VirtualThreadStart are listed, so I view this JBS/PR issue as fixing the implementation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16019#discussion_r1349472883 From dnsimon at openjdk.org Sat Oct 7 09:14:10 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 7 Oct 2023 09:14:10 GMT Subject: RFR: 8317689: [JVMCI] include error message when CreateJavaVM in libgraal fails Message-ID: Creating a new libgraal isolate can fail for a number of reasons. Currently, all that one sees on such a failure is a numeric error code.
For example: 2096 20291 4 java.lang.CharacterData::of (136 bytes) 2096 20291 4 java.lang.CharacterData::of (136 bytes) COMPILE SKIPPED: Error attaching to libjvmci (err: -1000000024) Native Image is being enhanced to return an error message along with an error code by a non-standard `_strerror` argument passed to the `CreateJavaVM` JNI invocation interface function: |---------------|-----------------------------------------------------------------------------------| | _strerror | extraInfo is a "const char**" value. | | | If CreateJavaVM returns non-zero, then extraInfo is assigned a newly malloc'ed | | | 0-terminated C string describing the error if a description is available. | |---------------|-----------------------------------------------------------------------------------| This PR updates JVMCI to take advantage of this Native Image enhancement. This is sample `-XX:+PrintCompilation` output from testing this PR on libgraal: 2096 20291 4 java.lang.CharacterData::of (136 bytes) 2096 20291 4 java.lang.CharacterData::of (136 bytes) COMPILE SKIPPED: Error attaching to libjvmci (err: -1000000024, Image page size is incompatible with run-time page size. Rebuild image with -H:PageSize=[pagesize] to set appropriately.) 
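The `_strerror` contract described above (the caller passes a `const char**` as extraInfo, and on failure receives a malloc'ed, 0-terminated string that it then owns) could be exercised roughly as follows. This is only a sketch under stated assumptions: `ToyVMOption` mimics the shape of a JNI option so the example compiles without jni.h, and `toy_create_vm` and `describe_attach_failure` are hypothetical stand-ins, not the JVMCI or Native Image code. The error text is shortened from the sample output.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Minimal stand-in for a JNI-style option so the sketch needs no jni.h.
struct ToyVMOption {
  const char* optionString;
  void* extraInfo;
};

// Hypothetical stand-in for a failing CreateJavaVM: returns a non-zero code
// and, if a "_strerror" option is present, stores a newly malloc'ed
// 0-terminated message through extraInfo (interpreted as const char**).
int toy_create_vm(ToyVMOption* options, int n_options) {
  const int err = -1000000024;
  const char* text = "Image page size is incompatible with run-time page size.";
  for (int i = 0; i < n_options; i++) {
    if (std::strcmp(options[i].optionString, "_strerror") == 0) {
      char* copy = static_cast<char*>(std::malloc(std::strlen(text) + 1));
      if (copy != nullptr) {
        std::strcpy(copy, text);
        *static_cast<const char**>(options[i].extraInfo) = copy;
      }
    }
  }
  return err;
}

// Caller side: pass the address of a const char*, use the message if one
// was provided, and free it afterwards because the caller owns it.
std::string describe_attach_failure() {
  const char* msg = nullptr;
  ToyVMOption opt = { "_strerror", &msg };
  int err = toy_create_vm(&opt, 1);
  std::string out = "Error attaching to libjvmci (err: " + std::to_string(err);
  if (msg != nullptr) {
    out += ", ";
    out += msg;
    std::free(const_cast<char*>(msg));
  }
  out += ")";
  return out;
}
```

The design point is that the description is optional: a caller that finds `msg` still null simply reports the numeric code, as before.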
------------- Commit messages: - get error message from CreateJavaVM in libjvmci Changes: https://git.openjdk.org/jdk/pull/16086/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16086&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8317689 Stats: 33 lines in 6 files changed: 20 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/16086.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16086/head:pull/16086 PR: https://git.openjdk.org/jdk/pull/16086 From kbarrett at openjdk.org Sat Oct 7 16:26:55 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 7 Oct 2023 16:26:55 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v5] In-Reply-To: References: Message-ID: > Please review this new facility, providing a general mechanism for intrusive > doubly-linked lists. A class supports inclusion in a list by having an > IntrusiveListEntry member, and providing structured information about how to > access that member. A class supports inclusion in multiple lists by having > multiple IntrusiveListEntry members, with different lists specified to use > different members. > > The IntrusiveList class template provides the list management. It is modelled > on bidirectional containers such as std::list and boost::intrusive::list, > providing many of the expected member types and functions. (Note that the > member types use the Standard's naming conventions.) (Not all standard > container requirements are met; some operations are not presently supported > because they haven't been needed yet.) This includes iteration support using > (mostly) standard-conforming iterator types (they are presently missing > iterator_category member types, pending being able to include <iterator> so we > can use std::bidirectional_iterator_tag). > > This change only provides the new facility, and doesn't include any uses of > it. It is intended to replace the 4-5 (or maybe more) competing intrusive > doubly-linked lists presently in HotSpot.
Unlike most (or perhaps all?) of > those alternatives, this proposal provides a suite of unit tests. > > An example of a place that I think might benefit from this is G1's region > handling. There are various places where G1 iterates over all regions in order > to do something with those which satisfy some property (humongous regions, > regions in the collection set, &etc). If it were trivial to create new region > sublists (and this facility makes that easy), some of these could be turned > into direct iteration over only the regions of interest. > > Some specific points to consider when reviewing this proposal: > > (1) This proposal follows Standard Library API conventions, which differ from > HotSpot in various ways. > > (1a) Lists and iterators provide various type members, with names per the > Standard Library. There has been discussion of using some parts of the > Standard Library eventually, in which case this would be important. But for > now some of the naming choices are atypical for HotSpot. > > (1b) Some of the function signatures follow the Standard Library APIs even > though the reasons for that form might not apply to HotSpot. For example, the > list pop operations don't return the removed value. For node-based containers > in Standard Library that would introduce exception...
Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: - fix disposer argument - more erase variants ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15896/files - new: https://git.openjdk.org/jdk/pull/15896/files/0ee7d587..cd98eee5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15896&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15896&range=03-04 Stats: 135 lines in 2 files changed: 126 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/15896.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15896/head:pull/15896 PR: https://git.openjdk.org/jdk/pull/15896 From kbarrett at openjdk.org Sat Oct 7 16:32:06 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 7 Oct 2023 16:32:06 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v5] In-Reply-To: References: Message-ID: On Sat, 7 Oct 2023 16:26:55 GMT, Kim Barrett wrote: >> Please review this new facility, providing a general mechanism for intrusive >> doubly-linked lists. A class supports inclusion in a list by having an >> IntrusiveListEntry member, and providing structured information about how to >> access that member. A class supports inclusion in multiple lists by having >> multiple IntrusiveListEntry members, with different lists specified to use >> different members. >> >> The IntrusiveList class template provides the list management. It is modelled >> on bidirectional containers such as std::list and boost::intrusive::list, >> providing many of the expected member types and functions. (Note that the >> member types use the Standard's naming conventions.) (Not all standard >> container requirements are met; some operations are not presently supported >> because they haven't been needed yet.) 
This includes iteration support using >> (mostly) standard-conforming iterator types (they are presently missing >> iterator_category member types, pending being able to include <iterator> so we >> can use std::bidirectional_iterator_tag). >> >> This change only provides the new facility, and doesn't include any uses of >> it. It is intended to replace the 4-5 (or maybe more) competing intrusive >> doubly-linked lists presently in HotSpot. Unlike most (or perhaps all?) of >> those alternatives, this proposal provides a suite of unit tests. >> >> An example of a place that I think might benefit from this is G1's region >> handling. There are various places where G1 iterates over all regions in order >> to do something with those which satisfy some property (humongous regions, >> regions in the collection set, &etc). If it were trivial to create new region >> sublists (and this facility makes that easy), some of these could be turned >> into direct iteration over only the regions of interest. >> >> Some specific points to consider when reviewing this proposal: >> >> (1) This proposal follows Standard Library API conventions, which differ from >> HotSpot in various ways. >> >> (1a) Lists and iterators provide various type members, with names per the >> Standard Library. There has been discussion of using some parts of the >> Standard Library eventually, in which case this would be important. But for >> now some of the naming choices are atypical for HotSpot. >> >> (1b) Some of the function signatures follow the Standard Library APIs even >> though the reasons for that form might not apply to HotSpot. For example, the >> list pop operations don't return the removed...
> > Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: > > - fix disposer argument > - more erase variants I've pushed a couple more commits: - "entry accessor function" changed from using a pointer-to-data-member to access an object's list entry to using a function. - "more erase variants" adds some more functions for erasing elements that were found useful in conversions. - "fix disposer argument" changed the type of the argument passed to a disposer from const_reference to pointer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15896#issuecomment-1751751084 From jkratochvil at openjdk.org Sat Oct 7 17:55:39 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Sat, 7 Oct 2023 17:55:39 GMT Subject: RFR: 8317697: refactor-encapsulate x86 VM_Version::CpuidInfo Message-ID: In OpenJDK project CRaC I had a [need to fetch new CpuidInfo without affecting the existing one](https://github.com/openjdk/crac/commit/ed4ad9ba31b77732dcede2eb743b2f389ec9a0fe#diff-6ed856c57ddbe33e49883adb7c52ec51ed377e5f697dfd6d8bea505a97bfc5a5R2743). That led me to encapsulate the object more, and I think this no-functionality-change patch is appropriate for the JDK as well. My primary interest is, of course, in reducing the CRaC patchset boilerplate.
------------- Commit messages: - 8317697: refactor-encapsulate x86 VM_Version::CpuidInfo Changes: https://git.openjdk.org/jdk/pull/16093/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16093&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8317697 Stats: 123 lines in 2 files changed: 29 ins; 23 del; 71 mod Patch: https://git.openjdk.org/jdk/pull/16093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16093/head:pull/16093 PR: https://git.openjdk.org/jdk/pull/16093 From kbarrett at openjdk.org Sat Oct 7 20:42:17 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 7 Oct 2023 20:42:17 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v3] In-Reply-To: References: Message-ID: On Wed, 4 Oct 2023 09:57:45 GMT, Kim Barrett wrote: > Regarding the mechanism for accessing the entry of an element, I'm now thinking a function-based mechanism (like NonblockingQueue and LockFreeStack) is better than the pointer-to-data-member mechanism currently used here. The benefit of a function-based mechanism is that it doesn't require the element type to be complete at the point of list declaration. It also avoids the need for an MSVC workaround. Using a function accessor rather than a pointer-to-data-member is more general. I've already mentioned that it permits list declarations with an incomplete element type. It also permits Entry objects being arbitrary subobjects. For example, an object could support being in multiple lists by having an array of Entry objects, with an array index per list. A pointer-to-data-member can't designate an individual element of an array member. (One could also have an Entry base class, which also can't be designated by a pointer-to-data-member, but I doubt that's a realistic use-case.) The pointer-to-data-member approach also seems prone to obscure corner-case compiler bugs (one of which, as noted, requires a workaround in this PR, i.e. it affects a currently in-use compiler).
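To illustrate the two properties claimed for a function accessor (the list type can be named while the element type is still incomplete, and the entry can be an element of an array member), here is a minimal, self-contained sketch. It is not the API proposed in the PR: `ToyEntry`, `ToyList`, `Node`, and the `entry_for_list*` functions are hypothetical names, and the list supports only `push_back` and `size`.

```cpp
#include <cstddef>

// Hypothetical, simplified entry type; not the IntrusiveListEntry from the PR.
struct ToyEntry {
  ToyEntry* prev = nullptr;
  ToyEntry* next = nullptr;
};

struct Node;  // deliberately incomplete at this point

// Accessor functions can be declared while Node is still incomplete.
ToyEntry& entry_for_list0(Node& n);
ToyEntry& entry_for_list1(Node& n);

// A list parameterized by an accessor function rather than a
// pointer-to-data-member.
template <typename T, ToyEntry& (*get_entry)(T&)>
class ToyList {
  ToyEntry _head;         // sentinel of a circular list
  std::size_t _size = 0;
 public:
  ToyList() { _head.prev = _head.next = &_head; }
  ToyList(const ToyList&) = delete;
  ToyList& operator=(const ToyList&) = delete;

  void push_back(T& v) {
    ToyEntry& e = get_entry(v);
    e.prev = _head.prev;
    e.next = &_head;
    _head.prev->next = &e;
    _head.prev = &e;
    ++_size;
  }
  std::size_t size() const { return _size; }
};

// Both list types can be named before Node is defined.
using List0 = ToyList<Node, entry_for_list0>;
using List1 = ToyList<Node, entry_for_list1>;

struct Node {
  int value = 0;
  // One entry per list. An individual array element like entries[1]
  // cannot be designated by a pointer-to-data-member.
  ToyEntry entries[2];
};

ToyEntry& entry_for_list0(Node& n) { return n.entries[0]; }
ToyEntry& entry_for_list1(Node& n) { return n.entries[1]; }
```

A `Node` can then sit in a `List0` and a `List1` at the same time, each list linking through its own array slot.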
------------- PR Comment: https://git.openjdk.org/jdk/pull/15896#issuecomment-1751813462 From kbarrett at openjdk.org Sat Oct 7 20:45:04 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 7 Oct 2023 20:45:04 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v3] In-Reply-To: References: Message-ID: On Wed, 4 Oct 2023 10:46:41 GMT, Thomas Stuefe wrote: > Cursory glance. > > The list hook element costs 16 bytes on 64-bit. Would it be possible to get a single-linked variant? For many (most?) cases, traversing backward or random access deletion is not needed. That's certainly possible, but is a different RFE/PR. IntrusiveForwardList, as a cognate for std::forward_list. > nit, "subobject" confused me a little. Maybe "element" or "member" ? "subobject" (C++14 1.8/2) is the correct term. "member" would have worked, but doesn't with the change to use a function accessor rather than a pointer-to-data-member. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15896#issuecomment-1751813935 PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1349572236 From kbarrett at openjdk.org Sat Oct 7 21:02:14 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 7 Oct 2023 21:02:14 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v3] In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 10:18:18 GMT, Johan Sjölen wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> add IntrusiveListEntry::is_attached() > > src/hotspot/share/utilities/intrusiveList.hpp line 39: > >> 37: class IntrusiveListImpl; >> 38: >> 39: /** > > Most long form comments I see in Hotspot do not use `/** */` and instead use `//`, can the style be changed to this? I used distinct commenting styles intentionally, in an attempt to distinguish between "API documentation" and comments on the implementation. I realize this is relatively novel in HotSpot code.
I'm open to other ideas, but I think that distinction is important here. I don't think it's a good idea to skimp on the API documentation for a utility like this, leaving questions to be answered by reading the code. That leads to people making assumptions based on the current implementation, making it brittle. Of course, someone might still inadvertently depend on some aspect of the current implementation, but if there is clear API documentation then at least a reviewer can call out violations. Also, as several people have said to me privately, there's a lot of template usage here that may not be familiar to all HotSpot devs. I think that makes requiring someone to read the code to figure out how to use it a problem. I'm actively striving to avoid requiring someone to read the code simply to use it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1349573793 From kbarrett at openjdk.org Sat Oct 7 21:07:04 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 7 Oct 2023 21:07:04 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v3] In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 10:24:00 GMT, Johan Sjölen wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> add IntrusiveListEntry::is_attached() > > src/hotspot/share/utilities/intrusiveList.hpp line 63: > >> 61: * in a list are externally managed, rather than being embedded values >> 62: * in the list, the actual type of such objects may be more specific >> 63: * than the list's element type. > > Okay, is there a reason that this shouldn't be true?
I assume that what you're saying is that we can have: > > ```c++ > struct Super { IntrusiveListEntry entry; }; > struct SubA : public Super {}; > struct SubB : public Super {}; > void foo() { > IntrusiveList<Super> my_list; // This my_list may contain SubA, SubB, and Super > } > ``` > > And this seems like it should be true for any reasonable intrusive list in C++. Yes, that works. I mentioned it because I think it's a consequence of intrusive data structures that might not be obvious to someone familiar with things like standard containers or our GrowableArray or the like. (I've not found a term for distinguishing those from the intrusive kind.) > src/hotspot/share/utilities/intrusiveList.hpp line 66: > >> 64: * >> 65: * * T is the class of the elements in the list. Must be a possibly >> 66: * const-qualified class type. > > I don't know what it means to be a 'possibly const-qualified class type'. Perhaps "a class type, possibly const-qualified" would be clearer? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1349574231 PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1349574201 From kbarrett at openjdk.org Sat Oct 7 21:21:06 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 7 Oct 2023 21:21:06 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v3] In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 10:26:59 GMT, Johan Sjölen wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> add IntrusiveListEntry::is_attached() > > src/hotspot/share/utilities/intrusiveList.hpp line 45: > >> 43: * when inserting objects into the list or referencing list objects, >> 44: * and removing an object from a list need not involve destroying the >> 45: * object. > >>As a result, [...] > > We know what an intrusive linked list is, we have at least 5 of them :)! And maybe someday we'll have just (this) one?
I can delete this if you really think it's pointless. > src/hotspot/share/utilities/intrusiveList.hpp line 73: > >> 71: * * has_size determines whether the list has a size() >> 72: * operation, returning the number of elements in the list. If the >> 73: * operation is present, it has constant-time complexity. The default > > Surely that depends on the time complexity of the operation? has_size determines whether the list implementation provides a constant-time size operation or doesn't provide a size operation at all. The size operation for Standard Library containers is always constant-time. Some containers, such as std::forward_list, don't provide a size operation because it wouldn't be constant time. This allows one to write things like for (size_t i = 0; i < c.size(); ++i) ... knowing that you've not added another O(N) through that use of size(). It's an option because some use-cases benefit from having it while it's wasted in others. > src/hotspot/share/utilities/intrusiveList.hpp line 78: > >> 76: * * Base is the base class for the list. This is typically >> 77: * used to specify the allocation class. The default is void, indicating >> 78: * no allocation class for the list. > > What's an allocation class? Referring to the HotSpot allocation classes, like CHeapObj<> and ResourceObj. I should clarify that. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1349575535 PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1349575158 PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1349575302 From kbarrett at openjdk.org Sat Oct 7 21:31:16 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 7 Oct 2023 21:31:16 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v3] In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 10:30:19 GMT, Johan Sjölen wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> add IntrusiveListEntry::is_attached() > > src/hotspot/share/utilities/intrusiveList.hpp line 87: > >> 85: * iterators and access to const-qualified elements. A const object cannot be >> 86: * added to a list whose value type is not const-qualified, as that would be >> 87: * an implicit casting away of the const qualifier. > > Okay, I feel like this can be shortened significantly: > >> A const-qualified type can be part of an IntrusiveList, then you only get const iterators and access to const elements. If you use a non-const type, then you can get both const and non-const iterators and access to elements. You can't add const values to a non-const list, as that would be implicitly casting away the const qualifier. That rewrite isn't correct. You _can_ add const values to a non-const list. In fact, you can only add values (const or not) to a non-const list. A const list has type `const IntrusiveList<T>`, while a non-const list is similar but without the `const` qualifier. What you can't do is add a const value to a (necessarily non-const) list of non-const elements. So if T is an unqualified class type, then you can add const values to an `IntrusiveList<const T>` but not to an `IntrusiveList<T>`.
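The distinction between a const list and a list of const elements can be checked at compile time with a small stand-in. This is only a sketch, not the proposed IntrusiveList: `ToyList`, `Region`, and the `can_push` trait are hypothetical, and the trait assumes C++17 for `std::void_t`.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a list that refers to externally managed objects.
template <typename T>
class ToyList {
  T* _last = nullptr;
 public:
  // Non-const member function: adding requires a non-const list.
  void push_back(T& value) { _last = &value; }
  T* last() const { return _last; }
};

struct Region { int id = 0; };

// Detects whether `list.push_back(arg)` compiles for the given types.
template <typename List, typename Arg, typename = void>
struct can_push : std::false_type {};

template <typename List, typename Arg>
struct can_push<List, Arg,
                std::void_t<decltype(std::declval<List&>().push_back(std::declval<Arg>()))>>
    : std::true_type {};

// A list of const elements accepts both const and non-const objects.
static_assert(can_push<ToyList<const Region>, const Region&>::value, "const value, const elements");
static_assert(can_push<ToyList<const Region>, Region&>::value, "non-const value, const elements");

// A list of non-const elements accepts only non-const objects.
static_assert(can_push<ToyList<Region>, Region&>::value, "non-const value, non-const elements");
static_assert(!can_push<ToyList<Region>, const Region&>::value, "would cast away const");

// A const list accepts nothing, whatever its element type.
static_assert(!can_push<const ToyList<const Region>, Region&>::value, "const list cannot be modified");
```

The fourth assertion is the case the documentation comment describes: binding a const object to `T&` with unqualified `T` would silently drop the qualifier, so the call does not compile.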
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1349576545 From kbarrett at openjdk.org Sat Oct 7 21:36:07 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 7 Oct 2023 21:36:07 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v3] In-Reply-To: References: Message-ID: <4QX65drx9WV5Y_a2lPwWMJvFpxHHF7zJV_mDvdiiNz0=.e0313812-8f7a-4379-9619-afd936e0f974@github.com> On Thu, 5 Oct 2023 10:35:18 GMT, Johan Sjölen wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> add IntrusiveListEntry::is_attached() > > src/hotspot/share/utilities/intrusiveList.hpp line 93: > >> 91: * argument, a const reference to a removed element. This function should >> 92: * "dispose" of the argument object when called, such as by deleting the >> 93: * object. The result of the call is ignored. > > A lot of this information is more easily seen by reading the code. Saying something like: > >>Operations that remove elements from a list take a disposer function as an argument. A disposer should free up the memory associated with the object. > > Shorter documentation is quicker to read and easier to maintain. There are other uses for disposers than freeing memory. @kstefanj found a nice one while prototyping a conversion of FreeRegionList (which didn't work until the recent commit to fix the disposer API). And again, I don't want to require reading the code to just use it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1349576945 From fyang at openjdk.org Mon Oct 9 03:32:28 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 9 Oct 2023 03:32:28 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic [v2] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 16:54:14 GMT, Hamlin Li wrote: >> Only vector version is included in this patch.
>> >> ### Test >> The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > revert adding t3-t6 Hi, some comments from a brief look. Thanks. src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1293: > 1291: // vector pseudo instructions > 1292: // rotate vector register left with shift bits, 32-bit version > 1293: inline void vrol_vwi(VectorRegister vd, uint32_t shift, VectorRegister tmp_vr) { Since this is not a narrowing/widening vector operation, I don't think it's appropriate to use the `_vwi` suffix according to the naming convention of the RVV spec. Maybe rename this as `vrole32_vi` ? Also note that support for vector rotate left/right has been added by RISC-V cryptography extensions which have been ratified recently. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4287: > 4285: __ vrol_vwi(dVec, 16, tmp_vr); > 4286: > 4287: // rev32(dVec, T8H, dVec); Irrelevant comment? src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4312: > 4310: * c_rarg1 - key_stream, the array that will hold the result of the ChaCha20 block function > 4311: */ > 4312: address generate_chacha20Block() { I think we should add some more comments to help understand the code. The code comments in the aarch64 counterpart might be a good reference: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4244 src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4318: > 4316: StubCodeMark mark(this, "StubRoutines", "chacha20Block"); > 4317: address start = __ pc(); > 4318: Should we start a new frame with `enter()` on entry and `leave()` on exit for proper stackwalking of RuntimeStub frame?
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4324: > 4322: const Register key_stream = c_rarg1; > 4323: const Register length = t0; > 4324: const Register tmp_addr = t1; `t0`, as a scratch register, is frequently and sometimes implicitly used/clobbered by various assembler functions. So it's not a good idea to let `length` alias `t0` as it is live across this stub. Maybe swap `t0` and `t1` for `length` and `tmp_addr` respectively. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4328: > 4326: const VectorRegister work_vrs[16] = { > 4327: v4, v5, v6, v7, v16, v17, v18, v19, > 4328: v20, v21, v22, v23, v24, v25, v26, v27 Is there a rule on how those vector registers are chosen here? Can we simply start from `v0`? src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4338: > 4336: // in java level. > 4337: __ mv(avl, 16); > 4338: __ vsetvli(length, avl, Assembler::e32, Assembler::m1); I think we can make use of the `vsetivli` instruction to save the use of register `t2` here. RVV Spec: `For the vsetivli instruction, the AVL is encoded as a 5-bit zero-extended immediate (0 - 31) in the rs1 field` src/hotspot/cpu/riscv/vm_version_riscv.cpp line 257: > 255: } else if (UseChaCha20Intrinsics) { > 256: if (!FLAG_IS_DEFAULT(UseChaCha20Intrinsics)) { > 257: warning("Chacha20 Intrinsics requires RVV instructions (not available on this CPU)"); Suggestion: s/Intrinsics/intrinsic/ ------------- PR Review: https://git.openjdk.org/jdk/pull/15899#pullrequestreview-1650751154 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1342042824 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1342131300 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1349837665 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1342129509 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1349836272 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1342130428 PR Review Comment:
https://git.openjdk.org/jdk/pull/15899#discussion_r1342043706 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1341347405 From svkamath at openjdk.org Mon Oct 9 04:58:10 2023 From: svkamath at openjdk.org (Smita Kamath) Date: Mon, 9 Oct 2023 04:58:10 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v4] In-Reply-To: References: Message-ID: > Hi All, > I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations. > > Below are the performance numbers on my desktop system with -XX:UseAVX=2 option: > > |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup > |-------------|------------|---------------|------------------|-----------| > |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 > full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 > small.AESGCMBench.decrypt | 8192 | 527854.353 |663131.48 | 1.25 > small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 |1.24 > full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 > full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 |539235.462 | 1.00 > small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 |298913.629 | 0.99 > small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 > ? | ? | ? | ? | ? > full.AESGCMBench.decrypt | 16384 | 307266.364 |390397.778 | 1.27 > full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 > small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 |1.27 > small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 > full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 > full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 > small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 |181019.205 | 1.12 > small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 > ? | ? | ? 
| ? | ? > full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 |1.31 > full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 > small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 > small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 |1.28 > full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 |1.12 > full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 |211701.969 | 1.27 > small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 > small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 |212085.047 | 1.30 > ? | ? | ? | ? | ? > full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 > full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 |1.31 > small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 |1.32 > small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 > full.AESGCMBench.decryptMultiPart | 65536 | 42649.816 | 47591.587 |1.11 > full.AESGCMBe... Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Updated code as per review comments, added new comments for every method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15410/files - new: https://git.openjdk.org/jdk/pull/15410/files/c92f98ab..77fd61b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15410&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15410&range=02-03 Stats: 132 lines in 2 files changed: 68 ins; 26 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/15410.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15410/head:pull/15410 PR: https://git.openjdk.org/jdk/pull/15410 From svkamath at openjdk.org Mon Oct 9 05:01:18 2023 From: svkamath at openjdk.org (Smita Kamath) Date: Mon, 9 Oct 2023 05:01:18 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v4] In-Reply-To: References: Message-ID: On Mon, 9 Oct 2023 04:58:10 GMT, Smita Kamath wrote: >> Hi All, >> I would 
like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option: >> >> |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup >> |-------------|------------|---------------|------------------|-----------| >> |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 >> full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 >> small.AESGCMBench.decrypt | 8192 | 527854.353 |663131.48 | 1.25 >> small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 |1.24 >> full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 >> full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 |539235.462 | 1.00 >> small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 |298913.629 | 0.99 >> small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 16384 | 307266.364 |390397.778 | 1.27 >> full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 >> small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 |1.27 >> small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 >> full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 >> full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 >> small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 |181019.205 | 1.12 >> small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 >> ? | ? | ? | ? | ? 
>> full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 |1.31 >> full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 >> small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 >> small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 |1.28 >> full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 |1.12 >> full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 |211701.969 | 1.27 >> small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 >> small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 |212085.047 | 1.30 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 >> full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 |1.31 >> small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 |1.32 >> small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 >> full.AES... > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated code as per review comments, added new comments for every method @djelinski, @sviswa7, Thanks for your comments. I have addressed them in the latest commits. Kindly take a look and let me know your thoughts. Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15410#issuecomment-1752347949 From kbarrett at openjdk.org Mon Oct 9 06:25:20 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 9 Oct 2023 06:25:20 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v3] In-Reply-To: References: Message-ID: On Sat, 7 Oct 2023 21:28:36 GMT, Kim Barrett wrote: >> src/hotspot/share/utilities/intrusiveList.hpp line 87: >> >>> 85: * iterators and access to const-qualified elements. A const object cannot be >>> 86: * added to a list whose value type is not const-qualified, as that would be >>> 87: * an implicit casting away of the const qualifier. 
>> >> Okay, I feel like this can be shortened signifcantly: >> >>> A const-qualified type can be part of an IntrusiveList, then you only get const iterators and access to const elements. If you use a non-const type, then you can get both const and non-const iterators and access to elements. You can't add const values to a non-const list, as that would be implicitly casting away the const qualifier. > > That rewrite isn't correct. > > You _can_ add const values to a non-const list. In fact, you can only add > values (const or not) to a non-const list. A const list has type `const > IntrusiveList`, while a non-const list is similar but without the > `const` qualifier. > > What you can't do is add a const value to a (necessarily non-const) list of > non-const elements. So if T is an unqualified class type, then you can add > const values to an `IntrusiveList` but not to an > `IntrusiveList`. Is something like this more clear? In my experience, non-intrusive collections with const-qualified elements are uncommon, so I found the implications here not immediately obvious. * A const iterator has a const-qualified element type, and provides const * access to the elements of the associated list. A non-const iterator has an * unqualified element type, and provides mutable element access. A non-const * iterator is implicitly convertible to a corresponding const iterator. * * A const list provides const iterators and access to const-qualified * elements, and cannot be used to modify the sequence of elements. Only a * non-const list can be used to modify the sequence of elements. * * A list can have a const-qualified element type, providing const iterators * and access to const-qualified elements. A const object cannot be added to * a list with an unqualified element type, as that wuold be an implicit * casting away of the const qualifier. 
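
The const rules described in the proposed comment can be illustrated with a small sketch. This is not the `IntrusiveList` from the PR: `MiniEntry`, `MiniList`, `Node` and `can_push` are hypothetical names invented for the example, and the list is a stripped-down singly-linked stand-in. It assumes the link field is `mutable`, so that a const object can still be linked into a list.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

template<typename T> class MiniList;

// Hypothetical link entry. The pointer is mutable so that a const element
// can still be linked (an assumption made for this sketch).
class MiniEntry {
  template<typename T> friend class MiniList;
  mutable const void* _next = nullptr;
};

struct Node {
  int value;
  MiniEntry entry;
  explicit Node(int v) : value(v) {}
};

// T is a class type, possibly const-qualified.
template<typename T>
class MiniList {
  static_assert(std::is_class<T>::value, "T must be a class type");
  T* _head = nullptr;
public:
  // Only a non-const list can modify the sequence of elements.
  void push_front(T& v) {
    v.entry._next = _head;
    _head = &v;
  }
  T& front() { return *_head; }              // mutable access only if T is unqualified
  const T& front() const { return *_head; }  // a const list gives const access
  size_t length() const {
    size_t n = 0;
    for (const T* p = _head; p != nullptr;
         p = static_cast<const T*>(p->entry._next)) {
      ++n;
    }
    return n;
  }
};

// Detects whether list.push_front(arg) compiles.
template<typename List, typename Arg, typename = void>
struct can_push : std::false_type {};

template<typename List, typename Arg>
struct can_push<List, Arg,
                decltype(std::declval<List&>().push_front(std::declval<Arg>()))>
  : std::true_type {};

// A list of const elements accepts both const and non-const objects.
static_assert(can_push<MiniList<const Node>, const Node&>::value, "");
static_assert(can_push<MiniList<const Node>, Node&>::value, "");
// A list of unqualified elements rejects a const object.
static_assert(can_push<MiniList<Node>, Node&>::value, "");
static_assert(!can_push<MiniList<Node>, const Node&>::value, "");
// A const list cannot be modified at all.
static_assert(!can_push<const MiniList<const Node>, const Node&>::value, "");

// Element access follows the element type and the constness of the list.
static_assert(std::is_same<decltype(std::declval<MiniList<Node>&>().front()),
                           Node&>::value, "");
static_assert(std::is_same<decltype(std::declval<MiniList<const Node>&>().front()),
                           const Node&>::value, "");
static_assert(std::is_same<decltype(std::declval<const MiniList<Node>&>().front()),
                           const Node&>::value, "");
```

The `static_assert`s encode the three statements at issue: values can only be added to a non-const list, a const value can be added to a list whose element type is const-qualified, and a const value cannot be added to a list whose element type is unqualified.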
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1349894749 From duke at openjdk.org Mon Oct 9 06:52:51 2023 From: duke at openjdk.org (Liming Liu) Date: Mon, 9 Oct 2023 06:52:51 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v2] In-Reply-To: References: Message-ID: > As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). > > Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: > > > > > > > > > > > >
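> | Kernel | -XX:-TransparentHugePages, Unpatched (s) | -XX:-TransparentHugePages, Patched (s) | -XX:+TransparentHugePages, Unpatched (s) | -XX:+TransparentHugePages, Patched (s) |
> |--------|------|------|------|------|
> | 4.18 | 11.30 | 11.30 | 0.25 | 0.25 |
> | 5.13 | 0.22 | 0.22 | 3.42 | 3.42 |
> | 6.1 | 0.27 | 0.33 | 3.54 | 0.33 |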
Liming Liu has updated the pull request incrementally with six additional commits since the last revision: - Use pointer_delta to calculate the distance - Add a sanity check for MADV_POPULATE_WRITE - Fix grammar issues - Add assertions in pretouch_memory_common - Cuddle ptr-operators with types to meet HotSpot style - Remove the Unnecessary line continue character ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15781/files - new: https://git.openjdk.org/jdk/pull/15781/files/1d1c8349..51533b9a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=00-01 Stats: 20 lines in 7 files changed: 7 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/15781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15781/head:pull/15781 PR: https://git.openjdk.org/jdk/pull/15781 From duke at openjdk.org Mon Oct 9 07:00:06 2023 From: duke at openjdk.org (Liming Liu) Date: Mon, 9 Oct 2023 07:00:06 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v3] In-Reply-To: References: Message-ID: <1yvPQl57ipfn7sd_gi1pN1y731drqlMm7WPrzrtQyww=.97958bba-b0d6-460e-a974-bf89314aa733@github.com> > As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). > > Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: > > > > > > > > > > > >
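> | Kernel | -XX:-TransparentHugePages, Unpatched (s) | -XX:-TransparentHugePages, Patched (s) | -XX:+TransparentHugePages, Unpatched (s) | -XX:+TransparentHugePages, Patched (s) |
> |--------|------|------|------|------|
> | 4.18 | 11.30 | 11.30 | 0.25 | 0.25 |
> | 5.13 | 0.22 | 0.22 | 3.42 | 3.42 |
> | 6.1 | 0.27 | 0.33 | 3.54 | 0.33 |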
Liming Liu has updated the pull request incrementally with one additional commit since the last revision: Cuddle ptr-operators in pretouch_memory_common ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15781/files - new: https://git.openjdk.org/jdk/pull/15781/files/51533b9a..b265cdfd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15781/head:pull/15781 PR: https://git.openjdk.org/jdk/pull/15781 From duke at openjdk.org Mon Oct 9 07:26:14 2023 From: duke at openjdk.org (Liming Liu) Date: Mon, 9 Oct 2023 07:26:14 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v3] In-Reply-To: <1yvPQl57ipfn7sd_gi1pN1y731drqlMm7WPrzrtQyww=.97958bba-b0d6-460e-a974-bf89314aa733@github.com> References: <1yvPQl57ipfn7sd_gi1pN1y731drqlMm7WPrzrtQyww=.97958bba-b0d6-460e-a974-bf89314aa733@github.com> Message-ID: On Mon, 9 Oct 2023 07:00:06 GMT, Liming Liu wrote: >> As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). >> >> Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: >> >> >> >> >> >> >> >> >> >> >> >>
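>> | Kernel | -XX:-TransparentHugePages, Unpatched (s) | -XX:-TransparentHugePages, Patched (s) | -XX:+TransparentHugePages, Unpatched (s) | -XX:+TransparentHugePages, Patched (s) |
>> |--------|------|------|------|------|
>> | 4.18 | 11.30 | 11.30 | 0.25 | 0.25 |
>> | 5.13 | 0.22 | 0.22 | 3.42 | 3.42 |
>> | 6.1 | 0.27 | 0.33 | 3.54 | 0.33 |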
> > Liming Liu has updated the pull request incrementally with one additional commit since the last revision: > > Cuddle ptr-operators in pretouch_memory_common > Side note, does anyone know why we pretouch memory for _explicit_ large pages? I would have thought that memory is already online and as "live" as it can get once it is mmapped. `UseTransparentHugePages` just gives kernel advice to use transparent huge pages. It is not regular huge pages that need to be allocated explicitly through /sys/kernel/mm/hugepages. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15781#issuecomment-1752467996 From kbarrett at openjdk.org Mon Oct 9 07:53:30 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 9 Oct 2023 07:53:30 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v3] In-Reply-To: References: <0oZ06FXPfZ6SEJBdPRrUEiW3tqR7hTWkXUeXdzqVyNo=.e782f692-9152-44db-9847-7292e4afa7a0@github.com> Message-ID: <_iyaqt-pOBTlpAhEqlJqaE2wN9rohTwLgXDzZE3g2kA=.0d7da7d7-148a-49eb-8877-c92d74c124c7@github.com> On Wed, 4 Oct 2023 14:01:36 GMT, Thomas Stuefe wrote: >> src/hotspot/share/gc/shared/pretouchTask.cpp line 75: >> >>> 73: // initially always use small pages. >>> 74: page_size = UseTransparentHugePages ? (size_t)os::vm_page_size() : page_size; >>> 75: #endif >> >> I never liked this, so happy to see it gone. > > It was also the wrong place for this fix since it left out naked calls to os::pretouch_memory. ZGC, and I think Shenandoah, have special handling of UseTransparentHugePages, arranging to similarly call os::pretouch_memory with small page sizes when that option is true. It might be that this change enables some further simplifications (that should perhaps be done as followups rather than added to this). 
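
For readers following the pretouch discussion, here is a minimal sketch of the general idea: ask the kernel to populate a range with `madvise(MADV_POPULATE_WRITE)` and fall back to touching one byte per page when the call is not supported. It is not the code from the PR. `pretouch_range` is a hypothetical helper, the fallback `#define` assumes the value 23 from the generic Linux uapi headers, and the atomic add of zero uses the GCC/Clang `__atomic_fetch_add` builtin rather than HotSpot's `Atomic` class.

```cpp
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Older libc headers may lack the constant. 23 is the value in the generic
// Linux uapi headers (assumption: the target architecture uses that value).
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

// Hypothetical helper, not a HotSpot function.
// Pretouches [start, start + bytes). 'start' must be page-aligned.
// Returns true if the kernel populated the range via madvise, false if the
// per-page fallback was used instead.
bool pretouch_range(void* start, size_t bytes, size_t page_size) {
  if (bytes == 0) {
    return true;
  }
  // Supported since Linux 5.14: prefault the pages writable in one call.
  if (madvise(start, bytes, MADV_POPULATE_WRITE) == 0) {
    return true;
  }
  // Fallback (e.g. EINVAL on older kernels): touch one byte per page.
  // Adding 0 atomically leaves the contents unchanged even if another
  // thread is writing to the same memory concurrently.
  char* p = static_cast<char*>(start);
  for (size_t off = 0; off < bytes; off += page_size) {
    __atomic_fetch_add(p + off, 0, __ATOMIC_RELAXED);
  }
  return false;
}
```

A caller would typically pass the result of `sysconf(_SC_PAGESIZE)` as `page_size`. With transparent huge pages the fallback loop still touches the range at small-page granularity, which is the situation the thread describes as causing the kernel to assemble huge pages afterwards.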
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1349961302 From duke at openjdk.org Mon Oct 9 08:00:16 2023 From: duke at openjdk.org (Liming Liu) Date: Mon, 9 Oct 2023 08:00:16 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v3] In-Reply-To: <0oZ06FXPfZ6SEJBdPRrUEiW3tqR7hTWkXUeXdzqVyNo=.e782f692-9152-44db-9847-7292e4afa7a0@github.com> References: <0oZ06FXPfZ6SEJBdPRrUEiW3tqR7hTWkXUeXdzqVyNo=.e782f692-9152-44db-9847-7292e4afa7a0@github.com> Message-ID: On Fri, 29 Sep 2023 07:08:19 GMT, Kim Barrett wrote: > PretouchTask attempts to parallelize the pretouching. How well does that work with the use of MADV_POPULATE_WRITE? I tested it on 64c aarch64 machines with 24GB heaps, 64 gc threads and kernel 6.1, and the startup time of JVM was changed from 0.27s to 0.33s when transparent huge pages (THP) were disabled, while the startup time was reduced from 3.54s to 0.33s when THP were used. The point of the use of MADV_POPULATE_WRITE is to avoid kernel from copying small pages around to form a huge page, and this behavior was cause by https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3917c80280c9 ------------- PR Comment: https://git.openjdk.org/jdk/pull/15781#issuecomment-1752510098 From tschatzl at openjdk.org Mon Oct 9 08:28:32 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 9 Oct 2023 08:28:32 GMT Subject: RFR: 8317440: Lock rank checking fails when code root set is modified with the Servicelock held after JDK-8315503 In-Reply-To: References: Message-ID: On Fri, 6 Oct 2023 11:49:21 GMT, Coleen Phillimore wrote: >> Hi all, >> >> please review this change that fixes lock ranking after recent changes to the code root set, now using a CHT. >> >> The issue came up because the lock rank of the CHT lock has been larger than the rank of the Servicethread_lock where it is possible that code roots can be added. 
>> >> The suggested solution is to fix up the lock rankings to work; actually this PR contains two variants: >> 1) one that statically sets the lock ranks of the CHT lock (and the ThreadSMR_lock that can be used during CHT operation) to something smaller than Servicethread_lock. >> 2) one that allows setting of the CHT lock rank via parameter as well (the last commit changed the code to variant 1). >> >> The other lock ranking changes to Metaspace_lock and ContinuationRelativize_lock are simply undos of the respective changes in [JDK-8315503](https://bugs.openjdk.org/browse/JDK-8315503). >> >> Testing: tier1-8 for variant 2), tier 1-7 for variant 1) >> >> Thanks, >> Thomas > > Variant 1 seems ok. Uses of the CHT shouldn't take locks, so having a low lock ranking for CHT lock seems like it'll be fine (I can't find where it takes the ThreadsSMRDelete_lock). If any of this breaks, we can try approach #2 next. Thanks @coleenp @albertnetymk for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/16062#issuecomment-1752548969 From tschatzl at openjdk.org Mon Oct 9 08:31:41 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 9 Oct 2023 08:31:41 GMT Subject: Integrated: 8317440: Lock rank checking fails when code root set is modified with the Servicelock held after JDK-8315503 In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 17:19:35 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that fixes lock ranking after recent changes to the code root set, now using a CHT. > > The issue came up because the lock rank of the CHT lock has been larger than the rank of the Servicethread_lock where it is possible that code roots can be added. > > The suggested solution is to fix up the lock rankings to work; actually this PR contains two variants: > 1) one that statically sets the lock ranks of the CHT lock (and the ThreadSMR_lock that can be used during CHT operation) to something smaller than Servicethread_lock. 
> 2) one that allows setting of the CHT lock rank via parameter as well (the last commit changed the code to variant 1). > > The other lock ranking changes to Metaspace_lock and ContinuationRelativize_lock are simply undos of the respective changes in [JDK-8315503](https://bugs.openjdk.org/browse/JDK-8315503). > > Testing: tier1-8 for variant 2), tier 1-7 for variant 1) > > Thanks, > Thomas This pull request has now been integrated. Changeset: 0cf1a558 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/0cf1a558bacf18d9fc41e43fb5e9eba39dc51f2e Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod 8317440: Lock rank checking fails when code root set is modified with the Servicelock held after JDK-8315503 Reviewed-by: coleenp, ayang ------------- PR: https://git.openjdk.org/jdk/pull/16062 From kbarrett at openjdk.org Mon Oct 9 09:23:09 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 9 Oct 2023 09:23:09 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v5] In-Reply-To: References: Message-ID: On Sat, 7 Oct 2023 16:26:55 GMT, Kim Barrett wrote: >> Please review this new facility, providing a general mechanism for intrusive >> doubly-linked lists. A class supports inclusion in a list by having an >> IntrusiveListEntry member, and providing structured information about how to >> access that member. A class supports inclusion in multiple lists by having >> multiple IntrusiveListEntry members, with different lists specified to use >> different members. >> >> The IntrusiveList class template provides the list management. It is modelled >> on bidirectional containers such as std::list and boost::intrusive::list, >> providing many of the expected member types and functions. (Note that the >> member types use the Standard's naming conventions.) (Not all standard >> container requirements are met; some operations are not presently supported >> because they haven't been needed yet.) 
This includes iteration support using >> (mostly) standard-conforming iterator types (they are presently missing >> iterator_category member types, pending being able to include so we >> can use std::bidirectional_iterator_tag). >> >> This change only provides the new facility, and doesn't include any uses of >> it. It is intended to replace the 4-5 (or maybe more) competing intrusive >> doubly-linked lists presently in HotSpot. Unlike most (or perhaps all?) of >> those alterantives, this proposal provides a suite of unit tests. >> >> An example of a place that I think might benefit from this is G1's region >> handling. There are various places where G1 iterates over all regions in order >> to do something with those which satisfy some property (humongous regions, >> regions in the collection set, &etc). If it were trivial to create new region >> sublists (and this facility makes that easy), some of these could be turned >> into direct iteration over only the regions of interest. >> >> Some specific points to consider when reviewing this proposal: >> >> (1) This proposal follows Standard Library API conventions, which differ from >> HotSpot in various ways. >> >> (1a) Lists and iterators provide various type members, with names per the >> Standard Library. There has been discussion of using some parts of the >> Standard Library eventually, in which case this would be important. But for >> now some of the naming choices are atypical for HotSpot. >> >> (1b) Some of the function signatures follow the Standard Library APIs even >> though the reasons for that form might not apply to HotSpot. For example, the >> list pop operations don't return the removed... 
> > Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: > > - fix disposer argument > - more erase variants src/hotspot/share/utilities/intrusiveList.hpp line 365: > 363: IntrusiveListImpl::IteratorImpl> > 364: { > 365: using const_reference = std::add_lvalue_reference_t>; This should use ListTraits. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1350038592 From stuefe at openjdk.org Mon Oct 9 10:08:14 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 9 Oct 2023 10:08:14 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v3] In-Reply-To: References: <0oZ06FXPfZ6SEJBdPRrUEiW3tqR7hTWkXUeXdzqVyNo=.e782f692-9152-44db-9847-7292e4afa7a0@github.com> Message-ID: <9nNqQxEdadu-FqHpiozaslDiY2qRWpBkyTA8lj2KWHA=.05de49c5-8b62-4ab5-bad0-04a29333864c@github.com> On Mon, 9 Oct 2023 07:57:04 GMT, Liming Liu wrote: > > PretouchTask attempts to parallelize the pretouching. How well does that work with the use of MADV_POPULATE_WRITE? > > I tested it on 64c aarch64 machines with 24GB heaps, 64 gc threads and kernel 6.1, and the startup time of JVM was changed from 0.27s to 0.33s when transparent huge pages (THP) were disabled, while the startup time was reduced from 3.54s to 0.33s when THP were used. The point of the use of MADV_POPULATE_WRITE is to avoid kernel from copying small pages around to form a huge page, and this behavior was cause by https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3917c80280c9 I would only use the system call for pre-touching if we actually use THPs. Since THPs can be unconditionally enabled without user setting a flag, I'd do: if (HugePages::thp_mode() == THPMode::always || UseTransparentHugePages) { madvise... } else { use old method } Note: if THPs are disabled on the system, VM sets UseTransparentHugePages to false. 
So we have to deal with just 2 states: THPs always mode, UTHP can be 1 or 0 THPs madvise mode, UTHP must be 1, otherwise VM does not use THPs ------------- PR Comment: https://git.openjdk.org/jdk/pull/15781#issuecomment-1752654923 From jsjolen at openjdk.org Mon Oct 9 12:30:10 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 9 Oct 2023 12:30:10 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v3] In-Reply-To: References: Message-ID: On Sat, 7 Oct 2023 21:03:49 GMT, Kim Barrett wrote: >> src/hotspot/share/utilities/intrusiveList.hpp line 66: >> >>> 64: * >>> 65: * * T is the class of the elements in the list. Must be a possibly >>> 66: * const-qualified class type. >> >> I don't know what it means to be a 'possibly const-qualified class type'. > > Perhaps "a class type, possibly const-qualified" would be clearer? Aha, yes! That's much clearer to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1350241687 From mdoerr at openjdk.org Mon Oct 9 12:43:00 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 9 Oct 2023 12:43:00 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. 
> - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object Valid point. Thanks! I've got it to work: diff --git a/src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp b/src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp index 78361a305ae..4571036477e 100644 --- a/src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp +++ b/src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp @@ -135,6 +135,8 @@ void C1_MacroAssembler::unlock_object(Register hdr, Register obj, Register disp_ if (LockingMode == LM_LIGHTWEIGHT) { movptr(disp_hdr, Address(obj, hdr_offset)); + testptr(disp_hdr, markWord::monitor_value); + jcc(Assembler::notEqual, slow_case); andptr(disp_hdr, ~(int32_t)markWord::lock_mask_in_place); lightweight_unlock(obj, disp_hdr, hdr, slow_case); } else if (LockingMode == LM_LEGACY) { diff --git a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp index 2154601f2f2..3666d1490fc 100644 --- a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp +++ b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp @@ -863,7 +863,7 @@ void C2_MacroAssembler::fast_unlock(Register objReg, Register boxReg, Register t jccb (Assembler::notZero, CheckSucc); // Without cast to int32_t this style of movptr will destroy r10 which is typically obj. movptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), NULL_WORD); - jmpb (DONE_LABEL); + jmp (DONE_LABEL); // Try to avoid passing control into the slow_path ... 
bind (CheckSucc); diff --git a/src/hotspot/cpu/x86/macroAssembler_x86.cpp b/src/hotspot/cpu/x86/macroAssembler_x86.cpp index 26135c65418..0b3af526b32 100644 --- a/src/hotspot/cpu/x86/macroAssembler_x86.cpp +++ b/src/hotspot/cpu/x86/macroAssembler_x86.cpp @@ -9836,6 +9836,15 @@ void MacroAssembler::lightweight_unlock(Register obj, Register hdr, Register tmp assert(hdr == rax, "header must be in rax for cmpxchg"); assert_different_registers(obj, hdr, tmp); + { + Label tos_ok; + movl(tmp, Address(r15_thread, JavaThread::lock_stack_top_offset())); + cmpptr(obj, Address(r15_thread, tmp, Address::times_1, -oopSize)); + jcc(Assembler::equal, tos_ok); + STOP("Top of lock-stack does not match the unlocked object"); + bind(tos_ok); + } + // Mark-word must be lock_mask now, try to swing it back to unlocked_value. movptr(tmp, hdr); // The expected old value orptr(tmp, markWord::unlocked_value); The nsk/jdi/StepEvent tests have passed on my x86_64 machine. I couldn't reproduce the issue on that platform. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1752936786 From ayang at openjdk.org Mon Oct 9 13:06:38 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 9 Oct 2023 13:06:38 GMT Subject: RFR: 8317730: Change byte_size to return size_t Message-ID: Simple signature update to `byte_size` to match expectation from callers. ------------- Commit messages: - byte-size Changes: https://git.openjdk.org/jdk/pull/16100/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16100&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8317730 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16100.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16100/head:pull/16100 PR: https://git.openjdk.org/jdk/pull/16100 From dcubed at openjdk.org Mon Oct 9 13:20:00 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Mon, 9 Oct 2023 13:20:00 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v3] In-Reply-To: References: Message-ID: <1Eb95C9_AH5UD6ldaBIIywbyR7hj1LG12xmEiQ7_wLM=.e85fbdf8-4f72-440e-bec7-b8bb691abe6c@github.com> On Mon, 9 Oct 2023 06:22:48 GMT, Kim Barrett wrote: >> That rewrite isn't correct. >> >> You _can_ add const values to a non-const list. In fact, you can only add >> values (const or not) to a non-const list. A const list has type `const >> IntrusiveList`, while a non-const list is similar but without the >> `const` qualifier. >> >> What you can't do is add a const value to a (necessarily non-const) list of >> non-const elements. So if T is an unqualified class type, then you can add >> const values to an `IntrusiveList` but not to an >> `IntrusiveList`. > > Is something like this more clear? In my experience, non-intrusive collections > with const-qualified elements are uncommon, so I found the implications here > not immediately obvious. > > > * A const iterator has a const-qualified element type, and provides const > * access to the elements of the associated list. A non-const iterator has an > * unqualified element type, and provides mutable element access. A non-const > * iterator is implicitly convertible to a corresponding const iterator. > * > * A const list provides const iterators and access to const-qualified > * elements, and cannot be used to modify the sequence of elements. Only a > * non-const list can be used to modify the sequence of elements. > * > * A list can have a const-qualified element type, providing const iterators > * and access to const-qualified elements. A const object cannot be added to > * a list with an unqualified element type, as that wuold be an implicit > * casting away of the const qualifier. 
Nit typo: s/wuold/would/ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1350296773 From coleenp at openjdk.org Mon Oct 9 14:09:02 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Oct 2023 14:09:02 GMT Subject: RFR: 8317730: Change byte_size to return size_t In-Reply-To: References: Message-ID: On Mon, 9 Oct 2023 12:57:34 GMT, Albert Mingkun Yang wrote: > Simple signature update to `byte_size` to match expectation from callers. This looks good. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16100#pullrequestreview-1664533639 From mli at openjdk.org Mon Oct 9 14:55:42 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 9 Oct 2023 14:55:42 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic [v3] In-Reply-To: References: Message-ID: <-SH1KoNgpnXy3nRtsnVHW-EaZtRKytDCPwS53-ngwxM=.50ee506e-e680-421d-8a7a-9adca26f980a@github.com> > Only vector version is included in this patch. 
> > ### Test > The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Ajust code according round 3 reviewing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15899/files - new: https://git.openjdk.org/jdk/pull/15899/files/fc19cb23..3b42ce13 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15899&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15899&range=01-02 Stats: 43 lines in 3 files changed: 23 ins; 4 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/15899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15899/head:pull/15899 PR: https://git.openjdk.org/jdk/pull/15899 From mli at openjdk.org Mon Oct 9 14:56:11 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 9 Oct 2023 14:56:11 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic [v2] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 16:54:14 GMT, Hamlin Li wrote: >> Only vector version is included in this patch. >> >> ### Test >> The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > revert adding t3-t6 Hi Yang Fei, Thanks for the detailed reviewing! I have modified the code as you suggested, except of refining the code with `vector rotate left/right`, I will address this refinement in the subsequent pr soon. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15899#issuecomment-1753161965 From rrich at openjdk.org Mon Oct 9 15:34:05 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 9 Oct 2023 15:34:05 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v15] In-Reply-To: References: Message-ID: On Fri, 6 Oct 2023 12:15:40 GMT, Albert Mingkun Yang wrote: > I find pre-processing card-table removes much complexity in determining which (part of) obj belongs to current stripe. However, synchronizing with actual scavenging introduce some complexity. The complexity for synchronization is not too bad though. Also it only comes from overlapping card table preprocessing with scavenging. I think this could be removed again without loosing performance. > The fact that `find_first_clean_card` copies the cached-obj-start is easy to miss Yes, it is easy to miss. I thought it was a minor detail anyway. > and hard to reason IMO. It could be passed by reference if the query in `process_range` would be pulled up before the `find_first_clean_card` call. Let me know if you think that was better. > > we would have a read only copy of the card table only for the current stripe. > > It would still require pre-processing card-table, right? Otherwise, I don't see how one can work around the "interference" across stripes. Maybe this can simplify the impl of `find_first_clean_card`. That's correct. The implementation should be straight forward. I think I'll experiment with it. > > I am not too concerned about the regression observed for "large (32K) non-array instances", because that pattern is not common in java and the pause-time is still reasonable (<100ms). Agreed. > The long-term optimization (or the redemption of the extra-mem-requirement) I have in mind is to use 1 bit (instead of 1 byte) for a card -- Parallel requires only a boolean info for a particular card. 
One can even pre-alloc two card-tables now that each card-table is 1/8 of its original size, to avoid calling malloc inside young-gc-pause. > > My preference is some simple code without much regression. Ofc, this is quite subjective. Sure. My first preference would be that the change can be backported. We were discussing internally if the increased memory consumption could be an issue. Since environments that are sensitive to this either configure serial or g1 we thought it could be ok. At least from our point of view. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1753232330 From rrich at openjdk.org Mon Oct 9 15:42:02 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 9 Oct 2023 15:42:02 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v16] In-Reply-To: References: Message-ID: <7MCfxKnwPdEgJ_bTJ6T-WGBaiUTm3v_zNuED3OdbNP0=.beb49d29-ad0e-40ab-a0f8-0fff5373dbc4@github.com> > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straightforward extension of the scheme implemented in `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe.
> > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea... Richard Reingruber has updated the pull request incrementally with two additional commits since the last revision: - find_first_clean_card: return end_card if final object extends beyond it.
- Cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/d845e650..272ab97b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=14-15 Stats: 6 lines in 1 file changed: 3 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From kbarrett at openjdk.org Mon Oct 9 18:13:00 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 9 Oct 2023 18:13:00 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v5] In-Reply-To: References: Message-ID: On Mon, 9 Oct 2023 09:07:11 GMT, Kim Barrett wrote: >> Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: >> >> - fix disposer argument >> - more erase variants > > src/hotspot/share/utilities/intrusiveList.hpp line 365: > >> 363: IntrusiveListImpl::IteratorImpl> >> 364: { >> 365: using const_reference = std::add_lvalue_reference_t>; > > This should use ListTraits. Fixed locally, to be included in next push. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1350640840 From kbarrett at openjdk.org Mon Oct 9 18:12:57 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 9 Oct 2023 18:12:57 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v3] In-Reply-To: References: Message-ID: On Mon, 9 Oct 2023 12:27:31 GMT, Johan Sjölen wrote: >> Perhaps "a class type, possibly const-qualified" would be clearer? > > Aha, yes! That's much clearer to me. Fixed locally, will be in next push.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1350638668 From kbarrett at openjdk.org Mon Oct 9 18:41:01 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 9 Oct 2023 18:41:01 GMT Subject: RFR: 8317730: Change byte_size to return size_t In-Reply-To: References: Message-ID: <2Lc8rkVmxtf8GLdNC2tKWRnoBin5TixdrtBKNiWHm6U=.ad5dceb1-5155-44f1-929a-d1fd4fc45335@github.com> On Mon, 9 Oct 2023 12:57:34 GMT, Albert Mingkun Yang wrote: > Simple signature update to `byte_size` to match expectation from callers. Looks good. I checked all the uses, and they all are dealing with size_t. Thanks for spotting and fixing. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16100#pullrequestreview-1665088255 From pchilanomate at openjdk.org Mon Oct 9 18:41:03 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 9 Oct 2023 18:41:03 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 17:55:12 GMT, Leonid Mesnik wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - add comment to tests >> - use driver + @requires vm.flagless > > Thanks for the changes. The test looks good for me. Thanks for the reviews @lmesnik and @theRealAph. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15972#issuecomment-1753475113 From pchilanomate at openjdk.org Mon Oct 9 18:41:05 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 9 Oct 2023 18:41:05 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: Message-ID: On Fri, 29 Sep 2023 16:35:18 GMT, Patricio Chilano Mateo wrote: >> Please review the following patch. 
As explained in the bug comments the problem is that os::get_sender_for_C_frame() always constructs a frame as if the sender is also a native C/C++ frame. Setting a correct value for _unextended_sp is important to avoid crashes if this value is later used to get that frame's caller, which will happen if we end up calling frame::sender_for_compiled_frame(). >> >> The issue exists on aarch64 for both linux and macos but the fix for linux is different. The "Procedure Call Standard for the Arm 64-bit Architecture" doesn't specify a location for the frame record within a stack frame (6.4.6), and gcc happens to choose to save it at the top of the frame (lowest address) rather than the bottom. This means that changing fr->link() for fr->sender_sp() won't work. The fix is to use the value of fr->link() but adjusted using the code blob frame size before setting it as the _unextended_sp of the sender frame. While working on this fix I realized the issue is not only when the sender is a native nmethod but with all frames associated with a CodeBlob with a frame size > 0 (runtime stub, safepoint stub, etc) so the check takes that into account. I also made a small fix to next_frame() since these mentioned frames should also use frame::sender(). >> >> I created a new test to verify that walking the stack over a native nmethod or runtime stub now works okay. I'll try to add a reliable test case for walking over a safepoint stub too. I tested the fix by running the new test and also running tiers1-4 in mach5. I'll run the upper tiers too. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - add comment to tests > - use driver + @requires vm.flagless @dholmes-ora are you okay with the last version?
------------- PR Comment: https://git.openjdk.org/jdk/pull/15972#issuecomment-1753475722 From cslucas at openjdk.org Mon Oct 9 21:50:56 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 9 Oct 2023 21:50:56 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges [v2] In-Reply-To: References: Message-ID: On Tue, 3 Oct 2023 08:43:46 GMT, Tobias Hartmann wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix typo in test. > > I didn't look at this in detail yet but submitted testing. I see the following failures. > > `compiler/eliminateAutobox/TestByteBoxing.java` with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`: > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/workspace/open/src/hotspot/share/opto/loopnode.cpp:2178), pid=951972, tid=951999 > # assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed > # > # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x128082c] LoopNode::verify_strip_mined(int) const+0xcc > > Current CompileTask: > C2: 1438 263 % b compiler.eliminateAutobox.TestByteBoxing::main @ 1358 (1805 bytes) > > Stack: [0x00007f0efc9cb000,0x00007f0efcacb000], sp=0x00007f0efcac57a0, free space=1001k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x128082c] LoopNode::verify_strip_mined(int) const+0xcc (loopnode.cpp:2178) > V [libjvm.so+0x1256ead] PathFrequency::to(Node*)+0x70d (loopPredicate.cpp:988) > V [libjvm.so+0x1258b49] 
PhaseIdealLoop::loop_predication_impl(IdealLoopTree*)+0x8e9 (loopPredicate.cpp:1462) > V [libjvm.so+0x125989a] IdealLoopTree::loop_predication(PhaseIdealLoop*)+0x9a (loopPredicate.cpp:1536) > V [libjvm.so+0x12a28d7] PhaseIdealLoop::build_and_optimize()+0xf57 (loopnode.cpp:4582) > V [libjvm.so+0x9ee7fb] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x3ab (loopnode.hpp:1114) > V [libjvm.so+0x9e9db6] Compile::Optimize()+0xdf6 (compile.cpp:2362) > > > `compiler/eliminateAutobox/TestByteBoxing.java` with `-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers`: > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (workspace/open/src/hotspot/share/opto/loopnode.cpp:6035), pid=1353611, tid=1353627 > # Error: ShouldNotReachHere() > # > # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-03-0709344.tobias.hartma... Hello @TobiHartmann, I pushed a fix for the test failures that you reported. Could you please re-run your tests? Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15825#issuecomment-1753928457 From dlong at openjdk.org Mon Oct 9 22:35:00 2023 From: dlong at openjdk.org (Dean Long) Date: Mon, 9 Oct 2023 22:35:00 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object With your fix for load_interpreter_state, you should be able to set may_be_unordered to false for c1 and c2 calls now, even for osr, right? Other platforms may still need may_be_unordered for interpreter calls. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1753989786 PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1753991342 From iklam at openjdk.org Mon Oct 9 23:14:17 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 9 Oct 2023 23:14:17 GMT Subject: RFR: 8317761: Combine two versions of print_statistics() in java.cpp Message-ID: The non-product version of `print_statistics()` can be compiled in the product build as well, so there's no need to keep two different versions. BTW, for some reason `print_method_profiling_data()` is called unconditionally in non-product builds, but is guarded by `PrintMethodData` in product builds. I made the behavior the same as before in this PR. This can probably be cleaned up in a future PR. I verified that: - All functions called by the original product version are also called by the non-product version (but could be in different order). - For functions that are called by the non-product version but not called by the original product version: all such calls are guarded by non-product flags (e.g.
`TimeOopMap`), or the function itself does nothing (e.g., declared as `PRODUCT_RETURN`) Testing: tier1, tier2, build-tiers5 ------------- Commit messages: - 8317761: Combine two versions of print_statistics() in java.cpp Changes: https://git.openjdk.org/jdk/pull/16110/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16110&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8317761 Stats: 64 lines in 2 files changed: 12 ins; 51 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16110.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16110/head:pull/16110 PR: https://git.openjdk.org/jdk/pull/16110 From sspitsyn at openjdk.org Mon Oct 9 23:56:58 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 9 Oct 2023 23:56:58 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered [v2] In-Reply-To: References: Message-ID: <80GauRmG9lobuW6bqt4L2dikGWQ4-T2Hz3BS_fuZWf8=.d435a4c2-5e91-4331-a7d2-83213dcd238a@github.com> On Sat, 7 Oct 2023 06:31:08 GMT, Alan Bateman wrote: >> src/hotspot/share/prims/jvmti.xml line 13044: >> >>> 13042: >>> 13043: >> 13044: id="VirtualThreadStart" const="JVMTI_EVENT_VIRTUAL_THREAD_START" num="87" phase="start" since="21"> >> >> Does "filtered" mean that the event can be enabled or disabled on a per thread basis, and therefore by removing this it means the event can only be enabled or disabled globally? > > That's right. The spec for SetEventNotificationMode lists the events that cannot be enabled/disabled at the thread level. Both ThreadStart and VirtualThreadStart are listed so I view this JBS/PR issue as fixing the implementation. Thank you for explaining, Alan. There is nothing to add. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16019#discussion_r1351005124 From sspitsyn at openjdk.org Mon Oct 9 23:57:01 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 9 Oct 2023 23:57:01 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered [v2] In-Reply-To: References: Message-ID: On Fri, 6 Oct 2023 22:59:56 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: convert check for is_hidden_from_external_view check() into assert > > src/hotspot/share/prims/jvmtiExport.cpp line 1581: > >> 1579: assert(!thread->is_hidden_from_external_view(), "carrier threads can't be hidden"); >> 1580: >> 1581: // Do not post virtual thread start event for hidden java thread. > > Why would we ever have a hidden virtual thread? Also, why is this comment here. It is also below, which seems to be the more appropriate location. Good catch, thanks. Forgot to remove this comment. Fixed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16019#discussion_r1351004505 From sspitsyn at openjdk.org Tue Oct 10 00:02:48 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 10 Oct 2023 00:02:48 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered [v3] In-Reply-To: References: Message-ID: > The JVMTI VirtualThreadStart events have to follow the ThreadStart events pattern and so, should not be thread-filtered. 
> The fix includes: > - `jvmti.xml`: remove the attribute `filtered="thread"` in the `VirtualThreadStart` event spec > - `jvmtiEventController.cpp`: remove the `VTHREAD_START_BIT` from the `THREAD_FILTERED_EVENT_BITS` mask and add it to the `NEED_THREAD_LIFE_EVENTS` mask > - `jvmtiExport.cpp`: rearrangements in the `JvmtiExport::post_vthread_start()` function > > The fix also includes a couple of minor unification tweaks: > - to align `JvmtiExport::post_thread_end()` with `JvmtiExport::post_vthread_end()` to have a unified check for the `JVMTI_PHASE_PRIMORDIAL`. > - to rename the local variable `cur_thread` as `thread` to follow the common pattern in `JvmtiExport::post_vthread_start()` and `JvmtiExport::post_vthread_end()` > > Testing: ran mach5 tiers 1-6. All tests are passed. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: removed unneeded comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16019/files - new: https://git.openjdk.org/jdk/pull/16019/files/46ce6453..c84931af Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16019&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16019&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16019/head:pull/16019 PR: https://git.openjdk.org/jdk/pull/16019 From cjplummer at openjdk.org Tue Oct 10 00:12:57 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 10 Oct 2023 00:12:57 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered [v2] In-Reply-To: References: Message-ID: On Mon, 9 Oct 2023 23:52:40 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiExport.cpp line 1581: >> >>> 1579: assert(!thread->is_hidden_from_external_view(), "carrier threads can't be hidden"); >>> 1580: >>> 1581: // Do not post virtual thread start event for hidden java thread.
>> >> Why would we ever have a hidden virtual thread? Also, why is this comment here. It is also below, which seems to be the more appropriate location. > > Good catch, thanks. Forgot to remove this comment. Fixed now. I still would like to know how we might end up with a hidden virtual thread. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16019#discussion_r1351024365 From amenkov at openjdk.org Tue Oct 10 00:24:05 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 10 Oct 2023 00:24:05 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered [v3] In-Reply-To: References: Message-ID: On Tue, 10 Oct 2023 00:02:48 GMT, Serguei Spitsyn wrote: >> The JVMTI VirtualThreadStart events have to follow the ThreadStart events pattern and so, should not be thread-filtered. >> The fix includes: >> - `jvmti.xml`: remov the attribute `filtered="thread"` in the `VirtuallThreadStart` event spec >> - `jvmtiEventController.cpp`: remove the `VTHREAD_START_BIT` from the `THREAD_FILTERED_EVENT_BITS` mask and and it to the `NEED_THREAD_LIFE_EVENTS` mask >> - `jvmtiExport.cpp`: rearrangements in the `JvmtiExport::post_vthread_start()` function >> >> The fix also includes a couple of minor unification tweaks: >> - to align `JvmtiExport::post_thread_end()` with `JvmtiExport::post_vthread_end()` to have a unified check for the `JVMTI_PHASE_PRIMORDIAL`. >> - to rename the local variable `cur_thread` as `thread` to follow the common pattern in `JvmtiExport::post_vthread_start()` and `JvmtiExport::post_vthread_end()` >> >> Testing: ran mach5 tiers 1-6. All tests are passed. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: removed unneeded comment src/hotspot/share/prims/jvmtiExport.cpp line 1614: > 1612: > 1613: // Do not post virtual thread end event for hidden java thread. 
> 1614: if (state->is_enabled(JVMTI_EVENT_VIRTUAL_THREAD_END) && Should this be assert like in vthread_start? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16019#discussion_r1351061395 From sspitsyn at openjdk.org Tue Oct 10 00:41:59 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 10 Oct 2023 00:41:59 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered [v2] In-Reply-To: References: Message-ID: On Tue, 10 Oct 2023 00:09:53 GMT, Chris Plummer wrote: >> Good catch, thanks. Forgot to remove this comment. Fixed now. > > I still would like to know how we might end up with a hidden virtual thread. A JavaThread can be hidden, not a virtual thread. For such a case, I'd treat it that a carrier thread is hidden. The assert is to catch if it ever happens. Do you think this assert is an overkill? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16019#discussion_r1351068632 From sspitsyn at openjdk.org Tue Oct 10 00:47:04 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 10 Oct 2023 00:47:04 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered [v3] In-Reply-To: References: Message-ID: On Tue, 10 Oct 2023 00:21:27 GMT, Alex Menkov wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: removed unneeded comment > > src/hotspot/share/prims/jvmtiExport.cpp line 1614: > >> 1612: >> 1613: // Do not post virtual thread end event for hidden java thread. >> 1614: if (state->is_enabled(JVMTI_EVENT_VIRTUAL_THREAD_END) && > > Should this be assert like in vthread_start? Thanks. Yes, this has to be the same as for `VirtualThreadStart`. Fixed now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16019#discussion_r1351070324 From sspitsyn at openjdk.org Tue Oct 10 00:54:42 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 10 Oct 2023 00:54:42 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered [v4] In-Reply-To: References: Message-ID: > The JVMTI VirtualThreadStart events have to follow the ThreadStart events pattern and so, should not be thread-filtered. > The fix includes: > - `jvmti.xml`: remove the attribute `filtered="thread"` in the `VirtualThreadStart` event spec > - `jvmtiEventController.cpp`: remove the `VTHREAD_START_BIT` from the `THREAD_FILTERED_EVENT_BITS` mask and add it to the `NEED_THREAD_LIFE_EVENTS` mask > - `jvmtiExport.cpp`: rearrangements in the `JvmtiExport::post_vthread_start()` function > > The fix also includes a couple of minor unification tweaks: > - to align `JvmtiExport::post_thread_end()` with `JvmtiExport::post_vthread_end()` to have a unified check for the `JVMTI_PHASE_PRIMORDIAL`. > - to rename the local variable `cur_thread` as `thread` to follow the common pattern in `JvmtiExport::post_vthread_start()` and `JvmtiExport::post_vthread_end()` > > Testing: ran mach5 tiers 1-6. All tests are passed.
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: convert hidden thread check for vthread end into assert as for vthread start ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16019/files - new: https://git.openjdk.org/jdk/pull/16019/files/c84931af..71e49d38 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16019&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16019&range=02-03 Stats: 5 lines in 1 file changed: 2 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16019/head:pull/16019 PR: https://git.openjdk.org/jdk/pull/16019 From amenkov at openjdk.org Tue Oct 10 01:17:58 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 10 Oct 2023 01:17:58 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered [v4] In-Reply-To: References: Message-ID: On Tue, 10 Oct 2023 00:54:42 GMT, Serguei Spitsyn wrote: >> The JVMTI VirtualThreadStart events have to follow the ThreadStart events pattern and so, should not be thread-filtered. >> The fix includes: >> - `jvmti.xml`: remove the attribute `filtered="thread"` in the `VirtualThreadStart` event spec >> - `jvmtiEventController.cpp`: remove the `VTHREAD_START_BIT` from the `THREAD_FILTERED_EVENT_BITS` mask and add it to the `NEED_THREAD_LIFE_EVENTS` mask >> - `jvmtiExport.cpp`: rearrangements in the `JvmtiExport::post_vthread_start()` function >> >> The fix also includes a couple of minor unification tweaks: >> - to align `JvmtiExport::post_thread_end()` with `JvmtiExport::post_vthread_end()` to have a unified check for the `JVMTI_PHASE_PRIMORDIAL`. >> - to rename the local variable `cur_thread` as `thread` to follow the common pattern in `JvmtiExport::post_vthread_start()` and `JvmtiExport::post_vthread_end()` >> >> Testing: ran mach5 tiers 1-6. All tests are passed.
> > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: convert hidden thread check for vthread end into assert as for vthread start Marked as reviewed by amenkov (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16019#pullrequestreview-1665736274 From cjplummer at openjdk.org Tue Oct 10 02:48:09 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 10 Oct 2023 02:48:09 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered [v4] In-Reply-To: References: Message-ID: On Tue, 10 Oct 2023 00:54:42 GMT, Serguei Spitsyn wrote: >> The JVMTI VirtualThreadStart events have to follow the ThreadStart events pattern and so, should not be thread-filtered. >> The fix includes: >> - `jvmti.xml`: remove the attribute `filtered="thread"` in the `VirtualThreadStart` event spec >> - `jvmtiEventController.cpp`: remove the `VTHREAD_START_BIT` from the `THREAD_FILTERED_EVENT_BITS` mask and add it to the `NEED_THREAD_LIFE_EVENTS` mask >> - `jvmtiExport.cpp`: rearrangements in the `JvmtiExport::post_vthread_start()` function >> >> The fix also includes a couple of minor unification tweaks: >> - to align `JvmtiExport::post_thread_end()` with `JvmtiExport::post_vthread_end()` to have a unified check for the `JVMTI_PHASE_PRIMORDIAL`. >> - to rename the local variable `cur_thread` as `thread` to follow the common pattern in `JvmtiExport::post_vthread_start()` and `JvmtiExport::post_vthread_end()` >> >> Testing: ran mach5 tiers 1-6. All tests are passed. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: convert hidden thread check for vthread end into assert as for vthread start Marked as reviewed by cjplummer (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/16019#pullrequestreview-1665882344 From cjplummer at openjdk.org Tue Oct 10 02:48:10 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 10 Oct 2023 02:48:10 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered [v2] In-Reply-To: References: Message-ID: On Tue, 10 Oct 2023 00:39:10 GMT, Serguei Spitsyn wrote: >> I still would like to know how we might end up with a hidden virtual thread. > > A JavaThread can be hidden, not a virtual thread. > For such a case, I'd treat it that a carrier thread is hidden. > The assert is to catch if it ever happens. > Do you think this assert is an overkill? Never mind. I see you changed it to an assert when you removed the comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16019#discussion_r1351236440 From sspitsyn at openjdk.org Tue Oct 10 03:09:12 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 10 Oct 2023 03:09:12 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered [v4] In-Reply-To: References: Message-ID: On Tue, 10 Oct 2023 00:54:42 GMT, Serguei Spitsyn wrote: >> The JVMTI VirtualThreadStart events have to follow the ThreadStart events pattern and so, should not be thread-filtered. >> The fix includes: >> - `jvmti.xml`: remove the attribute `filtered="thread"` in the `VirtualThreadStart` event spec >> - `jvmtiEventController.cpp`: remove the `VTHREAD_START_BIT` from the `THREAD_FILTERED_EVENT_BITS` mask and add it to the `NEED_THREAD_LIFE_EVENTS` mask >> - `jvmtiExport.cpp`: rearrangements in the `JvmtiExport::post_vthread_start()` function >> >> The fix also includes a couple of minor unification tweaks: >> - to align `JvmtiExport::post_thread_end()` with `JvmtiExport::post_vthread_end()` to have a unified check for the `JVMTI_PHASE_PRIMORDIAL`.
>> - to rename the local variable `cur_thread` as `thread` to follow the common pattern in `JvmtiExport::post_vthread_start()` and `JvmtiExport::post_vthread_end()` >> >> Testing: ran mach5 tiers 1-6. All tests are passed. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: convert hidden thread check for vthread end into assert as for vthread start Leonid, Alex and Chris, thank you for review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16019#issuecomment-1754266215 From sspitsyn at openjdk.org Tue Oct 10 03:09:12 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 10 Oct 2023 03:09:12 GMT Subject: RFR: 8316233: VirtualThreadStart events should not be thread-filtered [v2] In-Reply-To: References: Message-ID: On Tue, 10 Oct 2023 02:44:53 GMT, Chris Plummer wrote: >> A JavaThread can be hidden, not a virtual thread. >> For such a case, I'd treat it that a carrier thread is hidden. >> The assert is to catch if it ever happens. >> Do you think this assert is an overkill? > > Never mind. I see you changed it to an assert when your removed the comment. Okay. I agree, it is confusing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16019#discussion_r1351284923 From sspitsyn at openjdk.org Tue Oct 10 03:09:14 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 10 Oct 2023 03:09:14 GMT Subject: Integrated: 8316233: VirtualThreadStart events should not be thread-filtered In-Reply-To: References: Message-ID: On Mon, 2 Oct 2023 23:11:01 GMT, Serguei Spitsyn wrote: > The JVMTI VirtualThreadStart events have to follow the ThreadStart events pattern and so, should not be thread-filtered. 
> The fix includes: > - `jvmti.xml`: remove the attribute `filtered="thread"` in the `VirtualThreadStart` event spec > - `jvmtiEventController.cpp`: remove the `VTHREAD_START_BIT` from the `THREAD_FILTERED_EVENT_BITS` mask and add it to the `NEED_THREAD_LIFE_EVENTS` mask > - `jvmtiExport.cpp`: rearrangements in the `JvmtiExport::post_vthread_start()` function > > The fix also includes a couple of minor unification tweaks: > - to align `JvmtiExport::post_thread_end()` with `JvmtiExport::post_vthread_end()` to have a unified check for the `JVMTI_PHASE_PRIMORDIAL`. > - to rename the local variable `cur_thread` as `thread` to follow the common pattern in `JvmtiExport::post_vthread_start()` and `JvmtiExport::post_vthread_end()` > > Testing: ran mach5 tiers 1-6. All tests passed. This pull request has now been integrated. Changeset: d3139159 Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/d31391597433cf275fc615e0148c48c34acf6e11 Stats: 32 lines in 3 files changed: 7 ins; 10 del; 15 mod 8316233: VirtualThreadStart events should not be thread-filtered Reviewed-by: lmesnik, amenkov, cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/16019 From jwaters at openjdk.org Tue Oct 10 03:32:08 2023 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 10 Oct 2023 03:32:08 GMT Subject: RFR: 8307160: [REDO] Enable the permissive- flag on the Microsoft Visual C compiler [v6] In-Reply-To: References: <7piLRto5nNbhYYYfENCr5ecm4M2xNtMkjkE8XhrLLQ0=.8fd1ac3a-46f8-47a8-ae37-a4abbf7757d9@github.com> Message-ID: On Thu, 28 Sep 2023 03:12:03 GMT, Julian Waters wrote: >> We should set the -permissive- flag for the Microsoft Visual C compiler, as was requested by the now backed-out [JDK-8241499](https://bugs.openjdk.org/browse/JDK-8241499). Doing so makes the Visual C compiler much less accepting of ill-formed code, which will improve code quality on Windows in the future.
> > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: > > - Merge branch 'openjdk:master' into patch-10 > - Merge branch 'master' into patch-10 > - Document changes in awt_DnDDS.cpp > - Remove negation in os_windows.cpp > - Mismatched declaration in D3DGlyphCache.cpp > - Fields in awt_TextComponent.cpp > - reinterpret_cast needed in AccessBridgeJavaEntryPoints.cpp > - Qualifiers in awt_PrintDialog.h should be removed > - Likewise for awt_DnDDT.cpp > - awt_ole.h include order issue in awt_DnDDS.cpp > - ... and 16 more: https://git.openjdk.org/jdk/compare/84390dd0...1e2b39f9 Reopening - I will use this as the portion of the change for java.desktop and jdk.accessibility ------------- PR Comment: https://git.openjdk.org/jdk/pull/15096#issuecomment-1754296794 From dlong at openjdk.org Tue Oct 10 04:26:58 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 10 Oct 2023 04:26:58 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v2] In-Reply-To: References: Message-ID: On Tue, 20 Jun 2023 08:26:08 GMT, Erik Österlund wrote: >> In fact, there is a current race in the nmethod entry barriers, where what we are doing violates the AMD APM (cf. APM volume 2 section 7.6.1 https://www.amd.com/system/files/TechDocs/24593.pdf). >> In particular, if the compare instruction of the nmethod entry barrier is not yet patched and we call a slow path on thread 1, then before taking the nmethod entry lock, another thread 2 could fix and disarm the nmethod. Then thread 1 will observe *data* suggesting the nmethod has been patched, but never re-executes the patched compare (which might indeed still be stale), hence not qualifying for asynchronous cross modifying code, and neither do we run a serializing cpuid instruction, qualifying for synchronous cross modifying code.
In this scenario, we can indeed start executing the nmethod instructions, while observing inconsistent concurrent patching effects, where some instructions will be updated and some not. >> >> The following patch ensures that x86 nmethod entry barriers execute a cross modifying fence after calling into the VM, where another thread could have disarmed the nmethod. I also ensured the other platforms perform their fencing after the VM call, instead of before - including a cross_modify_fence in the shared code for OSR nmethod entries. While fencing before will flush out the instruction pipeline, and it shouldn't be populated with problematic instructions until after we start executing the nmethod again, it feels unnecessary to fence on the wrong side of the modifications it wishes to guard, and hence not strictly following the synchronous cross modifying fence recipe. >> >> I'm currently running tier1-5 and running performance testing in aurora. In the interest of time, I'm opening this PR before getting the final result, and will report the results when they come in. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Typo in comment Wouldn't it be better to put a cross_modify_fence() at the end of BarrierSetNMethod::nmethod_entry_barrier()? I don't see any code patching after that. Then we don't need it in the platform-specific generate_method_entry_barrier(). And the cross_modify_fence() could be conditional, depending on whether code was actually patched, which it sounds like doesn't happen on aarch64 unless Generational ZGC is used?
------------- PR Comment: https://git.openjdk.org/jdk/pull/14543#issuecomment-1754337830 From dlong at openjdk.org Tue Oct 10 04:34:59 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 10 Oct 2023 04:34:59 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v2] In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 11:16:10 GMT, Erik Österlund wrote: > The assumption is that if the nmethod immediate oops are patched first, and the guard value (immediate of the cmp instruction) is patched after, then if a thread sees the new cmp instruction, it will also see the new oop immediates. And that is indeed what the "asynchronous" cross modifying code description ensures will work in the AMD APM. So that all checks out. I guess this is a separate issue from this patch, but where does the AMD APM guarantee that? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14543#issuecomment-1754350788 From svkamath at openjdk.org Tue Oct 10 05:04:45 2023 From: svkamath at openjdk.org (Smita Kamath) Date: Tue, 10 Oct 2023 05:04:45 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v5] In-Reply-To: References: Message-ID: > Hi All, > I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations.
>
> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option:
>
> | Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup |
> |-------------|------------|---------------|------------------|-----------|
> | full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 |
> | full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 |
> | small.AESGCMBench.decrypt | 8192 | 527854.353 | 663131.48 | 1.25 |
> | small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 | 1.24 |
> | full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 |
> | full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 | 539235.462 | 1.00 |
> | small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 | 298913.629 | 0.99 |
> | small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 |
> | full.AESGCMBench.decrypt | 16384 | 307266.364 | 390397.778 | 1.27 |
> | full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 |
> | small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 | 1.27 |
> | small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 |
> | full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 |
> | full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 |
> | small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 | 181019.205 | 1.12 |
> | small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 |
> | full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 | 1.31 |
> | full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 |
> | small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 |
> | small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 | 1.28 |
> | full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 | 1.12 |
> | full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 | 211701.969 | 1.27 |
> | small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 |
> | small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 | 212085.047 | 1.30 |
> | full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 |
> | full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 | 1.31 |
> | small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 | 1.32 |
> | small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 |
> | full.AESGCMBench.decryptMultiPart | 65536 | 42649.816 | 47591.587 | 1.11 |
> full.AESGCMBe...

Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Updated comments, generate_8_block_avx2 method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15410/files - new: https://git.openjdk.org/jdk/pull/15410/files/77fd61b9..7e1cf54d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15410&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15410&range=03-04 Stats: 28 lines in 2 files changed: 10 ins; 0 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/15410.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15410/head:pull/15410 PR: https://git.openjdk.org/jdk/pull/15410 From duke at openjdk.org Tue Oct 10 05:07:08 2023 From: duke at openjdk.org (Liming Liu) Date: Tue, 10 Oct 2023 05:07:08 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v3] In-Reply-To: <0oZ06FXPfZ6SEJBdPRrUEiW3tqR7hTWkXUeXdzqVyNo=.e782f692-9152-44db-9847-7292e4afa7a0@github.com> References:
<0oZ06FXPfZ6SEJBdPRrUEiW3tqR7hTWkXUeXdzqVyNo=.e782f692-9152-44db-9847-7292e4afa7a0@github.com> Message-ID: On Fri, 29 Sep 2023 07:08:19 GMT, Kim Barrett wrote: >> Liming Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> Cuddle ptr-operators in pretouch_memory_common > > PretouchTask attempts to parallelize the pretouching. How well does that work > with the use of MADV_POPULATE_WRITE? > Like @kimbarrett, I think this needs a better regression test. Ideally (and probably not that difficult to pull off): start the VM with AlwaysPreTouch, `-Xlog:pagesize`, and +UseTHP. Then, scan smaps to check that the heap is not splintered. Please see https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/runtime/os/TestTracePageSizes.java . It may be that you can just extend that test to include running with UseTHP. It is hard to judge whether the test succeeds. The only difference in smaps between JDK-8272807 (from `*p = 0` to atomic-add) and this patch (from atomic-add to madvise) is the size of AnonHugePages on aarch64: - the field covers ~90% of the map when THP mode is always and `*p = 0`/madvise is used, while it is just ~10% when atomic-add is used. - the field would use up adjacent regions reflected by /proc/buddyinfo when THP mode is madvise and `*p = 0`/madvise is used, while it is zero when atomic-add is used. So it is hard to say which coverage should count as success.
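To make the smaps comparison above a bit more concrete, here is a minimal sketch of how the AnonHugePages coverage could be computed. It assumes the usual `/proc/<pid>/smaps` layout with `Size:` and `AnonHugePages:` fields reported in kB. The helper name `anon_huge_coverage` is made up for illustration, it sums over every mapping in the input rather than only the Java heap, and no pass/fail threshold is implied.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the patch or of any jtreg test): sums the
// "Size:" and "AnonHugePages:" fields (both in kB) over all mappings found in
// smaps-formatted text and returns the fraction backed by huge pages.
double anon_huge_coverage(std::istream& smaps) {
  long total_kb = 0;
  long huge_kb = 0;
  std::string line;
  while (std::getline(smaps, line)) {
    std::istringstream fields(line);
    std::string key;
    long value = 0;
    // Mapping header lines and flag lines do not parse as "<key> <number>".
    if (!(fields >> key >> value)) continue;
    if (key == "Size:") {
      total_kb += value;
    } else if (key == "AnonHugePages:") {
      huge_kb += value;
    }
  }
  return total_kb == 0 ? 0.0 : static_cast<double>(huge_kb) / total_kb;
}
```

A real test would probably restrict the scan to the heap mapping reported by `-Xlog:pagesize` and would still have to decide which coverage is acceptable, which is exactly the open question raised above.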
------------- PR Comment: https://git.openjdk.org/jdk/pull/15781#issuecomment-1754388017 From djelinski at openjdk.org Tue Oct 10 06:42:59 2023 From: djelinski at openjdk.org (Daniel Jeliński) Date: Tue, 10 Oct 2023 06:42:59 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v4] In-Reply-To: References: Message-ID: <37k6gxd6R7Ft0NWpy7SXv-m0ody0oPbRiT1IF4KMITc=.09321e5c-b0d0-478e-b6f0-ff22df861400@github.com> On Mon, 9 Oct 2023 04:58:10 GMT, Smita Kamath wrote:

>> Hi All,
>> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations.
>>
>> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option:
>>
>> | Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup |
>> |-------------|------------|---------------|------------------|-----------|
>> | full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 |
>> | full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 |
>> | small.AESGCMBench.decrypt | 8192 | 527854.353 | 663131.48 | 1.25 |
>> | small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 | 1.24 |
>> | full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 |
>> | full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 | 539235.462 | 1.00 |
>> | small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 | 298913.629 | 0.99 |
>> | small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 |
>> | full.AESGCMBench.decrypt | 16384 | 307266.364 | 390397.778 | 1.27 |
>> | full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 |
>> | small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 | 1.27 |
>> | small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 |
>> | full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 |
>> | full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 |
>> | small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 | 181019.205 | 1.12 |
>> | small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 |
>> | full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 | 1.31 |
>> | full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 |
>> | small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 |
>> | small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 | 1.28 |
>> | full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 | 1.12 |
>> | full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 | 211701.969 | 1.27 |
>> | small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 |
>> | small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 | 212085.047 | 1.30 |
>> | full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 |
>> | full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 | 1.31 |
>> | small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 | 1.32 |
>> | small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 |
>> full.AES...
>
> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision:
>
> Updated code as per review comments, added new comments for every method

src/hotspot/cpu/x86/assembler_x86.cpp line 1335:

> 1333:
> 1334: void Assembler::addb(Register dst, int imm8) {
> 1335: prefix(dst);

This will miscompile code for rsi/rdi/rsp/rbp; the operation will use ah/bh/ch/dh instead of sil/dil/spl/bpl. I believe you need `prefix_and_encode(dst->encoding(), true);`.
See: https://github.com/openjdk/jdk/blob/7e1cf54df79bc8010c40acd4f84c11dcba0d3dac/src/hotspot/cpu/x86/assembler_x86.cpp#L6257-L6258 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1350528886 From duke at openjdk.org Tue Oct 10 07:32:21 2023 From: duke at openjdk.org (Liming Liu) Date: Tue, 10 Oct 2023 07:32:21 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v4] In-Reply-To: References: Message-ID: > As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). > > Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` in the table below, and found that the patch improves the time when the system call is supported and does no harm when it is not: >
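> | Kernel | -XX:-TransparentHugePages, Unpatched | -XX:-TransparentHugePages, Patched | -XX:+TransparentHugePages, Unpatched | -XX:+TransparentHugePages, Patched |
> |---|---|---|---|---|
> | 4.18 | 11.30 | 11.30 | 0.25 | 0.25 |
> | 5.13 | 0.22 | 0.22 | 3.42 | 3.42 |
> | 6.1 | 0.27 | 0.33 | 3.54 | 0.33 |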
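As a rough illustration of the approach in the description (not the actual HotSpot change), pretouching with `MADV_POPULATE_WRITE` and a manual fallback might look like the sketch below. The helper name `pretouch_range` and its signature are assumptions made for this example, the code is Linux-only, and it assumes `start` is page-aligned. The atomic add of zero stands in for the existing atomic-add-0 pretouch and uses a GCC/Clang builtin.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23  // value from the Linux uapi headers (kernel 5.14+)
#endif

// Hypothetical helper, not the HotSpot function. Returns true if the kernel
// populated the range through madvise, false if the manual fallback ran.
// Assumes start is page-aligned (so it is also suitably aligned for int).
bool pretouch_range(void* start, std::size_t bytes, std::size_t page_size) {
  if (madvise(start, bytes, MADV_POPULATE_WRITE) == 0) {
    return true;
  }
  // Any failure (for example EINVAL on kernels that do not know the advice)
  // falls back to touching one word per page by hand.
  char* p = static_cast<char*>(start);
  char* end = p + bytes;
  for (; p < end; p += page_size) {
    // Adding 0 atomically forces a write fault without changing the contents.
    __atomic_fetch_add(reinterpret_cast<int*>(p), 0, __ATOMIC_RELAXED);
  }
  return false;
}
```

The return value only reports which path was taken; a caller could use it to log the madvise failure once instead of warning, in the spirit of the revision notes in this thread.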
Liming Liu has updated the pull request incrementally with one additional commit since the last revision: Improve the use of madvise for pretouching: 1. use madvise when THP is actually used; 2. remove the need to modify page_size; 3. log the failure of madvise rather than warn. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15781/files - new: https://git.openjdk.org/jdk/pull/15781/files/b265cdfd..525b3ec5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=02-03 Stats: 26 lines in 1 file changed: 6 ins; 8 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/15781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15781/head:pull/15781 PR: https://git.openjdk.org/jdk/pull/15781 From duke at openjdk.org Tue Oct 10 07:36:11 2023 From: duke at openjdk.org (Liming Liu) Date: Tue, 10 Oct 2023 07:36:11 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v5] In-Reply-To: References: Message-ID: > As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). > > Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` in the table below, and found that the patch improves the time when the system call is supported and does no harm when it is not: >
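> | Kernel | -XX:-TransparentHugePages, Unpatched | -XX:-TransparentHugePages, Patched | -XX:+TransparentHugePages, Unpatched | -XX:+TransparentHugePages, Patched |
> |---|---|---|---|---|
> | 4.18 | 11.30 | 11.30 | 0.25 | 0.25 |
> | 5.13 | 0.22 | 0.22 | 3.42 | 3.42 |
> | 6.1 | 0.27 | 0.33 | 3.54 | 0.33 |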
Liming Liu has updated the pull request incrementally with one additional commit since the last revision: Untabify ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15781/files - new: https://git.openjdk.org/jdk/pull/15781/files/525b3ec5..98642e37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=03-04 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/15781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15781/head:pull/15781 PR: https://git.openjdk.org/jdk/pull/15781 From ayang at openjdk.org Tue Oct 10 09:45:13 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 10 Oct 2023 09:45:13 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v15] In-Reply-To: References: Message-ID: On Mon, 9 Oct 2023 15:31:22 GMT, Richard Reingruber wrote: > Also it only comes from overlapping card table preprocessing with scavenging. I think this could be removed again without losing performance. That complexity is uncalled for if its benefit is marginal. > It could be passed by reference if the query in process_range would be pulled up before the find_first_clean_card call. > The implementation should be straightforward. I think I'll experiment with it. Could it be updated to not query object-start? That would remove much complexity inside that method. Additionally, I wonder if the scanning-dirty-chunk iteration can be simplified a bit: the number of calls to `scan_obj_with_limit` seems excessive and it's not obvious whether it's intended or not that `continue` skips `drain_stacks_cond_depth`. If so, dirtying-first-card-inside-a-stripe probably strikes the best balance between complexity and performance/mem-overhead for now. Otherwise, I prefer shadow-card-table for its simplicity and the mem-overhead issue can be addressed later on.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1754833838 From ayang at openjdk.org Tue Oct 10 11:55:27 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 10 Oct 2023 11:55:27 GMT Subject: RFR: 8317730: Change byte_size to return size_t In-Reply-To: References: Message-ID: On Mon, 9 Oct 2023 12:57:34 GMT, Albert Mingkun Yang wrote: > Simple signature update to `byte_size` to match expectation from callers. Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16100#issuecomment-1755200730 From ayang at openjdk.org Tue Oct 10 12:00:02 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 10 Oct 2023 12:00:02 GMT Subject: Integrated: 8317730: Change byte_size to return size_t In-Reply-To: References: Message-ID: On Mon, 9 Oct 2023 12:57:34 GMT, Albert Mingkun Yang wrote: > Simple signature update to `byte_size` to match expectation from callers. This pull request has now been integrated. Changeset: fb4098ff Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/fb4098ff1a7cca5ec42600f9ab753681961bb1ad Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8317730: Change byte_size to return size_t Reviewed-by: coleenp, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/16100 From mdoerr at openjdk.org Tue Oct 10 12:22:32 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 10 Oct 2023 12:22:32 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object

Hi Dean, the following test fails with the reversed order (my load_interpreter_state patch above) on PPC64, but passes with the original order. Surprisingly, x86_64 passes with either version (with asm assertions in unlock_object applied from above). Maybe I should contribute this test even though it doesn't trigger the problem in the original VM?

diff --git a/test/hotspot/jtreg/compiler/locks/TestUnlockOSR.java b/test/hotspot/jtreg/compiler/locks/TestUnlockOSR.java
new file mode 100644
index 00000000000..be58f21c977
--- /dev/null
+++ b/test/hotspot/jtreg/compiler/locks/TestUnlockOSR.java
@@ -0,0 +1,49 @@
+/*
+ * Copyright (c) 2023 SAP SE. All rights reserved.
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This code is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 only, as
+ * published by the Free Software Foundation.
+ *
+ * This code is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * version 2 for more details (a copy is included in the LICENSE file that
+ * accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License version
+ * 2 along with this work; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
+ * or visit www.oracle.com if you need additional information or have any
+ * questions.
+ *
+ */
+
+/*
+ * @test
+ * @summary During OSR, locks get transferred from interpreter frame.
+ *          Check that unlocking 2 such locks works in the OSR compiled nmethod.
+ *          Some platforms verify that the unlocking happens in the correct order.
+ *
+ * @run main/othervm -Xbatch TestUnlockOSR
+ */
+
+public class TestUnlockOSR {
+    static void test_method(Object a, Object b, int limit) {
+        synchronized(a) {
+            synchronized(b) {
+                for (int i = 0; i < limit; i++) {}
+            }
+        }
+    }
+
+    public static void main(String[] args) {
+        Object a = new TestUnlockOSR(),
+               b = new TestUnlockOSR();
+        for (int i = 0; i < 10000; i++) { test_method(a, b, 0); } // compile
+        test_method(a, b, 100000); // deopt, trigger OSR
+    }
+}

------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1755267873 From stuefe at openjdk.org Tue Oct 10 13:24:39 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 10 Oct 2023 13:24:39 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v7] In-Reply-To: References: Message-ID: > Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`). > > Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift.
>
>
> 8b7b62: 48 8d 05 7f 1b c3 00 lea 0xc31b7f(%rip),%rax # 14e96e8
> 8b7b69: 0f b6 00 movzbl (%rax),%eax
> 8b7b6c: 84 c0 test %al,%al
> 8b7b6e: 0f 84 9c 00 00 00 je 8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
> 8b7b74: 48 8d 15 05 62 c6 00 lea 0xc66205(%rip),%rdx # 151dd80 <_ZN23CompressedKlassPointers6_shiftE>
> 8b7b7b: 8b 7b 08 mov 0x8(%rbx),%edi
> 8b7b7e: 8b 0a mov (%rdx),%ecx
> 8b7b80: 48 8d 15 01 62 c6 00 lea 0xc66201(%rip),%rdx # 151dd88 <_ZN23CompressedKlassPointers5_baseE>
> 8b7b87: 48 d3 e7 shl %cl,%rdi
> 8b7b8a: 48 03 3a add (%rdx),%rdi
>
>
> Patched: one load loads all three. Since shift occupies the lowest 8 bits, compiled code uses 8-bit register; ditto the UseCompressedOops flag.
>
>
> 8ba302: 48 8d 05 97 9c c2 00 lea 0xc29c97(%rip),%rax # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE>
> 8ba309: 48 8b 08 mov (%rax),%rcx
> 8ba30c: f6 c5 01 test $0x1,%ch # use compressed klass pointers?
> 8ba30f: 0f 84 9b 00 00 00 je 8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
> 8ba315: 8b 7b 08 mov 0x8(%rbx),%edi
> 8ba318: 48 d3 e7 shl %cl,%rdi # shift
> 8ba31b: 66 31 c9 xor %cx,%cx # zero out lower 16 bits of base
> 8ba31e: 48 01 cf add %rcx,%rdi # add base
> 8ba321: 8b 4f 08 mov 0x8(%rdi),%ecx
>
> ---
>
> Performance measurements:
>
> G1, doing a full GC over a heap filled with 256 million live j.l.Object instances.
>
> I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ±4%. Still, in general, numbers seemed to go down rather than up.
>
> ---
>
> Future extensions:
>
> This patch uses the fact that the encoding base is aligned to metaspace reserve alignment (16 MB). We only use 16 of those 24 bits of alignment shadow and could us...
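For readers following the patched disassembly, the single-word layout it implies can be sketched as below. This is only an illustration inferred from the listing (shift in the low 8 bits, the flag tested in bit 8 via `%ch`, the base left after zeroing the low 16 bits); the function names `make_combo`, `combo_use_ccp` and `decode_klass` are invented for this example and are not the names used in `compressedKlass.hpp`.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout, read off the patched disassembly above:
//   bits 0-7   : shift
//   bit  8     : "use compressed klass pointers" flag
//   bits 16-63 : encoding base (its low 16 bits must be zero, which the
//                16 MB alignment of the base guarantees)
constexpr uint64_t make_combo(uint64_t base, unsigned shift, bool use_ccp) {
  return base | (static_cast<uint64_t>(use_ccp) << 8) | shift;
}

// Corresponds to "test $0x1,%ch" in the listing.
constexpr bool combo_use_ccp(uint64_t combo) {
  return ((combo >> 8) & 1) != 0;
}

// One load of the combo word yields both the shift and the base.
constexpr uint64_t decode_klass(uint64_t combo, uint32_t narrow_klass) {
  const unsigned shift = static_cast<unsigned>(combo & 0xFF);  // %cl
  const uint64_t base = combo & ~uint64_t{0xFFFF};             // xor %cx,%cx
  return base + (static_cast<uint64_t>(narrow_klass) << shift);
}
```

With a base of `0x800000000` and a shift of 3, a narrow value of `0x1000` decodes to `0x800008000`. A combo word of `0` reads as "flag off", which matches the later discussion in this thread about relying on the static initial value.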
Thomas Stuefe has updated the pull request incrementally with four additional commits since the last revision: - Update src/hotspot/share/oops/compressedKlass.hpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/compressedKlass.cpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/compressedKlass.cpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/compressedKlass.cpp Co-authored-by: Aleksey Shipilëv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15389/files - new: https://git.openjdk.org/jdk/pull/15389/files/fde21bab..a380b03b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15389&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15389&range=05-06 Stats: 5 lines in 2 files changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/15389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15389/head:pull/15389 PR: https://git.openjdk.org/jdk/pull/15389 From stuefe at openjdk.org Tue Oct 10 13:42:27 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 10 Oct 2023 13:42:27 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v6] In-Reply-To: <56RFDm_cBCPZHPHJq_vuSRPb9OLhUe9uwBhI-xhxgqk=.0b06a02e-cd09-431d-9a7a-3aa7fab7d1e9@github.com> References: <56RFDm_cBCPZHPHJq_vuSRPb9OLhUe9uwBhI-xhxgqk=.0b06a02e-cd09-431d-9a7a-3aa7fab7d1e9@github.com> Message-ID: <7waqFj1qZSUHjU7lHlHfJ3wDDmyhP_YUTj1UR7q-_O8=.f552fe8a-7bd3-4a9f-a38c-3817e3b0f65a@github.com> On Fri, 6 Oct 2023 17:18:01 GMT, Aleksey Shipilev wrote: > A few stylistic comments. > > What is confusing to me is that combo flag initialization is basically conditional on `UseCompressedClassPointers` (i.e. assert in new `set_base_and_shift`). But at the same time, we ask for `CompressedKlassPointers::use_compressed_klass_pointers()` in `oop` methods. This works "only" because the -UseCCP generates the same combo as the initial value of `0`? Seems fragile.
I wonder if we want to initialize combo unconditionally. I know. I played around with a different version before but ended up relying on the static initialization since it seemed the most simple and robust one. I added a comment to explain this better. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15389#issuecomment-1755445730 From stuefe at openjdk.org Tue Oct 10 13:42:26 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 10 Oct 2023 13:42:26 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v8] In-Reply-To: References: Message-ID:

> Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`).
>
> Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift.
>
>
> 8b7b62: 48 8d 05 7f 1b c3 00 lea 0xc31b7f(%rip),%rax # 14e96e8
> 8b7b69: 0f b6 00 movzbl (%rax),%eax
> 8b7b6c: 84 c0 test %al,%al
> 8b7b6e: 0f 84 9c 00 00 00 je 8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
> 8b7b74: 48 8d 15 05 62 c6 00 lea 0xc66205(%rip),%rdx # 151dd80 <_ZN23CompressedKlassPointers6_shiftE>
> 8b7b7b: 8b 7b 08 mov 0x8(%rbx),%edi
> 8b7b7e: 8b 0a mov (%rdx),%ecx
> 8b7b80: 48 8d 15 01 62 c6 00 lea 0xc66201(%rip),%rdx # 151dd88 <_ZN23CompressedKlassPointers5_baseE>
> 8b7b87: 48 d3 e7 shl %cl,%rdi
> 8b7b8a: 48 03 3a add (%rdx),%rdi
>
>
> Patched: one load loads all three. Since shift occupies the lowest 8 bits, compiled code uses 8-bit register; ditto the UseCompressedOops flag.
>
>
> 8ba302: 48 8d 05 97 9c c2 00 lea 0xc29c97(%rip),%rax # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE>
> 8ba309: 48 8b 08 mov (%rax),%rcx
> 8ba30c: f6 c5 01 test $0x1,%ch # use compressed klass pointers?
> 8ba30f: 0f 84 9b 00 00 00 je 8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
> 8ba315: 8b 7b 08 mov 0x8(%rbx),%edi
> 8ba318: 48 d3 e7 shl %cl,%rdi # shift
> 8ba31b: 66 31 c9 xor %cx,%cx # zero out lower 16 bits of base
> 8ba31e: 48 01 cf add %rcx,%rdi # add base
> 8ba321: 8b 4f 08 mov 0x8(%rdi),%ecx
>
> ---
>
> Performance measurements:
>
> G1, doing a full GC over a heap filled with 256 million live j.l.Object instances.
>
> I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ±4%. Still, in general, numbers seemed to go down rather than up.
>
> ---
>
> Future extensions:
>
> This patch uses the fact that the encoding base is aligned to metaspace reserve alignment (16 MB). We only use 16 of those 24 bits of alignment shadow and could us...

Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - simplify assert - add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15389/files - new: https://git.openjdk.org/jdk/pull/15389/files/a380b03b..7a284cbf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15389&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15389&range=06-07 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15389/head:pull/15389 PR: https://git.openjdk.org/jdk/pull/15389 From kvn at openjdk.org Tue Oct 10 16:53:10 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 10 Oct 2023 16:53:10 GMT Subject: RFR: 8317761: Combine two versions of print_statistics() in java.cpp In-Reply-To: References: Message-ID: <2oFwKcgFsl3MU-RgyzXZIACejtK15iwdwlMXGaRvqZk=.fca55e08-fdea-40b1-9fe3-a168933d0a8d@github.com> On Mon, 9 Oct 2023 23:05:26 GMT, Ioi Lam wrote: > The non-product version of
> `print_statistics()` can be compiled in the product build as well, so there's no need to keep two different versions.
>
> BTW, for some reason `print_method_profiling_data()` is called unconditionally in non-product builds, but is guarded by `PrintMethodData` in product builds. I made the behavior the same as before in this PR. This can probably be cleaned up in a future PR.
>
> I verified that:
>
> - All functions call by the original product version are also called by the non-product version (but could be in different order).
> - For functions that are called by the non-product version but not call by the original product version: all such calls are guarded by non-product flags (e.g. `TimeOopMap`), or the function itself does nothing (e.g., declared as `PRODUCT_RETURN`)
>
> Testing: tier1, tier2, build-tiers5

src/hotspot/share/runtime/java.cpp line 290:

> 288: // TODO: why is this unconditional in non-product builds?
> 289: print_method_profiling_data();
> 290: }

I think it should always be called unconditionally because we have the check inside the method.

I think the check in the product VM is historical. Originally `print_method_profiling_data()` was called only in NOT_PRODUCT, and later [8037970](https://github.com/openjdk/jdk/commit/b21d142f016360986212a941659115c0c96a3426#diff-d9e4fa0ecdd187c3fce3dcb8b0344264134d7746225956636b237e26ce6ad369) added a call in PRODUCT under a `PrintMethodData` check (at that time there was no such flag check inside the method).
Then [8042727](https://github.com/openjdk/jdk/commit/ead7a2760b46226947dc07548ea0fa4897aef18d) added the flag check inside the method to avoid unconditionally walking class data (I assume in the debug VM). That change made the check before calling the method in the product version redundant.
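
The redundancy described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the HotSpot code: `PrintMethodData` is modelled as a plain `bool`, and `class_walks`, `print_statistics_guarded()`, `print_statistics_unguarded()` and `check_equivalence()` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the -XX:+PrintMethodData VM flag.
static bool PrintMethodData = false;

// Hypothetical counter: how often the expensive walk over class data ran.
static int class_walks = 0;

// Stand-in for print_method_profiling_data(). The flag check sits inside,
// so callers do not need to test the flag themselves.
void print_method_profiling_data() {
  if (!PrintMethodData) {
    return;  // flag off: skip the walk entirely
  }
  ++class_walks;
  std::puts("(walking class data)");
}

// Call site with an outer guard (the shape the product build used to have).
void print_statistics_guarded() {
  if (PrintMethodData) {
    print_method_profiling_data();
  }
}

// Call site without the guard (the shape suggested in the review).
void print_statistics_unguarded() {
  print_method_profiling_data();
}

// Both call sites walk class data exactly when the flag is set.
void check_equivalence() {
  for (int i = 0; i < 2; ++i) {
    PrintMethodData = (i == 1);
    class_walks = 0;
    print_statistics_guarded();
    const int guarded = class_walks;
    class_walks = 0;
    print_statistics_unguarded();
    assert(guarded == class_walks);
    assert(class_walks == (PrintMethodData ? 1 : 0));
  }
}
```

With the check inside the callee, the guarded and unguarded call sites behave identically, which is why the outer `if` can be dropped without changing behavior.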
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16110#discussion_r1352932770 From iklam at openjdk.org Tue Oct 10 17:05:12 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 10 Oct 2023 17:05:12 GMT Subject: RFR: 8317761: Combine two versions of print_statistics() in java.cpp [v2] In-Reply-To: References: Message-ID: > The non-product version of `print_statistics()` can be compiled in the product build as well, so there's no need to keep two different versions. > > BTW, for some reason `print_method_profiling_data()` is called unconditionally in non-product builds, but is guarded by `PrintMethodData` in product builds. I made the behavior the same as before in this PR. This can probably be cleaned up in a future PR. > > I verified that: > > - All functions call by the original product version are also called by the non-product version (but could be in different order). > - For functions that are called by the non-product version but not call by the original product version: all such calls are guarded by non-product flags (e.g. 
`TimeOopMap`), or the function itself does nothing (e.g., declared as `PRODUCT_RETURN`) > > Testing: tier1, tier2, build-tiers5 Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: call print_method_profiling_data() unconditionally ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16110/files - new: https://git.openjdk.org/jdk/pull/16110/files/04376605..f736e91d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16110&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16110&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16110.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16110/head:pull/16110 PR: https://git.openjdk.org/jdk/pull/16110 From iklam at openjdk.org Tue Oct 10 17:05:14 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 10 Oct 2023 17:05:14 GMT Subject: RFR: 8317761: Combine two versions of print_statistics() in java.cpp [v2] In-Reply-To: <2oFwKcgFsl3MU-RgyzXZIACejtK15iwdwlMXGaRvqZk=.fca55e08-fdea-40b1-9fe3-a168933d0a8d@github.com> References: <2oFwKcgFsl3MU-RgyzXZIACejtK15iwdwlMXGaRvqZk=.fca55e08-fdea-40b1-9fe3-a168933d0a8d@github.com> Message-ID: On Tue, 10 Oct 2023 16:50:19 GMT, Vladimir Kozlov wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> call print_method_profiling_data() unconditionally > > src/hotspot/share/runtime/java.cpp line 290: > >> 288: // TODO: why is this unconditional in non-product builds? >> 289: print_method_profiling_data(); >> 290: } > > I think it should be called unconditional always because we have check inside method. > > I think the check in product VM is historical. 
Originally `print_method_profiling_data()` was called only in NOT_PRODUCT and later [8037970](https://github.com/openjdk/jdk/commit/b21d142f016360986212a941659115c0c96a3426#diff-d9e4fa0ecdd187c3fce3dcb8b0344264134d7746225956636b237e26ce6ad369) add call to PRODUCT under `PrintMethodData` check (there was no this flag check inside method). > Then [8042727](https://github.com/openjdk/jdk/commit/ead7a2760b46226947dc07548ea0fa4897aef18d) added the flag check inside method to avoid unconditional walk class data (I assume in debug VM). This change made the check before call the method in product version redundant. Thanks. I removed the `if` check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16110#discussion_r1352981644 From kvn at openjdk.org Tue Oct 10 17:52:13 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 10 Oct 2023 17:52:13 GMT Subject: RFR: 8317761: Combine two versions of print_statistics() in java.cpp [v2] In-Reply-To: References: Message-ID: <1H9L8DC0-X2cUhR54YJdmEhJiurO4I0xmoGVCH0H7xI=.7bf54669-9f41-4aa9-a3bc-1c81313f557a@github.com> On Tue, 10 Oct 2023 17:05:12 GMT, Ioi Lam wrote: >> The non-product version of `print_statistics()` can be compiled in the product build as well, so there's no need to keep two different versions. >> >> BTW, for some reason `print_method_profiling_data()` is called unconditionally in non-product builds, but is guarded by `PrintMethodData` in product builds. I made the behavior the same as before in this PR. This can probably be cleaned up in a future PR. >> >> I verified that: >> >> - All functions call by the original product version are also called by the non-product version (but could be in different order). >> - For functions that are called by the non-product version but not call by the original product version: all such calls are guarded by non-product flags (e.g. 
`TimeOopMap`), or the function itself does nothing (e.g., declared as `PRODUCT_RETURN`) >> >> Testing: tier1, tier2, build-tiers5 > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > call print_method_profiling_data() unconditionally Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16110#pullrequestreview-1668495333 From vlivanov at openjdk.org Tue Oct 10 18:08:10 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 10 Oct 2023 18:08:10 GMT Subject: RFR: 8317761: Combine two versions of print_statistics() in java.cpp [v2] In-Reply-To: References: Message-ID: On Tue, 10 Oct 2023 17:05:12 GMT, Ioi Lam wrote: >> The non-product version of `print_statistics()` can be compiled in the product build as well, so there's no need to keep two different versions. >> >> BTW, for some reason `print_method_profiling_data()` is called unconditionally in non-product builds, but is guarded by `PrintMethodData` in product builds. I made the behavior the same as before in this PR. This can probably be cleaned up in a future PR. >> >> I verified that: >> >> - All functions call by the original product version are also called by the non-product version (but could be in different order). >> - For functions that are called by the non-product version but not call by the original product version: all such calls are guarded by non-product flags (e.g. `TimeOopMap`), or the function itself does nothing (e.g., declared as `PRODUCT_RETURN`) >> >> Testing: tier1, tier2, build-tiers5 > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > call print_method_profiling_data() unconditionally FTR a slight difference in behavior between product and non-product builds when it comes to `-XX:CompileCommand=print,...` command is whether MDOs are printed or not. 
I'm fine with aligning all build flavors to include MDOs in `-XX:CompileCommand=print,...` output.

-------------

Marked as reviewed by vlivanov (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16110#pullrequestreview-1668523656

From jvernee at openjdk.org Tue Oct 10 21:04:24 2023
From: jvernee at openjdk.org (Jorn Vernee)
Date: Tue, 10 Oct 2023 21:04:24 GMT
Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v35]
In-Reply-To:
References:
Message-ID:

> This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs:
>
> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class.
> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size.
> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts.
> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices.
> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments.
> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case)
> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions.
> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed.
> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ...
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: update copyright years ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/b4a7b7ab..35ca1921 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=33-34 Stats: 107 lines in 107 files changed: 0 ins; 0 del; 107 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From jvernee at openjdk.org Tue Oct 10 23:17:17 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 10 Oct 2023 23:17:17 GMT Subject: RFR: 8312522: Implementation of Foreign Function & Memory API [v36] In-Reply-To: References: Message-ID: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. 
https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is not longer allowed to throw an `UnsupportedOperationException` on ... 
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: adjust misformatted copyright headers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15103/files - new: https://git.openjdk.org/jdk/pull/15103/files/35ca1921..5cf9e753 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=35 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15103&range=34-35 Stats: 66 lines in 4 files changed: 0 ins; 0 del; 66 mod Patch: https://git.openjdk.org/jdk/pull/15103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15103/head:pull/15103 PR: https://git.openjdk.org/jdk/pull/15103 From svkamath at openjdk.org Tue Oct 10 23:21:54 2023 From: svkamath at openjdk.org (Smita Kamath) Date: Tue, 10 Oct 2023 23:21:54 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v6] In-Reply-To: References: Message-ID: <17PclKIK_ZA6BrMLGitrZhf8v41Jg3gc4q2jF55VX8I=.e5a1d8b5-c4d6-4a26-b821-00c2f5a4ec07@github.com> > Hi All, > I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations. 
>
> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option:
>
> | Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup |
> |-------------|------------|---------------|------------------|-----------|
> | full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 |
> | full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 |
> | small.AESGCMBench.decrypt | 8192 | 527854.353 | 663131.48 | 1.25 |
> | small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 | 1.24 |
> | full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 |
> | full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 | 539235.462 | 1.00 |
> | small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 | 298913.629 | 0.99 |
> | small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 |
> | | | | | |
> | full.AESGCMBench.decrypt | 16384 | 307266.364 | 390397.778 | 1.27 |
> | full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 |
> | small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 | 1.27 |
> | small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 |
> | full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 |
> | full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 |
> | small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 | 181019.205 | 1.12 |
> | small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 |
> | | | | | |
> | full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 | 1.31 |
> | full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 |
> | small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 |
> | small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 | 1.28 |
> | full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 | 1.12 |
> | full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 | 211701.969 | 1.27 |
> | small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 |
> | small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 | 212085.047 | 1.30 |
> | | | | | |
> | full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 |
> | full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 | 1.31 |
> | small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 | 1.32 |
> | small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 |
> | full.AESGCMBench.decryptMultiPart | 65536 | 42649.816 | 47591.587 | 1.11 |
> full.AESGCMBe...

Smita Kamath has updated the pull request incrementally with one additional commit since the last revision:

Updated addb instruction definition

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/15410/files
- new: https://git.openjdk.org/jdk/pull/15410/files/7e1cf54d..7812f36d

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=15410&range=05
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=15410&range=04-05

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/15410.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/15410/head:pull/15410

PR: https://git.openjdk.org/jdk/pull/15410

From svkamath at openjdk.org Tue Oct 10 23:49:18 2023
From: svkamath at openjdk.org (Smita Kamath)
Date: Tue, 10 Oct 2023 23:49:18 GMT
Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v7]
In-Reply-To:
References:
Message-ID:

> Hi All,
> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions.
> This optimization interleaves AES and GHASH operations.
>
> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option:
>
> | Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup |
> |-------------|------------|---------------|------------------|-----------|
> | full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 |
> | full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 |
> | small.AESGCMBench.decrypt | 8192 | 527854.353 | 663131.48 | 1.25 |
> | small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 | 1.24 |
> | full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 |
> | full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 | 539235.462 | 1.00 |
> | small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 | 298913.629 | 0.99 |
> | small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 |
> | | | | | |
> | full.AESGCMBench.decrypt | 16384 | 307266.364 | 390397.778 | 1.27 |
> | full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 |
> | small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 | 1.27 |
> | small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 |
> | full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 |
> | full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 |
> | small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 | 181019.205 | 1.12 |
> | small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 |
> | | | | | |
> | full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 | 1.31 |
> | full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 |
> | small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 |
> | small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 | 1.28 |
> | full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 | 1.12 |
> | full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 | 211701.969 | 1.27 |
> | small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 |
> | small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 | 212085.047 | 1.30 |
> | | | | | |
> | full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 |
> | full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 | 1.31 |
> | small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 | 1.32 |
> | small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 |
> | full.AESGCMBench.decryptMultiPart | 65536 | 42649.816 | 47591.587 | 1.11 |
> full.AESGCMBe...

Smita Kamath has updated the pull request incrementally with one additional commit since the last revision:

Updated addb instruction

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/15410/files
- new: https://git.openjdk.org/jdk/pull/15410/files/7812f36d..7ce8068f

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=15410&range=06
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=15410&range=05-06

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/15410.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/15410/head:pull/15410

PR: https://git.openjdk.org/jdk/pull/15410

From sviswanathan at openjdk.org Wed Oct 11 00:21:11 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Wed, 11 Oct 2023 00:21:11 GMT
Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v7]
In-Reply-To:
References:
Message-ID:

On Tue, 10 Oct 2023 23:49:18 GMT, Smita Kamath wrote:

>> Hi All,
>> I would like to submit AES-GCM optimization for x86_64
>> architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations.
>>
>> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option:
>>
>> | Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup |
>> |-------------|------------|---------------|------------------|-----------|
>> | full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 |
>> | full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 |
>> | small.AESGCMBench.decrypt | 8192 | 527854.353 | 663131.48 | 1.25 |
>> | small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 | 1.24 |
>> | full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 |
>> | full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 | 539235.462 | 1.00 |
>> | small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 | 298913.629 | 0.99 |
>> | small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 |
>> | | | | | |
>> | full.AESGCMBench.decrypt | 16384 | 307266.364 | 390397.778 | 1.27 |
>> | full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 |
>> | small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 | 1.27 |
>> | small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 |
>> | full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 |
>> | full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 |
>> | small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 | 181019.205 | 1.12 |
>> | small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 |
>> | | | | | |
>> | full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 | 1.31 |
>> | full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 |
>> | small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 |
>> | small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 | 1.28 |
>> | full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 | 1.12 |
>> | full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 | 211701.969 | 1.27 |
>> | small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 |
>> | small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 | 212085.047 | 1.30 |
>> | | | | | |
>> | full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 |
>> | full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 | 1.31 |
>> | small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 | 1.32 |
>> | small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 |
>> full.AES...
>
> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision:
>
> Updated addb instruction

Thanks a lot for considering the review comments. The PR looks good to me.

-------------

Marked as reviewed by sviswanathan (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15410#pullrequestreview-1669414361

From amenkov at openjdk.org Wed Oct 11 01:00:10 2023
From: amenkov at openjdk.org (Alex Menkov)
Date: Wed, 11 Oct 2023 01:00:10 GMT
Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818
In-Reply-To:
References:
Message-ID:

On Fri, 6 Oct 2023 20:56:13 GMT, Hannes Greule wrote:

> See the bug description for more information.
>
> This implementation brings down the time to take a heap dump on the example application in the bug report to <2 seconds on my machine.

src/hotspot/share/oops/fieldStreams.hpp line 194:

> 192: void prepare() {
> 193: _next_klass = next_klass_with_fields();
> 194: // special case: the base klass has no fields. If any supertype has any fields, use that directly.

"base klass" sounds misleading.
I think "initial" would be clearer.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16083#discussion_r1353752601

From amenkov at openjdk.org Wed Oct 11 01:17:01 2023
From: amenkov at openjdk.org (Alex Menkov)
Date: Wed, 11 Oct 2023 01:17:01 GMT
Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818
In-Reply-To:
References:
Message-ID:

On Fri, 6 Oct 2023 20:56:13 GMT, Hannes Greule wrote:

> See the bug description for more information.
>
> This implementation brings down the time to take a heap dump on the example application in the bug report to <2 seconds on my machine.

The fix itself looks good to me.
How did you test the change?
It looks like we don't have test coverage for the correctness of the dumped fields. It would be nice to add it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16083#issuecomment-1756579065

From dlong at openjdk.org Wed Oct 11 01:17:03 2023
From: dlong at openjdk.org (Dean Long)
Date: Wed, 11 Oct 2023 01:17:03 GMT
Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v2]
In-Reply-To:
References:
Message-ID: <0p2dyy8w_5WbR_oeNqBVnT0F8k0_02sj-DlZ4jBOXaM=.9da09560-9ae6-4e2c-86fa-d1882911eb31@github.com>

On Tue, 20 Jun 2023 08:26:08 GMT, Erik Österlund wrote:

>> In fact, there is a current race in the nmethod entry barriers, where what we are doing violates the AMD APM (cf. APM volume 2 section 7.6.1 https://www.amd.com/system/files/TechDocs/24593.pdf).
>> In particular, if the compare instruction of the nmethod entry barrier is not yet patched and we call a slow path on thread 1, then before taking the nmethod entry lock, another thread 2 could fix and disarm the nmethod.
>> Then thread 1 will observe *data* suggesting the nmethod has been patched, but never re-executes the patched compare (which might indeed still be stale), hence not qualifying for asynchronous cross modifying code, and neither do we run a serializing cpuid instruction, qualifying for synchronous cross modifying code. In this scenario, we can indeed start executing the nmethod instructions, while observing inconsistent concurrent patching effects, where some instructions will be updated and some not.
>>
>> The following patch ensures that x86 nmethod entry barriers execute cross modifying fence after calling into the VM, where another thread could have disarmed the nmethod. I also ensured the other platforms perform their fencing after the VM call, instead of before - including a cross_modify_fence in the shared code for OSR nmethod entries. While fencing before will flush out the instruction pipeline, and it shouldn't be populated with problematic instructions until after we start executing the nmethod again, it feels unnecessary to fence on the wrong side of the modifications it wishes to guard, and hence not strictly following the synchronous cross modifying fence recipe.
>>
>> I'm currently running tier1-5 and running performance testing in aurora. In the interest of time, I'm opening this PR before getting the final result, and will report the results when they come in.
>
> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
>
> Typo in comment

In particular, I'm wondering about branch prediction. The "beyond reach of the thread's current control flow" to me only rules out pre-fetching code that is truly unreachable (ignoring unconditional branches).
What about this scenario:
1) Thread 1 reaches the first instruction of the nmethod and predicts that the entry barrier slow path branch will not be taken, so it loads some number of instructions past the branch into the pipeline, including instructions with oop immediates.
2) Before Thread 1 reaches the entry barrier compare, another thread calls the same nmethod, takes the slow path, patches oops in instructions, and disarms the entry barrier.
3) Thread 1 sees the disarmed conditional branch and continues to execute the previously fetched pipeline, which contains stale oops.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14543#issuecomment-1756579161

From fyang at openjdk.org Wed Oct 11 02:52:01 2023
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 11 Oct 2023 02:52:01 GMT
Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic [v3]
In-Reply-To: <-SH1KoNgpnXy3nRtsnVHW-EaZtRKytDCPwS53-ngwxM=.50ee506e-e680-421d-8a7a-9adca26f980a@github.com>
References: <-SH1KoNgpnXy3nRtsnVHW-EaZtRKytDCPwS53-ngwxM=.50ee506e-e680-421d-8a7a-9adca26f980a@github.com>
Message-ID:

On Mon, 9 Oct 2023 14:55:42 GMT, Hamlin Li wrote:

>> Only vector version is included in this patch.
>>
>> ### Test
>> The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*`
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
> Ajust code according round 3 reviewing

Thanks for the update. Two more nits remain, otherwise LGTM.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4345:

> 4343: };
> 4344: const VectorRegister tmp_vr = v16;
> 4345: const VectorRegister counter = v17;

Maybe rename this as `counter_vr`?
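
For readers less familiar with the algorithm behind the ChaCha20 intrinsic under review, the scalar quarter round and double round from RFC 8439 (section 2.1) can be sketched as below. This is a plain C++ illustration, not the RISC-V stub code, which operates on vector registers; the names `rotl32`, `quarter_round` and `double_round` are chosen here for illustration only.

```cpp
#include <cassert>
#include <cstdint>

using std::uint32_t;

// Rotate a 32-bit word left by n bits (0 < n < 32).
static inline uint32_t rotl32(uint32_t x, int n) {
  assert(n > 0 && n < 32);
  return (x << n) | (x >> (32 - n));
}

// One ChaCha20 quarter round on four state words (RFC 8439, section 2.1).
static inline void quarter_round(uint32_t& a, uint32_t& b, uint32_t& c, uint32_t& d) {
  a += b; d ^= a; d = rotl32(d, 16);
  c += d; b ^= c; b = rotl32(b, 12);
  a += b; d ^= a; d = rotl32(d, 8);
  c += d; b ^= c; b = rotl32(b, 7);
}

// One double round over the 16-word state: four column rounds followed by
// four diagonal rounds.
static inline void double_round(uint32_t s[16]) {
  quarter_round(s[0], s[4], s[8],  s[12]);
  quarter_round(s[1], s[5], s[9],  s[13]);
  quarter_round(s[2], s[6], s[10], s[14]);
  quarter_round(s[3], s[7], s[11], s[15]);
  quarter_round(s[0], s[5], s[10], s[15]);
  quarter_round(s[1], s[6], s[11], s[12]);
  quarter_round(s[2], s[7], s[8],  s[13]);
  quarter_round(s[3], s[4], s[9],  s[14]);
}
```

The quarter round can be checked against the test vector in RFC 8439, section 2.1.1: inputs `a = 0x11111111`, `b = 0x01020304`, `c = 0x9b8d6f43`, `d = 0x01234567` yield `a = 0xea2a92f4`, `b = 0xcb1cf8ce`, `c = 0x4581472e`, `d = 0x5881c4bb`.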
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4378: > 4376: chacha20_quarter_round(work_vrs[1], work_vrs[6], work_vrs[11], work_vrs[12], tmp_vr); > 4377: chacha20_quarter_round(work_vrs[2], work_vrs[7], work_vrs[8], work_vrs[13], tmp_vr); > 4378: chacha20_quarter_round(work_vrs[3], work_vrs[4], work_vrs[9], work_vrs[14], tmp_vr); I personally prefer to add one extra space to keep the fourth parameter aligned for those calls to `chacha20_quarter_round`. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15899#pullrequestreview-1665737295 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1351129196 PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1351723257 From iklam at openjdk.org Wed Oct 11 05:13:00 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 11 Oct 2023 05:13:00 GMT Subject: RFR: 8317761: Combine two versions of print_statistics() in java.cpp [v2] In-Reply-To: <1H9L8DC0-X2cUhR54YJdmEhJiurO4I0xmoGVCH0H7xI=.7bf54669-9f41-4aa9-a3bc-1c81313f557a@github.com> References: <1H9L8DC0-X2cUhR54YJdmEhJiurO4I0xmoGVCH0H7xI=.7bf54669-9f41-4aa9-a3bc-1c81313f557a@github.com> Message-ID: On Tue, 10 Oct 2023 17:49:44 GMT, Vladimir Kozlov wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> call print_method_profiling_data() unconditionally > > Good. Thanks @vnkozlov and @iwanowww for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16110#issuecomment-1756819143 From iklam at openjdk.org Wed Oct 11 05:14:22 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 11 Oct 2023 05:14:22 GMT Subject: Integrated: 8317761: Combine two versions of print_statistics() in java.cpp In-Reply-To: References: Message-ID: <0VLxtnZW0___7_wP9aE1sjbVt-sfryrA8TpoY3G3QSg=.65099b91-b0a7-40b7-a36b-677410de59ca@github.com> On Mon, 9 Oct 2023 23:05:26 GMT, Ioi Lam wrote: > The non-product version of `print_statistics()` can be compiled in the product build as well, so there's no need to keep two different versions. > > BTW, for some reason `print_method_profiling_data()` is called unconditionally in non-product builds, but is guarded by `PrintMethodData` in product builds. I made the behavior the same as before in this PR. This can probably be cleaned up in a future PR. > > I verified that: > > - All functions call by the original product version are also called by the non-product version (but could be in different order). > - For functions that are called by the non-product version but not call by the original product version: all such calls are guarded by non-product flags (e.g. `TimeOopMap`), or the function itself does nothing (e.g., declared as `PRODUCT_RETURN`) > > Testing: tier1, tier2, build-tiers5 This pull request has now been integrated. 
Changeset: 84b7cc15 Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/84b7cc15c20581a14cdd2a590e0a30b1ef9acddb Stats: 60 lines in 2 files changed: 9 ins; 51 del; 0 mod 8317761: Combine two versions of print_statistics() in java.cpp Reviewed-by: kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/16110 From hgreule at openjdk.org Wed Oct 11 06:14:05 2023 From: hgreule at openjdk.org (Hannes Greule) Date: Wed, 11 Oct 2023 06:14:05 GMT Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 In-Reply-To: References: Message-ID: On Wed, 11 Oct 2023 01:13:56 GMT, Alex Menkov wrote: > The fix itself looks good to me. How did you tested the change? Looks like we don't have test coverage for the correctness of the dumped fields. Would be nice to add it. Thanks. I ran `hotspot_serviceability` and also manually looked into more complex heap dumps. I agree that specific tests would be better. I'll need to figure out how that can be accomplished. If you have any pointers how to get started there, please let me know. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16083#issuecomment-1756867890 From mli at openjdk.org Wed Oct 11 07:43:03 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Oct 2023 07:43:03 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic [v4] In-Reply-To: References: Message-ID: > Only vector version is included in this patch. 
> > ### Test > The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Minor fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15899/files - new: https://git.openjdk.org/jdk/pull/15899/files/3b42ce13..721b84dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15899&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15899&range=02-03 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/15899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15899/head:pull/15899 PR: https://git.openjdk.org/jdk/pull/15899 From mli at openjdk.org Wed Oct 11 07:43:06 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Oct 2023 07:43:06 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic [v3] In-Reply-To: References: <-SH1KoNgpnXy3nRtsnVHW-EaZtRKytDCPwS53-ngwxM=.50ee506e-e680-421d-8a7a-9adca26f980a@github.com> Message-ID: On Tue, 10 Oct 2023 07:29:57 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Ajust code according round 3 reviewing > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4378: > >> 4376: chacha20_quarter_round(work_vrs[1], work_vrs[6], work_vrs[11], work_vrs[12], tmp_vr); >> 4377: chacha20_quarter_round(work_vrs[2], work_vrs[7], work_vrs[8], work_vrs[13], tmp_vr); >> 4378: chacha20_quarter_round(work_vrs[3], work_vrs[4], work_vrs[9], work_vrs[14], tmp_vr); > > I personally prefer to add one extra space to keep the fourth parameter aligned for those calls to `chacha20_quarter_round`. Thanks for your reviewing! 
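For readers following the review: the register indices in the quoted calls, (1, 6, 11, 12), (2, 7, 8, 13) and (3, 4, 9, 14), match the last three quarter rounds of the ChaCha20 diagonal round in RFC 8439. Below is a minimal scalar sketch in Java of what a single quarter round computes. The class and method names are illustrative only and are not taken from the patch, which performs the same steps on RISC-V vector registers.

```java
public class ChaChaQuarterRound {
    // One ChaCha20 quarter round (RFC 8439, section 2.1) applied in place
    // to the state words at indices a, b, c, d.
    static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 16);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 8);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 7);
    }

    // The diagonal round: four quarter rounds over the diagonals of the
    // 4x4 state matrix, using the same index pattern as the quoted calls.
    static void diagonalRound(int[] s) {
        quarterRound(s, 0, 5, 10, 15);
        quarterRound(s, 1, 6, 11, 12);
        quarterRound(s, 2, 7, 8, 13);
        quarterRound(s, 3, 4, 9, 14);
    }

    public static void main(String[] args) {
        // Quarter-round test vector from RFC 8439, section 2.1.1.
        int[] s = {0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567};
        quarterRound(s, 0, 1, 2, 3);
        // prints "ea2a92f4 cb1cf8ce 4581472e 5881c4bb"
        System.out.printf("%08x %08x %08x %08x%n", s[0], s[1], s[2], s[3]);
    }
}
```

Java `int` addition wraps modulo 2^32 and `Integer.rotateLeft` is a 32-bit rotate, so no masking is needed in the scalar version.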
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15899#discussion_r1354318379 From rrich at openjdk.org Wed Oct 11 09:36:50 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 11 Oct 2023 09:36:50 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v17] In-Reply-To: References: Message-ID: <7Wf_JP97N60baqhknTPclz-KiF8cBARxqTD8KtVfD0w=.631369a7-e665-406a-8e10-4942893dc578@github.com> > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. 
The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea...
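The work-sharing rule from the bullet list in the description can be sketched as a simplified model in Java. This is not the HotSpot code. The class `StripeSketch`, the round-robin assignment of stripes to workers, and the use of plain word indices are all assumptions made for illustration, and the sketch does not enforce the "at least 3 stripes" minimum.

```java
import java.util.ArrayList;
import java.util.List;

public class StripeSketch {
    // Half-open word range [from, to) of the large array, scanned by one worker.
    public static final class Chunk {
        public final int worker;
        public final long from;
        public final long to;

        Chunk(int worker, long from, long to) {
            this.worker = worker;
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return "worker " + worker + ": [" + from + ", " + to + ")";
        }
    }

    // Splits the array range [start, end) along stripe boundaries.
    // Assumption: stripe k is owned by worker k % workers.
    static List<Chunk> split(long start, long end, long stripeSize, int workers) {
        List<Chunk> chunks = new ArrayList<>();
        long first = start / stripeSize;
        long last = (end - 1) / stripeSize;
        for (long k = first; k <= last; k++) {
            long from = Math.max(start, k * stripeSize);
            long to = Math.min(end, (k + 1) * stripeSize);
            // The end of the array that reaches into the last stripe is
            // scanned by the owner of the previous stripe.
            boolean isTail = k == last && k > first && end % stripeSize != 0;
            long owner = isTail ? k - 1 : k;
            chunks.add(new Chunk((int) (owner % workers), from, to));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Array covering words [100, 1000), stripes of 256 words, 4 workers.
        for (Chunk c : split(100, 1000, 256, 4)) {
            System.out.println(c);
        }
    }
}
```

In this model every word of the array lands in exactly one chunk, and the final partial chunk goes to the worker that also scanned the stripe before it.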
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Shadow table per stripe ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/272ab97b..8b544d84 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=15-16 Stats: 273 lines in 2 files changed: 110 ins; 124 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From rrich at openjdk.org Wed Oct 11 09:36:52 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 11 Oct 2023 09:36:52 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v15] In-Reply-To: References: Message-ID: On Tue, 10 Oct 2023 09:42:00 GMT, Albert Mingkun Yang wrote: > > Also it only comes from overlapping card table preprocessing with scavenging. I think this could be removed again without losing performance. > > That complexity is uncalled for if its benefit is marginal. I'll remove it. > > It could be passed by reference if the query in process_range would be pulled up before the find_first_clean_card call. > > The implementation should be straightforward. I think I'll experiment with it. > > Could it be updated to not query object-start? That would remove much complexity inside that method. > > Additionally, I wonder if the scanning-dirty-chunk iteration can be simplified a bit [...] Probably. I thought it would be a good idea (for performance and clarity) to structure the processing as 1. objects reaching in, 2. objects contained in, 3. objects reaching out of the dirty chunk. I have now found that it is neither necessary for performance nor does it help in understanding the code.
I'll push a new version that's supposed to look very much like yours, except it does the card table preprocessing and keeps a shadow copy of the card table entries corresponding to the current stripe on stack (so not malloc'ed). I think it would be a good base for further enhancements you have on your mind but also good to be backported. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1757252721 From rrich at openjdk.org Wed Oct 11 09:36:54 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 11 Oct 2023 09:36:54 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v16] In-Reply-To: <7MCfxKnwPdEgJ_bTJ6T-WGBaiUTm3v_zNuED3OdbNP0=.beb49d29-ad0e-40ab-a0f8-0fff5373dbc4@github.com> References: <7MCfxKnwPdEgJ_bTJ6T-WGBaiUTm3v_zNuED3OdbNP0=.beb49d29-ad0e-40ab-a0f8-0fff5373dbc4@github.com> Message-ID: <8os7ovPJW5cmNTfhaqsH-oiyaNj2K21U8LT_JPzV_-Q=.9bc2818b-d6b8-4914-a27c-0b6851a572ee@github.com> On Mon, 9 Oct 2023 15:42:02 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. 
>> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with two additional commits since the last revision: > > - find_first_clean_card: return end_card if final object extends beyond it.
> - Cleanup Tested https://github.com/openjdk/jdk/pull/14846/commits/8b544d84da282c9f0f86d3f275c6688baac91da8 with langtools:tier1 TEST_VM_OPTS="-XX:+UseParallelGC" jdk:tier1 TEST_VM_OPTS="-XX:+UseParallelGC" hotspot:tier1 card_scan* tests ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1757255124 From dchuyko at openjdk.org Wed Oct 11 10:18:53 2023 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Wed, 11 Oct 2023 10:18:53 GMT Subject: RFR: 8309271: A way to align already compiled methods with compiler directives [v9] In-Reply-To: References: Message-ID: > Compiler Control (https://openjdk.org/jeps/165) provides method-context dependent control of the JVM compilers (C1 and C2). The active directive stack is built from the directive files passed with the `-XX:CompilerDirectivesFile` diagnostic command-line option and the Compiler.add_directives diagnostic command. It is also possible to clear all directives or remove the top from the stack. > > A matching directive will be applied at method compilation time when such compilation is started. If directives are added or changed, but compilation does not start, then the state of compiled methods doesn't correspond to the rules. This is not an error, and it happens in long running applications when directives are added or removed after compilation of methods that could be matched. For example, the user decides that C2 compilation needs to be disabled for some method due to a compiler bug, issues such a directive but this does not affect the application behavior. In such case, the target application needs to be restarted, and such an operation can have high costs and risks. Another goal is testing/debugging compilers. > > It would be convenient to optionally reconcile at least existing matching nmethods to the current stack of compiler directives (so bypass inlined methods). 
> > A natural way to eliminate the discrepancy between the result of compilation and the broken rule is to discard the compilation result, i.e. to deoptimize. Before that, we can try to recompile the method, letting the compile broker take the new directives stack into account. Re-compilation helps to keep hot methods from running in the interpreter. > > A new flag `-r` has been introduced for some diagnostic commands related to compiler directives: `Compiler.add_directives`, `Compiler.remove_directives`, `Compiler.clear_directives`. The default behavior has not changed (no flag). If the new flag is present, the command scans already compiled methods and puts methods that have any active non-default matching compiler directives to re-compilation if possible, otherwise marks them for deoptimization. There is currently no distinction as to which directives are found. In particular, this means that if there are rules for inlining into some method, it will be refreshed. On the other hand, if there are rules for a method and it was inlined, top-level methods won't be refreshed, but this can be achieved by having rules for them. > > In addition, a new diagnostic command `Compiler.replace_directives`, has been added for ... Dmitry Chuyko has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 27 commits: - Merge branch 'openjdk:master' into compiler-directives-force-update - Merge branch 'openjdk:master' into compiler-directives-force-update - Merge branch 'openjdk:master' into compiler-directives-force-update - Merge branch 'openjdk:master' into compiler-directives-force-update - jcheck - Unnecessary import - force_update->refresh - Merge branch 'openjdk:master' into compiler-directives-force-update - Use only top directive for add/remove; better mutex rank definition; texts - Merge branch 'openjdk:master' into compiler-directives-force-update - ...
and 17 more: https://git.openjdk.org/jdk/compare/731fb4ee...d7cb519a ------------- Changes: https://git.openjdk.org/jdk/pull/14111/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14111&range=08 Stats: 372 lines in 15 files changed: 339 ins; 3 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/14111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14111/head:pull/14111 PR: https://git.openjdk.org/jdk/pull/14111 From djelinski at openjdk.org Wed Oct 11 11:00:04 2023 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Wed, 11 Oct 2023 11:00:04 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v7] In-Reply-To: References: Message-ID: On Tue, 10 Oct 2023 23:49:18 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option: >> >> |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup >> |-------------|------------|---------------|------------------|-----------| >> |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 >> full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 >> small.AESGCMBench.decrypt | 8192 | 527854.353 |663131.48 | 1.25 >> small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 |1.24 >> full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 >> full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 |539235.462 | 1.00 >> small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 |298913.629 | 0.99 >> small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 >> ? | ? | ? | ? | ? 
>> full.AESGCMBench.decrypt | 16384 | 307266.364 |390397.778 | 1.27 >> full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 >> small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 |1.27 >> small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 >> full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 >> full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 >> small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 |181019.205 | 1.12 >> small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 |1.31 >> full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 >> small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 >> small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 |1.28 >> full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 |1.12 >> full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 |211701.969 | 1.27 >> small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 >> small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 |212085.047 | 1.30 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 >> full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 |1.31 >> small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 |1.32 >> small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 >> full.AES... > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated addb instruction Could you update the comment on `implGCMCrypt0`? 
https://github.com/openjdk/jdk/blob/731fb4eea21ab67d90970d7c6107fb0a4fbee9ec/src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java#L625-L629 src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3755: > 3753: const XMMRegister ctr_blockx = xmm9; > 3754: const XMMRegister aad_hashx = xmm8; > 3755: Label encrypt_done, encrypt_by_8_parallel, encrypt_by_8_new, encrypt_by_8, hash_last_8, generate_htbl_8_blks; `encrypt_by_8_parallel` is never used in jumps; `hash_last_8` and `generate_htbl_8_blks` are never used. Can you remove them? ------------- PR Review: https://git.openjdk.org/jdk/pull/15410#pullrequestreview-1670446151 PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1354454458 From ayang at openjdk.org Wed Oct 11 11:15:17 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 11 Oct 2023 11:15:17 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v17] In-Reply-To: <7Wf_JP97N60baqhknTPclz-KiF8cBARxqTD8KtVfD0w=.631369a7-e665-406a-8e10-4942893dc578@github.com> References: <7Wf_JP97N60baqhknTPclz-KiF8cBARxqTD8KtVfD0w=.631369a7-e665-406a-8e10-4942893dc578@github.com> Message-ID: <_jwOMcqQDRIjTc-CLlvGk42ySeBHAHa5OP1KxYiFWu8=.7735fa6f-4afb-45fa-b594-4ccba269be79@github.com> On Wed, 11 Oct 2023 09:36:50 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. 
>> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ...
> > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Shadow table per stripe > I think it would be a good base for further enhancements you have on your mind but also good to be backported. Agree. Card-scanning logic is quite clear now. src/hotspot/share/gc/parallel/psCardTable.cpp line 228: > 226: scan_obj_with_limit(pm, obj.obj, addr_l, end); > 227: } > 228: } I wonder if the else branch can be replaced by: if (obj.addr < i_addr && obj.addr > start) { // already-scanned } else { scan_obj_with_limit(pm, obj.obj, addr_l, end); } ------------- PR Review: https://git.openjdk.org/jdk/pull/14846#pullrequestreview-1670870614 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1354760039 From aph at openjdk.org Wed Oct 11 13:36:01 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 11 Oct 2023 13:36:01 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v8] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 28 commits: - Merge branch 'JDK-8295159' of https://github.com/theRealAph/jdk into JDK-8295159 - Fix LLVM - Give x32 bug its own ID. - cleanup - Fix conditional compilation - Remove x32 handling - Stash x86-32 changes - MacOS - AArch64 - x86-32 changes - ... and 18 more: https://git.openjdk.org/jdk/compare/cef9fff0...c56adbd9 ------------- Changes: https://git.openjdk.org/jdk/pull/10661/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=07 Stats: 267 lines in 12 files changed: 265 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Wed Oct 11 13:36:04 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 11 Oct 2023 13:36:04 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <3sDhB4Anie_ab5PBqGAGtFHDICzJdgkDDIaOSLZZKFI=.7556e8b1-1d68-4c18-82ef-3452fa52193e@github.com> On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. 
> > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic This is a stripped-down, less risky, and simplified patch, in order to get something done. I removed x32 handling because it has turned out to be disproportionately fiddly. The `mxcsr` register we need to access is not a part of the `fenv` structure, so x32 needs special-case handling. This handling turns out to be problematic because HotSpot on GCC is compiled with a low x32 CPU version that doesn't support the builtins or assembly instructions that we need to access `mxcsr`. I don't want to change HotSpot's default x32 CPU version, because that will affect code generation in unknown ways. Given that x32 is obsolescent, I dropped it. In addition, I dropped the proposed warning at safepoints if the handling of denormal floating-point changes. It's fiddly, has potential compatibility issues, and may have a performance impact. It's not needed to fix 8295159. Reopened for review, in a simpler form that doesn't raise any compatibility issues. I'm seeing one automated test failure on Linux x86, which I don't understand because I've excluded that test for generic-i586. If anyone understands this, please shout up. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1743421995 PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1757707730 From ihse at openjdk.org Wed Oct 11 13:36:07 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 11 Oct 2023 13:36:07 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v8] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 11 Oct 2023 13:31:22 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic.
This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: > > - Merge branch 'JDK-8295159' of https://github.com/theRealAph/jdk into JDK-8295159 > - Fix LLVM > - Give x32 bug its own ID. > - cleanup > - Fix conditional compilation > - Remove x32 handling > - Stash x86-32 changes > - MacOS > - AArch64 > - x86-32 changes > - ... and 18 more: https://git.openjdk.org/jdk/compare/cef9fff0...c56adbd9 make/autoconf/flags-cflags.m4 line 577: > 575: # CXXFLAGS C++ language level for all of JDK, including Hotspot. > 576: if test "x$TOOLCHAIN_TYPE" = xgcc || test "x$TOOLCHAIN_TYPE" = xclang || test "x$TOOLCHAIN_TYPE" = xxlc; then > 577: LANGSTD_CXXFLAGS="-std=gnu++14" Uhhh, is this really okay? I thought the idea was that we should standardize on official C++, not use vendor-specific extensions. @kimbarrett Opinions on this? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1338190849 From aph at openjdk.org Wed Oct 11 13:36:08 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 11 Oct 2023 13:36:08 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v8] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 27 Sep 2023 07:49:48 GMT, Magnus Ihse Bursie wrote: >> Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: >> >> - Merge branch 'JDK-8295159' of https://github.com/theRealAph/jdk into JDK-8295159 >> - Fix LLVM >> - Give x32 bug its own ID. >> - cleanup >> - Fix conditional compilation >> - Remove x32 handling >> - Stash x86-32 changes >> - MacOS >> - AArch64 >> - x86-32 changes >> - ... and 18 more: https://git.openjdk.org/jdk/compare/cef9fff0...c56adbd9 > > make/autoconf/flags-cflags.m4 line 577: > >> 575: # CXXFLAGS C++ language level for all of JDK, including Hotspot. >> 576: if test "x$TOOLCHAIN_TYPE" = xgcc || test "x$TOOLCHAIN_TYPE" = xclang || test "x$TOOLCHAIN_TYPE" = xxlc; then >> 577: LANGSTD_CXXFLAGS="-std=gnu++14" > > Uhhh, is this really okay? I thought the idea was that we should standardize on official C++, not use vendor-specific extensions. > > @kimbarrett Opinions on this? I pulled it out. The problem is that I have to do this: jdouble_cast(0x0030000000000000); // 0x1.0p-1020; which is yucky, but not the End Of The World. Hexadecimal floating-point literals were not part of C++ until C++17. 
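To see why the bit pattern and the commented literal agree: in IEEE 754 binary64 the top 12 bits of `0x0030000000000000` are `0x003`, a sign of 0 and a biased exponent of 3, and the mantissa is zero, so the value is 2^(3 - 1023) = 2^-1020. A small check in Java, which has always had hexadecimal floating-point literals, is sketched below. The class and method names are made up for illustration; the patch itself uses HotSpot's `jdouble_cast` in C++.

```java
public class HexFloatBits {
    // Reinterprets a 64-bit pattern as an IEEE 754 double. This plays the
    // role that jdouble_cast plays in the C++ snippet quoted above.
    static double fromBits(long bits) {
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        double fromBits = fromBits(0x0030000000000000L);
        double literal = 0x1.0p-1020; // hexadecimal floating-point literal

        System.out.println(fromBits == literal);          // prints "true"
        System.out.println(Double.toHexString(fromBits)); // prints "0x1.0p-1020"
    }
}
```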
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1338783490 From aph at openjdk.org Wed Oct 11 13:36:16 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 11 Oct 2023 13:36:16 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v5] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 15:59:48 GMT, Aleksey Shipilev wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic > > test/hotspot/jtreg/compiler/floatingpoint/TestDenormalFloat.java line 28: > >> 26: * @bug 8295159 >> 27: * @summary DSO created with -ffast-math breaks Java floating-point arithmetic >> 28: * @run main/othervm compiler.floatingpoint.TestDenormalFloat > > Should it have `/native` somewhere here? Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1336161729 From aph at openjdk.org Wed Oct 11 13:42:19 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 11 Oct 2023 13:42:19 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: <3sDhB4Anie_ab5PBqGAGtFHDICzJdgkDDIaOSLZZKFI=.7556e8b1-1d68-4c18-82ef-3452fa52193e@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <3sDhB4Anie_ab5PBqGAGtFHDICzJdgkDDIaOSLZZKFI=.7556e8b1-1d68-4c18-82ef-3452fa52193e@github.com> Message-ID: <9w3ZvcA_XeRBsb_qrKs2NYK-7U93G7nv-u4C29WOPkA=.2b31974c-e361-41fa-b58b-39619e9499ec@github.com> On Wed, 11 Oct 2023 13:31:22 GMT, Andrew Haley wrote: > I'm seeing one automated test failure on Linux x86, which I don't understand because I've excluded that test for generic-i586. If anyone understands this, please shout up. 
For avoidance of doubt, the test doesn't run locally, only on the Github-triggered CI. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1757721303 From stuefe at openjdk.org Wed Oct 11 13:57:23 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 11 Oct 2023 13:57:23 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: <9w3ZvcA_XeRBsb_qrKs2NYK-7U93G7nv-u4C29WOPkA=.2b31974c-e361-41fa-b58b-39619e9499ec@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <3sDhB4Anie_ab5PBqGAGtFHDICzJdgkDDIaOSLZZKFI=.7556e8b1-1d68-4c18-82ef-3452fa52193e@github.com> <9w3ZvcA_XeRBsb_qrKs2NYK-7U93G7nv-u4C29WOPkA=.2b31974c-e361-41fa-b58b-39619e9499ec@github.com> Message-ID: On Wed, 11 Oct 2023 13:38:57 GMT, Andrew Haley wrote: > > I'm seeing one automated test failure on Linux x86, which I don't understand because I've excluded that test for generic-i586. If anyone understands this, please shout up. > > For avoidance of doubt, the test doesn't run locally, only on the Github-triggered CI. Only TestDenormalFloat is excluded. The double variant still runs and fails. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1757748942 From stuefe at openjdk.org Wed Oct 11 14:11:44 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 11 Oct 2023 14:11:44 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v8] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 11 Oct 2023 13:36:01 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. 
>> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: > > - Merge branch 'JDK-8295159' of https://github.com/theRealAph/jdk into JDK-8295159 > - Fix LLVM > - Give x32 bug its own ID. > - cleanup > - Fix conditional compilation > - Remove x32 handling > - Stash x86-32 changes > - MacOS > - AArch64 > - x86-32 changes > - ... and 18 more: https://git.openjdk.org/jdk/compare/cef9fff0...c56adbd9 src/hotspot/os/bsd/os_bsd.cpp line 976: > 974: // same architecture as Hotspot is running on > 975: > 976: void *os::Bsd::dlopen_helper(const char *filename, int mode) { I thought BSD is switching to clang. src/hotspot/os/bsd/os_bsd.cpp line 1001: > 999: static const volatile double thresh > 1000: = jdouble_cast(0x0000000000000003); // 0x0.0000000000003p-1022; > 1001: if (unity + thresh == unity || -unity - thresh == -unity) { Could this expression happen to be precomputed by the compiler at build time? Maybe make the parts volatile? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1355064475 PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1355059684 From mli at openjdk.org Wed Oct 11 14:52:09 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Oct 2023 14:52:09 GMT Subject: RFR: 8315716: RISC-V: implement ChaCha20 intrinsic [v4] In-Reply-To: References: Message-ID: On Wed, 11 Oct 2023 07:43:03 GMT, Hamlin Li wrote: >> Only vector version is included in this patch. >> >> ### Test >> The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Minor fixes Thanks for your reviewing. @robehn @luhenry @RealFYang @gctony ------------- PR Comment: https://git.openjdk.org/jdk/pull/15899#issuecomment-1757850306 From mli at openjdk.org Wed Oct 11 14:52:10 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Oct 2023 14:52:10 GMT Subject: Integrated: 8315716: RISC-V: implement ChaCha20 intrinsic In-Reply-To: References: Message-ID: On Mon, 25 Sep 2023 11:47:40 GMT, Hamlin Li wrote: > Only vector version is included in this patch. > > ### Test > The patch passed the jdk tests found via `find test/jdk/ -iname *ChaCha*` This pull request has now been integrated. 
Changeset: 8f8c45b5 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/8f8c45b54a0ca2d676b76521fef87fb3a3ccad97 Stats: 163 lines in 4 files changed: 163 ins; 0 del; 0 mod 8315716: RISC-V: implement ChaCha20 intrinsic Reviewed-by: luhenry, fyang ------------- PR: https://git.openjdk.org/jdk/pull/15899 From rrich at openjdk.org Wed Oct 11 15:02:32 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 11 Oct 2023 15:02:32 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v17] In-Reply-To: <_jwOMcqQDRIjTc-CLlvGk42ySeBHAHa5OP1KxYiFWu8=.7735fa6f-4afb-45fa-b594-4ccba269be79@github.com> References: <7Wf_JP97N60baqhknTPclz-KiF8cBARxqTD8KtVfD0w=.631369a7-e665-406a-8e10-4942893dc578@github.com> <_jwOMcqQDRIjTc-CLlvGk42ySeBHAHa5OP1KxYiFWu8=.7735fa6f-4afb-45fa-b594-4ccba269be79@github.com> Message-ID: On Wed, 11 Oct 2023 11:09:48 GMT, Albert Mingkun Yang wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Shadow table per stripe > > src/hotspot/share/gc/parallel/psCardTable.cpp line 228: > >> 226: scan_obj_with_limit(pm, obj.obj, addr_l, end); >> 227: } >> 228: } > > I wonder if the else branch can be replaced by: > > > if (obj.addr < i_addr && obj.addr > start) { > // already-scanned > } else { > scan_obj_with_limit(pm, obj.obj, addr_l, end); > } You mean to replace L220-L227, right? There we know `obj.addr >= addr_l`. `obj.addr < i_addr` cannot be true then IMO. It would overlap with an objArray if `i_addr` was set there. If `i_addr` was set at the end of a non-objArray then objects starting before `i_addr` cannot start in or after `addr_l`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1355164575 From ayang at openjdk.org Wed Oct 11 16:26:27 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 11 Oct 2023 16:26:27 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v17] In-Reply-To: References: <7Wf_JP97N60baqhknTPclz-KiF8cBARxqTD8KtVfD0w=.631369a7-e665-406a-8e10-4942893dc578@github.com> <_jwOMcqQDRIjTc-CLlvGk42ySeBHAHa5OP1KxYiFWu8=.7735fa6f-4afb-45fa-b594-4ccba269be79@github.com> Message-ID: On Wed, 11 Oct 2023 14:59:29 GMT, Richard Reingruber wrote: >> src/hotspot/share/gc/parallel/psCardTable.cpp line 228: >> >>> 226: scan_obj_with_limit(pm, obj.obj, addr_l, end); >>> 227: } >>> 228: } >> >> I wonder if the else branch can be replaced by: >> >> >> if (obj.addr < i_addr && obj.addr > start) { >> // already-scanned >> } else { >> scan_obj_with_limit(pm, obj.obj, addr_l, end); >> } > > You mean to replace L220-L227, right? There we know `obj.addr >= addr_l`. `obj.addr < i_addr` cannot be true then IMO. It would overlap with an objArray if `i_addr` was set there. If `i_addr` was set at the end of a non-objArray then objects starting before `i_addr` cannot start in or after `addr_l`. No, I mean L212 to L228. "Comment on lines +212 to +228" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1355177469 From rrich at openjdk.org Wed Oct 11 16:56:34 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 11 Oct 2023 16:56:34 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v17] In-Reply-To: References: <7Wf_JP97N60baqhknTPclz-KiF8cBARxqTD8KtVfD0w=.631369a7-e665-406a-8e10-4942893dc578@github.com> <_jwOMcqQDRIjTc-CLlvGk42ySeBHAHa5OP1KxYiFWu8=.7735fa6f-4afb-45fa-b594-4ccba269be79@github.com> Message-ID: On Wed, 11 Oct 2023 15:06:22 GMT, Albert Mingkun Yang wrote: >> You mean to replace L220-L227, right? 
There we know `obj.addr >= addr_l`. `obj.addr < i_addr` cannot be true then IMO. It would overlap with an objArray if `i_addr` was set there. If `i_addr` was set at the end of a non-objArray then objects starting before `i_addr` cannot start in or after `addr_l`. > > No, I mean L212 to L228. "Comment on lines +212 to +228" Yes indeed. Looks correct to me. Nice simplification! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1355375376 From rrich at openjdk.org Wed Oct 11 17:04:09 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 11 Oct 2023 17:04:09 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v17] In-Reply-To: References: <7Wf_JP97N60baqhknTPclz-KiF8cBARxqTD8KtVfD0w=.631369a7-e665-406a-8e10-4942893dc578@github.com> <_jwOMcqQDRIjTc-CLlvGk42ySeBHAHa5OP1KxYiFWu8=.7735fa6f-4afb-45fa-b594-4ccba269be79@github.com> Message-ID: <7Yql4JbKg57I3Vi3BoKoyD8QleWQ0wPh_yBh6bNjhj4=.31078396-6df1-4439-9504-d84b62442f13@github.com> On Wed, 11 Oct 2023 16:53:13 GMT, Richard Reingruber wrote: >> No, I mean L212 to L228. "Comment on lines +212 to +228" > > Yes indeed. Looks correct to me. Nice simplification! > No, I mean L212 to L228. "Comment on lines +212 to +228" I looked at this at first but thought it is obviously wrong if `obj` is precisely marked after a previous young collection. But if it is precisely marked it is of course correct scan from the beginning of the dirty chunk (addr_l) to the obj/stripe end. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1355385596 From aph at openjdk.org Wed Oct 11 17:29:11 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 11 Oct 2023 17:29:11 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v9] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. 
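The "save and restore around the load" idea in the description can be sketched roughly as below. This is not the code in the PR: `call_preserving_fenv` is a hypothetical helper, the callable stands in for a `dlopen()` call, and whether `std::fesetenv` also restores the flush-to-zero and denormals-are-zero bits depends on the platform's C library, so treat that part as an assumption to verify per target.

```cpp
#include <cfenv>

// Hypothetical helper (not from the PR): run `f` and put the floating-point
// environment back afterwards, so that anything `f` changed (for example a
// library constructor altering the control word) does not leak out.
// The callable must return a value; a void-returning variant is omitted.
template <typename F>
auto call_preserving_fenv(F&& f) -> decltype(f()) {
  std::fenv_t saved;
  // fegetenv returns 0 on success; only restore if the save worked.
  const bool have_saved = (std::fegetenv(&saved) == 0);
  auto result = f();
  if (have_saved) {
    std::fesetenv(&saved);
  }
  return result;
}
```

A caller would wrap each load site, e.g. `void* h = call_preserving_fenv([&] { return ::dlopen(name, RTLD_LAZY); });`. As the description notes, this does not help when a library loads another library later on its own.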
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Add TestDenormalDouble.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10661/files - new: https://git.openjdk.org/jdk/pull/10661/files/c56adbd9..01f6e224 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=07-08 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Wed Oct 11 17:29:12 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 11 Oct 2023 17:29:12 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <3sDhB4Anie_ab5PBqGAGtFHDICzJdgkDDIaOSLZZKFI=.7556e8b1-1d68-4c18-82ef-3452fa52193e@github.com> <9w3ZvcA_XeRBsb_qrKs2NYK-7U93G7nv-u4C29WOPkA=.2b31974c-e361-41fa-b58b-39619e9499ec@github.com> Message-ID: On Wed, 11 Oct 2023 13:54:05 GMT, Thomas Stuefe wrote: > > > I'm seeing one automated test failure on Linux x86, which I don't understand because I've excluded that test for generic-i586. If anyone understands this, please shout up. > > > > > > For avoidance of doubt, the test doesn't run locally, only on the Github-triggered CI. > > Only TestDenormalFloat is excluded. The double variant still runs and fails. Ha! Thanks. Sometimes I can't see the wood for the trees. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1758144546 From aph at openjdk.org Wed Oct 11 17:29:24 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 11 Oct 2023 17:29:24 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v8] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <1bpDDmcV654Y0ZlXd-tkddf8syD6JrlLQ70BnhG6EwE=.0eb242bb-bede-404f-b37e-9d7328e261b3@github.com> On Wed, 11 Oct 2023 14:03:24 GMT, Thomas Stuefe wrote: >> Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: >> >> - Merge branch 'JDK-8295159' of https://github.com/theRealAph/jdk into JDK-8295159 >> - Fix LLVM >> - Give x32 bug its own ID. >> - cleanup >> - Fix conditional compilation >> - Remove x32 handling >> - Stash x86-32 changes >> - MacOS >> - AArch64 >> - x86-32 changes >> - ... and 18 more: https://git.openjdk.org/jdk/compare/cef9fff0...c56adbd9 > > src/hotspot/os/bsd/os_bsd.cpp line 976: > >> 974: // same architecture as Hotspot is running on >> 975: >> 976: void *os::Bsd::dlopen_helper(const char *filename, int mode) { > > I thought BSD is switching to clang. What difference does it make if it does? > src/hotspot/os/bsd/os_bsd.cpp line 1001: > >> 999: static const volatile double thresh >> 1000: = jdouble_cast(0x0000000000000003); // 0x0.0000000000003p-1022; >> 1001: if (unity + thresh == unity || -unity - thresh == -unity) { > > Could this expression happen to be precomputed by the compiler at build time? Maybe make the parts volatile? `thresh` is volatile. What more is needed? 
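To make the check under discussion easier to follow, here is a self-contained sketch. It assumes that `unity` is the `0x0030000000000000` constant (2^-1020) mentioned earlier in the thread; the quoted hunk does not show its definition, so that pairing is an inference. `from_bits` and `subnormals_work` are hypothetical names. The reasoning: one ulp of 2^-1020 is 2^-1072, and `thresh` is 3 * 2^-1074, i.e. 0.75 ulp, so with round-to-nearest the sum must move off `unity`. If subnormal inputs are being treated as zero, `thresh` contributes nothing and the sum compares equal.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for jdouble_cast: bit pattern -> double.
static double from_bits(std::uint64_t bits) {
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// Returns true when subnormal operands appear to be honoured.
// Assumption: unity is 2^-1020, as in the constant quoted earlier.
bool subnormals_work() {
  // volatile forces the values to be loaded and the additions to be done at
  // run time, under the current control word, instead of being folded by
  // the compiler at build time.
  static const volatile double unity  = from_bits(0x0030000000000000ULL);
  static const volatile double thresh = from_bits(0x0000000000000003ULL);
  // thresh is 0.75 ulp of unity, so a correctly handled subnormal changes
  // the result in both the positive and the negative direction.
  if (unity + thresh == unity || -unity - thresh == -unity) {
    return false;
  }
  return true;
}
```

This mirrors the shape of the condition in the hunk; it is a sketch for discussion, not a replacement for the shared helper suggested later in the review.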
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1355424964 PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1355423807 From stuefe at openjdk.org Wed Oct 11 18:46:49 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 11 Oct 2023 18:46:49 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v8] In-Reply-To: <1bpDDmcV654Y0ZlXd-tkddf8syD6JrlLQ70BnhG6EwE=.0eb242bb-bede-404f-b37e-9d7328e261b3@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <1bpDDmcV654Y0ZlXd-tkddf8syD6JrlLQ70BnhG6EwE=.0eb242bb-bede-404f-b37e-9d7328e261b3@github.com> Message-ID: On Wed, 11 Oct 2023 17:20:18 GMT, Andrew Haley wrote: >> src/hotspot/os/bsd/os_bsd.cpp line 976: >> >>> 974: // same architecture as Hotspot is running on >>> 975: >>> 976: void *os::Bsd::dlopen_helper(const char *filename, int mode) { >> >> I thought BSD is switching to clang. > > What difference does it make if it does? I was trying to understand the BSD+gcc combination. We use clang on MacOS, so the only platform I thought would be affected would be one of the BSDs. But I thought those also moved to clang in their builds. Hence my question. >> src/hotspot/os/bsd/os_bsd.cpp line 1001: >> >>> 999: static const volatile double thresh >>> 1000: = jdouble_cast(0x0000000000000003); // 0x0.0000000000003p-1022; >>> 1001: if (unity + thresh == unity || -unity - thresh == -unity) { >> >> Could this expression happen to be precomputed by the compiler at build time? Maybe make the parts volatile? > > `thresh` is volatile. What more is needed? Argh, sorry, missed that volatile. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1355591017 PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1355593391 From zgu at openjdk.org Wed Oct 11 18:50:14 2023 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 11 Oct 2023 18:50:14 GMT Subject: RFR: 8317466: Enable interpreter oopMapCache for concurrent GCs [v2] In-Reply-To: References: Message-ID: > Interpreter oop maps are computed lazily during GC root scan and they are expensive to compute. > > GCs uses a small hash table per instance class to cache computed oop maps during STW root scan, but not for concurrent root scan. > > This patch is intended to enable `OopMapCache` for concurrent GCs. > > Test: > tier1 and tier2 fastdebug and release on MacOSX, Linux 86_84 and Linux 86_32. Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Cleanup old oop map cache entry after class redefinition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16074/files - new: https://git.openjdk.org/jdk/pull/16074/files/11936030..015d4fb3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16074&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16074&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16074.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16074/head:pull/16074 PR: https://git.openjdk.org/jdk/pull/16074 From duke at openjdk.org Wed Oct 11 18:50:17 2023 From: duke at openjdk.org (Leela Mohan Venati) Date: Wed, 11 Oct 2023 18:50:17 GMT Subject: RFR: 8317466: Enable interpreter oopMapCache for concurrent GCs [v2] In-Reply-To: References: Message-ID: On Wed, 11 Oct 2023 18:44:04 GMT, Zhengyu Gu wrote: >> Interpreter oop maps are computed lazily during GC root scan and they are expensive to compute. 
>> >> GCs uses a small hash table per instance class to cache computed oop maps during STW root scan, but not for concurrent root scan. >> >> This patch is intended to enable `OopMapCache` for concurrent GCs. >> >> Test: >> tier1 and tier2 fastdebug and release on MacOSX, Linux 86_84 and Linux 86_32. > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup old oop map cache entry after class redefinition src/hotspot/share/interpreter/oopMapCache.cpp line 506: > 504: > 505: if (Atomic::cmpxchg(&_array[i], entry, (OopMapCacheEntry*)nullptr, memory_order_relaxed) == entry) { > 506: enqueue_for_cleanup(entry); OopMapCache::flush_obsolete_entries() is called from VM_RedefineClasses::doit(). Enqueuing OopMapCacheEntries for cleanup means that we would need call OopMapCache::cleanup_old_entries() in VM_RedefineClasses::doit_epilogue() ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16074#discussion_r1355360462 From zgu at openjdk.org Wed Oct 11 18:50:18 2023 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 11 Oct 2023 18:50:18 GMT Subject: RFR: 8317466: Enable interpreter oopMapCache for concurrent GCs [v2] In-Reply-To: References: Message-ID: <0YGIJj0crvW3iZ_fd5MlNJ8UxWPECo9XgVppni70cjQ=.391f488a-077a-47ee-9cc4-9e47bceeec65@github.com> On Wed, 11 Oct 2023 16:40:57 GMT, Leela Mohan Venati wrote: >> Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: >> >> Cleanup old oop map cache entry after class redefinition > > src/hotspot/share/interpreter/oopMapCache.cpp line 506: > >> 504: >> 505: if (Atomic::cmpxchg(&_array[i], entry, (OopMapCacheEntry*)nullptr, memory_order_relaxed) == entry) { >> 506: enqueue_for_cleanup(entry); > > OopMapCache::flush_obsolete_entries() is called from VM_RedefineClasses::doit(). 
Enqueuing OopMapCacheEntries for cleanup means that we would need call OopMapCache::cleanup_old_entries() in VM_RedefineClasses::doit_epilogue() Good catch. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16074#discussion_r1355594222 From kbarrett at openjdk.org Wed Oct 11 19:20:06 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 11 Oct 2023 19:20:06 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v6] In-Reply-To: References: Message-ID: > Please review this new facility, providing a general mechanism for intrusive > doubly-linked lists. A class supports inclusion in a list by having an > IntrusiveListEntry member, and providing structured information about how to > access that member. A class supports inclusion in multiple lists by having > multiple IntrusiveListEntry members, with different lists specified to use > different members. > > The IntrusiveList class template provides the list management. It is modelled > on bidirectional containers such as std::list and boost::intrusive::list, > providing many of the expected member types and functions. (Note that the > member types use the Standard's naming conventions.) (Not all standard > container requirements are met; some operations are not presently supported > because they haven't been needed yet.) This includes iteration support using > (mostly) standard-conforming iterator types (they are presently missing > iterator_category member types, pending being able to include so we > can use std::bidirectional_iterator_tag). > > This change only provides the new facility, and doesn't include any uses of > it. It is intended to replace the 4-5 (or maybe more) competing intrusive > doubly-linked lists presently in HotSpot. Unlike most (or perhaps all?) of > those alterantives, this proposal provides a suite of unit tests. > > An example of a place that I think might benefit from this is G1's region > handling. 
There are various places where G1 iterates over all regions in order > to do something with those which satisfy some property (humongous regions, > regions in the collection set, &etc). If it were trivial to create new region > sublists (and this facility makes that easy), some of these could be turned > into direct iteration over only the regions of interest. > > Some specific points to consider when reviewing this proposal: > > (1) This proposal follows Standard Library API conventions, which differ from > HotSpot in various ways. > > (1a) Lists and iterators provide various type members, with names per the > Standard Library. There has been discussion of using some parts of the > Standard Library eventually, in which case this would be important. But for > now some of the naming choices are atypical for HotSpot. > > (1b) Some of the function signatures follow the Standard Library APIs even > though the reasons for that form might not apply to HotSpot. For example, the > list pop operations don't return the removed value. For node-based containers > in Standard Library that would introduce exception... 
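As a rough illustration of the idea only (this is not the proposed `IntrusiveList` API), the sketch below shows one element type carrying two entry members so that it can sit on two lists at once, and a `pop_front` that returns nothing, as in the Standard Library. All names here (`ListEntry`, `SimpleIntrusiveList`, `Region`) are hypothetical, and the entry stores a back-pointer to its owner purely to keep the sketch short and well-defined; a real intrusive list would normally avoid that extra word.

```cpp
// Hypothetical link embedded in each element (not HotSpot's IntrusiveListEntry).
struct ListEntry {
  ListEntry* prev = nullptr;
  ListEntry* next = nullptr;
  void* owner = nullptr;  // simplification: back-pointer to the element
  bool is_linked() const { return next != nullptr; }
};

// Circular doubly-linked list with a sentinel. `Member` selects which
// ListEntry of T this list threads through.
template <typename T, ListEntry T::*Member>
class SimpleIntrusiveList {
 public:
  SimpleIntrusiveList() { _head.prev = _head.next = &_head; }
  SimpleIntrusiveList(const SimpleIntrusiveList&) = delete;
  SimpleIntrusiveList& operator=(const SimpleIntrusiveList&) = delete;

  bool empty() const { return _head.next == &_head; }

  // Links `value` at the tail; no allocation, the node lives inside `value`.
  void push_back(T& value) {
    ListEntry& e = value.*Member;
    e.owner = &value;
    e.prev = _head.prev;
    e.next = &_head;
    _head.prev->next = &e;
    _head.prev = &e;
  }

  // Precondition: !empty().
  T& front() { return *static_cast<T*>(_head.next->owner); }

  // Precondition: !empty(). Returns nothing, following std::list.
  void pop_front() { unlink(*_head.next); }

  // Precondition: `value` is currently on this list.
  void erase(T& value) { unlink(value.*Member); }

  // Visits elements in order; `f` must not unlink the element it is given.
  template <typename F>
  void for_each(F f) {
    for (ListEntry* e = _head.next; e != &_head; e = e->next) {
      f(*static_cast<T*>(e->owner));
    }
  }

 private:
  static void unlink(ListEntry& e) {
    e.prev->next = e.next;
    e.next->prev = e.prev;
    e.prev = e.next = nullptr;
    e.owner = nullptr;
  }

  ListEntry _head;  // sentinel; never exposed as an element
};

// Example element that can be on two lists at the same time.
struct Region {
  int id;
  ListEntry all_link;
  ListEntry humongous_link;
  explicit Region(int i) : id(i) {}
};
```

A sublist of "regions of interest" is then just another `SimpleIntrusiveList<Region, &Region::humongous_link>` holding a subset of the same objects.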
Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: initialize disposed array in test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15896/files - new: https://git.openjdk.org/jdk/pull/15896/files/cd98eee5..4a959bd7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15896&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15896&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15896.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15896/head:pull/15896 PR: https://git.openjdk.org/jdk/pull/15896 From vlivanov at openjdk.org Wed Oct 11 19:51:16 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 11 Oct 2023 19:51:16 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v9] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 11 Oct 2023 17:29:11 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Add TestDenormalDouble.java Overall, looks good. Primarily, I'm curious why did you decide to fix JNI on x86-64, but not on AArch64? 
FTR the fix covers shared library case on both architectures. > In addition, I dropped the proposed warning at safepoints if the handling of denormal floating-point changes. It still makes sense to introduce an assert checking that FP operations behave according to the spec. The invariant should hold for any JavaThread on any thread state transition. But I'm fine with handling it separately. Some minor comments follow. src/hotspot/os/bsd/os_bsd.cpp line 977: > 975: > 976: void *os::Bsd::dlopen_helper(const char *filename, int mode) { > 977: #if defined(__GNUC__) What's the intention of limiting it to GCC on BSD? `os_linux.cpp` doesn't have it. Also, since 3rd party libraries are the root of the problem, does the information about toolchain JDK is built with help in any way? src/hotspot/os/bsd/os_bsd.cpp line 1001: > 999: static const volatile double thresh > 1000: = jdouble_cast(0x0000000000000003); // 0x0.0000000000003p-1022; > 1001: if (unity + thresh == unity || -unity - thresh == -unity) { The check is duplicated in 4 places (twice in os_bsd.cpp and twice in os_linux.cpp). Maybe extract it into some central shared location? Also, unity and thresh constants are duplicated in `stubGenerator_x86_64.cpp`. Why don't you place everything on `StubRoutines` instead? test/hotspot/jtreg/compiler/floatingpoint/libfast-math.c line 26: > 24: #include > 25: #include > 26: #include "jni.h" Redundant includes? test/hotspot/jtreg/compiler/floatingpoint/libfast-math.c line 39: > 37: static void __attribute__((constructor)) set_flush_to_zero(void) { > 38: > 39: #if defined(__x86_64__) && defined(SSE) x86-64 implies SSE2, doesn't it? test/hotspot/jtreg/compiler/floatingpoint/libfast-math.c line 51: > 49: { __asm__ __volatile__ ("msr fpcr, %0" : : "r" (fpcr)); } > 50: /* Flush to zero, round to nearest, IEEE exceptions disabled. */ > 51: _FPU_SETCW (_FPU_FPCR_FZ); Macros make it a bit harder to decipher what happens there. 
At least, I'd suggest changing the formatting around `_FPU_SETCW`. At first glance, it looks like it is part of the macro definition. ------------- PR Review: https://git.openjdk.org/jdk/pull/10661#pullrequestreview-1672027282 PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1355492626 PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1355478109 PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1355523173 PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1355438801 PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1355681073 From vlivanov at openjdk.org Wed Oct 11 19:54:37 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 11 Oct 2023 19:54:37 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v8] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <1bpDDmcV654Y0ZlXd-tkddf8syD6JrlLQ70BnhG6EwE=.0eb242bb-bede-404f-b37e-9d7328e261b3@github.com> Message-ID: On Wed, 11 Oct 2023 18:43:31 GMT, Thomas Stuefe wrote: >> `thresh` is volatile. What more is needed? > > Argh, sorry, missed that volatile. And I was confused at first as to why there's a volatile on `thresh`. A short comment describing the intention would definitely help here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1355695380 From dnsimon at openjdk.org Wed Oct 11 21:11:41 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 11 Oct 2023 21:11:41 GMT Subject: RFR: 8317689: [JVMCI] include error message when CreateJavaVM in libgraal fails [v2] In-Reply-To: References: Message-ID: > Creating a new libgraal isolate can fail for a number of reasons. Currently, all that one sees on such a failure is a numeric error code. For example:
For example: > > > 2096 20291 4 java.lang.CharacterData::of (136 bytes) > 2096 20291 4 java.lang.CharacterData::of (136 bytes) COMPILE SKIPPED: Error attaching to libjvmci (err: -1000000024) > > > Native Image is being enhanced to return an error message along with an error code by a non-standard `_strerror` argument passed to the `CreateJavaVM` JNI invocation interface function: > > > |---------------|-----------------------------------------------------------------------------------| > | _strerror | extraInfo is a "const char**" value. | > | | If CreateJavaVM returns non-zero, then extraInfo is assigned a newly malloc'ed | > | | 0-terminated C string describing the error if a description is available. | > |---------------|-----------------------------------------------------------------------------------| > > > This PR updates JVMCI to take advantage of this Native Image enhancement. > > This is sample `-XX:+PrintCompilation` output from testing this PR on libgraal: > > 2096 20291 4 java.lang.CharacterData::of (136 bytes) > 2096 20291 4 java.lang.CharacterData::of (136 bytes) COMPILE SKIPPED: Error attaching to libjvmci (err: -1000000024, Image page size is incompatible with run-time page size. Rebuild image with -H:PageSize=[pagesize] to set appropriately.) 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: renamed _strerror to _createvm_errorstr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16086/files - new: https://git.openjdk.org/jdk/pull/16086/files/a4592446..8f72f1e0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16086&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16086&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16086.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16086/head:pull/16086 PR: https://git.openjdk.org/jdk/pull/16086 From svkamath at openjdk.org Wed Oct 11 22:05:08 2023 From: svkamath at openjdk.org (Smita Kamath) Date: Wed, 11 Oct 2023 22:05:08 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v8] In-Reply-To: References: Message-ID: > Hi All, > I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations. > > Below are the performance numbers on my desktop system with -XX:UseAVX=2 option: > > |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup > |-------------|------------|---------------|------------------|-----------| > |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 > full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 > small.AESGCMBench.decrypt | 8192 | 527854.353 |663131.48 | 1.25 > small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 |1.24 > full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 > full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 |539235.462 | 1.00 > small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 |298913.629 | 0.99 > small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 > ? | ? | ? | ? | ? 
> full.AESGCMBench.decrypt | 16384 | 307266.364 |390397.778 | 1.27 > full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 > small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 |1.27 > small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 > full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 > full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 > small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 |181019.205 | 1.12 > small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 > ? | ? | ? | ? | ? > full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 |1.31 > full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 > small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 > small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 |1.28 > full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 |1.12 > full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 |211701.969 | 1.27 > small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 > small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 |212085.047 | 1.30 > ? | ? | ? | ? | ? > full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 > full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 |1.31 > small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 |1.32 > small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 > full.AESGCMBench.decryptMultiPart | 65536 | 42649.816 | 47591.587 |1.11 > full.AESGCMBe... 
Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Updated comments, removed unused labels ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15410/files - new: https://git.openjdk.org/jdk/pull/15410/files/7ce8068f..3551cefe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15410&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15410&range=06-07 Stats: 7 lines in 2 files changed: 0 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/15410.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15410/head:pull/15410 PR: https://git.openjdk.org/jdk/pull/15410 From manc at openjdk.org Wed Oct 11 23:54:22 2023 From: manc at openjdk.org (Man Cao) Date: Wed, 11 Oct 2023 23:54:22 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v18] In-Reply-To: References: Message-ID: On Fri, 15 Sep 2023 01:27:04 GMT, David Holmes wrote: >> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: >> >> Clean up test and improve total counter name > > src/hotspot/share/runtime/perfData.hpp line 64: > >> 62: COM_THREADS, >> 63: SUN_THREADS, >> 64: SUN_THREADS_GCCPU, // Subsystem for Sun Threads GC CPU > > Really not sure about this naming ... +1, dropping the "GC" seems better, i.e. `SUN_THREADS_CPUTIME` and `sun.threads.cpu_time`. For example, `sun.threads.gc_cpu_time.vm` is strange since the VM thread also does work unrelated to GC. Regarding @simonis's point about avoiding duplication of the "g1" part in each counter's name, I think it is doable. How about the following list of names? sun.threads.total_gc_cpu_time // Unchanged. Would sun.threads.cpu_time.gc_total look better? sun.threads.cpu_time.gc_parallel_workers sun.threads.cpu_time.gc_conc_mark sun.threads.cpu_time.gc_conc_refine sun.threads.cpu_time.vm sun.threads.cpu_time.conc_dedup `gc_conc_mark` and `gc_conc_refine` are currently tied to G1.
It seems OK because these counters would not exist if G1 is not selected. If other collectors want to implement `gc_conc_mark` in the future, they could implement their own definition of this counter, or move G1's definition to a shared place. @simonis does the list of names above look good to you? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1355842180 From manc at openjdk.org Wed Oct 11 23:54:19 2023 From: manc at openjdk.org (Man Cao) Date: Wed, 11 Oct 2023 23:54:19 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v27] In-Reply-To: <0AHhZD1JncFwC7z5rp1uufF4FAMrKR8mQXdDuwKNL4s=.02114c12-5a6b-4d3e-8b7a-421bb3eb8d47@github.com> References: <0AHhZD1JncFwC7z5rp1uufF4FAMrKR8mQXdDuwKNL4s=.02114c12-5a6b-4d3e-8b7a-421bb3eb8d47@github.com> Message-ID: On Thu, 5 Oct 2023 03:00:36 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > add comment and change if defined to ifdef Changes requested by manc (Committer). src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 900: > 898: // behavior, we should rethink if it is still safe. > 899: gc_threads_do(&tttc); > 900: } It should also call `CollectedHeap::publish_total_cpu_time()` here, right? src/hotspot/share/gc/shared/collectedHeap.cpp line 298: > 296: NOT_PRODUCT(_promotion_failure_alot_gc_number = 0;) > 297: > 298: if (UsePerfData && os::is_thread_cpu_time_supported()) { This condition should be a nested if inside `if (UsePerfData)`: if (os::is_thread_cpu_time_supported()) { _total_cpu_time = ...; _perf_parallel_worker_threads_cpu_time = ...; } Otherwise `_perf_gc_cause` and `_perf_gc_lastcause` could be broken. src/hotspot/share/gc/shared/collectedHeap.hpp line 211: > 209: }; > 210: > 211: Nit: unnecessary new line. 
src/hotspot/share/runtime/vmThread.cpp line 134: > 132: _terminate_lock = new Monitor(Mutex::nosafepoint, "VMThreadTerminate_lock"); > 133: > 134: if (UsePerfData && os::is_thread_cpu_time_supported()) { Similarly, the check for `os::is_thread_cpu_time_supported()` should be an inner nested if, to avoid breaking `_perf_accumulated_vm_operation_time`. test/jdk/sun/tools/jcmd/TestGcCounters.java line 1: > 1: import static jdk.test.lib.Asserts.*; A new file should have a copyright header comment. You can copy the header from test/jdk/javax/imageio/plugins/jpeg/LargeAdobeMarkerSegmentTest.java. ------------- PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1672683420 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1355881805 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1355874329 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1355880603 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1355891495 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1355839803 From manc at openjdk.org Thu Oct 12 01:31:15 2023 From: manc at openjdk.org (Man Cao) Date: Thu, 12 Oct 2023 01:31:15 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v27] In-Reply-To: <0AHhZD1JncFwC7z5rp1uufF4FAMrKR8mQXdDuwKNL4s=.02114c12-5a6b-4d3e-8b7a-421bb3eb8d47@github.com> References: <0AHhZD1JncFwC7z5rp1uufF4FAMrKR8mQXdDuwKNL4s=.02114c12-5a6b-4d3e-8b7a-421bb3eb8d47@github.com> Message-ID: On Thu, 5 Oct 2023 03:00:36 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > add comment and change if defined to ifdef test/jdk/sun/tools/jcmd/TestGcCounters.java line 17: > 15: public class TestGcCounters { > 16: > 17: private static final String[] VM_ARGS = new String[] { "-XX:+UsePerfData" }; The `VM_ARGS` is
passed to the jcmd subprocess via `JcmdBase.jcmd()`. This is probably not what we want, we want to pass `-XX:+UsePerfData` to the main process that runs `TestGcCounters.main()`. We probably need this on line 13: @run main/othervm -XX:+UsePerfData TestGcCounters ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1355941023 From dlong at openjdk.org Thu Oct 12 02:10:21 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 12 Oct 2023 02:10:21 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object Yes, please add the new test. But now I'm confused. Is the load_interpreter_state() change needed or not? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1758799862 From thartmann at openjdk.org Thu Oct 12 05:06:22 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 12 Oct 2023 05:06:22 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges [v2] In-Reply-To: References: Message-ID: <9JLqpJazC8z28PDyWDNmcMdKuUv9Xvc6cwZ7ZEIcxS8=.b3f5c1de-cc69-45a7-b4d4-f6498ecb6e81@github.com> On Mon, 9 Oct 2023 21:47:45 GMT, Cesar Soares Lucas wrote: >> I didn't look at this in detail yet but submitted testing. I see the following failures. >> >> `compiler/eliminateAutobox/TestByteBoxing.java` with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`: >> >> >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/workspace/open/src/hotspot/share/opto/loopnode.cpp:2178), pid=951972, tid=951999 >> # assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed >> # >> # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2) >> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # Problematic frame: >> # V [libjvm.so+0x128082c] LoopNode::verify_strip_mined(int) const+0xcc >> >> Current CompileTask: >> C2: 1438 263 % b compiler.eliminateAutobox.TestByteBoxing::main @ 1358 (1805 bytes) >> >> Stack: [0x00007f0efc9cb000,0x00007f0efcacb000], sp=0x00007f0efcac57a0, free space=1001k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x128082c] LoopNode::verify_strip_mined(int) const+0xcc (loopnode.cpp:2178) >> V [libjvm.so+0x1256ead] PathFrequency::to(Node*)+0x70d (loopPredicate.cpp:988) >> V [libjvm.so+0x1258b49] 
PhaseIdealLoop::loop_predication_impl(IdealLoopTree*)+0x8e9 (loopPredicate.cpp:1462) >> V [libjvm.so+0x125989a] IdealLoopTree::loop_predication(PhaseIdealLoop*)+0x9a (loopPredicate.cpp:1536) >> V [libjvm.so+0x12a28d7] PhaseIdealLoop::build_and_optimize()+0xf57 (loopnode.cpp:4582) >> V [libjvm.so+0x9ee7fb] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x3ab (loopnode.hpp:1114) >> V [libjvm.so+0x9e9db6] Compile::Optimize()+0xdf6 (compile.cpp:2362) >> >> >> `compiler/eliminateAutobox/TestByteBoxing.java` with `-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers`: >> >> >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (workspace/open/src/hotspot/share/opto/loopnode.cpp:6035), pid=1353611, tid=1353627 >> # Error: ShouldNotReachHere() >> # >> # JRE version: Java(TM) SE Runtime Envi... > > Hello @TobiHartmann, I pushed a fix for the test failures that you reported. Could you please re-run your tests? Thank you. Hi @JohnTortugo, sure. I re-submitted testing and will report back once it finished. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15825#issuecomment-1758916712 From djelinski at openjdk.org Thu Oct 12 06:24:23 2023 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Thu, 12 Oct 2023 06:24:23 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v8] In-Reply-To: References: Message-ID: On Wed, 11 Oct 2023 22:05:08 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations. 
>> >> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option: >> >> |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup >> |-------------|------------|---------------|------------------|-----------| >> |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 >> full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 >> small.AESGCMBench.decrypt | 8192 | 527854.353 |663131.48 | 1.25 >> small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 |1.24 >> full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 >> full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 |539235.462 | 1.00 >> small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 |298913.629 | 0.99 >> small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 16384 | 307266.364 |390397.778 | 1.27 >> full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 >> small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 |1.27 >> small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 >> full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 >> full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 >> small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 |181019.205 | 1.12 >> small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 >> ? | ? | ? | ? | ? 
>> full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 |1.31 >> full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 >> small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 >> small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 |1.28 >> full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 |1.12 >> full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 |211701.969 | 1.27 >> small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 >> small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 |212085.047 | 1.30 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 >> full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 |1.31 >> small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 |1.32 >> small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 >> full.AES... > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated comments, removed unused labels Thanks for bearing with me. LGTM. ------------- Marked as reviewed by djelinski (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15410#pullrequestreview-1673128998 From phofer at openjdk.org Thu Oct 12 09:09:19 2023 From: phofer at openjdk.org (Peter Hofer) Date: Thu, 12 Oct 2023 09:09:19 GMT Subject: RFR: 8317689: [JVMCI] include error message when CreateJavaVM in libgraal fails [v2] In-Reply-To: References: Message-ID: On Wed, 11 Oct 2023 21:11:41 GMT, Doug Simon wrote: >> Creating a new libgraal isolate can fail for a number of reasons. Currently, all that one sees on such a failure is a numeric error code. 
For example: >> >> >> 2096 20291 4 java.lang.CharacterData::of (136 bytes) >> 2096 20291 4 java.lang.CharacterData::of (136 bytes) COMPILE SKIPPED: Error attaching to libjvmci (err: -1000000024) >> >> >> Native Image is being [enhanced](https://github.com/oracle/graal/pull/7590/files#diff-bbd89cfdcf392cf7086e19898a19e85ddf820f2f1f52eec989223304aab77947R210-R213) to return an error message along with an error code by a non-standard `_createvm_errorstr` argument passed to the `CreateJavaVM` JNI invocation interface function: >> >> >> |--------------------|-----------------------------------------------------------------------------------| >> | _createvm_errorstr | extraInfo is a "const char**" value. | >> | | If CreateJavaVM returns non-zero, then extraInfo is assigned a newly malloc'ed | >> | | 0-terminated C string describing the error if a description is available, | >> | | otherwise extraInfo is set to null. | >> |--------------------|-----------------------------------------------------------------------------------| >> >> >> This PR updates JVMCI to take advantage of this Native Image enhancement. >> >> This is sample `-XX:+PrintCompilation` output from testing this PR on libgraal: >> >> 2096 20291 4 java.lang.CharacterData::of (136 bytes) >> 2096 20291 4 java.lang.CharacterData::of (136 bytes) COMPILE SKIPPED: Error attaching to libjvmci (err: -1000000024, Image page size is incompatible with run-time page size. Rebuild image with -H:PageSize=[pagesize] to set appropriately.) > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > renamed _strerror to _createvm_errorstr Marked as reviewed by phofer (no project role). 
------------- PR Review: https://git.openjdk.org/jdk/pull/16086#pullrequestreview-1673669271 From rrich at openjdk.org Thu Oct 12 09:15:18 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 12 Oct 2023 09:15:18 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v17] In-Reply-To: <7Yql4JbKg57I3Vi3BoKoyD8QleWQ0wPh_yBh6bNjhj4=.31078396-6df1-4439-9504-d84b62442f13@github.com> References: <7Wf_JP97N60baqhknTPclz-KiF8cBARxqTD8KtVfD0w=.631369a7-e665-406a-8e10-4942893dc578@github.com> <_jwOMcqQDRIjTc-CLlvGk42ySeBHAHa5OP1KxYiFWu8=.7735fa6f-4afb-45fa-b594-4ccba269be79@github.com> <7Yql4JbKg57I3Vi3BoKoyD8QleWQ0wPh_yBh6bNjhj4=.31078396-6df1-4439-9504-d84b62442f13@github.com> Message-ID: On Wed, 11 Oct 2023 17:01:05 GMT, Richard Reingruber wrote: >> Yes indeed. Looks correct to me. Nice simplification! > >> No, I mean L212 to L228. "Comment on lines +212 to +228" > > I looked at this at first but thought it is obviously wrong if `obj` is precisely marked after a previous young collection. But if it is precisely marked it is of course correct scan from the beginning of the dirty chunk (addr_l) to the obj/stripe end. I get `assert(should_scavenge(p, true)) failed: revisiting object?` with test/jdk/java/lang/Thread/virtual/stress/Skynet.java when the change is applied. It reproduces well but not 100%. Need to look into it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1356529361 From fweimer at openjdk.org Thu Oct 12 09:26:24 2023 From: fweimer at openjdk.org (Florian Weimer) Date: Thu, 12 Oct 2023 09:26:24 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v9] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 11 Oct 2023 17:29:11 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Add TestDenormalDouble.java test/hotspot/jtreg/compiler/floatingpoint/libfast-math.c line 37: > 35: > 36: #if defined(__GNUC__) > 37: static void __attribute__((constructor)) set_flush_to_zero(void) { Maybe add a comment that the ELF constructor is there to mimic historic GCC `-ffast-math` behavior on compilers that have been fixed not to do this for `-shared`? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1356543707 From mdoerr at openjdk.org Thu Oct 12 10:09:19 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 12 Oct 2023 10:09:19 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object Hi Dean, let me try to summarize my experiments: - This PR is one solution which makes all tests pass. The other experiments don't use this PR. - The reversed order in `load_interpreter_state` makes nsk/jdi/StepEvent pass, but makes my new test fail on PPC64. In addition, nsk/jvmti/GetObjectMonitorUsage/objmonusage006/TestDescription.java fails on all platforms with that change. So, we can't use it. - I couldn't reproduce any unlock issue on x86_64 with the checks I had implemented (regardless of the `load_interpreter_state` change), except for objmonusage006 with the `load_interpreter_state` change.
------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1759311882 From rrich at openjdk.org Thu Oct 12 10:32:13 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 12 Oct 2023 10:32:13 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v18] In-Reply-To: References: Message-ID: <_zdWXMwfI--pEBEBq1iiZ2tZm64aU6-b0T4Qg-cndrc=.5d97336e-b7e8-4867-b447-7ca3c89e4fb5@github.com> > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. 
The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea...
Richard Reingruber has updated the pull request incrementally with four additional commits since the last revision: - Make sure to scan obj reaching in just once - Simplification suggested by Albert - Don't overlap card table processing with scavenging for simplicity - Cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/8b544d84..c9e040f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=16-17 Stats: 83 lines in 2 files changed: 6 ins; 63 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From rrich at openjdk.org Thu Oct 12 10:38:19 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 12 Oct 2023 10:38:19 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v17] In-Reply-To: References: <7Wf_JP97N60baqhknTPclz-KiF8cBARxqTD8KtVfD0w=.631369a7-e665-406a-8e10-4942893dc578@github.com> <_jwOMcqQDRIjTc-CLlvGk42ySeBHAHa5OP1KxYiFWu8=.7735fa6f-4afb-45fa-b594-4ccba269be79@github.com> <7Yql4JbKg57I3Vi3BoKoyD8QleWQ0wPh_yBh6bNjhj4=.31078396-6df1-4439-9504-d84b62442f13@github.com> Message-ID: On Thu, 12 Oct 2023 09:12:47 GMT, Richard Reingruber wrote: >>> No, I mean L212 to L228. "Comment on lines +212 to +228" >> >> I looked at this at first but thought it is obviously wrong if `obj` is precisely marked after a previous young collection. But if it is precisely marked it is of course correct scan from the beginning of the dirty chunk (addr_l) to the obj/stripe end. > > I get `assert(should_scavenge(p, true)) failed: revisiting object?` with test/jdk/java/lang/Thread/virtual/stress/Skynet.java when the change is applied. > > It reproduces well but not 100%. Need to look into it. 
`obj` can be scanned twice if it reaches into the stripe because `obj.addr > start` will still be false the second time. The fix is to check `i_addr > start`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1356624126 From ihse at openjdk.org Thu Oct 12 10:41:27 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 12 Oct 2023 10:41:27 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v9] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <7YMwHH-z-wkcCdp0xlVb41YGeX2TxY6XZqaXajywero=.206f5308-52ed-4e8c-bf53-fa803bfbab49@github.com> On Wed, 11 Oct 2023 17:29:11 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Add TestDenormalDouble.java The latest build changes look trivially good. (If you decide to make other changes to the build system, please await a re-review of build changes before pushing) ------------- Marked as reviewed by ihse (Reviewer).
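The "save and restore around the library load" idea quoted above can be sketched with an RAII guard. This is a minimal illustration, not the PR's code: it swaps in the standard `<cfenv>` functions `fegetenv`/`fesetenv` as a portable stand-in for direct control-word access, `FpEnvGuard` and `load_with_saved_fp_env` are hypothetical names, and whether `fesetenv` also restores flush-to-zero or denormal control bits is platform-specific and not claimed here.

```cpp
#include <cfenv>
#include <functional>

// Hypothetical guard: captures the floating-point environment on
// construction and puts it back on destruction.
class FpEnvGuard {
 public:
  FpEnvGuard() : ok_(std::fegetenv(&saved_) == 0) {}
  ~FpEnvGuard() {
    if (ok_) std::fesetenv(&saved_);  // undo whatever the callee changed
  }
  FpEnvGuard(const FpEnvGuard&) = delete;
  FpEnvGuard& operator=(const FpEnvGuard&) = delete;

 private:
  std::fenv_t saved_;
  bool ok_;
};

// Runs `load` (a stand-in for a dlopen-style call whose initializers may
// alter floating-point state) and restores the environment afterwards.
int load_with_saved_fp_env(const std::function<int()>& load) {
  FpEnvGuard guard;
  return load();  // result is computed before the guard is destroyed
}
```

As the quoted description notes, a wrapper like this only helps at the call sites that use it, and a library loaded later by another library bypasses it.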
PR Review: https://git.openjdk.org/jdk/pull/10661#pullrequestreview-1673840086 From rrich at openjdk.org Thu Oct 12 10:53:14 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 12 Oct 2023 10:53:14 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v19] In-Reply-To: References: Message-ID: > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). 
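The three bullets of the sharing scheme above can be modelled as a small partitioning function. This is a toy sketch under several assumptions that are not taken from the PR: addresses are plain word indices, stripes are aligned at index 0, stripes are assigned to workers round-robin, and "covers at least 3 stripes" is read as "touches at least 3 stripes". `ScanChunk` and `split_large_array` are hypothetical names; the real logic lives in `PSCardTable` and additionally restricts scanning to dirty cards.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct ScanChunk {
  std::size_t begin;   // first word scanned (inclusive)
  std::size_t end;     // one past the last word scanned
  unsigned    worker;  // hypothetical id of the scanning worker
};

// Splits the array [arr_begin, arr_end) by stripe. Each worker scans the
// part inside its own stripe, except that the tail reaching into the last,
// partially covered stripe goes to the owner of the previous stripe.
std::vector<ScanChunk> split_large_array(std::size_t arr_begin,
                                         std::size_t arr_end,
                                         std::size_t stripe_words,
                                         unsigned n_workers) {
  std::vector<ScanChunk> chunks;
  if (stripe_words == 0 || n_workers == 0 || arr_end <= arr_begin) return chunks;
  const std::size_t first = arr_begin / stripe_words;
  const std::size_t last = (arr_end - 1) / stripe_words;
  if (last - first < 2) return chunks;  // fewer than 3 stripes: not shared
  for (std::size_t s = first; s <= last; ++s) {
    const std::size_t lo = std::max(arr_begin, s * stripe_words);
    const std::size_t hi = std::min(arr_end, (s + 1) * stripe_words);
    const bool is_tail = (s == last) && (arr_end % stripe_words != 0);
    if (is_tail) {
      chunks.back().end = hi;  // previous stripe's owner also scans the tail
    } else {
      chunks.push_back({lo, hi, static_cast<unsigned>(s % n_workers)});
    }
  }
  return chunks;
}
```

In this model the three-stripe minimum guarantees that at least one stripe lies strictly between the array's start and its tail, so the tail can always be attached to a chunk other than the one holding the array's start.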
> > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea... Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Re-cleanup (was accidentally reverted) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/c9e040f5..443f4826 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=17-18 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From ayang at openjdk.org Thu Oct 12 11:10:34 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 12 Oct 2023 11:10:34 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v19] In-Reply-To: References: Message-ID: On Thu, 12 Oct 2023 10:53:14 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old
generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongues young collections. 
With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Re-cleanup (was accidentally reverted) Could you merge master? I think the patch is in a good state. Best to do some final perf/correctness testing. I have only some minor comments/suggestions. Very subjective, so up to you. src/hotspot/share/gc/parallel/psCardTable.cpp line 156: > 154: > 155: // Helper struct to keep the following code compact. > 156: struct Obj { The scan-dirty-chunk code below is already quite compact; this struct-def is bulky in contrast. Maybe using plain variables inside the while-loop is sufficient. Not sure either is definitely superior though. src/hotspot/share/gc/parallel/psCardTable.cpp line 165: > 163: is_obj_array(obj->is_objArray()), > 164: end_addr(addr + obj->size()) {} > 165: void next() { Maybe `move_to_next` so that it's clear that it's an action, not an accessor (noun)? src/hotspot/share/gc/parallel/psCardTable.cpp line 307: > 305: HeapWord* start_addr; > 306: HeapWord* end_addr; > 307: DEBUG_ONLY(HeapWord* _prev_query); This debug-field clutters the real logic, IMO. src/hotspot/share/gc/parallel/psCardTable.hpp line 52: > 50: #ifdef ASSERT > 51: , _table_end((const CardValue*)(uintptr_t(_table) + (align_up(stripe.byte_size(), _card_size) >> _card_shift))) > 52: #endif Not sure about its usefulness; the logic in the caller is super clear while this assertion logic obstructs the flow.
src/hotspot/share/gc/parallel/psCardTable.hpp line 82: > 80: const CardValue* const end) { > 81: for (const CardValue* i = start; i < end; ++i) { > 82: if (!is_clean(i)) { Better to use `is_dirty` to match the method name. ------------- Marked as reviewed by ayang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14846#pullrequestreview-1673870236 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1356657815 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1356650924 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1356652217 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1356649524 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1356646834 From amitkumar at openjdk.org Thu Oct 12 14:04:09 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 12 Oct 2023 14:04:09 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v6] In-Reply-To: <17_uqfnizaJ93-eAhf9C0n5HGG-hGo8yxaIulgxM9q4=.6d950430-46b4-42a2-ac86-c84ae06df431@github.com> References: <17_uqfnizaJ93-eAhf9C0n5HGG-hGo8yxaIulgxM9q4=.6d950430-46b4-42a2-ac86-c84ae06df431@github.com> Message-ID: On Wed, 27 Sep 2023 18:25:02 GMT, Aleksey Shipilev wrote: >> I see the following disclaimer in the description: >>> Work in progress, submitting for broader attention. >> >> What are next steps and, more broadly, what's left to get the PR finalized? Especially, it's not clear how much performance testing it went through so far. > >> What are next steps and, more broadly, what's left to get the PR finalized? Especially, it's not clear how much performance testing it went through so far. > > I was waiting on Derek White to publish their benchmarks, so that we can decide reasonable defaults. While it seems too early to run larger scale benchmarks, feel free to give it a spin. 
Hi @shipilev, Please include s390 patch from here: [s390_port.patch](https://github.com/openjdk/jdk/files/12882890/s390_port.patch) Benchmark: Z15 Machine, with 64 CPUs

-XX:SecondarySuperMissBackoff=0
Benchmark                        Mode  Cnt      Score      Error  Units
SecondarySuperCache.contended    avgt   15  10655.113 ± 6148.685  ns/op
SecondarySuperCache.uncontended  avgt   15      107.9 ±   54.109  ns/op

-XX:SecondarySuperMissBackoff=10
Benchmark                        Mode  Cnt      Score      Error  Units
SecondarySuperCache.contended    avgt   15   2541.671 ± 1048.361  ns/op
SecondarySuperCache.uncontended  avgt   15    120.107 ±   63.345  ns/op

-XX:SecondarySuperMissBackoff=100
Benchmark                        Mode  Cnt      Score      Error  Units
SecondarySuperCache.contended    avgt   15    662.473 ±  139.208  ns/op
SecondarySuperCache.uncontended  avgt   15     79.371 ±   40.232  ns/op

-XX:SecondarySuperMissBackoff=1000
Benchmark                        Mode  Cnt      Score      Error  Units
SecondarySuperCache.contended    avgt   15    248.292 ±   77.175  ns/op
SecondarySuperCache.uncontended  avgt   15     87.333 ±   35.103  ns/op

-XX:SecondarySuperMissBackoff=10000
Benchmark                        Mode  Cnt      Score      Error  Units
SecondarySuperCache.contended    avgt   15    139.901 ±    44.83  ns/op
SecondarySuperCache.uncontended  avgt   15     90.872 ±   31.791  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1759621237 From mdoerr at openjdk.org Thu Oct 12 14:04:11 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 12 Oct 2023 14:04:11 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v8] In-Reply-To: References: Message-ID: On Wed, 27 Sep 2023 19:46:39 GMT, Aleksey Shipilev wrote: >> Work in progress, submitting for broader attention. >> >> See more details in the bug and related issues. >> >> This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out.
The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. >> >> This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in. >> >> Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do an x86_32 version: we need a thread, so the easiest way would be to call into VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short. I think if we cannot implement this mitigation on some architectures, so be it; it would be a sensible tradeoff for simplicity. >> >> Setting backoff at `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong. >> >> Current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Correct type for flag Seems like you have posted the wrong patch version. > Hi @shipilev, > > Please include s390 patch from here: > > [s390_port.patch](https://github.com/openjdk/jdk/files/12882890/s390_port.patch) > > Benchmark: Z15 Machine, with 64 CPUs
>
> ```
> -XX:SecondarySuperMissBackoff=0
> Benchmark                        Mode  Cnt      Score      Error  Units
> SecondarySuperCache.contended    avgt   15  10655.113 ± 6148.685  ns/op
> SecondarySuperCache.uncontended  avgt   15      107.9 ±   54.109  ns/op
>
> -XX:SecondarySuperMissBackoff=10
> Benchmark                        Mode  Cnt      Score      Error  Units
> SecondarySuperCache.contended    avgt   15   2541.671 ± 1048.361  ns/op
> SecondarySuperCache.uncontended  avgt   15    120.107 ±   63.345  ns/op
>
> -XX:SecondarySuperMissBackoff=100
> Benchmark                        Mode  Cnt      Score      Error  Units
> SecondarySuperCache.contended    avgt   15    662.473 ±  139.208  ns/op
> SecondarySuperCache.uncontended  avgt   15     79.371 ±   40.232  ns/op
>
> -XX:SecondarySuperMissBackoff=1000
> Benchmark                        Mode  Cnt      Score      Error  Units
> SecondarySuperCache.contended    avgt   15    248.292 ±   77.175  ns/op
> SecondarySuperCache.uncontended  avgt   15     87.333 ±   35.103  ns/op
>
> -XX:SecondarySuperMissBackoff=10000
> Benchmark                        Mode  Cnt      Score      Error  Units
> SecondarySuperCache.contended    avgt   15    139.901 ±    44.83  ns/op
> SecondarySuperCache.uncontended  avgt   15     90.872 ±   31.791  ns/op
> ```

Seems like you have posted the wrong patch version. Please check. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1759650990 From amitkumar at openjdk.org Thu Oct 12 14:04:12 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 12 Oct 2023 14:04:12 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v8] In-Reply-To: References: Message-ID: On Thu, 12 Oct 2023 13:49:39 GMT, Martin Doerr wrote: > Seems like you have posted the wrong patch version. Oops, you're right. ``` diff commit c70a38493746f658511a9c018b2241d5e33b2917 Author: Amit Kumar Date: Thu Sep 28 09:03:33 2023 +0530 s390 port diff --git a/src/hotspot/cpu/s390/globals_s390.hpp b/src/hotspot/cpu/s390/globals_s390.hpp index 895a3751777..ba42f15bb37 100644 --- a/src/hotspot/cpu/s390/globals_s390.hpp +++ b/src/hotspot/cpu/s390/globals_s390.hpp @@ -78,8 +78,7 @@ define_pd_global(bool, CompactStrings, true); // 8146801 (Short Array Allocation): No performance work done here yet.
define_pd_global(intx, InitArrayShortSize, 1*BytesPerLong); -// Not implemented yet -define_pd_global(uint, SecondarySuperMissBackoff, 0); +define_pd_global(uint, SecondarySuperMissBackoff, 1000); #define ARCH_FLAGS(develop, \ product, \ diff --git a/src/hotspot/cpu/s390/macroAssembler_s390.cpp b/src/hotspot/cpu/s390/macroAssembler_s390.cpp index d95a0b3a3c5..69d305274e9 100644 --- a/src/hotspot/cpu/s390/macroAssembler_s390.cpp +++ b/src/hotspot/cpu/s390/macroAssembler_s390.cpp @@ -3027,7 +3027,8 @@ void MacroAssembler::check_klass_subtype_slow_path(Register Rsubklass, Label* L_failure) { // Input registers must not overlap. // Also check for R1 which is explicitly used here. - assert_different_registers(Z_R1, Rsubklass, Rsuperklass, Rarray_ptr, Rlength); + const Register temp = Z_R1_scratch; + assert_different_registers(temp, Rsubklass, Rsuperklass, Rarray_ptr, Rlength); NearLabel L_fallthrough; int label_nulls = 0; if (L_success == nullptr) { L_success = &L_fallthrough; label_nulls++; } @@ -3036,6 +3037,7 @@ void MacroAssembler::check_klass_subtype_slow_path(Register Rsubklass, const int ss_offset = in_bytes(Klass::secondary_supers_offset()); const int sc_offset = in_bytes(Klass::secondary_super_cache_offset()); + const int sm_offset = in_bytes(JavaThread::backoff_secondary_super_miss_offset()); const int length_offset = Array::length_offset_in_bytes(); const int base_offset = Array::base_offset_in_bytes(); @@ -3060,7 +3062,7 @@ void MacroAssembler::check_klass_subtype_slow_path(Register Rsubklass, // No match yet, so we must walk the array's elements. z_lngfr(Rlength, Rlength); z_sllg(Rlength, Rlength, LogBytesPerWord); // -#bytes of cache array - z_llill(Z_R1, BytesPerWord); // Set increment/end index. + z_llill(temp, BytesPerWord); // Set increment/end index. 
add2reg(Rlength, 2 * BytesPerWord); // start index = -(n-2)*BytesPerWord z_slgr(Rarray_ptr, Rlength); // start addr: += (n-2)*BytesPerWord z_bru(loop_count); @@ -3069,7 +3071,7 @@ void MacroAssembler::check_klass_subtype_slow_path(Register Rsubklass, z_cg(Rsuperklass, base_offset, Rlength, Rarray_ptr); // Check array element for match. z_bre(match); BIND(loop_count); - z_brxlg(Rlength, Z_R1, loop_iterate); + z_brxlg(Rlength, temp, loop_iterate); // Rsuperklass not found among secondary super classes -> failure. branch_optimized(Assembler::bcondAlways, *L_failure); @@ -3079,7 +3081,22 @@ void MacroAssembler::check_klass_subtype_slow_path(Register Rsubklass, BIND(match); - z_stg(Rsuperklass, sc_offset, Rsubklass); // Save result to cache. + // Success. Try to cache the super we found and proceed in triumph. + uint32_t super_cache_backoff = checked_cast(SecondarySuperMissBackoff); + if (super_cache_backoff > 0 && VM_Version::has_MemWithImmALUOps()) { + NearLabel L_skip; + z_asi(Address(Z_thread, sm_offset), -1); + branch_optimized(Assembler::bcondNotLow, L_skip); + + load_const_optimized(temp, super_cache_backoff); + z_st(temp, sm_offset, Z_thread); + + z_stg(Rsuperklass, sc_offset, Rsubklass); // Save result to cache. + + bind(L_skip); + } else { + z_stg(Rsuperklass, sc_offset, Rsubklass); // Save result to cache. + } final_jmp(*L_success); ------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1759663458 From shade at openjdk.org Thu Oct 12 14:27:33 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 12 Oct 2023 14:27:33 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v8] In-Reply-To: References: Message-ID: On Thu, 12 Oct 2023 13:49:39 GMT, Martin Doerr wrote: > Please include s390 patch from here: Added! Please check. 
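
For readers following the s390 patch above, the backoff logic it emits as assembly can be sketched in portable C++. This is only an illustrative model, not HotSpot code: `ThreadState`, `KlassModel`, `cache_after_hit` and `count_cache_writes` are hypothetical names, and initializing the counter to the backoff value is an assumption based on the "Init with backoff right away" commit title.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the per-thread counter and the per-class
// one-element cache. These are NOT the HotSpot types.
struct ThreadState {
  int32_t miss_backoff;  // cache writes still to be skipped by this thread
};

struct KlassModel {
  const void* secondary_super_cache = nullptr;
};

// Called after the secondary supers array scan has found `super`.
// Returns true when the shared cache field was actually written.
inline bool cache_after_hit(ThreadState& thread, KlassModel& sub,
                            const void* super, uint32_t backoff) {
  if (backoff == 0) {
    // Backoff disabled: always write the cache (the pre-patch behaviour).
    sub.secondary_super_cache = super;
    return true;
  }
  thread.miss_backoff -= 1;       // models the add-immediate of -1
  if (thread.miss_backoff >= 0) {
    return false;                 // models the "not low" branch: skip the store
  }
  thread.miss_backoff = static_cast<int32_t>(backoff);  // re-arm the counter
  sub.secondary_super_cache = super;                    // write the shared cache
  return true;
}

// Helper for experiments: how many cache writes happen in `hits` slow-path hits?
inline int count_cache_writes(uint32_t backoff, int hits) {
  assert(hits >= 0);
  ThreadState thread{static_cast<int32_t>(backoff)};  // assumed initial value
  KlassModel sub;
  int dummy_super = 0;
  int writes = 0;
  for (int i = 0; i < hits; ++i) {
    if (cache_after_hit(thread, sub, &dummy_super, backoff)) {
      ++writes;
    }
  }
  return writes;
}
```

In this model a backoff of `3` lets one write through for every four slow-path hits, which is the mechanism that reduces contended stores to the shared cache line in the `contended` benchmark rows.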
------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1759716812 From mdoerr at openjdk.org Thu Oct 12 14:29:27 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 12 Oct 2023 14:29:27 GMT Subject: RFR: 8318015: Lock inflation not needed for OSR or Deopt for new locking modes Message-ID: Only LockingMode "LM_LEGACY" requires inflation before lock transfers because it is the only one which uses stack addresses in the mark word. I think we should treat the displaced header as stale data because it may be uninitialized. ------------- Commit messages: - 8318015: Lock inflation not needed for OSR or Deopt for new locking modes Changes: https://git.openjdk.org/jdk/pull/16165/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16165&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318015 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16165.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16165/head:pull/16165 PR: https://git.openjdk.org/jdk/pull/16165 From rrich at openjdk.org Thu Oct 12 14:32:18 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 12 Oct 2023 14:32:18 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v20] In-Reply-To: References: Message-ID: <-KHu3mjJltdBchaw6Xb1l3Dr8Ri4DqF5cg_F-XHzP1k=.eb2ea3ec-3e08-450d-85c1-c0729b20d0c4@github.com> > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. 
> > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongues young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc theads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. 
I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The prize for parallelization is paid without actually doing it. Also ParallelGC will use at lea... Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 32 additional commits since the last revision: - Merge branch 'master' - Re-cleanup (was accidentally reverted) - Make sure to scan obj reaching in just once - Simplification suggested by Albert - Don't overlap card table processing with scavenging for simplicity - Cleanup - Shadow table per stripe - find_first_clean_card: return end_card if final object extends beyond it. - Cleanup - Missed acquire semantics - ... and 22 more: https://git.openjdk.org/jdk/compare/880f96d2...381e001b ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/443f4826..381e001b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=18-19 Stats: 99446 lines in 2718 files changed: 41295 ins; 19417 del; 38734 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From shade at openjdk.org Thu Oct 12 14:32:40 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 12 Oct 2023 14:32:40 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v9] In-Reply-To: References: Message-ID: > Work in progress, submitting for broader attention. > > See more details in the bug and related issues. 
> > This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. > > This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in. > > Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do x86_32 version: we need a thread, so the easiest way would be to call into VM. But we cannot that easily: the code blowout would make some forward branches in external code non-short. I think we we cannot implement this mitigation on some architectures, so be it, it would be a sensible tradeoff for simplicity. > > Setting backoff at `0` effectively disables the mitigation, and gives us safety hatch if something goes wrong. > > Current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 20 additional commits since the last revision: - S390 implementation - Merge branch 'master' into JDK-8316180-backoff-secondary-super - Correct type for flag - Option is diagnostic, platform-dependent - Merge branch 'master' into JDK-8316180-backoff-secondary-super - Init with backoff right away - x86 cleanup - Denser AArch64 - Cleaner AArch64 code - Use proper 32-bit stores on AArch64 - ... and 10 more: https://git.openjdk.org/jdk/compare/4aa97c49...5806af4a ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15718/files - new: https://git.openjdk.org/jdk/pull/15718/files/8be561d7..5806af4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=07-08 Stats: 20134 lines in 562 files changed: 11222 ins; 4257 del; 4655 mod Patch: https://git.openjdk.org/jdk/pull/15718.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15718/head:pull/15718 PR: https://git.openjdk.org/jdk/pull/15718 From shade at openjdk.org Thu Oct 12 14:48:35 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 12 Oct 2023 14:48:35 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v10] In-Reply-To: References: Message-ID: > Work in progress, submitting for broader attention. > > See more details in the bug and related issues. > > This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. > > This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. 
Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in. > > Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do x86_32 version: we need a thread, so the easiest way would be to call into VM. But we cannot that easily: the code blowout would make some forward branches in external code non-short. I think we we cannot implement this mitigation on some architectures, so be it, it would be a sensible tradeoff for simplicity. > > Setting backoff at `0` effectively disables the mitigation, and gives us safety hatch if something goes wrong. > > Current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Touchup benchmark metadata ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15718/files - new: https://git.openjdk.org/jdk/pull/15718/files/5806af4a..48c67465 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=08-09 Stats: 24 lines in 1 file changed: 23 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15718.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15718/head:pull/15718 PR: https://git.openjdk.org/jdk/pull/15718 From aph at openjdk.org Thu Oct 12 14:56:34 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 12 Oct 2023 14:56:34 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v8] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> 
<1bpDDmcV654Y0ZlXd-tkddf8syD6JrlLQ70BnhG6EwE=.0eb242bb-bede-404f-b37e-9d7328e261b3@github.com> Message-ID: On Wed, 11 Oct 2023 18:41:08 GMT, Thomas Stuefe wrote: >> What difference does it make if it does? > > I was trying to understand the BSD+gcc combination. We use clang on MacOS, so the only platform I thought would be affected would be one of the BSDs. > > But I thought those also moved to clang in their builds. Hence my question. These third-party libraries can come from anywhere. Given that the overhead added to `dlopen()` is almost immeasurably small, I think it's better to keep it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1356951742 From aph at openjdk.org Thu Oct 12 14:56:41 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 12 Oct 2023 14:56:41 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v9] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 11 Oct 2023 17:57:27 GMT, Vladimir Ivanov wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Add TestDenormalDouble.java > > src/hotspot/os/bsd/os_bsd.cpp line 977: > >> 975: >> 976: void *os::Bsd::dlopen_helper(const char *filename, int mode) { >> 977: #if defined(__GNUC__) > > What's the intention of limiting it to GCC on BSD? `os_linux.cpp` doesn't have it. > > Also, since 3rd party libraries are the root of the problem, does the information about toolchain JDK is built with help in any way? Good point. I think the reason it's GCC-conditional is that a previous version used GCC extensions. I'll take it out. 
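
As an illustration of the save-and-restore idea discussed in this thread, here is a minimal sketch using the standard `<cfenv>` functions. It is not the code from the pull request: `load_preserving_fenv` and `misbehaving_loader` are hypothetical names, and the loader merely changes the rounding mode to stand in for a library constructor that alters the floating-point environment.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical wrapper: runs `load` (standing in for a dlopen() call) and
// restores the floating-point environment that was active before the call.
template <typename Loader>
auto load_preserving_fenv(Loader&& load) -> decltype(load()) {
  std::fenv_t saved;
  int rc = std::fegetenv(&saved);   // snapshot control/status state
  assert(rc == 0);
  auto result = load();             // may run foreign initializers
  rc = std::fesetenv(&saved);       // undo any change they made
  assert(rc == 0);
  (void)rc;                         // silence unused warning under NDEBUG
  return result;
}

// Stand-in for a library whose initializer changes the FP environment.
inline int misbehaving_loader() {
  std::fesetround(FE_UPWARD);
  return 42;
}
```

A caller would use it as `load_preserving_fenv(misbehaving_loader)`; afterwards `std::fegetround()` reports the mode that was active before the call. As noted earlier in the thread, this cannot catch a library that loads another library later at runtime.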
> src/hotspot/os/bsd/os_bsd.cpp line 1001: > >> 999: static const volatile double thresh >> 1000: = jdouble_cast(0x0000000000000003); // 0x0.0000000000003p-1022; >> 1001: if (unity + thresh == unity || -unity - thresh == -unity) { > > The check is duplicated in 4 places (twice in os_bsd.cpp and twice in os_linux.cpp). Maybe extract it into some central shared location? Also, unity and thresh constants are duplicated in `stubGenerator_x86_64.cpp`. Why don't you place everything on `StubRoutines` instead? OK. > test/hotspot/jtreg/compiler/floatingpoint/libfast-math.c line 39: > >> 37: static void __attribute__((constructor)) set_flush_to_zero(void) { >> 38: >> 39: #if defined(__x86_64__) && defined(SSE) > > x86-64 implies SSE2, doesn't it? I can't remember. I'll take it out. > test/hotspot/jtreg/compiler/floatingpoint/libfast-math.c line 51: > >> 49: { __asm__ __volatile__ ("msr fpcr, %0" : : "r" (fpcr)); } >> 50: /* Flush to zero, round to nearest, IEEE exceptions disabled. */ >> 51: _FPU_SETCW (_FPU_FPCR_FZ); > > Macros make it a bit harder to decipher what happens there. At least, I'd suggest to change formatting around `_FPU_SETCW`. At first glance, it looks like it is part of the macro definition. OK, I see. 
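
The `unity + thresh == unity` test quoted above can be tried in isolation with a sketch like the following. The `thresh` bit pattern is the one from the quoted line; the value chosen for `unity` (2^-1020) is an assumption made for this example, and `double_from_bits` and `subnormals_respected` are hypothetical names rather than JDK functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a 64-bit pattern as a double without undefined behaviour.
inline double double_from_bits(uint64_t bits) {
  static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit double");
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// Returns true when subnormal operands take part in arithmetic as IEEE 754
// requires, false when they are treated as zero.
inline bool subnormals_respected() {
  // volatile forces real loads and real FP operations at run time, so the
  // compiler cannot fold the comparison into a constant at build time.
  static const volatile double unity =
      double_from_bits(0x0030000000000000ULL);  // 2^-1020, a small normal (assumed)
  static const volatile double thresh =
      double_from_bits(0x0000000000000003ULL);  // 3 * 2^-1074, a subnormal
  // thresh is 0.75 ulp of unity, so with round-to-nearest the sum must move
  // away from unity unless thresh was silently treated as zero.
  return !(unity + thresh == unity || -unity - thresh == -unity);
}
```

In a default environment `subnormals_respected()` returns true; it would return false once a denormals-are-zero style mode has been switched on, which is the condition the wrapper around `dlopen()` is meant to detect.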
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1356948049 PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1356921508 PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1356919522 PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1356953789 From aph at openjdk.org Thu Oct 12 14:56:42 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 12 Oct 2023 14:56:42 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v8] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <1bpDDmcV654Y0ZlXd-tkddf8syD6JrlLQ70BnhG6EwE=.0eb242bb-bede-404f-b37e-9d7328e261b3@github.com> Message-ID: <0vdNpovJYwX3AlMhs-ifh3oonY4lCckpnxN1Xvm3qOs=.fc0cd99d-a5cc-42fe-81ad-0fc54afffb7d@github.com> On Wed, 11 Oct 2023 19:51:34 GMT, Vladimir Ivanov wrote: > And I was confused at first why there's a volatile on `thresh`. A short comment describing the intentions would definitely help here. OK.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1356952291 From aph at openjdk.org Thu Oct 12 14:56:45 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 12 Oct 2023 14:56:45 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v9] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Thu, 12 Oct 2023 09:23:08 GMT, Florian Weimer wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Add TestDenormalDouble.java > > test/hotspot/jtreg/compiler/floatingpoint/libfast-math.c line 37: > >> 35: >> 36: #if defined(__GNUC__) >> 37: static void __attribute__((constructor)) set_flush_to_zero(void) { > > Maybe add a comment that the ELF constructor is there to mimic historic GCC `-ffast-math` behavior on compilers that have been fixed not to do this for `-shared`? OK. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1356953215 From rrich at openjdk.org Thu Oct 12 15:58:21 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 12 Oct 2023 15:58:21 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v20] In-Reply-To: <-KHu3mjJltdBchaw6Xb1l3Dr8Ri4DqF5cg_F-XHzP1k=.eb2ea3ec-3e08-450d-85c1-c0729b20d0c4@github.com> References: <-KHu3mjJltdBchaw6Xb1l3Dr8Ri4DqF5cg_F-XHzP1k=.eb2ea3ec-3e08-450d-85c1-c0729b20d0c4@github.com> Message-ID: On Thu, 12 Oct 2023 14:32:18 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. 
This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. 
Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ... > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 32 additional commits since the last revision: > > - Merge branch 'master' > - Re-cleanup (was accidentally reverted) > - Make sure to scan obj reaching in just once > - Simplification suggested by Albert > - Don't overlap card table processing with scavenging for simplicity > - Cleanup > - Shadow table per stripe > - find_first_clean_card: return end_card if final object extends beyond it. > - Cleanup > - Missed acquire semantics > - ... and 22 more: https://git.openjdk.org/jdk/compare/1fdb3ab3...381e001b JavadocHelperTest.java fails with parallel gc after merging master. I have opened https://bugs.openjdk.org/browse/JDK-8318025 ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1759902498 From rrich at openjdk.org Thu Oct 12 16:24:17 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 12 Oct 2023 16:24:17 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v21] In-Reply-To: References: Message-ID: > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. 
This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. 
Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea... Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Feedback Albert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/381e001b..d12e96e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=19-20 Stats: 47 lines in 2 files changed: 4 ins; 28 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From rrich at openjdk.org Thu Oct 12 16:24:24 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 12 Oct 2023 16:24:24 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v19] In-Reply-To: References: Message-ID: On Thu, 12 Oct 2023 11:06:16 GMT, Albert Mingkun Yang wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Re-cleanup (was accidentally reverted) > > src/hotspot/share/gc/parallel/psCardTable.cpp line 156: > >> 154: >> 155: // Helper struct to keep the following code compact. >> 156: struct Obj { > > The scan-dirty-chunk code below is already quite compact; this struct-def is bulky in contrast. Maybe using plain variables inside while-loop is sufficient. Not sure either is definitely superior though. Ok. Done. 
> src/hotspot/share/gc/parallel/psCardTable.cpp line 165: > >> 163: is_obj_array(obj->is_objArray()), >> 164: end_addr(addr + obj->size()) {} >> 165: void next() { > > Maybe `move_to_next` so that it's clear that it's an action not an accessor (noun)? I removed the helper struct. > src/hotspot/share/gc/parallel/psCardTable.cpp line 307: > >> 305: HeapWord* start_addr; >> 306: HeapWord* end_addr; >> 307: DEBUG_ONLY(HeapWord* _prev_query); > > This debug-field clutters the real logic, IMO. I've removed it. > src/hotspot/share/gc/parallel/psCardTable.hpp line 52: > >> 50: #ifdef ASSERT >> 51: , _table_end((const CardValue*)(uintptr_t(_table) + (align_up(stripe.byte_size(), _card_size) >> _card_shift))) >> 52: #endif > > Not sure about its usefulness; the logic in the caller is super clear while this assertion logic obstructs the flow. I've reduced the code for the bounds check. > src/hotspot/share/gc/parallel/psCardTable.hpp line 82: > >> 80: const CardValue* const end) { >> 81: for (const CardValue* i = start; i < end; ++i) { >> 82: if (!is_clean(i)) { > > Better to use `is_dirty` to match the method name. Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1357067138 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1357065828 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1357065987 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1357064658 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1357063772 From amenkov at openjdk.org Thu Oct 12 18:50:13 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 12 Oct 2023 18:50:13 GMT Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 In-Reply-To: References: Message-ID: On Wed, 11 Oct 2023 06:11:29 GMT, Hannes Greule wrote: > > The fix itself looks good to me. How did you test the change? Looks like we don't have test coverage for the correctness of the dumped fields. 
Would be nice to add it. > > Thanks. I ran `hotspot_serviceability` and also manually looked into more complex heap dumps. I agree that specific tests would be better. I'll need to figure out how that can be accomplished. If you have any pointers how to get started there, please let me know. We have a test library to parse hprof files in test/lib/jdk/test/lib/hprof. You can look at test/hotspot/jtreg/serviceability/jvmti/vthread/HeapDump/VThreadInHeapDump.java as an example of a test which generates a heap dump for a target application and verifies that it contains the expected data. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16083#issuecomment-1760192842 From jvernee at openjdk.org Thu Oct 12 19:53:54 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 12 Oct 2023 19:53:54 GMT Subject: Integrated: 8312522: Implementation of Foreign Function & Memory API In-Reply-To: References: Message-ID: On Tue, 1 Aug 2023 10:29:06 GMT, Jorn Vernee wrote: > This patch contains the implementation of the foreign linker & memory API JEP for Java 22. The initial patch is composed of commits brought over directly from the [panama-foreign repo](https://github.com/openjdk/panama-foreign). The main changes found in this patch come from the following PRs: > > 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous iterations supported converting Java strings to and from native strings in the UTF-8 encoding, we've extended the supported encoding to all the encodings found in the `java.nio.charset.StandardCharsets` class. > 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the `MemoryLayout::sequenceLayout` factory method which inferred the size of the sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A client is now required to explicitly specify the sequence size. > 3. 
https://github.com/openjdk/panama-foreign/pull/839 A new API was added: `Linker::canonicalLayouts`, which exposes a map containing the platform-specific mappings of common C type names to memory layouts. > 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access varhandles, as well as byte offset and slice handles derived from memory layouts, now feature an additional 'base offset' coordinate that is added to the offset computed by the handle. This allows composing these handles with other offset computation strategies that may not be based on the same memory layout. This addresses use-cases where clients are working with 'dynamic' layouts, whose size might not be known statically, such as variable length arrays, or variable size matrices. > 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now redundant API. Clients can simply use the difference between the base address of two memory segments. > 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of `SegmentAllocator::allocateArray`, by renaming methods that both allocate + initialize memory segments to `allocateFrom`. (see the original PR for the problematic case) > 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the documentation for variadic functions. > 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to make sure the `jdk_foreign` tests can pass on 32-bit machines, taking linux-x86 as a test bed. > 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API required. The `Linker::nativeLinker` method is no longer allowed to throw an `UnsupportedOperationException` on ... This pull request has now been integrated. 
Changeset: 32ac72c3 Author: Jorn Vernee URL: https://git.openjdk.org/jdk/commit/32ac72c3d35138f5253e4defc948304ac3ea1b53 Stats: 4468 lines in 263 files changed: 2211 ins; 1196 del; 1061 mod 8312522: Implementation of Foreign Function & Memory API Co-authored-by: Maurizio Cimadamore Co-authored-by: Jorn Vernee Co-authored-by: Per Minborg Reviewed-by: dholmes, psandoz, mcimadamore, alanb ------------- PR: https://git.openjdk.org/jdk/pull/15103 From jjoo at openjdk.org Thu Oct 12 21:01:08 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 12 Oct 2023 21:01:08 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v28] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Fix test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/590df03b..1e8c1a4e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=26-27 Stats: 5 lines in 1 file changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Thu Oct 12 23:17:03 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 12 Oct 2023 23:17:03 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v29] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Add Copyright header to test and formatting changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: 
https://git.openjdk.org/jdk/pull/15082/files/1e8c1a4e..fc5cf3d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=27-28 Stats: 28 lines in 2 files changed: 24 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From dlong at openjdk.org Thu Oct 12 23:29:15 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 12 Oct 2023 23:29:15 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object When I run TestUnlockOSR on x86_64 it doesn't trigger a C2 unlock because the OSR nmethod hits an uncommon trap first. 
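As a rough illustration of the behaviour proposed in the quoted PR description (take a slow path when the top of the lock-stack does not match the unlocked object), here is a toy C++ model. It is not HotSpot's lock-stack implementation: `ToyLockStack`, `try_fast_unlock`, `slow_unlock` and `unlock` are hypothetical names, and the slow path here simply searches the stack, whereas the real slow path goes through the runtime.

```cpp
#include <cstddef>
#include <vector>

// Toy model of a per-thread lock stack (hypothetical, for illustration only).
class ToyLockStack {
 public:
  void push(const void* obj) { _entries.push_back(obj); }

  // Fast path: succeeds only when obj is the most recently locked object.
  bool try_fast_unlock(const void* obj) {
    if (_entries.empty() || _entries.back() != obj) {
      return false;  // top does not match: caller must take the slow path
    }
    _entries.pop_back();
    return true;
  }

  // Slow path: remove obj wherever it sits, searching from the top down.
  bool slow_unlock(const void* obj) {
    for (std::size_t i = _entries.size(); i > 0; --i) {
      if (_entries[i - 1] == obj) {
        _entries.erase(_entries.begin() + static_cast<std::ptrdiff_t>(i - 1));
        return true;
      }
    }
    return false;  // obj is not on this stack
  }

  std::size_t depth() const { return _entries.size(); }

 private:
  std::vector<const void*> _entries;
};

// Unlock that tolerates unordered (non-LIFO) unlocking instead of stopping.
bool unlock(ToyLockStack& stack, const void* obj) {
  return stack.try_fast_unlock(obj) || stack.slow_unlock(obj);
}
```

The point of the sketch is only the control flow: a mismatch at the top of the stack is treated as a reason to fall back, not as a fatal condition.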
------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1760526844 From jjoo at openjdk.org Thu Oct 12 23:32:24 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 12 Oct 2023 23:32:24 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v27] In-Reply-To: References: <0AHhZD1JncFwC7z5rp1uufF4FAMrKR8mQXdDuwKNL4s=.02114c12-5a6b-4d3e-8b7a-421bb3eb8d47@github.com> Message-ID: On Wed, 11 Oct 2023 23:23:11 GMT, Man Cao wrote: >> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: >> >> add comment and change if defined to ifdef > > src/hotspot/share/gc/shared/collectedHeap.cpp line 298: > >> 296: NOT_PRODUCT(_promotion_failure_alot_gc_number = 0;) >> 297: >> 298: if (UsePerfData && os::is_thread_cpu_time_supported()) { > > This condition should be a nested if inside `if (UsePerfData)`: > > > if (os::is_thread_cpu_time_supported()) { > _total_cpu_time = ...; > _perf_parallel_worker_threads_cpu_time = ...; > } > > Otherwise `_perf_gc_cause` and `_perf_gc_lastcause` could be broken. Ah yes, good catch! 
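The nesting suggested in the review comment above can be modelled in a few lines. This is a sketch under assumptions, not the code in `collectedHeap.cpp`: the struct, the boolean fields and `create_counters` are hypothetical stand-ins, and it assumes that the GC-cause counters are created under the `UsePerfData` check, as the comment implies.

```cpp
// Toy record of which counters were created (hypothetical stand-in).
struct ToyPerfCounters {
  bool gc_cause_created = false;
  bool gc_lastcause_created = false;
  bool total_cpu_time_created = false;
  bool parallel_worker_cpu_time_created = false;
};

ToyPerfCounters create_counters(bool use_perf_data,
                                bool thread_cpu_time_supported) {
  ToyPerfCounters c;
  if (use_perf_data) {
    // Created whenever perf data is enabled, regardless of CPU-time support.
    c.gc_cause_created = true;
    c.gc_lastcause_created = true;
    if (thread_cpu_time_supported) {
      // Only the CPU-time counters depend on OS support.
      c.total_cpu_time_created = true;
      c.parallel_worker_cpu_time_created = true;
    }
  }
  return c;
}
```

With a combined condition (`use_perf_data && thread_cpu_time_supported`) guarding the whole block, a platform without thread CPU time support would also lose the two GC-cause counters, which is the problem the reviewer points out.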
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1357552757 From jjoo at openjdk.org Fri Oct 13 01:38:13 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Fri, 13 Oct 2023 01:38:13 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v30] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Add call to publish in parallel gc and update counter names ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/fc5cf3d8..19fe9b3a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=28-29 Stats: 31 lines in 9 files changed: 6 ins; 0 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Fri Oct 13 01:38:15 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Fri, 13 Oct 2023 01:38:15 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v18] In-Reply-To: References: Message-ID: On Wed, 11 Oct 2023 22:25:04 GMT, Man Cao wrote: >> src/hotspot/share/runtime/perfData.hpp line 64: >> >>> 62: COM_THREADS, >>> 63: SUN_THREADS, >>> 64: SUN_THREADS_GCCPU, // Subsystem for Sun Threads GC CPU >> >> Really not sure about this naming ... > > +1, dropping the "GC" seems better, i.e. `SUN_THREADS_CPUTIME` and `sun.threads.cpu_time`. For example, `sun.threads.gc_cpu_time.vm` is strange since VM thread also does work unrelated to GC. > > For @simonis's point about avoid duplicating the "g1" part in each counter's name, I think it is doable. How about the following list of names? > > > sun.threads.total_gc_cpu_time // Unchanged. 
Would sun.threads.cpu_time.gc_total look better? > sun.threads.cpu_time.gc_parallel_workers > sun.threads.cpu_time.gc_conc_mark > sun.threads.cpu_time.gc_conc_refine > sun.threads.cpu_time.vm > sun.threads.cpu_time.conc_dedup > > > `gc_conc_mark` and `gc_conc_refine` are currently tied to G1. It seems OK because these counters would not exist if G1 is not selected. If other collectors want to implement `gc_conc_mark` in the future, they could implement their own definition of this counter, or move G1's definition to a shared place. > > @simonis does the list of names above look good to you? Updated the counters for now to these names, but open to renaming again based on feedback from @simonis! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1357639099 From dlong at openjdk.org Fri Oct 13 02:27:32 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 13 Oct 2023 02:27:32 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: <83QLHaEzML_o3epjMTb-Y79vIaZe77fn0DMxzZlZIm0=.50dcc0f4-7a9c-4f02-b496-e908278b71a6@github.com> On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. 
> - 8316746: Top of lock-stack does not match the unlocked object If I modify the test slightly to avoid the uncommon trap, I do hit the "Top of lock-stack does not match the unlocked object" stop on x86_64 with the new reversed order for load_interpreter_state. So it does look like that fix is not correct, and something else must be going wrong with nsk/jdi/StepEvent. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1760668257 From duke at openjdk.org Fri Oct 13 03:30:49 2023 From: duke at openjdk.org (xpbob) Date: Fri, 13 Oct 2023 03:30:49 GMT Subject: RFR: 8318058: Notify the jvm when the direct memory is oom Message-ID: Big data processes often end up in a state where direct memory is exhausted (OOM) and the process is still alive but not serving properly. If the direct memory is OOM, the code can notify the JVM. This can bring the following benefits: 1. Analysis of direct memory (java.nio.DirectByteBuffer) needs the reference relations from heap dumps; HeapDumpOnOutOfMemoryError can be used directly. 2. 
In container environment, ExitOnOutOfMemoryError can be used to let the process that cannot provide services exit, so that the container can quickly pull up a new pod ------------- Commit messages: - 8318058: Notify the jvm when the direct memory is oom Changes: https://git.openjdk.org/jdk/pull/16176/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16176&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318058 Stats: 92 lines in 4 files changed: 87 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16176.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16176/head:pull/16176 PR: https://git.openjdk.org/jdk/pull/16176 From duke at openjdk.org Fri Oct 13 03:49:25 2023 From: duke at openjdk.org (nahidasu) Date: Fri, 13 Oct 2023 03:49:25 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v10] In-Reply-To: References: Message-ID: On Thu, 12 Oct 2023 14:48:35 GMT, Aleksey Shipilev wrote: >> Work in progress, submitting for broader attention. >> >> See more details in the bug and related issues. >> >> This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. >> >> This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in. >> >> Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do x86_32 version: we need a thread, so the easiest way would be to call into VM. 
But we cannot do that easily: the code blowout would make some forward branches in external code non-short. I think we cannot implement this mitigation on some architectures, so be it, it would be a sensible tradeoff for simplicity. >> >> Setting backoff at `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong. >> >> Current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Touchup benchmark metadata Hello, I'm Nahida from **Derek's** team. We've conducted extensive testing on the patch using the specified test cases for both Shipilev and Mulugeta. We observed a similar trend across both benchmarks: as the value of SecondarySuperMissBackoff increases, the average time decreases. We ran both our Mulugeta test as well as the JMH test supplied in the patch with a larger machine with thread counts of 18, 60, and 240. For Mulugeta's test case, we explored three different scenarios: - Equal Distribution (50% each): This represents the worst-case scenario. - Exclusive Interface Calls (0%): Signifying the best-case scenario. - 95% Same Interface, 5% Other Interface: Where 95% of the time is spent calling the same interface and 5% of the time on the other one. Here is **Derek's** perspective on this data: "We see that it takes very little contention (5%) for the default behavior to perform poorly, and in the uncontended case there is no downside for using a large Backoff value. So backoff values of 1,000 or even 10,000 seem reasonable. This makes sense, because in the perfect world for the secondary supercache there is no update to the secondary supercache. 
Note in a previous version of the Mulugeta benchmark we tried increasing the length of the interface array being searched, and it made little performance impact until the interface depth got silly (100+). HW prefetch, OOO cores etc can chew through an array search pretty well once they get started." [JDK-8180450_Secondary-super-cache_8316180-patch.xlsx](https://github.com/openjdk/jdk/files/12889239/JDK-8180450_Secondary-super-cache_8316180-patch.xlsx) I've attached the Excel file for your reference. If you require any additional information or specific details, please feel free to let me know. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1760714959 From stuefe at openjdk.org Fri Oct 13 05:29:14 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Oct 2023 05:29:14 GMT Subject: RFR: 8318058: Notify the jvm when the direct memory is oom In-Reply-To: References: Message-ID: On Fri, 13 Oct 2023 03:23:04 GMT, xpbob wrote: > Big data processes often end up in a state where direct memory is exhausted (OOM) and the process is still alive but not serving properly. If the direct memory is OOM, the code can notify the JVM. This can bring the following benefits: > 1. Analysis of direct memory (java.nio.DirectByteBuffer) needs the reference relations from heap dumps; HeapDumpOnOutOfMemoryError can be used directly. > 2. In container environment, ExitOnOutOfMemoryError can be used to let the process that cannot provide services exit, so that the container can quickly pull up a new pod Undoubtedly useful, but there have been many discussions in the past about what does and does not constitute an OOM error, and IIRC, the stance of Oracle devs was "only if it is in java heap". Hence the missing OOM error when we cannot create threads, for instance. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16176#issuecomment-1760881555 From alanb at openjdk.org Fri Oct 13 06:13:21 2023 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 13 Oct 2023 06:13:21 GMT Subject: RFR: 8318058: Notify the jvm when the direct memory is oom In-Reply-To: References: Message-ID: On Fri, 13 Oct 2023 03:23:04 GMT, xpbob wrote: > Big data processes often end up in a state where direct memory is exhausted (OOM) and the process is still alive but not serving properly. If the direct memory is OOM, the code can notify the JVM. This can bring the following benefits: > 1. Analysis of direct memory (java.nio.DirectByteBuffer) needs the reference relations from heap dumps; HeapDumpOnOutOfMemoryError can be used directly. > 2. In container environment, ExitOnOutOfMemoryError can be used to let the process that cannot provide services exit, so that the container can quickly pull up a new pod JDK-8294052 has some of the previous discussion on this issue. See also this thread where Man, Thomas, David and I discussed the topic: https://mail.openjdk.org/pipermail/nio-dev/2022-September/012119.html There were previous threads on the same topic, I can't find the links right now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16176#issuecomment-1760950115 From ihse at openjdk.org Fri Oct 13 12:02:44 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 13 Oct 2023 12:02:44 GMT Subject: RFR: 8318078: ADLC: pass ASSERT and PRODUCT flags In-Reply-To: <-CKTyTM4X49hqWzyGGdBK3BaX3iSADnVEXQTq_j94lI=.ad1fece3-e049-4b5d-bbf3-3dd714b74bf6@github.com> References: <-CKTyTM4X49hqWzyGGdBK3BaX3iSADnVEXQTq_j94lI=.ad1fece3-e049-4b5d-bbf3-3dd714b74bf6@github.com> Message-ID: On Fri, 13 Oct 2023 09:49:48 GMT, Emanuel Peter wrote: > @vnkozlov asked me to guard some debug AD file rules in `#ifdef ASSERT`. 
https://github.com/openjdk/jdk/pull/14785#discussion_r1349391130 > > We discovered that the `ASSERT` and `PRODUCT` are not yet passed to ADLC, and hence they are always considered `undefined`. Hence, all of these `ifdef` blocks would always be ignored. > > **Solution** > I added the flags to `make/hotspot/gensrc/GensrcAdlc.gmk`, just like in `make/hotspot/lib/JvmFlags.gmk`. > > As @erikj79 commented: we should probably unify this. But I leave that to the build team. > > **Testing** > With this code you can see what flags are passed to ADLC: > > --- a/src/hotspot/share/adlc/main.cpp > +++ b/src/hotspot/share/adlc/main.cpp > @@ -56,6 +56,11 @@ int main(int argc, char *argv[]) > // Check for proper arguments > if( argc == 1 ) usage(AD); // No arguments? Then print usage > > + for( int i = 1; i < argc; i++ ) { // For all arguments > + char *s = argv[i]; // Get option/filename > + fprintf(stderr, "ARGV[%d] %s\n", i, s); > + } > + > // Read command line arguments and file names > for( int i = 1; i < argc; i++ ) { // For all arguments > char *s = argv[i]; // Get option/filename > > > On `linux-x64` I get: > > ARGV[1] -q > ARGV[2] -T > ARGV[3] -DLINUX=1 > ARGV[4] -D_GNU_SOURCE=1 > ARGV[5] -g > ARGV[6] -DAMD64=1 > ARGV[7] -D_LP64=1 > ARGV[8] -DNDEBUG > ARGV[9] -DPRODUCT > > > And on `linux-x64-debug` I get: > > ARGV[1] -q > ARGV[2] -T > ARGV[3] -DLINUX=1 > ARGV[4] -D_GNU_SOURCE=1 > ARGV[5] -g > ARGV[6] -DAMD64=1 > ARGV[7] -D_LP64=1 > ARGV[8] -DASSERT > > > I verified that the `#ifdef` work as expected, by adding this code to `src/hotspot/cpu/x86/x86.ad`: > > #ifdef ASSERT > #ifdef PRODUCT > control > #endif > #endif > > #ifdef ASSERT > xxx > #endif > > #ifdef PRODUCT > yyy > #endif > > When compiling, I get complaints for `yyy` on `linux-x64` and for `xxx` on `linux-x64-debug`. But since `ASSERT` and `PRODUCT` never occur together, we never get complaints about `control`. > > **Running tier1-3 and stress testing ...** Looks good. 
I'd like to see someone from the hotspot team approve it as well. ------------- Marked as reviewed by ihse (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16178#pullrequestreview-1676260669 From erikj at openjdk.org Fri Oct 13 13:22:25 2023 From: erikj at openjdk.org (Erik Joelsson) Date: Fri, 13 Oct 2023 13:22:25 GMT Subject: RFR: 8318078: ADLC: pass ASSERT and PRODUCT flags In-Reply-To: <-CKTyTM4X49hqWzyGGdBK3BaX3iSADnVEXQTq_j94lI=.ad1fece3-e049-4b5d-bbf3-3dd714b74bf6@github.com> References: <-CKTyTM4X49hqWzyGGdBK3BaX3iSADnVEXQTq_j94lI=.ad1fece3-e049-4b5d-bbf3-3dd714b74bf6@github.com> Message-ID: On Fri, 13 Oct 2023 09:49:48 GMT, Emanuel Peter wrote: > @vnkozlov asked me to guard some debug AD file rules in `#ifdef ASSERT`. https://github.com/openjdk/jdk/pull/14785#discussion_r1349391130 > > We discovered that the `ASSERT` and `PRODUCT` are not yet passed to ADLC, and hence they are always considered `undefined`. Hence, all of these `ifdef` blocks would always be ignored. > > **Solution** > I added the flags to `make/hotspot/gensrc/GensrcAdlc.gmk`, just like in `make/hotspot/lib/JvmFlags.gmk`. > > As @erikj79 commented: we should probably unify this. But I leave that to the build team. > > **Testing** > With this code you can see what flags are passed to ADLC: > > --- a/src/hotspot/share/adlc/main.cpp > +++ b/src/hotspot/share/adlc/main.cpp > @@ -56,6 +56,11 @@ int main(int argc, char *argv[]) > // Check for proper arguments > if( argc == 1 ) usage(AD); // No arguments? 
Then print usage > > + for( int i = 1; i < argc; i++ ) { // For all arguments > + char *s = argv[i]; // Get option/filename > + fprintf(stderr, "ARGV[%d] %s\n", i, s); > + } > + > // Read command line arguments and file names > for( int i = 1; i < argc; i++ ) { // For all arguments > char *s = argv[i]; // Get option/filename > > > On `linux-x64` I get: > > ARGV[1] -q > ARGV[2] -T > ARGV[3] -DLINUX=1 > ARGV[4] -D_GNU_SOURCE=1 > ARGV[5] -g > ARGV[6] -DAMD64=1 > ARGV[7] -D_LP64=1 > ARGV[8] -DNDEBUG > ARGV[9] -DPRODUCT > > > And on `linux-x64-debug` I get: > > ARGV[1] -q > ARGV[2] -T > ARGV[3] -DLINUX=1 > ARGV[4] -D_GNU_SOURCE=1 > ARGV[5] -g > ARGV[6] -DAMD64=1 > ARGV[7] -D_LP64=1 > ARGV[8] -DASSERT > > > I verified that the `#ifdef` work as expected, by adding this code to `src/hotspot/cpu/x86/x86.ad`: > > #ifdef ASSERT > #ifdef PRODUCT > control > #endif > #endif > > #ifdef ASSERT > xxx > #endif > > #ifdef PRODUCT > yyy > #endif > > When compiling, I get complaints for `yyy` on `linux-x64` and for `xxx` on `linux-x64-debug`. But since `ASSERT` and `PRODUCT` never occur together, we never get complaints about `control`. > > **Running tier1-3 and stress testing ...** Marked as reviewed by erikj (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16178#pullrequestreview-1676490981 From pchilanomate at openjdk.org Fri Oct 13 14:29:21 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 13 Oct 2023 14:29:21 GMT Subject: RFR: 8318015: Lock inflation not needed for OSR or Deopt for new locking modes In-Reply-To: References: Message-ID: On Thu, 12 Oct 2023 14:03:20 GMT, Martin Doerr wrote: > Only LockingMode "LM_LEGACY" requires inflation before lock transfers because it is the only one which uses stack addresses in the mark word. I think we should treat the displaced header as stale data because it may be uninitialized. Looks good to me. 
src/hotspot/share/runtime/basicLock.cpp line 70:

> 68: 
> 69: if (LockingMode == LM_LEGACY && displaced_header().is_neutral()) {
> 70: // The object is locked and the resulting ObjectMonitor* will also be

Isn't the set_displaced_header() below also unnecessary for other locking modes, i.e., shouldn't it be `if (LockingMode != LM_LEGACY) { return; }`?

-------------

PR Review: https://git.openjdk.org/jdk/pull/16165#pullrequestreview-1676607334
PR Review Comment: https://git.openjdk.org/jdk/pull/16165#discussion_r1358330805

From mdoerr at openjdk.org  Fri Oct 13 14:29:25 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 13 Oct 2023 14:29:25 GMT
Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3]
In-Reply-To: 
References: 
Message-ID: 

On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote:

>> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue.
>> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms).
>
> Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
> - Pass may_be_unordered information to lightweight_unlock.
> - Merge remote-tracking branch 'origin' into 8316746_lock_stack
> - Add x86_64 and aarch64 implementation.
> - 8316746: Top of lock-stack does not match the unlocked object

I have another interesting experiment:
`make run-test TEST="vmTestbase/nsk/jdi/StepEvent" JTREG="VM_OPTIONS=-Xbatch"` reproduces the issue most of the time on PPC64.
However, `make run-test TEST="vmTestbase/nsk/jdi/StepEvent" JTREG="VM_OPTIONS=-Xbatch -XX:AsyncDeflationInterval=0"` appears to run stably. Not sure if that's a coincidence or if the async deflation messes something up.
------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1761602707 From ayang at openjdk.org Fri Oct 13 14:30:13 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 13 Oct 2023 14:30:13 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v21] In-Reply-To: References: Message-ID: <88WMBggE7v-4BZphBaKYrlmJ7uyu_fW7_4KKZU1ZSkg=.8594feb7-8961-41e5-8965-8bfa1ef22e73@github.com> On Thu, 12 Oct 2023 16:24:17 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. 
In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongues young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc theads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The prize for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Feedback Albert Thank you for the revision. Did some local testing; no issue found. src/hotspot/share/gc/parallel/psCardTable.hpp line 94: > 92: > 93: // Pre-scavenge support. > 94: // The pre-scavenge phase can overlap with scavenging. Is this obsolete? ------------- Marked as reviewed by ayang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14846#pullrequestreview-1676623345 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1358340559 From rrich at openjdk.org Fri Oct 13 14:39:52 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 13 Oct 2023 14:39:52 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v21] In-Reply-To: <88WMBggE7v-4BZphBaKYrlmJ7uyu_fW7_4KKZU1ZSkg=.8594feb7-8961-41e5-8965-8bfa1ef22e73@github.com> References: <88WMBggE7v-4BZphBaKYrlmJ7uyu_fW7_4KKZU1ZSkg=.8594feb7-8961-41e5-8965-8bfa1ef22e73@github.com> Message-ID: On Fri, 13 Oct 2023 14:25:34 GMT, Albert Mingkun Yang wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Feedback Albert > > src/hotspot/share/gc/parallel/psCardTable.hpp line 94: > >> 92: >> 93: // Pre-scavenge support. >> 94: // The pre-scavenge phase can overlap with scavenging. > > Is this obsolete? Oh sure. It's obsolete now. I removed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1358347257 From rrich at openjdk.org Fri Oct 13 14:39:48 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 13 Oct 2023 14:39:48 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v22] In-Reply-To: References: Message-ID: <1_KC1smUe4FUpFs3u9GRAYc_RGtxMKFyc4SwnZWqJUk=.55658bac-4a3b-4e57-bdad-dbf2ec9b9170@github.com> > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. 
> > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongues young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc theads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. 
I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The prize for parallelization is paid without actually doing it. Also ParallelGC will use at lea... Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Remove obsolete comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/d12e96e2..607f0c22 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=20-21 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From rrich at openjdk.org Fri Oct 13 14:41:48 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 13 Oct 2023 14:41:48 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v21] In-Reply-To: References: Message-ID: <1s9_eK30_SkOiLxFIRRv5w_JEbmEz93C3zsZpNaYK0Q=.9c91e63a-9122-4e81-82b3-93104d9444a2@github.com> On Thu, 12 Oct 2023 16:24:17 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. 
>> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongues young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc theads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The prize for parallelization is paid ... 
> > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Feedback Albert Should https://bugs.openjdk.org/browse/JDK-8309960 be reverted? Better in a follow-up I guess. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1761632602 From mdoerr at openjdk.org Fri Oct 13 14:48:42 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 13 Oct 2023 14:48:42 GMT Subject: RFR: 8318015: Lock inflation not needed for OSR or Deopt for new locking modes [v2] In-Reply-To: References: Message-ID: > Only LockingMode "LM_LEGACY" requires inflation before lock transfers because it is the only one which uses stack addresses in the mark word. I think we should treat the displaced header as stale data because it may be uninitialized. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Skip copying displaced header. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16165/files - new: https://git.openjdk.org/jdk/pull/16165/files/089a0820..fb112413 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16165&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16165&range=00-01 Stats: 31 lines in 2 files changed: 8 ins; 3 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/16165.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16165/head:pull/16165 PR: https://git.openjdk.org/jdk/pull/16165 From mdoerr at openjdk.org Fri Oct 13 14:48:44 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 13 Oct 2023 14:48:44 GMT Subject: RFR: 8318015: Lock inflation not needed for OSR or Deopt for new locking modes [v2] In-Reply-To: References: Message-ID: <7kgLoKoKlR--amqX6KzgADC55XpBm5iOKM7FbaS95yw=.dac47071-f5bf-4eed-a006-96273ea69d09@github.com> On Fri, 13 Oct 2023 14:17:45 GMT, Patricio Chilano Mateo wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last 
revision: >> >> Skip copying displaced header. > > src/hotspot/share/runtime/basicLock.cpp line 70: > >> 68: >> 69: if (LockingMode == LM_LEGACY && displaced_header().is_neutral()) { >> 70: // The object is locked and the resulting ObjectMonitor* will also be > > Isn't the set_displaced_header() below also unnecessary for other locking modes, i.e shouldn't it be `if (LockingMode != LM_LEGACY) { return; }`? Correct. We can skip copying the displaced header completely (also for OSR). I've updated the PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16165#discussion_r1358361190 From ayang at openjdk.org Fri Oct 13 14:51:37 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 13 Oct 2023 14:51:37 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v21] In-Reply-To: <1s9_eK30_SkOiLxFIRRv5w_JEbmEz93C3zsZpNaYK0Q=.9c91e63a-9122-4e81-82b3-93104d9444a2@github.com> References: <1s9_eK30_SkOiLxFIRRv5w_JEbmEz93C3zsZpNaYK0Q=.9c91e63a-9122-4e81-82b3-93104d9444a2@github.com> Message-ID: <-tM40FGW10TWooaxzhFrtU7Xx9sQga4nhvZdUqDRLnQ=.dc86d37b-3502-43b4-8818-6ed13454d31b@github.com> On Fri, 13 Oct 2023 14:39:16 GMT, Richard Reingruber wrote: > Should https://bugs.openjdk.org/browse/JDK-8309960 be reverted? Better in a follow-up I guess. Better in its own PR, IMO. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1761646606 From igavrilin at openjdk.org Fri Oct 13 15:45:27 2023 From: igavrilin at openjdk.org (Ilya Gavrilin) Date: Fri, 13 Oct 2023 15:45:27 GMT Subject: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics Message-ID: Hi all, please review this changes into risc-v floating point copysign and signum intrinsics. CopySign - returns first argument with the sign of second. On risc-v we have `fsgnj.x` instruction, which can implement this intrinsic. 
Signum - returns the input value if it is +/- 0.0 or NaN, otherwise 1.0 with the sign of the input value. On risc-v we can use `fclass.x` to determine the type of the input value and return the appropriate value.

Tests:
Performance tests on t-head board:
With intrinsics:

Benchmark                 (seed)   Mode  Cnt      Score      Error   Units
MathBench.copySignDouble       0  thrpt    8  34156.580 ±   76.272  ops/ms
MathBench.copySignFloat        0  thrpt    8  34181.731 ±   38.182  ops/ms
MathBench.signumDouble         0  thrpt    8  31977.258 ± 1122.327  ops/ms
MathBench.signumFloat          0  thrpt    8  31836.852 ±   56.013  ops/ms

Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`):

Benchmark                 (seed)   Mode  Cnt      Score      Error   Units
MathBench.copySignDouble       0  thrpt    8  31000.996 ±  943.094  ops/ms
MathBench.copySignFloat        0  thrpt    8  30678.016 ±   28.087  ops/ms
MathBench.signumDouble         0  thrpt    8  25435.010 ± 2047.085  ops/ms
MathBench.signumFloat          0  thrpt    8  25257.058 ±   79.175  ops/ms

Regression tests: tier1, hotspot:tier2 on risc-v board.

Also, changed the name of one micro test: before we had `sigNumDouble` and `signumFloat` tests, so neither `signum` nor `sigNum` matched both of them. Now both names contain `signum`.
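
The semantics described above can be illustrated with a small Java sketch. This is not the intrinsic code from the PR: the class `SignSketch` and its methods are hypothetical names, and the bit manipulation only mimics the sign-injection idea behind `fsgnj` (magnitude bits from the first operand, sign bit from the second) under the assumption that the sign argument is not NaN.

```java
class SignSketch {
    // Takes every bit except the sign from `magnitude` and the sign bit from `sign`.
    static double copySignBits(double magnitude, double sign) {
        long m = Double.doubleToRawLongBits(magnitude);
        long s = Double.doubleToRawLongBits(sign);
        return Double.longBitsToDouble((m & Long.MAX_VALUE) | (s & Long.MIN_VALUE));
    }

    // Zeros (of either sign) and NaN are returned unchanged; anything else
    // becomes 1.0 carrying the sign of the input.
    static double signumRef(double d) {
        return (d == 0.0 || Double.isNaN(d)) ? d : copySignBits(1.0, d);
    }

    public static void main(String[] args) {
        double[] samples = { 4.1, -12.0, 81.0, 0.0, -0.0, Double.NaN };
        for (double d : samples) {
            System.out.println(d + ": signum=" + signumRef(d)
                    + ", Math.signum=" + Math.signum(d)
                    + ", copySign(81, d)=" + copySignBits(81.0, d));
        }
    }
}
```

The special cases (+/- 0.0 and NaN) are the reason a classification step is needed for signum, whereas copySign is a pure bit operation.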
The performance tests have been changed a bit to check the intrinsics' results better; diff to modify the tests:

diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java
index 6cd1353907e..0bee25366bf 100644
--- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
+++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
@@ -143,12 +143,12 @@ public double ceilDouble() {

     @Benchmark
     public double copySignDouble() {
-        return Math.copySign(double81, doubleNegative12);
+        return Math.copySign(double81, doubleNegative12) + Math.copySign(double81, double2) + Math.copySign(double4Dot1, doubleNegative12);
     }

     @Benchmark
     public float copySignFloat() {
-        return Math.copySign(floatNegative99, float1);
+        return Math.copySign(floatNegative99, float1) + Math.copySign(eFloat, float1) + Math.copySign(eFloat, floatNegative99);
     }

     @Benchmark
@@ -472,12 +472,12 @@ public float scalbFloatInt() {

     @Benchmark
     public double sigNumDouble() {
-        return Math.signum(double4Dot1);
+        return Math.signum(double4Dot1) + Math.signum(doubleNegative12) + Math.signum(double81);
     }

     @Benchmark
     public double signumFloat() {
-        return Math.signum(floatNegative99);
+        return Math.signum(floatNegative99) + Math.signum(float2) + Math.signum(float7);
     }

     @Benchmark

-------------

Commit messages:
 - Implement copySign and signum intrinsics

Changes: https://git.openjdk.org/jdk/pull/16186/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16186&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8317971
  Stats: 97 lines in 5 files changed: 96 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/16186.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16186/head:pull/16186

PR: https://git.openjdk.org/jdk/pull/16186

From pchilanomate at openjdk.org  Fri Oct 13 16:03:34 2023
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Fri, 13 Oct 2023 16:03:34 GMT
Subject: RFR: 8318015: Lock inflation not needed for OSR or Deopt for new locking modes
[v2] In-Reply-To: References: Message-ID: <6NdhYSj2EsaRyDL4uEUkqVihqUGtYVryMhKD7TI5rw0=.7a0becd7-c938-450c-b55d-56d0eb18a8a2@github.com> On Fri, 13 Oct 2023 14:48:42 GMT, Martin Doerr wrote: >> Only LockingMode "LM_LEGACY" requires inflation before lock transfers because it is the only one which uses stack addresses in the mark word. I think we should treat the displaced header as stale data because it may be uninitialized. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Skip copying displaced header. Looks good to me, thanks. ------------- Marked as reviewed by pchilanomate (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16165#pullrequestreview-1676802176 From kvn at openjdk.org Fri Oct 13 17:26:12 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Oct 2023 17:26:12 GMT Subject: RFR: 8318078: ADLC: pass ASSERT and PRODUCT flags In-Reply-To: <-CKTyTM4X49hqWzyGGdBK3BaX3iSADnVEXQTq_j94lI=.ad1fece3-e049-4b5d-bbf3-3dd714b74bf6@github.com> References: <-CKTyTM4X49hqWzyGGdBK3BaX3iSADnVEXQTq_j94lI=.ad1fece3-e049-4b5d-bbf3-3dd714b74bf6@github.com> Message-ID: On Fri, 13 Oct 2023 09:49:48 GMT, Emanuel Peter wrote: > @vnkozlov asked me to guard some debug AD file rules in `#ifdef ASSERT`. https://github.com/openjdk/jdk/pull/14785#discussion_r1349391130 > > We discovered that the `ASSERT` and `PRODUCT` are not yet passed to ADLC, and hence they are always considered `undefined`. Hence, all of these `ifdef` blocks would always be ignored. > > **Solution** > I added the flags to `make/hotspot/gensrc/GensrcAdlc.gmk`, just like in `make/hotspot/lib/JvmFlags.gmk`. > > As @erikj79 commented: we should probably unify this. But I leave that to the build team. 
> > **Testing** > With this code you can see what flags are passed to ADLC: > > --- a/src/hotspot/share/adlc/main.cpp > +++ b/src/hotspot/share/adlc/main.cpp > @@ -56,6 +56,11 @@ int main(int argc, char *argv[]) > // Check for proper arguments > if( argc == 1 ) usage(AD); // No arguments? Then print usage > > + for( int i = 1; i < argc; i++ ) { // For all arguments > + char *s = argv[i]; // Get option/filename > + fprintf(stderr, "ARGV[%d] %s\n", i, s); > + } > + > // Read command line arguments and file names > for( int i = 1; i < argc; i++ ) { // For all arguments > char *s = argv[i]; // Get option/filename > > > On `linux-x64` I get: > > ARGV[1] -q > ARGV[2] -T > ARGV[3] -DLINUX=1 > ARGV[4] -D_GNU_SOURCE=1 > ARGV[5] -g > ARGV[6] -DAMD64=1 > ARGV[7] -D_LP64=1 > ARGV[8] -DNDEBUG > ARGV[9] -DPRODUCT > > > And on `linux-x64-debug` I get: > > ARGV[1] -q > ARGV[2] -T > ARGV[3] -DLINUX=1 > ARGV[4] -D_GNU_SOURCE=1 > ARGV[5] -g > ARGV[6] -DAMD64=1 > ARGV[7] -D_LP64=1 > ARGV[8] -DASSERT > > > I verified that the `#ifdef` work as expected, by adding this code to `src/hotspot/cpu/x86/x86.ad`: > > #ifdef ASSERT > #ifdef PRODUCT > control > #endif > #endif > > #ifdef ASSERT > xxx > #endif > > #ifdef PRODUCT > yyy > #endif > > When compiling, I get complaints for `yyy` on `linux-x64` and for `xxx` on `linux-x64-debug`. But since `ASSERT` and `PRODUCT` never occur together, we never get complaints about `control`. > > **Running tier1-3 and stress testing ...** make/hotspot/gensrc/GensrcAdlc.gmk line 138: > 136: # Set ASSERT, NDEBUG and PRODUCT flags just like in JvmFlags.gmk > 137: ifeq ($(DEBUG_LEVEL), release) > 138: ADLCFLAGS += -DNDEBUG May be you should also copy all comments from `JvmFlags.gmk` to avoid confusion because, for example, `NDEBUG` is not used directly in HotSpot code but it is needed "to disable uses of assert macro from ." 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16178#discussion_r1358574163

From svkamath at openjdk.org  Fri Oct 13 18:08:12 2023
From: svkamath at openjdk.org (Smita Kamath)
Date: Fri, 13 Oct 2023 18:08:12 GMT
Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v8]
In-Reply-To: 
References: 
Message-ID: 

On Wed, 11 Oct 2023 22:05:08 GMT, Smita Kamath wrote:

>> Hi All,
>> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations.
>>
>> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option:
>>
>> | Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup |
>> |-------------|------------|---------------|------------------|-----------|
>> | full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 |
>> | full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 |
>> | small.AESGCMBench.decrypt | 8192 | 527854.353 | 663131.48 | 1.25 |
>> | small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 | 1.24 |
>> | full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 |
>> | full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 | 539235.462 | 1.00 |
>> | small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 | 298913.629 | 0.99 |
>> | small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 |
>> | | | | | |
>> | full.AESGCMBench.decrypt | 16384 | 307266.364 | 390397.778 | 1.27 |
>> | full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 |
>> | small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 | 1.27 |
>> | small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 |
>> | full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 |
>> | full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 |
>> | small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 | 181019.205 | 1.12 |
>> | small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 |
>> | | | | | |
>> | full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 | 1.31 |
>> | full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 |
>> | small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 |
>> | small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 | 1.28 |
>> | full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 | 1.12 |
>> | full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 | 211701.969 | 1.27 |
>> | small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 |
>> | small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 | 212085.047 | 1.30 |
>> | | | | | |
>> | full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 |
>> | full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 | 1.31 |
>> | small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 | 1.32 |
>> | small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 |
>> full.AES...
>
> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision:
>
> Updated comments, removed unused labels

@vnkozlov, I have received two approvals for this PR. Could you kindly run this through your testing? Thanks for your time.
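
For orientation, the kind of Java-level operation such benchmarks exercise can be sketched as below. This is an assumption about the general shape of the workload, not the actual `AESGCMBench` source: the class `AesGcmSketch`, the helper `roundTrip`, the 128-bit key and the 12-byte IV are illustrative choices, and whether a given run reaches the AVX2 intrinsic depends on the CPU and JVM flags.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class AesGcmSketch {
    // Encrypts `plain` with AES-GCM and decrypts it again, returning the recovered bytes.
    static byte[] roundTrip(byte[] plain) {
        try {
            SecureRandom rnd = new SecureRandom();
            byte[] key = new byte[16];   // illustrative 128-bit key
            byte[] iv = new byte[12];    // illustrative 96-bit IV
            rnd.nextBytes(key);
            rnd.nextBytes(iv);
            SecretKeySpec keySpec = new SecretKeySpec(key, "AES");
            GCMParameterSpec gcm = new GCMParameterSpec(128, iv); // 128-bit auth tag

            Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
            enc.init(Cipher.ENCRYPT_MODE, keySpec, gcm);
            byte[] cipherText = enc.doFinal(plain); // ciphertext followed by the tag

            Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
            dec.init(Cipher.DECRYPT_MODE, keySpec, gcm);
            return dec.doFinal(cipherText); // verifies the tag, then returns the plaintext
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[8192]; // same size as the smallest row in the table
        System.out.println("round trip ok: " + Arrays.equals(data, roundTrip(data)));
    }
}
```

A single `doFinal` call over the whole buffer corresponds most closely to the plain encrypt/decrypt rows; the MultiPart rows presumably feed the data in several `update` calls, which is an assumption on my part.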
------------- PR Comment: https://git.openjdk.org/jdk/pull/15410#issuecomment-1761961395 From hgreule at openjdk.org Fri Oct 13 19:08:19 2023 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 13 Oct 2023 19:08:19 GMT Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 [v2] In-Reply-To: References: Message-ID: > See the bug description for more information. > > This implementation brings down the time to take a heap dump on the example application in the bug report to <2 seconds on my machine. Hannes Greule has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - whitespace - add test to verify all instance fields are present in heap dump - Merge remote-tracking branch 'upstream/master' into perf/fieldstream - whitespaces - Iterate fields forwards on thread dump ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16083/files - new: https://git.openjdk.org/jdk/pull/16083/files/0bcab694..63887422 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16083&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16083&range=00-01 Stats: 19028 lines in 522 files changed: 10287 ins; 4212 del; 4529 mod Patch: https://git.openjdk.org/jdk/pull/16083.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16083/head:pull/16083 PR: https://git.openjdk.org/jdk/pull/16083 From hgreule at openjdk.org Fri Oct 13 19:08:20 2023 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 13 Oct 2023 19:08:20 GMT Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 In-Reply-To: References: Message-ID: On Thu, 12 Oct 2023 18:47:18 GMT, Alex Menkov wrote: > We have test library to parse hprof files in test/lib/jdk/test/lib/hprof You can look at test/hotspot/jtreg/serviceability/jvmti/vthread/HeapDump/VThreadInHeapDump.java 
as an example of a test which generates heap dump for target application and verifies it contains expected data. Thanks, I added a test case that ensures that the instance fields are all present. This is a very basic test, but it covers super types and also makes sure the order of supertypes is correct. If you want me to add something, please let me know. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16083#issuecomment-1762048855 From dlong at openjdk.org Fri Oct 13 20:02:09 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 13 Oct 2023 20:02:09 GMT Subject: RFR: 8318015: Lock inflation not needed for OSR or Deopt for new locking modes [v2] In-Reply-To: References: Message-ID: On Fri, 13 Oct 2023 14:48:42 GMT, Martin Doerr wrote: >> Only LockingMode "LM_LEGACY" requires inflation before lock transfers because it is the only one which uses stack addresses in the mark word. I think we should treat the displaced header as stale data because it may be uninitialized. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Skip copying displaced header. In debug builds, it might be useful to write an illegal value into the displaced header instead of just skipping it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16165#issuecomment-1762131090 From dlong at openjdk.org Fri Oct 13 21:15:26 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 13 Oct 2023 21:15:26 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: <0sUz142mQSpOZo16aXReYJIC5DifiEeqA8XNsT1LDww=.6e36c9a4-dcd9-46c6-bd54-485af0e01349@github.com> On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. 
I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object If the locks are inflated then you won't hit the top of stack check in the fast path. Can you reproduce the StepEvent problem with C1 using -XX:TieredStopAtLevel=3? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1762210306 From mdoerr at openjdk.org Fri Oct 13 21:33:32 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 13 Oct 2023 21:33:32 GMT Subject: RFR: 8318015: Lock inflation not needed for OSR or Deopt for new locking modes [v3] In-Reply-To: References: Message-ID: > Only LockingMode "LM_LEGACY" requires inflation before lock transfers because it is the only one which uses stack addresses in the mark word. I think we should treat the displaced header as stale data because it may be uninitialized. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Write values into displaced header slots in debug builds. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16165/files - new: https://git.openjdk.org/jdk/pull/16165/files/fb112413..3f886a86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16165&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16165&range=01-02 Stats: 10 lines in 2 files changed: 10 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16165.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16165/head:pull/16165 PR: https://git.openjdk.org/jdk/pull/16165 From mdoerr at openjdk.org Fri Oct 13 21:33:41 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 13 Oct 2023 21:33:41 GMT Subject: RFR: 8318015: Lock inflation not needed for OSR or Deopt for new locking modes [v2] In-Reply-To: References: Message-ID: On Fri, 13 Oct 2023 14:48:42 GMT, Martin Doerr wrote: >> Only LockingMode "LM_LEGACY" requires inflation before lock transfers because it is the only one which uses stack addresses in the mark word. I think we should treat the displaced header as stale data because it may be uninitialized. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Skip copying displaced header. That may help debugging. I've added illegal values: 0x 05A looks a bit like OSR and 0x DE0BD sounds a bit like deopt :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16165#issuecomment-1762181428 From dlong at openjdk.org Fri Oct 13 23:36:00 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 13 Oct 2023 23:36:00 GMT Subject: RFR: 8318015: Lock inflation not needed for OSR or Deopt for new locking modes [v3] In-Reply-To: References: Message-ID: On Fri, 13 Oct 2023 21:33:32 GMT, Martin Doerr wrote: >> Only LockingMode "LM_LEGACY" requires inflation before lock transfers because it is the only one which uses stack addresses in the mark word. I think we should treat the displaced header as stale data because it may be uninitialized. 
> > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Write values into displaced header slots in debug builds. Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16165#pullrequestreview-1677531085 From vlivanov at openjdk.org Fri Oct 13 23:50:51 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 13 Oct 2023 23:50:51 GMT Subject: RFR: 8318015: Lock inflation not needed for OSR or Deopt for new locking modes [v3] In-Reply-To: References: Message-ID: On Fri, 13 Oct 2023 21:33:32 GMT, Martin Doerr wrote: >> Only LockingMode "LM_LEGACY" requires inflation before lock transfers because it is the only one which uses stack addresses in the mark word. I think we should treat the displaced header as stale data because it may be uninitialized. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Write values into displaced header slots in debug builds. src/hotspot/share/runtime/basicLock.cpp line 88: > 86: #ifdef ASSERT > 87: else { > 88: dest->set_displaced_header(markWord(0xde0bd000)); // eye-catcher Such constants for debugging purposes are usually defined in a central place: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/globalDefinitions.hpp#L1023 It makes sense to follow that practice here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16165#discussion_r1358961376 From amenkov at openjdk.org Sat Oct 14 01:20:12 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Sat, 14 Oct 2023 01:20:12 GMT Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 In-Reply-To: References: Message-ID: On Fri, 13 Oct 2023 19:02:37 GMT, Hannes Greule wrote: > Thanks, I added a test case that ensures that the instance fields are all present. 
This is a very basic test, but it covers super types and also makes sure the order of supertypes is correct. If you want me to add something, please let me know. Could you add testcases for corner cases: no fields: interface I1 { } class NoFields1 { } class NoFields2 extends NoFields1 implements I1 { } no parent fields: class NoParentFields extends NoFields1 implements I1 { int i1 = 1; int i2 = 2; } only parent fields: class Parent1 { int i3 = 3; } class OnlyParentFields extends Parent1 { } ------------- PR Comment: https://git.openjdk.org/jdk/pull/16083#issuecomment-1762460816 From amenkov at openjdk.org Sat Oct 14 01:20:14 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Sat, 14 Oct 2023 01:20:14 GMT Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 [v2] In-Reply-To: References: Message-ID: On Fri, 13 Oct 2023 19:08:19 GMT, Hannes Greule wrote: >> See the bug description for more information. >> >> This implementation brings down the time to take a heap dump on the example application in the bug report to <2 seconds on my machine. > > Hannes Greule has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - whitespace > - add test to verify all instance fields are present in heap dump > - Merge remote-tracking branch 'upstream/master' into perf/fieldstream > - whitespaces > - Iterate fields forwards on thread dump test/hotspot/jtreg/serviceability/HeapDump/FieldsInInstanceTest.java line 124: > 122: > 123: Iterable<JavaHeapObject> objects = snapshot.getThings()::asIterator; > 124: for (JavaHeapObject heapObj : objects) { Instead of iterating over all dumped objects, it would be faster to use: ` (JavaObject)snapshot.findClass(className).getInstances(false).nextElement(); ` test/hotspot/jtreg/serviceability/HeapDump/FieldsInInstanceTest.java line 126: > 124: for (JavaHeapObject heapObj : objects) { > 125: if (heapObj instanceof JavaObject javaObj) { > 126: if (javaObj.getClazz().getName().endsWith("$B")) { Suggestion: if (javaObj.getClazz().getName().equals(FieldsInInstanceTarg.B.class.getName())) { test/hotspot/jtreg/serviceability/HeapDump/FieldsInInstanceTest.java line 139: > 137: Asserts.assertTrue(asString.contains("3"), "value for field A.a not found"); > 138: Asserts.assertTrue(asString.contains("Field"), "value for field A.s not found"); > 139: System.out.println(fields); Suggestion: log(fields); And I think it makes sense to print field values before asserts (so they appear in the log if some assertion throws an exception) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16083#discussion_r1359022900 PR Review Comment: https://git.openjdk.org/jdk/pull/16083#discussion_r1359000757 PR Review Comment: https://git.openjdk.org/jdk/pull/16083#discussion_r1359005734 From mdoerr at openjdk.org Sat Oct 14 09:46:35 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 14 Oct 2023 09:46:35 GMT Subject: RFR: 8318015: Lock inflation not needed for OSR or Deopt for new locking modes [v4] In-Reply-To: References: Message-ID: > Only LockingMode "LM_LEGACY" requires inflation before lock transfers
because it is the only one which uses stack addresses in the mark word. I think we should treat the displaced header as stale data because it may be uninitialized. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Move constants to globalDefinitions.hpp. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16165/files - new: https://git.openjdk.org/jdk/pull/16165/files/3f886a86..5f791772 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16165&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16165&range=02-03 Stats: 4 lines in 3 files changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16165.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16165/head:pull/16165 PR: https://git.openjdk.org/jdk/pull/16165 From mdoerr at openjdk.org Sat Oct 14 09:46:37 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 14 Oct 2023 09:46:37 GMT Subject: RFR: 8318015: Lock inflation not needed for OSR or Deopt for new locking modes [v3] In-Reply-To: References: Message-ID: On Fri, 13 Oct 2023 23:47:43 GMT, Vladimir Ivanov wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Write values into displaced header slots in debug builds. > > src/hotspot/share/runtime/basicLock.cpp line 88: > >> 86: #ifdef ASSERT >> 87: else { >> 88: dest->set_displaced_header(markWord(0xde0bd000)); // eye-catcher > > Such constants for debugging purposes are usually defined in a central place: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/globalDefinitions.hpp#L1023 > > It makes sense to follow that practice here. Done. Thanks! We may remove the displaced header slots and all related code when removing LM_LEGACY. 
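The debug-build eye-catcher discussed in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: `BasicLockSketch`, `LockingModeSketch`, `move_to_sketch` and `kBadDisplacedHeaderDeopt` are made-up names, and a `debug_build` flag stands in for HotSpot's `#ifdef ASSERT`. Only the value `0xde0bd000` is taken from the quoted review snippet.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; these are not HotSpot's types or names.
enum class LockingModeSketch { Legacy, Other };

struct BasicLockSketch {
  std::uintptr_t displaced_header = 0;
};

// One named constant in a central place, as the review suggests. The value is
// the one quoted from the basicLock.cpp review snippet; the name is invented.
constexpr std::uintptr_t kBadDisplacedHeaderDeopt = 0xde0bd000;

// Transfer a lock record. Only the legacy mode carries a meaningful displaced
// header. In other modes the copy is skipped, and a debug build writes a
// recognizable value so that an accidental read of the slot stands out.
inline void move_to_sketch(const BasicLockSketch& src, BasicLockSketch& dest,
                           LockingModeSketch mode, bool debug_build) {
  if (mode == LockingModeSketch::Legacy) {
    dest.displaced_header = src.displaced_header;
  } else if (debug_build) {
    dest.displaced_header = kBadDisplacedHeaderDeopt;
  }
}
```

Keeping the constant in one shared header means a value seen in a crash dump can be traced back to a single definition.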
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16165#discussion_r1359288837 From duke at openjdk.org Sat Oct 14 09:58:40 2023 From: duke at openjdk.org (Francesco Nigro) Date: Sat, 14 Oct 2023 09:58:40 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v10] In-Reply-To: References: Message-ID: On Fri, 13 Oct 2023 03:43:26 GMT, nahidasu wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Touchup benchmark metadata > > Hello, I'm Nahida from **Derek's** team. We've conducted extensive testing on the patch using the specified test cases for both Shipilev and Mulugeta. We observed a similar trend across both benchmarks: as the value of SecondarySuperMissBackoff increases, the average time decreases. We ran both our Mulugeta test and the JMH test supplied in the patch on a larger machine with thread counts of 18, 60, and 240. For Mulugeta's test case, we explored three different scenarios: > > - Equal Distribution (50% each): This represents the worst-case scenario. > - Exclusive Interface Calls (0%): Signifying the best-case scenario. > - 95% Same Interface, 5% Other Interface: Where 95% of the time is spent calling the same interface and 5% of the time on the other one. > > Here, sharing **Derek's** perspective on this data: > > "We see that it takes very little contention (5%) for the default behavior to perform poorly, and in the uncontended case there is no downside for using a large Backoff value. So backoff values of 1,000 or even 10,000 seem reasonable. This makes sense, because in the perfect world for the secondary supercache there is no update to the secondary supercache. Note in a previous version of the Mulugeta benchmark we tried increasing the length of the interface array being searched, and it made little performance impact until the interface depth got silly (100+).
HW prefetch, OOO cores etc can chew through an array search pretty well once they get started." > [JDK-8180450_Secondary-super-cache_8316180-patch.xlsx](https://github.com/openjdk/jdk/files/12889239/JDK-8180450_Secondary-super-cache_8316180-patch.xlsx) > > > I've attached the Excel file for your reference. If you require any additional information or specific details, please feel free to let me know. @nahidasu Hi, I have quickly looked at the data of the benchmark from Mulugeta and I see it stops at the C1 compilation level. I believe we should have a benchmark where C2 kicks in too, because it likely won't have the amortization due to the stub call, hence increasing the contention (and consequently we could draw different conclusions or get another data point on the effectiveness of the backoff values). The downside of using C2 is instead due to a nice improvement made by @rwestrel, i.e. https://github.com/openjdk/jdk/pull/14375, which can remove (I haven't checked the Mulugeta benchmarking code yet) some of the checks thanks to a bimorphic guard check; meaning we should pollute the type profile until we are sure C2 fully performs the type check, and keep on using the last stable bet. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1762782206 From mdoerr at openjdk.org Sat Oct 14 10:04:42 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 14 Oct 2023 10:04:42 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: <8F5VFYFxV4EJOul45FzuQNAX568j3tTh3HTFALjr-cY=.9dfd6bed-9c92-431b-87f2-e82afcaa4da8@github.com> On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms).
> > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object I guess that in the error scenario, the locks got async deflated before we run into the issue (async deflation not switched off by -XX:AsyncDeflationInterval=0). Could be that they stay inflated when I use that switch and the unlock order is not checked. No, I can't reproduce the issue with -XX:TieredStopAtLevel=3 (or 1). ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1762783344 From hgreule at openjdk.org Sat Oct 14 20:12:25 2023 From: hgreule at openjdk.org (Hannes Greule) Date: Sat, 14 Oct 2023 20:12:25 GMT Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 [v3] In-Reply-To: References: Message-ID: > See the bug description for more information. > > This implementation brings down the time to take a heap dump on the example application in the bug report to <2 seconds on my machine. 
Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: add more tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16083/files - new: https://git.openjdk.org/jdk/pull/16083/files/63887422..26f5dcdd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16083&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16083&range=01-02 Stats: 65 lines in 1 file changed: 41 ins; 4 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/16083.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16083/head:pull/16083 PR: https://git.openjdk.org/jdk/pull/16083 From hgreule at openjdk.org Sat Oct 14 20:17:31 2023 From: hgreule at openjdk.org (Hannes Greule) Date: Sat, 14 Oct 2023 20:17:31 GMT Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 [v4] In-Reply-To: References: Message-ID: > See the bug description for more information. > > This implementation brings down the time to take a heap dump on the example application in the bug report to <2 seconds on my machine. 
Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: reword -> initial klass ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16083/files - new: https://git.openjdk.org/jdk/pull/16083/files/26f5dcdd..f303a227 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16083&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16083&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16083.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16083/head:pull/16083 PR: https://git.openjdk.org/jdk/pull/16083 From hgreule at openjdk.org Sat Oct 14 20:17:32 2023 From: hgreule at openjdk.org (Hannes Greule) Date: Sat, 14 Oct 2023 20:17:32 GMT Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 In-Reply-To: References: Message-ID: On Sat, 14 Oct 2023 01:16:26 GMT, Alex Menkov wrote: > > Thanks, I added a test case that ensures that the instance fields are all present. This is a very basic test, but it covers super types and also makes sure the order of supertypes is correct. If you want me to add something, please let me know. > > Could you add testcases for corner cases: no fields: interface I1 { } class NoFields1 { } class NoFields2 extends NoFields1 implements I1 { } > > no parent fields: class NoParentFields extends NoFields1 implements I1 { int i1 = 1; int i2 = 2; } > > only parent fields: class Parent1 { int i3 = 3; } class OnlyParentFields extends Parent1 { } Done. I also added a test case where a class in the "middle" of the hierarchy has no fields.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16083#issuecomment-1763167022 From dholmes at openjdk.org Sun Oct 15 23:17:47 2023 From: dholmes at openjdk.org (David Holmes) Date: Sun, 15 Oct 2023 23:17:47 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: Message-ID: On Mon, 9 Oct 2023 18:38:24 GMT, Patricio Chilano Mateo wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - add comment to tests >> - use driver + @requires vm.flagless > > @dholmes-ora are you okay with the last version? @pchilano sorry you were waiting for me. I'm not familiar enough with the Aarch64 code to Review it. My comments were just in passing on the shared code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15972#issuecomment-1763532866 From kbarrett at openjdk.org Mon Oct 16 01:29:32 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 16 Oct 2023 01:29:32 GMT Subject: RFR: 8314258: checked_cast doesn't properly check some cases Message-ID: Please review this improvement to the `checked_cast` utility. checked_cast was added by JDK-8255544 to permit silencing of certain compiler warnings (such as from gcc's -Wconversion) for narrowing conversions when the value is "known" to be safely convertible. It provides debug-only runtime verification that the conversion preserves the value while changing the type. There has been a recent effort to apply checked_cast to eliminate -Wconversion warnings, with the eventual goal of turning on such warnings by default - see JDK-8135181. The existing implementation checks that the value is unchanged by a round-trip conversion, and has no restrictions on the arguments. There are several problems with this. 
(1) There are some cases where conversion of an integral value to a different integral type may pass the check, even though the value isn't in the range of the destination type. (2) Floating point to integral conversions are often intended to discard the fractional part. But that won't pass the round-trip conversion test, making checked_cast mostly useless for such conversions. (3) Integral to floating point conversions are often intended to be indifferent to loss of precision. But again, that won't pass the round-trip conversion test, making checked_cast mostly useless for such conversions. This change to checked_cast supports integral to integral conversions, but not conversions involving floating point types. The intent is that we'll use "-Wconversion -Wno-float-conversion" instead of -Wconversion alone. If/when we later want to enable -Wfloat-conversion, we can either extend checked_cast for that purpose, or probably better, add new functions tailored for the various use-cases. It also supports enum to integral conversions, mostly for compatibility with old code that uses class-scoped enums instead of class-scoped static const integral members, to work around ancient broken compilers. We still have a lot of such code. This new checked_cast ensures (in debugging builds) that the value being converted is in the range of the destination type. It does so while avoiding tautological comparisons, as some versions of some compilers may warn about such. Note that this means it can also be used to suppress -Wsign-conversion warnings (which are not included in -Wconversion when compiling C++), which we might explore enabling in the future. It also verifies a runtime check is needed, producing a compile-time error if not. Unnecessary checked_casts make the code harder to understand. This will alert a developer that a change is rendering a checked_cast unnecessary, so it can be removed. 
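To make problem (1) concrete, here is a minimal sketch comparing a round-trip check with a range check for integral conversions. It is not the implementation from this PR: `round_trip_ok` and `in_range` are hypothetical helper names, and the sketch ignores the tautology detection and enum support described above. It also assumes the usual two's-complement wrap-around for out-of-range conversions to signed types, which is implementation-defined before C++20.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Round-trip check in the style described for the old checked_cast:
// convert to the destination type and back, then compare with the input.
template <typename To, typename From>
constexpr bool round_trip_ok(From from) {
  return static_cast<From>(static_cast<To>(from)) == from;
}

// Range check: is the mathematical value of `from` representable in To?
// Negative values need a signed destination and are compared as intmax_t;
// non-negative values are compared as uintmax_t against To's maximum.
template <typename To, typename From>
constexpr bool in_range(From from) {
  static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                "integral types only");
  return (std::is_signed<From>::value && from < From(0))
      ? (std::is_signed<To>::value &&
         static_cast<std::intmax_t>(from) >=
             static_cast<std::intmax_t>(std::numeric_limits<To>::min()))
      : (static_cast<std::uintmax_t>(from) <=
         static_cast<std::uintmax_t>(std::numeric_limits<To>::max()));
}
```

With these helpers, `round_trip_ok<unsigned int>(-1)` accepts the conversion because the value survives the trip back to `int`, while `in_range<unsigned int>(-1)` rejects it because -1 is not a value of the destination type.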
This compile-time check can be suppressed on a per-call basis, as there are cases where a runtime check might only sometimes be needed. Aside: Using C++17 if-constexpr would eliminate the metaprogramming needed to implement the tautological comparison avoidance and the unnecessary checked_cast error. The resulting implementation would be less than 1/3 the size from this proposal, and easier to understand. Maybe later... This change removes a small number of unnecessary checked_casts, found by the check for such. It also suppresses that check in a few call sites where that's needed. Finally, it removes a call involving floating point. Testing: New gtests for checked_cast. mach5 tier1-5 OpenJDK GHA Sanity Checks ------------- Commit messages: - remove float checked_cast - suppress tautological conversion failures - improved checked_cast - remove unneeded checked_casts Changes: https://git.openjdk.org/jdk/pull/16005/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16005&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314258 Stats: 397 lines in 10 files changed: 374 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/16005.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16005/head:pull/16005 PR: https://git.openjdk.org/jdk/pull/16005 From dlong at openjdk.org Mon Oct 16 01:29:38 2023 From: dlong at openjdk.org (Dean Long) Date: Mon, 16 Oct 2023 01:29:38 GMT Subject: RFR: 8314258: checked_cast doesn't properly check some cases In-Reply-To: References: Message-ID: On Mon, 2 Oct 2023 03:10:29 GMT, Kim Barrett wrote: > Please review this improvement to the `checked_cast` utility. > > checked_cast was added by JDK-8255544 to permit silencing of certain compiler > warnings (such as from gcc's -Wconversion) for narrowing conversions when the > value is "known" to be safely convertible. It provides debug-only runtime > verification that the conversion preserves the value while changing the type. 
> > There has been a recent effort to apply checked_cast to eliminate -Wconversion > warnings, with the eventual goal of turning on such warnings by default - see > JDK-8135181. > > The existing implementation checks that the value is unchanged by a round-trip > conversion, and has no restrictions on the arguments. There are several > problems with this. > > (1) There are some cases where conversion of an integral value to a different > integral type may pass the check, even though the value isn't in the range of > the destination type. > > (2) Floating point to integral conversions are often intended to discard the > fractional part. But that won't pass the round-trip conversion test, making > checked_cast mostly useless for such conversions. > > (3) Integral to floating point conversions are often intended to be > indifferent to loss of precision. But again, that won't pass the round-trip > conversion test, making checked_cast mostly useless for such conversions. > > This change to checked_cast supports integral to integral conversions, but not > conversions involving floating point types. The intent is that we'll use > "-Wconversion -Wno-float-conversion" instead of -Wconversion alone. If/when > we later want to enable -Wfloat-conversion, we can either extend checked_cast > for that purpose, or probably better, add new functions tailored for the > various use-cases. > > It also supports enum to integral conversions, mostly for compatibility with > old code that uses class-scoped enums instead of class-scoped static const > integral members, to work around ancient broken compilers. We still have a > lot of such code. > > This new checked_cast ensures (in debugging builds) that the value being > converted is in the range of the destination type. It does so while avoiding > tautological comparisons, as some versions of some compilers may warn about > such. 
Note that this means it can also be used to suppress -Wsign-conversion > warnings (which are not included in -Wconversion when compiling C++), which we > might explore enabling in the future. > > It also verifies a runtime check is needed, producing a compile-time error if > not. Unnecessary checked_cast... src/hotspot/share/compiler/oopMap.hpp line 115: > 113: stream->write_int(value()); > 114: if(is_callee_saved() || is_derived_oop()) { > 115: stream->write_int(checked_cast(content_reg()->value())); int --> juint, we need checked_cast here because of -Wsign-conversion src/hotspot/share/compiler/oopMap.hpp line 115: > 113: stream->write_int(value()); > 114: if(is_callee_saved() || is_derived_oop()) { > 115: stream->write_int(content_reg()->value()); Suggestion: stream->write_int(checked_cast<juint>(content_reg()->value())); src/hotspot/share/gc/g1/g1CodeRootSet.cpp line 192: > 190: // we would grow again quickly. > 191: const float WantedLoadFactor = 0.5; > 192: assert((current_size / WantedLoadFactor) <= SIZE_MAX, "table overflow"); Surprisingly, this might not work. See https://bugs.openjdk.org/browse/JDK-8287052. src/hotspot/share/gc/parallel/mutableNUMASpace.cpp line 712: > 710: os::page_info page_expected, page_found; > 711: page_expected.size = page_size; > 712: page_expected.lgrp_id = lgrp_id(); Suggestion: page_expected.lgrp_id = checked_cast<int>(lgrp_id()); uint --> int Or maybe fix the types to be the same. src/hotspot/share/utilities/align.hpp line 75: > 73: template::value)> > 74: constexpr T align_up(T size, A alignment) { > 75: T adjusted = checked_cast<T>(size + alignment_mask(alignment)); I guess the checked_cast is needed for sizeof(T) < sizeof(int), but there is still a possible signed overflow UB here. src/hotspot/share/utilities/checkedCast.hpp line 55: > 53: static constexpr bool check_range(From from) { > 54: To to_max = std::numeric_limits<To>::max(); > 55: return from <= static_cast<From>(to_max); Don't we need to check that From is big enough to contain to_max?
Are we allowed to use checked_cast for signed to unsigned widening? checked_cast // fail if source is negative src/hotspot/share/utilities/checkedCast.hpp line 100: > 98: static_assert(permit_tautology || is_narrowing, "tautological checked_cast"); > 99: constexpr bool operator()(From from) const { > 100: return !is_narrowing || (static_cast(static_cast(from)) == from); OK, so we are allowing compiler-specific behavior now? Does that mean we can get rid of the weird-looking `reinterpret_cast(ures)` in `JAVA_INTEGER_OP`? src/hotspot/share/utilities/checkedCast.hpp line 139: > 137: return (from >= 0); > 138: } > 139: }; So instead of checked_cast or checked_cast, callers should check from < 0 and then do a static_cast<>? It might be better to allow checked_cast to support non-narrowed signed to unsigned, because I think the alternative is more error-prone. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16005#discussion_r1359606831 PR Review Comment: https://git.openjdk.org/jdk/pull/16005#discussion_r1359617905 PR Review Comment: https://git.openjdk.org/jdk/pull/16005#discussion_r1359626718 PR Review Comment: https://git.openjdk.org/jdk/pull/16005#discussion_r1359617358 PR Review Comment: https://git.openjdk.org/jdk/pull/16005#discussion_r1359650061 PR Review Comment: https://git.openjdk.org/jdk/pull/16005#discussion_r1359652902 PR Review Comment: https://git.openjdk.org/jdk/pull/16005#discussion_r1359656238 PR Review Comment: https://git.openjdk.org/jdk/pull/16005#discussion_r1359658456 From kbarrett at openjdk.org Mon Oct 16 01:44:17 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 16 Oct 2023 01:44:17 GMT Subject: RFR: 8314258: checked_cast doesn't properly check some cases In-Reply-To: References: Message-ID: On Sat, 14 Oct 2023 19:28:20 GMT, Dean Long wrote: >> Please review this improvement to the `checked_cast` utility. 
>> >> checked_cast was added by JDK-8255544 to permit silencing of certain compiler >> warnings (such as from gcc's -Wconversion) for narrowing conversions when the >> value is "known" to be safely convertible. It provides debug-only runtime >> verification that the conversion preserves the value while changing the type. >> >> There has been a recent effort to apply checked_cast to eliminate -Wconversion >> warnings, with the eventual goal of turning on such warnings by default - see >> JDK-8135181. >> >> The existing implementation checks that the value is unchanged by a round-trip >> conversion, and has no restrictions on the arguments. There are several >> problems with this. >> >> (1) There are some cases where conversion of an integral value to a different >> integral type may pass the check, even though the value isn't in the range of >> the destination type. >> >> (2) Floating point to integral conversions are often intended to discard the >> fractional part. But that won't pass the round-trip conversion test, making >> checked_cast mostly useless for such conversions. >> >> (3) Integral to floating point conversions are often intended to be >> indifferent to loss of precision. But again, that won't pass the round-trip >> conversion test, making checked_cast mostly useless for such conversions. >> >> This change to checked_cast supports integral to integral conversions, but not >> conversions involving floating point types. The intent is that we'll use >> "-Wconversion -Wno-float-conversion" instead of -Wconversion alone. If/when >> we later want to enable -Wfloat-conversion, we can either extend checked_cast >> for that purpose, or probably better, add new functions tailored for the >> various use-cases. >> >> It also supports enum to integral conversions, mostly for compatibility with >> old code that uses class-scoped enums instead of class-scoped static const >> integral members, to work around ancient broken compilers. 
We still have a >> lot of such code. >> >> This new checked_cast ensures (in debugging builds) that the value being >> converted is in the range of the destination type. It does so while avoiding >> tautological comparisons, as some versions of some compilers may warn about >> such. Note that this means it can also be used to suppress -Wsign-conversion >> warnings (which are not included in -Wconversion when compiling C++), which we >> might explore enabling in the future. >> >> It also verifi... > > src/hotspot/share/gc/parallel/mutableNUMASpace.cpp line 712: > >> 710: os::page_info page_expected, page_found; >> 711: page_expected.size = page_size; >> 712: page_expected.lgrp_id = lgrp_id(); > > Suggestion: > > page_expected.lgrp_id = checked_cast(lgrp_id()); > > uint --> int > Or maybe fix the types to be the same. It seems like there are some on-going changes occurring in this area. `lgrp_id()` was recently changed from returning `int` (so why did we have a checked_cast to uint at all?) to uint, rendering the existing checked_cast unnecessary. As you note, we'll now need to do something about the mismatch between the function's result type and the page_info member type. I'm going to leave that to folks working on getting `-Wsign-conversion` working, whether it's by adding a new checked_cast here or changing the types to be the same. I think that kind of change is out of scope for this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16005#discussion_r1360011619 From kbarrett at openjdk.org Mon Oct 16 01:51:19 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 16 Oct 2023 01:51:19 GMT Subject: RFR: 8314258: checked_cast doesn't properly check some cases In-Reply-To: References: Message-ID: On Sat, 14 Oct 2023 19:22:25 GMT, Dean Long wrote: >> Please review this improvement to the `checked_cast` utility. 
>> >> checked_cast was added by JDK-8255544 to permit silencing of certain compiler >> warnings (such as from gcc's -Wconversion) for narrowing conversions when the >> value is "known" to be safely convertible. It provides debug-only runtime >> verification that the conversion preserves the value while changing the type. >> >> There has been a recent effort to apply checked_cast to eliminate -Wconversion >> warnings, with the eventual goal of turning on such warnings by default - see >> JDK-8135181. >> >> The existing implementation checks that the value is unchanged by a round-trip >> conversion, and has no restrictions on the arguments. There are several >> problems with this. >> >> (1) There are some cases where conversion of an integral value to a different >> integral type may pass the check, even though the value isn't in the range of >> the destination type. >> >> (2) Floating point to integral conversions are often intended to discard the >> fractional part. But that won't pass the round-trip conversion test, making >> checked_cast mostly useless for such conversions. >> >> (3) Integral to floating point conversions are often intended to be >> indifferent to loss of precision. But again, that won't pass the round-trip >> conversion test, making checked_cast mostly useless for such conversions. >> >> This change to checked_cast supports integral to integral conversions, but not >> conversions involving floating point types. The intent is that we'll use >> "-Wconversion -Wno-float-conversion" instead of -Wconversion alone. If/when >> we later want to enable -Wfloat-conversion, we can either extend checked_cast >> for that purpose, or probably better, add new functions tailored for the >> various use-cases. >> >> It also supports enum to integral conversions, mostly for compatibility with >> old code that uses class-scoped enums instead of class-scoped static const >> integral members, to work around ancient broken compilers. 
We still have a >> lot of such code. >> >> This new checked_cast ensures (in debugging builds) that the value being >> converted is in the range of the destination type. It does so while avoiding >> tautological comparisons, as some versions of some compilers may warn about >> such. Note that this means it can also be used to suppress -Wsign-conversion >> warnings (which are not included in -Wconversion when compiling C++), which we >> might explore enabling in the future. >> >> It also verifi... > > src/hotspot/share/compiler/oopMap.hpp line 115: > >> 113: stream->write_int(value()); >> 114: if(is_callee_saved() || is_derived_oop()) { >> 115: stream->write_int(checked_cast(content_reg()->value())); > > int --> juint, we need checked_cast here because of -Wsign-conversion The result of `content_reg()->value()` is of type `int`, making this checked_cast unnecessary. Dealing with possibly missing checked_casts to suppress warnings is out of scope for this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16005#discussion_r1360014254 From kbarrett at openjdk.org Mon Oct 16 02:08:36 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 16 Oct 2023 02:08:36 GMT Subject: RFR: 8314258: checked_cast doesn't properly check some cases In-Reply-To: References: Message-ID: On Sat, 14 Oct 2023 19:29:44 GMT, Dean Long wrote: >> Please review this improvement to the `checked_cast` utility. >> >> checked_cast was added by JDK-8255544 to permit silencing of certain compiler >> warnings (such as from gcc's -Wconversion) for narrowing conversions when the >> value is "known" to be safely convertible. It provides debug-only runtime >> verification that the conversion preserves the value while changing the type. >> >> There has been a recent effort to apply checked_cast to eliminate -Wconversion >> warnings, with the eventual goal of turning on such warnings by default - see >> JDK-8135181. 
>> >> The existing implementation checks that the value is unchanged by a round-trip >> conversion, and has no restrictions on the arguments. There are several >> problems with this. >> >> (1) There are some cases where conversion of an integral value to a different >> integral type may pass the check, even though the value isn't in the range of >> the destination type. >> >> (2) Floating point to integral conversions are often intended to discard the >> fractional part. But that won't pass the round-trip conversion test, making >> checked_cast mostly useless for such conversions. >> >> (3) Integral to floating point conversions are often intended to be >> indifferent to loss of precision. But again, that won't pass the round-trip >> conversion test, making checked_cast mostly useless for such conversions. >> >> This change to checked_cast supports integral to integral conversions, but not >> conversions involving floating point types. The intent is that we'll use >> "-Wconversion -Wno-float-conversion" instead of -Wconversion alone. If/when >> we later want to enable -Wfloat-conversion, we can either extend checked_cast >> for that purpose, or probably better, add new functions tailored for the >> various use-cases. >> >> It also supports enum to integral conversions, mostly for compatibility with >> old code that uses class-scoped enums instead of class-scoped static const >> integral members, to work around ancient broken compilers. We still have a >> lot of such code. >> >> This new checked_cast ensures (in debugging builds) that the value being >> converted is in the range of the destination type. It does so while avoiding >> tautological comparisons, as some versions of some compilers may warn about >> such. Note that this means it can also be used to suppress -Wsign-conversion >> warnings (which are not included in -Wconversion when compiling C++), which we >> might explore enabling in the future. >> >> It also verifi... 
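To make problems (1) and (2) from the quoted description concrete, here is a minimal C++14 sketch. It is not the code from the PR: `round_trip_ok` and `in_range` are hypothetical helper names introduced only for this illustration. `round_trip_ok` mimics the old round-trip test, and `in_range` is one possible way to ask whether an integral value fits the destination type. The first assertion assumes a mainstream compiler where converting an out-of-range unsigned value back to `int` wraps (implementation-defined before C++20).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: the old-style check, i.e. convert there and back
// and compare with the original value.
template <typename To, typename From>
constexpr bool round_trip_ok(From from) {
  return static_cast<From>(static_cast<To>(from)) == from;
}

// Hypothetical helper: does an integral value lie in the range of To?
template <typename To, typename From>
constexpr bool in_range(From from) {
  static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                "integral types only");
  if (std::is_signed<From>::value && from < From(0)) {
    // A negative value can only fit a signed destination.
    return std::is_signed<To>::value &&
           static_cast<std::intmax_t>(from) >=
               static_cast<std::intmax_t>(std::numeric_limits<To>::min());
  }
  // Non-negative value: compare against the destination maximum.
  return static_cast<std::uintmax_t>(from) <=
         static_cast<std::uintmax_t>(std::numeric_limits<To>::max());
}

// Problem (1): -1 survives int -> unsigned -> int, yet is not an unsigned value.
static_assert(round_trip_ok<unsigned>(-1), "round trip accepts -1");
static_assert(!in_range<unsigned>(-1), "range check rejects -1");

// Problem (2): truncating 2.5 to int is usually intended, but the round trip fails.
static_assert(!round_trip_ok<int>(2.5), "round trip rejects 2.5");
```

The range check is written against `intmax_t`/`uintmax_t` so that mixed signed and unsigned comparisons never rely on the usual arithmetic conversions.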
> > src/hotspot/share/compiler/oopMap.hpp line 115: > >> 113: stream->write_int(value()); >> 114: if(is_callee_saved() || is_derived_oop()) { >> 115: stream->write_int(content_reg()->value()); > > Suggestion: > > stream->write_int(checked_cast(content_reg()->value())); Again here, just removing existing unnecessary cast. Dealing with any further mismatch is out of scope for this PR. > src/hotspot/share/gc/g1/g1CodeRootSet.cpp line 192: > >> 190: // we would grow again quickly. >> 191: const float WantedLoadFactor = 0.5; >> 192: assert((current_size / WantedLoadFactor) <= SIZE_MAX, "table overflow"); > > Surprisingly, this might not work. See https://bugs.openjdk.org/browse/JDK-8287052. It looks like for clang we should use -Wimplicit-int-conversion instead of gcc's "-Wconversion -Wno-float-conversion". clang seems to have a much richer set of warning controls in this area than does gcc. That way we don't implicitly get -Wimplicit-int-float-conversion, which is what is triggering the warning mentioned in JDK-8287052. In this case the loss of precision leading to that warning does not seem important. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16005#discussion_r1360021515 PR Review Comment: https://git.openjdk.org/jdk/pull/16005#discussion_r1360020983 From duke at openjdk.org Mon Oct 16 02:15:32 2023 From: duke at openjdk.org (xpbob) Date: Mon, 16 Oct 2023 02:15:32 GMT Subject: Withdrawn: 8318058: Notify the jvm when the direct memory is oom In-Reply-To: References: Message-ID: <0FMEIyZwzX0DoJNG8hnJpN0ey6Ox_SjPLK0gJ1COJ3A=.21bbcd0a-ded3-4a66-b13e-c190f094b3ef@github.com> On Fri, 13 Oct 2023 03:23:04 GMT, xpbob wrote: > Big data processes often experience situations where the direct memory oom process is alive but not serving properly. 
If direct memory is exhausted (OOM), code could notify the JVM. This would bring the following benefits: > 1. Analyzing direct memory held by java.nio.DirectByteBuffer requires the reference relations from a heap dump, so HeapDumpOnOutOfMemoryError could be used directly. > 2. In a container environment, ExitOnOutOfMemoryError can be used to make a process that can no longer provide service exit, so that the container can quickly bring up a new pod. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/16176 From kbarrett at openjdk.org Mon Oct 16 02:45:55 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 16 Oct 2023 02:45:55 GMT Subject: RFR: 8314258: checked_cast doesn't properly check some cases In-Reply-To: References: Message-ID: On Sat, 14 Oct 2023 21:29:21 GMT, Dean Long wrote: >> Please review this improvement to the `checked_cast` utility. >> >> checked_cast was added by JDK-8255544 to permit silencing of certain compiler >> warnings (such as from gcc's -Wconversion) for narrowing conversions when the >> value is "known" to be safely convertible. It provides debug-only runtime >> verification that the conversion preserves the value while changing the type. >> >> There has been a recent effort to apply checked_cast to eliminate -Wconversion >> warnings, with the eventual goal of turning on such warnings by default - see >> JDK-8135181. >> >> The existing implementation checks that the value is unchanged by a round-trip >> conversion, and has no restrictions on the arguments. There are several >> problems with this. >> >> (1) There are some cases where conversion of an integral value to a different >> integral type may pass the check, even though the value isn't in the range of >> the destination type. >> >> (2) Floating point to integral conversions are often intended to discard the >> fractional part. But that won't pass the round-trip conversion test, making >> checked_cast mostly useless for such conversions. 
>> >> (3) Integral to floating point conversions are often intended to be >> indifferent to loss of precision. But again, that won't pass the round-trip >> conversion test, making checked_cast mostly useless for such conversions. >> >> This change to checked_cast supports integral to integral conversions, but not >> conversions involving floating point types. The intent is that we'll use >> "-Wconversion -Wno-float-conversion" instead of -Wconversion alone. If/when >> we later want to enable -Wfloat-conversion, we can either extend checked_cast >> for that purpose, or probably better, add new functions tailored for the >> various use-cases. >> >> It also supports enum to integral conversions, mostly for compatibility with >> old code that uses class-scoped enums instead of class-scoped static const >> integral members, to work around ancient broken compilers. We still have a >> lot of such code. >> >> This new checked_cast ensures (in debugging builds) that the value being >> converted is in the range of the destination type. It does so while avoiding >> tautological comparisons, as some versions of some compilers may warn about >> such. Note that this means it can also be used to suppress -Wsign-conversion >> warnings (which are not included in -Wconversion when compiling C++), which we >> might explore enabling in the future. >> >> It also verifi... > > src/hotspot/share/utilities/align.hpp line 75: > >> 73: template::value)> >> 74: constexpr T align_up(T size, A alignment) { >> 75: T adjusted = checked_cast(size + alignment_mask(alignment)); > > I guess the checked_cast is needed for sizeof(T) < sizeof(int), but there is still a possible signed overflow UB here. Yeah, looks like align_up has always had that potential overflow problem. 
https://bugs.openjdk.org/browse/JDK-8318127 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16005#discussion_r1360039190 From duke at openjdk.org Mon Oct 16 03:22:04 2023 From: duke at openjdk.org (xpbob) Date: Mon, 16 Oct 2023 03:22:04 GMT Subject: RFR: 8318058: Notify the jvm when the direct memory is oom In-Reply-To: References: Message-ID: On Fri, 13 Oct 2023 05:26:37 GMT, Thomas Stuefe wrote: >> Big data processes often experience situations where the direct memory oom process is alive but not serving properly. If direct memory is exhausted (OOM), code could notify the JVM. This would bring the following benefits: >> 1. Analyzing direct memory held by java.nio.DirectByteBuffer requires the reference relations from a heap dump, so HeapDumpOnOutOfMemoryError could be used directly. >> 2. In a container environment, ExitOnOutOfMemoryError can be used to make a process that can no longer provide service exit, so that the container can quickly bring up a new pod. > > Undoubtedly useful, but there have been many discussions in the past about what does and does not constitute an OOM error, and IIRC, the stance of Oracle devs was "only if it is in java heap". Hence the missing OOM error when we cannot create threads, for instance. @tstuefe @AlanBateman Thanks for sharing this information. We look forward to other solutions to this problem in the future. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16176#issuecomment-1763663559 From kbarrett at openjdk.org Mon Oct 16 04:38:52 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 16 Oct 2023 04:38:52 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v7] In-Reply-To: References: Message-ID: > Please review this new facility, providing a general mechanism for intrusive > doubly-linked lists. A class supports inclusion in a list by having an > IntrusiveListEntry member, and providing structured information about how to > access that member. 
A class supports inclusion in multiple lists by having > multiple IntrusiveListEntry members, with different lists specified to use > different members. > > The IntrusiveList class template provides the list management. It is modelled > on bidirectional containers such as std::list and boost::intrusive::list, > providing many of the expected member types and functions. (Note that the > member types use the Standard's naming conventions.) (Not all standard > container requirements are met; some operations are not presently supported > because they haven't been needed yet.) This includes iteration support using > (mostly) standard-conforming iterator types (they are presently missing > iterator_category member types, pending being able to include so we > can use std::bidirectional_iterator_tag). > > This change only provides the new facility, and doesn't include any uses of > it. It is intended to replace the 4-5 (or maybe more) competing intrusive > doubly-linked lists presently in HotSpot. Unlike most (or perhaps all?) of > those alternatives, this proposal provides a suite of unit tests. > > An example of a place that I think might benefit from this is G1's region > handling. There are various places where G1 iterates over all regions in order > to do something with those which satisfy some property (humongous regions, > regions in the collection set, etc.). If it were trivial to create new region > sublists (and this facility makes that easy), some of these could be turned > into direct iteration over only the regions of interest. > > Some specific points to consider when reviewing this proposal: > > (1) This proposal follows Standard Library API conventions, which differ from > HotSpot in various ways. > > (1a) Lists and iterators provide various type members, with names per the > Standard Library. There has been discussion of using some parts of the > Standard Library eventually, in which case this would be important. 
But for > now some of the naming choices are atypical for HotSpot. > > (1b) Some of the function signatures follow the Standard Library APIs even > though the reasons for that form might not apply to HotSpot. For example, the > list pop operations don't return the removed value. For node-based containers > in Standard Library that would introduce exception... Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: const-element nomenclature, other review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15896/files - new: https://git.openjdk.org/jdk/pull/15896/files/4a959bd7..be191f3b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15896&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15896&range=05-06 Stats: 48 lines in 1 file changed: 9 ins; 3 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/15896.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15896/head:pull/15896 PR: https://git.openjdk.org/jdk/pull/15896 From kbarrett at openjdk.org Mon Oct 16 04:38:53 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 16 Oct 2023 04:38:53 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v3] In-Reply-To: <1Eb95C9_AH5UD6ldaBIIywbyR7hj1LG12xmEiQ7_wLM=.e85fbdf8-4f72-440e-bec7-b8bb691abe6c@github.com> References: <1Eb95C9_AH5UD6ldaBIIywbyR7hj1LG12xmEiQ7_wLM=.e85fbdf8-4f72-440e-bec7-b8bb691abe6c@github.com> Message-ID: On Mon, 9 Oct 2023 13:17:03 GMT, Daniel D. Daugherty wrote: >> Is something like this more clear? In my experience, non-intrusive collections >> with const-qualified elements are uncommon, so I found the implications here >> not immediately obvious. >> >> >> * A const iterator has a const-qualified element type, and provides const >> * access to the elements of the associated list. A non-const iterator has an >> * unqualified element type, and provides mutable element access. 
A non-const >> * iterator is implicitly convertible to a corresponding const iterator. >> * >> * A const list provides const iterators and access to const-qualified >> * elements, and cannot be used to modify the sequence of elements. Only a >> * non-const list can be used to modify the sequence of elements. >> * >> * A list can have a const-qualified element type, providing const iterators >> * and access to const-qualified elements. A const object cannot be added to >> * a list with an unqualified element type, as that wuold be an implicit >> * casting away of the const qualifier. > > Nit typo: s/wuold/would/ My suggestion isn't quite right either; the nomenclature for a list/iterator with const elements needs a new term. I've pushed another rewrite, defining the terms "const-element list/iterator" and using that terminology elsewhere. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1360093199 From fyang at openjdk.org Mon Oct 16 07:40:13 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 16 Oct 2023 07:40:13 GMT Subject: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics In-Reply-To: References: Message-ID: On Fri, 13 Oct 2023 15:36:56 GMT, Ilya Gavrilin wrote: > Hi all, please review these changes adding RISC-V floating-point copysign and signum intrinsics. > CopySign - returns the first argument with the sign of the second. On RISC-V we have the `fsgnj.x` instruction, which can implement this intrinsic. > Signum - returns the input value if it is +/- 0.0 or NaN, otherwise returns 1.0 with the sign of the input value. On RISC-V we can use `fclass.x` to classify the input value and return the appropriate value. > > Tests: > Performance tests on t-head board: > With intrinsics: >
> Benchmark (seed) Mode Cnt Score Error Units
> MathBench.copySignDouble 0 thrpt 8 34156.580 ± 76.272 ops/ms
> MathBench.copySignFloat 0 thrpt 8 34181.731 ± 38.182 ops/ms
> MathBench.signumDouble 0 thrpt 8 31977.258 ± 1122.327 ops/ms
> MathBench.signumFloat 0 thrpt 8 31836.852 ± 56.013 ops/ms
>
> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`):
>
> Benchmark (seed) Mode Cnt Score Error Units
> MathBench.copySignDouble 0 thrpt 8 31000.996 ± 943.094 ops/ms
> MathBench.copySignFloat 0 thrpt 8 30678.016 ± 28.087 ops/ms
> MathBench.signumDouble 0 thrpt 8 25435.010 ± 2047.085 ops/ms
> MathBench.signumFloat 0 thrpt 8 25257.058 ± 79.175 ops/ms
>
> Regression tests: tier1, hotspot:tier2 on risc-v board. > > Also, changed the name of one micro test: before we had `sigNumDouble` and `signumFloat` tests, which do not consistently match either `signum` or `sigNum`. Now both share the same spelling: `signum`. > Performance tests have been changed a bit to check the intrinsics' results better; diff to modify tests: > > diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java > index 6cd1353907e..0bee25366bf 100644 > --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java > +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java > @@ -143,12 +143,12 @@ public double ceilDouble() { > > @Benchmark > public double copySignDouble() { > - return Math.copySign(double81, doubleNegative12); > + return Math.copySign(double81, doubleNegative12) + Math.copySign(double81, double2) + Math.copySign(double4Dot1, doubleNegative12); > } > > @Benchmark > public float copySignFloat() { > - return Math.copySign(floatNegative99, float1); > + return Math.copySign(floatNegative99, float1) + Math.copySign(eFloat, float1) + Math.copySign... Some comments after a brief look. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1659: > 1657: // on input we have NaN or +/-0.0 value we should return it, > 1658: // otherwise return +/- 1.0 using sign of input. > 1659: // tmp1 - used to store result of fclass operation, Seems to me that scratch register `t0` could be used in this function in order to save use of `tmp1`. 
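For reference while reading the review comments, the semantics described in the PR summary (signum returns NaN and +/-0.0 unchanged, otherwise 1.0 with the sign of the input) can be sketched in C++ as follows. `signum_ref` is a hypothetical helper written for this illustration, not HotSpot code and not the RISC-V instruction sequence under review; it relies only on `std::isnan` and `std::copysign` from `<cmath>`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical reference helper mirroring the described signum behaviour.
inline double signum_ref(double x) {
  if (std::isnan(x) || x == 0.0) {
    return x;                     // NaN, +0.0 and -0.0 are returned unchanged
  }
  return std::copysign(1.0, x);   // otherwise 1.0 carrying the sign of x
}

// Hypothetical reference helper for copySign: magnitude of the first
// argument, sign of the second.
inline double copy_sign_ref(double magnitude, double sign) {
  return std::copysign(magnitude, sign);
}
```

Note that `x == 0.0` is true for both zeros, so the sign of a negative zero is preserved by returning `x` itself.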
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1661: > 1659: // tmp1 - used to store result of fclass operation, > 1660: // one - gives us a floating-point 1.0 (got from matching rule) > 1661: // bool single_precision - specififes single or double precision operations will be used. Suggestion: s/bool single_precision - specififes/bool is_double - specifies/ src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1663: > 1661: // bool single_precision - specififes single or double precision operations will be used. > 1662: void C2_MacroAssembler::signum_fp(FloatRegister dst, FloatRegister src, FloatRegister one, Register tmp1, bool is_double) { > 1663: assert_different_registers(dst, src, one); This constraint could be relaxed if we use scratch register `t0` here instead of `tmp1`. src/hotspot/cpu/riscv/riscv.ad line 7511: > 7509: // Copysign and signum intrinsics > 7510: > 7511: instruct copySignD_reg(fRegD dst, fRegD src1, fRegD src2, immD0 zero) %{ A simpler `immD zero` will do here, which is similar to the x86 counterpart. src/hotspot/cpu/riscv/riscv.ad line 7513: > 7511: instruct copySignD_reg(fRegD dst, fRegD src1, fRegD src2, immD0 zero) %{ > 7512: match(Set dst (CopySignD src1 (Binary src2 zero))); > 7513: effect(TEMP_DEF dst, USE src1, USE src2); Unnecessary effect. src/hotspot/cpu/riscv/riscv.ad line 7526: > 7524: instruct copySignF_reg(fRegF dst, fRegF src1, fRegF src2) %{ > 7525: match(Set dst (CopySignF src1 src2)); > 7526: effect(TEMP_DEF dst, USE src1, USE src2); Unnecessary effect. src/hotspot/cpu/riscv/riscv.ad line 7537: > 7535: %} > 7536: > 7537: instruct signumD_reg(fRegD dst, fRegD src, fRegD zero, fRegD one, iRegINoSp tmp1) %{ Use `immD zero` instead, which will help avoid reserving one FP register here. 
src/hotspot/cpu/riscv/riscv.ad line 7539: > 7537: instruct signumD_reg(fRegD dst, fRegD src, fRegD zero, fRegD one, iRegINoSp tmp1) %{ > 7538: match(Set dst (SignumD src (Binary zero one))); > 7539: effect(TEMP_DEF dst, USE src, USE one, TEMP tmp1); Unnecessary effect if changed to use `t0` instead of `tmp1` in `signum_fp`. src/hotspot/cpu/riscv/riscv.ad line 7548: > 7546: %} > 7547: > 7548: instruct signumF_reg(fRegF dst, fRegF src, fRegF zero, fRegF one, iRegINoSp tmp1) %{ Use `immF zero` instead, which will help avoid reserving one FP register here. src/hotspot/cpu/riscv/riscv.ad line 7550: > 7548: instruct signumF_reg(fRegF dst, fRegF src, fRegF zero, fRegF one, iRegINoSp tmp1) %{ > 7549: match(Set dst (SignumF src (Binary zero one))); > 7550: effect(TEMP_DEF dst, USE src, USE one, TEMP tmp1); Unnecessary effect if changed to use `t0` instead of `tmp1` in `signum_fp`. ------------- PR Review: https://git.openjdk.org/jdk/pull/16186#pullrequestreview-1679272664 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360216914 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360215721 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360219620 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360221019 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360230241 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360230310 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360222911 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360231462 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360223205 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360231642 From kbarrett at openjdk.org Mon Oct 16 08:01:53 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 16 Oct 2023 08:01:53 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v7] 
In-Reply-To: References: Message-ID: On Mon, 16 Oct 2023 04:38:52 GMT, Kim Barrett wrote: >> Please review this new facility, providing a general mechanism for intrusive >> doubly-linked lists. A class supports inclusion in a list by having an >> IntrusiveListEntry member, and providing structured information about how to >> access that member. A class supports inclusion in multiple lists by having >> multiple IntrusiveListEntry members, with different lists specified to use >> different members. >> >> The IntrusiveList class template provides the list management. It is modelled >> on bidirectional containers such as std::list and boost::intrusive::list, >> providing many of the expected member types and functions. (Note that the >> member types use the Standard's naming conventions.) (Not all standard >> container requirements are met; some operations are not presently supported >> because they haven't been needed yet.) This includes iteration support using >> (mostly) standard-conforming iterator types (they are presently missing >> iterator_category member types, pending being able to include so we >> can use std::bidirectional_iterator_tag). >> >> This change only provides the new facility, and doesn't include any uses of >> it. It is intended to replace the 4-5 (or maybe more) competing intrusive >> doubly-linked lists presently in HotSpot. Unlike most (or perhaps all?) of >> those alternatives, this proposal provides a suite of unit tests. >> >> An example of a place that I think might benefit from this is G1's region >> handling. There are various places where G1 iterates over all regions in order >> to do something with those which satisfy some property (humongous regions, >> regions in the collection set, etc.). If it were trivial to create new region >> sublists (and this facility makes that easy), some of these could be turned >> into direct iteration over only the regions of interest. 
>> >> Some specific points to consider when reviewing this proposal: >> >> (1) This proposal follows Standard Library API conventions, which differ from >> HotSpot in various ways. >> >> (1a) Lists and iterators provide various type members, with names per the >> Standard Library. There has been discussion of using some parts of the >> Standard Library eventually, in which case this would be important. But for >> now some of the naming choices are atypical for HotSpot. >> >> (1b) Some of the function signatures follow the Standard Library APIs even >> though the reasons for that form might not apply to HotSpot. For example, the >> list pop operations don't return the removed... > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > const-element nomenclature, other review comments src/hotspot/share/utilities/intrusiveList.hpp line 87: > 85: * * Base is the base class for the list. This is typically > 86: * used to specify the allocation class, such as CHeapObj<>. The default > 87: * is void, indicating the list is not derived from an allocation class. I'm not certain this Base class for allocation support is actually needed. I remember one of the alternatives had (or used to have?) allocation base class support, but haven't found it when I looked recently. But we have a lot of these doubly-linked-lists in HotSpot. Do we have a use-case for a "heap" allocated bare (as in not embedded in some other object) list? Removing it would save ~25 lines of code/comments. src/hotspot/share/utilities/intrusiveList.hpp line 994: > 992: const_reference operator[](size_type n) const { > 993: return nth_element(cbegin(), cend(), n); > 994: } Do we need these operator[]'s? Neither std::list nor boost::intrusive::list have such, and I don't think any of the existing intrusive lists in HotSpot have such either. Maybe there's no real use-case. Removal would save 25-30 lines. 
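For readers less familiar with the idea being reviewed, the following is a minimal C++14 sketch of an intrusive doubly-linked list in which one object sits in two lists through two embedded link members, as in the G1 region sublist example from the PR description. It is an illustration only and does not reproduce the `IntrusiveList` or `IntrusiveListEntry` API from the PR: `Entry`, `List`, `Region`, `AllRegions` and `HumongousRegions` are hypothetical names, and the back pointer in `Entry` is a simplification chosen to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical link node embedded in each element.
template <typename T>
struct Entry {
  Entry* prev = nullptr;
  Entry* next = nullptr;
  T* owner = nullptr;     // back pointer, set while the element is linked
};

// Hypothetical element type that can be in two lists at the same time.
struct Region {
  int id;
  Entry<Region> all_entry;        // link used by the "all regions" list
  Entry<Region> humongous_entry;  // link used by the "humongous" sublist
  explicit Region(int i) : id(i) {}
};

// Circular doubly-linked list with a sentinel node. The pointer-to-member
// template argument selects which embedded entry this list threads through.
template <typename T, Entry<T> T::*Member>
class List {
  Entry<T> _head;
  std::size_t _size = 0;

 public:
  List() { _head.prev = _head.next = &_head; }
  List(const List&) = delete;
  List& operator=(const List&) = delete;

  // Link value at the end of the list; no allocation is needed.
  void push_back(T& value) {
    Entry<T>* e = &(value.*Member);
    e->owner = &value;
    e->prev = _head.prev;
    e->next = &_head;
    _head.prev->next = e;
    _head.prev = e;
    ++_size;
  }

  // Unlink value in constant time; value must currently be in this list.
  void erase(T& value) {
    Entry<T>* e = &(value.*Member);
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->prev = e->next = nullptr;
    e->owner = nullptr;
    --_size;
  }

  std::size_t size() const { return _size; }

  // Visit every element in list order.
  template <typename F>
  void for_each(F f) const {
    for (const Entry<T>* e = _head.next; e != &_head; e = e->next) {
      f(*e->owner);
    }
  }
};

using AllRegions = List<Region, &Region::all_entry>;
using HumongousRegions = List<Region, &Region::humongous_entry>;
```

Because the links live inside `Region`, adding a region to a sublist or removing it again costs a few pointer writes and never allocates, which is what makes creating extra sublists cheap.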
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1360268160 PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1360270735 From thartmann at openjdk.org Mon Oct 16 10:31:06 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 16 Oct 2023 10:31:06 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges [v3] In-Reply-To: <4EB3jcooR9miV6MpOHmX_A_Zp-j-CkmBNXn9QjCC6L0=.59e315e6-9397-48cb-a372-b826a4703231@github.com> References: <4EB3jcooR9miV6MpOHmX_A_Zp-j-CkmBNXn9QjCC6L0=.59e315e6-9397-48cb-a372-b826a4703231@github.com> Message-ID: On Fri, 6 Oct 2023 18:48:46 GMT, Cesar Soares Lucas wrote: >> ### Description >> >> Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", or objects returned from method calls, etc. Please review this Pull Request that improves Reduce Allocation Merge implementation so that it can reduce at least some of these allocation merges. >> >> Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP. >> >> The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing. >> >> ### Benchmarking >> >> **Note:** In some of these tests no reduction happens. I left them in to validate that no perf. regression happens in that case. >> **Note 2:** Marging of error was negligible. 
>> >> | Benchmark | No RAM (ms/op) | Yes RAM (ms/op) | >> |--------------------------------------|------------------|-------------------| >> | TestTrapAfterMerge | 19.515 | 13.386 | >> | TestArgEscape | 33.165 | 33.254 | >> | TestCallTwoSide | 70.547 | 69.427 | >> | TestCmpAfterMerge | 16.400 | 2.984 | >> | TestCmpMergeWithNull_Second | 27.204 | 27.293 | >> | TestCmpMergeWithNull | 8.248 | 4.920 | >> | TestCondAfterMergeWithAllocate | 12.890 | 5.252 | >> | TestCondAfterMergeWithNull | 6.265 | 5.078 | >> | TestCondLoadAfterMerge | 12.713 | 5.163 | >> | TestConsecutiveSimpleMerge | 30.863 | 4.068 | >> | TestDoubleIfElseMerge | 16.069 | 2.444 | >> | TestEscapeInCallAfterMerge | 23.111 | 22.924 | >> | TestGlobalEscape | 14.459 | 14.425 | >> | TestIfElseInLoop | 246.061 | 42.786 | >> | TestLoadAfterLoopAlias | 45.808 | 45.812 | >> ... > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Refrain from RAM of arrays and Phis controlled by Loop nodes. I'm still seeing the following failures: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/workspace/open/src/hotspot/share/opto/escape.cpp:1299), pid=1574160, tid=1574500 # assert(false) failed: SafePointScalarMerge nodes can't be nested. # # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) # Problematic frame: # V [libjvm.so+0xab151c] ConnectionGraph::verify_ram_nodes(Compile*, Node*)+0x6e8 Current CompileTask: C2:39141 8262 ! 
4 akka.actor.ActorCell::invokeAll$1 (577 bytes) Stack: [0x0000fffea024c000,0x0000fffea044a000], sp=0x0000fffea0444d50, free space=2019k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0xab151c] ConnectionGraph::verify_ram_nodes(Compile*, Node*)+0x6e8 (escape.cpp:1299) V [libjvm.so+0x90d1e4] Compile::Optimize()+0x744 (compile.cpp:2336) V [libjvm.so+0x90f098] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1504 (compile.cpp:854) V [libjvm.so+0x75b12c] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x10c (c2compiler.cpp:130) V [libjvm.so+0x91b124] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x8e4 (compileBroker.cpp:2282) V [libjvm.so+0x91bc3c] CompileBroker::compiler_thread_loop()+0x5bc (compileBroker.cpp:1943) V [libjvm.so+0xdb4bc0] JavaThread::thread_main_inner()+0xec (javaThread.cpp:720) V [libjvm.so+0x1600764] Thread::call_run()+0xb0 (thread.cpp:220) V [libjvm.so+0x1368ff8] thread_native_entry(Thread*)+0x138 (os_linux.cpp:785) C [libc.so.6+0x82a28] start_thread+0x2d4 # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/workspace/open/src/hotspot/share/opto/narrowptrnode.cpp:84), pid=3481386, tid=3481478 # assert(t != TypeNarrowKlass::NULL_PTR) failed: null klass? 
# # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x140fcf4] DecodeNKlassNode::Value(PhaseGVN*) const+0x1b4 Current CompileTask: C2:44601 8049 4 akka.dispatch.NodeMessageQueue::cleanUp (32 bytes) Stack: [0x00007f90834f6000,0x00007f90835f6000], sp=0x00007f90835f0d00, free space=1003k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x140fcf4] DecodeNKlassNode::Value(PhaseGVN*) const+0x1b4 (narrowptrnode.cpp:84) V [libjvm.so+0x12aa37a] PhaseIdealLoop::split_thru_phi(Node*, Node*, int)+0x30a (loopopts.cpp:103) V [libjvm.so+0x12ae400] PhaseIdealLoop::split_if_with_blocks_pre(Node*)+0x270 (loopopts.cpp:1165) V [libjvm.so+0x12b325f] PhaseIdealLoop::split_if_with_blocks(VectorSet&, Node_Stack&)+0x15f (loopopts.cpp:1877) V [libjvm.so+0x12a66ff] PhaseIdealLoop::build_and_optimize()+0xf9f (loopnode.cpp:4572) V [libjvm.so+0x9f940b] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x3ab (loopnode.hpp:1112) V [libjvm.so+0x9f4991] Compile::Optimize()+0xd91 (compile.cpp:2171) V [libjvm.so+0x9f81e0] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1b90 (compile.cpp:854) V [libjvm.so+0x848bc9] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x159 (c2compiler.cpp:130) V [libjvm.so+0xa040d0] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x980 (compileBroker.cpp:2282) V [libjvm.so+0xa04e58] CompileBroker::compiler_thread_loop()+0x508 (compileBroker.cpp:1943) V [libjvm.so+0xebf52c] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:720) V [libjvm.so+0x1793bea] Thread::call_run()+0xba (thread.cpp:220) V [libjvm.so+0x14a20da] thread_native_entry(Thread*)+0x12a (os_linux.cpp:785) 
Unfortunately, they happen with an internal stress test based on the Renaissance Benchmark that I can't share. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15825#issuecomment-1764086364 From epeter at openjdk.org Mon Oct 16 10:34:38 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Oct 2023 10:34:38 GMT Subject: RFR: 8318078: ADLC: pass ASSERT and PRODUCT flags [v2] In-Reply-To: References: <-CKTyTM4X49hqWzyGGdBK3BaX3iSADnVEXQTq_j94lI=.ad1fece3-e049-4b5d-bbf3-3dd714b74bf6@github.com> Message-ID: On Fri, 13 Oct 2023 17:23:32 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add comments like Vladimir requested > > make/hotspot/gensrc/GensrcAdlc.gmk line 138: > >> 136: # Set ASSERT, NDEBUG and PRODUCT flags just like in JvmFlags.gmk >> 137: ifeq ($(DEBUG_LEVEL), release) >> 138: ADLCFLAGS += -DNDEBUG > > May be you should also copy all comments from `JvmFlags.gmk` to avoid confusion because, for example, `NDEBUG` is not used directly in HotSpot code but it is needed "to disable uses of assert macro from ." Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16178#discussion_r1360341172 From epeter at openjdk.org Mon Oct 16 10:34:37 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Oct 2023 10:34:37 GMT Subject: RFR: 8318078: ADLC: pass ASSERT and PRODUCT flags [v2] In-Reply-To: <-CKTyTM4X49hqWzyGGdBK3BaX3iSADnVEXQTq_j94lI=.ad1fece3-e049-4b5d-bbf3-3dd714b74bf6@github.com> References: <-CKTyTM4X49hqWzyGGdBK3BaX3iSADnVEXQTq_j94lI=.ad1fece3-e049-4b5d-bbf3-3dd714b74bf6@github.com> Message-ID: > @vnkozlov asked me to guard some debug AD file rules in `#ifdef ASSERT`. https://github.com/openjdk/jdk/pull/14785#discussion_r1349391130 > > We discovered that the `ASSERT` and `PRODUCT` are not yet passed to ADLC, and hence they are always considered `undefined`. Hence, all of these `ifdef` blocks would always be ignored. 
> > **Solution** > I added the flags to `make/hotspot/gensrc/GensrcAdlc.gmk`, just like in `make/hotspot/lib/JvmFlags.gmk`. > > As @erikj79 commented: we should probably unify this. But I leave that to the build team. > > **Testing** > With this code you can see what flags are passed to ADLC: > > --- a/src/hotspot/share/adlc/main.cpp > +++ b/src/hotspot/share/adlc/main.cpp > @@ -56,6 +56,11 @@ int main(int argc, char *argv[]) > // Check for proper arguments > if( argc == 1 ) usage(AD); // No arguments? Then print usage > > + for( int i = 1; i < argc; i++ ) { // For all arguments > + char *s = argv[i]; // Get option/filename > + fprintf(stderr, "ARGV[%d] %s\n", i, s); > + } > + > // Read command line arguments and file names > for( int i = 1; i < argc; i++ ) { // For all arguments > char *s = argv[i]; // Get option/filename > > > On `linux-x64` I get: > > ARGV[1] -q > ARGV[2] -T > ARGV[3] -DLINUX=1 > ARGV[4] -D_GNU_SOURCE=1 > ARGV[5] -g > ARGV[6] -DAMD64=1 > ARGV[7] -D_LP64=1 > ARGV[8] -DNDEBUG > ARGV[9] -DPRODUCT > > > And on `linux-x64-debug` I get: > > ARGV[1] -q > ARGV[2] -T > ARGV[3] -DLINUX=1 > ARGV[4] -D_GNU_SOURCE=1 > ARGV[5] -g > ARGV[6] -DAMD64=1 > ARGV[7] -D_LP64=1 > ARGV[8] -DASSERT > > > I verified that the `#ifdef` work as expected, by adding this code to `src/hotspot/cpu/x86/x86.ad`: > > #ifdef ASSERT > #ifdef PRODUCT > control > #endif > #endif > > #ifdef ASSERT > xxx > #endif > > #ifdef PRODUCT > yyy > #endif > > When compiling, I get complaints for `yyy` on `linux-x64` and for `xxx` on `linux-x64-debug`. But since `ASSERT` and `PRODUCT` never occur together, we never get complaints about `control`. 
> > **Running tier1-3 and stress testing ...** Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: add comments like Vladimir requested ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16178/files - new: https://git.openjdk.org/jdk/pull/16178/files/b2875032..299ac4a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16178&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16178&range=00-01 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16178.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16178/head:pull/16178 PR: https://git.openjdk.org/jdk/pull/16178 From iwalulya at openjdk.org Mon Oct 16 11:12:58 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 16 Oct 2023 11:12:58 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v7] In-Reply-To: References: Message-ID: <1PcaddoCAHSctiUdslv577vO1CO5LbElCt407lvYHJM=.2813c0ac-ad5c-4d08-a4fa-91aa868da9a3@github.com> On Mon, 16 Oct 2023 07:58:27 GMT, Kim Barrett wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> const-element nomenclature, other review comments > > src/hotspot/share/utilities/intrusiveList.hpp line 994: > >> 992: const_reference operator[](size_type n) const { >> 993: return nth_element(cbegin(), cend(), n); >> 994: } > > Do we need these operator[]'s? Neither std::list nor boost::intrusive::list have such, and I don't think any of the > existing intrusive lists in HotSpot have such either. Maybe there's no real use-case. Removal would save > 25-30 lines. If we cannot find a real use-case for HotSpot, then maybe we shouldn't include them. 
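
A small sketch of why indexing is unusual for linked lists: `std::list` offers no `operator[]`, so reaching the n-th element means walking n links. The helper name `nth_element_of` is hypothetical and only illustrates the cost; it is not the `nth_element` helper from the PR, whose exact semantics are not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical helper: returns the n-th element by stepping n links from the
// front, so the cost grows linearly with n.
template <typename T>
const T& nth_element_of(const std::list<T>& list, std::size_t n) {
  assert(n < list.size());
  return *std::next(list.begin(), static_cast<std::ptrdiff_t>(n));
}

// Looks up the element at index 2 of a three-element list.
int demo_third_value() {
  const std::list<int> values = {10, 20, 30};
  return nth_element_of(values, 2);
}
```

An `operator[]` on a list would hide this linear walk behind constant-time-looking syntax, which may be one reason the standard containers leave it out.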
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1360496623 From ayang at openjdk.org Mon Oct 16 13:31:41 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 16 Oct 2023 13:31:41 GMT Subject: RFR: 8317350: Move code cache purging out of CodeCache::UnloadingScope In-Reply-To: References: Message-ID: On Mon, 2 Oct 2023 12:56:27 GMT, Thomas Schatzl wrote: > Hi all, > > please review this refactoring that moves actual code cache flushing/purging out of `CodeCache::UnloadingScope`. Reasons: > > * I prefer that a destructor does not do anything substantial - in some cases, 90% of time is spent in the destructor in that extracted method (due to https://bugs.openjdk.org/browse/JDK-8316959) > * imho it does not fit the class which does nothing but sets/resets some code cache unloading behavior (probably should be renamed to `UnloadingBehaviorScope` too in a separate CR). > * other existing methods at that level are placed out of that (or any other) scope object too - which is already the case for when doing concurrent unloading. > * putting it there makes future logging of the various phases a little bit easier, not having `GCTraceTimer` et al. in various places. > > Testing: gha > > Thanks, > Thomas src/hotspot/share/gc/parallel/psParallelCompact.cpp line 2068: > 2066: > 2067: // Release unloaded nmethods's memory. > 2068: CodeCache::flush_unlinked_nmethods(); The fact that recycling resources for nmethods involves two steps, unlink and free-unlinked, seems an insignificant impl detail in this caller context. Can that be hidden away from the public API, e.g. `do_unloading` perform these two steps directly? Is there a reason why `flush_unlinked_nmethods` is outside the scope? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16011#discussion_r1360667446 From erikj at openjdk.org Mon Oct 16 13:45:12 2023 From: erikj at openjdk.org (Erik Joelsson) Date: Mon, 16 Oct 2023 13:45:12 GMT Subject: RFR: 8318078: ADLC: pass ASSERT and PRODUCT flags [v2] In-Reply-To: References: <-CKTyTM4X49hqWzyGGdBK3BaX3iSADnVEXQTq_j94lI=.ad1fece3-e049-4b5d-bbf3-3dd714b74bf6@github.com> Message-ID: On Mon, 16 Oct 2023 10:34:37 GMT, Emanuel Peter wrote: >> @vnkozlov asked me to guard some debug AD file rules in `#ifdef ASSERT`. https://github.com/openjdk/jdk/pull/14785#discussion_r1349391130 >> >> We discovered that the `ASSERT` and `PRODUCT` are not yet passed to ADLC, and hence they are always considered `undefined`. Hence, all of these `ifdef` blocks would always be ignored. >> >> **Solution** >> I added the flags to `make/hotspot/gensrc/GensrcAdlc.gmk`, just like in `make/hotspot/lib/JvmFlags.gmk`. >> >> As @erikj79 commented: we should probably unify this. But I leave that to the build team. >> >> **Testing** >> With this code you can see what flags are passed to ADLC: >> >> --- a/src/hotspot/share/adlc/main.cpp >> +++ b/src/hotspot/share/adlc/main.cpp >> @@ -56,6 +56,11 @@ int main(int argc, char *argv[]) >> // Check for proper arguments >> if( argc == 1 ) usage(AD); // No arguments? 
Then print usage >> >> + for( int i = 1; i < argc; i++ ) { // For all arguments >> + char *s = argv[i]; // Get option/filename >> + fprintf(stderr, "ARGV[%d] %s\n", i, s); >> + } >> + >> // Read command line arguments and file names >> for( int i = 1; i < argc; i++ ) { // For all arguments >> char *s = argv[i]; // Get option/filename >> >> >> On `linux-x64` I get: >> >> ARGV[1] -q >> ARGV[2] -T >> ARGV[3] -DLINUX=1 >> ARGV[4] -D_GNU_SOURCE=1 >> ARGV[5] -g >> ARGV[6] -DAMD64=1 >> ARGV[7] -D_LP64=1 >> ARGV[8] -DNDEBUG >> ARGV[9] -DPRODUCT >> >> >> And on `linux-x64-debug` I get: >> >> ARGV[1] -q >> ARGV[2] -T >> ARGV[3] -DLINUX=1 >> ARGV[4] -D_GNU_SOURCE=1 >> ARGV[5] -g >> ARGV[6] -DAMD64=1 >> ARGV[7] -D_LP64=1 >> ARGV[8] -DASSERT >> >> >> I verified that the `#ifdef` work as expected, by adding this code to `src/hotspot/cpu/x86/x86.ad`: >> >> #ifdef ASSERT >> #ifdef PRODUCT >> control >> #endif >> #endif >> >> #ifdef ASSERT >> xxx >> #endif >> >> #ifdef PRODUCT >> yyy >> #endif >> >> When compiling, I get complaints for `yyy` on `linux-x64` and for `xxx` on `linux-x64-debug`. But since `ASSERT` and `PRODUCT` never occur together, we never get complaints about `control`. >> >> **Running tier1-3 and stress testing ...** > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add comments like Vladimir requested Marked as reviewed by erikj (Reviewer). 
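
The two ARGV listings quoted above can also be checked mechanically. The following is a minimal sketch, assuming flags of the form `-DNAME` or `-DNAME=value`; the function names are hypothetical and this is not ADLC's actual argument parser. It extracts the macro names and checks the property stated in the description, namely that `ASSERT` and `PRODUCT` never occur together.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Collect macro names from "-DNAME" or "-DNAME=value" style arguments.
std::set<std::string> defined_macros(const std::vector<std::string>& args) {
  std::set<std::string> names;
  for (const std::string& arg : args) {
    if (arg.size() > 2 && arg.compare(0, 2, "-D") == 0) {
      const std::string::size_type eq = arg.find('=', 2);
      const std::string::size_type len =
          (eq == std::string::npos) ? std::string::npos : eq - 2;
      names.insert(arg.substr(2, len));
    }
  }
  return names;
}

// True unless both ASSERT and PRODUCT are defined at the same time.
bool never_both(const std::set<std::string>& macros) {
  return !(macros.count("ASSERT") != 0 && macros.count("PRODUCT") != 0);
}

// Argument list reported for linux-x64 in the PR description.
std::vector<std::string> release_listing() {
  return {"-q", "-T", "-DLINUX=1", "-D_GNU_SOURCE=1", "-g",
          "-DAMD64=1", "-D_LP64=1", "-DNDEBUG", "-DPRODUCT"};
}

// Argument list reported for linux-x64-debug in the PR description.
std::vector<std::string> debug_listing() {
  return {"-q", "-T", "-DLINUX=1", "-D_GNU_SOURCE=1", "-g",
          "-DAMD64=1", "-D_LP64=1", "-DASSERT"};
}

// Self-check against the two listings.
void verify_listings() {
  assert(never_both(defined_macros(release_listing())));
  assert(never_both(defined_macros(debug_listing())));
}
```

The check is deliberately "not both" rather than "exactly one", since nothing in the thread says every build flavor defines one of the two macros.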
------------- PR Review: https://git.openjdk.org/jdk/pull/16178#pullrequestreview-1680021787 From rrich at openjdk.org Mon Oct 16 14:39:17 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 16 Oct 2023 14:39:17 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v22] In-Reply-To: <1_KC1smUe4FUpFs3u9GRAYc_RGtxMKFyc4SwnZWqJUk=.55658bac-4a3b-4e57-bdad-dbf2ec9b9170@github.com> References: <1_KC1smUe4FUpFs3u9GRAYc_RGtxMKFyc4SwnZWqJUk=.55658bac-4a3b-4e57-bdad-dbf2ec9b9170@github.com> Message-ID: On Fri, 13 Oct 2023 14:39:48 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. 
Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Remove obsolete comment

I've done some jbb2005 benchmarking. The way I did it is not compliant but I thought it would be ok to compare baseline with this pr:

* 5 jbb2005 runs with baseline and another 5 with this pr.
* 4 iterations per run, each with 8 warehouses.
* Warm-up: first 3 iterations.
* Result: gc pauses from the last iteration.

I don't see a significant difference between baseline and this pr, neither in gc times nor in benchmark throughput. [jbb2005_gc_times.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/jbb2005_gc_times.pdf) is a summary (generated from [jbb2005_gc_times.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/jbb2005_gc_times.ods)).
The complete output from the runs with baseline is given in [jbb2005_5_runs_pr_14846_baseline.log.gz](https://cr.openjdk.org/~rrich/webrevs/8310031/jbb2005_5_runs_pr_14846_baseline.log.gz). The complete output from the runs with this pr is given in [jbb2005_5_runs_pr_14846.log.gz](https://cr.openjdk.org/~rrich/webrevs/8310031/jbb2005_5_runs_pr_14846.log.gz). ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1764620634 From tschatzl at openjdk.org Mon Oct 16 14:40:49 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 16 Oct 2023 14:40:49 GMT Subject: RFR: 8317350: Move code cache purging out of CodeCache::UnloadingScope In-Reply-To: References: Message-ID: On Mon, 16 Oct 2023 13:28:03 GMT, Albert Mingkun Yang wrote: >> Hi all, >> >> please review this refactoring that moves actual code cache flushing/purging out of `CodeCache::UnloadingScope`. Reasons: >> >> * I prefer that a destructor does not do anything substantial - in some cases, 90% of time is spent in the destructor in that extracted method (due to https://bugs.openjdk.org/browse/JDK-8316959) >> * imho it does not fit the class which does nothing but sets/resets some code cache unloading behavior (probably should be renamed to `UnloadingBehaviorScope` too in a separate CR). >> * other existing methods at that level are placed out of that (or any other) scope object too - which is already the case for when doing concurrent unloading. >> * putting it there makes future logging of the various phases a little bit easier, not having `GCTraceTimer` et al. in various places. >> >> Testing: gha >> >> Thanks, >> Thomas > > src/hotspot/share/gc/parallel/psParallelCompact.cpp line 2068: > >> 2066: >> 2067: // Release unloaded nmethods's memory. >> 2068: CodeCache::flush_unlinked_nmethods(); > > The fact that recycling resources for nmethods involves two steps, unlink and free-unlinked, seems an insignificant impl detail in this caller context. 
Can that be hidden away from the public API, e.g. `do_unloading` perform these two steps directly? > > Is there a reason why `flush_unlinked_nmethods` is outside the scope?

It is maybe an insignificant detail in the context of stw collectors, but for gcs doing concurrent class/code unloading there needs to be a handshake/memory synchronization between unlinking and freeing, so these two operations need to be split. I.e. what they do at a high level is:

unlink classes
unlink code cache
unlink other stuff
handshake
free unlinked classes
free unlinked code
free other stuff

Similarly I would like to split G1 class unloading into a STW part (unlinking stuff) and make all the freeing concurrent to avoid the additional (mostly) implementation overhead. Or at least start with that and then see if it is worth making everything concurrent. (The unlinking can be trimmed down further afaics).

Back to this CR: as the description states I think the current code wrongly hides this free-unlinked procedure in that `UnloadingScope` (there is a reason it is only used for the STW collectors) unnecessarily making the observed behavior (of one of the most time-consuming parts of class/code unloading) surprising. Putting it on this level also allows more straightforward logging.
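
The split argued for above can be illustrated with a toy model. This is a minimal sketch under stated assumptions: `ToyCodeCache`, `ToyMethod` and `ToyBehaviorScope` are hypothetical names, not HotSpot's `CodeCache` or `UnloadingScope`, and the handshake is only marked by a comment. It shows unlinking and freeing as separate, explicit calls, with the scope object doing nothing in its destructor except restoring a flag.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an nmethod.
struct ToyMethod {
  int id;
  bool alive;
};

// Hypothetical stand-in for the code cache, with unlink and free as two phases.
class ToyCodeCache {
  std::vector<std::unique_ptr<ToyMethod>> _linked;
  std::vector<std::unique_ptr<ToyMethod>> _unlinked;
  std::size_t _freed = 0;

public:
  void add(int id, bool alive) {
    _linked.push_back(std::unique_ptr<ToyMethod>(new ToyMethod{id, alive}));
  }

  // Phase 1: make dead methods unreachable, but keep their memory.
  void unlink_dead() {
    std::vector<std::unique_ptr<ToyMethod>> keep;
    for (auto& m : _linked) {
      if (m->alive) {
        keep.push_back(std::move(m));
      } else {
        _unlinked.push_back(std::move(m));
      }
    }
    _linked = std::move(keep);
  }

  // Phase 2: release the memory of everything unlinked earlier.
  void free_unlinked() {
    _freed += _unlinked.size();
    _unlinked.clear();
  }

  std::size_t linked() const { return _linked.size(); }
  std::size_t pending() const { return _unlinked.size(); }
  std::size_t freed() const { return _freed; }
};

// Scope object that only sets and restores a behavior flag; no heavy work
// happens in its destructor.
class ToyBehaviorScope {
  bool& _flag;
  bool _saved;

public:
  explicit ToyBehaviorScope(bool& flag) : _flag(flag), _saved(flag) { _flag = true; }
  ~ToyBehaviorScope() { _flag = _saved; }
};

struct PhaseCounts {
  std::size_t linked_after_unlink;
  std::size_t pending_after_unlink;
  std::size_t freed_after_unlink;
  std::size_t pending_after_free;
  std::size_t freed_after_free;
};

// Runs both phases on one live and two dead methods and records the counts.
PhaseCounts demo_two_phase_unloading() {
  ToyCodeCache cache;
  cache.add(1, true);
  cache.add(2, false);
  cache.add(3, false);

  bool unloading_active = false;
  PhaseCounts c{};
  {
    ToyBehaviorScope scope(unloading_active);
    cache.unlink_dead();
  }  // the scope ends here without freeing anything
  c.linked_after_unlink = cache.linked();
  c.pending_after_unlink = cache.pending();
  c.freed_after_unlink = cache.freed();

  // A concurrent collector would handshake with other threads at this point.
  cache.free_unlinked();
  c.pending_after_free = cache.pending();
  c.freed_after_free = cache.freed();
  return c;
}
```

Because `free_unlinked()` is a visible call at the top level, timing or logging it separately is straightforward, which matches the motivation given for moving it out of the scope object.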
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16011#discussion_r1360764085 From aph at openjdk.org Mon Oct 16 14:43:46 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 16 Oct 2023 14:43:46 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v9] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 11 Oct 2023 18:08:55 GMT, Vladimir Ivanov wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Add TestDenormalDouble.java > > test/hotspot/jtreg/compiler/floatingpoint/libfast-math.c line 26: > >> 24: #include >> 25: #include >> 26: #include "jni.h" > > Redundant includes? Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1360767425 From mbaesken at openjdk.org Mon Oct 16 15:52:47 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 16 Oct 2023 15:52:47 GMT Subject: RFR: JDK-8313764: Offer JVM HS functionality to shared lib load operations done by the JDK codebase [v2] In-Reply-To: References: Message-ID: On Wed, 23 Aug 2023 15:18:03 GMT, Matthias Baesken wrote: >> Currently there is a number of functionality that would be interesting to have for shared lib load operations in the JDK C code. >> Some examples : >> Events::log_dll_message for hs-err files reporting >> JFR event NativeLibraryLoad >> There is the need to update the shared lib Cache on AIX ( see LoadedLibraries::reload() , see also https://bugs.openjdk.org/browse/JDK-8314152 ), >> this is currently not fully in sync with libs loaded form jdk c-libs and sometimes reports outdated information >> >> Offer an interface (e.g. jvm.cpp) to support this. 
> > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > windows aarch64 build issues Hello, any comments about the idea of calling into 'os::dll_load' instead ? That would indeed make the coding smaller and less 'messy' . ------------- PR Comment: https://git.openjdk.org/jdk/pull/15264#issuecomment-1764687056 From kvn at openjdk.org Mon Oct 16 16:05:53 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 16 Oct 2023 16:05:53 GMT Subject: RFR: 8318078: ADLC: pass ASSERT and PRODUCT flags [v2] In-Reply-To: References: <-CKTyTM4X49hqWzyGGdBK3BaX3iSADnVEXQTq_j94lI=.ad1fece3-e049-4b5d-bbf3-3dd714b74bf6@github.com> Message-ID: On Mon, 16 Oct 2023 10:34:37 GMT, Emanuel Peter wrote: >> @vnkozlov asked me to guard some debug AD file rules in `#ifdef ASSERT`. https://github.com/openjdk/jdk/pull/14785#discussion_r1349391130 >> >> We discovered that the `ASSERT` and `PRODUCT` are not yet passed to ADLC, and hence they are always considered `undefined`. Hence, all of these `ifdef` blocks would always be ignored. >> >> **Solution** >> I added the flags to `make/hotspot/gensrc/GensrcAdlc.gmk`, just like in `make/hotspot/lib/JvmFlags.gmk`. >> >> As @erikj79 commented: we should probably unify this. But I leave that to the build team. >> >> **Testing** >> With this code you can see what flags are passed to ADLC: >> >> --- a/src/hotspot/share/adlc/main.cpp >> +++ b/src/hotspot/share/adlc/main.cpp >> @@ -56,6 +56,11 @@ int main(int argc, char *argv[]) >> // Check for proper arguments >> if( argc == 1 ) usage(AD); // No arguments? 
Then print usage >> >> + for( int i = 1; i < argc; i++ ) { // For all arguments >> + char *s = argv[i]; // Get option/filename >> + fprintf(stderr, "ARGV[%d] %s\n", i, s); >> + } >> + >> // Read command line arguments and file names >> for( int i = 1; i < argc; i++ ) { // For all arguments >> char *s = argv[i]; // Get option/filename >> >> >> On `linux-x64` I get: >> >> ARGV[1] -q >> ARGV[2] -T >> ARGV[3] -DLINUX=1 >> ARGV[4] -D_GNU_SOURCE=1 >> ARGV[5] -g >> ARGV[6] -DAMD64=1 >> ARGV[7] -D_LP64=1 >> ARGV[8] -DNDEBUG >> ARGV[9] -DPRODUCT >> >> >> And on `linux-x64-debug` I get: >> >> ARGV[1] -q >> ARGV[2] -T >> ARGV[3] -DLINUX=1 >> ARGV[4] -D_GNU_SOURCE=1 >> ARGV[5] -g >> ARGV[6] -DAMD64=1 >> ARGV[7] -D_LP64=1 >> ARGV[8] -DASSERT >> >> >> I verified that the `#ifdef` work as expected, by adding this code to `src/hotspot/cpu/x86/x86.ad`: >> >> #ifdef ASSERT >> #ifdef PRODUCT >> control >> #endif >> #endif >> >> #ifdef ASSERT >> xxx >> #endif >> >> #ifdef PRODUCT >> yyy >> #endif >> >> When compiling, I get complaints for `yyy` on `linux-x64` and for `xxx` on `linux-x64-debug`. But since `ASSERT` and `PRODUCT` never occur together, we never get complaints about `control`. >> >> **Running tier1-3 and stress testing ...** > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add comments like Vladimir requested Good ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16178#pullrequestreview-1680416164 From cslucas at openjdk.org Mon Oct 16 16:18:16 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 16 Oct 2023 16:18:16 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges Message-ID: ### Description Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", or objects returned from method calls, etc. 
Please review this Pull Request that improves Reduce Allocation Merge implementation so that it can reduce at least some of these allocation merges. Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP. The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing. ### Benchmarking **Note:** In some of these tests no reduction happens. I left them in to validate that no perf. regression happens in that case. **Note 2:** Marging of error was negligible. | Benchmark | No RAM (ms/op) | Yes RAM (ms/op) | |--------------------------------------|------------------|-------------------| | TestTrapAfterMerge | 19.515 | 13.386 | | TestArgEscape | 33.165 | 33.254 | | TestCallTwoSide | 70.547 | 69.427 | | TestCmpAfterMerge | 16.400 | 2.984 | | TestCmpMergeWithNull_Second | 27.204 | 27.293 | | TestCmpMergeWithNull | 8.248 | 4.920 | | TestCondAfterMergeWithAllocate | 12.890 | 5.252 | | TestCondAfterMergeWithNull | 6.265 | 5.078 | | TestCondLoadAfterMerge | 12.713 | 5.163 | | TestConsecutiveSimpleMerge | 30.863 | 4.068 | | TestDoubleIfElseMerge | 16.069 | 2.444 | | TestEscapeInCallAfterMerge | 23.111 | 22.924 | | TestGlobalEscape | 14.459 | 14.425 | | TestIfElseInLoop | 246.061 | 42.786 | | TestLoadAfterLoopAlias | 45.808 | 45.812 | | TestLoadAfterTrap | 28.370 | 28.514 | | TestLoadInCondAfterMerge | 12.538 | 4.720 | | TestLoadInLoop | 25.534 | 17.079 | | TestMergedAccessAfterCallNoWrite | 169.837 | 169.881 | | TestMergedAccessAfterCallWithWrite | 149.669 | 152.105 | | TestMergedLoadAfterDirectStore | 16.496 | 16.473 | | 
TestMergesAndMixedEscape | 28.821 | 19.701 | | TestNestedObjectsArray | 31.207 | 27.832 | | TestNestedObjectsNoEscapeObject | 16.162 | 12.544 | | TestNestedObjectsObject | 16.117 | 12.204 | | TestNoEscapeWithLoadInLoop | 253.903 | 247.400 | | TestNoEscapeWithWriteInLoop | 113.710 | 113.714 | | TestObjectIdentity | 2.442 | 2.442 | | TestPartialPhis | 4.340 | 4.340 | | TestPollutedNoWrite | 7.817 | 1.991 | | TestPollutedPolymorphic | 11.017 | 1.991 | | TestPollutedWithWrite | 8.596 | 8.593 | | TestSRAndNSR_NoTrap_caller | 14.865 | 8.536 | | TestSRAndNSR_Trap_caller | 45.689 | 40.930 | | TestSimpleAliasedAlloc | 16.297 | 2.447 | | TestSimpleDoubleMerge | 23.786 | 2.997 | | TestString_one_caller | 15.484 | 15.271 | | TestString_two_caller | 15.456 | 14.996 | | TestSubclassesTrapping | 26.820 | 26.143 | | TestSubclasses | 6.521 | 3.834 | | TestThreeWayAliasedAlloc | 16.307 | 2.308 | | TestTrappingAfterMerge | 13.683 | 6.804 | ### Tests - Linux x86_64: Tier1-4, DaCapo, Renaissance, SpecJBB - MacOS Aarch64: Tier1-4 - Windows x86_64: Tier1-4 ------------- Commit messages: - Refrain from RAM of arrays and Phis controlled by Loop nodes. - Fix typo in test. - Fix build after merge. - Fix merge - Support for reducing nullable allocation merges. 
Changes: https://git.openjdk.org/jdk/pull/15825/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15825&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316991 Stats: 2291 lines in 13 files changed: 2051 ins; 94 del; 146 mod Patch: https://git.openjdk.org/jdk/pull/15825.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15825/head:pull/15825 PR: https://git.openjdk.org/jdk/pull/15825 From thartmann at openjdk.org Mon Oct 16 16:18:16 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 16 Oct 2023 16:18:16 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 18:54:34 GMT, Cesar Soares Lucas wrote: > ### Description > > Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", or objects returned from method calls, etc. Please review this Pull Request that improves Reduce Allocation Merge implementation so that it can reduce at least some of these allocation merges. > > Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP. > > The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing. > > ### Benchmarking > > **Note:** In some of these tests no reduction happens. I left them in to validate that no perf. regression happens in that case. > **Note 2:** Marging of error was negligible. 
> > | Benchmark | No RAM (ms/op) | Yes RAM (ms/op) | > |--------------------------------------|------------------|-------------------| > | TestTrapAfterMerge | 19.515 | 13.386 | > | TestArgEscape | 33.165 | 33.254 | > | TestCallTwoSide | 70.547 | 69.427 | > | TestCmpAfterMerge | 16.400 | 2.984 | > | TestCmpMergeWithNull_Second | 27.204 | 27.293 | > | TestCmpMergeWithNull | 8.248 | 4.920 | > | TestCondAfterMergeWithAllocate | 12.890 | 5.252 | > | TestCondAfterMergeWithNull | 6.265 | 5.078 | > | TestCondLoadAfterMerge | 12.713 | 5.163 | > | TestConsecutiveSimpleMerge | 30.863 | 4.068 | > | TestDoubleIfElseMerge | 16.069 | 2.444 | > | TestEscapeInCallAfterMerge | 23.111 | 22.924 | > | TestGlobalEscape | 14.459 | 14.425 | > | TestIfElseInLoop | 246.061 | 42.786 | > | TestLoadAfterLoopAlias | 45.808 | 45.812 | > | TestLoadAfterTrap | 28.370 | ... I didn't look at this in detail yet but submitted testing. I see the following failures. `compiler/eliminateAutobox/TestByteBoxing.java` with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/workspace/open/src/hotspot/share/opto/loopnode.cpp:2178), pid=951972, tid=951999 # assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed # # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x128082c] LoopNode::verify_strip_mined(int) const+0xcc Current CompileTask: C2: 1438 263 % b compiler.eliminateAutobox.TestByteBoxing::main @ 1358 (1805 bytes) Stack: [0x00007f0efc9cb000,0x00007f0efcacb000], sp=0x00007f0efcac57a0, free space=1001k Native 
frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x128082c] LoopNode::verify_strip_mined(int) const+0xcc (loopnode.cpp:2178) V [libjvm.so+0x1256ead] PathFrequency::to(Node*)+0x70d (loopPredicate.cpp:988) V [libjvm.so+0x1258b49] PhaseIdealLoop::loop_predication_impl(IdealLoopTree*)+0x8e9 (loopPredicate.cpp:1462) V [libjvm.so+0x125989a] IdealLoopTree::loop_predication(PhaseIdealLoop*)+0x9a (loopPredicate.cpp:1536) V [libjvm.so+0x12a28d7] PhaseIdealLoop::build_and_optimize()+0xf57 (loopnode.cpp:4582) V [libjvm.so+0x9ee7fb] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x3ab (loopnode.hpp:1114) V [libjvm.so+0x9e9db6] Compile::Optimize()+0xdf6 (compile.cpp:2362) `compiler/eliminateAutobox/TestByteBoxing.java` with `-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers`: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (workspace/open/src/hotspot/share/opto/loopnode.cpp:6035), pid=1353611, tid=1353627 # Error: ShouldNotReachHere() # # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x129062c] PhaseIdealLoop::verify_strip_mined_scheduling(Node*, Node*)+0x26c Current CompileTask: C2: 547 68 b compiler.eliminateAutobox.TestDoubleBoxing::sump (48 bytes) Stack: [0x00007f1814966000,0x00007f1814a66000], sp=0x00007f1814a60c20, free space=1003k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x129062c] PhaseIdealLoop::verify_strip_mined_scheduling(Node*, Node*)+0x26c (loopnode.cpp:6035) V 
[libjvm.so+0x12a0fb0] PhaseIdealLoop::build_loop_late_post_work(Node*, bool)+0x420 (loopnode.cpp:6222) V [libjvm.so+0x12a166d] PhaseIdealLoop::build_loop_late(VectorSet&, Node_List&, Node_Stack&)+0xbd (loopnode.cpp:6045) V [libjvm.so+0x12a1f9d] PhaseIdealLoop::build_and_optimize()+0x61d (loopnode.cpp:4461) V [libjvm.so+0x9ee7fb] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x3ab (loopnode.hpp:1114) V [libjvm.so+0x9e9498] Compile::Optimize()+0x4d8 (compile.cpp:2354) Same failures with other tests in `compiler/eliminateAutobox/` `compiler/intrinsics/unsafe/AllocateUninitializedArray.java` with `-XX:-TieredCompilation -XX:+AlwaysIncrementalInline`: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/workspace/open/src/hotspot/share/opto/narrowptrnode.cpp:84), pid=2114638, tid=2114665 # assert(t != TypeNarrowKlass::NULL_PTR) failed: null klass? # # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x140c554] DecodeNKlassNode::Value(PhaseGVN*) const+0x1b4 Current CompileTask: C2: 5582 123 compiler.intrinsics.unsafe.AllocateUninitializedArray::testOK (110 bytes) Stack: [0x00007fbb8b172000,0x00007fbb8b272000], sp=0x00007fbb8b26cce0, free space=1003k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x140c554] DecodeNKlassNode::Value(PhaseGVN*) const+0x1b4 (narrowptrnode.cpp:84) V [libjvm.so+0x12a659a] PhaseIdealLoop::split_thru_phi(Node*, Node*, int)+0x30a (loopopts.cpp:103) V [libjvm.so+0x12aa620] PhaseIdealLoop::split_if_with_blocks_pre(Node*)+0x270 (loopopts.cpp:1165) V [libjvm.so+0x12af47f] PhaseIdealLoop::split_if_with_blocks(VectorSet&, Node_Stack&)+0x15f (loopopts.cpp:1877) V 
[libjvm.so+0x12a291f] PhaseIdealLoop::build_and_optimize()+0xf9f (loopnode.cpp:4572) V [libjvm.so+0x9ee7fb] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x3ab (loopnode.hpp:1114) V [libjvm.so+0x9e9d51] Compile::Optimize()+0xd91 (compile.cpp:2171) I'm still seeing the following failures: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/workspace/open/src/hotspot/share/opto/escape.cpp:1299), pid=1574160, tid=1574500 # assert(false) failed: SafePointScalarMerge nodes can't be nested. # # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) # Problematic frame: # V [libjvm.so+0xab151c] ConnectionGraph::verify_ram_nodes(Compile*, Node*)+0x6e8 Current CompileTask: C2:39141 8262 ! 4 akka.actor.ActorCell::invokeAll$1 (577 bytes) Stack: [0x0000fffea024c000,0x0000fffea044a000], sp=0x0000fffea0444d50, free space=2019k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0xab151c] ConnectionGraph::verify_ram_nodes(Compile*, Node*)+0x6e8 (escape.cpp:1299) V [libjvm.so+0x90d1e4] Compile::Optimize()+0x744 (compile.cpp:2336) V [libjvm.so+0x90f098] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1504 (compile.cpp:854) V [libjvm.so+0x75b12c] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x10c (c2compiler.cpp:130) V [libjvm.so+0x91b124] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x8e4 (compileBroker.cpp:2282) V [libjvm.so+0x91bc3c] CompileBroker::compiler_thread_loop()+0x5bc (compileBroker.cpp:1943) V [libjvm.so+0xdb4bc0] JavaThread::thread_main_inner()+0xec (javaThread.cpp:720) V [libjvm.so+0x1600764] Thread::call_run()+0xb0 (thread.cpp:220) V [libjvm.so+0x1368ff8] 
thread_native_entry(Thread*)+0x138 (os_linux.cpp:785) C [libc.so.6+0x82a28] start_thread+0x2d4 # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/workspace/open/src/hotspot/share/opto/narrowptrnode.cpp:84), pid=3481386, tid=3481478 # assert(t != TypeNarrowKlass::NULL_PTR) failed: null klass? # # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x140fcf4] DecodeNKlassNode::Value(PhaseGVN*) const+0x1b4 Current CompileTask: C2:44601 8049 4 akka.dispatch.NodeMessageQueue::cleanUp (32 bytes) Stack: [0x00007f90834f6000,0x00007f90835f6000], sp=0x00007f90835f0d00, free space=1003k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x140fcf4] DecodeNKlassNode::Value(PhaseGVN*) const+0x1b4 (narrowptrnode.cpp:84) V [libjvm.so+0x12aa37a] PhaseIdealLoop::split_thru_phi(Node*, Node*, int)+0x30a (loopopts.cpp:103) V [libjvm.so+0x12ae400] PhaseIdealLoop::split_if_with_blocks_pre(Node*)+0x270 (loopopts.cpp:1165) V [libjvm.so+0x12b325f] PhaseIdealLoop::split_if_with_blocks(VectorSet&, Node_Stack&)+0x15f (loopopts.cpp:1877) V [libjvm.so+0x12a66ff] PhaseIdealLoop::build_and_optimize()+0xf9f (loopnode.cpp:4572) V [libjvm.so+0x9f940b] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x3ab (loopnode.hpp:1112) V [libjvm.so+0x9f4991] Compile::Optimize()+0xd91 (compile.cpp:2171) V [libjvm.so+0x9f81e0] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1b90 (compile.cpp:854) V [libjvm.so+0x848bc9] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x159 (c2compiler.cpp:130) V [libjvm.so+0xa040d0] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x980 
(compileBroker.cpp:2282) V [libjvm.so+0xa04e58] CompileBroker::compiler_thread_loop()+0x508 (compileBroker.cpp:1943) V [libjvm.so+0xebf52c] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:720) V [libjvm.so+0x1793bea] Thread::call_run()+0xba (thread.cpp:220) V [libjvm.so+0x14a20da] thread_native_entry(Thread*)+0x12a (os_linux.cpp:785) Unfortunately, they happen with an internal stress test based on the Renaissance Benchmark that I can't share. ------------- Changes requested by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15825#pullrequestreview-1654667506 PR Comment: https://git.openjdk.org/jdk/pull/15825#issuecomment-1764086364 From cslucas at openjdk.org Mon Oct 16 16:18:17 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 16 Oct 2023 16:18:17 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges In-Reply-To: References: Message-ID: <5Wj8SVRwRqlVyO2I1Os9_3WvW476UMPh8KsbDrJOwEo=.c385335f-e86b-4223-8738-c16022602887@github.com> On Tue, 3 Oct 2023 08:43:46 GMT, Tobias Hartmann wrote: >> ### Description >> >> Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", or objects returned from method calls, etc. Please review this Pull Request that improves Reduce Allocation Merge implementation so that it can reduce at least some of these allocation merges. >> >> Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP. >> >> The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing. 
>> >> ### Benchmarking >> >> **Note:** In some of these tests no reduction happens. I left them in to validate that no perf. regression happens in that case. >> **Note 2:** Marging of error was negligible. >> >> | Benchmark | No RAM (ms/op) | Yes RAM (ms/op) | >> |--------------------------------------|------------------|-------------------| >> | TestTrapAfterMerge | 19.515 | 13.386 | >> | TestArgEscape | 33.165 | 33.254 | >> | TestCallTwoSide | 70.547 | 69.427 | >> | TestCmpAfterMerge | 16.400 | 2.984 | >> | TestCmpMergeWithNull_Second | 27.204 | 27.293 | >> | TestCmpMergeWithNull | 8.248 | 4.920 | >> | TestCondAfterMergeWithAllocate | 12.890 | 5.252 | >> | TestCondAfterMergeWithNull | 6.265 | 5.078 | >> | TestCondLoadAfterMerge | 12.713 | 5.163 | >> | TestConsecutiveSimpleMerge | 30.863 | 4.068 | >> | TestDoubleIfElseMerge | 16.069 | 2.444 | >> | TestEscapeInCallAfterMerge | 23.111 | 22.924 | >> | TestGlobalEscape | 14.459 | 14.425 | >> | TestIfElseInLoop | 246.061 | 42.786 | >> | TestLoadAfterLoopAlias | 45.808 | 45.812 | >> ... > > I didn't look at this in detail yet but submitted testing. I see the following failures. 
> > `compiler/eliminateAutobox/TestByteBoxing.java` with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`: > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/workspace/open/src/hotspot/share/opto/loopnode.cpp:2178), pid=951972, tid=951999 > # assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed > # > # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x128082c] LoopNode::verify_strip_mined(int) const+0xcc > > Current CompileTask: > C2: 1438 263 % b compiler.eliminateAutobox.TestByteBoxing::main @ 1358 (1805 bytes) > > Stack: [0x00007f0efc9cb000,0x00007f0efcacb000], sp=0x00007f0efcac57a0, free space=1001k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x128082c] LoopNode::verify_strip_mined(int) const+0xcc (loopnode.cpp:2178) > V [libjvm.so+0x1256ead] PathFrequency::to(Node*)+0x70d (loopPredicate.cpp:988) > V [libjvm.so+0x1258b49] PhaseIdealLoop::loop_predication_impl(IdealLoopTree*)+0x8e9 (loopPredicate.cpp:1462) > V [libjvm.so+0x125989a] IdealLoopTree::loop_predication(PhaseIdealLoop*)+0x9a (loopPredicate.cpp:1536) > V [libjvm.so+0x12a28d7] PhaseIdealLoop::build_and_optimize()+0xf57 (loopnode.cpp:4582) > V [libjvm.so+0x9ee7fb] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x3ab (loopnode.hpp:1114) > V [libjvm.so+0x9e9db6] Compile::Optimize()+0xdf6 (compile.cpp:2362) > > > `compiler/eliminateAutobox/TestByteBoxing.java` with `-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM 
-XX:+StressIGVN -XX:+StressCCP -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers`: > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (workspace/open/src/hotspot/share/opto/loopnode.cpp:6035), pid=1353611, tid=1353627 > # Error: ShouldNotReachHere() > # > # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-03-0709344.tobias.hartma... Thank you @TobiHartmann . I'll take a look into the failures. Hello @TobiHartmann, I pushed a fix for the test failures that you reported. Could you please re-run your tests? Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15825#issuecomment-1745344848 PR Comment: https://git.openjdk.org/jdk/pull/15825#issuecomment-1753928457 From thartmann at openjdk.org Mon Oct 16 16:18:18 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 16 Oct 2023 16:18:18 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges In-Reply-To: <5Wj8SVRwRqlVyO2I1Os9_3WvW476UMPh8KsbDrJOwEo=.c385335f-e86b-4223-8738-c16022602887@github.com> References: <5Wj8SVRwRqlVyO2I1Os9_3WvW476UMPh8KsbDrJOwEo=.c385335f-e86b-4223-8738-c16022602887@github.com> Message-ID: <9JLqpJazC8z28PDyWDNmcMdKuUv9Xvc6cwZ7ZEIcxS8=.aac68c0f-ee64-416c-bbac-9782d787a882@github.com> On Mon, 9 Oct 2023 21:47:45 GMT, Cesar Soares Lucas wrote: >> I didn't look at this in detail yet but submitted testing. I see the following failures. 
>> >> `compiler/eliminateAutobox/TestByteBoxing.java` with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`: >> >> >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/workspace/open/src/hotspot/share/opto/loopnode.cpp:2178), pid=951972, tid=951999 >> # assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed >> # >> # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2) >> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # Problematic frame: >> # V [libjvm.so+0x128082c] LoopNode::verify_strip_mined(int) const+0xcc >> >> Current CompileTask: >> C2: 1438 263 % b compiler.eliminateAutobox.TestByteBoxing::main @ 1358 (1805 bytes) >> >> Stack: [0x00007f0efc9cb000,0x00007f0efcacb000], sp=0x00007f0efcac57a0, free space=1001k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x128082c] LoopNode::verify_strip_mined(int) const+0xcc (loopnode.cpp:2178) >> V [libjvm.so+0x1256ead] PathFrequency::to(Node*)+0x70d (loopPredicate.cpp:988) >> V [libjvm.so+0x1258b49] PhaseIdealLoop::loop_predication_impl(IdealLoopTree*)+0x8e9 (loopPredicate.cpp:1462) >> V [libjvm.so+0x125989a] IdealLoopTree::loop_predication(PhaseIdealLoop*)+0x9a (loopPredicate.cpp:1536) >> V [libjvm.so+0x12a28d7] PhaseIdealLoop::build_and_optimize()+0xf57 (loopnode.cpp:4582) >> V [libjvm.so+0x9ee7fb] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x3ab (loopnode.hpp:1114) >> V [libjvm.so+0x9e9db6] Compile::Optimize()+0xdf6 (compile.cpp:2362) >> >> >> `compiler/eliminateAutobox/TestByteBoxing.java` with `-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode 
-XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers`: >> >> >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (workspace/open/src/hotspot/share/opto/loopnode.cpp:6035), pid=1353611, tid=1353627 >> # Error: ShouldNotReachHere() >> # >> # JRE version: Java(TM) SE Runtime Envi... > > Hello @TobiHartmann, I pushed a fix for the test failures that you reported. Could you please re-run your tests? Thank you. Hi @JohnTortugo, sure. I re-submitted testing and will report back once it finished. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15825#issuecomment-1758916712 From aph at openjdk.org Mon Oct 16 16:22:27 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 16 Oct 2023 16:22:27 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v10] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <-uTYDm4GQeD1SeD-qNyK_s0jrNC5Ft9tv3ZdPcAz4ts=.1b1077c4-4f69-41ee-bf2f-de3364d8770e@github.com> > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. 
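The save-and-restore idea in the message above can be sketched with the standard `<cfenv>` facilities. This is only an illustration under stated assumptions: `FenvGuard`, `misbehaving_initializer` and `guard_restores_rounding_mode` are hypothetical names, not code from the PR, and the stand-in changes the rounding mode rather than the denormal-handling bits because the rounding mode is portable to observe. Whether `fegetenv`/`fesetenv` also cover the denormal controls is platform-dependent and would need to be checked for each target.

```cpp
#include <cfenv>

// Hypothetical RAII helper (not taken from the PR): remembers the current
// floating-point environment and restores it when the scope ends.
class FenvGuard {
 public:
  FenvGuard() : valid_(std::fegetenv(&saved_) == 0) {}
  ~FenvGuard() {
    if (valid_) {
      std::fesetenv(&saved_);
    }
  }
  FenvGuard(const FenvGuard&) = delete;
  FenvGuard& operator=(const FenvGuard&) = delete;

 private:
  std::fenv_t saved_;
  bool valid_;
};

// Stand-in for a library initializer that changes floating-point state.
// A DSO built with -ffast-math would alter denormal handling instead; the
// rounding mode is used here only because it is easy to observe portably.
inline void misbehaving_initializer() {
  std::fesetround(FE_TOWARDZERO);
}

// Runs the stand-in inside a guarded scope and reports whether the rounding
// mode seen by the caller is unchanged afterwards.
inline bool guard_restores_rounding_mode() {
  const int before = std::fegetround();
  {
    FenvGuard guard;
    misbehaving_initializer();  // in the scenario above: the dlopen() call
  }
  return std::fegetround() == before;
}
```

A guard object like this would have to sit around every call site that can load a shared library, which is the "find and wrap them all" cost the message mentions.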
Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - Comments only. - Review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10661/files - new: https://git.openjdk.org/jdk/pull/10661/files/01f6e224..a3305e5c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=08-09 Stats: 103 lines in 11 files changed: 59 ins; 25 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From pchilanomate at openjdk.org Mon Oct 16 16:28:44 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 16 Oct 2023 16:28:44 GMT Subject: RFR: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame [v2] In-Reply-To: References: Message-ID: <1_4hKYV0tNzeTRrvCyyACQIGasrNvRTHkHvFA8jruwI=.59795ade-0cbe-4afb-ab60-0c7f02803fe5@github.com> On Mon, 9 Oct 2023 18:38:24 GMT, Patricio Chilano Mateo wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - add comment to tests >> - use driver + @requires vm.flagless > > @dholmes-ora are you okay with the last version? > @pchilano sorry you were waiting for me. I'm not familiar enough with the Aarch64 code to Review it. My comments were just in passing on the shared code. > Ok, thanks for taking a look anyways. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/15972#issuecomment-1764831207

From pchilanomate at openjdk.org Mon Oct 16 16:32:08 2023
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Mon, 16 Oct 2023 16:32:08 GMT
Subject: Integrated: 8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame
In-Reply-To:
References:
Message-ID:

On Thu, 28 Sep 2023 21:07:09 GMT, Patricio Chilano Mateo wrote:

> Please review the following patch. As explained in the bug comments, the problem is that os::get_sender_for_C_frame() always constructs a frame as if the sender is also a native C/C++ frame. Setting a correct value for _unextended_sp is important to avoid crashes if this value is later used to get that frame's caller, which will happen if we end up calling frame::sender_for_compiled_frame().
>
> The issue exists on aarch64 for both linux and macos, but the fix for linux is different. The "Procedure Call Standard for the Arm 64-bit Architecture" doesn't specify a location for the frame record within a stack frame (6.4.6), and gcc happens to choose to save it at the top of the frame (lowest address) rather than the bottom. This means that changing fr->link() for fr->sender_sp() won't work. The fix is to use the value of fr->link() but adjusted using the code blob frame size before setting it as the _unextended_sp of the sender frame. While working on this fix I realized the issue occurs not only when the sender is a native nmethod but with all frames associated with a CodeBlob with a frame size > 0 (runtime stub, safepoint stub, etc.), so the check takes that into account. I also made a small fix to next_frame(), since these frames should also use frame::sender().
>
> I created a new test to verify that walking the stack over a native nmethod or runtime stub now works okay. I'll try to add a reliable test case for walking over a safepoint stub too. I tested the fix by running the new test and also running tiers1-4 in mach5.
> I'll run the upper tiers too.
>
> Thanks,
> Patricio

This pull request has now been integrated.

Changeset: 2d38495b
Author: Patricio Chilano Mateo
URL: https://git.openjdk.org/jdk/commit/2d38495b61ec4a8144fe187b5b11883add3dfd49
Stats: 155 lines in 6 files changed: 149 ins; 0 del; 6 mod

8316309: AArch64: VMError::print_native_stack() crashes on Java native method frame

Reviewed-by: lmesnik, aph

-------------

PR: https://git.openjdk.org/jdk/pull/15972

From mdoerr at openjdk.org Mon Oct 16 17:47:46 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 16 Oct 2023 17:47:46 GMT
Subject: RFR: 8318015: Lock inflation not needed for OSR or Deopt for new locking modes [v4]
In-Reply-To:
References:
Message-ID: <8yBW8ra0KCBpckZ-UZKRv_Uz4eoSeTzifwjwL8fo34c=.f520afa0-2fbe-4e3c-879b-f1de1917f4c5@github.com>

On Sat, 14 Oct 2023 09:46:35 GMT, Martin Doerr wrote:

>> Only LockingMode "LM_LEGACY" requires inflation before lock transfers because it is the only one which uses stack addresses in the mark word. I think we should treat the displaced header as stale data because it may be uninitialized.
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
> Move constants to globalDefinitions.hpp.

I'm planning to integrate tomorrow if there are no further comments.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16165#issuecomment-1764970428

From matsaave at openjdk.org Mon Oct 16 18:15:32 2023
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Mon, 16 Oct 2023 18:15:32 GMT
Subject: RFR: 8301997: Move method resolution information out of the cpCache
Message-ID:

The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2.
Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995.

Instances of ConstantPoolCacheEntry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code.

To streamline the review, please consider these major areas that have been changed:
1. ResolvedMethodEntry class
2. Rewriter for initialization of the structure
3. cpCache for resolution
4. InterpreterRuntime, linkResolver, and templateTable
5. JVMCI
6. SA

Verified with tier 1-9 tests.

This change supports the following platforms: x86, aarch64

-------------

Commit messages:
 - 8301997: Move method resolution information out of the cpCache

Changes: https://git.openjdk.org/jdk/pull/15455/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8301997
Stats: 2833 lines in 64 files changed: 880 ins; 1415 del; 538 mod
Patch: https://git.openjdk.org/jdk/pull/15455.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/15455/head:pull/15455

PR: https://git.openjdk.org/jdk/pull/15455

From amenkov at openjdk.org Mon Oct 16 20:12:23 2023
From: amenkov at openjdk.org (Alex Menkov)
Date: Mon, 16 Oct 2023 20:12:23 GMT
Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 [v4]
In-Reply-To:
References:
Message-ID:

On Sat, 14 Oct 2023 20:17:31 GMT, Hannes Greule wrote:

>> See the bug description for more information.
>> >> This implementation brings down the time to take a heap dump on the example application in the bug report to <2 seconds on my machine. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > reword -> initial klass Marked as reviewed by amenkov (Reviewer). test/hotspot/jtreg/serviceability/HeapDump/FieldsInInstanceTest.java line 66: > 64: > 65: interface I { > 66: int i = -10; wrong indent ------------- PR Review: https://git.openjdk.org/jdk/pull/16083#pullrequestreview-1680863228 PR Review Comment: https://git.openjdk.org/jdk/pull/16083#discussion_r1361204991 From duke at openjdk.org Mon Oct 16 20:12:25 2023 From: duke at openjdk.org (Leela Mohan Venati) Date: Mon, 16 Oct 2023 20:12:25 GMT Subject: RFR: 8317466: Enable interpreter oopMapCache for concurrent GCs [v2] In-Reply-To: References: Message-ID: <39f-0nlOdQBABHr1cQOq7jITuRLzM9yLDUEUm1--0N8=.f3aeed11-7151-4885-8376-7d91ca84e8a7@github.com> On Wed, 11 Oct 2023 18:50:14 GMT, Zhengyu Gu wrote: >> Interpreter oop maps are computed lazily during GC root scan and they are expensive to compute. >> >> GCs uses a small hash table per instance class to cache computed oop maps during STW root scan, but not for concurrent root scan. >> >> This patch is intended to enable `OopMapCache` for concurrent GCs. >> >> Test: >> tier1 and tier2 fastdebug and release on MacOSX, Linux 86_84 and Linux 86_32. > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup old oop map cache entry after class redefinition Changes requested by leelamv at github.com (no known OpenJDK username). src/hotspot/share/gc/shenandoah/shenandoahVMOperations.cpp line 64: > 62: OopMapCache::cleanup_old_entries(); > 63: } > 64: Do you think, VM_ShenandoahFinalMarkStartEvac walks the stack roots. If yes, i recommend adding OopMapCache::cleanup_old_entries() in VM_ShenandoahOperation::doit_epilogue(). 
And this would make the change simple and also revert the change in this [PR](https://github.com/openjdk/jdk/pull/15921) src/hotspot/share/oops/method.cpp line 311: > 309: void Method::mask_for(int bci, InterpreterOopMap* mask) { > 310: methodHandle h_this(Thread::current(), this); > 311: method_holder()->mask_for(h_this, bci, mask); Removing this condition allows all the threads including java threads to use/mutate oopMapCache. For ex: Java threads calls [JVM_CallStackWalk](https://github.com/openjdk/jdk/blob/741ae06c55de65dcdfe38e328022bd8dde4fa007/src/hotspot/share/prims/jvm.cpp#L586) which walks the stack and calls locals() and expressions [here](https://github.com/openjdk/jdk/blob/741ae06c55de65dcdfe38e328022bd8dde4fa007/src/hotspot/share/prims/stackwalk.cpp#L345) which access oopMapCache. ------------- PR Review: https://git.openjdk.org/jdk/pull/16074#pullrequestreview-1680858067 PR Review Comment: https://git.openjdk.org/jdk/pull/16074#discussion_r1361210493 PR Review Comment: https://git.openjdk.org/jdk/pull/16074#discussion_r1361202988 From igavrilin at openjdk.org Mon Oct 16 21:14:39 2023 From: igavrilin at openjdk.org (Ilya Gavrilin) Date: Mon, 16 Oct 2023 21:14:39 GMT Subject: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v2] In-Reply-To: References: Message-ID: > Hi all, please review this changes into risc-v floating point copysign and signum intrinsics. > CopySign - returns first argument with the sign of second. On risc-v we have `fsgnj.x` instruction, which can implement this intrinsic. > Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the sign of input value returned. On risc-v we can use `fclass.x` to specify type of input value and return appropriate value. > > Tests: > Performance tests on t-head board: > With intrinsics: > > Benchmark (seed) Mode Cnt Score Error Units > MathBench.copySignDouble 0 thrpt 8 34156.580 ? 76.272 ops/ms > MathBench.copySignFloat 0 thrpt 8 34181.731 ? 
38.182 ops/ms
> MathBench.signumDouble         0  thrpt    8  31977.258 ± 1122.327  ops/ms
> MathBench.signumFloat          0  thrpt    8  31836.852 ±   56.013  ops/ms
>
> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`):
>
> Benchmark                 (seed)   Mode  Cnt      Score      Error   Units
> MathBench.copySignDouble       0  thrpt    8  31000.996 ±  943.094  ops/ms
> MathBench.copySignFloat        0  thrpt    8  30678.016 ±   28.087  ops/ms
> MathBench.signumDouble         0  thrpt    8  25435.010 ± 2047.085  ops/ms
> MathBench.signumFloat          0  thrpt    8  25257.058 ±   79.175  ops/ms
>
> Regression tests: tier1, hotspot:tier2 on risc-v board.
>
> Also, changed the name of one micro test: before we had `sigNumDouble` and `signumFloat` tests, which do not both match either `signum` or `sigNum`. Now they share the same part: `signum`.
> Performance tests have been changed a bit, to check the intrinsics' results better; diff to modify tests:
>
> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> index 6cd1353907e..0bee25366bf 100644
> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> @@ -143,12 +143,12 @@ public double ceilDouble() {
>
>      @Benchmark
>      public double copySignDouble() {
> -        return Math.copySign(double81, doubleNegative12);
> +        return Math.copySign(double81, doubleNegative12) + Math.copySign(double81, double2) + Math.copySign(double4Dot1, doubleNegative12);
>      }
>
>      @Benchmark
>      public float copySignFloat() {
> -        return Math.copySign(floatNegative99, float1);
> +        return Math.copySign(floatNegative99, float1) + Math.copySign(eFloat, float1) + Math.copySign...
Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: Fix some registers usages and typos ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16186/files - new: https://git.openjdk.org/jdk/pull/16186/files/6939eff8..b0a53a0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16186&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16186&range=00-01 Stats: 14 lines in 3 files changed: 1 ins; 2 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/16186.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16186/head:pull/16186 PR: https://git.openjdk.org/jdk/pull/16186 From dlong at openjdk.org Mon Oct 16 23:21:27 2023 From: dlong at openjdk.org (Dean Long) Date: Mon, 16 Oct 2023 23:21:27 GMT Subject: RFR: 8314258: checked_cast doesn't properly check some cases In-Reply-To: References: Message-ID: On Mon, 16 Oct 2023 02:03:26 GMT, Kim Barrett wrote: >> src/hotspot/share/gc/g1/g1CodeRootSet.cpp line 192: >> >>> 190: // we would grow again quickly. >>> 191: const float WantedLoadFactor = 0.5; >>> 192: assert((current_size / WantedLoadFactor) <= SIZE_MAX, "table overflow"); >> >> Surprisingly, this might not work. See https://bugs.openjdk.org/browse/JDK-8287052. > > It looks like for clang we should use -Wimplicit-int-conversion instead of > gcc's "-Wconversion -fno-float-conversion". clang seems to have a much richer > set of warning controls in this area than does gcc. > > It looks like for clang we should use -Wimplicit-int-conversion instead of > gcc's "-Wconversion -fno-float-conversion". clang seems to have a much richer > set of warning controls in this area than does gcc. > > That way we don't implicitly get -Wimplicit-int-float-conversion, which is > what is triggering the warning mentioned in JDK-8287052. In this case the > loss of precision leading to that warning does not seem important. 
In my experiments, values up to 0x4000000000000200 will get past the assert, but then we should hit a different assert later in round_up_power_of_2(). SIZE_MAX can't be represented exactly as a double, so it gets rounded up, and then that assert doesn't correctly check for overflow.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16005#discussion_r1361351061

From dlong at openjdk.org Mon Oct 16 23:21:29 2023
From: dlong at openjdk.org (Dean Long)
Date: Mon, 16 Oct 2023 23:21:29 GMT
Subject: RFR: 8314258: checked_cast doesn't properly check some cases
In-Reply-To:
References:
Message-ID:

On Mon, 2 Oct 2023 03:10:29 GMT, Kim Barrett wrote:

> Please review this improvement to the `checked_cast` utility.
>
> checked_cast was added by JDK-8255544 to permit silencing of certain compiler
> warnings (such as from gcc's -Wconversion) for narrowing conversions when the
> value is "known" to be safely convertible. It provides debug-only runtime
> verification that the conversion preserves the value while changing the type.
>
> There has been a recent effort to apply checked_cast to eliminate -Wconversion
> warnings, with the eventual goal of turning on such warnings by default - see
> JDK-8135181.
>
> The existing implementation checks that the value is unchanged by a round-trip
> conversion, and has no restrictions on the arguments. There are several
> problems with this.
>
> (1) There are some cases where conversion of an integral value to a different
> integral type may pass the check, even though the value isn't in the range of
> the destination type.
>
> (2) Floating point to integral conversions are often intended to discard the
> fractional part. But that won't pass the round-trip conversion test, making
> checked_cast mostly useless for such conversions.
>
> (3) Integral to floating point conversions are often intended to be
> indifferent to loss of precision.
But again, that won't pass the round-trip > conversion test, making checked_cast mostly useless for such conversions. > > This change to checked_cast supports integral to integral conversions, but not > conversions involving floating point types. The intent is that we'll use > "-Wconversion -Wno-float-conversion" instead of -Wconversion alone. If/when > we later want to enable -Wfloat-conversion, we can either extend checked_cast > for that purpose, or probably better, add new functions tailored for the > various use-cases. > > It also supports enum to integral conversions, mostly for compatibility with > old code that uses class-scoped enums instead of class-scoped static const > integral members, to work around ancient broken compilers. We still have a > lot of such code. > > This new checked_cast ensures (in debugging builds) that the value being > converted is in the range of the destination type. It does so while avoiding > tautological comparisons, as some versions of some compilers may warn about > such. Note that this means it can also be used to suppress -Wsign-conversion > warnings (which are not included in -Wconversion when compiling C++), which we > might explore enabling in the future. > > It also verifies a runtime check is needed, producing a compile-time error if > not. Unnecessary checked_cast... src/hotspot/share/gc/g1/g1CodeRootSet.cpp line 193: > 191: const float WantedLoadFactor = 0.5; > 192: assert((current_size / WantedLoadFactor) <= SIZE_MAX, "table overflow"); > 193: size_t min_expected_size = current_size / WantedLoadFactor; Is it UB or compiler-specific behavior if the double --> size_t overflows? Because gcc returns SIZE_MAX. 
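
As far as I know, the C++ standard leaves a floating-point to integer conversion undefined when the truncated value does not fit the destination type, so whatever a particular gcc build returns should not be relied on. The sketch below illustrates the rounding problem described earlier in this thread and one possible overflow-safe check. It assumes a 64-bit `size_t`; `two_pow_64`, `fits_in_size_t` and `grow_target` are hypothetical names for illustration, not HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: size_t is 64 bits wide, as on the LP64 platforms discussed here.
static_assert(sizeof(std::size_t) == 8, "sketch assumes a 64-bit size_t");

// 2^64 is exactly representable as a double; SIZE_MAX (2^64 - 1) is not and
// rounds up to 2^64 when converted, which is why "value <= SIZE_MAX" is too lax.
constexpr double two_pow_64 = 18446744073709551616.0;

// Hypothetical helper: true if 'v' can be truncated to size_t without overflow.
// The strict '<' against 2^64 avoids the rounded-up SIZE_MAX, and the
// 'v >= 0.0' comparison is false for NaN, so NaN is rejected as well.
inline bool fits_in_size_t(double v) {
  return v >= 0.0 && v < two_pow_64;
}

// Hypothetical helper mirroring "current_size / WantedLoadFactor".
// Returns false instead of performing the conversion when it would overflow.
inline bool grow_target(std::size_t current_size, float load_factor, std::size_t* result) {
  double wanted = static_cast<double>(current_size) / load_factor;
  if (!fits_in_size_t(wanted)) {
    return false;
  }
  *result = static_cast<std::size_t>(wanted);
  return true;
}
```

With a load factor of 0.5, `grow_target(1024, 0.5f, &r)` yields 2048, while a `current_size` of `SIZE_MAX` is rejected before the conversion is attempted.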
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16005#discussion_r1361351563 From cslucas at openjdk.org Mon Oct 16 23:24:24 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 16 Oct 2023 23:24:24 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges In-Reply-To: References: Message-ID: On Mon, 16 Oct 2023 09:32:40 GMT, Tobias Hartmann wrote: >> ### Description >> >> Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", or objects returned from method calls, etc. Please review this Pull Request that improves Reduce Allocation Merge implementation so that it can reduce at least some of these allocation merges. >> >> Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP. >> >> The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing. >> >> ### Benchmarking >> >> **Note:** In some of these tests no reduction happens. I left them in to validate that no perf. regression happens in that case. >> **Note 2:** Marging of error was negligible. 
>>
>> | Benchmark                      | No RAM (ms/op) | Yes RAM (ms/op) |
>> |--------------------------------|----------------|-----------------|
>> | TestTrapAfterMerge             | 19.515         | 13.386          |
>> | TestArgEscape                  | 33.165         | 33.254          |
>> | TestCallTwoSide                | 70.547         | 69.427          |
>> | TestCmpAfterMerge              | 16.400         | 2.984           |
>> | TestCmpMergeWithNull_Second    | 27.204         | 27.293          |
>> | TestCmpMergeWithNull           | 8.248          | 4.920           |
>> | TestCondAfterMergeWithAllocate | 12.890         | 5.252           |
>> | TestCondAfterMergeWithNull     | 6.265          | 5.078           |
>> | TestCondLoadAfterMerge         | 12.713         | 5.163           |
>> | TestConsecutiveSimpleMerge     | 30.863         | 4.068           |
>> | TestDoubleIfElseMerge          | 16.069         | 2.444           |
>> | TestEscapeInCallAfterMerge     | 23.111         | 22.924          |
>> | TestGlobalEscape               | 14.459         | 14.425          |
>> | TestIfElseInLoop               | 246.061        | 42.786          |
>> | TestLoadAfterLoopAlias         | 45.808         | 45.812          |
>> ...
>
> I'm still seeing the following failures:
>
>
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error (/workspace/open/src/hotspot/share/opto/escape.cpp:1299), pid=1574160, tid=1574500
> # assert(false) failed: SafePointScalarMerge nodes can't be nested.
> #
> # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
> # Problematic frame:
> # V [libjvm.so+0xab151c] ConnectionGraph::verify_ram_nodes(Compile*, Node*)+0x6e8
>
> Current CompileTask:
> C2:39141 8262 !
4 akka.actor.ActorCell::invokeAll$1 (577 bytes) > > Stack: [0x0000fffea024c000,0x0000fffea044a000], sp=0x0000fffea0444d50, free space=2019k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0xab151c] ConnectionGraph::verify_ram_nodes(Compile*, Node*)+0x6e8 (escape.cpp:1299) > V [libjvm.so+0x90d1e4] Compile::Optimize()+0x744 (compile.cpp:2336) > V [libjvm.so+0x90f098] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1504 (compile.cpp:854) > V [libjvm.so+0x75b12c] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x10c (c2compiler.cpp:130) > V [libjvm.so+0x91b124] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x8e4 (compileBroker.cpp:2282) > V [libjvm.so+0x91bc3c] CompileBroker::compiler_thread_loop()+0x5bc (compileBroker.cpp:1943) > V [libjvm.so+0xdb4bc0] JavaThread::thread_main_inner()+0xec (javaThread.cpp:720) > V [libjvm.so+0x1600764] Thread::call_run()+0xb0 (thread.cpp:220) > V [libjvm.so+0x1368ff8] thread_native_entry(Thread*)+0x138 (os_linux.cpp:785) > C [libc.so.6+0x82a28] start_thread+0x2d4 > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/workspace/open/src/hotspot/share/opto/narrowptrnode.cpp:84), pid=3481386, tid=3481478 > # assert(t != TypeNarrowKlass::NULL_PTR) failed: null klass? > # > # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x140fcf4] DecodeNKlass... 
Thank you again for running the tests @TobiHartmann ------------- PR Comment: https://git.openjdk.org/jdk/pull/15825#issuecomment-1765416119 From dholmes at openjdk.org Tue Oct 17 02:18:38 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 17 Oct 2023 02:18:38 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v10] In-Reply-To: <-uTYDm4GQeD1SeD-qNyK_s0jrNC5Ft9tv3ZdPcAz4ts=.1b1077c4-4f69-41ee-bf2f-de3364d8770e@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <-uTYDm4GQeD1SeD-qNyK_s0jrNC5Ft9tv3ZdPcAz4ts=.1b1077c4-4f69-41ee-bf2f-de3364d8770e@github.com> Message-ID: On Mon, 16 Oct 2023 16:22:27 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: > > - Comments only. > - Review feedback src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5169: > 5167: // Perform a little arithmetic to make sure that denormal > 5168: // numbers are handled correctly, i.e. that the "Denormals Are > 5169: // Zeros" flag has not been set. I don't understand what this part is doing. I thought it was simply checking so you could log/warn if the unexpected mode was detected. 
But it seems to cause MXCSR to not be restored when there is an issue, where I would expect you would always want to restore to overwrite the invalid mode the JNI call made. ??

src/hotspot/os/linux/os_linux.cpp line 1817:

> 1815: fenv_t default_fenv;
> 1816: int rtn = fegetenv(&default_fenv);
> 1817: assert(rtn == 0, "fegetnv must succeed");

typo: fegetnv -> fegetenv

src/hotspot/share/runtime/stubRoutines.cpp line 330:

> 328: // _small_denormal is the smallest denormal number that has two bits
> 329: // set. _large_denormal is a number such that, when _small_denormal
> 330: // is added it it, must be rounded according to the mode. These two

s/it it, must/to it, it must/

test/hotspot/jtreg/compiler/floatingpoint/TestDenormalDouble.java line 42:

> 40: for (double x = lastDouble * 2; x <= 0x1.0p1022; x *= 2) {
> 41: if (x != x || x <= lastDouble) {
> 42: throw new AssertionError("TEST FAILED: " + x);

Tests don't normally use AssertionError like this, just a plain Error or RuntimeException.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1361432675
PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1361412874
PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1361427020
PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1361428895

From duke at openjdk.org Tue Oct 17 02:27:29 2023
From: duke at openjdk.org (duke)
Date: Tue, 17 Oct 2023 02:27:29 GMT
Subject: Withdrawn: 8314571: GrowableArray should move its old data and not copy it
In-Reply-To: <5qhT7CDsB-wvvdPHCEWjzx8xzZ_AFHtIgRr3ugtQD2Y=.92d80b23-715a-44c6-afbb-e8babf19fd2c@github.com>
References: <5qhT7CDsB-wvvdPHCEWjzx8xzZ_AFHtIgRr3ugtQD2Y=.92d80b23-715a-44c6-afbb-e8babf19fd2c@github.com>
Message-ID:

On Fri, 18 Aug 2023 10:39:28 GMT, Johan Sjölen wrote:

> Given some `GrowableArray<E>` where `E` is non-copyable with a move constructor will currently fail to compile.
This is because `GrowableArray`'s expand and shrink calls the copy constructor. We cast the values to rvalues (akin to `std::move`) to instead call the move constructor if available. If there is no move constructor but there is a copy constructor, then that will be called instead. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/15344 From rehn at openjdk.org Tue Oct 17 06:52:26 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 17 Oct 2023 06:52:26 GMT Subject: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v2] In-Reply-To: References: Message-ID: <3vQCTZow94rNSxugl2YprwwejDZQdeSOjfpV2DerClQ=.3236fb4c-02e1-47f0-82e7-bdb6843dd836@github.com> On Mon, 16 Oct 2023 21:14:39 GMT, Ilya Gavrilin wrote: >> Hi all, please review this changes into risc-v floating point copysign and signum intrinsics. >> CopySign - returns first argument with the sign of second. On risc-v we have `fsgnj.x` instruction, which can implement this intrinsic. >> Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the sign of input value returned. On risc-v we can use `fclass.x` to specify type of input value and return appropriate value. >> >> Tests: >> Performance tests on t-head board: >> With intrinsics: >> >> Benchmark (seed) Mode Cnt Score Error Units >> MathBench.copySignDouble 0 thrpt 8 34156.580 ? 76.272 ops/ms >> MathBench.copySignFloat 0 thrpt 8 34181.731 ? 38.182 ops/ms >> MathBench.signumDouble 0 thrpt 8 31977.258 ? 1122.327 ops/ms >> MathBench.signumFloat 0 thrpt 8 31836.852 ? 56.013 ops/ms >> >> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`): >> >> Benchmark (seed) Mode Cnt Score Error Units >> MathBench.copySignDouble 0 thrpt 8 31000.996 ? 943.094 ops/ms >> MathBench.copySignFloat 0 thrpt 8 30678.016 ? 28.087 ops/ms >> MathBench.signumDouble 0 thrpt 8 25435.010 ? 
2047.085 ops/ms
>> MathBench.signumFloat          0  thrpt    8  25257.058 ±   79.175  ops/ms
>>
>> Regression tests: tier1, hotspot:tier2 on risc-v board.
>>
>> Also, changed the name of one micro test: before we had `sigNumDouble` and `signumFloat` tests, which do not both match either `signum` or `sigNum`. Now they share the same part: `signum`.
>> Performance tests have been changed a bit, to check the intrinsics' results better; diff to modify tests:
>>
>> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> index 6cd1353907e..0bee25366bf 100644
>> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> @@ -143,12 +143,12 @@ public double ceilDouble() {
>>
>>      @Benchmark
>>      public double copySignDouble() {
>> -        return Math.copySign(double81, doubleNegative12);
>> +        return Math.copySign(double81, doubleNegative12) + Math.copySign(double81, double2) + Math.copySign(double4Dot1, doubleNegative12);
>>      }
>>
>>      @Benchmark
>>      public float copySignFloat() {
>> -        return Math.copySign(floatNegative99, float1);
>> +        return ...
>
> Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix some registers usages and typos

Hey, I went ahead and created https://bugs.openjdk.org/browse/JDK-8318216. So we can track these and not do double work.
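
For readers tracking these intrinsics, the semantics quoted above can be written down as a small portable reference. This is only a sketch of what the intrinsics are expected to compute, not the RISC-V code from the PR: `copy_sign_ref` and `signum_ref` are hypothetical names, and `std::copysign` stands in for `Math.copySign`.

```cpp
#include <cassert>
#include <cmath>

// copySign: the magnitude of the first argument with the sign of the second.
inline double copy_sign_ref(double magnitude, double sign) {
  return std::copysign(magnitude, sign);
}

// signum: +/-0.0 and NaN are returned unchanged; any other input yields
// 1.0 carrying the sign of the input.
inline double signum_ref(double x) {
  if (x == 0.0 || std::isnan(x)) {
    return x;                       // covers +0.0, -0.0 and NaN
  }
  return std::copysign(1.0, x);
}
```

A reference like this can be handy for cross-checking an intrinsic against the special cases (signed zeros, NaN, infinities) that a throughput benchmark alone does not exercise.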
------------- PR Comment: https://git.openjdk.org/jdk/pull/16186#issuecomment-1765775211 From epeter at openjdk.org Tue Oct 17 07:14:24 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 17 Oct 2023 07:14:24 GMT Subject: RFR: 8318078: ADLC: pass ASSERT and PRODUCT flags [v2] In-Reply-To: References: <-CKTyTM4X49hqWzyGGdBK3BaX3iSADnVEXQTq_j94lI=.ad1fece3-e049-4b5d-bbf3-3dd714b74bf6@github.com> Message-ID: On Mon, 16 Oct 2023 16:03:39 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> add comments like Vladimir requested > > Good Thanks @vnkozlov @TobiHartmann for the help figuring this out. And thanks @erikj79 @magicus for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16178#issuecomment-1765803123 From epeter at openjdk.org Tue Oct 17 07:17:29 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 17 Oct 2023 07:17:29 GMT Subject: Integrated: 8318078: ADLC: pass ASSERT and PRODUCT flags In-Reply-To: <-CKTyTM4X49hqWzyGGdBK3BaX3iSADnVEXQTq_j94lI=.ad1fece3-e049-4b5d-bbf3-3dd714b74bf6@github.com> References: <-CKTyTM4X49hqWzyGGdBK3BaX3iSADnVEXQTq_j94lI=.ad1fece3-e049-4b5d-bbf3-3dd714b74bf6@github.com> Message-ID: On Fri, 13 Oct 2023 09:49:48 GMT, Emanuel Peter wrote: > @vnkozlov asked me to guard some debug AD file rules in `#ifdef ASSERT`. https://github.com/openjdk/jdk/pull/14785#discussion_r1349391130 > > We discovered that the `ASSERT` and `PRODUCT` are not yet passed to ADLC, and hence they are always considered `undefined`. Hence, all of these `ifdef` blocks would always be ignored. > > **Solution** > I added the flags to `make/hotspot/gensrc/GensrcAdlc.gmk`, just like in `make/hotspot/lib/JvmFlags.gmk`. > > As @erikj79 commented: we should probably unify this. But I leave that to the build team. 
>
> **Testing**
> With this code you can see what flags are passed to ADLC:
>
> --- a/src/hotspot/share/adlc/main.cpp
> +++ b/src/hotspot/share/adlc/main.cpp
> @@ -56,6 +56,11 @@ int main(int argc, char *argv[])
>    // Check for proper arguments
>    if( argc == 1 ) usage(AD); // No arguments? Then print usage
>
> +  for( int i = 1; i < argc; i++ ) { // For all arguments
> +    char *s = argv[i]; // Get option/filename
> +    fprintf(stderr, "ARGV[%d] %s\n", i, s);
> +  }
> +
>    // Read command line arguments and file names
>    for( int i = 1; i < argc; i++ ) { // For all arguments
>      char *s = argv[i]; // Get option/filename
>
>
> On `linux-x64` I get:
>
> ARGV[1] -q
> ARGV[2] -T
> ARGV[3] -DLINUX=1
> ARGV[4] -D_GNU_SOURCE=1
> ARGV[5] -g
> ARGV[6] -DAMD64=1
> ARGV[7] -D_LP64=1
> ARGV[8] -DNDEBUG
> ARGV[9] -DPRODUCT
>
>
> And on `linux-x64-debug` I get:
>
> ARGV[1] -q
> ARGV[2] -T
> ARGV[3] -DLINUX=1
> ARGV[4] -D_GNU_SOURCE=1
> ARGV[5] -g
> ARGV[6] -DAMD64=1
> ARGV[7] -D_LP64=1
> ARGV[8] -DASSERT
>
>
> I verified that the `#ifdef`s work as expected, by adding this code to `src/hotspot/cpu/x86/x86.ad`:
>
> #ifdef ASSERT
> #ifdef PRODUCT
> control
> #endif
> #endif
>
> #ifdef ASSERT
> xxx
> #endif
>
> #ifdef PRODUCT
> yyy
> #endif
>
> When compiling, I get complaints for `yyy` on `linux-x64` and for `xxx` on `linux-x64-debug`. But since `ASSERT` and `PRODUCT` never occur together, we never get complaints about `control`.
>
> **Tier1-3 and stress testing passed.**

This pull request has now been integrated.
Changeset: 504b0bda Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/504b0bdaaa7fb7c822014d8bd2845299fbdaf0e8 Stats: 15 lines in 1 file changed: 15 ins; 0 del; 0 mod 8318078: ADLC: pass ASSERT and PRODUCT flags Reviewed-by: ihse, erikj, kvn ------------- PR: https://git.openjdk.org/jdk/pull/16178 From ayang at openjdk.org Tue Oct 17 07:34:22 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 17 Oct 2023 07:34:22 GMT Subject: RFR: 8317350: Move code cache purging out of CodeCache::UnloadingScope In-Reply-To: References: Message-ID: On Mon, 16 Oct 2023 14:37:04 GMT, Thomas Schatzl wrote: > It is maybe an insignificant detail in the context of stw collectors Then, could Serial and Parallel use APIs that don't expose these details? For instance, move `flush_unlinked_nmethods` inside `CodeCache::do_unloading`, as it is used only by those collectors. Why is `flush_unlinked_nmethods` outside of `UnloadingScope`? This newly-introduced scope in the caller context seems extremely out of place, IMO. > Putting it on this level also allows more straightforward logging. I don't get this. Can't see any log-print logic inside `flush_unlinked_nmethods`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16011#discussion_r1361632099 From sjohanss at openjdk.org Tue Oct 17 07:49:30 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Tue, 17 Oct 2023 07:49:30 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v7] In-Reply-To: References: Message-ID: <1EPccTSz8pdipxiCK9_w4UiQnFKzoLg132Vp8DCsGHo=.0a7e9d67-bd00-4629-a8ac-2b7ab1f4263a@github.com> On Mon, 16 Oct 2023 07:56:14 GMT, Kim Barrett wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> const-element nomenclature, other review comments > > src/hotspot/share/utilities/intrusiveList.hpp line 87: > >> 85: * * Base is the base class for the list. 
This is typically >> 86: * used to specify the allocation class, such as CHeapObj<>. The default >> 87: * is void, indicating the list is not derived from an allocation class. > > I'm not certain this Base class for allocation support is actually needed. I remember one of the alternatives had > (or used to have?) allocation base class support, but haven't found it when I looked recently. But we have a lot > of these doubly-linked-lists in HotSpot. Do we have a use-case for a "heap" allocated bare (as in not > embedded in some other object) list? Removing it would save ~25 lines of code/comments. Did a quick skim as well and yes, quite a few ad-hoc lists in the code base. Did not see a case needing this and I think removing this makes sense if we don't see an obvious use-case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1361667266 From sjohanss at openjdk.org Tue Oct 17 07:49:33 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Tue, 17 Oct 2023 07:49:33 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v7] In-Reply-To: <1PcaddoCAHSctiUdslv577vO1CO5LbElCt407lvYHJM=.2813c0ac-ad5c-4d08-a4fa-91aa868da9a3@github.com> References: <1PcaddoCAHSctiUdslv577vO1CO5LbElCt407lvYHJM=.2813c0ac-ad5c-4d08-a4fa-91aa868da9a3@github.com> Message-ID: On Mon, 16 Oct 2023 11:10:14 GMT, Ivan Walulya wrote: >> src/hotspot/share/utilities/intrusiveList.hpp line 994: >> >>> 992: const_reference operator[](size_type n) const { >>> 993: return nth_element(cbegin(), cend(), n); >>> 994: } >> >> Do we need these operator[]'s? Neither std::list nor boost::intrusive::list have such, and I don't think any of the >> existing intrusive lists in HotSpot have such either. Maybe there's no real use-case. Removal would save >> 25-30 lines. > > If we cannot find a real use-case for HotSpot, then maybe we shouldn't include them. I agree with Ivan. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1361668049 From kbarrett at openjdk.org Tue Oct 17 09:07:26 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 17 Oct 2023 09:07:26 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v8] In-Reply-To: References: Message-ID: > Please review this new facility, providing a general mechanism for intrusive > doubly-linked lists. A class supports inclusion in a list by having an > IntrusiveListEntry member, and providing structured information about how to > access that member. A class supports inclusion in multiple lists by having > multiple IntrusiveListEntry members, with different lists specified to use > different members. > > The IntrusiveList class template provides the list management. It is modelled > on bidirectional containers such as std::list and boost::intrusive::list, > providing many of the expected member types and functions. (Note that the > member types use the Standard's naming conventions.) (Not all standard > container requirements are met; some operations are not presently supported > because they haven't been needed yet.) This includes iteration support using > (mostly) standard-conforming iterator types (they are presently missing > iterator_category member types, pending being able to include so we > can use std::bidirectional_iterator_tag). > > This change only provides the new facility, and doesn't include any uses of > it. It is intended to replace the 4-5 (or maybe more) competing intrusive > doubly-linked lists presently in HotSpot. Unlike most (or perhaps all?) of > those alterantives, this proposal provides a suite of unit tests. > > An example of a place that I think might benefit from this is G1's region > handling. There are various places where G1 iterates over all regions in order > to do something with those which satisfy some property (humongous regions, > regions in the collection set, &etc). 
If it were trivial to create new region > sublists (and this facility makes that easy), some of these could be turned > into direct iteration over only the regions of interest. > > Some specific points to consider when reviewing this proposal: > > (1) This proposal follows Standard Library API conventions, which differ from > HotSpot in various ways. > > (1a) Lists and iterators provide various type members, with names per the > Standard Library. There has been discussion of using some parts of the > Standard Library eventually, in which case this would be important. But for > now some of the naming choices are atypical for HotSpot. > > (1b) Some of the function signatures follow the Standard Library APIs even > though the reasons for that form might not apply to HotSpot. For example, the > list pop operations don't return the removed value. For node-based containers > in Standard Library that would introduce exception... Kim Barrett has updated the pull request incrementally with four additional commits since the last revision: - remove unnecessary code markers in comments - use override for virtual test support functions - remove support for allocation base classes - remove IntrusiveList::operator[] ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15896/files - new: https://git.openjdk.org/jdk/pull/15896/files/be191f3b..a6f3a8c8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15896&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15896&range=06-07 Stats: 131 lines in 2 files changed: 0 ins; 112 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/15896.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15896/head:pull/15896 PR: https://git.openjdk.org/jdk/pull/15896 From kbarrett at openjdk.org Tue Oct 17 09:07:29 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 17 Oct 2023 09:07:29 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v7] In-Reply-To: 
<1EPccTSz8pdipxiCK9_w4UiQnFKzoLg132Vp8DCsGHo=.0a7e9d67-bd00-4629-a8ac-2b7ab1f4263a@github.com> References: <1EPccTSz8pdipxiCK9_w4UiQnFKzoLg132Vp8DCsGHo=.0a7e9d67-bd00-4629-a8ac-2b7ab1f4263a@github.com> Message-ID: On Tue, 17 Oct 2023 07:46:33 GMT, Stefan Johansson wrote: >> src/hotspot/share/utilities/intrusiveList.hpp line 87: >> >>> 85: * * Base is the base class for the list. This is typically >>> 86: * used to specify the allocation class, such as CHeapObj<>. The default >>> 87: * is void, indicating the list is not derived from an allocation class. >> >> I'm not certain this Base class for allocation support is actually needed. I remember one of the alternatives had >> (or used to have?) allocation base class support, but haven't found it when I looked recently. But we have a lot >> of these doubly-linked-lists in HotSpot. Do we have a use-case for a "heap" allocated bare (as in not >> embedded in some other object) list? Removing it would save ~25 lines of code/comments. > > Did a quick skim as well and yes, quite a few ad-hoc lists in the code base. Did not see a case needing this and I think removing this makes sense if we don't see an obvious use-case. OK, removed. We can always add it back if we find a use-case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1361772816 From kbarrett at openjdk.org Tue Oct 17 09:07:33 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 17 Oct 2023 09:07:33 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v7] In-Reply-To: References: <1PcaddoCAHSctiUdslv577vO1CO5LbElCt407lvYHJM=.2813c0ac-ad5c-4d08-a4fa-91aa868da9a3@github.com> Message-ID: <4iL2wCerrcRTSmU2mwonVHZ-1_V3WPFa0BB7bD9_Cjk=.8447dc22-9cfb-4d10-8ec7-54bb06e0eb0c@github.com> On Tue, 17 Oct 2023 07:47:05 GMT, Stefan Johansson wrote: >> If we cannot find a real use-case for HotSpot, then maybe we shouldn't include them. > > I agree with Ivan. OK, removed. We can always add it back if we find a use-case. 
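
For readers new to the idea under review, here is a deliberately small sketch of an intrusive doubly-linked list. It is not the proposed `IntrusiveList` API. For brevity it uses tagged base-class hooks rather than the member entries (`IntrusiveListEntry`) described in the RFR, and `Hook`, `List`, `Region` and the tag types are all hypothetical names. What it is meant to show is that the links live inside the element, so one object can sit on several lists at once without any per-node allocation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical hook: an element derives from Hook<Tag> once per list it may join.
template <typename Tag>
struct Hook {
  Hook* prev = nullptr;
  Hook* next = nullptr;
  bool is_attached() const { return next != nullptr; }
};

// Hypothetical list of T linked through T's Hook<Tag> base.
template <typename T, typename Tag>
class List {
  using H = Hook<Tag>;
  H _head;                 // sentinel; the list is circular through it
  std::size_t _size = 0;

public:
  List() { _head.prev = _head.next = &_head; }
  List(const List&) = delete;
  List& operator=(const List&) = delete;

  bool empty() const { return _size == 0; }
  std::size_t size() const { return _size; }

  // Links 'value' in at the tail; no allocation takes place.
  void push_back(T& value) {
    H* h = &value;         // upcast to the hook for this list
    assert(!h->is_attached());
    h->prev = _head.prev;
    h->next = &_head;
    _head.prev->next = h;
    _head.prev = h;
    ++_size;
  }

  // Unlinks 'value' in constant time, given only the element itself.
  void erase(T& value) {
    H* h = &value;
    assert(h->is_attached());
    h->prev->next = h->next;
    h->next->prev = h->prev;
    h->prev = h->next = nullptr;
    --_size;
  }

  T& front() {
    assert(!empty());
    return *static_cast<T*>(_head.next);
  }

  template <typename F>
  void for_each(F f) {
    for (H* h = _head.next; h != &_head; h = h->next) {
      f(*static_cast<T*>(h));
    }
  }
};

// Hypothetical element that can be on two lists at the same time.
struct AllTag {};
struct HumongousTag {};

struct Region : Hook<AllTag>, Hook<HumongousTag> {
  int id;
  explicit Region(int i) : id(i) {}
};
```

In the spirit of the G1 example from the RFR, a `List<Region, HumongousTag>` could hold just the regions of interest while every region also stays on a `List<Region, AllTag>`.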
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1361772984 From kbarrett at openjdk.org Tue Oct 17 09:07:31 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 17 Oct 2023 09:07:31 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v3] In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 10:25:49 GMT, Johan Sjölen wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> add IntrusiveListEntry::is_attached() > > src/hotspot/share/utilities/intrusiveList.hpp line 99: > >> 97: * specialization of the IntrusiveList class, e.g. >> 98: * >> 99: * > > We don't need these tags Removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1361772611 From tschatzl at openjdk.org Tue Oct 17 09:21:22 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 17 Oct 2023 09:21:22 GMT Subject: RFR: 8317350: Move code cache purging out of CodeCache::UnloadingScope In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 07:31:51 GMT, Albert Mingkun Yang wrote: >Then, could Serial and Parallel use APIs that don't expose these details? There is no such API. This change does not intend to expose such a new API too. Just moving the various phases of code/class unloading to the same level in the source code as apparent to me to keep surprises low (and simplify logging to be able to _see_ problems in the first place. Then we can fix them in subsequent PRs). Also I would like to keep the existing structure of class/code unloading for all collectors for uniformity (e.g. separate unlinking from free-unlinking - maybe call it "purging" in the future?) - so that they will have the same structure and print the same logging messages in the same order in the future (at least for the STW collectors including G1). Note that other collectors have exactly the same phases, and unlinking is separate from purging everywhere else too.
They just re-implement some phases using their own code (doing some renames in the process). Having both the same structure/phases and same (more comprehensive) logging output for more collectors will make troubleshooting much easier. There is no renaming in this change, e.g. `CodeCache::do_unloading()` (="unlinking") as this method does not do the complete unloading either indicated by having this external `flush_unlinked_nmethods` (="purging"). This is not the change to do this imo. (Iirc `CodeCache::do_unloading()` still has does some purging in it anyway, but let's not digress too much here). > For instance, move flush_unlinked_nmethods inside CodeCache::do_unloading, as it is used only by those collectors. It could, but then for (future) logging you would have to pass `GCTraceTimer` as the common way to do timing in gc code into `do_unloading()` which is some compiler code. Not sure if this is what we want as it is imo awkward. It is imho better, cleaner and sufficient to have timing outside at least initially. I.e. I would at this point prefer the style of { // Phase x GCTraceTimer x("do phase X"); do_phase_x(); } as how to time can be heavily collector dependent too. I would at the end of all that refactoring see if there is some useful coalescing of the methods into helpers that can be done. Maybe these scopes should be put into separate helper methods, I haven't decided yet what's best. >Why is flush_unlinked_nmethods outside of UnloadingScope? This newly-introduced scope in the caller context seems extremely out of place, IMO. I believe most of your questions stem from the naming. Please, do not be too hung up on names. The description of this PR already mentioned that `UnloadingScope` is a misnomer (and misnamed code is common in Hotspot code, or responsibilities changed over the course of years without fixing up the naming). 
So in addition to this class, be aware that there are lots of misnamed and not uniformly named methods in class/code unloading code too. `UnloadingScope` controls *unlinking* behavior by setting some globals and does not control the whole unloading process (ie. unlinking + purging). Additionally `UnloadingScope` actually did _not_ really contain `flush_unlinked_nmethods` earlier either. It has been the last call in the destructor of the `UnloadingScope`, _outside_, after the unlinking behavior has been reset. This is even more strange if you think about it. So from my understanding of the code this scope object ought to enclose only the unlinking part of the unloading (i.e. the decision of what to unlink, that is `CodeCache::do_unloading()`) before. This change only at most exposes the existing (as you say ugly - I agree) structure. Which isn't a bad thing to me to have exposed. It's not in scope of this change to fix this imo because that would mix it with other unrelated changes. (`UnloadingScope` should be called something different, maybe after this discussion something like `UnlinkingScope` or similar? Idk.) >> Putting it on this level also allows more straightforward logging. >I don't get this. Can't see any log-print logic inside flush_unlinked_nmethods. ... in the future. (https://bugs.openjdk.org/browse/JDK-8315504)[https://bugs.openjdk.org/browse/JDK-8315504] intends to add more timing/logging for every phase. There is currently no plan for comprehensive logging inside the various phases of the class/code unloading (at least not to the level of selected methods I did in recent investigations). What I intend to do is revisiting the phases in the future and move around work to better reflect the unlinking/purging split, and also link them together with a real `UnloadingScope` covering the whole unloading process (not to be mixed up with the current `UnloadingScope`) to allow gc-specific replacements/optimizations of the phases. 
Obviously there can be helper methods that simplify things for various gcs. This PR isn't this change though. Hth, Thomas ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16011#discussion_r1361795034 From fyang at openjdk.org Tue Oct 17 09:21:28 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 17 Oct 2023 09:21:28 GMT Subject: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v2] In-Reply-To: References: Message-ID: On Mon, 16 Oct 2023 21:14:39 GMT, Ilya Gavrilin wrote:
>> Hi all, please review this changes into risc-v floating point copysign and signum intrinsics.
>> CopySign - returns first argument with the sign of second. On risc-v we have `fsgnj.x` instruction, which can implement this intrinsic.
>> Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the sign of input value returned. On risc-v we can use `fclass.x` to specify type of input value and return appropriate value.
>>
>> Tests:
>> Performance tests on t-head board:
>> With intrinsics:
>>
>> Benchmark                 (seed)   Mode  Cnt      Score      Error   Units
>> MathBench.copySignDouble       0  thrpt    8  34156.580 ±   76.272  ops/ms
>> MathBench.copySignFloat        0  thrpt    8  34181.731 ±   38.182  ops/ms
>> MathBench.signumDouble         0  thrpt    8  31977.258 ± 1122.327  ops/ms
>> MathBench.signumFloat          0  thrpt    8  31836.852 ±   56.013  ops/ms
>>
>> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`):
>>
>> Benchmark                 (seed)   Mode  Cnt      Score      Error   Units
>> MathBench.copySignDouble       0  thrpt    8  31000.996 ±  943.094  ops/ms
>> MathBench.copySignFloat        0  thrpt    8  30678.016 ±   28.087  ops/ms
>> MathBench.signumDouble         0  thrpt    8  25435.010 ± 2047.085  ops/ms
>> MathBench.signumFloat          0  thrpt    8  25257.058 ±   79.175  ops/ms
>>
>> Regression tests: tier1, hotspot:tier2 on risc-v board.
>>
>> Also, changed name of one micro test: before we had: `sigNumDouble` and `signumFloat` tests, they does not matches to `signum` or `sigNum`. Now we have similar part: `signum`.
>> Performance tests has been changed a bit, to check intrinsics result better, diff to modify tests: >> >> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java >> index 6cd1353907e..0bee25366bf 100644 >> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java >> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java >> @@ -143,12 +143,12 @@ public double ceilDouble() { >> >> @Benchmark >> public double copySignDouble() { >> - return Math.copySign(double81, doubleNegative12); >> + return Math.copySign(double81, doubleNegative12) + Math.copySign(double81, double2) + Math.copySign(double4Dot1, doubleNegative12); >> } >> >> @Benchmark >> public float copySignFloat() { >> - return Math.copySign(floatNegative99, float1); >> + return ... > > Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: > > Fix some registers usages and typos Changes requested by fyang (Reviewer). src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1659: > 1657: // on input we have NaN or +/-0.0 value we should return it, > 1658: // otherwise return +/- 1.0 using sign of input. > 1659: // tmp1 - alias for t0 register, Maybe remove this line (L1659) of the comment? src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1663: > 1661: // bool is_double - specififes single or double precision operations will be used. > 1662: void C2_MacroAssembler::signum_fp(FloatRegister dst, FloatRegister src, FloatRegister one, bool is_double) { > 1663: assert_different_registers(dst, src, one); Any reason to keep the assertion? src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1682: > 1680: // use floating-point 1.0 with a sign of input > 1681: is_double ? fsgnj_d(dst, one, src) > 1682: : fsgnj_s(dst, one, src); What if the `src` argument contains zero? Math.signum(float/double) is supposed to return zero if the argument is zero [1]. 
[1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Math.java#L2602 src/hotspot/cpu/riscv/riscv.ad line 7537: > 7535: instruct signumD_reg(fRegD dst, fRegD src, immD zero, fRegD one) %{ > 7536: match(Set dst (SignumD src (Binary zero one))); > 7537: effect(TEMP_DEF dst, USE src, USE one); Any reason to keep this effect? src/hotspot/cpu/riscv/riscv.ad line 7548: > 7546: instruct signumF_reg(fRegF dst, fRegF src, immF zero, fRegF one) %{ > 7547: match(Set dst (SignumF src (Binary zero one))); > 7548: effect(TEMP_DEF dst, USE src, USE one); Any reason to keep this effect? ------------- PR Review: https://git.openjdk.org/jdk/pull/16186#pullrequestreview-1681791129 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1361790804 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1361788992 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1361782859 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1361793207 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1361793356 From shade at openjdk.org Tue Oct 17 09:39:39 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 17 Oct 2023 09:39:39 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v10] In-Reply-To: References: Message-ID: <7DenZ_i83kiIQIVJWaYCsE4kHcGa93PcNXW4VgGbaUw=.a84889a5-77ed-475c-a072-4da395fe4cd0@github.com> On Thu, 12 Oct 2023 14:48:35 GMT, Aleksey Shipilev wrote: >> Work in progress, submitting for broader attention. >> >> See more details in the bug and related issues. >> >> This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. 
The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. >> >> This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in. >> >> Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do an x86_32 version: we need a thread, so the easiest way would be to call into the VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short. I think if we cannot implement this mitigation on some architectures, so be it, it would be a sensible tradeoff for simplicity. >> >> Setting backoff at `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong. >> >> Current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Touchup benchmark metadata @RealFYang, do you want to do the RISC-V version, or should I take a stab at it? @tstuefe, do you want to do the ARM version, or should I take a stab at it?
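The backoff idea in the quoted description can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR (which is emitted as inlined assembly): `Klass` is a stand-in struct, `record_miss` and `tl_backoff_remaining` are hypothetical names, and the exact counting scheme (write once, then skip the next `backoff` misses on this thread) is my reading of "thread-local backoff", not something the thread spells out.

```cpp
#include <atomic>
#include <cstdint>

// Stand-in for the real metadata type; only the cache slot is modelled.
struct Klass {
  std::atomic<const Klass*> secondary_super_cache{nullptr};
};

// Assumed scheme: each thread counts how many more misses it will let pass
// before it is allowed to write the shared cache slot again.
thread_local uint32_t tl_backoff_remaining = 0;

// Called after a slow-path scan found `super` among `sub`'s secondary supers.
// Returns true if the shared cache was actually written.
inline bool record_miss(Klass& sub, const Klass* super, uint32_t backoff) {
  if (tl_backoff_remaining > 0) {
    --tl_backoff_remaining;   // skip the write: no traffic on the shared line
    return false;
  }
  tl_backoff_remaining = backoff;  // backoff == 0 means "always write"
  sub.secondary_super_cache.store(super, std::memory_order_relaxed);
  return true;
}
```

With `backoff` set to `0` the counter never becomes positive, so every miss updates the cache, which matches the statement that `0` effectively disables the mitigation. Keeping the counter thread-local is what keeps the check cheap: the skip path touches no shared memory.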
------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1766044277 From igavrilin at openjdk.org Tue Oct 17 09:58:19 2023 From: igavrilin at openjdk.org (Ilya Gavrilin) Date: Tue, 17 Oct 2023 09:58:19 GMT Subject: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v2] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 09:09:52 GMT, Fei Yang wrote: >> Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix some registers usages and typos > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1682: > >> 1680: // use floating-point 1.0 with a sign of input >> 1681: is_double ? fsgnj_d(dst, one, src) >> 1682: : fsgnj_s(dst, one, src); > > What if the `src` argument contains zero? Math.signum(float/double) is supposed to return zero if the argument is zero [1]. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Math.java#L2602 According to IEEE754, we can get positive or negative zero in the `src` register (also positive zero can be named as zero) , and these cases included to mask for the tmp1 (L1671-1676) and `src` value returned. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1361840030 From adinn at openjdk.org Tue Oct 17 10:36:15 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 17 Oct 2023 10:36:15 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 19:49:23 GMT, Matias Saavedra Silva wrote: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. 
Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64 src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1804: > 1802: // Get index out of bytecode pointer > 1803: get_cache_index_at_bcp(index, bcp_offset, sizeof(u2)); > 1804: // Take shortcut if the size is a power of 2 Comment needs removing (size is not a power of 2). src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2261: > 2259: // volatile-loads. > 2260: > 2261: void TemplateTable::resolve_cache_and_index_for_field(int byte_no, This change looks more confusing than it actually needs because you deleted the definition for `resolve_cache_and_index` that preceded `resolve_cache_and_index_for_field` and then added `resolve_cache_and_index_for_method` after `resolve_cache_and_index_for_field`. 
As a result we see changes made to `resolve_cache_and_index` that re-introduce changes already made for `resolve_cache_and_index_for_field` followed by changes to `resolve_cache_and_index_for_field` that rework it to remove field-specific operations and replace them with method-specific operations. That's going to make it a lot harder for maintainers and back-porters to work out what this change is really doing. If instead you place the definition for `resolve_cache_and_index_for_method` before the definition for `resolve_cache_and_index_for_field` then the diff will show `resolve_cache_and_index` being repurposed to cater for methods and the definition of `resolve_cache_and_index_for_field` should remain unchanged. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1361861892 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1361881145 From aph at openjdk.org Tue Oct 17 10:59:24 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 17 Oct 2023 10:59:24 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v10] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <-uTYDm4GQeD1SeD-qNyK_s0jrNC5Ft9tv3ZdPcAz4ts=.1b1077c4-4f69-41ee-bf2f-de3364d8770e@github.com> Message-ID: On Tue, 17 Oct 2023 02:15:45 GMT, David Holmes wrote: >> Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: >> >> - Comments only. >> - Review feedback > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5169: > >> 5167: // Perform a little arithmetic to make sure that denormal >> 5168: // numbers are handled correctly, i.e. that the "Denormals Are >> 5169: // Zeros" flag has not been set. > > I don't understand what this part is doing. I thought it was simply checking so you could log/warn if the unexpected mode was detected. 
But it seems to cause MXCSR to not be restored when there is an issue, where I would expect you would always want to restore to overwrite the invalid mode the JNI call made. ?? If we reach `FAIL`, MXCSR is reloaded from `addr_mxcsr_std()`, restoring correct IEEE behaviour. That's what RestoreMXCSROnJNICalls is supposed to do, as far as I can tell. But I will take this part out, because there are other flags in MXCSR, so there is a potential compatibility problem if e.g. the Precision Mask were set in a JNI call. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1361920261 From adinn at openjdk.org Tue Oct 17 11:10:16 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 17 Oct 2023 11:10:16 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 19:49:23 GMT, Matias Saavedra Silva wrote: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. 
> > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64 src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2369: > 2367: } > 2368: > 2369: void TemplateTable::load_resolved_method_entry_common(Register cache, When I saw the `common` suffix I thought at first that this might be used in addition to the variants with the other suffixes rather than as an alternative. Since this is 'common' to only to 2 out of the 5 different cases perhaps it could be named `load_resolved_method_entry_special_or_static`. That would make it clear that each invoke type has a corresponding `load_resolved_method_entry_` variant. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1361932042 From adinn at openjdk.org Tue Oct 17 11:13:33 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 17 Oct 2023 11:13:33 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 19:49:23 GMT, Matias Saavedra Silva wrote: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. 
These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64 src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2423: > 2421: const Register index = r4; > 2422: assert_different_registers(method_or_table_index, flags); > 2423: assert_different_registers(method_or_table_index, cache, flags); This assert renders the previous one redundant ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1361935650 From adinn at openjdk.org Tue Oct 17 11:26:11 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 17 Oct 2023 11:26:11 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 19:49:23 GMT, Matias Saavedra Silva wrote: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. 
> > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64 src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2429: > 2427: __ load_unsigned_byte(flags, Address(cache, in_bytes(ResolvedMethodEntry::flags_offset()))); > 2428: > 2429: // table_or_ref_index can either be an itable index or a resolved reference index depending on the bytecode This comment seems to apply to `ref_index` in `load_resolved_method_entry_handle`? Does it need to move up there? Do we need a comment here stating something similar for `method_or_table_index`? (probably not). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1361948091 From aph at openjdk.org Tue Oct 17 11:43:59 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 17 Oct 2023 11:43:59 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v11] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. 
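The "save and restore around the library load" approach described above might look something like the sketch below. It is an assumption-laden illustration, not the patch: `call_preserving_fenv` is a hypothetical helper, the callable stands in for a `dlopen()` call, and whether `fenv_t` actually covers the flush-to-zero and denormals-are-zero bits is platform dependent (assumed here to be the case on the platforms of interest).

```cpp
#include <cfenv>
#include <utility>

// Runs `load` (a stand-in for a dlopen() call) and then restores the
// floating-point environment the caller had before the call.
template <typename F>
auto call_preserving_fenv(F&& load) {
  std::fenv_t saved;
  std::fegetenv(&saved);

  // Restore on every exit path, including exceptions thrown by `load`.
  struct Restore {
    const std::fenv_t* env;
    ~Restore() { std::fesetenv(env); }
  } restore{&saved};

  return std::forward<F>(load)();
}
```

A caller would wrap each load site, for example `call_preserving_fenv([&] { return dlopen(path, RTLD_LAZY); })`. As the description notes, this only covers the call sites that are wrapped; a library that loads another library later on is not caught.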
Andrew Haley has updated the pull request incrementally with three additional commits since the last revision: - Review feedback - Merge branch 'JDK-8295159' of https://github.com/theRealAph/jdk into JDK-8295159 - Remove change to RestoreMXCSROnJNICalls ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10661/files - new: https://git.openjdk.org/jdk/pull/10661/files/a3305e5c..7cba08d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=09-10 Stats: 30 lines in 6 files changed: 4 ins; 21 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From ayang at openjdk.org Tue Oct 17 11:54:53 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 17 Oct 2023 11:54:53 GMT Subject: RFR: 8317350: Move code cache purging out of CodeCache::UnloadingScope In-Reply-To: References: Message-ID: On Mon, 2 Oct 2023 12:56:27 GMT, Thomas Schatzl wrote: > Hi all, > > please review this refactoring that moves actual code cache flushing/purging out of `CodeCache::UnloadingScope`. Reasons: > > * I prefer that a destructor does not do anything substantial - in some cases, 90% of time is spent in the destructor in that extracted method (due to https://bugs.openjdk.org/browse/JDK-8316959) > * imho it does not fit the class which does nothing but sets/resets some code cache unloading behavior (probably should be renamed to `UnloadingBehaviorScope` too in a separate CR). > * other existing methods at that level are placed out of that (or any other) scope object too - which is already the case for when doing concurrent unloading. > * putting it there makes future logging of the various phases a little bit easier, not having `GCTraceTimer` et al. in various places. > > Testing: gha > > Thanks, > Thomas Marked as reviewed by ayang (Reviewer). 
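The shape of the refactoring in the quoted description (a scope object that only sets and resets the unlinking behaviour, with the purge step called outside it) can be modelled in a few lines. Everything below is a hypothetical stand-in: the `sketch` namespace, `UnlinkingScope` (the name floated in the discussion, not an existing class), and the `events` vector that takes the place of real logging.

```cpp
#include <string>
#include <vector>

namespace sketch {

std::vector<std::string> events;          // stand-in for log output
bool unlinking_behaviour_active = false;  // stand-in for the global the scope toggles

// Sets the behaviour on entry and resets it on exit; does no other work.
struct UnlinkingScope {
  UnlinkingScope()  { unlinking_behaviour_active = true;  events.push_back("scope enter"); }
  ~UnlinkingScope() { unlinking_behaviour_active = false; events.push_back("scope exit"); }
};

void do_unloading() {
  events.push_back(unlinking_behaviour_active ? "unlink" : "unlink without behaviour");
}

void flush_unlinked_nmethods() { events.push_back("purge"); }

// Caller-side structure after the change: the destructor is trivial, and the
// purge is a separate, visible phase that a timer could wrap on its own.
void collect() {
  {
    UnlinkingScope scope;
    do_unloading();
  }
  flush_unlinked_nmethods();
}

}  // namespace sketch
```

Calling `sketch::collect()` records the phases in the order scope enter, unlink, scope exit, purge, which makes the point from the review discussion concrete: the purge already ran after the behaviour was reset, and the change only makes that ordering explicit at the call site.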
------------- PR Review: https://git.openjdk.org/jdk/pull/16011#pullrequestreview-1682122861 From ayang at openjdk.org Tue Oct 17 11:54:55 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 17 Oct 2023 11:54:55 GMT Subject: RFR: 8317350: Move code cache purging out of CodeCache::UnloadingScope In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 09:19:05 GMT, Thomas Schatzl wrote: > UnloadingScope should be called something different, maybe after this discussion something like UnlinkingScope or similar? OK, the new-scope starts to make sense if it's named `UnlinkingScope`. In my mind, the concept of `UnloadingScope` includes unlinking + purging; the implementation supports my understanding, as I believe the destructor still belongs to the obj-on-stack. However, I now realize that this interpretation could be subjective. > This change only at most exposes the existing (as you say ugly - I agree) structure. OK, that's all I wanna convey. Maybe this is an inevitable transient state. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16011#discussion_r1361991674 From lkorinth at openjdk.org Tue Oct 17 12:29:46 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Tue, 17 Oct 2023 12:29:46 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v4] In-Reply-To: References: Message-ID: <4pRda3ZAZzVzGiVrDv6LN9Pw__DhrmTm4qZjTHzaq80=.a009bb29-4869-4047-8b62-80fbe7bef692@github.com> > Rename createJavaProcessBuilder so that it is not used by mistake instead of createTestJvm. > > I have used the following sed script: `find -name "*.java" | xargs -n 1 sed -i -e "s/createJavaProcessBuilder(/createJavaProcessBuilderIgnoreTestJavaOpts(/g"` > > Then I have manually modified ProcessTools.java. In that file I have moved one version of createJavaProcessBuilder so that it is close to the other version. Then I have added a javadoc comment in bold telling: > > /** > * Create ProcessBuilder using the java launcher from the jdk to > * be tested. 
> * > *

Please observe that you likely should use > * createTestJvm() instead of this method because createTestJvm() > * will add JVM options from "test.vm.opts" and "test.java.opts" > * and this method will not do that. > * > * @param command Arguments to pass to the java command. > * @return The ProcessBuilder instance representing the java command. > */ > > > I have used the name createJavaProcessBuilderIgnoreTestJavaOpts because of the name of Utils.prependTestJavaOpts that adds those VM flags. If you have a better name I could do a rename of the method. I kind of like that it is long and clumsy, that makes it harder to use... > > I have run tier 1 testing, and I have started more exhaustive testing. Leo Korinth has updated the pull request incrementally with three additional commits since the last revision: - Revert "8315097: Rename createJavaProcessBuilder" This reverts commit 4b2d171133c40c5c48114602bfd0d4da75531317. - Revert "copyright" This reverts commit f3418c80cc0d4cbb722ee5e368f1a001e898b43e. - Revert "fix static import" This reverts commit 27da71508aec9a4bec1c0ad07031887286580171. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/15452/files - new: https://git.openjdk.org/jdk/pull/15452/files/27da7150..44af07b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15452&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15452&range=02-03 Stats: 1102 lines in 462 files changed: 11 ins; 22 del; 1069 mod Patch: https://git.openjdk.org/jdk/pull/15452.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15452/head:pull/15452 PR: https://git.openjdk.org/jdk/pull/15452 From mdoerr at openjdk.org Tue Oct 17 14:03:06 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 17 Oct 2023 14:03:06 GMT Subject: RFR: 8318015: Lock inflation not needed for OSR or Deopt for new locking modes [v4] In-Reply-To: References: Message-ID: On Sat, 14 Oct 2023 09:46:35 GMT, Martin Doerr wrote: >> Only LockingMode "LM_LEGACY" requires inflation before lock transfers because it is the only one which uses stack addresses in the mark word. I think we should treat the displaced header as stale data because it may be uninitialized. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move constants to globalDefinitions.hpp. Thanks for the reviews and the suggestions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16165#issuecomment-1766461924 From mdoerr at openjdk.org Tue Oct 17 14:03:10 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 17 Oct 2023 14:03:10 GMT Subject: Integrated: 8318015: Lock inflation not needed for OSR or Deopt for new locking modes In-Reply-To: References: Message-ID: On Thu, 12 Oct 2023 14:03:20 GMT, Martin Doerr wrote: > Only LockingMode "LM_LEGACY" requires inflation before lock transfers because it is the only one which uses stack addresses in the mark word. I think we should treat the displaced header as stale data because it may be uninitialized. This pull request has now been integrated. 
Changeset: d0ea2a51 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/d0ea2a51111bd5de5a6465e7de6a4950aae89c71 Stats: 40 lines in 3 files changed: 16 ins; 0 del; 24 mod 8318015: Lock inflation not needed for OSR or Deopt for new locking modes Reviewed-by: pchilanomate, dlong ------------- PR: https://git.openjdk.org/jdk/pull/16165 From shade at openjdk.org Tue Oct 17 14:10:27 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 17 Oct 2023 14:10:27 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v11] In-Reply-To: References: Message-ID: > Work in progress, submitting for broader attention. > > See more details in the bug and related issues. > > This is an attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple and reliable enough for backports to currently supported JDK releases. > > This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs. Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in. > > Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do an x86_32 version: we need a thread, so the easiest way would be to call into the VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short. I think if we cannot implement this mitigation on some architectures, so be it; it would be a sensible tradeoff for simplicity. > > Setting backoff at `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong. > > The current PR deliberately sets backoff at `1000` to simplify testing.
The actual value should be chosen by broader experiments. > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision: - Merge branch 'master' into JDK-8316180-backoff-secondary-super - Touchup benchmark metadata - S390 implementation - Merge branch 'master' into JDK-8316180-backoff-secondary-super - Correct type for flag - Option is diagnostic, platform-dependent - Merge branch 'master' into JDK-8316180-backoff-secondary-super - Init with backoff right away - x86 cleanup - Denser AArch64 - ... and 12 more: https://git.openjdk.org/jdk/compare/f0023938...8dd00325 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15718/files - new: https://git.openjdk.org/jdk/pull/15718/files/48c67465..8dd00325 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=09-10 Stats: 12197 lines in 507 files changed: 7598 ins; 2321 del; 2278 mod Patch: https://git.openjdk.org/jdk/pull/15718.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15718/head:pull/15718 PR: https://git.openjdk.org/jdk/pull/15718 From stuefe at openjdk.org Tue Oct 17 14:10:29 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Oct 2023 14:10:29 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v10] In-Reply-To: <7DenZ_i83kiIQIVJWaYCsE4kHcGa93PcNXW4VgGbaUw=.a84889a5-77ed-475c-a072-4da395fe4cd0@github.com> References: <7DenZ_i83kiIQIVJWaYCsE4kHcGa93PcNXW4VgGbaUw=.a84889a5-77ed-475c-a072-4da395fe4cd0@github.com> Message-ID: On Tue, 17 Oct 2023 09:37:00 GMT, Aleksey Shipilev wrote: > @RealFYang, do you want to do the RISC-V version, or should I take a stab at 
it? @tstuefe, do you want to do the ARM version, or should I take a stab at it? Want yes, time absolutely no :( ------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1766453560 From fyang at openjdk.org Tue Oct 17 14:10:30 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 17 Oct 2023 14:10:30 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v10] In-Reply-To: References: <7DenZ_i83kiIQIVJWaYCsE4kHcGa93PcNXW4VgGbaUw=.a84889a5-77ed-475c-a072-4da395fe4cd0@github.com> Message-ID: <15CMQumaDu1XNzdF--inIdmU65bIVFfMI2xcnTO8Tbc=.9f90601f-c81d-43dc-bc46-e991d5a78326@github.com> On Tue, 17 Oct 2023 13:45:39 GMT, Thomas Stuefe wrote: > @RealFYang, do you want to do the RISC-V version, or should I take a stab at it? @tstuefe, do you want to do the ARM version, or should I take a stab at it? Sorry I missed this one. Yes, I think I can give it a try on linux-riscv platform :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1766469591 From shade at openjdk.org Tue Oct 17 14:17:56 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 17 Oct 2023 14:17:56 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v10] In-Reply-To: References: Message-ID: On Fri, 13 Oct 2023 03:43:26 GMT, nahidasu wrote: > ?We see that it takes very little contention (5%) for the default behavior to perform poorly, and in the uncontended case there is no downside for using a large Backoff value. So backoff values of 1,000 or even 10,000 seem reasonable. Thank you! Looking at these results and the porters benchmarks in this PR, I am leaning to go with 1000 as the default then. My reasoning comes from two things. First, too high backoff might affect performance in transitional states (e.g. warmup) when we want to actually update the cache to the first value. Let's ballpark the regression opportunity for a realistic-on-side-of-terrible scenario. 
Assume a single-threaded workload (so no secondary super contention), 10K classes that need secondary super updates, and a huge interface hierarchy to search through. Let's pessimistically ballpark one secondary supers scan at 100 ns. This means that before all classes populate their secondary super cache, we incur 10^4 (classes) x 10^3 (threshold) x 10^2 (ns) = 10^9 ns = 1 s of additional CPU time. I think that is a fair cost for our deliberately bad scenario, and it is comparable with the time it takes to load, initialize and compile all those classes. Second, the effects we see in targeted benchmarks suggest we have diminishing returns past 1000. Sometimes going from 1000 to 10000 improves the performance 1.5x..2x, but that is largely arguing over whether the improvement is 20x or 40x in a targeted test that does nothing but contend. I suspect a realistic workload would break even at a much lower threshold, and the total difference from this improvement would likely be eaten by Amdahl's Law. If this guess is wrong, we can reconsider the default in the future, based on the real-world experience we get with this patch. Any thoughts? Maybe @franz1981 can test different backoff levels with this patch using Quarkus or some other benchmarks, assuming there is still a version that is clearly affected by this issue?
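The ballpark above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the helper `warmup_cost_ns` and the constant names are hypothetical and are not part of the patch, and the inputs are simply the pessimistic figures quoted in this message (10^4 classes, a backoff threshold of 10^3, 100 ns per secondary supers scan).

```cpp
#include <cstdint>

// Hypothetical helper, not part of the patch: estimates the extra CPU time
// spent before every class has populated its secondary super cache, assuming
// each class performs `backoff` full scans before its first cache update.
constexpr std::uint64_t warmup_cost_ns(std::uint64_t classes,
                                       std::uint64_t backoff,
                                       std::uint64_t scan_ns) {
  return classes * backoff * scan_ns;
}

// Pessimistic inputs taken from the estimate in the message.
constexpr std::uint64_t kClasses = 10000;  // 10^4 classes needing updates
constexpr std::uint64_t kBackoff = 1000;   // 10^3, the proposed default
constexpr std::uint64_t kScanNs  = 100;    // 10^2 ns per secondary supers scan

// Checked at compile time: 10^4 x 10^3 x 10^2 ns is 10^9 ns, about one second.
static_assert(warmup_cost_ns(kClasses, kBackoff, kScanNs) == 1000000000ULL,
              "expected 10^9 ns of additional CPU time");
```

Under the same assumptions, a threshold of 10000 gives 10^10 ns, roughly ten seconds for this deliberately bad scenario, which is one way to see why the higher value is harder to justify as a default.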
------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1766510970 From shade at openjdk.org Tue Oct 17 14:17:57 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 17 Oct 2023 14:17:57 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v10] In-Reply-To: <15CMQumaDu1XNzdF--inIdmU65bIVFfMI2xcnTO8Tbc=.9f90601f-c81d-43dc-bc46-e991d5a78326@github.com> References: <7DenZ_i83kiIQIVJWaYCsE4kHcGa93PcNXW4VgGbaUw=.a84889a5-77ed-475c-a072-4da395fe4cd0@github.com> <15CMQumaDu1XNzdF--inIdmU65bIVFfMI2xcnTO8Tbc=.9f90601f-c81d-43dc-bc46-e991d5a78326@github.com> Message-ID: On Tue, 17 Oct 2023 13:54:08 GMT, Fei Yang wrote: >>> @RealFYang, do you want to do the RISC-V version, or should I take a stab at it? @tstuefe, do you want to do the ARM version, or should I take a stab at it? >> >> Want yes, time absolutely no :( > >> @RealFYang, do you want to do the RISC-V version, or should I take a stab at it? @tstuefe, do you want to do the ARM version, or should I take a stab at it? > > Sorry I missed this one. Yes, I think I can give it a try on linux-riscv platform :-) > > @RealFYang, do you want to do the RISC-V version, or should I take a stab at it? @tstuefe, do you want to do the ARM version, or should I take a stab at it? > > Want yes, time absolutely no :( No problem, I'll try to find some time to implement this for ARM. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1766511714 From adinn at openjdk.org Tue Oct 17 15:37:22 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 17 Oct 2023 15:37:22 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 19:49:23 GMT, Matias Saavedra Silva wrote: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. 
This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64 src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3308: > 3306: > 3307: // Load TOS state for later > 3308: __ load_unsigned_byte(rscratch2, Address(cache, in_bytes(ResolvedMethodEntry::type_offset()))); In the old code this came after the `if (load_receiver)` block just before `rscratch2` was used to compute the return address. This was for a good reason. At the end of the `if` block there is a call to `verify_oop` which uses (overwrites) `rscratch2` when VerifyOops is set. So, relocating this load of the TOS state will fail to compute the correct return address in that case. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362334861 From adinn at openjdk.org Tue Oct 17 15:39:56 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 17 Oct 2023 15:39:56 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 15:34:30 GMT, Andrew Dinn wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. 
>> >> This change supports the following platforms: x86, aarch64 > > src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3308: > >> 3306: >> 3307: // Load TOS state for later >> 3308: __ load_unsigned_byte(rscratch2, Address(cache, in_bytes(ResolvedMethodEntry::type_offset()))); > > In the old code this came after the `if (load_receiver)` block just before `rscratch2` was used to compute the return type. This was for a good reason. At the end of the `if` block there is a call to `verify_oop` which uses (overwrites) `rscratch2` when VerifyOops is set. So, relocating this load of the TOS state will fail to compute the correct return type in that case. Oops, ignore that! `verify_oops` pushes rscratch2 on the stack before using it and restores it afterwards. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362341037 From kbarrett at openjdk.org Tue Oct 17 15:49:15 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 17 Oct 2023 15:49:15 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v8] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 09:07:26 GMT, Kim Barrett wrote: >> Please review this new facility, providing a general mechanism for intrusive >> doubly-linked lists. A class supports inclusion in a list by having an >> IntrusiveListEntry member, and providing structured information about how to >> access that member. A class supports inclusion in multiple lists by having >> multiple IntrusiveListEntry members, with different lists specified to use >> different members. >> >> The IntrusiveList class template provides the list management. It is modelled >> on bidirectional containers such as std::list and boost::intrusive::list, >> providing many of the expected member types and functions. (Note that the >> member types use the Standard's naming conventions.) 
(Not all standard >> container requirements are met; some operations are not presently supported >> because they haven't been needed yet.) This includes iteration support using >> (mostly) standard-conforming iterator types (they are presently missing >> iterator_category member types, pending being able to include so we >> can use std::bidirectional_iterator_tag). >> >> This change only provides the new facility, and doesn't include any uses of >> it. It is intended to replace the 4-5 (or maybe more) competing intrusive >> doubly-linked lists presently in HotSpot. Unlike most (or perhaps all?) of >> those alternatives, this proposal provides a suite of unit tests. >> >> An example of a place that I think might benefit from this is G1's region >> handling. There are various places where G1 iterates over all regions in order >> to do something with those which satisfy some property (humongous regions, >> regions in the collection set, etc.). If it were trivial to create new region >> sublists (and this facility makes that easy), some of these could be turned >> into direct iteration over only the regions of interest. >> >> Some specific points to consider when reviewing this proposal: >> >> (1) This proposal follows Standard Library API conventions, which differ from >> HotSpot in various ways. >> >> (1a) Lists and iterators provide various type members, with names per the >> Standard Library. There has been discussion of using some parts of the >> Standard Library eventually, in which case this would be important. But for >> now some of the naming choices are atypical for HotSpot. >> >> (1b) Some of the function signatures follow the Standard Library APIs even >> though the reasons for that form might not apply to HotSpot. For example, the >> list pop operations don't return the removed...
> > Kim Barrett has updated the pull request incrementally with four additional commits since the last revision: > > - remove unnecessary code markers in comments > - use override for virtual test support functions > - remove support for allocation base classes > - remove IntrusiveList::operator[] src/hotspot/share/utilities/intrusiveList.hpp line 1334: > 1332: * transferred). > 1333: */ > 1334: template::value)> Why is this not using `can_splice()`? In some earlier development version doing something like that produced much worse compiler error messages than doing the more limited test and leaving the compatibility checking to the inner call to `splice` with range arguments. But I think that shouldn't be true now, and this should be changed to use `can_splice`. But look at the compiler error messages before making that change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1362355950 From adinn at openjdk.org Tue Oct 17 15:54:17 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 17 Oct 2023 15:54:17 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache In-Reply-To: References: Message-ID: <7KbiijMWtxX8yTgUu-O-7YaSVBNhE2zD0AnyL5XY1q8=.98deea47-8744-4180-b983-e8415bfb0c83@github.com> On Mon, 28 Aug 2023 19:49:23 GMT, Matias Saavedra Silva wrote: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. 
These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64 src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3313: > 3311: if (load_receiver) { > 3312: __ load_unsigned_short(recv, Address(cache, in_bytes(ResolvedMethodEntry::num_parameters_offset()))); > 3313: // FIXME -- is this actually correct? looks like it should be 2 This 'FIXME' comment (all 5 lines) does not need to be kept. Likewise the trailing comment that says 'FIXME: uxtb here?' ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362363596 From adinn at openjdk.org Tue Oct 17 16:01:51 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 17 Oct 2023 16:01:51 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 19:49:23 GMT, Matias Saavedra Silva wrote: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. 
Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64 src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3305: > 3303: > 3304: // save 'interpreter return address' > 3305: __ save_bcp(); // probably don't need this I'm not sure why you added this comment but we do need this call to `save_bcp()`. It records in the interpreter frame the 'return address' needed for re-entry into the interpreter. We cannot rely on the value cached in register rbcp remaining constant across the invoke that is about to happen. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362376633 From adinn at openjdk.org Tue Oct 17 16:10:11 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 17 Oct 2023 16:10:11 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 19:49:23 GMT, Matias Saavedra Silva wrote: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. 
> > This change supports the following platforms: x86, aarch64 src/hotspot/cpu/x86/templateTable_x86.cpp line 2838: > 2836: } > 2837: > 2838: void TemplateTable::load_resolved_method_entry_common(Register cache, Same comment applies regarding the `common` suffix as for aarch64 code. src/hotspot/cpu/x86/templateTable_x86.cpp line 2896: > 2894: __ load_unsigned_byte(flags, Address(cache, in_bytes(ResolvedMethodEntry::flags_offset()))); > 2895: > 2896: // table_or_ref_index can either be an itable index or a resolved reference index depending on the bytecode same comment applies as for aarch64 code ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362386876 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362388291 From adinn at openjdk.org Tue Oct 17 16:20:08 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 17 Oct 2023 16:20:08 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 19:49:23 GMT, Matias Saavedra Silva wrote: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. 
The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64 src/hotspot/cpu/x86/templateTable_x86.cpp line 4018: > 4016: load_invokedynamic_entry(rbx_method); > 4017: // rax: CallSite object (from cpool->resolved_references[f1]) > 4018: // rbx: MH.linkToCallSite method I'm not sure why you edited these comments in the x86 code but not in the aarch64 code. To avoid issues for maintainers it is better to make the changes in both rather than just one. src/hotspot/cpu/x86/templateTable_x86.cpp line 4020: > 4018: // rbx: MH.linkToCallSite method > 4019: > 4020: // Note: rax_callsite is already pushed As per previous change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362402014 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362402370 From adinn at openjdk.org Tue Oct 17 16:44:10 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 17 Oct 2023 16:44:10 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 19:49:23 GMT, Matias Saavedra Silva wrote: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. 
Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64 src/hotspot/share/ci/ciReplay.cpp line 436: > 434: #endif > 435: ResolvedMethodEntry* method_entry = cp->cache()->resolved_method_entry_at(index); > 436: cp->cache()->set_method_handle(index, callInfo); It looks a bit odd that you obtain a pointer to the `ResolvedMethodEntry` at `index` by calling `resolved_method_entry_at` and then pass `index` back into `set_method_handle` in order to update it with the call info. Obviously this is done so that the set routine can handle data races. Would it not be better to modify `set_method_handle` so that it handled the race and also returned the `ResolvedMethodEntry` at `index`. Likewise you pass `index` back in the call to `appendix_if_resolved` below. Would it not be better to have this method accept a `ResolvedMethodEntry` pointer? 
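To make the suggestion concrete, here is a minimal sketch of the shape such an API could take. Everything in it is hypothetical: `Entry`, `CallInfoStub`, `Cache`, and the signatures are stand-ins for illustration and do not reflect the real `ConstantPoolCache`, `ResolvedMethodEntry`, or `CallInfo` declarations, and the race handling is reduced to a comment.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not the HotSpot types.
struct CallInfoStub {
  int method_id;
  int appendix_id;
};

struct Entry {
  int  method_id   = -1;
  int  appendix_id = -1;
  bool resolved    = false;
};

class Cache {
  std::vector<Entry> _entries;

 public:
  explicit Cache(std::size_t n) : _entries(n) {}

  Entry* resolved_method_entry_at(std::size_t index) {
    return &_entries[index];
  }

  // Updates the entry at `index` and returns it, so the caller does not need
  // a separate lookup. A real implementation would resolve races between
  // competing writers here; this sketch is single-threaded and simply keeps
  // the first value written.
  Entry* set_method_handle(std::size_t index, const CallInfoStub& info) {
    Entry* e = &_entries[index];
    if (!e->resolved) {
      e->method_id   = info.method_id;
      e->appendix_id = info.appendix_id;
      e->resolved    = true;
    }
    return e;
  }

  // Takes the entry directly instead of an index.
  // Returns -1 when the entry has not been resolved.
  static int appendix_if_resolved(const Entry* e) {
    return e->resolved ? e->appendix_id : -1;
  }
};
```

With this shape the caller holds a single `Entry*` from the set call onwards and never passes `index` back in, which is the duplication the comment above is pointing at.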
Likewise ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362455455 From matsaave at openjdk.org Tue Oct 17 16:49:43 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 17 Oct 2023 16:49:43 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 10:21:40 GMT, Andrew Dinn wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64 > > src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2261: > >> 2259: // volatile-loads. 
>> 2260: >> 2261: void TemplateTable::resolve_cache_and_index_for_field(int byte_no, > > This change looks more confusing than it actually needs to be because you deleted the definition for `resolve_cache_and_index` that preceded `resolve_cache_and_index_for_field` and then added `resolve_cache_and_index_for_method` after `resolve_cache_and_index_for_field`. As a result we see changes made to `resolve_cache_and_index` that re-introduce changes already made for `resolve_cache_and_index_for_field` followed by changes to `resolve_cache_and_index_for_field` that rework it to remove field-specific operations and replace them with method-specific operations. > > That's going to make it a lot harder for maintainers and back-porters to work out what this change is really doing. If instead you place the definition for `resolve_cache_and_index_for_method` before the definition for `resolve_cache_and_index_for_field` then the diff will show `resolve_cache_and_index` being repurposed to cater for methods and the definition of `resolve_cache_and_index_for_field` should remain unchanged. Great point! I didn't notice how messy the diff ended up looking. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362462387 From matsaave at openjdk.org Tue Oct 17 16:52:57 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 17 Oct 2023 16:52:57 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 11:07:35 GMT, Andrew Dinn wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes.
>> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64 > > src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2369: > >> 2367: } >> 2368: >> 2369: void TemplateTable::load_resolved_method_entry_common(Register cache, > > When I saw the `common` suffix I thought at first that this might be used in addition to the variants with the other suffixes rather than as an alternative. Since this is 'common' to only 2 out of the 5 different cases perhaps it could be named `load_resolved_method_entry_special_or_static`. That would make it clear that each invoke type has a corresponding `load_resolved_method_entry_` variant. I struggled with naming this method which is why I settled on `common` but you are correct that it gives the wrong impression. However, I think `load_resolved_method_entry_special_or_static` is too verbose (44 characters!). Do you think there is a better name or is our best option to use this longer one?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362465873 From matsaave at openjdk.org Tue Oct 17 16:59:05 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 17 Oct 2023 16:59:05 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 11:22:55 GMT, Andrew Dinn wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests.
>> >> This change supports the following platforms: x86, aarch64 > > src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2429: > >> 2427: __ load_unsigned_byte(flags, Address(cache, in_bytes(ResolvedMethodEntry::flags_offset()))); >> 2428: >> 2429: // table_or_ref_index can either be an itable index or a resolved reference index depending on the bytecode > > This comment seems to apply to `ref_index` in `load_resolved_method_entry_handle`? Does it need to move up there? Do we need a comment here stating something similar for `method_or_table_index`? (probably not). This looks like a leftover from a previous implementation. I don't believe this is true for either invokeinterface or invokehandle. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362472884 From matsaave at openjdk.org Tue Oct 17 17:18:55 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 17 Oct 2023 17:18:55 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 16:41:41 GMT, Andrew Dinn wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely.
The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64 > > src/hotspot/share/ci/ciReplay.cpp line 436: > >> 434: #endif >> 435: ResolvedMethodEntry* method_entry = cp->cache()->resolved_method_entry_at(index); >> 436: cp->cache()->set_method_handle(index, callInfo); > > It looks a bit odd that you obtain a pointer to the `ResolvedMethodEntry` at `index` by calling `resolved_method_entry_at` and then pass `index` back into `set_method_handle` in order to update it with the call info. Obviously this is done so that the set routine can handle data races. Would it not be better to modify `set_method_handle` so that it handled the race and also returned the `ResolvedMethodEntry` at `index`. > > Likewise you pass `index` back in the call to `appendix_if_resolved` below. Would it not be better to have this method accept a `ResolvedMethodEntry` pointer? > Likewise Right, currently these calls seem redundant since it reads the resolved method entry twice. If I understand correctly, you are suggesting that the methods look something like this? 
`ResolvedMethodEntry* set_method_handle(int index, const CallInfo &call_info)` `oop appendix_if_resolved(ResolvedMethodEntry* method_entry)` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362501180 From matsaave at openjdk.org Tue Oct 17 17:45:41 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 17 Oct 2023 17:45:41 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests.
> > This change supports the following platforms: x86, aarch64 Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Removed some comments and relocated code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15455/files - new: https://git.openjdk.org/jdk/pull/15455/files/9f73d7e1..9ce5f591 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=00-01 Stats: 118 lines in 5 files changed: 37 ins; 53 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/15455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15455/head:pull/15455 PR: https://git.openjdk.org/jdk/pull/15455 From sviswanathan at openjdk.org Tue Oct 17 18:05:13 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 17 Oct 2023 18:05:13 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v8] In-Reply-To: References: Message-ID: <_lBt1OmMnO2QRMX00Pto3q5Vre9A30WAjbBqzB51Dv0=.ebd6b351-78ae-440b-a2b2-92362cfadfe7@github.com> On Wed, 11 Oct 2023 22:05:08 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations. 
>> >> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option: >> >> |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup >> |-------------|------------|---------------|------------------|-----------| >> |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 >> full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 >> small.AESGCMBench.decrypt | 8192 | 527854.353 |663131.48 | 1.25 >> small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 |1.24 >> full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 >> full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 |539235.462 | 1.00 >> small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 |298913.629 | 0.99 >> small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 16384 | 307266.364 |390397.778 | 1.27 >> full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 >> small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 |1.27 >> small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 >> full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 >> full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 >> small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 |181019.205 | 1.12 >> small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 >> ? | ? | ? | ? | ? 
>> full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 |1.31 >> full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 >> small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 >> small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 |1.28 >> full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 |1.12 >> full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 |211701.969 | 1.27 >> small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 >> small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 |212085.047 | 1.30 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 >> full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 |1.31 >> small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 |1.32 >> small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 >> full.AES... > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated comments, removed unused labels @TobiHartmann Please advise if we could integrate this or if you would like to run it through your testing first. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15410#issuecomment-1766906345 From kbarrett at openjdk.org Tue Oct 17 20:19:49 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 17 Oct 2023 20:19:49 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v8] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 09:07:26 GMT, Kim Barrett wrote: >> Please review this new facility, providing a general mechanism for intrusive >> doubly-linked lists. A class supports inclusion in a list by having an >> IntrusiveListEntry member, and providing structured information about how to >> access that member. A class supports inclusion in multiple lists by having >> multiple IntrusiveListEntry members, with different lists specified to use >> different members.
>> >> The IntrusiveList class template provides the list management. It is modelled >> on bidirectional containers such as std::list and boost::intrusive::list, >> providing many of the expected member types and functions. (Note that the >> member types use the Standard's naming conventions.) (Not all standard >> container requirements are met; some operations are not presently supported >> because they haven't been needed yet.) This includes iteration support using >> (mostly) standard-conforming iterator types (they are presently missing >> iterator_category member types, pending being able to include <iterator> so we >> can use std::bidirectional_iterator_tag). >> >> This change only provides the new facility, and doesn't include any uses of >> it. It is intended to replace the 4-5 (or maybe more) competing intrusive >> doubly-linked lists presently in HotSpot. Unlike most (or perhaps all?) of >> those alternatives, this proposal provides a suite of unit tests. >> >> An example of a place that I think might benefit from this is G1's region >> handling. There are various places where G1 iterates over all regions in order >> to do something with those which satisfy some property (humongous regions, >> regions in the collection set, &etc). If it were trivial to create new region >> sublists (and this facility makes that easy), some of these could be turned >> into direct iteration over only the regions of interest. >> >> Some specific points to consider when reviewing this proposal: >> >> (1) This proposal follows Standard Library API conventions, which differ from >> HotSpot in various ways. >> >> (1a) Lists and iterators provide various type members, with names per the >> Standard Library. There has been discussion of using some parts of the >> Standard Library eventually, in which case this would be important. But for >> now some of the naming choices are atypical for HotSpot.
>> >> (1b) Some of the function signatures follow the Standard Library APIs even >> though the reasons for that form might not apply to HotSpot. For example, the >> list pop operations don't return the removed... > > Kim Barrett has updated the pull request incrementally with four additional commits since the last revision: > > - remove unnecessary code markers in comments > - use override for virtual test support functions > - remove support for allocation base classes > - remove IntrusiveList::operator[] src/hotspot/share/utilities/intrusiveList.hpp line 812: > 810: static constexpr bool can_splice_from() { > 811: return Conjunction, > 812: std::is_convertible>::value; Here and in can_swap we're using `Conjunction` rather than simple `&&` expressions of the values. The intent is to delay the iterator check until we've verified `Other` is actually a List type. But that isn't working. Instead, if `Other` isn't a List then we get a syntax error on `Other::iterator`, which is not at all helpful as error reporting goes. To fix this we need to add a level of indirection so the code containing `Other::iterator` doesn't get instantiated until the IsListType check has passed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1362711685 From dholmes at openjdk.org Wed Oct 18 00:01:00 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 18 Oct 2023 00:01:00 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v10] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <-uTYDm4GQeD1SeD-qNyK_s0jrNC5Ft9tv3ZdPcAz4ts=.1b1077c4-4f69-41ee-bf2f-de3364d8770e@github.com> Message-ID: On Tue, 17 Oct 2023 10:56:33 GMT, Andrew Haley wrote: >> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5169: >> >>> 5167: // Perform a little arithmetic to make sure that denormal >>> 5168: // numbers are handled correctly, i.e. 
that the "Denormals Are >>> 5169: // Zeros" flag has not been set. >> >> I don't understand what this part is doing. I thought it was simply checking so you could log/warn if the unexpected mode was detected. But it seems to cause MXCSR to not be restored when there is an issue, where I would expect you would always want to restore to overwrite the invalid mode the JNI call made. ?? > > If we reach `FAIL`, MXCSR is reloaded from `addr_mxcsr_std()`, restoring correct IEEE behaviour. That's what RestoreMXCSROnJNICalls is supposed to do, as far as I can tell. > > But I will take this part out, because there are other flags in MXCSR, so there is a potential compatibility problem if e.g. the Precision Mask were set in a JNI call. Misread the code flow. So you were skipping the restore if you didn't think it was needed? The fact the restore was now conditional is what threw me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1362939998 From dholmes at openjdk.org Wed Oct 18 00:03:58 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 18 Oct 2023 00:03:58 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v11] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Tue, 17 Oct 2023 11:43:59 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. 
I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with three additional commits since the last revision: > > - Review feedback > - Merge branch 'JDK-8295159' of https://github.com/theRealAph/jdk into JDK-8295159 > - Remove change to RestoreMXCSROnJNICalls Meta-question and apologies if this was covered before, but why is this logic being added to stubRoutines.cpp? ------------- PR Review: https://git.openjdk.org/jdk/pull/10661#pullrequestreview-1683665056 From coleenp at openjdk.org Wed Oct 18 00:32:54 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 18 Oct 2023 00:32:54 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 20:22:25 GMT, Coleen Phillimore wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Removed some comments and relocated code > > src/hotspot/share/interpreter/rewriter.hpp line 45: > >> 43: GrowableArray _cp_map; >> 44: GrowableArray _cpi_to_method_index_map; >> 45: GrowableArray _cp_cache_map; // for Methodref, Fieldref, > > I think this field and a couple other functions in rewriter.hpp still need to be deleted. And some _first_iteration_cp_cache_limit (?) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362725003 From coleenp at openjdk.org Wed Oct 18 00:32:51 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 18 Oct 2023 00:32:51 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 17:45:41 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2.
This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64 > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Removed some comments and relocated code I'm about halfway through. src/hotspot/cpu/x86/interp_masm_x86.hpp line 311: > 309: void load_resolved_indy_entry(Register cache, Register index); > 310: void load_field_entry(Register cache, Register index, int bcp_offset = 1); > 311: void load_method_entry(Register cache, Register index, int bcp_offset = 1); As a future RFE, maybe we could put some of these common functions in src/share/interpreter/interp_masm.hpp. I think you need to remove the declarations for the 4 functions you removed from this and the aarch64 version.
src/hotspot/share/interpreter/interpreterRuntime.cpp line 867: > 865: > 866: // check if link resolution caused cpCache to be updated > 867: if (pool->cache()->resolved_method_entry_at(method_index)->is_resolved(bytecode)) return; Maybe you could have a local variable to save some characters. ConstantPoolCache* cache = pool->cache(); src/hotspot/share/interpreter/rewriter.cpp line 228: > 226: if (!reverse) { > 227: int cp_index = Bytes::get_Java_u2(p); > 228: _initialized_method_entries.push(ResolvedMethodEntry((u2)cp_index)); You only need to do this if the constant pool is a JVM_CONSTANT_InterfaceMethodref. It looks like you're doing this unconditionally. src/hotspot/share/interpreter/rewriter.hpp line 45: > 43: GrowableArray _cp_map; > 44: GrowableArray _cpi_to_method_index_map; > 45: GrowableArray _cp_cache_map; // for Methodref, Fieldref, I think this field and a couple other functions in rewriter.hpp still need to be deleted. src/hotspot/share/interpreter/templateTable.hpp line 296: > 294: Register method_or_table_index, > 295: Register flags); > 296: static void load_invoke_cp_cache_entry(int byte_no, I think there are some functions that need to be removed from TemplateTable now. ------------- Changes requested by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15455#pullrequestreview-1683218490 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362664153 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362707802 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362711379 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362722657 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1362669795 From kbarrett at openjdk.org Wed Oct 18 01:06:23 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 18 Oct 2023 01:06:23 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v9] In-Reply-To: References: Message-ID: > Please review this new facility, providing a general mechanism for intrusive > doubly-linked lists. A class supports inclusion in a list by having an > IntrusiveListEntry member, and providing structured information about how to > access that member. A class supports inclusion in multiple lists by having > multiple IntrusiveListEntry members, with different lists specified to use > different members. > > The IntrusiveList class template provides the list management. It is modelled > on bidirectional containers such as std::list and boost::intrusive::list, > providing many of the expected member types and functions. (Note that the > member types use the Standard's naming conventions.) (Not all standard > container requirements are met; some operations are not presently supported > because they haven't been needed yet.) This includes iteration support using > (mostly) standard-conforming iterator types (they are presently missing > iterator_category member types, pending being able to include <iterator> so we > can use std::bidirectional_iterator_tag). > > This change only provides the new facility, and doesn't include any uses of > it. It is intended to replace the 4-5 (or maybe more) competing intrusive > doubly-linked lists presently in HotSpot.
Unlike most (or perhaps all?) of > those alternatives, this proposal provides a suite of unit tests. > > An example of a place that I think might benefit from this is G1's region > handling. There are various places where G1 iterates over all regions in order > to do something with those which satisfy some property (humongous regions, > regions in the collection set, &etc). If it were trivial to create new region > sublists (and this facility makes that easy), some of these could be turned > into direct iteration over only the regions of interest. > > Some specific points to consider when reviewing this proposal: > > (1) This proposal follows Standard Library API conventions, which differ from > HotSpot in various ways. > > (1a) Lists and iterators provide various type members, with names per the > Standard Library. There has been discussion of using some parts of the > Standard Library eventually, in which case this would be important. But for > now some of the naming choices are atypical for HotSpot. > > (1b) Some of the function signatures follow the Standard Library APIs even > though the reasons for that form might not apply to HotSpot. For example, the > list pop operations don't return the removed value. For node-based containers > in Standard Library that would introduce exception...
Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: improve compiler errors for improper splice/swap ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15896/files - new: https://git.openjdk.org/jdk/pull/15896/files/a6f3a8c8..b93431e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15896&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15896&range=07-08 Stats: 41 lines in 2 files changed: 37 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/15896.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15896/head:pull/15896 PR: https://git.openjdk.org/jdk/pull/15896 From kbarrett at openjdk.org Wed Oct 18 01:10:51 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 18 Oct 2023 01:10:51 GMT Subject: RFR: 8189088: Add intrusive doubly-linked list utility [v8] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 20:16:41 GMT, Kim Barrett wrote: >> Kim Barrett has updated the pull request incrementally with four additional commits since the last revision: >> >> - remove unnecessary code markers in comments >> - use override for virtual test support functions >> - remove support for allocation base classes >> - remove IntrusiveList::operator[] > > src/hotspot/share/utilities/intrusiveList.hpp line 812: > >> 810: static constexpr bool can_splice_from() { >> 811: return Conjunction, >> 812: std::is_convertible>::value; > > Here and in can_swap we're using `Conjunction` rather than simple `&&` > expressions of the values. The intent is to delay the iterator check until > we've verified `Other` is actually a List type. But that isn't working. > Instead, if `Other` isn't a List then we get a syntax error on > `Other::iterator`, which is not at all helpful as error reporting goes. To fix > this we need to add a level of indirection so the code containing > `Other::iterator` doesn't get instantiated until the IsListType check has > passed. Fixed. 
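Editorial illustration of the indirection described in the comment quoted above. This is a minimal sketch, not the code from the PR: `ToyList`, `NotAList`, `IsToyList`, and `HasConvertibleIterator` are hypothetical names invented here, and C++17 `std::conjunction` stands in for HotSpot's `Conjunction`, on the assumption that both short-circuit in the same way. The point is that `Other::iterator` is named only inside a separate trait, so it is looked up only if the conjunction actually instantiates that trait.

```cpp
#include <type_traits>

// Hypothetical stand-ins; none of these names come from intrusiveList.hpp.
struct ToyList { using iterator = int*; };  // a "list" with a nested iterator type
struct NotAList {};                         // no nested iterator type at all

// Hypothetical list-detection trait, playing the role of the IsListType check.
template<typename T> struct IsToyList : std::false_type {};
template<> struct IsToyList<ToyList> : std::true_type {};

// The extra level of indirection: Other::iterator appears only in this
// trait's base clause, so it is looked up only when the trait is
// instantiated, not when the trait is merely named as a template argument.
template<typename Other, typename Iterator>
struct HasConvertibleIterator
  : std::is_convertible<typename Other::iterator, Iterator> {};

// Writing std::is_convertible<typename Other::iterator, int*> directly in the
// conjunction below would be a hard error for NotAList, because the function
// body is not a SFINAE context. With the wrapper, the conjunction stops at
// the first false operand and never instantiates the second one.
template<typename Other>
constexpr bool can_splice_from() {
  return std::conjunction<IsToyList<Other>,
                          HasConvertibleIterator<Other, int*>>::value;
}

static_assert(can_splice_from<ToyList>(), "list type with convertible iterator");
static_assert(!can_splice_from<NotAList>(), "non-list is rejected without a hard error");
```

With this shape a non-list argument yields a plain `false` that a caller can report through its own `static_assert` or `ENABLE_IF` message, instead of an error pointing at `Other::iterator`.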
> src/hotspot/share/utilities/intrusiveList.hpp line 1334: > >> 1332: * transferred). >> 1333: */ >> 1334: template::value)> > > Why is this not using `can_splice()`? In some earlier development version doing something like that > produced much worse compiler error messages than doing the more limited test and leaving the compatibility > checking to the inner call to `splice` with range arguments. But I think that shouldn't be true now, and this > should be changed to use `can_splice`. But look at the compiler error messages before making that change. Fixed. Changing it even made the error message better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1362973509 PR Review Comment: https://git.openjdk.org/jdk/pull/15896#discussion_r1362973477 From dholmes at openjdk.org Wed Oct 18 02:03:42 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 18 Oct 2023 02:03:42 GMT Subject: RFR: 8317697: refactor-encapsulate x86 VM_Version::CpuidInfo In-Reply-To: References: Message-ID: On Sat, 7 Oct 2023 17:44:06 GMT, Jan Kratochvil wrote: > In OpenJDK project CRaC I had a [need to fetch new CpuidInfo without affecting the existing one](https://github.com/openjdk/crac/commit/ed4ad9ba31b77732dcede2eb743b2f389ec9a0fe#diff-6ed856c57ddbe33e49883adb7c52ec51ed377e5f697dfd6d8bea505a97bfc5a5R2743). > Which led me to encapsulate the object more and I think this no-functionality-change patch is even appropriate for JDK. > I am sure interested primarily to reduce the CRaC patchset boilerplate. I don't hate this :) but I don't like seeing methods on structs. Should we make `CpuidInfo` a full-fledged class instead? 
------------- PR Review: https://git.openjdk.org/jdk/pull/16093#pullrequestreview-1683769586 From jvernee at openjdk.org Wed Oct 18 04:21:33 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 18 Oct 2023 04:21:33 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code Message-ID: Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are currently relying on Get-/ReleasePrimitiveArrayCritical from JNI, where making a copy of the array would be prohibitively expensive. Components of this patch: - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. - The object/oop + offset is exposed as a temporary address to native code. - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for a safepoint, which we never do (invoking pure native code). - Only x64 and AArch64 for now.
- I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. Numbers for the included benchmark on my machine are:

Benchmark                      (size)  Mode  Cnt        Score       Error  Units
CriticalCalls.callNotPinned       100  avgt   30      123.060 ±     5.674  ns/op
CriticalCalls.callNotPinned     10000  avgt   30     3136.032 ±    46.175  ns/op
CriticalCalls.callNotPinned   1000000  avgt   30  1190692.161 ± 36254.502  ns/op
CriticalCalls.callPinned          100  avgt   30       30.722 ±     0.298  ns/op
CriticalCalls.callPinned        10000  avgt   30     2233.453 ±    23.568  ns/op
CriticalCalls.callPinned      1000000  avgt   30   220870.350 ±  1576.958  ns/op
CriticalCalls.callRecycled        100  avgt   30       38.753 ±     0.269  ns/op
CriticalCalls.callRecycled      10000  avgt   30     2683.381 ±    56.335  ns/op
CriticalCalls.callRecycled    1000000  avgt   30   314389.106 ±  5275.236  ns/op

In particular the difference between the `callNotPinned`, which allocates a native segment and copies the heap segment into it, and the `callPinned` which is zero allocation and zero copy, is important. While the allocation can sometimes be avoided (`callRecycled`), sometimes the API's structure prevents allocations from being amortized.
Testing: `jdk_foreign` ------------- Commit messages: - eyeball more fixes - ref other platforms + add back shuffle reg - fix failing x86 test - fix arm stubs - fix x86_32 stubs - Share DowncallStubGenerator impl between x64 and aarch64 - remove GCLocker calls - fix zero compilation + disable stress test on non-debug because of missing CheckUnhandledOops flag - fix zero for real - fix zero + clang build - ... and 18 more: https://git.openjdk.org/jdk/compare/1d54e73f...90fdbec0 Changes: https://git.openjdk.org/jdk/pull/16201/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8254693 Stats: 1969 lines in 60 files changed: 1169 ins; 545 del; 255 mod Patch: https://git.openjdk.org/jdk/pull/16201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16201/head:pull/16201 PR: https://git.openjdk.org/jdk/pull/16201 From jvernee at openjdk.org Wed Oct 18 04:21:35 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 18 Oct 2023 04:21:35 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code In-Reply-To: References: Message-ID: <7fdZdmxHUrXdcurmlKY1-gwfWNEPLcU9Pijk_DC1ntY=.2deb3ecd-52e1-416a-ae33-7f0bb1e8e0e8@github.com> On Mon, 16 Oct 2023 10:19:17 GMT, Jorn Vernee wrote: > Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. > > The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. 
> > Components of this patch: > > - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. > - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. > - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. > - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. > - The object/oop + offset is exposed as temporary address to native code. > - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). > - Only x64 and AArch64 for now. > - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 > - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. > - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` > > Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. > > Numbers for the included benchmark on my machine are: > > > Benchmark (size) Mode Cnt ... @TheRealMDoerr @feilongjiang @sid8606 I've kept things working as best as possible on PPC, RISC-V, and s390x. 
But, more changes are needed to actually implement the new feature on those platforms. (See the changes to `downcallLinker_aarch64.cpp` and `CallArranger`). test/jdk/java/foreign/critical/TestCritical.java line 93: > 91: > 92: @Test(dataProvider = "allowHeapCases") > 93: public void testAllowHeap(AllowHeapCase testCase) throws Throwable { Note that this is the only new test. The diff looks bigger because I've renamed the enclosing directory. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1767610566 PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1360478619 From jvernee at openjdk.org Wed Oct 18 04:44:26 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 18 Oct 2023 04:44:26 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v2] In-Reply-To: References: Message-ID: > Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. > > The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. > > Components of this patch: > > - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. > - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. > - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. > - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. 
> - The object/oop + offset is exposed as temporary address to native code. > - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). > - Only x64 and AArch64 for now. > - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 > - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. > - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` > > Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. > > Numbers for the included benchmark on my machine are: > > > Benchmark (size) Mode Cnt ... 
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: drop unused in_reg_spiller ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16201/files - new: https://git.openjdk.org/jdk/pull/16201/files/90fdbec0..2c073bf1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=00-01 Stats: 6 lines in 2 files changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16201/head:pull/16201 PR: https://git.openjdk.org/jdk/pull/16201 From dholmes at openjdk.org Wed Oct 18 05:35:46 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 18 Oct 2023 05:35:46 GMT Subject: RFR: JDK-8313764: Offer JVM HS functionality to shared lib load operations done by the JDK codebase [v2] In-Reply-To: References: Message-ID: <-mnRIhYxnLvgUVnwxr_thupKgzfav_NdaT79AS1PO7M=.7cb2174c-9808-42ea-beb7-312ebab3e683@github.com> On Mon, 16 Oct 2023 15:04:51 GMT, Matthias Baesken wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> windows aarch64 build issues > > Hello, any comments about the idea of calling into 'os::dll_load' instead ? That would indeed make the coding smaller and less 'messy' . @MBaesken I'm not at all sure what it would look like - sorry. But there doesn't seem to be any general support from the library folk for this. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15264#issuecomment-1767673904 From xgong at openjdk.org Wed Oct 18 06:19:23 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 18 Oct 2023 06:19:23 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF Message-ID: Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. 
[1] https://github.com/openjdk/jdk/pull/3638 [2] https://sleef.org/ [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ [4] https://packages.debian.org/bookworm/libsleef3 [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html ------------- Commit messages: - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF Changes: https://git.openjdk.org/jdk/pull/16234/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16234&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312425 Stats: 161 lines in 12 files changed: 118 ins; 1 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/16234.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16234/head:pull/16234 PR: https://git.openjdk.org/jdk/pull/16234 From xgong at openjdk.org Wed Oct 18 06:19:23 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 18 Oct 2023 06:19:23 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 06:12:29 GMT, Xiaohong Gong wrote: > Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). > > SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. 
> > To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. > > Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. > > [1] https://github.com/openjdk/jdk/pull/3638 > [2] https://sleef.org/ > [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ > [4] https://packages.debian.org/bookworm/libsleef3 > [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html Here is the performance improvement for JMH benchmarks [1] [2] after enabling libsleef for AArch64 NEON and SVE:

NEON:
Benchmark              (size)  Mode   Cnt    Gain
DoubleMaxVector.ACOS     1024  thrpt    5   1.775
DoubleMaxVector.ASIN     1024  thrpt    5   2.134
DoubleMaxVector.ATAN     1024  thrpt    5   2.376
DoubleMaxVector.ATAN2    1024  thrpt    5   2.799
DoubleMaxVector.CBRT     1024  thrpt    5   1.588
DoubleMaxVector.COS      1024  thrpt    5   1.751
DoubleMaxVector.COSH     1024  thrpt    5   1.756
DoubleMaxVector.EXP      1024  thrpt    5   8.257
DoubleMaxVector.EXPM1    1024  thrpt    5   2.028
DoubleMaxVector.HYPOT    1024  thrpt    5   2.132
DoubleMaxVector.LOG      1024  thrpt    5   4.017
DoubleMaxVector.LOG10    1024  thrpt    5   5.693
DoubleMaxVector.LOG1P    1024  thrpt    5   2.788
DoubleMaxVector.POW      1024  thrpt    5   3.494
DoubleMaxVector.SIN      1024  thrpt    5   2.010
DoubleMaxVector.SINH     1024  thrpt    5   1.697
DoubleMaxVector.TAN      1024  thrpt    5   3.448
DoubleMaxVector.TANH     1024  thrpt    5   0.984
FloatMaxVector.ACOS      1024  thrpt    5   2.310
FloatMaxVector.ASIN      1024  thrpt    5   2.887
FloatMaxVector.ATAN      1024  thrpt    5   3.076
FloatMaxVector.ATAN2     1024  thrpt    5   4.162
FloatMaxVector.CBRT      1024  thrpt    5   2.941
FloatMaxVector.COS       1024  thrpt    5   1.832
FloatMaxVector.COSH      1024  thrpt    5   2.681
FloatMaxVector.EXP       1024  thrpt    5  15.758
FloatMaxVector.EXPM1     1024  thrpt    5   3.061
FloatMaxVector.HYPOT     1024  thrpt    5   3.428
FloatMaxVector.LOG       1024  thrpt    5  12.364
FloatMaxVector.LOG10     1024  thrpt    5  11.267
FloatMaxVector.LOG1P     1024  thrpt    5   5.819
FloatMaxVector.POW       1024  thrpt    5   6.710
FloatMaxVector.SIN       1024  thrpt    5   1.906
FloatMaxVector.SINH      1024  thrpt    5   2.505
FloatMaxVector.TAN       1024  thrpt    5   4.975
FloatMaxVector.TANH      1024  thrpt    5   1.157
Float64Vector.ACOS       1024  thrpt    5   1.855
Float64Vector.ASIN       1024  thrpt    5   2.294
Float64Vector.ATAN       1024  thrpt    5   2.082
Float64Vector.ATAN2      1024  thrpt    5   2.849
Float64Vector.CBRT       1024  thrpt    5   1.781
Float64Vector.COS        1024  thrpt    5   1.224
Float64Vector.COSH       1024  thrpt    5   1.793
Float64Vector.EXP        1024  thrpt    5   9.000
Float64Vector.EXPM1      1024  thrpt    5   2.096
Float64Vector.HYPOT      1024  thrpt    5   2.589
Float64Vector.LOG        1024  thrpt    5   5.582
Float64Vector.LOG10      1024  thrpt    5   5.495
Float64Vector.LOG1P      1024  thrpt    5   3.594
Float64Vector.POW        1024  thrpt    5   3.254
Float64Vector.SIN        1024  thrpt    5   1.254
Float64Vector.SINH       1024  thrpt    5   1.719
Float64Vector.TAN        1024  thrpt    5   2.670
Float64Vector.TANH       1024  thrpt    5   1.020

SVE 512-bit vector size:
Benchmark              (size)  Mode   Cnt    Gain
DoubleMaxVector.ACOS     1024  thrpt    5   1.731
DoubleMaxVector.ASIN     1024  thrpt    5   2.046
DoubleMaxVector.ATAN     1024  thrpt    5   4.932
DoubleMaxVector.ATAN2    1024  thrpt    5   6.032
DoubleMaxVector.CBRT     1024  thrpt    5   6.883
DoubleMaxVector.COS      1024  thrpt    5   5.512
DoubleMaxVector.COSH     1024  thrpt    5   2.796
DoubleMaxVector.EXP      1024  thrpt    5  42.490
DoubleMaxVector.EXPM1    1024  thrpt    5   6.188
DoubleMaxVector.HYPOT    1024  thrpt    5   2.195
DoubleMaxVector.LOG      1024  thrpt    5  19.532
DoubleMaxVector.LOG10    1024  thrpt    5  19.229
DoubleMaxVector.LOG1P    1024  thrpt    5  10.477
DoubleMaxVector.POW      1024  thrpt    5  11.887
DoubleMaxVector.SIN      1024  thrpt    5   6.073
DoubleMaxVector.SINH     1024  thrpt    5   2.994
DoubleMaxVector.TAN      1024  thrpt    5  15.417
FloatMaxVector.ACOS      1024  thrpt    5   3.867
FloatMaxVector.ASIN      1024  thrpt    5   4.291
FloatMaxVector.ATAN      1024  thrpt    5  11.786
FloatMaxVector.ATAN2     1024  thrpt    5  14.734
FloatMaxVector.CBRT      1024  thrpt    5  11.622
FloatMaxVector.COS       1024  thrpt    5   6.477
FloatMaxVector.COSH      1024  thrpt    5   3.571
FloatMaxVector.EXP       1024  thrpt    5  53.020
FloatMaxVector.EXPM1     1024  thrpt    5   6.348
FloatMaxVector.HYPOT     1024  thrpt    5   4.722
FloatMaxVector.LOG       1024  thrpt    5  41.263
FloatMaxVector.LOG10     1024  thrpt    5  47.685
FloatMaxVector.LOG1P     1024  thrpt    5  22.481
FloatMaxVector.POW       1024  thrpt    5  24.896
FloatMaxVector.SIN       1024  thrpt    5   6.768
FloatMaxVector.SINH      1024  thrpt    5   3.429

[1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L1068 [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/DoubleMaxVector.java#L1068 ------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1767727028 From fyang at openjdk.org Wed Oct 18 07:14:51 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 18 Oct 2023 07:14:51 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v11] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 14:10:27 GMT, Aleksey Shipilev wrote: >> Work in progress, submitting for broader attention. >> >> See more details in the bug and related issues. >> >> This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. >> >> This implements the mitigation for AArch64 and x86_64. More platforms can be implemented in this PR, or deferred to later PRs.
Port maintainers, feel free to suggest the patches for your arches, I'll be happy to fold them in. >> >> Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do x86_32 version: we need a thread, so the easiest way would be to call into VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short. I think if we cannot implement this mitigation on some architectures, so be it, it would be a sensible tradeoff for simplicity. >> >> Setting backoff at `0` effectively disables the mitigation, and gives us safety hatch if something goes wrong. >> >> Current PR deliberately sets backoff at `1000` to simplify testing. The actual value should be chosen by broader experiments. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision: > > - Merge branch 'master' into JDK-8316180-backoff-secondary-super > - Touchup benchmark metadata > - S390 implementation > - Merge branch 'master' into JDK-8316180-backoff-secondary-super > - Correct type for flag > - Option is diagnostic, platform-dependent > - Merge branch 'master' into JDK-8316180-backoff-secondary-super > - Init with backoff right away > - x86 cleanup > - Denser AArch64 > - ... and 12 more: https://git.openjdk.org/jdk/compare/7fc0b6b7...8dd00325 RISC-V implementation: [15718-riscv.txt](https://github.com/openjdk/jdk/files/12995775/15718-riscv.txt) JMH data on HiFive Unmatched board for reference:

-XX:SecondarySuperMissBackoff=0
Benchmark                        Mode  Cnt    Score    Error  Units
SecondarySuperCache.contended    avgt   15  511.595 ± 35.062  ns/op
SecondarySuperCache.uncontended  avgt   15   73.155 ±  8.891  ns/op

-XX:SecondarySuperMissBackoff=10
Benchmark                        Mode  Cnt    Score    Error  Units
SecondarySuperCache.contended    avgt   15  200.010 ± 17.197  ns/op
SecondarySuperCache.uncontended  avgt   15   73.766 ±  9.232  ns/op

-XX:SecondarySuperMissBackoff=100
Benchmark                        Mode  Cnt    Score    Error  Units
SecondarySuperCache.contended    avgt   15  167.544 ± 17.544  ns/op
SecondarySuperCache.uncontended  avgt   15   68.551 ±  5.460  ns/op

-XX:SecondarySuperMissBackoff=1000
Benchmark                        Mode  Cnt    Score    Error  Units
SecondarySuperCache.contended    avgt   15  156.470 ± 21.725  ns/op
SecondarySuperCache.uncontended  avgt   15   72.963 ±  9.210  ns/op

-XX:SecondarySuperMissBackoff=10000
Benchmark                        Mode  Cnt    Score    Error  Units
SecondarySuperCache.contended    avgt   15  162.363 ± 14.920  ns/op
SecondarySuperCache.uncontended  avgt   15   75.536 ± 11.916  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1767811133 From dholmes at openjdk.org Wed Oct 18 07:40:00 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 18 Oct 2023 07:40:00 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 06:12:29 GMT, Xiaohong Gong wrote: > Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). > > SLEEF supports multiple accuracies.
To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. > > To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. > > Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. > > [1] https://github.com/openjdk/jdk/pull/3638 > [2] https://sleef.org/ > [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ > [4] https://packages.debian.org/bookworm/libsleef3 > [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html Changes requested by dholmes (Reviewer). src/hotspot/cpu/aarch64/globals_aarch64.hpp line 131: > 129: "Branch Protection to use: none, standard, pac-ret") \ > 130: product(ccstr, UseSleefLib, "libsleef.so.3", EXPERIMENTAL, \ > 131: "Sleef library to use for the vector math operations") \ Experimental functionality like this should not be enabled by default as you are changing the behaviour for all users. This needs to be off by default with user's being able to opt-in if they want. 
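For reference, the loading policy described in the quoted PR text (fall back to scalar code without error, but warn when an explicitly named library cannot be loaded) might be sketched roughly as follows. This is an illustration only, not the HotSpot code: `MathImpl`, `LoadOutcome`, `choose_math_impl` and the injected `try_load` callback are all hypothetical names, and the real patch does this inside the AArch64 stub generator with HotSpot's own logging.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical names throughout; this is not the HotSpot implementation.
enum class MathImpl { Scalar, VectorLibrary };

struct LoadOutcome {
  MathImpl impl;        // which implementation the vector math ops would use
  std::string message;  // diagnostic text, empty on success
  bool is_warning;      // true only when the user named the library explicitly
};

// try_load stands in for whatever dynamic loader is used (for example a
// wrapper around dlopen); it returns true if the library could be loaded.
LoadOutcome choose_math_impl(const std::string& lib_name,
                             bool name_is_default,
                             const std::function<bool(const std::string&)>& try_load) {
  if (try_load(lib_name)) {
    return {MathImpl::VectorLibrary, "", false};
  }
  // Loading failed: fall back to scalar code without raising an error.
  // The library name is part of the message so the failure can be diagnosed.
  return {MathImpl::Scalar, "failed to load " + lib_name, !name_is_default};
}
```

Passing the loader in as a callback keeps the policy separate from the platform-specific loading call, so the fallback behaviour can be exercised without a real libsleef installed.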
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8555: > 8553: } else { > 8554: if (FLAG_IS_DEFAULT(UseSleefLib)) { > 8555: log_info(library)("Fail to load sleef library!"); The library name being looked up is probably useful here too. ------------- PR Review: https://git.openjdk.org/jdk/pull/16234#pullrequestreview-1684300759 PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1363389306 PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1363385262 From adinn at openjdk.org Wed Oct 18 07:40:04 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 18 Oct 2023 07:40:04 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 17:45:41 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. 
Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64 > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Removed some comments and relocated code src/hotspot/share/interpreter/bytecodeTracer.cpp line 527: > 525: assert(is_linked(), "invokehandle is only in rewritten methods"); > 526: assert(cpcache_index >= 0, "must be"); > 527: print_field_or_method(cp_index, st); I don't understand this code very well but it looks like this change means `print_field_or_method` gets called twice when we have an `invokehandle` bytecode, passing `cp_index` both times. Is that intended? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1363389472 From shade at openjdk.org Wed Oct 18 08:00:52 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 18 Oct 2023 08:00:52 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v11] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 07:12:08 GMT, Fei Yang wrote: > RISC-V implementation: [15718-riscv.txt](https://github.com/openjdk/jdk/files/12995775/15718-riscv.txt) Thank you! Folded in, please check. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1767890139 From adinn at openjdk.org Wed Oct 18 08:30:10 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 18 Oct 2023 08:30:10 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 17:45:41 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. 
This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64 > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Removed some comments and relocated code src/hotspot/share/interpreter/linkResolver.cpp line 1706: > 1704: Klass* resolved_klass = link_info.resolved_klass(); > 1705: methodHandle method(THREAD, method_entry->method()); > 1706: Handle appendix(THREAD, pool->cache()->appendix_if_resolved(index)); This is another point where you already have the `ResolvedMethodEntry` and are still looking up the appendix info via the cache using an index. 
src/hotspot/share/interpreter/rewriter.cpp line 244: > 242: if ((*opc) == (u1)Bytecodes::_invokevirtual || > 243: // allow invokespecial as an alias, although it would be very odd: > 244: ((*opc) == (u1)Bytecodes::_invokespecial && _pool->tag_at(cp_index).is_method())) { I'm not clear why the assert for both cases has now been folded into an additional logic check for only one case. You don't seem to have added an else clause to handle the case where the opcode is `_invokespecial` and the pool tag is not a method. Can you explain this change? src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp line 2292: > 2290: if (entry->has_appendix()) { > 2291: constantPoolHandle cp(THREAD, METHOD->constants()); > 2292: SET_STACK_OBJECT(cp->cache()->appendix_if_resolved(index), 0); Another place where you indirect through the cache using an index when you already have the entry. src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp line 2314: > 2312: CALL_VM(InterpreterRuntime::resolve_from_cache(THREAD, (Bytecodes::Code)opcode), > 2313: handle_exception); > 2314: entry = cp->resolved_method_entry_at(index); Do you actually need to lookup the entry again? I'm not really sure why the old code needed to do so. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 1685: > 1683: vmassert(MethodHandles::is_signature_polymorphic_method(resolved_method()),"!"); > 1684: vmassert(!MethodHandles::is_signature_polymorphic_static(resolved_method->intrinsic_id()), "!"); > 1685: vmassert(cp->cache()->appendix_if_resolved(index) == nullptr, "!"); Another place where you have the `ResolvedMethodEntry` and are accessing via the cache with an index. 
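The recurring point in these comments (a caller that already holds the resolved entry still goes back through the cache by index) can be illustrated with a minimal sketch. `SketchEntry`, `SketchCache`, `appendix_for_entry` and the field names below are hypothetical stand-ins invented for illustration; they are not the HotSpot classes or the API in this PR, and the real appendix storage may well differ.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an appendix object (not a HotSpot oop).
struct Object { int id; };

// Hypothetical stand-in for a resolved method entry.
struct SketchEntry {
  bool has_appendix_flag = false;
  std::size_t resolved_references_index = 0;  // assumed slot of the appendix
  bool has_appendix() const { return has_appendix_flag; }
};

// Hypothetical stand-in for the constant pool cache.
class SketchCache {
 public:
  SketchCache(std::vector<SketchEntry> entries, std::vector<Object*> refs)
      : _entries(std::move(entries)), _refs(std::move(refs)) {}

  const SketchEntry& entry_at(std::size_t index) const {
    return _entries.at(index);
  }

  // Index-based path: looks the entry up again before reading the appendix.
  Object* appendix_if_resolved(std::size_t index) const {
    return appendix_for_entry(entry_at(index));
  }

  // Entry-based path: for callers that already hold the entry,
  // so no second lookup by index is needed.
  Object* appendix_for_entry(const SketchEntry& entry) const {
    if (!entry.has_appendix()) {
      return nullptr;
    }
    assert(entry.resolved_references_index < _refs.size());
    return _refs[entry.resolved_references_index];
  }

 private:
  std::vector<SketchEntry> _entries;
  std::vector<Object*> _refs;
};
```

Under these assumptions both paths return the same object; the entry-based one simply skips the repeated index lookup at call sites that have the entry in hand.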
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1363399362 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1363415505 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1363423644 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1363427943 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1363440870 From thartmann at openjdk.org Wed Oct 18 08:30:13 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 18 Oct 2023 08:30:13 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v8] In-Reply-To: References: Message-ID: On Wed, 11 Oct 2023 22:05:08 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option: >> >> |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup >> |-------------|------------|---------------|------------------|-----------| >> |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 >> full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 >> small.AESGCMBench.decrypt | 8192 | 527854.353 |663131.48 | 1.25 >> small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 |1.24 >> full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 >> full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 |539235.462 | 1.00 >> small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 |298913.629 | 0.99 >> small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 >> ? | ? | ? | ? | ? 
>> full.AESGCMBench.decrypt | 16384 | 307266.364 |390397.778 | 1.27 >> full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 >> small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 |1.27 >> small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 >> full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 >> full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 >> small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 |181019.205 | 1.12 >> small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 |1.31 >> full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 >> small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 >> small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 |1.28 >> full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 |1.12 >> full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 |211701.969 | 1.27 >> small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 >> small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 |212085.047 | 1.30 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 >> full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 |1.31 >> small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 |1.32 >> small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 >> full.AES... > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated comments, removed unused labels Sorry for the delay on our side. I submitted testing and will report back once it passed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15410#issuecomment-1767939001 From fyang at openjdk.org Wed Oct 18 08:29:38 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 18 Oct 2023 08:29:38 GMT Subject: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v2] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 09:52:15 GMT, Ilya Gavrilin wrote: >> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1682: >> >>> 1680: // use floating-point 1.0 with a sign of input >>> 1681: is_double ? fsgnj_d(dst, one, src) >>> 1682: : fsgnj_s(dst, one, src); >> >> What if the `src` argument contains zero? Math.signum(float/double) is supposed to return zero if the argument is zero [1]. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Math.java#L2602 > > According to IEEE754, we can get positive or negative zero in the `src` register (also positive zero can be named as zero) , and these cases included to mask for the tmp1 (L1671-1676) and `src` value returned. I see. Thanks for the answer. I can approve this once my other comments are resolved. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1363440282 From xgong at openjdk.org Wed Oct 18 08:29:39 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 18 Oct 2023 08:29:39 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF In-Reply-To: References: Message-ID: <6OkPww11l85K0HiU01s6TE6n-1PLTHjZqHSJ1CTSJU8=.f1931736-312f-4c15-915e-25b2a6352c73@github.com> On Wed, 18 Oct 2023 07:36:44 GMT, David Holmes wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. 
To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > src/hotspot/cpu/aarch64/globals_aarch64.hpp line 131: > >> 129: "Branch Protection to use: none, standard, pac-ret") \ >> 130: product(ccstr, UseSleefLib, "libsleef.so.3", EXPERIMENTAL, \ >> 131: "Sleef library to use for the vector math operations") \ > > Experimental functionality like this should not be enabled by default as you are changing the behaviour for all users. 
This needs to be off by default with user's being able to opt-in if they want. Sounds reasonable. Thanks a lot for the reminder! > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8555: > >> 8553: } else { >> 8554: if (FLAG_IS_DEFAULT(UseSleefLib)) { >> 8555: log_info(library)("Fail to load sleef library!"); > > The library name being looked up is probably useful here too. Thanks for the review! I will address this in next commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1363451666 PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1363451750 From shade at openjdk.org Wed Oct 18 08:34:04 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 18 Oct 2023 08:34:04 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v12] In-Reply-To: References: Message-ID: <7Or8Vl0dQBUVIaiq6SSDn-PujHmw3YYeVyUyrGCSDLk=.768595f4-955b-4733-8bac-9b2dd7272d3f@github.com> > See more details in the bug and related issues. > > This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. > > This implements mitigation on most current architectures: > - ? x86_64: implemented > - ? x86_32: considered, abandoned; cannot be easily done without blowing up code size > - ? AArch64: implemented > - ? ARM32: considered, abandoned; needs cleanups and testing; see [JDK-8318414](https://bugs.openjdk.org/browse/JDK-8318414) > - ? PPC64: implemented, thanks @TheRealMDoerr > - ? S390: implemented, thanks @offamitkumar > - ? RISC-V: implemented, thanks @RealFYang > - ? 
Zero: does not need implementation > > Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do an x86_32 version: we need a thread, so the easiest way would be to call into the VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short. I think if we cannot implement this mitigation on some architectures, so be it; it would be a sensible tradeoff for simplicity. > > Setting backoff at `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong. > > I believe we can go in with `1000` as the default, given the experimental results mentioned in this PR. > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` Aleksey Shipilev has updated the pull request incrementally with four additional commits since the last revision: - Editorial cleanups - RISC-V implementation - Mention ARM32 bug - Make sure benchmark runs with C1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15718/files - new: https://git.openjdk.org/jdk/pull/15718/files/8dd00325..0e1fccd2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=10-11 Stats: 24 lines in 7 files changed: 15 ins; 3 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/15718.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15718/head:pull/15718 PR: https://git.openjdk.org/jdk/pull/15718 From aph at openjdk.org Wed Oct 18 08:53:25 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 18 Oct 2023 08:53:25 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v11] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 18 Oct 2023 00:01:20 GMT, David Holmes wrote: > Meta-question and
apologies if this was covered before, but why is this logic being added to stubRoutines.cpp? Because that's where @iwanowww asked me to put it. I don't much care. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1767984532 From mcimadamore at openjdk.org Wed Oct 18 09:25:09 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 18 Oct 2023 09:25:09 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v2] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 04:44:26 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker.
These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > drop unused in_reg_spiller src/java.base/share/classes/java/lang/foreign/Linker.java line 792: > 790: * such as loss of performance, or JVM crashes. > 791: *

> 792: * Critical functions can optionally allow access to the Java heap. This allows a client to pass heap Suggestion: * Critical functions can optionally allow access to the Java heap. This allows clients to pass heap src/java.base/share/classes/java/lang/foreign/Linker.java line 795: > 793: * memory segments as addresses, where normally only off-heap memory segments would be allowed. The memory region > 794: * inside the Java heap is exposed through a temporary native address that is valid for the duration of the > 795: * function call. As such, these temporary addresses, or any addresses derived from them, should not be used The API does not expose temporary addresses - so it is not 100% clear when reading what this paragraph refers to. I suppose you mean a native function that "captures" an object's address and then returns it, so that the client wraps it in a zero-length memory segment? I can't decide off the top of my head if this is too much of a corner case or not, even for this javadoc. src/java.base/share/classes/java/lang/foreign/Linker.java line 800: > 798: * prohibitive in terms of performance. > 799: * > 800: * @implNote As a consequence of allowing heap access, the JVM will either lock the garbage collector, or pin the This is a very low-level comment. Which is fine per se. But it doesn't say what it means for the developer using this functionality. I think you want to say that GC is impacted in some way or form. If so, please say that.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363538621 PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363541304 PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363544042 From jvernee at openjdk.org Wed Oct 18 09:32:09 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 18 Oct 2023 09:32:09 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v2] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 09:20:10 GMT, Maurizio Cimadamore wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> drop unused in_reg_spiller > > src/java.base/share/classes/java/lang/foreign/Linker.java line 795: > >> 793: * memory segments as addresses, where normally only off-heap memory segments would be allowed. The memory region >> 794: * inside the Java heap is exposed through a temporary native address that is valid for the duration of the >> 795: * function call. As such, these temporary addresses, or any addresses derived from them, should not be used > > The API does not expose temporary addresses - so it is not 100% clear when reading what this para refers to. I suppose you mean a native function that "captures" an object's address and then returns it, so that the client wraps it in a zero-length memory segment? I can't decide on top of my head if this is too cornery or not, even for this javadoc. I'm not sure what to write to make this clearer. The address that is exposed to the native target function is indeed a temporary address that is constructed from the oop and offset. It is temporary because after the native call, the GC might move the object around, which invalidates the address. This part is meant to document that native code should not be holding on to the address until after the call completes. This also includes returning the address back to Java. 
The address would be invalid the moment the function returns. Would it help if this said: `The memory region inside the Java heap is exposed to the native target function through a temporary native address`? > src/java.base/share/classes/java/lang/foreign/Linker.java line 800: > >> 798: * prohibitive in terms of performance. >> 799: * >> 800: * @implNote As a consequence of allowing heap access, the JVM will either lock the garbage collector, or pin the > > This is a very low-level comment. Which is fine per se. But it doesn't say what does it mean for the developer using this functionality. I think you want to say that GC is impacted in some way or form. If so, please say that. This comment can be removed, as we no longer lock the GC. The same restrictions apply as for a 'regular' critical call that does not allow passing heap segments. The call should complete quickly. > src/java.base/share/classes/jdk/internal/foreign/abi/Binding.java line 281: > >> 279: } >> 280: >> 281: static SegmentOffset segmentOffsetAllowHeap() { > > I suppose `heapSegmentOffset` is not good because this binding applies to both native and heap segments, right? Yes. Setting `allowHeap` to true just allows heap segments to be used, by switching the object + offset addressing pairs. Native segments are also handled by addressing pair. So this binding applies to both. FWIW, we don't allow heap segments for critical calls by default, since it complicates the calling convention and the work the downcall stub does, which would be detrimental to performance in a case that only needs to support off-heap segments.
(and when we're talking about critical calls, every nanosecond counts) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363551493 PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363553663 PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363554288 From mcimadamore at openjdk.org Wed Oct 18 09:32:12 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 18 Oct 2023 09:32:12 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v2] In-Reply-To: References: Message-ID: <4JaIjJ4E_HG1DLJLX_LvL4ysvZsyT6HZxQOPFAJl2uU=.415cf6c9-4e42-4501-a6fb-5e914fd98877@github.com> On Wed, 18 Oct 2023 04:44:26 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. 
>> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > drop unused in_reg_spiller src/java.base/share/classes/jdk/internal/foreign/abi/Binding.java line 205: > 203: LoadFunc loadFunc, SegmentAllocator allocator); > 204: > 205: private static void checkType(Class type) { For another day: I wonder if we might be better served by more specific "check" routines. E.g. I suppose `Object` is not valid always... and maybe in other cases only certain primitives are ok. 
src/java.base/share/classes/jdk/internal/foreign/abi/Binding.java line 281: > 279: } > 280: > 281: static SegmentOffset segmentOffsetAllowHeap() { I suppose `heapSegmentOffset` is not good because this binding applies to both native and heap segments, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363547594 PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363552900 From jvernee at openjdk.org Wed Oct 18 09:38:55 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 18 Oct 2023 09:38:55 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v3] In-Reply-To: References: Message-ID: > Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. > > The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. > > Components of this patch: > > - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. > - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. > - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. > - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. > - The object/oop + offset is exposed as temporary address to native code. 
> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). > - Only x64 and AArch64 for now. > - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 > - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. > - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` > > Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. > > Numbers for the included benchmark on my machine are: > > > Benchmark (size) Mode Cnt ... 
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: fix s390 & riscv compilation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16201/files - new: https://git.openjdk.org/jdk/pull/16201/files/2c073bf1..20fc7fa1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=01-02 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16201/head:pull/16201 PR: https://git.openjdk.org/jdk/pull/16201 From mcimadamore at openjdk.org Wed Oct 18 09:39:01 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 18 Oct 2023 09:39:01 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v2] In-Reply-To: References: Message-ID: <4Zl8qpBRu9L-zLZ--FcK3ADw-LTuqwX7l7-AxxFhQZA=.675541a1-6933-42fb-bec8-14d0d531902f@github.com> On Wed, 18 Oct 2023 04:44:26 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. 
>> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > drop unused in_reg_spiller src/java.base/share/classes/jdk/internal/foreign/abi/aarch64/CallArranger.java line 156: > 154: > 155: BindingCalculator argCalc = forUpcall ? 
new BoxBindingCalculator(true) : new UnboxBindingCalculator(true, forVariadicFunction, options.allowsHeapAccess()); > 156: BindingCalculator retCalc = forUpcall ? new UnboxBindingCalculator(false, forVariadicFunction, false) : new BoxBindingCalculator(false); asymmetry noted - that said, not much that can be done here - as for an upcall it is not clear what the surrounding context is (e.g. where's the place where unpinning happens) src/java.base/share/classes/jdk/internal/foreign/abi/fallback/FallbackLinker.java line 188: > 186: acquiredSessions.add(sessionImpl); > 187: if (invData.allowsHeapAccess() && !ms.isNative()) { > 188: heapBases[i] = ms.heapBase().get(); This will fail if the segment is read-only. Use the unsafe variant. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363562390 PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363566076 From aph at openjdk.org Wed Oct 18 09:42:20 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 18 Oct 2023 09:42:20 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF In-Reply-To: <6OkPww11l85K0HiU01s6TE6n-1PLTHjZqHSJ1CTSJU8=.f1931736-312f-4c15-915e-25b2a6352c73@github.com> References: <6OkPww11l85K0HiU01s6TE6n-1PLTHjZqHSJ1CTSJU8=.f1931736-312f-4c15-915e-25b2a6352c73@github.com> Message-ID: On Wed, 18 Oct 2023 08:18:41 GMT, Xiaohong Gong wrote: >> src/hotspot/cpu/aarch64/globals_aarch64.hpp line 131: >> >>> 129: "Branch Protection to use: none, standard, pac-ret") \ >>> 130: product(ccstr, UseSleefLib, "libsleef.so.3", EXPERIMENTAL, \ >>> 131: "Sleef library to use for the vector math operations") \ >> >> Experimental functionality like this should not be enabled by default as you are changing the behaviour for all users. This needs to be off by default with user's being able to opt-in if they want. > > Sounds reasonable. Thanks a lot for the reminder! Hard-coding the libsleef ABI version into OpenJDK is a code smell. 
For now I suppose it'll do, but we need a better strategy going forward, perhaps involving a bundled library. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1363568640 From aph at openjdk.org Wed Oct 18 09:42:22 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 18 Oct 2023 09:42:22 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF In-Reply-To: References: Message-ID: <46Wir5nFl1uiDGTmjKvTkO_ehii8febUq5eHm1NMJTI=.e1ab3c84-7704-4032-a92b-41bfc6060a05@github.com> On Wed, 18 Oct 2023 06:12:29 GMT, Xiaohong Gong wrote: > Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). > > SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. > > To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. 
> > Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. > > [1] https://github.com/openjdk/jdk/pull/3638 > [2] https://sleef.org/ > [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ > [4] https://packages.debian.org/bookworm/libsleef3 > [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8521: > 8519: for (int op = 0; op < VectorSupport::NUM_VECTOR_OP_MATH; op++) { > 8520: int vop = VectorSupport::VECTOR_OP_MATH_START + op; > 8521: // Skip "tanh", since there is performance regression Suggestion: // Skip "tanh" because there is performance regression ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1363570896 From jvernee at openjdk.org Wed Oct 18 09:42:27 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 18 Oct 2023 09:42:27 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v4] In-Reply-To: References: Message-ID: > Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. > > The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. > > Components of this patch: > > - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. > - Rename `UnboxAddress` to `SegmentOffset`. 
Add flag to specify whether processing heap segments should be allowed. > - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. > - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. > - The object/oop + offset is exposed as temporary address to native code. > - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). > - Only x64 and AArch64 for now. > - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 > - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. > - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` > > Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. > > Numbers for the included benchmark on my machine are: > > > Benchmark (size) Mode Cnt ... 
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: Phrasing Co-authored-by: Maurizio Cimadamore <54672762+mcimadamore at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16201/files - new: https://git.openjdk.org/jdk/pull/16201/files/20fc7fa1..c4a8866b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16201/head:pull/16201 PR: https://git.openjdk.org/jdk/pull/16201 From aph at openjdk.org Wed Oct 18 09:47:43 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 18 Oct 2023 09:47:43 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 06:12:29 GMT, Xiaohong Gong wrote: > Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). > > SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. 
> > To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. > > Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. > > [1] https://github.com/openjdk/jdk/pull/3638 > [2] https://sleef.org/ > [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ > [4] https://packages.debian.org/bookworm/libsleef3 > [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html This looks good. As far as I can tell the choice you've made of accuracy matches what we need to meet the spec. I'm very nervous about binding ourselves to a specific version of the SLEEF ABI, because Java releases are maintained for decades, and we don't want to be dependent on other projects. We'll have to make a plan for version evolution. ------------- PR Review: https://git.openjdk.org/jdk/pull/16234#pullrequestreview-1684604148 From mcimadamore at openjdk.org Wed Oct 18 09:48:34 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 18 Oct 2023 09:48:34 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v2] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 04:44:26 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. 
It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. 
See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > drop unused in_reg_spiller test/micro/org/openjdk/bench/java/lang/foreign/CriticalCalls.java line 101: > 99: public int callNotPinned() throws Throwable { > 100: try (Arena arena = Arena.ofConfined()) { > 101: MemorySegment nativeArr = arena.allocate(JAVA_INT, arr.length); This allocation method will zero the segment. I think that's a bit unfair and will bias the results against non-pinned, right? test/micro/org/openjdk/bench/java/lang/foreign/CriticalCalls.java line 110: > 108: public int callRecycled() throws Throwable { > 109: MemorySegment nativeArr = recycler.allocate(JAVA_INT, arr.length); > 110: MemorySegment.copy(arr, 0, nativeArr, JAVA_INT, 0, arr.length); Same here - I believe a single call to the correct `allocate` method suffices, but in this case I don't think performance should be skewed too much. 
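The zeroing concern can be illustrated with a minimal sketch, assuming the finalized `java.lang.foreign` API of JDK 22 or later. The class and method names below are hypothetical and are not taken from the PR. `copyTwoStep` mirrors the quoted benchmark code, where `allocate` zero-fills the segment before `MemorySegment.copy` overwrites it. `copyOneStep` uses `allocateFrom`, which allocates and initializes from the array in one call and is expected to skip the redundant zeroing. The thread does not say which variant the benchmark was actually switched to.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.util.Arrays;

import static java.lang.foreign.ValueLayout.JAVA_INT;

// Illustrative sketch only; names are hypothetical and not part of the PR.
final class AllocateSketch {

    // Two steps: allocate(layout, count) zero-fills the new segment,
    // then the copy overwrites every element again.
    static int[] copyTwoStep(int[] arr) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment nativeArr = arena.allocate(JAVA_INT, arr.length);
            MemorySegment.copy(arr, 0, nativeArr, JAVA_INT, 0, arr.length);
            return nativeArr.toArray(JAVA_INT);
        }
    }

    // One step: allocateFrom(layout, elements...) allocates the segment
    // and initializes it from the Java array in a single call.
    static int[] copyOneStep(int[] arr) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment nativeArr = arena.allocateFrom(JAVA_INT, arr);
            return nativeArr.toArray(JAVA_INT);
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        // Both variants produce the same off-heap contents.
        System.out.println(Arrays.equals(copyTwoStep(data), copyOneStep(data)));
    }
}
```

Both methods end with the same native contents; the difference is only in how much work the allocation does before the data arrives.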
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363578019 PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363578910 From mcimadamore at openjdk.org Wed Oct 18 09:51:52 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 18 Oct 2023 09:51:52 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v2] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 04:44:26 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. 
>> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > drop unused in_reg_spiller Looks very good. I did a pass of all the Java code, tests and benchmarks. And also looked at the fallback linker (both Java and C). 
I will leave the hotspot parts to others :-) ------------- PR Review: https://git.openjdk.org/jdk/pull/16201#pullrequestreview-1684611937 From jvernee at openjdk.org Wed Oct 18 09:51:52 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 18 Oct 2023 09:51:52 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v2] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 09:45:25 GMT, Maurizio Cimadamore wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> drop unused in_reg_spiller > > test/micro/org/openjdk/bench/java/lang/foreign/CriticalCalls.java line 101: > >> 99: public int callNotPinned() throws Throwable { >> 100: try (Arena arena = Arena.ofConfined()) { >> 101: MemorySegment nativeArr = arena.allocate(JAVA_INT, arr.length); > > This allocation method will zero the segment. I think that's a bit unfair and will bias the results against non-pinned, right? Good point. I made this benchmark a while ago, before we had the more optimized allocate variants ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363582375 From mcimadamore at openjdk.org Wed Oct 18 10:00:25 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 18 Oct 2023 10:00:25 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v2] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 09:25:49 GMT, Jorn Vernee wrote: >> src/java.base/share/classes/java/lang/foreign/Linker.java line 795: >> >>> 793: * memory segments as addresses, where normally only off-heap memory segments would be allowed. The memory region >>> 794: * inside the Java heap is exposed through a temporary native address that is valid for the duration of the >>> 795: * function call. 
As such, these temporary addresses, or any addresses derived from them, should not be used >> >> The API does not expose temporary addresses - so it is not 100% clear when reading what this para refers to. I suppose you mean a native function that "captures" an object's address and then returns it, so that the client wraps it in a zero-length memory segment? I can't decide on top of my head if this is too cornery or not, even for this javadoc. > > I'm not sure what to write to make this clearer. The address that is exposed to the native target function is indeed a temporary address that is constructed from the oop and offset. It is temporary because after the native call, the GC might move the object around, which invalidates the address. > > This part is meant to document that native code should not be holding on to the address until after the call completes. This also includes returning the address back to Java. The address would be invalid the moment the function returns. > > Would it help if this said: `The memory region inside the Java heap is exposed to the native target function through a temporary native address`? I'm considering to leave `As such, these temporary addresses...` out. You already said that the address "is valid for the duration of the call". I'm not sure the subsequent sentence adds much. If you want to make a concrete example of something that should not be done, that would be better. But it seems to me that the client is as in charge as this text suggests - (e.g. if a native lib decides to hold onto an address, the client can't do much about it). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363593890 From mcimadamore at openjdk.org Wed Oct 18 10:00:25 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 18 Oct 2023 10:00:25 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v2] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 09:57:25 GMT, Maurizio Cimadamore wrote: >> I'm not sure what to write to make this clearer. The address that is exposed to the native target function is indeed a temporary address that is constructed from the oop and offset. It is temporary because after the native call, the GC might move the object around, which invalidates the address. >> >> This part is meant to document that native code should not be holding on to the address until after the call completes. This also includes returning the address back to Java. The address would be invalid the moment the function returns. >> >> Would it help if this said: `The memory region inside the Java heap is exposed to the native target function through a temporary native address`? > > I'm considering to leave `As such, these temporary addresses...` out. You already said that the address "is valid for the duration of the call". I'm not sure the subsequent sentence adds much. If you want to make a concrete example of something that should not be done, that would be better. But it seems to me that the client is as in charge as this text suggests - (e.g. if a native lib decides to hold onto an address, the client can't do much about it). So, summing up - either we write more and spell what are the things to be on the lookout for, or we leave it out. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363595149 From amitkumar at openjdk.org Wed Oct 18 10:25:11 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 18 Oct 2023 10:25:11 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 17:45:41 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. 
>> >> This change supports the following platforms: x86, aarch64 > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Removed some comments and relocated code Hi @matias9927, would you please rebase this PR with the head stream. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1768143574 From jvernee at openjdk.org Wed Oct 18 10:27:28 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 18 Oct 2023 10:27:28 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v2] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 09:58:12 GMT, Maurizio Cimadamore wrote: >> I'm considering to leave `As such, these temporary addresses...` out. You already said that the address "is valid for the duration of the call". I'm not sure the subsequent sentence adds much. If you want to make a concrete example of something that should not be done, that would be better. But it seems to me that the client is as in charge as this text suggests - (e.g. if a native lib decides to hold onto an address, the client can't do much about it). > > So, summing up - either we write more and spell what are the things to be on the lookout for, or we leave it out. Ok, I'll remove the second sentence. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363628571

From jvernee at openjdk.org Wed Oct 18 10:51:22 2023
From: jvernee at openjdk.org (Jorn Vernee)
Date: Wed, 18 Oct 2023 10:51:22 GMT
Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v2]
In-Reply-To: 
References: 
Message-ID: 

On Wed, 18 Oct 2023 09:48:47 GMT, Jorn Vernee wrote:

>> test/micro/org/openjdk/bench/java/lang/foreign/CriticalCalls.java line 101:
>>
>>> 99: public int callNotPinned() throws Throwable {
>>> 100: try (Arena arena = Arena.ofConfined()) {
>>> 101: MemorySegment nativeArr = arena.allocate(JAVA_INT, arr.length);
>>
>> This allocation method will zero the segment. I think that's a bit unfair and will bias the results against non-pinned, right?
>
> Good point. I made this benchmark a while ago, before we had the more optimized allocate variants

This had a pretty big impact, actually. Especially on the larger sizes:

Benchmark                     (size)  Mode  Cnt       Score       Error  Units
CriticalCalls.callNotPinned      100  avgt   30      84.818 ±     0.729  ns/op
CriticalCalls.callNotPinned    10000  avgt   30    2966.918 ±    39.898  ns/op
CriticalCalls.callNotPinned  1000000  avgt   30  952864.052 ± 34996.156  ns/op
CriticalCalls.callPinned         100  avgt   30      30.640 ±     0.173  ns/op
CriticalCalls.callPinned       10000  avgt   30    2241.403 ±    35.473  ns/op
CriticalCalls.callPinned     1000000  avgt   30  221152.247 ±  1697.968  ns/op
CriticalCalls.callRecycled       100  avgt   30      40.205 ±     0.458  ns/op
CriticalCalls.callRecycled     10000  avgt   30    2845.316 ±    13.331  ns/op
CriticalCalls.callRecycled   1000000  avgt   30  287752.178 ±  2322.382  ns/op

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363661902

From mcimadamore at openjdk.org Wed Oct 18 11:14:58 2023
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Wed, 18 Oct 2023 11:14:58 GMT
Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v2]
In-Reply-To: 
References: 
Message-ID: <67T6EClvm9QEAqYr4F4xjgh0mYFwLj0g2Eekx3ecxJA=.8a304336-2d6e-440e-aa55-8f290487f623@github.com>

On Wed, 18 Oct 2023 10:48:58 GMT, Jorn Vernee wrote:

>> Good point. I made this benchmark a while ago, before we had the more optimized allocate variants
>
> This had a pretty big impact, actually. Especially on the larger sizes:
>
>
> Benchmark                     (size)  Mode  Cnt       Score       Error  Units
> CriticalCalls.callNotPinned      100  avgt   30      84.818 ±     0.729  ns/op
> CriticalCalls.callNotPinned    10000  avgt   30    2966.918 ±    39.898  ns/op
> CriticalCalls.callNotPinned  1000000  avgt   30  952864.052 ± 34996.156  ns/op
> CriticalCalls.callPinned         100  avgt   30      30.640 ±     0.173  ns/op
> CriticalCalls.callPinned       10000  avgt   30    2241.403 ±    35.473  ns/op
> CriticalCalls.callPinned     1000000  avgt   30  221152.247 ±  1697.968  ns/op
> CriticalCalls.callRecycled       100  avgt   30      40.205 ±     0.458  ns/op
> CriticalCalls.callRecycled     10000  avgt   30    2845.316 ±    13.331  ns/op
> CriticalCalls.callRecycled   1000000  avgt   30  287752.178 ±  2322.382  ns/op

I also notice that the non pinned variant of the `100` benchmark is slow compared to the others. This might be due to try with resources inhibiting scalarization. I suggest to call Arena::close explicitly in that benchmark and repeat the test.
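The two code shapes being compared can be sketched as follows, assuming the `java.lang.foreign` API of JDK 22 or later. The class and method names are hypothetical and do not come from the benchmark. The sketch only shows the structural difference; whether the explicit-close form actually helps the JIT is the open question raised above and is not something this code demonstrates.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

import static java.lang.foreign.ValueLayout.JAVA_INT;

// Illustrative sketch only; names are hypothetical and not part of the benchmark.
final class ArenaCloseSketch {

    // Stand-in for the native call: sums n ints read from the segment.
    static int sum(MemorySegment seg, int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += seg.getAtIndex(JAVA_INT, i);
        }
        return total;
    }

    // Idiomatic form: try-with-resources closes the arena on every exit path.
    static int sumWithTryWithResources(int[] arr) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocateFrom(JAVA_INT, arr);
            return sum(seg, arr.length);
        }
    }

    // Experimental form: the arena is closed by a plain call on the normal path only.
    // If sum(...) threw, close() would be skipped, so this suits a measurement
    // experiment rather than production code.
    static int sumWithExplicitClose(int[] arr) {
        Arena arena = Arena.ofConfined();
        MemorySegment seg = arena.allocateFrom(JAVA_INT, arr);
        int result = sum(seg, arr.length);
        arena.close();
        return result;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(sumWithTryWithResources(data) == sumWithExplicitClose(data));
    }
}
```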
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363689027

From jvernee at openjdk.org Wed Oct 18 11:20:01 2023
From: jvernee at openjdk.org (Jorn Vernee)
Date: Wed, 18 Oct 2023 11:20:01 GMT
Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v2]
In-Reply-To: <67T6EClvm9QEAqYr4F4xjgh0mYFwLj0g2Eekx3ecxJA=.8a304336-2d6e-440e-aa55-8f290487f623@github.com>
References: <67T6EClvm9QEAqYr4F4xjgh0mYFwLj0g2Eekx3ecxJA=.8a304336-2d6e-440e-aa55-8f290487f623@github.com>
Message-ID: 

On Wed, 18 Oct 2023 11:12:43 GMT, Maurizio Cimadamore wrote:

>> This had a pretty big impact, actually. Especially on the larger sizes:
>>
>>
>> Benchmark                     (size)  Mode  Cnt       Score       Error  Units
>> CriticalCalls.callNotPinned      100  avgt   30      84.818 ±     0.729  ns/op
>> CriticalCalls.callNotPinned    10000  avgt   30    2966.918 ±    39.898  ns/op
>> CriticalCalls.callNotPinned  1000000  avgt   30  952864.052 ± 34996.156  ns/op
>> CriticalCalls.callPinned         100  avgt   30      30.640 ±     0.173  ns/op
>> CriticalCalls.callPinned       10000  avgt   30    2241.403 ±    35.473  ns/op
>> CriticalCalls.callPinned     1000000  avgt   30  221152.247 ±  1697.968  ns/op
>> CriticalCalls.callRecycled       100  avgt   30      40.205 ±     0.458  ns/op
>> CriticalCalls.callRecycled     10000  avgt   30    2845.316 ±    13.331  ns/op
>> CriticalCalls.callRecycled   1000000  avgt   30  287752.178 ±  2322.382  ns/op
>
> I also notice that the non pinned variant of the `100` benchmark is slow compared to the others. This might be due to try with resources inhibiting scalarization. I suggest to call Arena::close explicitly in that benchmark and repeat the test.

Not sure... The `callNotPinned` variant is meant as a typical use case where the native segment needs to be allocated. I think TWR belongs in that typical use-case. This is really about measuring the difference between 2 idiomatic code patterns. If non-scalarization is part of one of those patterns, then I think that is something that should be included in the measurement.
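To make the size of the difference between the patterns concrete, here is a small sketch that divides the `callNotPinned` and `callRecycled` scores by the `callPinned` score for each size. The scores are copied from the table quoted in this message; the class and method names are hypothetical, and the error margins are ignored.

```java
// Illustrative arithmetic only; names are hypothetical, scores are copied from the quoted table.
final class PinnedRatioSketch {

    // How many times slower a variant is than the baseline (both in ns/op).
    static double ratio(double score, double baseline) {
        return score / baseline;
    }

    public static void main(String[] args) {
        long[] sizes = {100, 10_000, 1_000_000};
        double[] notPinned = {84.818, 2966.918, 952864.052};
        double[] pinned = {30.640, 2241.403, 221152.247};
        double[] recycled = {40.205, 2845.316, 287752.178};

        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("size=%d notPinned/pinned=%.2f recycled/pinned=%.2f%n",
                    sizes[i],
                    ratio(notPinned[i], pinned[i]),
                    ratio(recycled[i], pinned[i]));
        }
    }
}
```

On these figures the non-pinned variant is roughly 2.8x, 1.3x and 4.3x slower than the pinned one at sizes 100, 10000 and 1000000, while the recycled variant stays close to 1.3x at every size.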
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363694139 From hgreule at openjdk.org Wed Oct 18 11:25:19 2023 From: hgreule at openjdk.org (Hannes Greule) Date: Wed, 18 Oct 2023 11:25:19 GMT Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 [v5] In-Reply-To: References: Message-ID: > See the bug description for more information. > > This implementation brings down the time to take a heap dump on the example application in the bug report to <2 seconds on my machine. Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: fix indent ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16083/files - new: https://git.openjdk.org/jdk/pull/16083/files/f303a227..df3700b3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16083&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16083&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16083.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16083/head:pull/16083 PR: https://git.openjdk.org/jdk/pull/16083 From hgreule at openjdk.org Wed Oct 18 11:25:20 2023 From: hgreule at openjdk.org (Hannes Greule) Date: Wed, 18 Oct 2023 11:25:20 GMT Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 [v4] In-Reply-To: References: Message-ID: On Sat, 14 Oct 2023 20:17:31 GMT, Hannes Greule wrote: >> See the bug description for more information. >> >> This implementation brings down the time to take a heap dump on the example application in the bug report to <2 seconds on my machine. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > reword -> initial klass Fixed the wrong indent. Thank you for your review. Do I need another one or can we proceed? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16083#issuecomment-1768240487 From adinn at openjdk.org Wed Oct 18 12:11:05 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 18 Oct 2023 12:11:05 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 17:15:56 GMT, Matias Saavedra Silva wrote: >> src/hotspot/share/ci/ciReplay.cpp line 436: >> >>> 434: #endif >>> 435: ResolvedMethodEntry* method_entry = cp->cache()->resolved_method_entry_at(index); >>> 436: cp->cache()->set_method_handle(index, callInfo); >> >> It looks a bit odd that you obtain a pointer to the `ResolvedMethodEntry` at `index` by calling `resolved_method_entry_at` and then pass `index` back into `set_method_handle` in order to update it with the call info. Obviously this is done so that the set routine can handle data races. Would it not be better to modify `set_method_handle` so that it handled the race and also returned the `ResolvedMethodEntry` at `index`. >> >> Likewise you pass `index` back in the call to `appendix_if_resolved` below. Would it not be better to have this method accept a `ResolvedMethodEntry` pointer? > > Right, currently these calls seem redundant since it reads the resolved method entry twice. If I understand correctly, you are suggesting that the methods look something like this? > `ResolvedMethodEntry* set_method_handle(int index, const CallInfo &call_info)` > `oop appendix_if_resolved(ResolvedMethodEntry* method_entry)` Yes, removing the redundant lookup is the correct rationale. However, I suggested a better solution below in a comment attached to method `ConstantPoolCache::appendix_if_resolved(int index)`. Keep the current method of `ConstantPoolCache` that takes an index but modify it to delegate the appendix check to a new method `ResolvedMethodEntry::appendix_if_resolved()`. 
You can make a direct call to the latter method when you have already looked up the `ResolvedMethodEntry`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1363642133 From adinn at openjdk.org Wed Oct 18 12:11:08 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 18 Oct 2023 12:11:08 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 17:45:41 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. 
>> >> This change supports the following platforms: x86, aarch64 > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Removed some comments and relocated code src/hotspot/share/oops/constantPool.inline.hpp line 65: > 63: inline oop ConstantPool::appendix_if_resolved(int method_index) const { > 64: ResolvedMethodEntry* entry = cache()->resolved_method_entry_at(method_index); > 65: if (!entry->has_appendix()) As per the method on `ConstantPoolCache` if you move the body of this into `ResolvedMethodEntry::appendix_if_resolved` you can call that direct from here. src/hotspot/share/oops/cpCache.cpp line 643: > 641: oop ConstantPoolCache::appendix_if_resolved(int method_index) const { > 642: ResolvedMethodEntry* method_entry = resolved_method_entry_at(method_index); > 643: if (!method_entry->has_appendix()) If you move the rest of the code in this method into the a new method `ResolvedMethodEntry::appendix_if_resolved()` then you can call that method from here and also call it in places where you have already looked up the `ResolvedMethodEntry` but are still indirecting through the cache method using an `index`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1363753051 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1363630294 From adinn at openjdk.org Wed Oct 18 12:32:05 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 18 Oct 2023 12:32:05 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 17:45:41 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. 
Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64 > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Removed some comments and relocated code Hi Matias. I have reviewed the aarch64- and x86-specific changes and also worked through the shared code in c1, c1, classfile, interpreter, zero, oops and prims and have made some suggestions for changes. No doubt Coleen will pick up on things I am quite likely to have missed in the shared code. ------------- Changes requested by adinn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15455#pullrequestreview-1684949406 From adinn at openjdk.org Wed Oct 18 12:37:58 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 18 Oct 2023 12:37:58 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 16:50:11 GMT, Matias Saavedra Silva wrote: >> src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2369: >> >>> 2367: } >>> 2368: >>> 2369: void TemplateTable::load_resolved_method_entry_common(Register cache, >> >> When I saw the `common` suffix I thought at first that this might be used in addition to the variants with the other suffixes rather than as an alternative. Since this is 'common' to only to 2 out of the 5 different cases perhaps it could be named `load_resolved_method_entry_special_or_static`. That would make it clear that each invoke type has a corresponding `load_resolved_method_entry_` variant. > > I struggled with naming this method which is why I settled on `common` but you are correct that it gives the wrong impression. However, I think `load_resolved_method_entry_special_or_static` is too verbose (44 characters!). Do you think there is a better name or is our best option to use this longer one? Yes, it's a long name but it clarifies the intended purpose consistently with the names for the other cases. I think it is best to stick with it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1363807321 From adinn at openjdk.org Wed Oct 18 12:42:01 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 18 Oct 2023 12:42:01 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 17:45:41 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. 
This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64 > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Removed some comments and relocated code Oh, and I forgot to say, nice work! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1768366127 From jvernee at openjdk.org Wed Oct 18 12:45:05 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 18 Oct 2023 12:45:05 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v5] In-Reply-To: References: Message-ID: > Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. 
It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. > > The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. > > Components of this patch: > > - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. > - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. > - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. > - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. > - The object/oop + offset is exposed as temporary address to native code. > - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). > - Only x64 and AArch64 for now. > - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. 
See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 > - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. > - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` > > Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. > > Numbers for the included benchmark on my machine are: > > > Benchmark (size) Mode Cnt ... Jorn Vernee has updated the pull request incrementally with six additional commits since the last revision: - Add xor benchmark - add readOnly heap segment test - shorten linker doc - use allocateFrom in benchmarks - use unsafeGetBase in fallback linker - remove impl note from isCritical ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16201/files - new: https://git.openjdk.org/jdk/pull/16201/files/c4a8866b..628d4952 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=03-04 Stats: 392 lines in 14 files changed: 374 ins; 7 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/16201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16201/head:pull/16201 PR: https://git.openjdk.org/jdk/pull/16201 From jvernee at openjdk.org Wed Oct 18 12:45:07 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 18 Oct 2023 12:45:07 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v4] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 09:42:27 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. 
Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. 
>> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical`
>>
>> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode.
>>
>> Numbers for the included benchmark on my machine are:
>>
>>
>> Benchmar...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>
>   Phrasing
>
>   Co-authored-by: Maurizio Cimadamore <54672762+mcimadamore at users.noreply.github.com>

Added another benchmark to the patch that xors 2 arrays together using various strategies. These are the results on my machine:

Benchmark    (arrayKind)       (sizeKind)  Mode  Cnt   Score    Error  Units
XorTest.xor  JNI_ELEMENTS      SMALL       avgt   30   0.555 ±  0.010  ms/op
XorTest.xor  JNI_ELEMENTS      MEDIUM      avgt   30   4.610 ±  0.114  ms/op
XorTest.xor  JNI_ELEMENTS      LARGE       avgt   30  53.533 ±  2.113  ms/op
XorTest.xor  JNI_REGION        SMALL       avgt   30   0.030 ±  0.001  ms/op
XorTest.xor  JNI_REGION        MEDIUM      avgt   30   1.498 ±  0.041  ms/op
XorTest.xor  JNI_REGION        LARGE       avgt   30   7.544 ±  0.188  ms/op
XorTest.xor  JNI_CRITICAL      SMALL       avgt   30   0.035 ±  0.005  ms/op
XorTest.xor  JNI_CRITICAL      MEDIUM      avgt   30   0.496 ±  0.003  ms/op
XorTest.xor  JNI_CRITICAL      LARGE       avgt   30   2.521 ±  0.035  ms/op
XorTest.xor  FOREIGN_NO_INIT   SMALL       avgt   30   0.030 ±  0.001  ms/op
XorTest.xor  FOREIGN_NO_INIT   MEDIUM      avgt   30   1.303 ±  0.021  ms/op
XorTest.xor  FOREIGN_NO_INIT   LARGE       avgt   30   7.668 ±  0.168  ms/op
XorTest.xor  FOREIGN_INIT      SMALL       avgt   30   0.031 ±  0.001  ms/op
XorTest.xor  FOREIGN_INIT      MEDIUM      avgt   30   1.485 ±  0.012  ms/op
XorTest.xor  FOREIGN_INIT      LARGE       avgt   30   9.183 ±  0.247  ms/op
XorTest.xor  FOREIGN_CRITICAL  SMALL       avgt   30   0.026 ±  0.001  ms/op
XorTest.xor  FOREIGN_CRITICAL  MEDIUM      avgt   30   0.501 ±  0.002  ms/op
XorTest.xor  FOREIGN_CRITICAL  LARGE       avgt   30   2.578 ±  0.023  ms/op
XorTest.xor  UNSAFE            SMALL       avgt   30   0.029 ±  0.001  ms/op
XorTest.xor  UNSAFE            MEDIUM      avgt   30   1.300 ±  0.013  ms/op
XorTest.xor  UNSAFE            LARGE       avgt   30   7.632 ±  0.178  ms/op

The important part here is that `FOREIGN_CRITICAL` (the new feature) is on par with `JNI_CRITICAL`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1768370164

From mdoerr at openjdk.org Wed Oct 18 13:16:00 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 18 Oct 2023 13:16:00 GMT
Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v5]
In-Reply-To:
References:
Message-ID:

On Wed, 18 Oct 2023 12:45:05 GMT, Jorn Vernee wrote:

>> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call.
>>
>> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive.
>>
>> Components of this patch:
>>
>> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`.
>> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed.
>> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified.
>> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures.
>> - The object/oop + offset is exposed as temporary address to native code.
>> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code).
>> - Only x64 and AArch64 for now.
>> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request incrementally with six additional commits since the last revision: > > - Add xor benchmark > - add readOnly heap segment test > - shorten linker doc > - use allocateFrom in benchmarks > - use unsafeGetBase in fallback linker > - remove impl note from isCritical PPC64 code is here: [PPC64_Panama_heap_segments.patch](https://github.com/openjdk/jdk/files/13025334/PPC64_Panama_heap_segments.patch) All tests are passing on linux PPC64le. 
One single test case is failing on Big Endian:

test TestLayoutPaths.testBadAlignmentOfRoot(): failure
java.lang.AssertionError: expected [true] but found [false]
    at org.testng.Assert.fail(Assert.java:99)
    at org.testng.Assert.failNotEquals(Assert.java:1037)
    at org.testng.Assert.assertTrue(Assert.java:45)
    at org.testng.Assert.assertTrue(Assert.java:55)
    at TestLayoutPaths.testBadAlignmentOfRoot(TestLayoutPaths.java:157)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
    at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:599)
    at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174)
    at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46)
    at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822)
    at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147)
    at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
    at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1597)
    at org.testng.TestRunner.privateRun(TestRunner.java:764)
    at org.testng.TestRunner.run(TestRunner.java:585)
    at org.testng.SuiteRunner.runTest(SuiteRunner.java:384)
    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:378)
    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:337)
    at org.testng.SuiteRunner.run(SuiteRunner.java:286)
    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53)
    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:96)
    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1218)
    at org.testng.TestNG.runSuitesLocally(TestNG.java:1140)
    at org.testng.TestNG.runSuites(TestNG.java:1069)
    at org.testng.TestNG.run(TestNG.java:1037)
    at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:102)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333)
    at java.base/java.lang.Thread.run(Thread.java:1570)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1768430267

From jvernee at openjdk.org Wed Oct 18 13:52:01 2023
From: jvernee at openjdk.org (Jorn Vernee)
Date: Wed, 18 Oct 2023 13:52:01 GMT
Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v5]
In-Reply-To:
References:
Message-ID:

On Wed, 18 Oct 2023 13:13:07 GMT, Martin Doerr wrote:

> Note: This error is not related to this PR. It was broken by https://github.com/openjdk/jdk/commit/b12c471a990eb8f789410a20084918368c655659 which is incorrect for Big Endian. Should I file a new issue or is that already known?

Please file a new issue.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1768504257

From jvernee at openjdk.org Wed Oct 18 14:29:04 2023
From: jvernee at openjdk.org (Jorn Vernee)
Date: Wed, 18 Oct 2023 14:29:04 GMT
Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v5]
In-Reply-To:
References:
Message-ID:

On Wed, 18 Oct 2023 13:13:07 GMT, Martin Doerr wrote:

>> Jorn Vernee has updated the pull request incrementally with six additional commits since the last revision:
>>
>> - Add xor benchmark
>> - add readOnly heap segment test
>> - shorten linker doc
>> - use allocateFrom in benchmarks
>> - use unsafeGetBase in fallback linker
>> - remove impl note from isCritical
>
> PPC64 code is here:
> [PPC64_Panama_heap_segments.patch](https://github.com/openjdk/jdk/files/13025334/PPC64_Panama_heap_segments.patch)
> All tests are passing on linux PPC64le.
One single test case is failing on Big Endian: > test TestLayoutPaths.testBadAlignmentOfRoot(): failure > java.lang.AssertionError: expected [true] but found [false] > at org.testng.Assert.fail(Assert.java:99) > at org.testng.Assert.failNotEquals(Assert.java:1037) > at org.testng.Assert.assertTrue(Assert.java:45) > at org.testng.Assert.assertTrue(Assert.java:55) > at TestLayoutPaths.testBadAlignmentOfRoot(TestLayoutPaths.java:157) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > at java.base/java.lang.reflect.Method.invoke(Method.java:580) > at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132) > at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:599) > at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174) > at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46) > at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822) > at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147) > at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146) > at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128) > at java.base/java.util.ArrayList.forEach(ArrayList.java:1597) > at org.testng.TestRunner.privateRun(TestRunner.java:764) > at org.testng.TestRunner.run(TestRunner.java:585) > at org.testng.SuiteRunner.runTest(SuiteRunner.java:384) > at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:378) > at org.testng.SuiteRunner.privateRun(SuiteRunner.java:337) > at org.testng.SuiteRunner.run(SuiteRunner.java:286) > at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53) > at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:96) > at org.testng.TestNG.runSuitesSequentially(TestNG.java:1218) > at org.testng.TestNG.runSuitesLocally(TestNG.java:1140) > at org.testng.TestNG.runSuites(TestNG.java:1069) > at 
org.testng.TestNG.run(TestNG.java:1037) > at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:102) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > at java.base/java.lang.reflect.Method.invoke(Method.java:580) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) > ... @TheRealMDoerr Thanks for the patch, I've added it to the PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1768582362 From jvernee at openjdk.org Wed Oct 18 14:29:01 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 18 Oct 2023 14:29:01 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v6] In-Reply-To: References: Message-ID: > Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. > > The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. > > Components of this patch: > > - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. > - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. > - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. > - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. > - The object/oop + offset is exposed as temporary address to native code. 
> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). > - Only x64 and AArch64 for now. > - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 > - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. > - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` > > Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. > > Numbers for the included benchmark on my machine are: > > > Benchmark (size) Mode Cnt ... 
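For readers following the description above, here is a minimal sketch of what passing a heap segment to a critical downcall could look like from the Java side. It assumes a JDK where the FFM API is final (22 or later), a 64-bit platform where `size_t` maps to `JAVA_LONG`, and that `strlen` is reachable through the default lookup; the class and method names are hypothetical and not taken from the PR.

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the PR.
public class HeapSegmentSketch {

    // Calls the C library's strlen directly on a Java byte[] without copying it off-heap.
    static long nativeStrlen(byte[] nulTerminated) {
        Linker linker = Linker.nativeLinker();
        MethodHandle strlen = linker.downcallHandle(
                linker.defaultLookup().find("strlen").orElseThrow(),
                // Assumption: size_t is 64 bits wide on this platform.
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS),
                // critical(true) permits heap segments as arguments; the callee must
                // return quickly and must not upcall into Java.
                Linker.Option.critical(true));
        // A heap segment backed by the array; no native allocation is involved.
        MemorySegment heapSegment = MemorySegment.ofArray(nulTerminated);
        try {
            return (long) strlen.invokeExact(heapSegment);
        } catch (Throwable t) {
            throw new IllegalStateException("downcall failed", t);
        }
    }

    public static void main(String[] args) {
        byte[] text = "hello\0".getBytes(StandardCharsets.US_ASCII);
        System.out.println(nativeStrlen(text));
    }
}
```

Without `Linker.Option.critical(true)` the same call is expected to reject the heap segment, so the option is what opts this particular downcall into the behaviour described above.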
Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: - add PPC impl - add missing file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16201/files - new: https://git.openjdk.org/jdk/pull/16201/files/628d4952..07d06216 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=04-05 Stats: 156 lines in 3 files changed: 81 ins; 62 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/16201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16201/head:pull/16201 PR: https://git.openjdk.org/jdk/pull/16201 From mdoerr at openjdk.org Wed Oct 18 14:41:01 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 18 Oct 2023 14:41:01 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v5] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 13:13:07 GMT, Martin Doerr wrote: >> Jorn Vernee has updated the pull request incrementally with six additional commits since the last revision: >> >> - Add xor benchmark >> - add readOnly heap segment test >> - shorten linker doc >> - use allocateFrom in benchmarks >> - use unsafeGetBase in fallback linker >> - remove impl note from isCritical > > PPC64 code is here: > [PPC64_Panama_heap_segments.patch](https://github.com/openjdk/jdk/files/13025334/PPC64_Panama_heap_segments.patch) > All tests are passing on linux PPC64le. 
One single test case is failing on Big Endian: > test TestLayoutPaths.testBadAlignmentOfRoot(): failure > java.lang.AssertionError: expected [true] but found [false] > at org.testng.Assert.fail(Assert.java:99) > at org.testng.Assert.failNotEquals(Assert.java:1037) > at org.testng.Assert.assertTrue(Assert.java:45) > at org.testng.Assert.assertTrue(Assert.java:55) > at TestLayoutPaths.testBadAlignmentOfRoot(TestLayoutPaths.java:157) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > at java.base/java.lang.reflect.Method.invoke(Method.java:580) > at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132) > at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:599) > at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174) > at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46) > at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822) > at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147) > at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146) > at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128) > at java.base/java.util.ArrayList.forEach(ArrayList.java:1597) > at org.testng.TestRunner.privateRun(TestRunner.java:764) > at org.testng.TestRunner.run(TestRunner.java:585) > at org.testng.SuiteRunner.runTest(SuiteRunner.java:384) > at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:378) > at org.testng.SuiteRunner.privateRun(SuiteRunner.java:337) > at org.testng.SuiteRunner.run(SuiteRunner.java:286) > at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53) > at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:96) > at org.testng.TestNG.runSuitesSequentially(TestNG.java:1218) > at org.testng.TestNG.runSuitesLocally(TestNG.java:1140) > at org.testng.TestNG.runSuites(TestNG.java:1069) > at 
org.testng.TestNG.run(TestNG.java:1037) > at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:102) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > at java.base/java.lang.reflect.Method.invoke(Method.java:580) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) > ... > @TheRealMDoerr Thanks for the patch, I've added it to the PR. Thanks for adding it! I wonder if the native_invoker_size_per_arg thing still works well enough. We may exceed the computed size now, right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1768610367 From mdoerr at openjdk.org Wed Oct 18 14:47:59 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 18 Oct 2023 14:47:59 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v5] In-Reply-To: References: Message-ID: <5f9gy8HbHFVJAgWJ1XIUsxwSwFhyRD9s2z092DlNKpQ=.6de10af8-51b0-4802-84a8-88d37048e07b@github.com> On Wed, 18 Oct 2023 13:49:40 GMT, Jorn Vernee wrote: > > Note: This error is not related to this PR. It was broken by https://github.com/openjdk/jdk/commit/b12c471a990eb8f789410a20084918368c655659 which is incorrect for Big Endian. Should I file a new issue or is that already known? > > Please file a new issue. Filed https://bugs.openjdk.org/browse/JDK-8318454. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1768623075 From jvernee at openjdk.org Wed Oct 18 14:51:01 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 18 Oct 2023 14:51:01 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v5] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 14:38:41 GMT, Martin Doerr wrote: > I wonder if the native_invoker_size_per_arg thing still works well enough. We may exceed the computed size now, right? Good point. I'll have a look at enhancing the test we have for this.
Intuitively, I think it will be okay. It's true that we generate more code to add the oops and offsets together, but at the same time, we don't have any code to shuffle the offsets. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1768630249 From macarte at openjdk.org Wed Oct 18 15:28:13 2023 From: macarte at openjdk.org (Mat Carter) Date: Wed, 18 Oct 2023 15:28:13 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics Message-ID: Adding a new periodic jfr event to monitor and output statistics for the compiler queues. You will see one event per compiler queue (c1 and c2) Passes tier1 on linux (x86) and mac (aarch64) ------------- Commit messages: - jfr profiles - added requirement that java test is not run with interperter only - Addressed feedback: fix coding style and replaces int with uint - Replaced magic numbers - Update src/hotspot/share/jfr/metadata/metadata.xml - Update src/hotspot/share/jfr/periodic/jfrCompilerQueueUtilization.cpp - Update src/hotspot/share/jfr/periodic/jfrCompilerQueueUtilization.cpp - Update src/hotspot/share/jfr/periodic/jfrCompilerQueueUtilization.cpp - Add periodic jfr CompilerQueueUtilization event Changes: https://git.openjdk.org/jdk/pull/16211/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16211&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8317562 Stats: 248 lines in 10 files changed: 248 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16211/head:pull/16211 PR: https://git.openjdk.org/jdk/pull/16211 From karianna at openjdk.org Wed Oct 18 15:28:19 2023 From: karianna at openjdk.org (Martijn Verburg) Date: Wed, 18 Oct 2023 15:28:19 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 00:35:54 GMT, Mat Carter wrote: > Adding a new periodic jfr event to monitor and output statistics for the compiler queues. 
You will see one event per compiler queue (c1 and c2) > > Passes tier1 on linux (x86) and mac (aarch64) Very nitpicky; take or leave the suggestions :-) src/hotspot/share/compiler/compileBroker.hpp line 121: > 119: int size() const { return _size; } > 120: > 121: int get_peak_size() const { return _peak_size; } Whitespace looks off by one? src/hotspot/share/jfr/metadata/metadata.xml line 861: > 859: > 860: > 861: Suggestion: src/hotspot/share/jfr/periodic/jfrCompilerQueueUtilization.cpp line 37: > 35: }; > 36: > 37: // If current counters are less than previous we assume the interface has been reset Suggestion: // If current counters are less than previous, we assume the interface has been reset src/hotspot/share/jfr/periodic/jfrCompilerQueueUtilization.cpp line 49: > 47: void JfrCompilerQueueUtilization::send_events() { > 48: static CompilerQueueEntry compilerQueueEntries[2] = { > 49: {CompileBroker::c1_compile_queue(), 1, 0,0}, Suggestion: {CompileBroker::c1_compile_queue(), 1, 0, 0}, src/hotspot/share/jfr/periodic/jfrCompilerQueueUtilization.cpp line 50: > 48: static CompilerQueueEntry compilerQueueEntries[2] = { > 49: {CompileBroker::c1_compile_queue(), 1, 0,0}, > 50: {CompileBroker::c2_compile_queue(), 2, 0,0}}; Suggestion: {CompileBroker::c2_compile_queue(), 2, 0, 0}}; src/hotspot/share/jfr/periodic/jfrCompilerQueueUtilization.cpp line 55: > 53: static JfrTicks last_sample_instant; > 54: const JfrTickspan interval = cur_time - last_sample_instant; > 55: for(int i = 0; i < 2; i ++) 2 is a magic number; maybe add a comment above the for declaration explaining why? ------------- Changes requested by karianna (no project role).
PR Review: https://git.openjdk.org/jdk/pull/16211#pullrequestreview-1681156472 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1361399313 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1361399583 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1361399733 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1361400079 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1361400114 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1361400356 From cslucas at openjdk.org Wed Oct 18 15:28:22 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 18 Oct 2023 15:28:22 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 00:35:54 GMT, Mat Carter wrote: > Adding a new periodic jfr event to monitor and output statistics for the compiler queues. You will see one event per compiler queue (c1 and c2) > > Passes tier1 on linux (x86) and mac (aarch64) Some quick comments. src/hotspot/share/compiler/compileBroker.hpp line 91: > 89: > 90: int _size; > 91: int _total_added; Can total be negative? src/hotspot/share/jfr/periodic/jfrCompilerQueueUtilization.cpp line 55: > 53: return ((current - old) * NANOSECS_PER_SEC) / interval.nanoseconds(); > 54: } > 55: NIT: Too many new lines? src/hotspot/share/jfr/periodic/jfrCompilerQueueUtilization.cpp line 67: > 65: const JfrTickspan interval = cur_time - last_sample_instant; > 66: for(int i = 0; i < num_compiler_queues; i ++) > 67: { I think this doesn't follow the coding style. See document in "/doc/" folder in the repository. test/jdk/jdk/jfr/event/compiler/TestCompilerQueueUtilization.java line 36: > 34: * @test > 35: * @key jfr > 36: * @requires vm.hasJFR Does it need C1 and/or C2? 
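The per-second rate expression quoted in the review above, `((current - old) * NANOSECS_PER_SEC) / interval.nanoseconds()`, together with the "counters have been reset" comment, can be modelled in a few lines of Java. This is only a sketch for readers: the class and method names are hypothetical, and returning 0 on a reset or an empty interval is an assumption of this sketch, not something the PR is confirmed to do.

```java
// Hypothetical model of the rate computation discussed in the review; not HotSpot code.
public class QueueRateSketch {
    static final long NANOSECS_PER_SEC = 1_000_000_000L;

    // Returns how many items per second were added (or removed) between two samples.
    static long ratePerSecond(long current, long previous, long intervalNanos) {
        // Assumption: a counter going backwards means it was reset, so report 0
        // rather than a negative rate; an empty interval also yields 0.
        if (intervalNanos <= 0 || current < previous) {
            return 0;
        }
        return ((current - previous) * NANOSECS_PER_SEC) / intervalNanos;
    }

    public static void main(String[] args) {
        // 50 additions over half a second
        System.out.println(ratePerSecond(150, 100, 500_000_000L));
    }
}
```

Multiplying before dividing keeps precision for short intervals, at the cost of a possible overflow for very large counter deltas; that trade-off is the same one the quoted C++ expression makes.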
------------- PR Review: https://git.openjdk.org/jdk/pull/16211#pullrequestreview-1682911744 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1362487113 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1362485381 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1362486361 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1362480494 From macarte at openjdk.org Wed Oct 18 15:28:23 2023 From: macarte at openjdk.org (Mat Carter) Date: Wed, 18 Oct 2023 15:28:23 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 17:04:02 GMT, Cesar Soares Lucas wrote: >> Adding a new periodic jfr event to monitor and output statistics for the compiler queues. You will see one event per compiler queue (c1 and c2) >> >> Passes tier1 on linux (x86) and mac (aarch64) > > src/hotspot/share/compiler/compileBroker.hpp line 91: > >> 89: >> 90: int _size; >> 91: int _total_added; > > Can total be negative? no - changed to uint > test/jdk/jdk/jfr/event/compiler/TestCompilerQueueUtilization.java line 36: > >> 34: * @test >> 35: * @key jfr >> 36: * @requires vm.hasJFR > > Does it need C1 and/or C2? yes requires c1 or c2 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1362628813 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1362629511 From macarte at openjdk.org Wed Oct 18 15:28:24 2023 From: macarte at openjdk.org (Mat Carter) Date: Wed, 18 Oct 2023 15:28:24 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 01:06:29 GMT, Martijn Verburg wrote: >> Adding a new periodic jfr event to monitor and output statistics for the compiler queues. 
You will see one event per compiler queue (c1 and c2) >> >> Passes tier1 on linux (x86) and mac (aarch64) > > src/hotspot/share/jfr/periodic/jfrCompilerQueueUtilization.cpp line 55: > >> 53: static JfrTicks last_sample_instant; >> 54: const JfrTickspan interval = cur_time - last_sample_instant; >> 55: for(int i = 0; i < 2; i ++) > > 2 is a magic number; maybe add a comment above the for declaration explaining why? Thanks for the feedback, made this clearer ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1362382032 From macarte at openjdk.org Wed Oct 18 15:28:25 2023 From: macarte at openjdk.org (Mat Carter) Date: Wed, 18 Oct 2023 15:28:25 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 19:04:00 GMT, Mat Carter wrote: >> test/jdk/jdk/jfr/event/compiler/TestCompilerQueueUtilization.java line 36: >> >>> 34: * @test >>> 35: * @key jfr >>> 36: * @requires vm.hasJFR >> >> Does it need C1 and/or C2? > > yes requires c1 or c2 Added a requirement that the Java test is NOT run with -Xint (interpreter-only mode) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1362633864 From mcimadamore at openjdk.org Wed Oct 18 15:57:03 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 18 Oct 2023 15:57:03 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v6] In-Reply-To: References: Message-ID: <1IPTFSCCyviLmU7-l-tE8lhKfHxNoHUCgAslQimIA6k=.05271756-fb05-42e5-b029-120ccd4d426b@github.com> On Wed, 18 Oct 2023 14:29:01 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly.
Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. 
>> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - add PPC impl > - add missing file Marked as reviewed by mcimadamore (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16201#pullrequestreview-1685475694 From sviswanathan at openjdk.org Wed Oct 18 16:10:02 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 18 Oct 2023 16:10:02 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v8] In-Reply-To: References: Message-ID: On Wed, 11 Oct 2023 22:05:08 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option: >> >> |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup >> |-------------|------------|---------------|------------------|-----------| >> |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 >> full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 >> small.AESGCMBench.decrypt | 8192 | 527854.353 |663131.48 | 1.25 >> small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 |1.24 >> full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 >> full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 |539235.462 | 1.00 >> small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 |298913.629 | 0.99 >> small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 >> ? | ? | ? | ? | ? 
>> full.AESGCMBench.decrypt | 16384 | 307266.364 |390397.778 | 1.27 >> full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 >> small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 |1.27 >> small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 >> full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 >> full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 >> small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 |181019.205 | 1.12 >> small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 |1.31 >> full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 >> small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 >> small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 |1.28 >> full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 |1.12 >> full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 |211701.969 | 1.27 >> small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 >> small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 |212085.047 | 1.30 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 >> full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 |1.31 >> small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 |1.32 >> small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 >> full.AES... > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated comments, removed unused labels Thanks a lot Tobias! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15410#issuecomment-1768857138 From amenkov at openjdk.org Wed Oct 18 17:33:35 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 18 Oct 2023 17:33:35 GMT Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 [v4] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 11:21:24 GMT, Hannes Greule wrote: > Fixed the wrong indent. Thank you for your review. Do I need another one or can we proceed? Hotspot changes require 2 reviewers ------------- PR Comment: https://git.openjdk.org/jdk/pull/16083#issuecomment-1769017339 From igavrilin at openjdk.org Wed Oct 18 17:35:58 2023 From: igavrilin at openjdk.org (Ilya Gavrilin) Date: Wed, 18 Oct 2023 17:35:58 GMT Subject: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v3] In-Reply-To: References: Message-ID: > Hi all, please review these changes, which add RISC-V floating-point copySign and signum intrinsics. > CopySign - returns the first argument with the sign of the second. On RISC-V we have the `fsgnj.x` instruction, which can implement this intrinsic. > Signum - returns the input value if it is +/- 0.0 or NaN, otherwise returns 1.0 with the sign of the input value. On RISC-V we can use `fclass.x` to classify the input value and return the appropriate value. > > Tests: > Performance tests on t-head board: > With intrinsics: > > Benchmark (seed) Mode Cnt Score Error Units > MathBench.copySignDouble 0 thrpt 8 34156.580 ± 76.272 ops/ms > MathBench.copySignFloat 0 thrpt 8 34181.731 ± 38.182 ops/ms > MathBench.signumDouble 0 thrpt 8 31977.258 ± 1122.327 ops/ms > MathBench.signumFloat 0 thrpt 8 31836.852 ± 56.013 ops/ms > > Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`): > > Benchmark (seed) Mode Cnt Score Error Units > MathBench.copySignDouble 0 thrpt 8 31000.996 ± 943.094 ops/ms > MathBench.copySignFloat 0 thrpt 8 30678.016 ±
28.087 ops/ms > MathBench.signumDouble 0 thrpt 8 25435.010 ± 2047.085 ops/ms > MathBench.signumFloat 0 thrpt 8 25257.058 ± 79.175 ops/ms > > Regression tests: tier1, hotspot:tier2 on RISC-V board. > > Also, renamed one micro test: previously we had `sigNumDouble` and `signumFloat`, which were spelled inconsistently (`sigNum` vs. `signum`). Now both use `signum`. > The performance tests have also been changed slightly to exercise the intrinsics' results better; the diff to the tests: > > diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java > index 6cd1353907e..0bee25366bf 100644 > --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java > +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java > @@ -143,12 +143,12 @@ public double ceilDouble() { > > @Benchmark > public double copySignDouble() { > - return Math.copySign(double81, doubleNegative12); > + return Math.copySign(double81, doubleNegative12) + Math.copySign(double81, double2) + Math.copySign(double4Dot1, doubleNegative12); > } > > @Benchmark > public float copySignFloat() { > - return Math.copySign(floatNegative99, float1); > + return Math.copySign(floatNegative99, float1) + Math.copySign(eFloat, float1) + Math.copySign...
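As a reference for the semantics described above (copySign takes the magnitude of the first argument and the sign of the second; signum passes zeros and NaN through and otherwise returns 1.0 with the input's sign), here is a small bit-level sketch in plain Java. It only illustrates what the intrinsics have to compute; it is not the RISC-V implementation, and the class and method names are hypothetical.

```java
// Hypothetical reference model of the copySign/signum semantics; not the intrinsic code.
public class SignSemanticsSketch {

    // Magnitude bits of the first argument combined with the sign bit of the second.
    static double copySignReference(double magnitude, double sign) {
        long magnitudeBits = Double.doubleToRawLongBits(magnitude) & Long.MAX_VALUE;
        long signBit = Double.doubleToRawLongBits(sign) & Long.MIN_VALUE;
        return Double.longBitsToDouble(magnitudeBits | signBit);
    }

    // +/-0.0 and NaN are returned unchanged; everything else becomes +/-1.0.
    static double signumReference(double d) {
        if (d == 0.0 || Double.isNaN(d)) {
            return d;
        }
        return copySignReference(1.0, d);
    }

    public static void main(String[] args) {
        System.out.println(copySignReference(81.0, -12.0) == Math.copySign(81.0, -12.0));
        System.out.println(signumReference(-3.5) == Math.signum(-3.5));
    }
}
```

The zero and NaN cases are why a signum implementation needs some form of classification of the input before it can fall back to a plain sign copy onto 1.0.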
Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: Remove some effects and assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16186/files - new: https://git.openjdk.org/jdk/pull/16186/files/b0a53a0c..c0a2ab95 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16186&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16186&range=01-02 Stats: 4 lines in 2 files changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16186.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16186/head:pull/16186 PR: https://git.openjdk.org/jdk/pull/16186 From jvernee at openjdk.org Wed Oct 18 17:38:24 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 18 Oct 2023 17:38:24 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v7] In-Reply-To: References: Message-ID: > Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. > > The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. > > Components of this patch: > > - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. > - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. > - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. 
> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. > - The object/oop + offset is exposed as temporary address to native code. > - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). > - Only x64 and AArch64 for now. > - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 > - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. > - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` > > Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. > > Numbers for the included benchmark on my machine are: > > > Benchmark (size) Mode Cnt ... 
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: add s390 support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16201/files - new: https://git.openjdk.org/jdk/pull/16201/files/07d06216..65bd8d83 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=05-06 Stats: 124 lines in 2 files changed: 53 ins; 60 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/16201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16201/head:pull/16201 PR: https://git.openjdk.org/jdk/pull/16201 From sjayagond at openjdk.org Wed Oct 18 17:38:27 2023 From: sjayagond at openjdk.org (Sidraya Jayagond) Date: Wed, 18 Oct 2023 17:38:27 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v6] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 14:29:01 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. 
>> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - add PPC impl > - add missing file Add s390x port from here [S390x_Panama_heap_segments.txt](https://github.com/openjdk/jdk/files/13031418/S390x_Panama_heap_segments.txt) All tests are passing on linux S390x. 
One single test case is failing on Big Endian: test TestLayoutPaths.testBadAlignmentOfRoot(): failure. Similar to PPC Big Endian. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1768978247 From jvernee at openjdk.org Wed Oct 18 17:38:28 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 18 Oct 2023 17:38:28 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v6] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 17:04:52 GMT, Sidraya Jayagond wrote: >> Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: >> >> - add PPC impl >> - add missing file > > Add s390x port from here > [S390x_Panama_heap_segments.txt](https://github.com/openjdk/jdk/files/13031418/S390x_Panama_heap_segments.txt) > > All tests are passing on linux S390x. One single test case is failing on Big Endian: > test TestLayoutPaths.testBadAlignmentOfRoot(): failure. Similar to PPC Big Endian. @sid8606 Thanks, I've added it to the PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1768985519 From kvn at openjdk.org Wed Oct 18 17:39:58 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 18 Oct 2023 17:39:58 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics In-Reply-To: References: Message-ID: <-O9AczApq9UKq3h7GrInmyZ-5Eap0wuE-bGQqwOrySA=.a31a9076-4629-4c9b-8a0e-8fe2778b9617@github.com> On Tue, 17 Oct 2023 00:35:54 GMT, Mat Carter wrote: > Adding a new periodic jfr event to monitor and output statistics for the compiler queues. You will see one event per compiler queue (c1 and c2) > > Passes tier1 on linux (x86) and mac (aarch64) It would be interesting to test this with Graal, which uses `c2_compile_queue` (the Graal JIT replaces C2).
src/hotspot/share/compiler/compileBroker.cpp line 530: > 528: return _c2_compile_queue; > 529: } > 530: Note, `*_compile_queue` could be `nullptr` if the VM is built without C2 or C1, or when run with `-XX:-TieredCompilation` (only C2 is used) or with `-XX:TieredStopAtLevel={1,2,3}` (only C1 is used). Make sure you check it in the JFR event. src/hotspot/share/compiler/compileBroker.hpp line 123: > 121: int get_peak_size() const { return _peak_size; } > 122: int get_total_added() const { return _total_added; } > 123: int get_total_removed() const { return _total_removed; } The fields are of type `uint`. These accessors should also return `uint`. ------------- PR Review: https://git.openjdk.org/jdk/pull/16211#pullrequestreview-1685714293 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1364277126 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1364268812 From macarte at openjdk.org Wed Oct 18 18:11:45 2023 From: macarte at openjdk.org (Mat Carter) Date: Wed, 18 Oct 2023 18:11:45 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v2] In-Reply-To: References: Message-ID: > Adding a new periodic jfr event to monitor and output statistics for the compiler queues.
You will see one event per compiler queue (c1 and c2) > > Passes tier1 on linux (x86) and mac (aarch64) Mat Carter has updated the pull request incrementally with one additional commit since the last revision: fixed return type and changed NULL to nullptr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16211/files - new: https://git.openjdk.org/jdk/pull/16211/files/6413be5b..6c0b1670 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16211&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16211&range=00-01 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16211/head:pull/16211 PR: https://git.openjdk.org/jdk/pull/16211 From macarte at openjdk.org Wed Oct 18 18:11:48 2023 From: macarte at openjdk.org (Mat Carter) Date: Wed, 18 Oct 2023 18:11:48 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v2] In-Reply-To: <-O9AczApq9UKq3h7GrInmyZ-5Eap0wuE-bGQqwOrySA=.a31a9076-4629-4c9b-8a0e-8fe2778b9617@github.com> References: <-O9AczApq9UKq3h7GrInmyZ-5Eap0wuE-bGQqwOrySA=.a31a9076-4629-4c9b-8a0e-8fe2778b9617@github.com> Message-ID: <1hW4B6WGVoZg_Dob78Xnwz33rME20-1HlINbOwY9YsM=.f33470c8-e33f-42ac-b98f-9f94078ca2c3@github.com> On Wed, 18 Oct 2023 17:35:59 GMT, Vladimir Kozlov wrote: >> Mat Carter has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed return type and changed NULL to nullptr > > src/hotspot/share/compiler/compileBroker.cpp line 530: > >> 528: return _c2_compile_queue; >> 529: } >> 530: > > Note, `*_compiler_queue` could be `nullptr` if VM is build without C2 or C1 or when run with `-XX:-TieredCompilation` (only C2 is used) or with `-XX:TierdStopAtLevel={1,2,3}` (only C1 is used). > > Make sure you check it in JFR event. Thank you! 
The JFR event does check for NULL (now nullptr) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1364317003 From kvn at openjdk.org Wed Oct 18 18:27:20 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 18 Oct 2023 18:27:20 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v2] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 18:11:45 GMT, Mat Carter wrote: >> Adding a new periodic jfr event to monitor and output statistics for the compiler queues. You will see one event per compiler queue (c1 and c2) >> >> Passes tier1 on linux (x86) and mac (aarch64) > > Mat Carter has updated the pull request incrementally with one additional commit since the last revision: > > fixed return type and changed NULL to nullptr Good for compiler part of changes. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16211#pullrequestreview-1685826018 From shade at openjdk.org Wed Oct 18 19:03:39 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 18 Oct 2023 19:03:39 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v30] In-Reply-To: References: Message-ID: On Fri, 13 Oct 2023 01:38:13 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Add call to publish in parallel gc and update counter names FYI, 64-bit atomics fallbacks for 32-bit arches are developing here: #16252. 
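For readers following the reference to #16252: a common way to provide a missing 64-bit atomic add or exchange is to build it from a 64-bit compare-and-exchange loop. The sketch below only illustrates that general technique with `std::atomic`; it is not the code from the pull request, and the helper names `fetch_add_via_cas` and `exchange_via_cas` are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not a HotSpot API): fetch-and-add built only from
// compare-and-exchange. Returns the value observed before the addition.
inline uint64_t fetch_add_via_cas(std::atomic<uint64_t>& dest, uint64_t add_value) {
  uint64_t old_value = dest.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak reloads old_value with the current
  // contents of dest, so the loop simply retries with fresh data.
  while (!dest.compare_exchange_weak(old_value, old_value + add_value)) {
  }
  return old_value;
}

// Hypothetical helper (not a HotSpot API): exchange built the same way.
// Returns the value that was replaced.
inline uint64_t exchange_via_cas(std::atomic<uint64_t>& dest, uint64_t new_value) {
  uint64_t old_value = dest.load(std::memory_order_relaxed);
  while (!dest.compare_exchange_weak(old_value, new_value)) {
  }
  return old_value;
}
```

Unsigned arithmetic is used so that wrap-around on overflow is well defined; whether the actual fallback in the PR makes the same choices is not something this thread states.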
------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1769145084 From matsaave at openjdk.org Wed Oct 18 19:08:46 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 18 Oct 2023 19:08:46 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 10:26:26 GMT, Andrew Dinn wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Removed some comments and relocated code > > src/hotspot/share/oops/cpCache.cpp line 643: > >> 641: oop ConstantPoolCache::appendix_if_resolved(int method_index) const { >> 642: ResolvedMethodEntry* method_entry = resolved_method_entry_at(method_index); >> 643: if (!method_entry->has_appendix()) > > If you move the rest of the code in this method into a new method `ResolvedMethodEntry::appendix_if_resolved()` then you can call that method from here and also call it in places where you have already looked up the `ResolvedMethodEntry` but are still indirecting through the cache method using an `index`. ResolvedMethodEntry doesn't know about the constant pool or the cache, so that means it will be unable to call `constant_pool()->resolved_reference_at(ref_index);`. Maybe an alternate solution would be to overload `appendix_if_resolved` so that it can take either `int method_index` or `ResolvedMethodEntry* method_entry` as arguments.
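The overloading idea can be sketched roughly as follows. This is a self-contained toy model, not HotSpot code: `oop`, `ResolvedMethodEntry`, and `ConstantPoolCache` here are simplified stand-ins, and the members `_has_appendix`, `_resolved_references_index`, and `_resolved_references` are assumptions made only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for HotSpot's oop type (assumption for this sketch).
using oop = const char*;

// Stand-in entry: it knows whether it has an appendix and where the appendix
// lives, but has no access to the constant pool or the cache.
struct ResolvedMethodEntry {
  bool _has_appendix;
  int  _resolved_references_index;
  bool has_appendix() const { return _has_appendix; }
  int  resolved_references_index() const { return _resolved_references_index; }
};

// Stand-in cache: owns the entries and can reach the resolved references.
class ConstantPoolCache {
  std::vector<ResolvedMethodEntry> _resolved_method_entries;
  std::vector<oop> _resolved_references;

 public:
  ConstantPoolCache(std::vector<ResolvedMethodEntry> entries, std::vector<oop> references)
      : _resolved_method_entries(entries), _resolved_references(references) {}

  const ResolvedMethodEntry* resolved_method_entry_at(int method_index) const {
    return &_resolved_method_entries[static_cast<std::size_t>(method_index)];
  }

  // Overload taking an index: looks the entry up, then delegates.
  oop appendix_if_resolved(int method_index) const {
    return appendix_if_resolved(resolved_method_entry_at(method_index));
  }

  // Overload taking an entry the caller has already looked up, which avoids
  // repeating the index-based lookup.
  oop appendix_if_resolved(const ResolvedMethodEntry* method_entry) const {
    if (!method_entry->has_appendix()) {
      return nullptr;
    }
    return _resolved_references[static_cast<std::size_t>(method_entry->resolved_references_index())];
  }
};
```

Keeping both overloads on the cache side leaves the entry free of any knowledge about the constant pool, which is the constraint raised above.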
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1364381690 From vlivanov at openjdk.org Wed Oct 18 19:16:31 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 18 Oct 2023 19:16:31 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v11] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <0YkU44VdBbj08LTg9zw27GoMHMs1GNEz8U2nUb1wkpc=.29b8c5a2-2384-4566-9c5f-e483a1437ef2@github.com> On Wed, 18 Oct 2023 08:50:13 GMT, Andrew Haley wrote: >> Meta-question and apologies if this was covered before, but why is this logic being added to stubRoutines.cpp? > Because that's where @iwanowww asked me to put it. I don't much care. The constants were duplicated in multiple places (in particular, in `macroAssembler_x86.cpp`), so I suggested putting them in some shared place and couldn't come up with a better one than `stubRoutines.cpp/hpp`. But I don't see any changes in `macroAssembler_x86.cpp` anymore. What happened to them, @theRealAph? ------------- PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1769165919 From shade at openjdk.org Wed Oct 18 19:22:26 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 18 Oct 2023 19:22:26 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms In-Reply-To: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: On Wed, 18 Oct 2023 18:58:58 GMT, Aleksey Shipilev wrote: > See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected.
> > Unfortunately, we cannot test these apart from the existing gtest. > > Additional testing: > - [x] linux-x86-server-fastdebug, atomic tests pass > - [ ] linux-arm-server-fastdebug, atomic tests pass This is actually pretty generic area. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1769174715 From matsaave at openjdk.org Wed Oct 18 20:12:55 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 18 Oct 2023 20:12:55 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: <5NcuSCkW_SrmfZHFBx75iwwBd31k9kIjOIYUWqLA5TQ=.91af4e48-8f4e-499d-8a5a-5f6140f02309@github.com> On Wed, 18 Oct 2023 07:56:12 GMT, Andrew Dinn wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Removed some comments and relocated code > > src/hotspot/share/interpreter/rewriter.cpp line 244: > >> 242: if ((*opc) == (u1)Bytecodes::_invokevirtual || >> 243: // allow invokespecial as an alias, although it would be very odd: >> 244: ((*opc) == (u1)Bytecodes::_invokespecial && _pool->tag_at(cp_index).is_method())) { > > I'm not clear why the assert for both cases has now been folded into an additional logic check for only one case. You don't seem to have added an else clause to handle the case where the opcode is `_invokespecial` and the pool tag is not a method. Can you explain this change? This looks like a leftover of an earlier attempt to resolve the issue with invokespecial. 
I believe this can be reverted ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1364479834 From eosterlund at openjdk.org Wed Oct 18 21:00:47 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 18 Oct 2023 21:00:47 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms In-Reply-To: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: On Wed, 18 Oct 2023 18:58:58 GMT, Aleksey Shipilev wrote: > See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. > > Unfortunately, we cannot test these apart from the existing gtest. > > Additional testing: > - [x] linux-x86-server-fastdebug, atomic tests pass > - [ ] linux-arm-server-fastdebug, atomic tests pass Having monotonic counters tends to require 64 bits. I have pulled my hair out so many times that we can't have atomic monotonic counters because of the lack of 32 bit platform support for it. The fix looks great, thanks for fixing! ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16252#pullrequestreview-1686131973 From fyang at openjdk.org Thu Oct 19 00:39:49 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 19 Oct 2023 00:39:49 GMT Subject: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v3] In-Reply-To: References: Message-ID: <8hnDLf3rfTCEGEOMPYJIbdK0xiqd1YM_p6oZz6zTx4M=.43701960-46d8-4a6b-8373-f03649eb75c3@github.com> On Wed, 18 Oct 2023 17:35:58 GMT, Ilya Gavrilin wrote: >> Hi all, please review this changes into risc-v floating point copysign and signum intrinsics. 
>> CopySign - returns the first argument with the sign of the second. On RISC-V the `fsgnj.x` instruction can implement this intrinsic.
>> Signum - returns the input value if it is +/- 0.0 or NaN; otherwise returns 1.0 with the sign of the input value. On RISC-V we can use `fclass.x` to classify the input value and return the appropriate result.
>>
>> Tests:
>> Performance tests on t-head board:
>> With intrinsics:
>>
>> Benchmark                 (seed)   Mode  Cnt      Score      Error   Units
>> MathBench.copySignDouble       0  thrpt    8  34156.580 ±   76.272  ops/ms
>> MathBench.copySignFloat        0  thrpt    8  34181.731 ±   38.182  ops/ms
>> MathBench.signumDouble         0  thrpt    8  31977.258 ± 1122.327  ops/ms
>> MathBench.signumFloat          0  thrpt    8  31836.852 ±   56.013  ops/ms
>>
>> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`):
>>
>> Benchmark                 (seed)   Mode  Cnt      Score      Error   Units
>> MathBench.copySignDouble       0  thrpt    8  31000.996 ±  943.094  ops/ms
>> MathBench.copySignFloat        0  thrpt    8  30678.016 ±   28.087  ops/ms
>> MathBench.signumDouble         0  thrpt    8  25435.010 ± 2047.085  ops/ms
>> MathBench.signumFloat          0  thrpt    8  25257.058 ±   79.175  ops/ms
>>
>> Regression tests: tier1, hotspot:tier2 on risc-v board.
>>
>> Also renamed one micro test: previously we had `sigNumDouble` and `signumFloat`, which did not agree on `signum` versus `sigNum`. Now both share the same spelling: `signum`.
>> The performance tests have been changed a bit to check the intrinsics' results better. Diff that modifies the tests:
>>
>> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> index 6cd1353907e..0bee25366bf 100644
>> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> @@ -143,12 +143,12 @@ public double ceilDouble() {
>>
>>      @Benchmark
>>      public double copySignDouble() {
>> -        return Math.copySign(double81, doubleNegative12);
>> +        return Math.copySign(double81, doubleNegative12) + Math.copySign(double81, double2) + Math.copySign(double4Dot1, doubleNegative12);
>>      }
>>
>>      @Benchmark
>>      public float copySignFloat() {
>> -        return Math.copySign(floatNegative99, float1);
>> +        return ...
>
> Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove some effects and assert

Updated change looks good. Thanks.
JMH data on HiFive Unmatched for reference:

Before:

Benchmark                 (seed)   Mode  Cnt       Score       Error   Units
MathBench.copySignDouble       0  thrpt    8   79728.042 ±  8211.438  ops/ms
MathBench.copySignFloat        0  thrpt    8   79516.930 ± 13163.477  ops/ms
MathBench.sigNumDouble         0  thrpt    8   58204.403 ±  6795.238  ops/ms
MathBench.signumFloat          0  thrpt    8   57882.056 ±  3635.354  ops/ms

After:

Benchmark                 (seed)   Mode  Cnt       Score       Error   Units
MathBench.copySignDouble       0  thrpt    8  104301.832 ±  7170.917  ops/ms
MathBench.copySignFloat        0  thrpt    8  103008.851 ± 11722.187  ops/ms
MathBench.signumDouble         0  thrpt    8   64465.030 ±  6849.148  ops/ms
MathBench.signumFloat          0  thrpt    8   63987.290 ±  4298.311  ops/ms

-------------

Marked as reviewed by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16186#pullrequestreview-1686455232 From dholmes at openjdk.org Thu Oct 19 01:25:55 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 19 Oct 2023 01:25:55 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v7] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 17:38:24 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. 
>> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > add s390 support src/hotspot/share/gc/shared/gcLocker.cpp line 139: > 137: // has called `jni_unlock`, but not yet finished the call, e.g. initiating > 138: // a GCCause::_gc_locker GC. > 139: log_debug_jni("Blocked from entering critical section while waiting on GC."); Seems unrelated to current MR - leftover debugging code? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1364763295 From darcy at openjdk.org Thu Oct 19 01:32:03 2023 From: darcy at openjdk.org (Joe Darcy) Date: Thu, 19 Oct 2023 01:32:03 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v11] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Tue, 17 Oct 2023 11:43:59 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with three additional commits since the last revision: > > - Review feedback > - Merge branch 'JDK-8295159' of https://github.com/theRealAph/jdk into JDK-8295159 > - Remove change to RestoreMXCSROnJNICalls src/hotspot/os/linux/os_linux.cpp line 1853: > 1851: > 1852: #ifndef IA32 > 1853: // Quickly test to make sure denormals are correctly handled. Nit: I recommend using "subnormal" rather than "denormal" for general terminology on this point. While "denormal" was used in the original IEEE 754 standard from 1985, subsequent iterations of the standard use "subnormal". The term "subnormal" has also been used for the last several editions of JLS and JVMS. src/hotspot/share/runtime/stubRoutines.cpp line 333: > 331: // performed at runtime.
Making _small_denormal volatile ensures > 332: // that the following expression isn't evaluated at compile time: > 333: return (_large_denormal + _small_denormal == _large_denormal As a possible future expansion, if there are cases where foreign or native code sets the rounding mode to something other than round to nearest, expressions in the same vein can be used to detect that case and restore that other aspect of the control word. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1364765074 PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1364766165 From jvernee at openjdk.org Thu Oct 19 01:55:56 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 19 Oct 2023 01:55:56 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v7] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 01:22:46 GMT, David Holmes wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> add s390 support > > src/hotspot/share/gc/shared/gcLocker.cpp line 139: > >> 137: // has called `jni_unlock`, but not yet finished the call, e.g. initiating >> 138: // a GCCause::_gc_locker GC. >> 139: log_debug_jni("Blocked from entering critical section while waiting on GC."); > > Seems unrelated to current MR - leftover debugging code? Yes, good catch. 
Leftover from when this patch used GCLocker (which turned out to be not needed in the end) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1364783686 From dholmes at openjdk.org Thu Oct 19 02:18:51 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 19 Oct 2023 02:18:51 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms In-Reply-To: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: On Wed, 18 Oct 2023 18:58:58 GMT, Aleksey Shipilev wrote: > See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. > > Unfortunately, we cannot test these apart from the existing gtest. > > Additional testing: > - [x] linux-x86-server-fastdebug, atomic tests pass > - [ ] linux-arm-server-fastdebug, atomic tests pass Looks good! Thanks for fixing. I'm a bit surprised this wasn't done from the start ... makes me wonder how we missed it. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16252#pullrequestreview-1686533632 From dholmes at openjdk.org Thu Oct 19 02:22:51 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 19 Oct 2023 02:22:51 GMT Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 [v5] In-Reply-To: References: Message-ID: <2QN5S1ENuCII3Wr2Z9z_o_27GxDSOEmvlKMoCQPiRFM=.1d13fc58-f21a-4018-9faf-a30befd3c46c@github.com> On Wed, 18 Oct 2023 11:25:19 GMT, Hannes Greule wrote: >> See the bug description for more information. 
>> >> This implementation brings down the time to take a heap dump on the example application in the bug report to <2 seconds on my machine. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > fix indent This looks reasonable to me but I'd really like @fparain to review this. Thanks, ------------- PR Review: https://git.openjdk.org/jdk/pull/16083#pullrequestreview-1686536467 From iklam at openjdk.org Thu Oct 19 06:38:45 2023 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 19 Oct 2023 06:38:45 GMT Subject: RFR: 8318484: Initial version of cdsConfig.hpp Message-ID: This is the first step for [JDK-8318483 - Move CDS configuration management into cdsConfig.hpp](https://bugs.openjdk.org/browse/JDK-8318483) - Remove `Arguments::is_dumping_archive()` and `Arguments assert_is_dumping_archive()` - Add the following new APIs class CDSConfig { static bool is_dumping_archive(); static bool is_dumping_static_archive(); static bool is_dumping_dynamic_archive(); static bool is_dumping_heap(); }; - Convert some use of `DumpSharedSpaces` and `DynamicDumpSharedSpaces` to these new APIs (More APIs will be added in future sub tasks of [JDK-8318483](https://bugs.openjdk.org/browse/JDK-8318483)) ------------- Commit messages: - 8318484: Initial version of cdsConfig.hpp Changes: https://git.openjdk.org/jdk/pull/16257/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16257&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318484 Stats: 236 lines in 36 files changed: 125 ins; 16 del; 95 mod Patch: https://git.openjdk.org/jdk/pull/16257.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16257/head:pull/16257 PR: https://git.openjdk.org/jdk/pull/16257 From dholmes at openjdk.org Thu Oct 19 07:00:34 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 19 Oct 2023 07:00:34 GMT Subject: RFR: 8318484: Initial version of cdsConfig.hpp In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 05:56:53 GMT, Ioi 
Lam wrote: > This is the first step for [JDK-8318483 - Move CDS configuration management into cdsConfig.hpp](https://bugs.openjdk.org/browse/JDK-8318483) > > - Remove `Arguments::is_dumping_archive()` and `Arguments assert_is_dumping_archive()` > - Add the following new APIs > > > class CDSConfig { > static bool is_dumping_archive(); > static bool is_dumping_static_archive(); > static bool is_dumping_dynamic_archive(); > static bool is_dumping_heap(); > }; > > > - Convert some use of `DumpSharedSpaces` and `DynamicDumpSharedSpaces` to these new APIs > > (More APIs will be added in future sub tasks of [JDK-8318483](https://bugs.openjdk.org/browse/JDK-8318483)) Initial refactoring looks good. One query below. Thanks src/hotspot/share/cds/metaspaceShared.cpp line 778: > 776: > 777: #if INCLUDE_CDS_JAVA_HEAP > 778: if (CDSConfig::is_dumping_heap()) { This seems a new condition. Why is it needed now? ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16257#pullrequestreview-1686863583 PR Review Comment: https://git.openjdk.org/jdk/pull/16257#discussion_r1365002696 From eosterlund at openjdk.org Thu Oct 19 07:43:30 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 19 Oct 2023 07:43:30 GMT Subject: RFR: 8316436: ContinuationWrapper uses unhandled nullptr oop [v2] In-Reply-To: References: Message-ID: <88a4FEG25ZdO4GSfsVvY2X-oroswJMnJk_sz3hoIQCo=.ef1736d6-6c9d-4b25-8475-2bbe8c30883a@github.com> On Tue, 19 Sep 2023 12:35:41 GMT, Stefan Karlsson wrote: >> The ZGC oop verification code in combination with CheckUnhandledOops finds an unhandled oop in ContinuationWrapper: >> >> >> Test java/lang/Thread/virtual/stress/Skynet.java#ZGenerational with ' -XX:+CheckUnhandledOops' crashes with >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (src/hotspot/share/gc/z/zAddress.inline.hpp:296), pid=986260, tid=986296 >> # assert(!assert_on_failure) failed: 
Has low-order bits set: 0xfffffffffffffff1 >> >> V [libjvm.so+0x1962fda] initialize_check_oop_function()::{lambda(oopDesc*)#1}::_FUN(oopDesc*)+0x5a (zAddress.inline.hpp:296) >> V [libjvm.so+0xa6d484] ContinuationWrapper::~ContinuationWrapper()+0x24 (oopsHierarchy.hpp:89) >> V [libjvm.so+0xa66c83] int freeze_internal >(JavaThread*, long*)+0x373 (continuationFreezeThaw.cpp:1584) >> V [libjvm.so+0xa6711b] int freeze >(JavaThread*, long*)+0x5b (continuationFreezeThaw.cpp:272) >> J 216 jdk.internal.vm.Continuation.doYield()I [java.base at 22-internal](mailto:java.base at 22-internal) (0 bytes) @ 0x00007f614c630875 [0x00007f614c630820+0x0000000000000055] >> >> >> This is the scenario that triggers this bug: >> 1) ContinuationWrapper is created on the stack >> 2) We enter a JRT_BLOCK section >> 3) Call ContinuationWrapper::done() >> 4) Exit the JRT_BLOCK >> 5) ~ContinuationWrapper is called >> >> (3) sets ContinuationWrapper::_continuation to nullptr >> (4) hits a safepoint and sets ContinuationWrapper::_continuation to 0xfffffffffffffff1 >> (5) uses ContinuationWrapper::_continuation in `_continuation != nullptr`, which triggers ZGC's verification code that finds the broken oop. >> >> So, this crashes with ZGC, but that's because ZGC finds a broken usage of _continuation. 
To show that this is still a problem with other GCs I added this assert: >> >> diff --git a/src/hotspot/share/runtime/javaThread.hpp b/src/hotspot/share/runtime/javaThread.hpp >> index 40205d324a6..80b60d0b7b8 100644 >> --- a/src/hotspot/share/runtime/javaThread.hpp >> +++ b/src/hotspot/share/runtime/javaThread.hpp >> @@ -258,7 +258,7 @@ class JavaThread: public Thread { >> >> public: >> void inc_no_safepoint_count() { _no_safepoint_count++; } >> - void dec_no_safepoint_count() { _no_safepoint_count--; } >> + void dec_no_safepoint_count() { _no_safepoint_count--; assert(_no_safepoint_count >= 0, "Catch G1 in the act!"); } >> #endif // ASSERT >> public: >> // These functions check conditions before possibly going to ... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix thread argument Looks good. I don't really mind if the _done flag is in there in release as well; don't think that it makes a difference and it's easier to read. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15810#pullrequestreview-1686944926 From adinn at openjdk.org Thu Oct 19 07:48:53 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 19 Oct 2023 07:48:53 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 19:06:20 GMT, Matias Saavedra Silva wrote: >> src/hotspot/share/oops/cpCache.cpp line 643: >> >>> 641: oop ConstantPoolCache::appendix_if_resolved(int method_index) const { >>> 642: ResolvedMethodEntry* method_entry = resolved_method_entry_at(method_index); >>> 643: if (!method_entry->has_appendix()) >> >> If you move the rest of the code in this method into the a new method `ResolvedMethodEntry::appendix_if_resolved()` then you can call that method from here and also call it in places where you have already looked up the `ResolvedMethodEntry` but are still indirecting through the cache method using an `index`. > > ResolvedMethodEntry doesn't know about the constant pool or the cache, so that means it will be unable to call `constant_pool()->resolved_reference_at(ref_index);`. Maybe an alternate solution would be to overload `appendix_if_resolved` so that it can take either `int method_index` or `ResolvedMethodEntry* method_entry` as arguments. Doh! Yes, of course. Overloading will allow the repeated lookup to be avoided. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1365058940 From shade at openjdk.org Thu Oct 19 08:03:21 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 19 Oct 2023 08:03:21 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v12] In-Reply-To: <7Or8Vl0dQBUVIaiq6SSDn-PujHmw3YYeVyUyrGCSDLk=.768595f4-955b-4733-8bac-9b2dd7272d3f@github.com> References: <7Or8Vl0dQBUVIaiq6SSDn-PujHmw3YYeVyUyrGCSDLk=.768595f4-955b-4733-8bac-9b2dd7272d3f@github.com> Message-ID: On Wed, 18 Oct 2023 08:34:04 GMT, Aleksey Shipilev wrote: >> See more details in the bug and related issues. >> >> This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. >> >> This implements mitigation on most current architectures: >> - x86_64: implemented >> - x86_32: considered, abandoned; cannot be easily done without blowing up code size >> - AArch64: implemented >> - ARM32: considered, abandoned; needs cleanups and testing; see [JDK-8318414](https://bugs.openjdk.org/browse/JDK-8318414) >> - PPC64: implemented, thanks @TheRealMDoerr >> - S390: implemented, thanks @offamitkumar >> - RISC-V: implemented, thanks @RealFYang >> - Zero: does not need implementation >> >> Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do the x86_32 version: we need a thread, so the easiest way would be to call into the VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short.
I think if we cannot implement this mitigation on some architectures, so be it, it would be a sensible tradeoff for simplicity.
>>
>> Setting backoff at `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong.
>>
>> I believe we can go in with `1000` as the default, given the experimental results mentioned in this PR.
>>
>> Additional testing:
>> - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3`
>> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3`
>
> Aleksey Shipilev has updated the pull request incrementally with four additional commits since the last revision:
>
> - Editorial cleanups
> - RISC-V implementation
> - Mention ARM32 bug
> - Make sure benchmark runs with C1

Deeper performance evaluation shows that Dacapo:pmd has regressions that get linearly worse as we get into larger backoffs. They reach 8% at backoff=10000. I am currently investigating the cause.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1770271456

From duke at openjdk.org Thu Oct 19 08:51:47 2023
From: duke at openjdk.org (Francesco Nigro)
Date: Thu, 19 Oct 2023 08:51:47 GMT
Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v12]
In-Reply-To:
References: <7Or8Vl0dQBUVIaiq6SSDn-PujHmw3YYeVyUyrGCSDLk=.768595f4-955b-4733-8bac-9b2dd7272d3f@github.com>
Message-ID: <_t0vIVBMsOHSIQVYHA2FAstL3cwkvHhr6lmSEn59q5A=.0b8bea1e-5c28-4a63-93d0-612f3cf23558@github.com>

On Thu, 19 Oct 2023 08:00:59 GMT, Aleksey Shipilev wrote:

>> Aleksey Shipilev has updated the pull request incrementally with four additional commits since the last revision:
>>
>> - Editorial cleanups
>> - RISC-V implementation
>> - Mention ARM32 bug
>> - Make sure benchmark runs with C1
>
> Deeper performance evaluation shows that Dacapo:pmd has regressions that get linearly worse as we get into larger backoffs. They reach 8% at backoff=10000 with focused single-threaded tests. I am currently investigating the cause.

@shipilev At the time of https://github.com/netty/netty/issues/12708 I prepared a microbenchmark which didn't require Netty and was decent enough (the warmup phase should be improved) to implement a "fix". It contains a code pattern that will become even more relevant in the future thanks to type switch (which I've already verified is affected by https://bugs.openjdk.org/browse/JDK-8180450 as well, at https://github.com/franz1981/java-puzzles/commit/ac38cd07b0207345295a9ed72180c60050aa0e5a) and is pretty common in middleware/low-level frameworks, i.e.
- having an `Object` signature (e.g. typical of Hibernate/ORMs where user-defined types are "enhanced" by implementing specific behaviours, but whose types are unknown to the framework entry points/signature)
- using chains of type checks to verify the presence of specific "traits" to feed a state machine of some type
- concrete objects which implement sets of "traits" (i.e. interfaces) flowing through the state machine

The microbenchmark was at https://github.com/franz1981/java-puzzles/commit/0d0579514d5be9cf8cf0fa71101e208ced5e3d28 (all classes except `FalseSharingInstanceOfBenchmark`) and the entry point was `InstanceOfScalabilityBenchmark`.
The pattern is exactly the same as in the Netty issue which affected Vertx/Quarkus, meaning that it lets 2 concrete types be checked against different type checks; but because of that, in case https://github.com/openjdk/jdk/pull/14375 kicks in, it won't work as expected. This translates into stopping at C1 (again, like the other tests in this pr) or adding a third/fourth concrete type during warmup.

**Note**: Some of the many DONT_INLINE could be simplified, to be sure inlining won't play any factor to simplify some type checks, but AFAIK it shouldn't be necessary here.

If the benchmark looks decent (once cleaned up) to verify the effects of the patch for other user patterns, we could make use of it.
ATM I don't have free cycles to try it myself now, but talking with other team members to see how we can contribute. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1770352226 From ayang at openjdk.org Thu Oct 19 08:59:52 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 19 Oct 2023 08:59:52 GMT Subject: RFR: 8318489: Remove unused alignment_unit and alignment_offset Message-ID: Trivial removing dead code. ------------- Commit messages: - trivial Changes: https://git.openjdk.org/jdk/pull/16263/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16263&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318489 Stats: 24 lines in 4 files changed: 0 ins; 24 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16263.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16263/head:pull/16263 PR: https://git.openjdk.org/jdk/pull/16263 From rrich at openjdk.org Thu Oct 19 09:16:04 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 19 Oct 2023 09:16:04 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v23] In-Reply-To: References: Message-ID: <1HXBowkNZo1iyNgOVA6qZYcyz0alZ8r5FBMQ3FqAvTE=.8a95691e-f02d-486a-96c3-88d5ec836228@github.com> > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. 
>
> - A worker scans the part of a large array located in its stripe
>
> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe.
>
> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`)
>
> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned.
>
> #### Performance testing
>
> ##### BigArrayInOldGenRR.java
>
> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better).
>
> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads.
>
> Observations
>
> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded.
>
> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded.
>
> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea...
Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 36 additional commits since the last revision: - Use better name: _preprocessing_active_workers - Merge branch 'master' - Remove obsolete comment - Feedback Albert - Merge branch 'master' - Re-cleanup (was accidentally reverted) - Make sure to scan obj reaching in just once - Simplification suggested by Albert - Don't overlap card table processing with scavenging for simplicity - Cleanup - ... and 26 more: https://git.openjdk.org/jdk/compare/71431ab6...f7965512 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/607f0c22..f7965512 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=21-22 Stats: 14751 lines in 659 files changed: 9415 ins; 2652 del; 2684 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From rrich at openjdk.org Thu Oct 19 09:22:50 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 19 Oct 2023 09:22:50 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v11] In-Reply-To: References: Message-ID: On Wed, 27 Sep 2023 14:13:23 GMT, Thomas Schatzl wrote: >> Richard Reingruber has updated the pull request incrementally with five additional commits since the last revision: >> >> - Eliminate special case for scanning the large array end >> - First card of large array should be cleared if dirty >> - Do all large array scanning in separate method >> - Limit stripe size to 1m with at least 8 threads >> - Small clean-ups > > Hi, > >> > I experimented with the aforementioned read-only card table idea a bit and 
here is the draft:
>> > https://github.com/openjdk/jdk/compare/master...albertnetymk:jdk:pgc-precise-obj-arr?expand=1
>>
>> This looks very nice! The code is a lot easier to follow than the baseline and this pr.
>>
>> With your draft I found out too that the regressions with just 2 threads come from the remaining `object_start` calls. Larger stripes mean fewer of them. The caching used in your draft is surely better.
>>
>> So by default 1 card table byte per 512-byte card is needed. The shadow card table will require 2M per gigabyte of used old generation. I guess that's affordable.
>>
>> Would you think that your solution can be backported?
>
> I had a brief look at @albertnetymk's suggestion, a few comments:
>
> * it uses another card table - while "just" another 0.2% of the heap, we should try to avoid such regressions. G1 also does not need another card table... maybe some more effort should be put into optimizing that one away.
> * obviously allocating and freeing during the pause is suboptimal wrt pause time so the prototype should be improved in that regard :)
> * the copying will stay (if there is a second card table), I would be interested in pause time changes for more throughput'y applications (jbb2005, timefold/optaplanner https://timefold.ai/blog/2023/java-21-performance)
> * anything can be backported, but the question is whether the individual maintainers of these versions are going to. It does have a good case though which may make it easier to convince maintainers.
>
> Hth,
> Thomas

I don't intend to make any further changes or tests. So the pr would be ready for another look at it, @tschatzl, if you want.
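
As a quick check of the figures quoted above ("2M per gigabyte" and "0.2% of the heap"), here is a small arithmetic sketch. It assumes one card-table byte per 512-byte card, as stated in the quoted text; the helper name `card_table_bytes` and the constants are made up for this illustration and are not HotSpot identifiers.

```cpp
#include <cassert>
#include <cstdint>

// One card-table byte covers one card of card_size_bytes heap bytes.
constexpr std::uint64_t card_table_bytes(std::uint64_t heap_bytes,
                                         std::uint64_t card_size_bytes) {
  return heap_bytes / card_size_bytes;
}

constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t MiB = 1024 * KiB;
constexpr std::uint64_t GiB = 1024 * MiB;

// 2^30 / 2^9 = 2^21 bytes, i.e. 2 MiB of card table per GiB of old generation.
static_assert(card_table_bytes(GiB, 512) == 2 * MiB,
              "1 GiB covered by 512-byte cards needs a 2 MiB card table");
```

The relative overhead is 1/512, which is about 0.195% and matches the "0.2%" mentioned in the review comment.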

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1770407514

From aph at openjdk.org Thu Oct 19 09:36:51 2023
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 19 Oct 2023 09:36:51 GMT
Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v11]
In-Reply-To:
References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com>
Message-ID:

On Wed, 18 Oct 2023 08:50:13 GMT, Andrew Haley wrote:

>> Meta-question and apologies if this was covered before, but why is this logic being added to stubRoutines.cpp?
>
>> Meta-question and apologies if this was covered before, but why is this logic being added to stubRoutines.cpp?
>
> Because that's where @iwanowww asked me to put it. I don't much care.

> But I don't see any changes in `macroAssembler_x86.cpp` anymore. What happened to them, @theRealAph?

I took them out because of a potential backwards-compatibility breakage. They weren't critical to the core purpose of this patch.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1770432250

From aph at openjdk.org Thu Oct 19 09:36:55 2023
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 19 Oct 2023 09:36:55 GMT
Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v11]
In-Reply-To:
References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com>
Message-ID:

On Thu, 19 Oct 2023 01:26:43 GMT, Joe Darcy wrote:

>> Andrew Haley has updated the pull request incrementally with three additional commits since the last revision:
>>
>> - Review feedback
>> - Merge branch 'JDK-8295159' of https://github.com/theRealAph/jdk into JDK-8295159
>> - Remove change to RestoreMXCSROnJNICalls
>
> src/hotspot/os/linux/os_linux.cpp line 1853:
>
>> 1851:
>> 1852: #ifndef IA32
>> 1853: // Quickly test to make sure denormals are correctly handled.
>
> Nit: I recommend using "subnormal" rather than "denormal" for general terminology on this point. While "denormal" was used in the original IEEE 754 standard from 1985, subsequent iterations of the standard use "subnormal". The term "subnormal" has also been used for the last several editions of JLS and JVMS.

I've long avoided "subnormal" because of its meaning in British English:

subnormal, adjective
3. [old-fashioned, offensive] a person of low intelligence

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1365207067

From simonis at openjdk.org Thu Oct 19 09:57:44 2023
From: simonis at openjdk.org (Volker Simonis)
Date: Thu, 19 Oct 2023 09:57:44 GMT
Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v30]
In-Reply-To:
References:
Message-ID:

On Fri, 13 Oct 2023 01:38:13 GMT, Jonathan Joo wrote:

>> 8315149: Add hsperf counters for CPU time of internal GC threads
>
> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision:
>
> Add call to publish in parallel gc and update counter names

This looks good to me now. I only propose to wait until #16252 is pushed (it already has two reviews), then rebase on top of that change and use the generic 64-bit atomics instead of your handcrafted solution. That will remove the need for the clumsy `#ifdef _LP64` code. Once that is done, I'll approve this PR.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1770469810

From ayang at openjdk.org Thu Oct 19 12:00:54 2023
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Thu, 19 Oct 2023 12:00:54 GMT
Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v30]
In-Reply-To:
References:
Message-ID:

On Fri, 13 Oct 2023 01:38:13 GMT, Jonathan Joo wrote:

>> 8315149: Add hsperf counters for CPU time of internal GC threads
>
> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision:
>
> Add call to publish in parallel gc and update counter names

Since it introduces new APIs in `CollectedHeap`...

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1770750667

From igavrilin at openjdk.org Thu Oct 19 12:14:52 2023
From: igavrilin at openjdk.org (Ilya Gavrilin)
Date: Thu, 19 Oct 2023 12:14:52 GMT
Subject: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v4]
In-Reply-To:
References:
Message-ID:

> Hi all, please review these changes to the risc-v floating point copysign and signum intrinsics.
> CopySign - returns the first argument with the sign of the second. On risc-v we have the `fsgnj.x` instruction, which can implement this intrinsic.
> Signum - returns the input value if it is +/- 0.0 or NaN, otherwise 1.0 with the sign of the input value. On risc-v we can use `fclass.x` to determine the type of the input value and return the appropriate value.
>
> Tests:
> Performance tests on t-head board:
> With intrinsics:
>
> Benchmark                 (seed)   Mode  Cnt      Score      Error   Units
> MathBench.copySignDouble       0  thrpt    8  34156.580 ±   76.272  ops/ms
> MathBench.copySignFloat        0  thrpt    8  34181.731 ±   38.182  ops/ms
> MathBench.signumDouble         0  thrpt    8  31977.258 ± 1122.327  ops/ms
> MathBench.signumFloat          0  thrpt    8  31836.852 ±   56.013  ops/ms
>
> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`):
>
> Benchmark                 (seed)   Mode  Cnt      Score      Error   Units
> MathBench.copySignDouble       0  thrpt    8  31000.996 ±  943.094  ops/ms
> MathBench.copySignFloat        0  thrpt    8  30678.016 ±   28.087  ops/ms
> MathBench.signumDouble         0  thrpt    8  25435.010 ± 2047.085  ops/ms
> MathBench.signumFloat          0  thrpt    8  25257.058 ±   79.175  ops/ms
>
> Regression tests: tier1, hotspot:tier2 on risc-v board.
>
> Also, changed the name of one micro test: before we had `sigNumDouble` and `signumFloat` tests, which do not consistently match either `signum` or `sigNum`. Now both use the same spelling: `signum`.
> The performance tests have been changed a bit to check the intrinsics' results better; diff to modify the tests:
>
> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> index 6cd1353907e..0bee25366bf 100644
> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> @@ -143,12 +143,12 @@ public double ceilDouble() {
>
>      @Benchmark
>      public double copySignDouble() {
> -        return Math.copySign(double81, doubleNegative12);
> +        return Math.copySign(double81, doubleNegative12) + Math.copySign(double81, double2) + Math.copySign(double4Dot1, doubleNegative12);
>      }
>
>      @Benchmark
>      public float copySignFloat() {
> -        return Math.copySign(floatNegative99, float1);
> +        return Math.copySign(floatNegative99, float1) + Math.copySign(eFloat, float1) + Math.copySign...
Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: Changed branch inside signum implementation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16186/files - new: https://git.openjdk.org/jdk/pull/16186/files/c0a2ab95..867d6e8e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16186&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16186&range=02-03 Stats: 10 lines in 1 file changed: 3 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16186.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16186/head:pull/16186 PR: https://git.openjdk.org/jdk/pull/16186 From vkempik at openjdk.org Thu Oct 19 12:20:41 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 19 Oct 2023 12:20:41 GMT Subject: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v4] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 12:14:52 GMT, Ilya Gavrilin wrote: >> Hi all, please review this changes into risc-v floating point copysign and signum intrinsics. >> CopySign - returns first argument with the sign of second. On risc-v we have `fsgnj.x` instruction, which can implement this intrinsic. >> Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the sign of input value returned. On risc-v we can use `fclass.x` to specify type of input value and return appropriate value. >> >> Tests: >> Performance tests on t-head board: >> With intrinsics: >> >> Benchmark (seed) Mode Cnt Score Error Units >> MathBench.copySignDouble 0 thrpt 8 34156.580 ? 76.272 ops/ms >> MathBench.copySignFloat 0 thrpt 8 34181.731 ? 38.182 ops/ms >> MathBench.signumDouble 0 thrpt 8 31977.258 ? 1122.327 ops/ms >> MathBench.signumFloat 0 thrpt 8 31836.852 ? 
56.013 ops/ms >> >> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`): >> >> Benchmark (seed) Mode Cnt Score Error Units >> MathBench.copySignDouble 0 thrpt 8 31000.996 ? 943.094 ops/ms >> MathBench.copySignFloat 0 thrpt 8 30678.016 ? 28.087 ops/ms >> MathBench.signumDouble 0 thrpt 8 25435.010 ? 2047.085 ops/ms >> MathBench.signumFloat 0 thrpt 8 25257.058 ? 79.175 ops/ms >> >> Regression tests: tier1, hotspot:tier2 on risc-v board. >> >> Also, changed name of one micro test: before we had: `sigNumDouble` and `signumFloat` tests, they does not matches to `signum` or `sigNum`. Now we have similar part: `signum`. >> Performance tests has been changed a bit, to check intrinsics result better, diff to modify tests: >> >> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java >> index 6cd1353907e..0bee25366bf 100644 >> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java >> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java >> @@ -143,12 +143,12 @@ public double ceilDouble() { >> >> @Benchmark >> public double copySignDouble() { >> - return Math.copySign(double81, doubleNegative12); >> + return Math.copySign(double81, doubleNegative12) + Math.copySign(double81, double2) + Math.copySign(double4Dot1, doubleNegative12); >> } >> >> @Benchmark >> public float copySignFloat() { >> - return Math.copySign(floatNegative99, float1); >> + return ... > > Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: > > Changed branch inside signum implementation Looks good, last change is pretty similar to what was done here https://github.com/openjdk/jdk/pull/13800/commits/00e16a6726c258ad1409c7c671b9742bf8448a55 ------------- Marked as reviewed by vkempik (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/16186#pullrequestreview-1687538633 From igavrilin at openjdk.org Thu Oct 19 12:20:42 2023 From: igavrilin at openjdk.org (Ilya Gavrilin) Date: Thu, 19 Oct 2023 12:20:42 GMT Subject: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v4] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 12:14:52 GMT, Ilya Gavrilin wrote: >> Hi all, please review this changes into risc-v floating point copysign and signum intrinsics. >> CopySign - returns first argument with the sign of second. On risc-v we have `fsgnj.x` instruction, which can implement this intrinsic. >> Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the sign of input value returned. On risc-v we can use `fclass.x` to specify type of input value and return appropriate value. >> >> Tests: >> Performance tests on t-head board: >> With intrinsics: >> >> Benchmark (seed) Mode Cnt Score Error Units >> MathBench.copySignDouble 0 thrpt 8 34156.580 ? 76.272 ops/ms >> MathBench.copySignFloat 0 thrpt 8 34181.731 ? 38.182 ops/ms >> MathBench.signumDouble 0 thrpt 8 31977.258 ? 1122.327 ops/ms >> MathBench.signumFloat 0 thrpt 8 31836.852 ? 56.013 ops/ms >> >> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`): >> >> Benchmark (seed) Mode Cnt Score Error Units >> MathBench.copySignDouble 0 thrpt 8 31000.996 ? 943.094 ops/ms >> MathBench.copySignFloat 0 thrpt 8 30678.016 ? 28.087 ops/ms >> MathBench.signumDouble 0 thrpt 8 25435.010 ? 2047.085 ops/ms >> MathBench.signumFloat 0 thrpt 8 25257.058 ? 79.175 ops/ms >> >> Regression tests: tier1, hotspot:tier2 on risc-v board. >> >> Also, changed name of one micro test: before we had: `sigNumDouble` and `signumFloat` tests, they does not matches to `signum` or `sigNum`. Now we have similar part: `signum`. 
>> The performance tests have been changed a bit to check the intrinsics' results better; diff to modify the tests:
>>
>> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> index 6cd1353907e..0bee25366bf 100644
>> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> @@ -143,12 +143,12 @@ public double ceilDouble() {
>>
>>      @Benchmark
>>      public double copySignDouble() {
>> -        return Math.copySign(double81, doubleNegative12);
>> +        return Math.copySign(double81, doubleNegative12) + Math.copySign(double81, double2) + Math.copySign(double4Dot1, doubleNegative12);
>>      }
>>
>>      @Benchmark
>>      public float copySignFloat() {
>> -        return Math.copySign(floatNegative99, float1);
>> +        return ...
>
> Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision:
>
> Changed branch inside signum implementation

Changed the branch inside `signum`: now `dst` always moves to `src`, so we can remove `j(done);`.

Performance results:

Without intrinsic:

Benchmark               (seed)   Mode  Cnt      Score    Error   Units
MathBench.signumDouble       0  thrpt    8  35666.674 ±  6.317  ops/ms
MathBench.signumFloat        0  thrpt    8  34040.220 ± 13.783  ops/ms

With old version:

Benchmark               (seed)   Mode  Cnt      Score    Error   Units
MathBench.signumDouble       0  thrpt    8  41601.513 ± 16.570  ops/ms
MathBench.signumFloat        0  thrpt    8  39414.511 ± 28.290  ops/ms

With new version:

Benchmark               (seed)   Mode  Cnt      Score    Error   Units
MathBench.signumDouble       0  thrpt    8  44060.456 ± 12.483  ops/ms
MathBench.signumFloat        0  thrpt    8  40481.776 ± 28.512  ops/ms

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16186#issuecomment-1770835814

From fyang at openjdk.org Thu Oct 19 12:25:41 2023
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 19 Oct 2023 12:25:41 GMT
Subject: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 19 Oct 2023 12:14:52 GMT, Ilya Gavrilin wrote:

>> Hi all, please review these changes to the risc-v floating point copysign and signum intrinsics.
>> CopySign - returns the first argument with the sign of the second. On risc-v we have the `fsgnj.x` instruction, which can implement this intrinsic.
>> Signum - returns the input value if it is +/- 0.0 or NaN, otherwise 1.0 with the sign of the input value. On risc-v we can use `fclass.x` to determine the type of the input value and return the appropriate value.
>>
>> Tests:
>> Performance tests on t-head board:
>> With intrinsics:
>>
>> Benchmark                 (seed)   Mode  Cnt      Score      Error   Units
>> MathBench.copySignDouble       0  thrpt    8  34156.580 ±   76.272  ops/ms
>> MathBench.copySignFloat        0  thrpt    8  34181.731 ±   38.182  ops/ms
>> MathBench.signumDouble         0  thrpt    8  31977.258 ± 1122.327  ops/ms
>> MathBench.signumFloat          0  thrpt    8  31836.852 ±   56.013  ops/ms
>>
>> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`):
>>
>> Benchmark                 (seed)   Mode  Cnt      Score      Error   Units
>> MathBench.copySignDouble       0  thrpt    8  31000.996 ±  943.094  ops/ms
>> MathBench.copySignFloat        0  thrpt    8  30678.016 ±   28.087  ops/ms
>> MathBench.signumDouble         0  thrpt    8  25435.010 ± 2047.085  ops/ms
>> MathBench.signumFloat          0  thrpt    8  25257.058 ±   79.175  ops/ms
>>
>> Regression tests: tier1, hotspot:tier2 on risc-v board.
>>
>> Also, changed the name of one micro test: before we had `sigNumDouble` and `signumFloat` tests, which do not consistently match either `signum` or `sigNum`. Now both use the same spelling: `signum`.
>> Performance tests has been changed a bit, to check intrinsics result better, diff to modify tests: >> >> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java >> index 6cd1353907e..0bee25366bf 100644 >> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java >> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java >> @@ -143,12 +143,12 @@ public double ceilDouble() { >> >> @Benchmark >> public double copySignDouble() { >> - return Math.copySign(double81, doubleNegative12); >> + return Math.copySign(double81, doubleNegative12) + Math.copySign(double81, double2) + Math.copySign(double4Dot1, doubleNegative12); >> } >> >> @Benchmark >> public float copySignFloat() { >> - return Math.copySign(floatNegative99, float1); >> + return ... > > Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: > > Changed branch inside signum implementation Still good. You might want to correct the remaining typo. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1660: > 1658: // otherwise return +/- 1.0 using sign of input. > 1659: // one - gives us a floating-point 1.0 (got from matching rule) > 1660: // bool is_double - specififes single or double precision operations will be used. Suggestion: s/specififes/specifies/ ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16186#pullrequestreview-1687549814 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1365427533 From igavrilin at openjdk.org Thu Oct 19 12:35:19 2023 From: igavrilin at openjdk.org (Ilya Gavrilin) Date: Thu, 19 Oct 2023 12:35:19 GMT Subject: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v5] In-Reply-To: References: Message-ID: > Hi all, please review this changes into risc-v floating point copysign and signum intrinsics. > CopySign - returns first argument with the sign of second. 
On risc-v we have `fsgnj.x` instruction, which can implement this intrinsic. > Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the sign of input value returned. On risc-v we can use `fclass.x` to specify type of input value and return appropriate value. > > Tests: > Performance tests on t-head board: > With intrinsics: > > Benchmark (seed) Mode Cnt Score Error Units > MathBench.copySignDouble 0 thrpt 8 34156.580 ? 76.272 ops/ms > MathBench.copySignFloat 0 thrpt 8 34181.731 ? 38.182 ops/ms > MathBench.signumDouble 0 thrpt 8 31977.258 ? 1122.327 ops/ms > MathBench.signumFloat 0 thrpt 8 31836.852 ? 56.013 ops/ms > > Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`): > > Benchmark (seed) Mode Cnt Score Error Units > MathBench.copySignDouble 0 thrpt 8 31000.996 ? 943.094 ops/ms > MathBench.copySignFloat 0 thrpt 8 30678.016 ? 28.087 ops/ms > MathBench.signumDouble 0 thrpt 8 25435.010 ? 2047.085 ops/ms > MathBench.signumFloat 0 thrpt 8 25257.058 ? 79.175 ops/ms > > Regression tests: tier1, hotspot:tier2 on risc-v board. > > Also, changed name of one micro test: before we had: `sigNumDouble` and `signumFloat` tests, they does not matches to `signum` or `sigNum`. Now we have similar part: `signum`. 
> Performance tests has been changed a bit, to check intrinsics result better, diff to modify tests: > > diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java > index 6cd1353907e..0bee25366bf 100644 > --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java > +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java > @@ -143,12 +143,12 @@ public double ceilDouble() { > > @Benchmark > public double copySignDouble() { > - return Math.copySign(double81, doubleNegative12); > + return Math.copySign(double81, doubleNegative12) + Math.copySign(double81, double2) + Math.copySign(double4Dot1, doubleNegative12); > } > > @Benchmark > public float copySignFloat() { > - return Math.copySign(floatNegative99, float1); > + return Math.copySign(floatNegative99, float1) + Math.copySign(eFloat, float1) + Math.copySign... Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: Fix typo in c2 MacroAssembler ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16186/files - new: https://git.openjdk.org/jdk/pull/16186/files/867d6e8e..c79fb9e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16186&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16186&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16186.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16186/head:pull/16186 PR: https://git.openjdk.org/jdk/pull/16186 From igavrilin at openjdk.org Thu Oct 19 12:35:19 2023 From: igavrilin at openjdk.org (Ilya Gavrilin) Date: Thu, 19 Oct 2023 12:35:19 GMT Subject: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v4] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 12:22:29 GMT, Fei Yang wrote: > Still good. You might want to correct the remaining typo. Thanks for review! 
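
For readers less familiar with the two operations discussed in this thread, the semantics described in the PR text can be written down as a small portable reference. This is a sketch in plain C++ using `std::copysign`; the helper name `signum_ref` is made up for this illustration, and nothing here is the RISC-V `fsgnj.x` / `fclass.x` code from the patch.

```cpp
#include <cassert>
#include <cmath>

// Reference semantics for signum as described above:
// +/-0.0 and NaN are returned unchanged, any other input
// yields 1.0 carrying the sign of the input.
inline double signum_ref(double x) {
  if (x == 0.0 || std::isnan(x)) {
    return x;  // x == 0.0 is true for both +0.0 and -0.0
  }
  return std::copysign(1.0, x);
}

// copySign itself maps directly onto the standard library:
// std::copysign(magnitude, sign) returns |magnitude| with the sign of 'sign'.
```

For example, `std::copysign(81.0, -12.0)` gives `-81.0`, and `signum_ref(-0.0)` keeps the negative zero.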
------------- PR Comment: https://git.openjdk.org/jdk/pull/16186#issuecomment-1770876188 From fparain at openjdk.org Thu Oct 19 12:53:42 2023 From: fparain at openjdk.org (Frederic Parain) Date: Thu, 19 Oct 2023 12:53:42 GMT Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 [v5] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 11:25:19 GMT, Hannes Greule wrote: >> See the bug description for more information. >> >> This implementation brings down the time to take a heap dump on the example application in the bug report to <2 seconds on my machine. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > fix indent Changes look good to me. Thank you for fixing this. Fred ------------- Marked as reviewed by fparain (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16083#pullrequestreview-1687649293 From mdoerr at openjdk.org Thu Oct 19 13:01:45 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 19 Oct 2023 13:01:45 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v5] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 14:47:58 GMT, Jorn Vernee wrote: > > I wonder if the native_invoker_size_per_arg thing still works good enough. We may exceed the computed size, now, right? > > Good point. I'll have a look at enhancing the test we have for this. > > Intuitively, I think it will be okay. It's true that we generate more code to add the oops and offsets together, but at the same time, we don't have any code to shuffle the offsets. Looks like we use 2 input regs per obj, so we reserve 2x native_invoker_size_per_arg. However, the case in which `reg_oop` and `reg_offset` are on stack together with `arg_shuffle` can produce more than this size (depending on platform). If we pass lots of objects on stack, we may exceed the computed size. Correct? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1770938705 From shade at openjdk.org Thu Oct 19 13:35:40 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 19 Oct 2023 13:35:40 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms In-Reply-To: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: On Wed, 18 Oct 2023 18:58:58 GMT, Aleksey Shipilev wrote: > See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. > > Unfortunately, we cannot test these apart from the existing gtest. > > Additional testing: > - [x] linux-x86-server-fastdebug, atomic tests pass > - [ ] linux-arm-server-fastdebug, atomic tests pass ARM32 comes back with a massive wrinkle: even though `PlatformCmpxchg<8>` is defined for it, it would assert later, either in `reorder_cmpxchg_long_func`, which checks `VM_Version::supports_cx8()`, or later in the stub that would not know what to do on a platform where there is no way to do the 64-bit stub. I guess in that case we would need to go and take a lock, like the Access API does with `AccessLocker`. (I was hoping it does not come to that, but...) 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1770999449 From shade at openjdk.org Thu Oct 19 13:42:39 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 19 Oct 2023 13:42:39 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms In-Reply-To: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: On Wed, 18 Oct 2023 18:58:58 GMT, Aleksey Shipilev wrote: > See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. > > Unfortunately, we cannot test these apart from the existing gtest. > > Additional testing: > - [x] linux-x86-server-fastdebug, atomic tests pass > - [ ] linux-arm-server-fastdebug, atomic tests pass This situation, by the way, contradicts the requirement that `atomic.hpp` has: // Platform-specific implementation of cmpxchg. Support for sizes // of 1, 4, and 8 are required. The class is a function object that // must be default constructable, with these requirements: ... template<size_t byte_size> struct PlatformCmpxchg; ...so arguably the ARM32 implementation violates the `Atomic` contract here. This would require some fiddling on the ARM32 side to satisfy the contract, because ARM32, AFAIU, does not guarantee the availability of 64-bit atomics. x86_32 gets away with it by going into `.S` that does `cmpxchg8b`, available in all x86 implementations since Pentium. 
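A note for readers of the 8316961 thread: one common way to layer 64-bit add and exchange on top of a working 8-byte compare-and-exchange is a retry loop. The sketch below shows that idea with standard `std::atomic` instead of HotSpot's `Atomic` and `PlatformCmpxchg<8>`. The function names `fallback_fetch_then_add` and `fallback_xchg` are made up for illustration, and this is an assumption about the general technique, not the code from the pull request.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: 64-bit fetch-and-add built only from compare-exchange.
// Returns the value that was stored before the addition.
inline uint64_t fallback_fetch_then_add(std::atomic<uint64_t>& dest, uint64_t add_value) {
  uint64_t old_value = dest.load(std::memory_order_relaxed);
  // On failure compare_exchange_weak refreshes old_value with the current
  // contents, so each retry recomputes the sum from fresh data.
  while (!dest.compare_exchange_weak(old_value, old_value + add_value,
                                     std::memory_order_seq_cst,
                                     std::memory_order_relaxed)) {
  }
  return old_value;
}

// Hypothetical helper: 64-bit exchange built only from compare-exchange.
// Returns the value that was replaced.
inline uint64_t fallback_xchg(std::atomic<uint64_t>& dest, uint64_t new_value) {
  uint64_t old_value = dest.load(std::memory_order_relaxed);
  while (!dest.compare_exchange_weak(old_value, new_value,
                                     std::memory_order_seq_cst,
                                     std::memory_order_relaxed)) {
  }
  return old_value;
}
```

A loop like this only helps where the 8-byte compare-and-exchange itself is functional, which is exactly what is in question for ARM32 in the messages above; without it, a lock-based path would be the remaining option.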
------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1771012765 From lucy at openjdk.org Thu Oct 19 13:46:46 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 19 Oct 2023 13:46:46 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v7] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 17:38:24 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. 
>> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > add s390 support See inline comments for s390 part. I didn't review all the other code. src/hotspot/cpu/s390/downcallLinker_s390.cpp line 100: > 98: Address offset_addr(callerSP, FP_BIAS + reg_offset.offset()); > 99: __ mem2reg_opt(r_tmp1, offset_addr, true); > 100: __ z_agr(reg_oop_reg, r_tmp1); Please note that s390 is a CISC architecture. It provides instructions for almost everything. :-) Here, I would suggest to add the offset to reg_oop_reg directly from memory - without first loading the offset into a temp register (that is RISC style). It's shorter and faster: ` __ z_ag(reg_oop_reg, offset_addr);` src/hotspot/cpu/s390/downcallLinker_s390.cpp line 112: > 110: __ mem2reg_opt(r_tmp2, oop_addr, true); > 111: __ z_agr(r_tmp1, r_tmp2); > 112: __ reg2mem_opt(r_tmp1, oop_addr, true); Similar to above. 
You need to load only one operand into a register. __ mem2reg_opt(r_tmp2, oop_addr, true); __ z_ag(r_tmp2, offset_addr); __ reg2mem_opt(r_tmp2, oop_addr, true); ------------- Changes requested by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16201#pullrequestreview-1687765628 PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1365554460 PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1365560688 From fjiang at openjdk.org Thu Oct 19 13:46:46 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 19 Oct 2023 13:46:46 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v7] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 17:38:24 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. 
>> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > add s390 support Here is the patch for risc-v: [riscv_panama_heap_segments.patch](https://github.com/openjdk/jdk/files/13043332/riscv_panama_heap_segments.patch) All `jdk_foreign` tests passed on linux-riscv with `-XX:+VerifyOops -XX:+VerifyStack` and a fastdebug build. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1771014968 From lucy at openjdk.org Thu Oct 19 13:46:48 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 19 Oct 2023 13:46:48 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v6] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 17:04:52 GMT, Sidraya Jayagond wrote: >> Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: >> >> - add PPC impl >> - add missing file > > Add s390x port from here > [S390x_Panama_heap_segments.txt](https://github.com/openjdk/jdk/files/13031418/S390x_Panama_heap_segments.txt) > > All tests are passing on linux S390x. One single test case is failing on Big Endian: > test TestLayoutPaths.testBadAlignmentOfRoot(): failure. Similar to PPC Big Endian. @sid8606 You may want to have a look at my s390 comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1771020553 From hgreule at openjdk.org Thu Oct 19 13:54:54 2023 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 19 Oct 2023 13:54:54 GMT Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 [v5] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 11:25:19 GMT, Hannes Greule wrote: >> See the bug description for more information. >> >> This implementation brings down the time to take a heap dump on the example application in the bug report to <2 seconds on my machine. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > fix indent Thank you all for your reviews. I think this is worth a backport to jdk21u, what do you think? And if yes, is it enough to run the backport command on the commit later and create a PR from it? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16083#issuecomment-1771036882 From fparain at openjdk.org Thu Oct 19 14:22:22 2023 From: fparain at openjdk.org (Frederic Parain) Date: Thu, 19 Oct 2023 14:22:22 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 17:45:41 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. 
>> >> This change supports the following platforms: x86, aarch64 > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Removed some comments and relocated code src/hotspot/cpu/x86/templateTable_x86.cpp line 3799: > 3797: load_resolved_method_entry_special_or_static(rcx, // ResolvedMethodEntry* > 3798: rbx, // Method* > 3799: rdx // flags Please align the last comment with the comments above. src/hotspot/cpu/x86/templateTable_x86.cpp line 3819: > 3817: load_resolved_method_entry_special_or_static(rcx, // ResolvedMethodEntry* > 3818: rbx, // Method* > 3819: rdx // flags Please align the last comment with the comments above. src/hotspot/cpu/x86/templateTable_x86.cpp line 3844: > 3842: rax, // Klass* > 3843: rbx, // Method* or itable/vtable index > 3844: rdx); // flags Could all comments be aligned? src/hotspot/share/oops/cpCache.cpp line 335: > 333: } > 334: > 335: void ConstantPoolCache::set_method_handle_common(int method_index, The only place where this method is called is in set_method_handle() just above. Do we really need to have two methods then? src/hotspot/share/oops/cpCache.cpp line 350: > 348: ResolvedMethodEntry* method_entry = resolved_method_entry_at(method_index); > 349: > 350: if (method_entry->is_resolved(invoke_code)) { //method_entry->method() != nullptr && Weird comment. src/hotspot/share/oops/cpCache.cpp line 369: > 367: // In the general case, this could be the call site's MethodType, > 368: // for use with java.lang.Invokers.checkExactType, or else a CallSite object. > 369: // f1 contains the adapter method which manages the actual call. 
Obsolete reference to f1 src/hotspot/share/oops/cpCache.cpp line 378: > 376: // > 377: // This means that given a call site like (List)mh.invoke("foo"), > 378: // the f1 method has signature '(Ljl/Object;Ljl/invoke/MethodType;)Ljl/Object;', Obsolete reference to f1 src/hotspot/share/oops/cpCache.hpp line 47: > 45: > 46: // A constant pool cache is a runtime data structure set aside to a constant pool. The cache > 47: // holds interpreter runtime information for all field access and invoke bytecodes. The cache The cpCache is not a cache (this name should be changed at some point) and is not interpreter specific. The cpCache stores the results of fields and methods resolutions, which are then used by all VM components (interpreter, runtime, JITs). src/hotspot/share/oops/resolvedMethodEntry.hpp line 38: > 36: // with the constant pool index associated with the bytecode before any resolution is done, > 37: // where "resolution" refers to populating the bytecode1 and bytecode2 fields and other > 38: // relevant information.These entries are contained within the ConstantPoolCache and are Space after the dot. src/hotspot/share/oops/resolvedMethodEntry.hpp line 45: > 43: // This structure has fields for every type of invoke bytecode but each entry may only > 44: // use some of the fields. All entries have a TOS state, number of parameters, flags, > 45: // and a constant pool index. An entry can have only one type, and some fields are specific to a particular type of entry. Would it be possible to use an union for those fields, in order to reduce the size of ResolvedMethodEntry instances? 
For instance:

    class ResolvedMethodEntry {
      friend class VMStructs;
      Method* _method; // Method for non virtual calls, adapter method for invokevirtual, final method for invokevirtual, final method for virtual
      union _specific {
        u2 _resolved_references_index; // Index of resolved references array that holds the appendix oop for invokehandle
        u2 _table_index; // vtable/itable index for virtual and interface calls
        InstanceKlass* _interface_klass; // for interface and static
      } specific;
      u2 _cpool_index; // Constant pool index
      u2 _number_of_parameters; // Number of arguments for method
      u1 _tos_state; // TOS state
      u1 _flags; // Flags: [000|has_local_signature|has_appendix|forced_virtual|final|virtual_final]
      u1 _bytecode1, _bytecode2; // Bytecodes for f1 and f2
    };

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/interpreter/BytecodeWithCPIndex.java line 64:

> 62: return cpCache.getMethodEntryAt(cpCacheIndex).getConstantPoolIndex();
> 63: }
> 64: //return cpCache.getEntryAt((int) (0xFFFF & cpCacheIndex)).getConstantPoolIndex();

The commented line should be removed.

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ConstantPool.java line 240:

> 238: } else {
> 239: // change byte-ordering and go via cache
> 240: //i = cache.getEntryAt(0xFFFF & which).getConstantPoolIndex();

Commented line should be removed.
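To illustrate the size argument behind the union suggestion above, here is a small standalone C++ sketch. The struct names `EntrySeparateFields` and `EntryWithUnion` are hypothetical, the pointers are plain `void*` stand-ins for `Method*` and `InstanceKlass*`, and the field list simply mirrors the example in the review comment; it is not the actual HotSpot layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout with every type-specific field stored separately.
struct EntrySeparateFields {
  void*    method;
  void*    interface_klass;            // only meaningful for some invoke kinds
  uint16_t resolved_references_index;  // only meaningful for invokehandle
  uint16_t table_index;                // only meaningful for virtual/interface calls
  uint16_t cpool_index;
  uint16_t number_of_parameters;
  uint8_t  tos_state;
  uint8_t  flags;
  uint8_t  bytecode1;
  uint8_t  bytecode2;
};

// Hypothetical layout where the mutually exclusive fields share storage.
struct EntryWithUnion {
  void* method;
  union {
    uint16_t resolved_references_index;
    uint16_t table_index;
    void*    interface_klass;
  } specific;
  uint16_t cpool_index;
  uint16_t number_of_parameters;
  uint8_t  tos_state;
  uint8_t  flags;
  uint8_t  bytecode1;
  uint8_t  bytecode2;
};

// The union variant can never be larger, because it stores a subset of the bytes.
static_assert(sizeof(EntryWithUnion) <= sizeof(EntrySeparateFields),
              "sharing storage must not grow the entry");
```

On a typical LP64 ABI the two structs would be expected to come out at 32 and 24 bytes respectively. The trade-off is that a reader must know which union member is active, for example from the resolved bytecode, before touching `specific`.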
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1365532555 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1365533518 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1365534194 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1364075591 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1364075903 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1364082255 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1364082532 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1364033456 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1364132756 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1365595822 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1335937597 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1335939147 From rrich at openjdk.org Thu Oct 19 14:49:22 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 19 Oct 2023 14:49:22 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v24] In-Reply-To: References: Message-ID: > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. 
> > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongues young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc theads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The prize for parallelization is paid without actually doing it. Also ParallelGC will use at lea... 
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: preprocess_card_table_parallel should be private ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/f7965512..26f06361 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=22-23 Stats: 16 lines in 1 file changed: 8 ins; 8 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From thartmann at openjdk.org Thu Oct 19 15:10:05 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 19 Oct 2023 15:10:05 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v8] In-Reply-To: References: Message-ID: On Wed, 11 Oct 2023 22:05:08 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option: >> >> |Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup >> |-------------|------------|---------------|------------------|-----------| >> |full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 >> full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 >> small.AESGCMBench.decrypt | 8192 | 527854.353 |663131.48 | 1.25 >> small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 |1.24 >> full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 >> full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 |539235.462 | 1.00 >> small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 |298913.629 | 0.99 >> small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 >> ? 
| ? | ? | ? | ? >> full.AESGCMBench.decrypt | 16384 | 307266.364 |390397.778 | 1.27 >> full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 >> small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 |1.27 >> small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 >> full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 >> full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 >> small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 |181019.205 | 1.12 >> small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 |1.31 >> full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 >> small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 >> small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 |1.28 >> full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 |1.12 >> full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 |211701.969 | 1.27 >> small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 >> small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 |212085.047 | 1.30 >> ? | ? | ? | ? | ? >> full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 >> full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 |1.31 >> small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 |1.32 >> small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 >> full.AES... > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated comments, removed unused labels All tests passed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15410#issuecomment-1771115519 From adinn at openjdk.org Thu Oct 19 15:10:05 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 19 Oct 2023 15:10:05 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 14:57:04 GMT, Frederic Parain wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Removed some comments and relocated code > > src/hotspot/share/oops/cpCache.hpp line 47: > >> 45: >> 46: // A constant pool cache is a runtime data structure set aside to a constant pool. The cache >> 47: // holds interpreter runtime information for all field access and invoke bytecodes. The cache > > The cpCache is not a cache (this name should be changed at some point) and is not interpreter specific. > The cpCache stores the results of fields and methods resolutions, which are then used by all VM components (interpreter, runtime, JITs). I think there is a bit more nuance needed here. Strictly, the cpCache *is* a cache since it retains values for re-use that would otherwise need to be recomputed. What it is not is a traditional key-value lookup store i.e. it does not conform to the common usage of 'cache' in computer science parlance but to a more broad use of the term in English. I agree that we could do with a better name. Also, while you are correct to note that the runtime uses the cpCache during link resolution and that the JITs use the cpCache when generating compiled code, it is also true that the resulting JITted method code does not refer to the cache as part of its operation. So, as far as the execution engine of the JVM is concerned, the info in the cpCache is present to optimize interpreted execution and is incidental to compiled execution. 
That distinction explains why it has been referred to as an interpreter 'cache' or 'scratch pad' and I think this is worth noting, albeit at the risk of down-playing those other uses. This present discussion may well all that is needed to provide that note so, as you say, the comment should probably avoid tying the cpCache exclusively to the interpreter. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1365696830 From lkorinth at openjdk.org Thu Oct 19 15:16:13 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Thu, 19 Oct 2023 15:16:13 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v5] In-Reply-To: References: Message-ID: > Rename createJavaProcessBuilder so that it is not used by mistake instead of createTestJvm. > > I have used the following sed script: `find -name "*.java" | xargs -n 1 sed -i -e "s/createJavaProcessBuilder(/createJavaProcessBuilderIgnoreTestJavaOpts(/g"` > > Then I have manually modified ProcessTools.java. In that file I have moved one version of createJavaProcessBuilder so that it is close to the other version. Then I have added a javadoc comment in bold telling: > > /** > * Create ProcessBuilder using the java launcher from the jdk to > * be tested. > * > *

Please observe that you likely should use > * createTestJvm() instead of this method because createTestJvm() > * will add JVM options from "test.vm.opts" and "test.java.opts" > * and this method will not do that. > * > * @param command Arguments to pass to the java command. > * @return The ProcessBuilder instance representing the java command. > */ > > > I have used the name createJavaProcessBuilderIgnoreTestJavaOpts because of the name of Utils.prependTestJavaOpts that adds those VM flags. If you have a better name I could do a rename of the method. I kind of like that it is long and clumsy, that makes it harder to use... > > I have run tier 1 testing, and I have started more exhaustive testing. Leo Korinth has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: - Batch update using sed find -name "*.java" | xargs -n 1 sed -i -e "s/createJavaProcessBuilder(/createLimitedJavaTestProcessBuilder(/g" find -name "*.java" | xargs -n 1 sed -i -e "s/createTestJvm(/createJavaTestProcessBuilder(/g" find -name "*.java" | xargs -n 1 sed -i -e "s/import static jdk.test.lib.process.ProcessTools.createJavaProcessBuilder/import static jdk.test.lib.process.ProcessTools.createLimitedJavaTestProcessBuilder/g" - Merge branch '_master_jdk' into _8315097 - explain usage - Revert "8315097: Rename createJavaProcessBuilder" This reverts commit 4b2d171133c40c5c48114602bfd0d4da75531317. - Revert "copyright" This reverts commit f3418c80cc0d4cbb722ee5e368f1a001e898b43e. - Revert "fix static import" This reverts commit 27da71508aec9a4bec1c0ad07031887286580171. 
- fix static import - copyright - 8315097: Rename createJavaProcessBuilder ------------- Changes: https://git.openjdk.org/jdk/pull/15452/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15452&range=04 Stats: 894 lines in 560 files changed: 34 ins; 10 del; 850 mod Patch: https://git.openjdk.org/jdk/pull/15452.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15452/head:pull/15452 PR: https://git.openjdk.org/jdk/pull/15452 From lkorinth at openjdk.org Thu Oct 19 15:16:42 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Thu, 19 Oct 2023 15:16:42 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v4] In-Reply-To: <4pRda3ZAZzVzGiVrDv6LN9Pw__DhrmTm4qZjTHzaq80=.a009bb29-4869-4047-8b62-80fbe7bef692@github.com> References: <4pRda3ZAZzVzGiVrDv6LN9Pw__DhrmTm4qZjTHzaq80=.a009bb29-4869-4047-8b62-80fbe7bef692@github.com> Message-ID: <3XTw5IAFj_YpaMrqyLdO9mNbWDENFMVkwk8JBmKHDcE=.5e4b1c70-0b36-4ff5-9cce-ee4f25a8e3bb@github.com> On Tue, 17 Oct 2023 12:29:46 GMT, Leo Korinth wrote: >> Rename createJavaProcessBuilder so that it is not used by mistake instead of createTestJvm. >> >> I have used the following sed script: `find -name "*.java" | xargs -n 1 sed -i -e "s/createJavaProcessBuilder(/createJavaProcessBuilderIgnoreTestJavaOpts(/g"` >> >> Then I have manually modified ProcessTools.java. In that file I have moved one version of createJavaProcessBuilder so that it is close to the other version. Then I have added a javadoc comment in bold telling: >> >> /** >> * Create ProcessBuilder using the java launcher from the jdk to >> * be tested. >> * >> *

Please observe that you likely should use >> * createTestJvm() instead of this method because createTestJvm() >> * will add JVM options from "test.vm.opts" and "test.java.opts" >> * and this method will not do that. >> * >> * @param command Arguments to pass to the java command. >> * @return The ProcessBuilder instance representing the java command. >> */ >> >> >> I have used the name createJavaProcessBuilderIgnoreTestJavaOpts because of the name of Utils.prependTestJavaOpts that adds those VM flags. If you have a better name I could do a rename of the method. I kind of like that it is long and clumsy, that makes it harder to use... >> >> I have run tier 1 testing, and I have started more exhaustive testing. > > Leo Korinth has updated the pull request incrementally with three additional commits since the last revision: > > - Revert "8315097: Rename createJavaProcessBuilder" > > This reverts commit 4b2d171133c40c5c48114602bfd0d4da75531317. > - Revert "copyright" > > This reverts commit f3418c80cc0d4cbb722ee5e368f1a001e898b43e. > - Revert "fix static import" > > This reverts commit 27da71508aec9a4bec1c0ad07031887286580171. If this looks roughly acceptable, I will manually add indentation spaces. I am now running tests. The changes can be verified by running the following commands: git switch -c _reproduce 15acf4b8d7cffcd0d74bf1b9c43cde9acaf31ea9 find -name "*.java" | xargs -n 1 sed -i -e "s/createJavaProcessBuilder(/createLimitedJavaTestProcessBuilder(/g" find -name "*.java" | xargs -n 1 sed -i -e "s/createTestJvm(/createJavaTestProcessBuilder(/g" find -name "*.java" | xargs -n 1 sed -i -e "s/import static jdk.test.lib.process.ProcessTools.createJavaProcessBuilder/import static jdk.test.lib.process.ProcessTools.createLimitedJavaTestProcessBuilder/g" git diff HEAD f80dda8d7109c2ef6bc1f685d0b611704dec645e Only the documentation changes should be visible. When I have manually indented everything it should be easy to that verify that change as a whitespace-only change. 
But that is for tomorrow (at best). ------------- PR Comment: https://git.openjdk.org/jdk/pull/15452#issuecomment-1771194189 From tschatzl at openjdk.org Thu Oct 19 15:25:11 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 19 Oct 2023 15:25:11 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v24] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 14:49:22 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. 
In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongues young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc theads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The prize for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > preprocess_card_table_parallel should be private Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/parallel/psCardTable.cpp line 146: > 144: // to all stripes (if any) they extend to. > 145: // A copy of card table entries corresponding to the stripe called "shadow" table > 146: // is used to separate card reading, clearing and redirtying. It looks like this documentation (at least the last part starting with the second sentence) should be placed to `scavenge_contents_parallel` because _this_ method does not do preprocessing (but `scavenge_contents_parallel`). 
(And the first sentence should be in the definitions in the hpp file) Generally I have a feeling that the documentation is scattered across too many places and that imo makes things harder to understand than necessary. I.e. I would really prefer some larger block summarizing how things work with at most small details added to the methods. The interface (.hpp) file has no documentation whatsoever, although I would assume that this would be the entry point trying to understand what is happening here. src/hotspot/share/gc/parallel/psCardTable.cpp line 157: > 155: StripeShadowTable sct(this, MemRegion(start, end)); > 156: > 157: // end might not be card-aligned Suggestion: // end might not be card-aligned. src/hotspot/share/gc/parallel/psCardTable.cpp line 171: > 169: } > 170: > 171: // Located a non-empty dirty chunk [dirty_l, dirty_r) Suggestion: // Located a non-empty dirty chunk [dirty_l, dirty_r). src/hotspot/share/gc/parallel/psCardTable.cpp line 175: > 173: HeapWord* addr_r = MIN2(sct.addr_for(dirty_r), end); > 174: > 175: // Scan objects overlapping [addr_l, addr_r) limited to [start, end) Suggestion: // Scan objects overlapping [addr_l, addr_r) limited to [start, end). src/hotspot/share/gc/parallel/psCardTable.cpp line 186: > 184: > 185: if (is_obj_array) { > 186: // precise-marked Suggestion: // Always scan obj arrays precisely (they are always marked precisely) to avoid unnecessary work. src/hotspot/share/gc/parallel/psCardTable.cpp line 190: > 188: } else { > 189: if (obj_addr < i_addr && i_addr > start) { > 190: // already-scanned Suggestion: // Already scanned this object. Has been one that spans multiple dirty chunks. The second condition makes sure // that we always scan the (non-Array) object reaching into this stripe. src/hotspot/share/gc/parallel/psCardTable.cpp line 201: > 199: } > 200: > 201: // move to next obj inside this dirty chunk Suggestion: // Move to next obj inside this dirty chunk. 
src/hotspot/share/gc/parallel/psCardTable.cpp line 205: > 203: } > 204: > 205: // Finished a dirty chunk Suggestion: // Finished a dirty chunk. src/hotspot/share/gc/parallel/psCardTable.cpp line 210: > 208: } > 209: > 210: // Propagate imprecise card marks from object start to the stripes an object extends to. Suggestion: // Propagate imprecise card marks from object start to all stripes an object extends to this thread is assigned to. (I saw that this is actually duplicated from the .hpp file. Better to improve the one in the .hpp file and remove this one) src/hotspot/share/gc/parallel/psCardTable.cpp line 217: > 215: uint stripe_index, > 216: uint n_stripes) { > 217: const uint active_workers = n_stripes; Suggestion: `active_workers` is unused. src/hotspot/share/gc/parallel/psCardTable.cpp line 221: > 219: CardValue* cur_card = byte_for(old_gen_bottom) + stripe_index * num_cards_in_stripe; > 220: CardValue* const end_card = byte_for(old_gen_top - 1) + 1; > 221: HeapWord* signaled_goal = nullptr; Unused too. Suggestion: src/hotspot/share/gc/parallel/psCardTable.cpp line 235: > 233: } > 234: } > 235: } I think this code becomes more clear if the nested-ifs are replaced by negation and `continue`. I also added some additional comments giving reasons for the conditions. Suggestion: for (CardValue* cur_card = byte_for(old_gen_bottom) + stripe_index * num_cards_in_stripe; // this may be left outside, your call, it is a bit long. cur_card < end_card; cur_card += num_cards_in_slice) { HeapWord* stripe_addr = addr_for(cur_card); if (is_dirty(cur_card) { // The first card of this stripe is already dirty, no need to see if the reaching-in object is a potentially imprecisely marked non-array object. continue; } HeapWord* first_obj_addr = object_start(stripe_addr); if (first_obj_addr == stripe_addr) { // (random comment) can't be > I think // No object reaching into this stripe. 
continue; } oop first_obj = cast_to_oop(first_obj_addr); if (!first_obj->is_array() && is_dirty(byte_for(first_obj_addr))) { // Found a non-array object reaching into the stripe assigned to this thread that has potentially been marked imprecisely. // Mark first card of stripe dirty so that this thread will process it later. *cur_card = dirty_card_val(); } } src/hotspot/share/gc/parallel/psCardTable.cpp line 242: > 240: SpinYield spin_yield; > 241: while (Atomic::load_acquire(&_preprocessing_active_workers) > 0) { > 242: spin_yield.wait(); I would prefer to have the synchronization as part of `scavenge_contents_parallel`; i.e. the logic there being Prepare Scavenge Synchronize Scavenge Here the synchronization feels out of place and surprising for a method that nowhere indicates that it is doing anything other than preprocessing the table. src/hotspot/share/gc/parallel/psCardTable.cpp line 309: > 307: > 308: const size_t stripe_size_in_words = num_cards_in_stripe * _card_size_in_words; > 309: const size_t slice_size_in_words = stripe_size_in_words * n_stripes; Reiterating this, these two initializations seem to be related to the "Scavenge" phase of this method and should be placed there. src/hotspot/share/gc/parallel/psCardTable.cpp line 311: > 309: const size_t slice_size_in_words = stripe_size_in_words * n_stripes; > 310: > 311: // Prepare scavenge Suggestion: // Prepare scavenge. src/hotspot/share/gc/parallel/psCardTable.cpp line 315: > 313: > 314: // Reset cached object > 315: cached_obj = {nullptr, old_gen_bottom}; Suggestion: // Prepare for actual scavenge. 
const size_t stripe_size_in_words = num_cards_in_stripe * _card_size_in_words; const size_t slice_size_in_words = stripe_size_in_words * n_stripes; cached_obj = {nullptr, old_gen_bottom}; (I do not feel that "Reset cached object" adds a lot) src/hotspot/share/gc/parallel/psCardTable.cpp line 319: > 317: // Scavenge > 318: HeapWord* cur_addr = old_gen_bottom + stripe_index * stripe_size_in_words; > 319: for (/* empty */; cur_addr < old_gen_top; cur_addr += slice_size_in_words) { Suggestion: for (HeapWord* cur_addr = old_gen_bottom + stripe_index * stripe_size_in_words; cur_addr < old_gen_top; cur_addr += slice_size_in_words) { feels better to me than that `/*empty*/` marker, but either is fine. src/hotspot/share/gc/parallel/psCardTable.hpp line 35: > 33: class PSPromotionManager; > 34: > 35: class PSCardTable: public CardTable { I am aware that `PSCardTable`'s function is not only scavenging support, but I would prefer documentation of how this works here (or with the `scavenge_contents_parallel` method, at the "Scavenge Support" comment) src/hotspot/share/gc/parallel/psCardTable.hpp line 36: > 34: > 35: class PSCardTable: public CardTable { > 36: private: Suggestion: Unnecessary. src/hotspot/share/gc/parallel/psCardTable.hpp line 44: > 42: const CardValue* _table_base; > 43: > 44: public: Suggestion: public: Should align with `class`. src/hotspot/share/gc/parallel/psCardTable.hpp line 45: > 43: > 44: public: > 45: StripeShadowTable(PSCardTable* pst, MemRegion stripe) : Fwiw, this is the only place I can find in this change where a memory range is passed as `MemRegion`. All other places pass the start/end pointers directly. Nothing to do here, but maybe for uniformity stick to one or the other.
(I somewhat prefer using `MemRegion`, but not a strong opinion at all) src/hotspot/share/gc/parallel/psCardTable.hpp line 47: > 45: StripeShadowTable(PSCardTable* pst, MemRegion stripe) : > 46: _table_base(_table - (uintptr_t(stripe.start()) >> _card_shift)) { > 47: // Old gen top is not card aligned. Suggestion: // Old gen top may not be card aligned. src/hotspot/share/gc/parallel/psCardTable.hpp line 49: > 47: // Old gen top is not card aligned. > 48: size_t copy_length = align_up(stripe.byte_size(), _card_size) >> _card_shift; > 49: size_t clear_length = align_down(stripe.byte_size(), _card_size) >> _card_shift; Can you explain why `align_down` is needed here? I remember some reason why this needs to be the case at least for the old code, and @albertnetymk also explained it to me recently, but just now I can't figure it out (and it may not be required any more). Please add a comment, this is not obvious. src/hotspot/share/gc/parallel/psCardTable.hpp line 51: > 49: size_t clear_length = align_down(stripe.byte_size(), _card_size) >> _card_shift; > 50: memcpy(_table, pst->byte_for(stripe.start()), copy_length); > 51: memset(pst->byte_for(stripe.start()), clean_card_val(), clear_length); `pst->byte_for(stripe.start())` could be extracted out. src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 135: > 133: > 134: inline void PSPromotionManager::push_contents_bounded(oop obj, HeapWord* left, HeapWord* right) { > 135: if (!obj->klass()->is_typeArray_klass()) { I would probably put this check into `scan_obj_with_limit()` to also avoid the unnecessary prefetch and initialization of `PSPushContentsClosure`. `scan_obj_with_limit` seems to be the only caller anyway. 
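As an aside to the review above, the striped loop discussed for psCardTable.cpp line 319 can be sketched in isolation. This is only a minimal illustration, not HotSpot code: `stripe_starts` is a hypothetical helper, and plain word offsets stand in for `HeapWord*` addresses. It assumes the scheme shown in the quoted loop, where worker `stripe_index` starts at its own stripe and then advances by one slice (`n_stripes` stripes) per iteration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of HotSpot): returns the start offsets, in
// words, of the stripes that worker `stripe_index` would visit when
// `n_stripes` workers share a range of `range_words` words.
std::vector<std::size_t> stripe_starts(std::size_t range_words,
                                       std::size_t stripe_size_in_words,
                                       std::size_t stripe_index,
                                       std::size_t n_stripes) {
  std::vector<std::size_t> starts;
  // One slice is one stripe for every worker.
  const std::size_t slice_size_in_words = stripe_size_in_words * n_stripes;
  // Same shape as the suggested loop: the induction variable is declared in
  // the for-statement instead of using an empty init clause.
  for (std::size_t cur = stripe_index * stripe_size_in_words;
       cur < range_words;
       cur += slice_size_in_words) {
    starts.push_back(cur);
  }
  return starts;
}
```

With 4 workers and 10-word stripes over a 100-word range, worker 1 would visit the stripes starting at offsets 10, 50 and 90 under these assumptions.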
------------- PR Review: https://git.openjdk.org/jdk/pull/14846#pullrequestreview-1687805994 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365643777 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365704196 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365704534 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365704876 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365708507 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365709874 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365716122 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365716661 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365675777 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365596378 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365595698 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365684229 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365656451 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365658417 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365668045 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365667245 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365662064 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365719821 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365604379 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365606118 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365697976 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365692703 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365608483 PR Review Comment: 
https://git.openjdk.org/jdk/pull/14846#discussion_r1365692173 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365724310 From tschatzl at openjdk.org Thu Oct 19 15:25:20 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 19 Oct 2023 15:25:20 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v23] In-Reply-To: <1HXBowkNZo1iyNgOVA6qZYcyz0alZ8r5FBMQ3FqAvTE=.8a95691e-f02d-486a-96c3-88d5ec836228@github.com> References: <1HXBowkNZo1iyNgOVA6qZYcyz0alZ8r5FBMQ3FqAvTE=.8a95691e-f02d-486a-96c3-88d5ec836228@github.com> Message-ID: On Thu, 19 Oct 2023 09:16:04 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. 
>> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongues young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc theads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The prize for parallelization is paid ... > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 36 additional commits since the last revision: > > - Use better name: _preprocessing_active_workers > - Merge branch 'master' > - Remove obsolete comment > - Feedback Albert > - Merge branch 'master' > - Re-cleanup (was accidentally reverted) > - Make sure to scan obj reaching in just once > - Simplification suggested by Albert > - Don't overlap card table processing with scavenging for simplicity > - Cleanup > - ... and 26 more: https://git.openjdk.org/jdk/compare/8e97b7e4...f7965512 src/hotspot/share/gc/parallel/psCardTable.hpp line 91: > 89: return end; > 90: } > 91: }; Could these implementations be moved into the .cpp file? They are only ever referenced by that and should be inlined anyway, so as not to clog the interface/hpp file too much. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365581799 From jvernee at openjdk.org Thu Oct 19 15:28:54 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 19 Oct 2023 15:28:54 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v8] In-Reply-To: References: Message-ID: > Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. > > The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are currently relying on Get-/ReleasePrimitiveArrayCritical from JNI, where making a copy of the array would be overly prohibitive. > > Components of this patch: > > - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. > - Rename `UnboxAddress` to `SegmentOffset`.
Add flag to specify whether processing heap segments should be allowed. > - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. > - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. > - The object/oop + offset is exposed as temporary address to native code. > - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). > - Only x64 and AArch64 for now. > - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 > - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. > - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` > > Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. > > Numbers for the included benchmark on my machine are: > > > Benchmark (size) Mode Cnt ... 
Jorn Vernee has updated the pull request incrementally with three additional commits since the last revision: - add stub size stress test for allowHeap - RISC-V impl - remove leftover debug log line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16201/files - new: https://git.openjdk.org/jdk/pull/16201/files/65bd8d83..dd9e9741 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=06-07 Stats: 151 lines in 4 files changed: 74 ins; 66 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/16201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16201/head:pull/16201 PR: https://git.openjdk.org/jdk/pull/16201 From jvernee at openjdk.org Thu Oct 19 15:28:57 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 19 Oct 2023 15:28:57 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v7] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 13:40:55 GMT, Feilong Jiang wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> add s390 support > > Here is the patch for risc-v: [riscv_panama_heap_segments.patch](https://github.com/openjdk/jdk/files/13043332/riscv_panama_heap_segments.patch) > > All `jdk_foreign` tests passed on linux-riscv with `-XX+VerifyOops -XX:+VerifyStack` and fastdebug build. @feilongjiang Thanks, I've added it to the PR @TheRealMDoerr Note that `reg_offset` is filtered out, and not handled by `arg_shuffle`. So, it becomes the question whether shuffling a register, or adding an offset to an oop takes more bytes. I think most of the `native_invoker_size_per_arg` have some lenience built in though? (I did do that for x64 and aarch64 at least). So, I think it will be okay. I've added an additional test case to the existing test for this, which should stress the new code gen. 
(https://github.com/openjdk/jdk/pull/16201/commits/dd9e9741de3ca07e6a4cc561002255f98e1e3330) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1771209486 From shade at openjdk.org Thu Oct 19 15:45:59 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 19 Oct 2023 15:45:59 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms In-Reply-To: References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: On Thu, 19 Oct 2023 13:32:49 GMT, Aleksey Shipilev wrote: > EDIT: Wait, something is off here. I am running on RPi 4 that is definitely ARM v7+, which should support this. I think the gtest does not initialize `VM_Version` or something... Oh, I see, that's the test bug: #16269 The contract for `Atomic` requires that `PlatformCmpxchg<8>` is implemented on all platforms: // Platform-specific implementation of cmpxchg. Support for sizes // of 1, 4, and 8 are required. ... ... template struct PlatformCmpxchg; ARM32 does not implement this for all types of machines, as per comment here: /* * Atomic long operations on 32-bit ARM * ARM v7 supports LDREXD/STREXD synchronization instructions so no problem. * ARM < v7 does not have explicit 64 atomic load/store capability. * However, gcc emits LDRD/STRD instructions on v5te and LDM/STM on v5t * when loading/storing 64 bits. * For non-MP machines (which is all we support for ARM < v7) * under current Linux distros these instructions appear atomic. * See section A3.5.3 of ARM Architecture Reference Manual for ARM v7. * Also, for cmpxchg64, if ARM < v7 we check for cmpxchg64 support in the * Linux kernel using _kuser_helper_version. See entry-armv.S in the Linux * kernel source or kernel_user_helpers.txt in Linux Doc. */ I guess we still have a problem for ARM < v7 without kernel support, which would either assert in `atomic_linux_arm.hpp`, or `stop` later in the stub. 
I would argue that is a violation of the `Atomic` contract that requires implementing `PlatformCmpxchg<8>` one way or the other. Given how even current direct uses of `PlatformCmpxchg<8>` in those configs would fail, I don't see that this PR introduces regressions: using `PlatformXchg<8>` and `PlatformAdd<8>` would break on those platforms either before or after this change, and thus we can proceed even with this change. (I would pull in the `TEST_VM` change once committed.) Agree, @dholmes-ora? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1771245837 From ayang at openjdk.org Thu Oct 19 16:04:31 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 19 Oct 2023 16:04:31 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v24] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 14:07:21 GMT, Thomas Schatzl wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> preprocess_card_table_parallel should be private > > src/hotspot/share/gc/parallel/psCardTable.hpp line 49: > >> 47: // Old gen top is not card aligned. >> 48: size_t copy_length = align_up(stripe.byte_size(), _card_size) >> _card_shift; >> 49: size_t clear_length = align_down(stripe.byte_size(), _card_size) >> _card_shift; > > Can you explain why `align_down` is needed here? I remember some reason why this needs to be the case at least for the old code, and @albertnetymk also explained it to me recently, but just now I can't figure it out (and it may not be required any more). Please add a comment, this is not obvious. Since old-gen-top before scavenging might not be card-aligned, it's unsafe to clear the card containing it; hence the conservative (align-down) calculation. However, the right shift will do an implicit align-down as well, so it is probably not needed. Better be explicit, I was thinking. Either is fine, I guess.
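To illustrate the point about the implicit align-down, here is a small standalone sketch. It is not the HotSpot code: `align_down`, `align_up`, `card_shift` and `card_size` are re-declared locally as stand-ins, and the 512-byte card size is an assumption made only for the example. It shows that, for a power-of-two card size, shifting right by the card shift gives the same card count whether or not the byte size was aligned down first, while the `align_up` variant counts the trailing partial card as well.

```cpp
#include <cstddef>

// Local stand-ins for illustration only; the values are assumptions.
constexpr unsigned card_shift = 9;                            // assumed 512-byte cards
constexpr std::size_t card_size = std::size_t{1} << card_shift;

// Round down/up to a multiple of `alignment`, which must be a power of two.
constexpr std::size_t align_down(std::size_t value, std::size_t alignment) {
  return value & ~(alignment - 1);
}
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Number of cards to clear: only cards fully covered by the stripe.
constexpr std::size_t clear_length_explicit(std::size_t byte_size) {
  return align_down(byte_size, card_size) >> card_shift;
}
// Same result without align_down: the shift discards the low bits anyway.
constexpr std::size_t clear_length_implicit(std::size_t byte_size) {
  return byte_size >> card_shift;
}
// Number of cards to copy: includes a trailing, partially covered card.
constexpr std::size_t copy_length(std::size_t byte_size) {
  return align_up(byte_size, card_size) >> card_shift;
}
```

For a 1000-byte stripe under these assumptions, one card would be cleared and two would be copied, so the explicit `align_down` documents intent rather than changing the result.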
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365791289 From mli at openjdk.org Thu Oct 19 16:10:24 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 19 Oct 2023 16:10:24 GMT Subject: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v3] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 17:35:58 GMT, Ilya Gavrilin wrote: >> Hi all, please review these changes adding RISC-V floating-point copysign and signum intrinsics. >> CopySign - returns first argument with the sign of second. On risc-v we have `fsgnj.x` instruction, which can implement this intrinsic. >> Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the sign of input value returned. On risc-v we can use `fclass.x` to determine the type of the input value and return the appropriate value. >> >> Tests: >> Performance tests on t-head board: >> With intrinsics: >> >> Benchmark (seed) Mode Cnt Score Error Units >> MathBench.copySignDouble 0 thrpt 8 34156.580 ± 76.272 ops/ms >> MathBench.copySignFloat 0 thrpt 8 34181.731 ± 38.182 ops/ms >> MathBench.signumDouble 0 thrpt 8 31977.258 ± 1122.327 ops/ms >> MathBench.signumFloat 0 thrpt 8 31836.852 ± 56.013 ops/ms >> >> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`): >> >> Benchmark (seed) Mode Cnt Score Error Units >> MathBench.copySignDouble 0 thrpt 8 31000.996 ± 943.094 ops/ms >> MathBench.copySignFloat 0 thrpt 8 30678.016 ± 28.087 ops/ms >> MathBench.signumDouble 0 thrpt 8 25435.010 ± 2047.085 ops/ms >> MathBench.signumFloat 0 thrpt 8 25257.058 ± 79.175 ops/ms >> >> Regression tests: tier1, hotspot:tier2 on risc-v board. >> >> Also, changed name of one micro test: before we had `sigNumDouble` and `signumFloat` tests, which do not both match `signum` or `sigNum`. Now both share the same part: `signum`.
>> Performance tests have been changed a bit to check the intrinsics' results better; diff to modify the tests:
>>
>> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> index 6cd1353907e..0bee25366bf 100644
>> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> @@ -143,12 +143,12 @@ public double ceilDouble() {
>>
>> @Benchmark
>> public double copySignDouble() {
>> - return Math.copySign(double81, doubleNegative12);
>> + return Math.copySign(double81, doubleNegative12) + Math.copySign(double81, double2) + Math.copySign(double4Dot1, doubleNegative12);
>> }
>>
>> @Benchmark
>> public float copySignFloat() {
>> - return Math.copySign(floatNegative99, float1);
>> + return ...
>
> Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove some effects and assert

src/hotspot/cpu/riscv/riscv.ad line 7520:

> 7518: __ fsgnj_d(dst, src1, src2);
> 7519: %}
> 7520: ins_pipe(fp_uop_d);

Should ins_pipe be `fp_dop_reg_reg_d` here?

src/hotspot/cpu/riscv/riscv.ad line 7532:

> 7530: __ fsgnj_s(dst, src1, src2);
> 7531: %}
> 7532: ins_pipe(fp_uop_d);

Should ins_pipe be `fp_dop_reg_reg_s` here?
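For reference, the copySign and signum semantics given in the quoted PR description can be sketched in portable C++. This is only an illustration of the described behavior using the standard library; `signum_sketch` and `copy_sign_sketch` are hypothetical names and this is not the RISC-V intrinsic code under review.

```cpp
#include <cassert>
#include <cmath>

// copySign as described: magnitude of the first argument, sign of the second.
inline double copy_sign_sketch(double magnitude, double sign) {
  return std::copysign(magnitude, sign);
}

// signum as described: +/-0.0 and NaN are returned unchanged; any other input
// yields 1.0 carrying the sign of the input.
inline double signum_sketch(double x) {
  if (x == 0.0 || std::isnan(x)) {
    return x;  // x == 0.0 is true for both +0.0 and -0.0
  }
  return std::copysign(1.0, x);
}
```

The zero and NaN cases are the ones that need classification of the input, which is where the description mentions `fclass.x`.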
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1364312642 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1364313031 From duke at openjdk.org Thu Oct 19 16:14:20 2023 From: duke at openjdk.org (Leela Mohan Venati) Date: Thu, 19 Oct 2023 16:14:20 GMT Subject: RFR: 8317466: Enable interpreter oopMapCache for concurrent GCs [v2] In-Reply-To: <39f-0nlOdQBABHr1cQOq7jITuRLzM9yLDUEUm1--0N8=.f3aeed11-7151-4885-8376-7d91ca84e8a7@github.com> References: <39f-0nlOdQBABHr1cQOq7jITuRLzM9yLDUEUm1--0N8=.f3aeed11-7151-4885-8376-7d91ca84e8a7@github.com> Message-ID: On Mon, 16 Oct 2023 20:03:44 GMT, Leela Mohan Venati wrote: >> Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: >> >> Cleanup old oop map cache entry after class redefinition > > src/hotspot/share/gc/shenandoah/shenandoahVMOperations.cpp line 64: > >> 62: OopMapCache::cleanup_old_entries(); >> 63: } >> 64: > > Do you think, VM_ShenandoahFinalMarkStartEvac walks the stack roots. If yes, i recommend adding OopMapCache::cleanup_old_entries() in VM_ShenandoahOperation::doit_epilogue(). And this would make the change simple and also revert the change in this [PR](https://github.com/openjdk/jdk/pull/15921) I stand corrected. My question is still relevant >> Do you think, VM_ShenandoahFinalMarkStartEvac walks the stack roots. My recommendation is incorrect. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16074#discussion_r1365804097 From rrich at openjdk.org Thu Oct 19 16:34:53 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 19 Oct 2023 16:34:53 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v25] In-Reply-To: References: Message-ID: <_8M6U5h8p7PTuQ6XL113JmjeiSYG3dHiF2pNe0D36qo=.c9297e9b-d638-4ec2-993f-6a90e375934c@github.com> > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. 
The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea... Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Small cleanup changes suggested by Thomas.
Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/26f06361..7843a023 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=23-24 Stats: 11 lines in 2 files changed: 0 ins; 3 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From rrich at openjdk.org Thu Oct 19 16:34:55 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 19 Oct 2023 16:34:55 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v24] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 14:41:37 GMT, Thomas Schatzl wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> preprocess_card_table_parallel should be private > > src/hotspot/share/gc/parallel/psCardTable.cpp line 315: > >> 313: >> 314: // Reset cached object >> 315: cached_obj = {nullptr, old_gen_bottom}; > > Suggestion: > > // Prepare for actual scavenge. > const size_t stripe_size_in_words = num_cards_in_stripe * _card_size_in_words; > const size_t slice_size_in_words = stripe_size_in_words * n_stripes; > > cached_obj = {nullptr, old_gen_bottom}; > > > (I do not feel that "Reset cached object" adds a lot) The reset is needed because the cache requires that queries are monotonic. There was an assertion checking monotonicity. Albert thought it wouldn't be needed. 
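Why a cache that assumes monotonic queries has to be reset before a second pass can be illustrated with a minimal sketch. `ForwardObjectCursor` and its members are hypothetical and only model the idea of a forward-only lookup; the actual `cached_obj` handling in `psCardTable.cpp` is not shown in this thread and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical forward-only lookup of the object start covering an address.
// It remembers the last position and only ever moves forward, so it is only
// correct while queries are non-decreasing.
class ForwardObjectCursor {
 public:
  // `starts` must be sorted, non-empty, and must outlive the cursor.
  explicit ForwardObjectCursor(const std::vector<std::size_t>& starts)
      : starts_(starts), idx_(0) {}

  // Go back to the first object, as needed before a new pass begins.
  void reset() { idx_ = 0; }

  // Precondition: addr >= starts_[0] and addr >= every query since the last reset.
  std::size_t object_start_for(std::size_t addr) {
    while (idx_ + 1 < starts_.size() && starts_[idx_ + 1] <= addr) {
      ++idx_;
    }
    return starts_[idx_];
  }

 private:
  const std::vector<std::size_t>& starts_;
  std::size_t idx_;
};
```

Querying a lower address without calling `reset()` returns the stale, later object start, which is the kind of error a monotonicity assertion would catch.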
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365819544 From cjplummer at openjdk.org Thu Oct 19 16:36:06 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 19 Oct 2023 16:36:06 GMT Subject: RFR: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 [v5] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 13:52:24 GMT, Hannes Greule wrote: > I think this is worth a backport to jdk21u, what do you think? And if yes, is it enough to run the backport command on the commit later and create a PR from it? Since the problem was introduced in 21, that sounds reasonable. You can use the backport command, but make sure you first follow the approval process described at https://openjdk.org/projects/jdk-updates/approval.html ------------- PR Comment: https://git.openjdk.org/jdk/pull/16083#issuecomment-1771342414 From rrich at openjdk.org Thu Oct 19 16:42:27 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 19 Oct 2023 16:42:27 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v26] In-Reply-To: References: Message-ID: <_2a21qNJwmnjwMP7EREBeCgrZvNOgx6ScN55rstyOUM=.e6d83a4f-dcb0-431b-8275-f95b3940db8c@github.com> > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. 
> > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea...
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: More small changes Thomas suggested (line-breaks needed) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/7843a023..bd853c4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=24-25 Stats: 6 lines in 1 file changed: 3 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From matsaave at openjdk.org Thu Oct 19 17:46:20 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 19 Oct 2023 17:46:20 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 15:19:50 GMT, Frederic Parain wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Removed some comments and relocated code > > src/hotspot/share/oops/cpCache.cpp line 335: > >> 333: } >> 334: >> 335: void ConstantPoolCache::set_method_handle_common(int method_index, > > The only place where this method is called is in set_method_handle() just above. Do we really need to have two methods then? I was also thinking about this after seeing @adinn 's comments. 
Neither function is private or protected, so I think it is safe to just move all of the code from `set_method_handle_common()` into `set_method_handle()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1365906463 From sjayagond at openjdk.org Thu Oct 19 18:00:42 2023 From: sjayagond at openjdk.org (Sidraya Jayagond) Date: Thu, 19 Oct 2023 18:00:42 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v7] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 15:21:02 GMT, Jorn Vernee wrote: >> Here is the patch for risc-v: [riscv_panama_heap_segments.patch](https://github.com/openjdk/jdk/files/13043332/riscv_panama_heap_segments.patch) >> >> All `jdk_foreign` tests passed on linux-riscv with `-XX:+VerifyOops -XX:+VerifyStack` and fastdebug build. > > @feilongjiang Thanks, I've added it to the PR > > @TheRealMDoerr Note that `reg_offset` is filtered out, and not handled by `arg_shuffle`. So, it becomes the question whether shuffling a register, or adding an offset to an oop takes more bytes. I think most of the `native_invoker_size_per_arg` have some lenience built in though? (I did do that for x64 and aarch64 at least). So, I think it will be okay. > > I've added an additional test case to the existing test for this, which should stress the new code gen. (https://github.com/openjdk/jdk/pull/16201/commits/dd9e9741de3ca07e6a4cc561002255f98e1e3330) @JornVernee please add this patch, which addresses @RealLucy's comments. [0001-Address-Lutz-Schmidt-s-review-comments.txt](https://github.com/openjdk/jdk/files/13046011/0001-Address-Lutz-Schmidt-s-review-comments.txt) I also tested the additional test case that @JornVernee added on s390x, and it passes.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1771454607 From sjayagond at openjdk.org Thu Oct 19 18:00:59 2023 From: sjayagond at openjdk.org (Sidraya Jayagond) Date: Thu, 19 Oct 2023 18:00:59 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v7] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 13:37:09 GMT, Lutz Schmidt wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> add s390 support > > src/hotspot/cpu/s390/downcallLinker_s390.cpp line 100: > >> 98: Address offset_addr(callerSP, FP_BIAS + reg_offset.offset()); >> 99: __ mem2reg_opt(r_tmp1, offset_addr, true); >> 100: __ z_agr(reg_oop_reg, r_tmp1); > > Please note that s390 is a CISC architecture. It provides instructions for almost everything. :-) > Here, I would suggest to add the offset to reg_oop_reg directly from memory - without first loading the offset into a temp register (that is RISC style). It's shorter and faster: > ` __ z_ag(reg_oop_reg, offset_addr);` That's right @RealLucy. Thanks for reviewing. > src/hotspot/cpu/s390/downcallLinker_s390.cpp line 112: > >> 110: __ mem2reg_opt(r_tmp2, oop_addr, true); >> 111: __ z_agr(r_tmp1, r_tmp2); >> 112: __ reg2mem_opt(r_tmp1, oop_addr, true); > > Similar to above. You need to load only one operand into a register. > > __ mem2reg_opt(r_tmp2, oop_addr, true); > __ z_ag(r_tmp2, offset_addr); > __ reg2mem_opt(r_tmp2, oop_addr, true); Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1365916526 PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1365916660 From svkamath at openjdk.org Thu Oct 19 18:05:46 2023 From: svkamath at openjdk.org (Smita Kamath) Date: Thu, 19 Oct 2023 18:05:46 GMT Subject: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v8] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 14:32:20 GMT, Tobias Hartmann wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated comments, removed unused labels > > All tests passed. @TobiHartmann Thank you for running the tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15410#issuecomment-1771384900 From iwalulya at openjdk.org Thu Oct 19 18:07:01 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 19 Oct 2023 18:07:01 GMT Subject: RFR: 8317350: Move code cache purging out of CodeCache::UnloadingScope In-Reply-To: References: Message-ID: On Mon, 2 Oct 2023 12:56:27 GMT, Thomas Schatzl wrote: > Hi all, > > please review this refactoring that moves actual code cache flushing/purging out of `CodeCache::UnloadingScope`. Reasons: > > * I prefer that a destructor does not do anything substantial - in some cases, 90% of time is spent in the destructor in that extracted method (due to https://bugs.openjdk.org/browse/JDK-8316959) > * imho it does not fit the class which does nothing but sets/resets some code cache unloading behavior (probably should be renamed to `UnloadingBehaviorScope` too in a separate CR). > * other existing methods at that level are placed out of that (or any other) scope object too - which is already the case for when doing concurrent unloading. > * putting it there makes future logging of the various phases a little bit easier, not having `GCTraceTimer` et al. in various places. 
> > Testing: gha > > Thanks, > Thomas Marked as reviewed by iwalulya (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16011#pullrequestreview-1688249138 From sjayagond at openjdk.org Thu Oct 19 18:18:22 2023 From: sjayagond at openjdk.org (Sidraya Jayagond) Date: Thu, 19 Oct 2023 18:18:22 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v7] In-Reply-To: References: Message-ID: <6YFCmKEmndViMxqdZTi0Jl-CvAz3LGJ-Et6bOnCqGVg=.7678f793-9c66-4d76-96f5-6b791e3172be@github.com> On Thu, 19 Oct 2023 15:21:02 GMT, Jorn Vernee wrote: >> Here is the patch for risc-v: [riscv_panama_heap_segments.patch](https://github.com/openjdk/jdk/files/13043332/riscv_panama_heap_segments.patch) >> >> All `jdk_foreign` tests passed on linux-riscv with `-XX:+VerifyOops -XX:+VerifyStack` and fastdebug build. > > @feilongjiang Thanks, I've added it to the PR > > @TheRealMDoerr Note that `reg_offset` is filtered out, and not handled by `arg_shuffle`. So, it becomes the question whether shuffling a register, or adding an offset to an oop takes more bytes. I think most of the `native_invoker_size_per_arg` have some lenience built in though? (I did do that for x64 and aarch64 at least). So, I think it will be okay. > > I've added an additional test case to the existing test for this, which should stress the new code gen. (https://github.com/openjdk/jdk/pull/16201/commits/dd9e9741de3ca07e6a4cc561002255f98e1e3330) @JornVernee Please add the patch below, which addresses @RealLucy's review comments.
[0001-Address-Lutz-Schmidt-s-review-comments.txt](https://github.com/openjdk/jdk/files/13046224/0001-Address-Lutz-Schmidt-s-review-comments.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1771488927 From svkamath at openjdk.org Thu Oct 19 18:32:17 2023 From: svkamath at openjdk.org (Smita Kamath) Date: Thu, 19 Oct 2023 18:32:17 GMT Subject: Integrated: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 06:12:29 GMT, Smita Kamath wrote:
> Hi All,
> I would like to submit AES-GCM optimization for x86_64 architectures using AVX2 instructions. This optimization interleaves AES and GHASH operations.
>
> Below are the performance numbers on my desktop system with -XX:UseAVX=2 option:
>
> | Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup |
> |-------------|------------|---------------|------------------|-----------|
> | full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 |
> | full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 |
> | small.AESGCMBench.decrypt | 8192 | 527854.353 | 663131.48 | 1.25 |
> | small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 | 1.24 |
> | full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 |
> | full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 | 539235.462 | 1.00 |
> | small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 | 298913.629 | 0.99 |
> | small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 |
> | | | | | |
> | full.AESGCMBench.decrypt | 16384 | 307266.364 | 390397.778 | 1.27 |
> | full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 |
> | small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 | 1.27 |
> | small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 |
> | full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 |
> | full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 |
> | small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 | 181019.205 | 1.12 |
> | small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 |
> | | | | | |
> | full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 | 1.31 |
> | full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 |
> | small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 |
> | small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 | 1.28 |
> | full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 | 1.12 |
> | full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 | 211701.969 | 1.27 |
> | small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 |
> | small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 | 212085.047 | 1.30 |
> | | | | | |
> | full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 |
> | full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 | 1.31 |
> | small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 | 1.32 |
> | small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 |
> | full.AESGCMBench.decryptMultiPart | 65536 | 42649.816 | 47591.587 | 1.11 |
> full.AESGCMBe...

This pull request has now been integrated.
Changeset: 17409500 Author: Smita Kamath Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/17409500369bd8503782b2e6f4e885e33837087a Stats: 706 lines in 8 files changed: 699 ins; 0 del; 7 mod 8314901: AES-GCM interleaved implementation using AVX2 instructions Reviewed-by: sviswanathan, djelinski ------------- PR: https://git.openjdk.org/jdk/pull/15410 From tschatzl at openjdk.org Thu Oct 19 18:49:35 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 19 Oct 2023 18:49:35 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v24] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 16:24:57 GMT, Richard Reingruber wrote: >> src/hotspot/share/gc/parallel/psCardTable.cpp line 315: >> >>> 313: >>> 314: // Reset cached object >>> 315: cached_obj = {nullptr, old_gen_bottom}; >> >> Suggestion: >> >> // Prepare for actual scavenge. >> const size_t stripe_size_in_words = num_cards_in_stripe * _card_size_in_words; >> const size_t slice_size_in_words = stripe_size_in_words * n_stripes; >> >> cached_obj = {nullptr, old_gen_bottom}; >> >> >> (I do not feel that "Reset cached object" adds a lot) > > The reset is needed because the cache requires that queries are monotonic. There was an assertion checking monotonicity. Albert thought it wouldn't be needed. I meant the comment "Reset Cached Object". I also think that the code is required :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365959327 From tschatzl at openjdk.org Thu Oct 19 18:49:37 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 19 Oct 2023 18:49:37 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v24] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 16:01:20 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/gc/parallel/psCardTable.hpp line 49: >> >>> 47: // Old gen top is not card aligned. 
>>> 48: size_t copy_length = align_up(stripe.byte_size(), _card_size) >> _card_shift; >>> 49: size_t clear_length = align_down(stripe.byte_size(), _card_size) >> _card_shift; >> >> Can you explain why `align_down` is needed here? I remember some reason why this needs to be the case at least for the old code, and @albertnetymk also explained it to me recently, but just now I can't figure it out (and it may not be required any more). Please add a comment, this is not obvious. > > Since old-gen-top before scavenging might not be card-aligned, it's unsafe to clear it; hence the conservative (align-down) calculation. However, the right shift will do implicit align-down as well, so it is probably not needed. Better be explicit, I was thinking. Either is fine, I guess. The highest value `byte_size()` can have is old_gen-end - old_gen_bottom (both card-aligned; one stripe, one slice), which is the exact length needed when covering all cards. Any top value != end must have a committed corresponding card table entry, otherwise marking the card that contains `top` would crash. Also the copying would fail then, reading from uncommitted areas beyond the card table. The code does not do that afaics. So the only problematic one I can see would be clearing the card exactly starting at old_gen-end, which an `align_up()` wouldn't do either. So I do not completely get why clearing the card containing top would be unsafe. Can you give an example? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365962741 From rrich at openjdk.org Thu Oct 19 19:06:49 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 19 Oct 2023 19:06:49 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v24] In-Reply-To: References: Message-ID: <2sSmoJUYGi_R0UTfD_3FA2o2q_n_zljpP2hYxqDWiZQ=.d8967c04-ee42-4338-b5ba-67c9c7633953@github.com> On Thu, 19 Oct 2023 18:38:03 GMT, Thomas Schatzl wrote: >> Since old-gen-top before scavenging might not be card-aligned, it's unsafe to clear it; hence the conservative (align-down) calculation. However, the right shift will do implicit align-down as well, so it is probably not needed. Better be explicit, I was thinking. Either is fine, I guess. > > The highest value `byte_size()` can have is old_gen-end - old_gen_bottom (both card-aligned; one stripe, one slice), which is the exact length needed when covering all cards. > Any top value != end must have a committed corresponding card table entry, otherwise marking the card that contains `top` would crash. Also the copying would fail then, reading from uncommitted areas beyond the card table. The code does not do that afaics. > > So the only problematic one I can see would be clearing the card exactly starting at old_gen-end, which an `align_up()` wouldn't do either. > > So I do not completely get why clearing the card containing top would be unsafe. Can you give an example? Likely I do not completely understand what you are saying but this would be my explanation why the `align_down` for `clear_length` is needed. `T := old_gen->object_space()->top()` is not necessarily card aligned at scavenge start. We must not clear the card for `T` if an object was copied there because it was promoted and it has a reference to a young object on that card. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1365992310 From dlong at openjdk.org Thu Oct 19 19:36:06 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 19 Oct 2023 19:36:06 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms In-Reply-To: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: On Wed, 18 Oct 2023 18:58:58 GMT, Aleksey Shipilev wrote: > See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. > > Unfortunately, we cannot test these apart from the existing gtest. > > Additional testing: > - [x] linux-x86-server-fastdebug, atomic tests pass > - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) I guess on arm32 without 64-bit atomic support, the gtest operations on int64 would crash, because the _LP64 guard was removed. It looks like C2 already requires 64-bit atomic support, so older arm32 without that would not be able to run C2. For those older arm32, the atomic support could use AccessLocker I guess, but I think C2 would still fail unless it was changed to also use AccessLocker. 
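The dependency discussed here, 64-bit add and exchange relying on a working 64-bit compare-and-exchange, can be sketched as a retry loop. This assumes the fallback is built on compare-and-exchange, as the earlier remark about `PlatformCmpxchg<8>` suggests; it uses `std::atomic` rather than HotSpot's `Atomic` class, and the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Exchange built from compare-and-exchange: returns the previous value.
inline std::int64_t xchg_via_cmpxchg(std::atomic<std::int64_t>& dest,
                                     std::int64_t new_value) {
  std::int64_t old_value = dest.load(std::memory_order_relaxed);
  // On failure compare_exchange_weak refreshes old_value with the current
  // contents, so the loop simply retries with the new expectation.
  while (!dest.compare_exchange_weak(old_value, new_value)) {
  }
  return old_value;
}

// Add built from compare-and-exchange: returns the updated value.
inline std::int64_t add_via_cmpxchg(std::atomic<std::int64_t>& dest,
                                    std::int64_t add_value) {
  std::int64_t old_value = dest.load(std::memory_order_relaxed);
  // The desired value is recomputed from the refreshed old_value on each retry.
  while (!dest.compare_exchange_weak(old_value, old_value + add_value)) {
  }
  return old_value + add_value;
}
```

If the platform has no usable 64-bit compare-and-exchange, a loop like this has nothing to stand on, which is the situation described for older arm32.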
------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1771590679 From matsaave at openjdk.org Thu Oct 19 20:00:07 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 19 Oct 2023 20:00:07 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v2] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 08:03:38 GMT, Andrew Dinn wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Removed some comments and relocated code > > src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp line 2314: > >> 2312: CALL_VM(InterpreterRuntime::resolve_from_cache(THREAD, (Bytecodes::Code)opcode), >> 2313: handle_exception); >> 2314: entry = cp->resolved_method_entry_at(index); > > Do you actually need to lookup the entry again? I'm not really sure why the old code needed to do so. This is probably unnecessary. I will try removing it and see what the test results look like. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1366042191 From iklam at openjdk.org Thu Oct 19 19:59:02 2023 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 19 Oct 2023 19:59:02 GMT Subject: RFR: 8318484: Initial version of cdsConfig.hpp In-Reply-To: References: Message-ID: <0zaCmn28Mpcg3C3kXEvFJTEPq_NItT515f8lFro58OM=.d1b318af-2872-476c-a0b3-ba231ef02ccf@github.com> On Thu, 19 Oct 2023 06:54:05 GMT, David Holmes wrote: >> This is the first step for [JDK-8318483 - Move CDS configuration management into cdsConfig.hpp](https://bugs.openjdk.org/browse/JDK-8318483) >> >> - Remove `Arguments::is_dumping_archive()` and `Arguments assert_is_dumping_archive()` >> - Add the following new APIs >> >> >> class CDSConfig { >> static bool is_dumping_archive(); >> static bool is_dumping_static_archive(); >> static bool is_dumping_dynamic_archive(); >> static bool is_dumping_heap(); >> }; >> >> >> - Convert some use of `DumpSharedSpaces` and `DynamicDumpSharedSpaces` to these new APIs >> >> (More APIs will be added in future sub tasks of [JDK-8318483](https://bugs.openjdk.org/browse/JDK-8318483)) > > src/hotspot/share/cds/metaspaceShared.cpp line 778: > >> 776: >> 777: #if INCLUDE_CDS_JAVA_HEAP >> 778: if (CDSConfig::is_dumping_heap()) { > > This seems a new condition. Why is it needed now? This was a bug uncovered during refactoring. `StringTable::allocate_shared_strings_array()` used to assert `DumpSharedSpaces`. However, this function is useful only in a more limited scope (`CDSConfig::is_dumping_heap()` which is a subset of `DumpSharedSpaces`). So after changing the assert in `StringTable::allocate_shared_strings_array()`, I have to change the condition where this function is called. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16257#discussion_r1366041643 From rrich at openjdk.org Thu Oct 19 21:27:09 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 19 Oct 2023 21:27:09 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v27] In-Reply-To: References: Message-ID: <8qkepkG9gUUj8f_GfFCZww2VPlK8htgBmLV1c4EbIWk=.25825e8f-46b0-412a-9373-17eef459a5d9@github.com> > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. 
The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea...
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Review Thomas ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/bd853c4a..fd5d0725 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=25-26 Stats: 164 lines in 3 files changed: 82 ins; 65 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From rrich at openjdk.org Thu Oct 19 21:30:17 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 19 Oct 2023 21:30:17 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v27] In-Reply-To: <8qkepkG9gUUj8f_GfFCZww2VPlK8htgBmLV1c4EbIWk=.25825e8f-46b0-412a-9373-17eef459a5d9@github.com> References: <8qkepkG9gUUj8f_GfFCZww2VPlK8htgBmLV1c4EbIWk=.25825e8f-46b0-412a-9373-17eef459a5d9@github.com> Message-ID: <2WssUSGC3w_LXQw8t3saKVnyBmoqox5gnbMfB3gr3XI=.c82c3412-0210-4007-b8e5-41bd8ceec597@github.com> On Thu, 19 Oct 2023 21:27:09 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. 
>> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongues young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc theads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The prize for parallelization is paid ... 
> > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Review Thomas Thanks for all the feedback Thomas! // Haven't incorporated all of it yet. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1771730013 From tschatzl at openjdk.org Thu Oct 19 21:46:45 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 19 Oct 2023 21:46:45 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v24] In-Reply-To: References: <2sSmoJUYGi_R0UTfD_3FA2o2q_n_zljpP2hYxqDWiZQ=.d8967c04-ee42-4338-b5ba-67c9c7633953@github.com> Message-ID: <-z08i1g6s8t-is8iRk-C6Br94tGw0ZQwRvJsFjvTIlw=.51bfed2c-3d04-4f6e-8e67-69d4b652c676@github.com> On Thu, 19 Oct 2023 20:06:59 GMT, Richard Reingruber wrote: >> Likely I do not completely understand what you are saying but this would be my explanation why the `align_down` for `clear_length` is needed. >> `T := old_gen->object_space()->top()` is not necessarily card aligned at scavenge start. We must not clear the card for `T` if an object was copied there because it was promoted and it has a reference to a young object on that card. > > Example > > We cannot clear card n containing old gen top T because we won't scan the promoted > objects on card n and we can only clear cards if we scan all objects on them afterwards. > > > card n-1 card n card n+1 > +--------------------+--------------------+-------------------- > | | . Promoted | > | | . Objects | > | | . | > +--------------------+--------------------+-------------------- > ^ > | > T > old gen top at scavenge start > / end of last stripe Okay, I think I understood this now. I would probably just fill up the last card before scavenge, but this explanation makes sense. Please add a comment there for the next person. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1366158560 From mdoerr at openjdk.org Thu Oct 19 21:54:57 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 19 Oct 2023 21:54:57 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v5] In-Reply-To: References: Message-ID: <8hx8KP0ZWoairTjroq63c1XF9AChGkNeayW_83uIUAM=.90edba42-261c-4192-ba1a-9ce3eddb9004@github.com> On Thu, 19 Oct 2023 12:58:39 GMT, Martin Doerr wrote: >>> I wonder if the native_invoker_size_per_arg thing still works good enough. We may exceed the computed size, now, right? >> >> Good point. I'll have a look at enhancing the test we have for this. >> >> Intuitively, I think it will be okay. It's true that we generate more code to add the oops and offsets together, but at the same time, we don't have any code to shuffle the offsets. > >> > I wonder if the native_invoker_size_per_arg thing still works good enough. We may exceed the computed size, now, right? >> >> Good point. I'll have a look at enhancing the test we have for this. >> >> Intuitively, I think it will be okay. It's true that we generate more code to add the oops and offsets together, but at the same time, we don't have any code to shuffle the offsets. > > Looks like we use 2 input regs per obj, so we reserve 2x native_invoker_size_per_arg. However, the case in which `reg_oop` and `reg_offset` are on stack together with `arg_shuffle` can produce more than this size (depending on platform). If we pass lots of objects on stack, we may exceed the computed size. Correct? > @TheRealMDoerr Note that `reg_offset` is filtered out, and not handled by `arg_shuffle`. So, it becomes the question whether shuffling a register, or adding an offset to an oop takes more bytes. I think most of the `native_invoker_size_per_arg` have some lenience built in though? (I did do that for x64 and aarch64 at least). So, I think it will be okay. Right. 
The filtering happens after the size computation `code_size = native_invoker_code_base_size + (num_args * native_invoker_size_per_arg)`, so we reserve 2 * native_invoker_size_per_arg per heap segment which is 2 * 2 instructions on PPC64. The maximum actual code size is the size of the 1 * add_offset operation + 1 arg shuffle, which is 4 + 2 instructions on PPC64. So, we reserve space for 4 instructions, but emit 6 ones. The reason why it still works is that `_needs_transition` must be false and that saves more space than we exceeded by the oversized argument handling code. Not a very nice design, but I can live with it. > I've added an additional test case to the existing test for this, which should stress the new code gen. ([dd9e974](https://github.com/openjdk/jdk/commit/dd9e9741de3ca07e6a4cc561002255f98e1e3330)) Thanks for adding it! Would you mind using `limit(84)` which is the maximum? We'd be on the safe side when testing this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1771757690 From rrich at openjdk.org Thu Oct 19 20:10:12 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 19 Oct 2023 20:10:12 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v24] In-Reply-To: <2sSmoJUYGi_R0UTfD_3FA2o2q_n_zljpP2hYxqDWiZQ=.d8967c04-ee42-4338-b5ba-67c9c7633953@github.com> References: <2sSmoJUYGi_R0UTfD_3FA2o2q_n_zljpP2hYxqDWiZQ=.d8967c04-ee42-4338-b5ba-67c9c7633953@github.com> Message-ID: On Thu, 19 Oct 2023 19:04:23 GMT, Richard Reingruber wrote: >> The highest value `byte_size()` can have is old_gen-end - old_gen_bottom (both card-aligned; one stripe, one slice), which is the exact length needed when covering all cards. >> Any top value != end must have a committed corresponding card table entry, otherwise marking the card that contains `top` would crash. Also the copying would fail then, reading from uncommitted areas beyond the card table. 
The code does not do that afaics. >> >> So the only problematic one I can see would be clearing the card exactly starting at old_gen-end, which an `align_up()` wouldn't do either. >> >> So I do not completely get why clearing the card containing top would be unsafe. Can you give an example? > > Likely I do not completely understand what you are saying but this would be my explanation why the `align_down` for `clear_length` is needed. > `T := old_gen->object_space()->top()` is not necessarily card aligned at scavenge start. We must not clear the card for `T` if an object was copied there because it was promoted and it has a reference to a young object on that card. Example We cannot clear card n containing old gen top T because we won't scan the promoted objects on card n and we can only clear cards if we scan all objects on them afterwards. card n-1 card n card n+1 +--------------------+--------------------+-------------------- | | . Promoted | | | . Objects | | | . | +--------------------+--------------------+-------------------- ^ | T old gen top at scavenge start / end of last stripe ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1366051508 From rrich at openjdk.org Thu Oct 19 20:34:12 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 19 Oct 2023 20:34:12 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v24] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 14:51:52 GMT, Thomas Schatzl wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> preprocess_card_table_parallel should be private > > src/hotspot/share/gc/parallel/psCardTable.cpp line 235: > >> 233: } >> 234: } >> 235: } > > I think this code becomes more clear if the nested-ifs are replaced by negation and `continue`. I also added some additional comments giving reasons for the conditions. 
> > Suggestion: > > for (CardValue* cur_card = byte_for(old_gen_bottom) + stripe_index * num_cards_in_stripe; // this may be left outside, your call, it is a bit long. > cur_card < end_card; > cur_card += num_cards_in_slice) { > HeapWord* stripe_addr = addr_for(cur_card); > if (is_dirty(cur_card) { > // The first card of this stripe is already dirty, no need to see if the reaching-in object is a potentially imprecisely marked non-array object. > continue; > } > HeapWord* first_obj_addr = object_start(stripe_addr); > if (first_obj_addr == stripe_addr) { // (random comment) can't be > I think > // No object reaching into this stripe. > continue; > } > oop first_obj = cast_to_oop(first_obj_addr); > if (!first_obj->is_array() && is_dirty(byte_for(first_obj_addr))) { > // Found a non-array object reaching into the stripe assigned to this thread that has potentially been marked imprecisely. > // Mark first card of stripe dirty so that this thread will process it later. > *cur_card = dirty_card_val(); > } > } It's actually just a minor detail that the thread that marks the first card dirty will also process that stripe. The assignment could be changed without effect. I'll leave that part out. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1366079466 From jsjolen at openjdk.org Thu Oct 19 20:20:21 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 19 Oct 2023 20:20:21 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory Message-ID: I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? 1. Moved all the nmt source code from services/ to nmt/ 2. Renamed all the include statements and sorted them 3. 
Fixed the include guards ------------- Commit messages: - Move NMT to its own subdirectory Changes: https://git.openjdk.org/jdk/pull/16276/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16276&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318447 Stats: 485 lines in 100 files changed: 204 ins; 206 del; 75 mod Patch: https://git.openjdk.org/jdk/pull/16276.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16276/head:pull/16276 PR: https://git.openjdk.org/jdk/pull/16276 From mdoerr at openjdk.org Thu Oct 19 22:24:45 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 19 Oct 2023 22:24:45 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v8] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 15:28:54 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. 
>> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request incrementally with three additional commits since the last revision: > > - add stub size stress test for allowHeap > - RISC-V impl > - remove leftover debug log line This is probably the wrong place to ask this, but the sizing topic leads me to another issue: `RuntimeStub::new_runtime_stub` can return `nullptr` when the code cache is full and we would crash when trying to call `nullptr->print_on(&ls)`. Also, what will the Java code do when `downcallStubAddress` is 0 in the `NativeEntryPoint`? Do you want me to file an issue? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1771784660 From ccheung at openjdk.org Thu Oct 19 22:25:34 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Thu, 19 Oct 2023 22:25:34 GMT Subject: RFR: 8318484: Initial version of cdsConfig.hpp In-Reply-To: References: Message-ID: <_ELyh9MSG1g0UxlY8A9R0Ld7OILOUT-3LuJA_py4mac=.46792912-7094-4afd-a337-7ffe0a75c206@github.com> On Thu, 19 Oct 2023 05:56:53 GMT, Ioi Lam wrote: > This is the first step for [JDK-8318483 - Move CDS configuration management into cdsConfig.hpp](https://bugs.openjdk.org/browse/JDK-8318483) > > - Remove `Arguments::is_dumping_archive()` and `Arguments assert_is_dumping_archive()` > - Add the following new APIs > > > class CDSConfig { > static bool is_dumping_archive(); > static bool is_dumping_static_archive(); > static bool is_dumping_dynamic_archive(); > static bool is_dumping_heap(); > }; > > > - Convert some use of `DumpSharedSpaces` and `DynamicDumpSharedSpaces` to these new APIs > > (More APIs will be added in future sub tasks of [JDK-8318483](https://bugs.openjdk.org/browse/JDK-8318483)) One nit in cdsConfig.hpp. src/hotspot/share/cds/cdsConfig.hpp line 39: > 37: > 38: // CDS archived heap > 39: static bool is_dumping_heap() NOT_CDS_JAVA_HEAP_RETURN_(false); Too much blank spaces between the function declarations and NOT_CDS_* macros. The function declarations could also be shifted more to the left. ------------- Marked as reviewed by ccheung (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16257#pullrequestreview-1688783121 PR Review Comment: https://git.openjdk.org/jdk/pull/16257#discussion_r1366197088 From hgreule at openjdk.org Thu Oct 19 23:27:40 2023 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 19 Oct 2023 23:27:40 GMT Subject: Integrated: 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 In-Reply-To: References: Message-ID: On Fri, 6 Oct 2023 20:56:13 GMT, Hannes Greule wrote: > See the bug description for more information. > > This implementation brings down the time to take a heap dump on the example application in the bug report to <2 seconds on my machine. This pull request has now been integrated. Changeset: 8f5f4407 Author: Hannes Greule Committer: David Holmes URL: https://git.openjdk.org/jdk/commit/8f5f44070a7c6dbbbd1005f9d0af5ab7c35179df Stats: 300 lines in 3 files changed: 292 ins; 1 del; 7 mod 8317692: jcmd GC.heap_dump performance regression after JDK-8292818 Reviewed-by: amenkov, fparain ------------- PR: https://git.openjdk.org/jdk/pull/16083 From vlivanov at openjdk.org Fri Oct 20 00:53:44 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 20 Oct 2023 00:53:44 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v11] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <3lvSpzzS_CILELs_1ChPbPqdkO9FewipR1o8zm-ooBQ=.4dc090d3-424b-47d8-be8d-9e388bdc009f@github.com> On Thu, 19 Oct 2023 09:33:52 GMT, Andrew Haley wrote: > I took them out because of a potential backwards-compatibility breakage. Ok, I checked the removed code (https://github.com/openjdk/jdk/pull/10661/commits/b817d4757c78594be5960ee0be27013e2588d30a) and agree it is not needed here. `RestoreMXCSROnJNICalls` unconditionally restores MXCSR contents while the aforementioned code made it conditional based on FTZ bit fast check. 
It could be considered as an optimization (to speed up `-XX:+RestoreMXCSROnJNICalls` mode), but then the fast path check should be extended to cover all possible failure modes (e.g., rounding issues). ------------- PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1771900576 From iklam at openjdk.org Fri Oct 20 02:24:35 2023 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 20 Oct 2023 02:24:35 GMT Subject: RFR: 8318484: Initial version of cdsConfig.hpp In-Reply-To: <_ELyh9MSG1g0UxlY8A9R0Ld7OILOUT-3LuJA_py4mac=.46792912-7094-4afd-a337-7ffe0a75c206@github.com> References: <_ELyh9MSG1g0UxlY8A9R0Ld7OILOUT-3LuJA_py4mac=.46792912-7094-4afd-a337-7ffe0a75c206@github.com> Message-ID: On Thu, 19 Oct 2023 22:21:30 GMT, Calvin Cheung wrote: >> This is the first step for [JDK-8318483 - Move CDS configuration management into cdsConfig.hpp](https://bugs.openjdk.org/browse/JDK-8318483) >> >> - Remove `Arguments::is_dumping_archive()` and `Arguments assert_is_dumping_archive()` >> - Add the following new APIs >> >> >> class CDSConfig { >> static bool is_dumping_archive(); >> static bool is_dumping_static_archive(); >> static bool is_dumping_dynamic_archive(); >> static bool is_dumping_heap(); >> }; >> >> >> - Convert some use of `DumpSharedSpaces` and `DynamicDumpSharedSpaces` to these new APIs >> >> (More APIs will be added in future sub tasks of [JDK-8318483](https://bugs.openjdk.org/browse/JDK-8318483)) > > src/hotspot/share/cds/cdsConfig.hpp line 39: > >> 37: >> 38: // CDS archived heap >> 39: static bool is_dumping_heap() NOT_CDS_JAVA_HEAP_RETURN_(false); > > Too much blank spaces between the function declarations and NOT_CDS_* macros. > The function declarations could also be shifted more to the left. 
The spaces are for functions that will be added in the next PR that have longer names: // Basic CDS features static bool is_dumping_archive() NOT_CDS_RETURN_(false); static bool is_dumping_static_archive() NOT_CDS_RETURN_(false); static bool is_dumping_dynamic_archive() NOT_CDS_RETURN_(false); // CDS archived heap static bool is_dumping_heap() NOT_CDS_JAVA_HEAP_RETURN_(false); static bool is_loading_heap() NOT_CDS_JAVA_HEAP_RETURN_(false); static void disable_dumping_full_module_graph(const char* reason = nullptr) NOT_CDS_JAVA_HEAP_RETURN; static bool is_dumping_full_module_graph() NOT_CDS_JAVA_HEAP_RETURN_(false); static void disable_loading_full_module_graph(const char* reason = nullptr) NOT_CDS_JAVA_HEAP_RETURN; static bool is_loading_full_module_graph() NOT_CDS_JAVA_HEAP_RETURN_(false); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16257#discussion_r1366367480 From matsaave at openjdk.org Fri Oct 20 03:38:19 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 20 Oct 2023 03:38:19 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v3] In-Reply-To: References: Message-ID: <_s0uKVoQ0XjWB6GHNHUQ-rSCM4uVPgtQObwu_32MJz0=.cdb028d2-1b39-4774-978e-92f521169853@github.com> > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. 
> > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64 Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: - Removed dead code in interpreters - Removed unused structures, improved set_method_handle and appendix_if_resolved ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15455/files - new: https://git.openjdk.org/jdk/pull/15455/files/9ce5f591..1c720ea0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=01-02 Stats: 180 lines in 17 files changed: 25 ins; 100 del; 55 mod Patch: https://git.openjdk.org/jdk/pull/15455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15455/head:pull/15455 PR: https://git.openjdk.org/jdk/pull/15455 From sspitsyn at openjdk.org Fri Oct 20 04:05:33 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 20 Oct 2023 04:05:33 GMT Subject: RFR: 8318484: Initial version of cdsConfig.hpp In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 05:56:53 GMT, Ioi Lam wrote: > This is the first step for [JDK-8318483 - Move CDS configuration management into cdsConfig.hpp](https://bugs.openjdk.org/browse/JDK-8318483) > > - Remove `Arguments::is_dumping_archive()` and `Arguments assert_is_dumping_archive()` > - 
Add the following new APIs > > > class CDSConfig { > static bool is_dumping_archive(); > static bool is_dumping_static_archive(); > static bool is_dumping_dynamic_archive(); > static bool is_dumping_heap(); > }; > > > - Convert some use of `DumpSharedSpaces` and `DynamicDumpSharedSpaces` to these new APIs > > (More APIs will be added in future sub tasks of [JDK-8318483](https://bugs.openjdk.org/browse/JDK-8318483)) Looks good. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16257#pullrequestreview-1689170021 From duke at openjdk.org Fri Oct 20 05:38:09 2023 From: duke at openjdk.org (Liming Liu) Date: Fri, 20 Oct 2023 05:38:09 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v6] In-Reply-To: References: Message-ID: > As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). > > Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: > > > > > > > > > > > >
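| Kernel | -XX:-TransparentHugePages, Unpatched | -XX:-TransparentHugePages, Patched | -XX:+TransparentHugePages, Unpatched | -XX:+TransparentHugePages, Patched |
|--------|-------|-------|------|------|
| 4.18   | 11.30 | 11.30 | 0.25 | 0.25 |
| 5.13   | 0.22  | 0.22  | 3.42 | 3.42 |
| 6.1    | 0.27  | 0.33  | 3.54 | 0.33 |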
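The approach described above (ask the kernel to populate the range, fall back to touching pages by hand when it cannot) can be sketched roughly as follows. This is a simplified illustration, not the code in the patch: the function name `pretouch` is made up, the fallback here is taken on any `madvise` failure rather than only on `EINVAL`, and it uses a plain volatile read/write per page instead of the atomic add of zero, so it is only safe when no other thread is writing to the range.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23  // value from the Linux UAPI headers (kernel 5.14+)
#endif

// Hypothetical helper: returns true if the kernel populated the range,
// false if we had to fall back to touching each page ourselves.
bool pretouch(void* start, std::size_t len) {
  if (::madvise(start, len, MADV_POPULATE_WRITE) == 0) {
    return true;
  }
  // Fallback (e.g. EINVAL on kernels older than 5.14): write one byte per
  // page. Single-threaded simplification; a real VM would use an atomic op.
  const std::size_t page = static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
  volatile char* p = static_cast<volatile char*>(start);
  for (std::size_t off = 0; off < len; off += page) {
    p[off] = p[off];
  }
  return false;
}
```

Either way the range is backed by physical memory when the function returns; the difference is only whether the kernel had the chance to do it in one step.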
Liming Liu has updated the pull request incrementally with one additional commit since the last revision: Make the jtreg test checke the usage of THP ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15781/files - new: https://git.openjdk.org/jdk/pull/15781/files/98642e37..07d1326b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=04-05 Stats: 136 lines in 2 files changed: 85 ins; 51 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15781/head:pull/15781 PR: https://git.openjdk.org/jdk/pull/15781 From duke at openjdk.org Fri Oct 20 05:44:40 2023 From: duke at openjdk.org (Liming Liu) Date: Fri, 20 Oct 2023 05:44:40 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v6] In-Reply-To: References: Message-ID: On Wed, 4 Oct 2023 14:00:04 GMT, Thomas Stuefe wrote: >> Liming Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> Make the jtreg test checke the usage of THP > > src/hotspot/os/linux/os_linux.cpp line 2911: > >> 2909: if (::madvise(first, len, MADV_POPULATE_WRITE) == -1) { >> 2910: int err = errno; >> 2911: if (err == EINVAL) { // Not supported > > Would be nice to avoid repeated syscalls to madvise if this fails once; no reason to try again, then. I tested the performance of this patch on kernel 4.18 and 5.13, and found the repeat calls have no impact. So I would not change anything about this. 
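For reference, the "do not retry after the first failure" idea from the review comment could look something like the sketch below. It is purely illustrative: `PopulateOnce`, `try_populate`, and `fallback` are hypothetical names, and the two callables stand in for the `madvise` call and the manual page touching so that the caching logic can be shown (and tested) without any system calls.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper that remembers a failed "populate" attempt so that
// later calls go straight to the fallback path.
class PopulateOnce {
 public:
  // try_populate() should return true on success and false when the
  // kernel rejects the request (for example with EINVAL).
  template <typename TryPopulate, typename Fallback>
  void pretouch(TryPopulate try_populate, Fallback fallback) {
    if (!unsupported_.load(std::memory_order_relaxed)) {
      if (try_populate()) {
        return;  // the kernel did the work
      }
      // Remember the failure; a benign race here only costs one extra try.
      unsupported_.store(true, std::memory_order_relaxed);
    }
    fallback();
  }

 private:
  std::atomic<bool> unsupported_{false};
};
```

As noted in the reply above, measurements on kernels 4.18 and 5.13 showed no impact from the repeated calls, so this is a matter of taste rather than a measured win.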
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1366496241

From rrich at openjdk.org Fri Oct 20 05:48:49 2023
From: rrich at openjdk.org (Richard Reingruber)
Date: Fri, 20 Oct 2023 05:48:49 GMT
Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v23]
In-Reply-To:
References: <1HXBowkNZo1iyNgOVA6qZYcyz0alZ8r5FBMQ3FqAvTE=.8a95691e-f02d-486a-96c3-88d5ec836228@github.com>
Message-ID:

On Thu, 19 Oct 2023 13:51:10 GMT, Thomas Schatzl wrote:

>> Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 36 additional commits since the last revision:
>>
>> - Use better name: _preprocessing_active_workers
>> - Merge branch 'master'
>> - Remove obsolete comment
>> - Feedback Albert
>> - Merge branch 'master'
>> - Re-cleanup (was accidentally reverted)
>> - Make sure to scan obj reaching in just once
>> - Simplification suggested by Albert
>> - Don't overlap card table processing with scavenging for simplicity
>> - Cleanup
>> - ... and 26 more: https://git.openjdk.org/jdk/compare/355d3adc...f7965512
>
> src/hotspot/share/gc/parallel/psCardTable.hpp line 91:
>
>> 89: return end;
>> 90: }
>> 91: };
>
> Could these implementations be moved into the .cpp file? They are only ever referenced by that and should be inlined anyway to not clog the interface/hpp file too much.

Moved the class StripeShadowTable to psCardTable.cpp (and renamed it).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1366499415 From duke at openjdk.org Fri Oct 20 05:54:06 2023 From: duke at openjdk.org (Liming Liu) Date: Fri, 20 Oct 2023 05:54:06 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v7] In-Reply-To: References: Message-ID: <-p-qDbsbPtk_7h79043ct8sTCcM6d-TiD62zh4c7Q0Q=.9691fe3d-0a85-4f6f-a876-9761a240f866@github.com> > As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). > > Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: > > > > > > > > > > > >
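> | Kernel | -XX:-TransparentHugePages, Unpatched | -XX:-TransparentHugePages, Patched | -XX:+TransparentHugePages, Unpatched | -XX:+TransparentHugePages, Patched |
> |--------|-------|-------|------|------|
> | 4.18   | 11.30 | 11.30 | 0.25 | 0.25 |
> | 5.13   | 0.22  | 0.22  | 3.42 | 3.42 |
> | 6.1    | 0.27  | 0.33  | 3.54 | 0.33 |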
Liming Liu has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Make the jtreg test check the usage of THP ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15781/files - new: https://git.openjdk.org/jdk/pull/15781/files/07d1326b..ed2c9da7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=05-06 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15781/head:pull/15781 PR: https://git.openjdk.org/jdk/pull/15781 From rrich at openjdk.org Fri Oct 20 05:55:40 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 20 Oct 2023 05:55:40 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v24] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 14:37:06 GMT, Thomas Schatzl wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> preprocess_card_table_parallel should be private > > src/hotspot/share/gc/parallel/psCardTable.cpp line 242: > >> 240: SpinYield spin_yield; >> 241: while (Atomic::load_acquire(&_preprocessing_active_workers) > 0) { >> 242: spin_yield.wait(); > > I would prefer to have the synchronization as part of `scavenge_contents_parallel`; i.e. the logic there being > > Prepare Scavenge > Synchronize > Scavenge > > Here the synchronization feels out of place and surprising for a method that nowhere indicates that it is doing anything other than preprocessing the table. Done. 
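
As a rough sketch of the "Prepare Scavenge / Synchronize / Scavenge" structure requested above, the three phases can be made explicit in the per-worker entry point. The names `PreprocessGate` and `scavenge_worker_sketch` are hypothetical, and standard C++ atomics stand in for HotSpot's `Atomic` and `SpinYield` utilities; this is not the code that was integrated.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: counts the workers that are still preprocessing.
class PreprocessGate {
  std::atomic<int> _active_workers;

 public:
  explicit PreprocessGate(int workers) : _active_workers(workers) {}

  // Called by each worker once its share of the preprocessing is finished.
  void worker_done() {
    int before = _active_workers.fetch_sub(1, std::memory_order_release);
    assert(before > 0);
    (void)before;
  }

  // True once every worker has called worker_done().
  bool all_done() const {
    return _active_workers.load(std::memory_order_acquire) == 0;
  }

  // Spin, yielding the CPU, until all workers have finished preprocessing.
  void wait_for_all() const {
    while (!all_done()) {
      std::this_thread::yield();
    }
  }
};

// Per-worker entry point with the three phases spelled out in one place.
template <typename PrepareFn, typename ScavengeFn>
void scavenge_worker_sketch(PreprocessGate& gate, PrepareFn prepare, ScavengeFn scavenge) {
  prepare();            // 1. Prepare
  gate.worker_done();
  gate.wait_for_all();  // 2. Synchronize
  scavenge();           // 3. Scavenge
}
```

Keeping the wait in the caller means the preprocessing function does only what its name says, which is the point of the review comment.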
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1366503671 From duke at openjdk.org Fri Oct 20 06:07:41 2023 From: duke at openjdk.org (Liming Liu) Date: Fri, 20 Oct 2023 06:07:41 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v7] In-Reply-To: References: Message-ID: <8m3vJwf85Nh4LSy-ry1oM3uJEkZifxnAhunZX6hWWso=.691eb01e-8337-4bf9-a044-91d54d6956bb@github.com> On Wed, 4 Oct 2023 14:02:43 GMT, Thomas Stuefe wrote: >> Liming Liu has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Make the jtreg test check the usage of THP > > src/hotspot/share/runtime/os.cpp line 2108: > >> 2106: // granularity, so we can touch anywhere in a page. Touch at the >> 2107: // beginning of each page to simplify iteration. >> 2108: void* first = align_down(start, page_size); > > minor nit, since you are touching this, could you make it const too? (void* const) Touch needs a write anyway, and all related functions also do not use const here. So I would not add const for it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1366511981 From dholmes at openjdk.org Fri Oct 20 06:12:36 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 20 Oct 2023 06:12:36 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms In-Reply-To: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: <7bp5nUgMUswwVTqXNaQcDDXRv60XauWA6KHdDDgjLSM=.08e9e79e-6685-480a-99bb-db8ad2e72c09@github.com> On Wed, 18 Oct 2023 18:58:58 GMT, Aleksey Shipilev wrote: > See the bug for rationale. 
Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. > > Unfortunately, we cannot test these apart from the existing gtest. > > Additional testing: > - [x] linux-x86-server-fastdebug, atomic tests pass > - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) I'm afraid I've lost track of the history here. We had supports_cx8 to deal with exactly this problem. If a platform could not support 64-bit cmpxchg (and thus implicitly any 64-bit r-m-w operation) then it has to fall back to locking. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1772140277 From thartmann at openjdk.org Fri Oct 20 06:29:34 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 20 Oct 2023 06:29:34 GMT Subject: RFR: 8318489: Remove unused alignment_unit and alignment_offset In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 08:53:59 GMT, Albert Mingkun Yang wrote: > Trivial removing dead code. Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16263#pullrequestreview-1689325339 From stefank at openjdk.org Fri Oct 20 06:32:44 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 20 Oct 2023 06:32:44 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 20:06:50 GMT, Johan Sj?len wrote: > I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? > > 1. Moved all the nmt source code from services/ to nmt/ > 2. Renamed all the include statements and sorted them > 3. Fixed the include guards Changes requested by stefank (Reviewer). Changes requested by stefank (Reviewer). 

src/hotspot/share/gc/parallel/psParallelCompact.cpp line 33:

> 31: #include "code/codeCache.hpp"
> 32: #include "compiler/oopMap.hpp"
> 33: #include "gc/parallel/parMarkBitMap.inline.hpp"

This uses a case-sensitive sort, whereas I think that when we add includes without using a sorting tool we would do so in a case-insensitive manner. I'm not sure we should make this change here. (The same goes for similar cases in the rest of the patch.)

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 89:

> 87: #include "runtime/safepointMechanism.hpp"
> 88: #include "runtime/vmThread.hpp"
> 89: #include "nmt/memTracker.hpp"

This include ends up in the wrong place.

src/hotspot/share/nmt/memBaseline.cpp line 32:

> 30: #include "runtime/safepoint.hpp"
> 31: #include "nmt/memBaseline.hpp"
> 32: #include "nmt/memTracker.hpp"

Sort order

src/hotspot/share/nmt/nmtPreInit.hpp line 33:

> 31: #include "runtime/atomic.hpp"
> 32: #endif
> 33: #include "nmt/memTracker.hpp"

The ASSERT section should be moved down to after the main include block, or even better, just include it without the guard. Might be nice to fix this while you are moving around the includes.

src/hotspot/share/nmt/threadStackTracker.cpp line 31:

> 29: #include "nmt/memTracker.hpp"
> 30: #include "nmt/threadStackTracker.hpp"
> 31: #include "nmt/virtualMemoryTracker.hpp"

Sort order

src/hotspot/share/nmt/virtualMemoryTracker.hpp line 33:

> 31: #include "nmt/allocationSite.hpp"
> 32: #include "nmt/nmtCommon.hpp"
> 33: #include "utilities/linkedlist.hpp"

This file didn't get an update to the include guard

src/hotspot/share/services/mallocTracker.inline.hpp line 30:

> 28:
> 29: #include "services/mallocLimit.hpp"
> 30: #include "services/mallocTracker.hpp"

Missing include guard rename

src/hotspot/share/services/nmtPreInit.hpp line 33:

> 31: #include "runtime/atomic.hpp"
> 32: #endif
> 33: #include "services/memTracker.hpp"

Missing include guard rename

test/hotspot/gtest/nmt/test_nmt_buffer_overflow_detection.cpp line 31:

> 29: #include "utilities/debug.hpp"
> 30: #include "utilities/ostream.hpp"
> 31:

You make this change in some files but not all. Maybe revert this unrelated change?

test/hotspot/gtest/nmt/test_nmt_cornercases.cpp line 26:

> 24:
> 25: #include "precompiled.hpp"
> 26:

You make this change in some files but not all. Maybe revert this unrelated change?
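
To make the sort-order point concrete, here is a small sketch comparing the two orderings. The helper names are hypothetical, and the three `runtime/` header names are used only as sample input; the sketch does not reproduce whatever tool produced the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Compare two strings character by character after lower-casing each character.
static bool less_ignoring_case(const std::string& a, const std::string& b) {
  return std::lexicographical_compare(
      a.begin(), a.end(), b.begin(), b.end(),
      [](unsigned char x, unsigned char y) {
        return std::tolower(x) < std::tolower(y);
      });
}

// Plain byte-wise ordering: every uppercase ASCII letter sorts before any lowercase one.
std::vector<std::string> sort_case_sensitive(std::vector<std::string> v) {
  std::sort(v.begin(), v.end());
  return v;
}

// Ordering that ignores case; stable_sort keeps the input order of
// names that differ only in case.
std::vector<std::string> sort_case_insensitive(std::vector<std::string> v) {
  std::stable_sort(v.begin(), v.end(), less_ignoring_case);
  return v;
}
```

For `runtime/threads.hpp`, `runtime/threadSMR.hpp` and `runtime/threadCritical.hpp`, the case-sensitive sort puts `threadSMR.hpp` before `threads.hpp` (uppercase `S` precedes lowercase `s` in ASCII), while the case-insensitive sort puts `threads.hpp` first.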

-------------

PR Review: https://git.openjdk.org/jdk/pull/16276#pullrequestreview-1689312655
PR Review: https://git.openjdk.org/jdk/pull/16276#pullrequestreview-1689327479
PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1366529454
PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1366521538
PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1366522881
PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1366524596
PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1366524886
PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1366525477
PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1366530660
PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1366530318
PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1366526935
PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1366527038

From dholmes at openjdk.org Fri Oct 20 06:37:33 2023
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 20 Oct 2023 06:37:33 GMT
Subject: RFR: 8318447: Move NMT source code to own subdirectory
In-Reply-To:
References:
Message-ID:

On Thu, 19 Oct 2023 20:06:50 GMT, Johan Sjölen wrote:

> I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts?
>
> 1. Moved all the nmt source code from services/ to nmt/
> 2. Renamed all the include statements and sorted them
> 3. Fixed the include guards

Looks okay for moving the NMT files. The unrelated include file ordering changes just made it harder to review though.

Thanks.

src/hotspot/os/linux/os_linux.cpp line 65:

> 63: #include "runtime/threadCritical.hpp"
> 64: #include "runtime/threadSMR.hpp"
> 65: #include "runtime/threads.hpp"

This was correct originally - lowercase sorts before upper.

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16276#pullrequestreview-1689327994
PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1366530642

From rrich at openjdk.org Fri Oct 20 06:37:37 2023
From: rrich at openjdk.org (Richard Reingruber)
Date: Fri, 20 Oct 2023 06:37:37 GMT
Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v24]
In-Reply-To:
References:
Message-ID:

On Thu, 19 Oct 2023 18:35:03 GMT, Thomas Schatzl wrote:

>> The reset is needed because the cache requires that queries are monotonic. There was an assertion checking monotonicity. Albert thought it wouldn't be needed.
>
> I meant the comment "Reset Cached Object". I also think that the code is required :)

Resetting would be redundant if we checked `addr >= cached_obj.start_addr`. The logic couldn't be misused then either. I'll add a comment that the queries are expected to be monotonic.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1366534590

From stuefe at openjdk.org Fri Oct 20 07:05:40 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 20 Oct 2023 07:05:40 GMT
Subject: RFR: 8318447: Move NMT source code to own subdirectory
In-Reply-To:
References:
Message-ID:

On Thu, 19 Oct 2023 20:06:50 GMT, Johan Sjölen wrote:

> I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts?
>
> 1. Moved all the nmt source code from services/ to nmt/
> 2. Renamed all the include statements and sorted them
> 3. Fixed the include guards

Mostly okay. Was overdue. Don't need another look if you fix David's and Stefan's remarks.

test/hotspot/gtest/nmt/test_nmt_locationprinting.cpp line 30:

> 28: #include "nmt/mallocHeader.inline.hpp"
> 29: #include "runtime/os.hpp"
> 30: #include "nmt/memTracker.hpp"

order

-------------

Marked as reviewed by stuefe (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16276#pullrequestreview-1689369969 PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1366556167 From stefank at openjdk.org Fri Oct 20 07:10:37 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 20 Oct 2023 07:10:37 GMT Subject: RFR: 8316436: ContinuationWrapper uses unhandled nullptr oop [v2] In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 12:35:41 GMT, Stefan Karlsson wrote: >> The ZGC oop verification code in combination with CheckUnhandledOops finds an unhandled oop in ContinuationWrapper: >> >> >> Test java/lang/Thread/virtual/stress/Skynet.java#ZGenerational with ' -XX:+CheckUnhandledOops' crashes with >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (src/hotspot/share/gc/z/zAddress.inline.hpp:296), pid=986260, tid=986296 >> # assert(!assert_on_failure) failed: Has low-order bits set: 0xfffffffffffffff1 >> >> V [libjvm.so+0x1962fda] initialize_check_oop_function()::{lambda(oopDesc*)#1}::_FUN(oopDesc*)+0x5a (zAddress.inline.hpp:296) >> V [libjvm.so+0xa6d484] ContinuationWrapper::~ContinuationWrapper()+0x24 (oopsHierarchy.hpp:89) >> V [libjvm.so+0xa66c83] int freeze_internal >(JavaThread*, long*)+0x373 (continuationFreezeThaw.cpp:1584) >> V [libjvm.so+0xa6711b] int freeze >(JavaThread*, long*)+0x5b (continuationFreezeThaw.cpp:272) >> J 216 jdk.internal.vm.Continuation.doYield()I [java.base at 22-internal](mailto:java.base at 22-internal) (0 bytes) @ 0x00007f614c630875 [0x00007f614c630820+0x0000000000000055] >> >> >> This is the scenario that triggers this bug: >> 1) ContinuationWrapper is created on the stack >> 2) We enter a JRT_BLOCK section >> 3) Call ContinuationWrapper::done() >> 4) Exit the JRT_BLOCK >> 5) ~ContinuationWrapper is called >> >> (3) sets ContinuationWrapper::_continuation to nullptr >> (4) hits a safepoint and sets ContinuationWrapper::_continuation to 0xfffffffffffffff1 >> (5) uses 
ContinuationWrapper::_continuation in `_continuation != nullptr`, which triggers ZGC's verification code that finds the broken oop. >> >> So, this crashes with ZGC, but that's because ZGC finds a broken usage of _continuation. To show that this is still a problem with other GCs I added this assert: >> >> diff --git a/src/hotspot/share/runtime/javaThread.hpp b/src/hotspot/share/runtime/javaThread.hpp >> index 40205d324a6..80b60d0b7b8 100644 >> --- a/src/hotspot/share/runtime/javaThread.hpp >> +++ b/src/hotspot/share/runtime/javaThread.hpp >> @@ -258,7 +258,7 @@ class JavaThread: public Thread { >> >> public: >> void inc_no_safepoint_count() { _no_safepoint_count++; } >> - void dec_no_safepoint_count() { _no_safepoint_count--; } >> + void dec_no_safepoint_count() { _no_safepoint_count--; assert(_no_safepoint_count >= 0, "Catch G1 in the act!"); } >> #endif // ASSERT >> public: >> // These functions check conditions before possibly going to ... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix thread argument Thanks for reviewing! I'm going to leave the _done variable usage as-is, because I find the code nice that way and I haven't seen any strong motivation why this should be guarded by the ASSERT ifdefs. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15810#issuecomment-1772198162 From stefank at openjdk.org Fri Oct 20 07:10:38 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 20 Oct 2023 07:10:38 GMT Subject: Integrated: 8316436: ContinuationWrapper uses unhandled nullptr oop In-Reply-To: References: Message-ID: On Tue, 19 Sep 2023 06:51:16 GMT, Stefan Karlsson wrote: > The ZGC oop verification code in combination with CheckUnhandledOops finds an unhandled oop in ContinuationWrapper: > > > Test java/lang/Thread/virtual/stress/Skynet.java#ZGenerational with ' -XX:+CheckUnhandledOops' crashes with > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (src/hotspot/share/gc/z/zAddress.inline.hpp:296), pid=986260, tid=986296 > # assert(!assert_on_failure) failed: Has low-order bits set: 0xfffffffffffffff1 > > V [libjvm.so+0x1962fda] initialize_check_oop_function()::{lambda(oopDesc*)#1}::_FUN(oopDesc*)+0x5a (zAddress.inline.hpp:296) > V [libjvm.so+0xa6d484] ContinuationWrapper::~ContinuationWrapper()+0x24 (oopsHierarchy.hpp:89) > V [libjvm.so+0xa66c83] int freeze_internal >(JavaThread*, long*)+0x373 (continuationFreezeThaw.cpp:1584) > V [libjvm.so+0xa6711b] int freeze >(JavaThread*, long*)+0x5b (continuationFreezeThaw.cpp:272) > J 216 jdk.internal.vm.Continuation.doYield()I [java.base at 22-internal](mailto:java.base at 22-internal) (0 bytes) @ 0x00007f614c630875 [0x00007f614c630820+0x0000000000000055] > > > This is the scenario that triggers this bug: > 1) ContinuationWrapper is created on the stack > 2) We enter a JRT_BLOCK section > 3) Call ContinuationWrapper::done() > 4) Exit the JRT_BLOCK > 5) ~ContinuationWrapper is called > > (3) sets ContinuationWrapper::_continuation to nullptr > (4) hits a safepoint and sets ContinuationWrapper::_continuation to 0xfffffffffffffff1 > (5) uses ContinuationWrapper::_continuation in `_continuation != nullptr`, which triggers ZGC's verification code that finds the 
broken oop. > > So, this crashes with ZGC, but that's because ZGC finds a broken usage of _continuation. To show that this is still a problem with other GCs I added this assert: > > diff --git a/src/hotspot/share/runtime/javaThread.hpp b/src/hotspot/share/runtime/javaThread.hpp > index 40205d324a6..80b60d0b7b8 100644 > --- a/src/hotspot/share/runtime/javaThread.hpp > +++ b/src/hotspot/share/runtime/javaThread.hpp > @@ -258,7 +258,7 @@ class JavaThread: public Thread { > > public: > void inc_no_safepoint_count() { _no_safepoint_count++; } > - void dec_no_safepoint_count() { _no_safepoint_count--; } > + void dec_no_safepoint_count() { _no_safepoint_count--; assert(_no_safepoint_count >= 0, "Catch G1 in the act!"); } > #endif // ASSERT > public: > // These functions check conditions before possibly going to a safepoint. > > > To catch the broken nullptr check in: > > void allow_safepoint() { > ... This pull request has now been integrated. Changeset: 292aad2c Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/292aad2c4901f2ffba37274763e1cc617711918e Stats: 28 lines in 2 files changed: 8 ins; 12 del; 8 mod 8316436: ContinuationWrapper uses unhandled nullptr oop Reviewed-by: pchilanomate, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/15810 From tschatzl at openjdk.org Fri Oct 20 07:33:43 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 20 Oct 2023 07:33:43 GMT Subject: RFR: 8317350: Move code cache purging out of CodeCache::UnloadingScope In-Reply-To: References: Message-ID: <24PQu5F8UyuLPpat6tVbKzzpIXquqsZqUhJKeiNObTY=.3a9dd5fe-0284-4e39-8621-a17371d4cb59@github.com> On Thu, 19 Oct 2023 16:59:39 GMT, Ivan Walulya wrote: >> Hi all, >> >> please review this refactoring that moves actual code cache flushing/purging out of `CodeCache::UnloadingScope`. 
Reasons: >> >> * I prefer that a destructor does not do anything substantial - in some cases, 90% of time is spent in the destructor in that extracted method (due to https://bugs.openjdk.org/browse/JDK-8316959) >> * imho it does not fit the class which does nothing but sets/resets some code cache unloading behavior (probably should be renamed to `UnloadingBehaviorScope` too in a separate CR). >> * other existing methods at that level are placed out of that (or any other) scope object too - which is already the case for when doing concurrent unloading. >> * putting it there makes future logging of the various phases a little bit easier, not having `GCTraceTimer` et al. in various places. >> >> Testing: gha >> >> Thanks, >> Thomas > > Marked as reviewed by iwalulya (Reviewer). Thanks @walulyai @albertnetymk for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/16011#issuecomment-1772224539 From tschatzl at openjdk.org Fri Oct 20 07:33:45 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 20 Oct 2023 07:33:45 GMT Subject: Integrated: 8317350: Move code cache purging out of CodeCache::UnloadingScope In-Reply-To: References: Message-ID: <4Y8Eh6q0_GqZyQwNc3JqCBY6yjXw-GfeMfZJ0wAuvHY=.57cdd6f9-61c2-49ee-a0b9-eab9bf96fbf5@github.com> On Mon, 2 Oct 2023 12:56:27 GMT, Thomas Schatzl wrote: > Hi all, > > please review this refactoring that moves actual code cache flushing/purging out of `CodeCache::UnloadingScope`. Reasons: > > * I prefer that a destructor does not do anything substantial - in some cases, 90% of time is spent in the destructor in that extracted method (due to https://bugs.openjdk.org/browse/JDK-8316959) > * imho it does not fit the class which does nothing but sets/resets some code cache unloading behavior (probably should be renamed to `UnloadingBehaviorScope` too in a separate CR). 
> * other existing methods at that level are placed out of that (or any other) scope object too - which is already the case for when doing concurrent unloading. > * putting it there makes future logging of the various phases a little bit easier, not having `GCTraceTimer` et al. in various places. > > Testing: gha > > Thanks, > Thomas This pull request has now been integrated. Changeset: bd3bc2c6 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/bd3bc2c6181668b5856732666dc251136b7fbb99 Stats: 60 lines in 7 files changed: 26 ins; 6 del; 28 mod 8317350: Move code cache purging out of CodeCache::UnloadingScope Reviewed-by: ayang, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/16011 From jvernee at openjdk.org Fri Oct 20 07:35:14 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 20 Oct 2023 07:35:14 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v9] In-Reply-To: References: Message-ID: > Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. > > The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. > > Components of this patch: > > - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. > - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. 
> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. > - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. > - The object/oop + offset is exposed as temporary address to native code. > - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). > - Only x64 and AArch64 for now. > - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 > - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. > - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` > > Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. > > Numbers for the included benchmark on my machine are: > > > Benchmark (size) Mode Cnt ... 
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: s390 updates ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16201/files - new: https://git.openjdk.org/jdk/pull/16201/files/dd9e9741..5b7fc19e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=07-08 Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16201/head:pull/16201 PR: https://git.openjdk.org/jdk/pull/16201 From jvernee at openjdk.org Fri Oct 20 07:35:14 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 20 Oct 2023 07:35:14 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v7] In-Reply-To: <6YFCmKEmndViMxqdZTi0Jl-CvAz3LGJ-Et6bOnCqGVg=.7678f793-9c66-4d76-96f5-6b791e3172be@github.com> References: <6YFCmKEmndViMxqdZTi0Jl-CvAz3LGJ-Et6bOnCqGVg=.7678f793-9c66-4d76-96f5-6b791e3172be@github.com> Message-ID: On Thu, 19 Oct 2023 18:15:45 GMT, Sidraya Jayagond wrote: >> @feilongjiang Thanks, I've added it to the PR >> >> @TheRealMDoerr Note that `reg_offset` is filtered out, and not handled by `arg_shuffle`. So, it becomes the question whether shuffling a register, or adding an offset to an oop takes more bytes. I think most of the `native_invoker_size_per_arg` have some lenience built in though? (I did do that for x64 and aarch64 at least). So, I think it will be okay. >> >> I've added an additional test case to the existing test for this, which should stress the new code gen. (https://github.com/openjdk/jdk/pull/16201/commits/dd9e9741de3ca07e6a4cc561002255f98e1e3330) > > @JornVernee Please add below patch for addressing of @RealLucy review comments. 
> [0001-Address-Lutz-Schmidt-s-review-comments.txt](https://github.com/openjdk/jdk/files/13046224/0001-Address-Lutz-Schmidt-s-review-comments.txt) @sid8606 I've added the patch. There were a couple of commented out lines which I've removed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1772229009 From jvernee at openjdk.org Fri Oct 20 07:58:37 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 20 Oct 2023 07:58:37 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v8] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 22:21:33 GMT, Martin Doerr wrote: >> Jorn Vernee has updated the pull request incrementally with three additional commits since the last revision: >> >> - add stub size stress test for allowHeap >> - RISC-V impl >> - remove leftover debug log line > > This is probably the wrong place to ask this, but the sizing topic leads me to another issue: `RuntimeStub::new_runtime_stub` can return `nullptr` when the code cache is full and we would crash when trying to call `nullptr->print_on(&ls)`. Also, what will the Java code do when `downcallStubAddress` is 0 in the `NativeEntryPoint`? > Do you want me to file an issue? @TheRealMDoerr > So, we reserve space for 4 instructions, but emit 6 ones. We could bump up the size per arg. I think I want to get a review for the rest of the hotspot changes first though, to avoid having to go through that process twice, in case changes to the code gen are needed. > Would you mind using limit(84) which is the maximum? Why is 84 the maximum? That would result in 170 Java parameters at most, which is well within the 255 limit imposed by the VM spec. > `RuntimeStub::new_runtime_stub` can return `nullptr` when the code cache is full As far as I can see we would hit a `fatal` error when the allocation of the runtime stub fails? 
https://github.com/openjdk/jdk/blob/4812cabaa489e99481facddce69686a9fee29c44/src/hotspot/share/code/codeBlob.cpp#L425-L430 Note that `alloc_fail_is_fatal` is true by default, and we don't set it when calling `new_runtime_stub`. Potentially we could bubble up this allocation failure as a Java exception (and do the same for upcall stubs). That would have the benefit of maybe only bringing down a single Java thread, but is at the same time not really a recoverable error, i.e. we can not just disable the linker, like we can for e.g. the JIT compiler. So I'm not sure if the fatal error is really that problematic. We should probably do the same for upcall stubs either way though. I've filed: https://bugs.openjdk.org/browse/JDK-8318586 ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1772259563 From jvernee at openjdk.org Fri Oct 20 08:02:46 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 20 Oct 2023 08:02:46 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v8] In-Reply-To: References: Message-ID: On Fri, 20 Oct 2023 07:55:45 GMT, Jorn Vernee wrote: > Why is 84 the maximum? That would result in 170 Java parameters at most, which is well within the 255 limit imposed by the VM spec. Ah, right, longs take up 2 slots. So, we get 84*3 = 252 + 4 (target address & return buffer) + 2 receiver method handle + NativeEntryPoint, which gives 258. 
Ok, I'll find out what the limit is and use that ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1772264403 From mbaesken at openjdk.org Fri Oct 20 08:14:12 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 20 Oct 2023 08:14:12 GMT Subject: RFR: JDK-8318587: refresh libraries cache on AIX in print_vm_info Message-ID: <1tniFEVjN_i0eJyrRzkD6fzxw4YlnpSRZAYTjZe4Qig=.f34b9f02-6463-4b5a-8039-3cb2f1bc48a3@github.com> print_vm_info outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing. ------------- Commit messages: - JDK-8318587 Changes: https://git.openjdk.org/jdk/pull/16284/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16284&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318587 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16284.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16284/head:pull/16284 PR: https://git.openjdk.org/jdk/pull/16284 From lkorinth at openjdk.org Fri Oct 20 08:34:35 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Fri, 20 Oct 2023 08:34:35 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v5] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 15:16:13 GMT, Leo Korinth wrote: >> Rename createJavaProcessBuilder so that it is not used by mistake instead of createTestJvm. >> >> I have used the following sed script: `find -name "*.java" | xargs -n 1 sed -i -e "s/createJavaProcessBuilder(/createJavaProcessBuilderIgnoreTestJavaOpts(/g"` >> >> Then I have manually modified ProcessTools.java. In that file I have moved one version of createJavaProcessBuilder so that it is close to the other version. Then I have added a javadoc comment in bold telling: >> >> /** >> * Create ProcessBuilder using the java launcher from the jdk to >> * be tested. >> * >> *

Please observe that you likely should use >> * createTestJvm() instead of this method because createTestJvm() >> * will add JVM options from "test.vm.opts" and "test.java.opts" >> * and this method will not do that. >> * >> * @param command Arguments to pass to the java command. >> * @return The ProcessBuilder instance representing the java command. >> */ >> >> >> I have used the name createJavaProcessBuilderIgnoreTestJavaOpts because of the name of Utils.prependTestJavaOpts that adds those VM flags. If you have a better name I could do a rename of the method. I kind of like that it is long and clumsy, that makes it harder to use... >> >> I have run tier 1 testing, and I have started more exhaustive testing. > > Leo Korinth has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - Batch update using sed > > find -name "*.java" | xargs -n 1 sed -i -e "s/createJavaProcessBuilder(/createLimitedJavaTestProcessBuilder(/g" > find -name "*.java" | xargs -n 1 sed -i -e "s/createTestJvm(/createJavaTestProcessBuilder(/g" > find -name "*.java" | xargs -n 1 sed -i -e "s/import static jdk.test.lib.process.ProcessTools.createJavaProcessBuilder/import static jdk.test.lib.process.ProcessTools.createLimitedJavaTestProcessBuilder/g" > - Merge branch '_master_jdk' into _8315097 > - explain usage > - Revert "8315097: Rename createJavaProcessBuilder" > > This reverts commit 4b2d171133c40c5c48114602bfd0d4da75531317. > - Revert "copyright" > > This reverts commit f3418c80cc0d4cbb722ee5e368f1a001e898b43e. > - Revert "fix static import" > > This reverts commit 27da71508aec9a4bec1c0ad07031887286580171. > - fix static import > - copyright > - 8315097: Rename createJavaProcessBuilder Just ignore what I just pushed, I will have a new version out sorry... 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15452#issuecomment-1772309579 PR Comment: https://git.openjdk.org/jdk/pull/15452#issuecomment-1772309992 From jvernee at openjdk.org Fri Oct 20 08:38:22 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 20 Oct 2023 08:38:22 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v10] In-Reply-To: References: Message-ID: > Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. > > The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are currently relying on Get-/ReleasePrimitiveArrayCritical from JNI, where making a copy of the array would be prohibitive. > > Components of this patch: > > - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. > - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. > - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. > - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. > - The object/oop + offset is exposed as temporary address to native code. > - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). > - Only x64 and AArch64 for now.
> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 > - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. > - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` > > Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. > > Numbers for the included benchmark on my machine are: > > > Benchmark (size) Mode Cnt ... 
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: bump up argument counts in TestLargeStub to their maximum ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16201/files - new: https://git.openjdk.org/jdk/pull/16201/files/5b7fc19e..fef40cdb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=08-09 Stats: 14 lines in 1 file changed: 8 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16201/head:pull/16201 PR: https://git.openjdk.org/jdk/pull/16201 From azafari at openjdk.org Fri Oct 20 08:39:42 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 20 Oct 2023 08:39:42 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v6] In-Reply-To: <-ALHHMcYPfciG6g2sOT-XIEVTf1pA6XXa93eNXQamD4=.88329bf9-627b-4d78-93fc-299550fc2be0@github.com> References: <-ALHHMcYPfciG6g2sOT-XIEVTf1pA6XXa93eNXQamD4=.88329bf9-627b-4d78-93fc-299550fc2be0@github.com> Message-ID: On Thu, 28 Sep 2023 09:49:05 GMT, Afshin Zafari wrote: >> The `find` method now is >> ```C++ >> template >> int find(T* token, bool f(T*, E)) const { >> ... >> >> Any other functions which use this are also changed. >> Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > first arg of `find` casted to `uint*` @dholmes-ora, any more comments? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1772316760 From ayang at openjdk.org Fri Oct 20 08:40:37 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 20 Oct 2023 08:40:37 GMT Subject: RFR: 8318489: Remove unused alignment_unit and alignment_offset In-Reply-To: References: Message-ID: <-_T6yHzMpdzSr-pSwCReJyvsfmXPXJzvS0kwPJvUYrM=.e81a710b-7158-482b-8b73-334a588ee8f0@github.com> On Thu, 19 Oct 2023 08:53:59 GMT, Albert Mingkun Yang wrote: > Trivial removing dead code. Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16263#issuecomment-1772317078 From ayang at openjdk.org Fri Oct 20 08:40:39 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 20 Oct 2023 08:40:39 GMT Subject: Integrated: 8318489: Remove unused alignment_unit and alignment_offset In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 08:53:59 GMT, Albert Mingkun Yang wrote: > Trivial removing dead code. This pull request has now been integrated. Changeset: 80992610 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/8099261050a6c021f193d6dac94caa11dccbb5ec Stats: 24 lines in 4 files changed: 0 ins; 24 del; 0 mod 8318489: Remove unused alignment_unit and alignment_offset Reviewed-by: thartmann ------------- PR: https://git.openjdk.org/jdk/pull/16263 From mdoerr at openjdk.org Fri Oct 20 08:43:39 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 20 Oct 2023 08:43:39 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v8] In-Reply-To: References: Message-ID: On Fri, 20 Oct 2023 07:59:25 GMT, Jorn Vernee wrote: > > Why is 84 the maximum? That would result in 170 Java parameters at most, which is well within the 255 limit imposed by the VM spec. > > Ah, right, longs take up 2 slots. So, we get 84*3 = 252 + 4 (target address & return buffer) + receiver method handle + NativeEntryPoint, which gives 258. Ok, I'll find out what the limit is and use that. 
> > P.S. it's 83 Thanks! I should have mentioned that 84 is the limit for linux PPC64le. 83 should work for all platforms. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1772321142 From gcao at openjdk.org Fri Oct 20 08:52:41 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 20 Oct 2023 08:52:41 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v3] In-Reply-To: <_s0uKVoQ0XjWB6GHNHUQ-rSCM4uVPgtQObwu_32MJz0=.cdb028d2-1b39-4774-978e-92f521169853@github.com> References: <_s0uKVoQ0XjWB6GHNHUQ-rSCM4uVPgtQObwu_32MJz0=.cdb028d2-1b39-4774-978e-92f521169853@github.com> Message-ID: On Fri, 20 Oct 2023 03:38:19 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4.
InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64 > > Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: > > - Removed dead code in interpreters > - Removed unused structures, improved set_method_handle and appendix_if_resolved Hello, I'm preparing the riscv part of the support, but I noticed that the x86 backend is reporting errors. zifeihan at plct-c8:~/jdk$ make test TEST="tier1" JTREG="TIMEOUT_FACTOR=16" Building target 'test' in configuration 'linux-x86_64-server-release' Running tests using JTREG control variable 'TIMEOUT_FACTOR=16' Test selection 'tier1', will run: * jtreg:test/hotspot/jtreg:tier1 * jtreg:test/jdk:tier1 * jtreg:test/langtools:tier1 * jtreg:test/jaxp:tier1 * jtreg:test/lib-test:tier1 Running test 'jtreg:test/hotspot/jtreg:tier1' An exception has occurred in the compiler (22-internal). Please file a bug against the Java compiler via the Java bug reporting page (https://bugreport.java.com) after checking the Bug Database (https://bugs.java.com) for duplicates. Include your program, the following diagnostic, and the parameters passed to the Java compiler in your report. Thank you. 
java.lang.invoke.WrongMethodTypeException: cannot convert MethodHandle(VarHandle,byte[],int)long to (VarHandle,byte[],int)int at java.base/java.lang.invoke.MethodHandle.asTypeUncached(MethodHandle.java:903) at java.base/java.lang.invoke.MethodHandle.asType(MethodHandle.java:870) at java.base/jdk.internal.util.ByteArray.getLong(ByteArray.java:188) at java.base/java.io.DataInputStream.readLong(DataInputStream.java:408) at jdk.compiler/com.sun.tools.javac.util.ByteBuffer.getLong(ByteBuffer.java:211) at jdk.compiler/com.sun.tools.javac.jvm.PoolReader.resolve(PoolReader.java:260) at jdk.compiler/com.sun.tools.javac.jvm.PoolReader$ImmutablePoolHelper.readIfNeeded(PoolReader.java:391) at jdk.compiler/com.sun.tools.javac.jvm.PoolReader.getConstant(PoolReader.java:206) at jdk.compiler/com.sun.tools.javac.jvm.ClassReader$3.read(ClassReader.java:855) at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.readAttrs(ClassReader.java:1433) at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.readMemberAttrs(ClassReader.java:1423) at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.readField(ClassReader.java:2283) at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.readClass(ClassReader.java:2645) at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.readClassBuffer(ClassReader.java:2738) at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.readClassFile(ClassReader.java:2762) at jdk.compiler/com.sun.tools.javac.code.ClassFinder.fillIn(ClassFinder.java:373) at jdk.compiler/com.sun.tools.javac.code.ClassFinder.complete(ClassFinder.java:302) at jdk.compiler/com.sun.tools.javac.code.Symtab$2.complete(Symtab.java:360) at jdk.compiler/com.sun.tools.javac.code.Symbol.complete(Symbol.java:682) at jdk.compiler/com.sun.tools.javac.code.Symbol$ClassSymbol.complete(Symbol.java:1418) at jdk.compiler/com.sun.tools.javac.code.Symbol$ClassSymbol.flags(Symbol.java:1334) at jdk.compiler/com.sun.tools.javac.code.Types$12.visitClassType(Types.java:2183) at 
jdk.compiler/com.sun.tools.javac.code.Types$12.visitClassType(Types.java:2159) at jdk.compiler/com.sun.tools.javac.code.Type$ClassType.accept(Type.java:1050) at jdk.compiler/com.sun.tools.javac.code.Types$DefaultTypeVisitor.visit(Types.java:4894) at jdk.compiler/com.sun.tools.javac.code.Types.asSuper(Types.java:2156) at jdk.compiler/com.sun.tools.javac.code.Types$12.visitClassType(Types.java:2179) at jdk.compiler/com.sun.tools.javac.code.Types$12.visitClassType(Types.java:2159) at jdk.compiler/com.sun.tools.javac.code.Type$ClassType.accept(Type.java:1050) ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1772334734 From rrich at openjdk.org Fri Oct 20 09:02:50 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 20 Oct 2023 09:02:50 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v28] In-Reply-To: References: Message-ID: > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. 
Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea...
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Cleanup/improve comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/fd5d0725..7c20c9f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=26-27 Stats: 31 lines in 2 files changed: 16 ins; 12 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From rrich at openjdk.org Fri Oct 20 09:13:48 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 20 Oct 2023 09:13:48 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v28] In-Reply-To: References: Message-ID: <70mj0_esW4xbhmSqjbtzuw5rdfUvuQlZYrLYsKiZzCw=.8712242c-d72d-402d-bb5a-53703058b6c7@github.com> On Fri, 20 Oct 2023 09:02:50 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. 
>> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup/improve comments I think I've addressed all remarks now. Let me know if I should revisit anything again.
I haven't yet moved the summarizing comment from the definition of `scavenge_contents_parallel` in the cpp file to the header file. I thought it should remain close to the implementing code. Is that ok or should it be moved to the header? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1772366594 From adinn at openjdk.org Fri Oct 20 09:18:33 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Fri, 20 Oct 2023 09:18:33 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v3] In-Reply-To: <_s0uKVoQ0XjWB6GHNHUQ-rSCM4uVPgtQObwu_32MJz0=.cdb028d2-1b39-4774-978e-92f521169853@github.com> References: <_s0uKVoQ0XjWB6GHNHUQ-rSCM4uVPgtQObwu_32MJz0=.cdb028d2-1b39-4774-978e-92f521169853@github.com> Message-ID: <1fSRooLU5TUyIU9Z0BP-rjTvVptbh9sbOZzG1PU9u9g=.980c9301-61b0-4f7d-85b3-6ba74f55ec6d@github.com> On Fri, 20 Oct 2023 03:38:19 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code.
>> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64 > > Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: > > - Removed dead code in interpreters > - Removed unused structures, improved set_method_handle and appendix_if_resolved src/hotspot/share/oops/resolvedMethodEntry.hpp line 196: > 194: > 195: // Offsets > 196: static ByteSize klass_offset() { return byte_offset_of(ResolvedMethodEntry, _entry_specific._interface_klass); } Perhaps the getters for the entry specific union fields should include an assert to verify that the flags are consistent with the access being performed? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1366707618 From tschatzl at openjdk.org Fri Oct 20 09:45:43 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 20 Oct 2023 09:45:43 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v28] In-Reply-To: References: Message-ID: On Fri, 20 Oct 2023 09:02:50 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. 
>> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). >> >> [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. >> >> Observations >> >> * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. >> >> * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. >> >> * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix.
I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid ... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup/improve comments Please fix the `PS` prefix issue for `StripeShadowCardTable`. Maybe incorporate the other comment related changes too. Other than that it looks great! src/hotspot/share/gc/parallel/psCardTable.cpp line 141: > 139: } > 140: > 141: class StripeShadowCardTable { Unfortunately, due to C++ having a global namespace, please prefix with `PS` even if it is local here. src/hotspot/share/gc/parallel/psCardTable.cpp line 161: > 159: size_t stripe_byte_size = pointer_delta(end, start) * HeapWordSize; > 160: size_t copy_length = align_up(stripe_byte_size, _card_size) >> _card_shift; > 161: size_t clear_length = align_down(stripe_byte_size, _card_size) >> _card_shift; Some reordering of the comment, moving it closer to the code it describes, some rewording. Suggestion: size_t stripe_byte_size = pointer_delta(end, start) * HeapWordSize; size_t copy_length = align_up(stripe_byte_size, _card_size) >> _card_shift; // The end of the last stripe may not be card aligned as it is equal to old // gen top at scavenge start. We should not clear the card containing old gen // top if not card aligned because there can be promoted objects on that // same card. If it were marked dirty because of the promoted object and we // cleared it, we would lose a card mark. size_t clear_length = align_down(stripe_byte_size, _card_size) >> _card_shift; src/hotspot/share/gc/parallel/psCardTable.cpp line 216: > 214: // The "shadow" table is a copy of the card table entries of the current stripe. > 215: // It is used to separate card reading, clearing and redirtying which reduces > 216: // complexity significantly.
That would be a perfect comment to put just before the `ShadowCardTable` class definition. Not seeing the point of putting this at the place we instantiate it. src/hotspot/share/gc/parallel/psCardTable.cpp line 284: > 282: CardValue* const end_card = byte_for(old_gen_top - 1) + 1; > 283: > 284: for ( /* empty */ ; cur_card < end_card; cur_card += num_cards_in_slice) { Suggestion: for (/* empty */; cur_card < end_card; cur_card += num_cards_in_slice) { To be consistent with the other place in the file. Regular initializations also do not have a space after the bracket. src/hotspot/share/gc/parallel/psCardTable.cpp line 382: > 380: preprocess_card_table_parallel(object_start, old_gen_bottom, old_gen_top, stripe_index, n_stripes); > 381: > 382: // Sync with other workers Suggestion: // Sync with other workers. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14846#pullrequestreview-1689634534 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1366726043 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1366729626 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1366731067 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1366731713 PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1366733572 From stuefe at openjdk.org Fri Oct 20 10:35:40 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 20 Oct 2023 10:35:40 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v3] In-Reply-To: References: <1yvPQl57ipfn7sd_gi1pN1y731drqlMm7WPrzrtQyww=.97958bba-b0d6-460e-a974-bf89314aa733@github.com> Message-ID: On Mon, 9 Oct 2023 07:23:38 GMT, Liming Liu wrote: > > Side note, does anyone know why we pretouch memory for _explicit_ large pages? I would have thought that memory is already online and as "live" as it can get once it is mmapped. 
> > `UseTransparentHugePages` just gives kernel advice to use transparent huge pages. It is not regular huge pages that need to be allocated explicitly through /sys/kernel/mm/hugepages. Yes, I know, but my question was why we bother to pretouch at all if we run with explicit large pages, so with +UseHugePages. Nothing to do with THP. Because explicit large pages are allocated right at reservation from the huge page pool, and the huge page pool looks like it already counts toward dirty pages, whether or not someone allocated from it. Therefore I wonder whether they are already paged in. Would be a question for @kstefanj maybe. I wonder if we could just omit pretouching in that case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15781#issuecomment-1772492494 From shade at openjdk.org Fri Oct 20 10:54:42 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 20 Oct 2023 10:54:42 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v2] In-Reply-To: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: > See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. > > Unfortunately, we cannot test these apart from the existing gtest. > > Additional testing: > - [x] linux-x86-server-fastdebug, atomic tests pass > - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - Merge branch 'master' into JDK-8316961-fallback-atomics - Work ------------- Changes: https://git.openjdk.org/jdk/pull/16252/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16252&range=01 Stats: 93 lines in 5 files changed: 82 ins; 8 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16252.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16252/head:pull/16252 PR: https://git.openjdk.org/jdk/pull/16252 From jsjolen at openjdk.org Fri Oct 20 11:34:50 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 20 Oct 2023 11:34:50 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 20:06:50 GMT, Johan Sjölen wrote: > I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? > > 1. Moved all the nmt source code from services/ to nmt/ > 2. Renamed all the include statements and sorted them > 3. Fixed the include guards Hi, Thank you for looking through these changes. I applied your comments and also did a run through to look for incorrectly ordered includes. For the gtest source files I separated the includes in a consistent manner, they all look like this pattern now:
```c++
#include "precompiled.hpp"
#include "memory/allocation.hpp"
#include "nmt/mallocHeader.inline.hpp"
#include "nmt/memTracker.hpp"
#include "runtime/os.hpp"
#include "testutils.hpp"
#include "unittest.hpp"
```
------------- PR Comment: https://git.openjdk.org/jdk/pull/16276#issuecomment-1772570292 From jsjolen at openjdk.org Fri Oct 20 11:34:49 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 20 Oct 2023 11:34:49 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory [v2] In-Reply-To: References: Message-ID: > I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? > > 1.
Moved all the nmt source code from services/ to nmt/ > 2. Renamed all the include statements and sorted them > 3. Fixed the include guards Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Fixed reviewed changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16276/files - new: https://git.openjdk.org/jdk/pull/16276/files/2fd7c355..08e6f4bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16276&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16276&range=00-01 Stats: 43 lines in 15 files changed: 19 ins; 15 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/16276.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16276/head:pull/16276 PR: https://git.openjdk.org/jdk/pull/16276 From shade at openjdk.org Fri Oct 20 12:03:55 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 20 Oct 2023 12:03:55 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v3] In-Reply-To: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: > See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. > > Unfortunately, we cannot test these apart from the existing gtest.
> > Additional testing: > - [x] linux-x86-server-fastdebug, atomic tests pass > - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Protect 64-bit tests with supports_cx8() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16252/files - new: https://git.openjdk.org/jdk/pull/16252/files/f2c523ae..e7677da5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16252&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16252&range=01-02 Stats: 15 lines in 1 file changed: 15 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16252.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16252/head:pull/16252 PR: https://git.openjdk.org/jdk/pull/16252 From shade at openjdk.org Fri Oct 20 12:03:58 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 20 Oct 2023 12:03:58 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v2] In-Reply-To: References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: On Fri, 20 Oct 2023 10:54:42 GMT, Aleksey Shipilev wrote: >> See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. >> >> Unfortunately, we cannot test these apart from the existing gtest. >> >> Additional testing: >> - [x] linux-x86-server-fastdebug, atomic tests pass >> - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into JDK-8316961-fallback-atomics > - Work All right, hear me out. 
I am trying to briefly recap where we are and where we are going from here.

From the perspective of the caller, the `Atomic` contract is ambiguous. On one hand, it says to check for `supports_cx8()`:

// Atomic operations on int64 types are not available on all 32-bit
// platforms. If atomic ops on int64 are defined here they must only
// be used from code that verifies they are available at runtime and
// can provide an alternative action if not - see supports_cx8() for
// a means to test availability.

...and on the other hand, it requires platforms to implement 8-byte CAS:

// Platform-specific implementation of cmpxchg. Support for sizes
// of 1, 4, and 8 are required.
...
template<size_t byte_size> struct PlatformCmpxchg;

This PR implicitly relies on the second assumption. I think the only platform left that needs `supports_cx8()` is ARM <= 6. The fact that the current ARM code does not consistently support 64-bit atomics on all platforms is regrettable. To recap, while ARM 7 works fine, the situation with ARM <= 6 is more complicated:

Before this PR:
- `Atomic` unit test would pass
- Attempt to use 8-byte cmpxchg would fail at runtime (either assert or `stop` in stub)
- Attempt to use 8-byte add or xchg would fail to link

After this PR:
- `Atomic` unit test would pass (now that it checks for `supports_cx8()`)
- Attempt to use 8-byte cmpxchg would fail at runtime (either assert or `stop` in stub)
- Attempt to use 8-byte add or xchg would fail at runtime (either assert or `stop` in stub)

So, I don't think this qualifies as a regression for ARM <= 6: we are trading the link-time failure for all ARM platforms for a clean runtime failure on the ARM <= 6 platform, *if* we violate the `supports_cx8()` contract. What we gain with this PR is that we are able to do 64-bit atomics when `supports_cx8()` is actually true on x86_32 and ARM v7, even if we don't check it.

I dabbled a little in trying to implement a (locked) 8-byte CAS for ARM <= 6, and I think it would require some rewrites.
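As an illustration of the fallback idea discussed in this mail, a 64-bit add and xchg can be built from nothing but an 8-byte compare-and-swap. The following is only a sketch in portable C++ on top of `std::atomic`; the names `fallback_add` and `fallback_xchg` are invented for this example and are not the HotSpot `PlatformAdd<8>` / `PlatformXchg<8>` code, whose actual implementation is not shown in this thread.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not HotSpot code): 64-bit add that uses only
// compare-and-swap. Returns the value after the addition.
inline uint64_t fallback_add(std::atomic<uint64_t>& dest, uint64_t delta) {
  uint64_t old_value = dest.load(std::memory_order_relaxed);
  // On failure compare_exchange_weak stores the current value into
  // old_value, so the loop simply retries with fresh data.
  while (!dest.compare_exchange_weak(old_value, old_value + delta,
                                     std::memory_order_seq_cst,
                                     std::memory_order_relaxed)) {
  }
  return old_value + delta;
}

// Hypothetical helper (not HotSpot code): 64-bit exchange that uses only
// compare-and-swap. Returns the value that was replaced.
inline uint64_t fallback_xchg(std::atomic<uint64_t>& dest, uint64_t new_value) {
  uint64_t old_value = dest.load(std::memory_order_relaxed);
  while (!dest.compare_exchange_weak(old_value, new_value,
                                     std::memory_order_seq_cst,
                                     std::memory_order_relaxed)) {
  }
  return old_value;
}
```

Such a loop is only as good as the underlying 8-byte CAS, which is why the ARM <= 6 case stays a runtime failure in the recap above.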
There are cases, for example, when we are doing creepy non-atomic stuff when `!os::is_MP()` if no hardware support is available. If we go the locking route, I think we would also need to check if the atomic loads/stores need to be covered by the same lock. Once/if we fix ARM <= 6, that would eliminate the need for `supports_cx8()` altogether. And I think that is a good thing, because frankly it should not be an `Atomic` caller's responsibility to handle obscure platforms in 2023. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1772605369 From mdoerr at openjdk.org Fri Oct 20 12:12:28 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 20 Oct 2023 12:12:28 GMT Subject: RFR: JDK-8318587: refresh libraries cache on AIX in print_vm_info In-Reply-To: <1tniFEVjN_i0eJyrRzkD6fzxw4YlnpSRZAYTjZe4Qig=.f34b9f02-6463-4b5a-8039-3cb2f1bc48a3@github.com> References: <1tniFEVjN_i0eJyrRzkD6fzxw4YlnpSRZAYTjZe4Qig=.f34b9f02-6463-4b5a-8039-3cb2f1bc48a3@github.com> Message-ID: On Fri, 20 Oct 2023 08:07:46 GMT, Matthias Baesken wrote: > print_vm_info outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing. LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16284#pullrequestreview-1689898398 From jsjolen at openjdk.org Fri Oct 20 12:14:52 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 20 Oct 2023 12:14:52 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory [v3] In-Reply-To: References: Message-ID: > I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? > > 1. Moved all the nmt source code from services/ to nmt/ > 2. Renamed all the include statements and sorted them > 3. Fixed the include guards Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains three commits: - Merge remote-tracking branch 'origin/master' into move-nmt - Fixed reviewed changes - Move NMT to its own subdirectory ------------- Changes: https://git.openjdk.org/jdk/pull/16276/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16276&range=02 Stats: 502 lines in 100 files changed: 211 ins; 210 del; 81 mod Patch: https://git.openjdk.org/jdk/pull/16276.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16276/head:pull/16276 PR: https://git.openjdk.org/jdk/pull/16276 From jsjolen at openjdk.org Fri Oct 20 12:33:50 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 20 Oct 2023 12:33:50 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory [v4] In-Reply-To: References: Message-ID: > I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? > > 1. Moved all the nmt source code from services/ to nmt/ > 2. Renamed all the include statements and sorted them > 3. 
Fixed the include guards Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Missed this include ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16276/files - new: https://git.openjdk.org/jdk/pull/16276/files/c2b14b34..1647b41e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16276&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16276&range=02-03 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16276.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16276/head:pull/16276 PR: https://git.openjdk.org/jdk/pull/16276 From stuefe at openjdk.org Fri Oct 20 12:43:28 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 20 Oct 2023 12:43:28 GMT Subject: RFR: JDK-8318587: refresh libraries cache on AIX in print_vm_info In-Reply-To: <1tniFEVjN_i0eJyrRzkD6fzxw4YlnpSRZAYTjZe4Qig=.f34b9f02-6463-4b5a-8039-3cb2f1bc48a3@github.com> References: <1tniFEVjN_i0eJyrRzkD6fzxw4YlnpSRZAYTjZe4Qig=.f34b9f02-6463-4b5a-8039-3cb2f1bc48a3@github.com> Message-ID: On Fri, 20 Oct 2023 08:07:46 GMT, Matthias Baesken wrote: > print_vm_info outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing. Does it have to be in shared code? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16284#issuecomment-1772669917 From jsjolen at openjdk.org Fri Oct 20 12:49:46 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 20 Oct 2023 12:49:46 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory [v5] In-Reply-To: References: Message-ID: > I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? > > 1. Moved all the nmt source code from services/ to nmt/ > 2. Renamed all the include statements and sorted them > 3.
Fixed the include guards Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Fix messed up include ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16276/files - new: https://git.openjdk.org/jdk/pull/16276/files/1647b41e..4dfb027e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16276&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16276&range=03-04 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16276.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16276/head:pull/16276 PR: https://git.openjdk.org/jdk/pull/16276 From mbaesken at openjdk.org Fri Oct 20 12:54:36 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 20 Oct 2023 12:54:36 GMT Subject: RFR: JDK-8318587: refresh libraries cache on AIX in print_vm_info In-Reply-To: References: <1tniFEVjN_i0eJyrRzkD6fzxw4YlnpSRZAYTjZe4Qig=.f34b9f02-6463-4b5a-8039-3cb2f1bc48a3@github.com> Message-ID: <3gubpK3BTEiil0crxODFplasqdYpq_A2tS0yQfg1hJE=.7869b753-f3cb-45f0-8357-420451f64189@github.com> On Fri, 20 Oct 2023 12:40:38 GMT, Thomas Stuefe wrote: > Does it have to be in shared code? Good question, maybe a move to `os::print_dll_info` in os_aix.cpp would be possible? On the other hand, this would also cause reloading in the crash case, and currently I am not sure if this is a good idea.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16284#issuecomment-1772684971 From rrich at openjdk.org Fri Oct 20 13:13:50 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 20 Oct 2023 13:13:50 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v28] In-Reply-To: References: Message-ID: On Fri, 20 Oct 2023 09:36:34 GMT, Thomas Schatzl wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Cleanup/improve comments > > src/hotspot/share/gc/parallel/psCardTable.cpp line 216: > >> 214: // The "shadow" table is a copy of the card table entries of the current stripe. >> 215: // It is used to separate card reading, clearing and redirtying which reduces >> 216: // complexity significantly. > > That would be a perfect comment to put just before the `ShadowCardTable` class definition. Not seeing the point of putting this at the place we instantiate it. I will move the comment to the declaration of `PSStripeShadowCardTable`. I put it in `PSCardTable::process_range` because `PSStripeShadowCardTable` feels like an implementation detail of it. I'd move the whole class there if it wasn't too big. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14846#discussion_r1366956932 From igavrilin at openjdk.org Fri Oct 20 13:13:55 2023 From: igavrilin at openjdk.org (Ilya Gavrilin) Date: Fri, 20 Oct 2023 13:13:55 GMT Subject: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v6] In-Reply-To: References: Message-ID: > Hi all, please review these changes to the risc-v floating point copysign and signum intrinsics. > CopySign - returns the first argument with the sign of the second. On risc-v we have the `fsgnj.x` instruction, which can implement this intrinsic. > Signum - returns the input value if it is +/- 0.0 or NaN, otherwise 1.0 with the sign of the input value.
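The semantics described above can be written down as a small reference sketch in C++. `std::copysign` is the standard library function; `signum_sketch` is a name made up for this example and only mirrors the behavior stated in the description (zeros and NaN pass through, everything else becomes 1.0 with the input's sign). It is not the intrinsic from the PR.

```cpp
#include <cmath>

// Hypothetical reference version of signum as described above:
// +/- 0.0 and NaN are returned unchanged, any other input yields
// 1.0 carrying the sign of the input.
inline double signum_sketch(double x) {
  if (x == 0.0 || std::isnan(x)) {
    return x;  // keeps the sign of -0.0 and keeps NaN as NaN
  }
  return std::copysign(1.0, x);
}

// copySign itself maps directly onto the standard library call:
// magnitude of the first argument, sign of the second.
inline double copy_sign_sketch(double magnitude, double sign) {
  return std::copysign(magnitude, sign);
}
```

The special cases for zero and NaN are what keep signum from being a plain `copysign(1.0, x)`.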
On risc-v we can use `fclass.x` to classify the input value and return the appropriate result.
>
> Tests:
> Performance tests on t-head board:
> With intrinsics:
>
> Benchmark                 (seed)   Mode  Cnt      Score      Error   Units
> MathBench.copySignDouble       0  thrpt    8  34156.580 ±   76.272  ops/ms
> MathBench.copySignFloat        0  thrpt    8  34181.731 ±   38.182  ops/ms
> MathBench.signumDouble         0  thrpt    8  31977.258 ± 1122.327  ops/ms
> MathBench.signumFloat          0  thrpt    8  31836.852 ±   56.013  ops/ms
>
> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`):
>
> Benchmark                 (seed)   Mode  Cnt      Score      Error   Units
> MathBench.copySignDouble       0  thrpt    8  31000.996 ±  943.094  ops/ms
> MathBench.copySignFloat        0  thrpt    8  30678.016 ±   28.087  ops/ms
> MathBench.signumDouble         0  thrpt    8  25435.010 ± 2047.085  ops/ms
> MathBench.signumFloat          0  thrpt    8  25257.058 ±   79.175  ops/ms
>
> Regression tests: tier1, hotspot:tier2 on risc-v board.
>
> Also, changed the name of one micro test: before we had `sigNumDouble` and `signumFloat`, which did not agree on `signum` vs. `sigNum`. Now both use `signum`.
> The performance tests have been changed a bit to check the intrinsics' results better; diff to modify the tests:
>
> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> index 6cd1353907e..0bee25366bf 100644
> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> @@ -143,12 +143,12 @@ public double ceilDouble() {
>
> @Benchmark
> public double copySignDouble() {
> - return Math.copySign(double81, doubleNegative12);
> + return Math.copySign(double81, doubleNegative12) + Math.copySign(double81, double2) + Math.copySign(double4Dot1, doubleNegative12);
> }
>
> @Benchmark
> public float copySignFloat() {
> - return Math.copySign(floatNegative99, float1);
> + return Math.copySign(floatNegative99, float1) + Math.copySign(eFloat, float1) + Math.copySign...
Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: Fix pipe classes inside copysign ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16186/files - new: https://git.openjdk.org/jdk/pull/16186/files/c79fb9e6..b6e0b569 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16186&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16186&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16186.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16186/head:pull/16186 PR: https://git.openjdk.org/jdk/pull/16186 From stuefe at openjdk.org Fri Oct 20 13:16:37 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 20 Oct 2023 13:16:37 GMT Subject: RFR: JDK-8318587: refresh libraries cache on AIX in print_vm_info In-Reply-To: <3gubpK3BTEiil0crxODFplasqdYpq_A2tS0yQfg1hJE=.7869b753-f3cb-45f0-8357-420451f64189@github.com> References: <1tniFEVjN_i0eJyrRzkD6fzxw4YlnpSRZAYTjZe4Qig=.f34b9f02-6463-4b5a-8039-3cb2f1bc48a3@github.com> <3gubpK3BTEiil0crxODFplasqdYpq_A2tS0yQfg1hJE=.7869b753-f3cb-45f0-8357-420451f64189@github.com> Message-ID: <3aU5mK7GZs1l4g06GIkOOwsOTsU7QxAqlwu4f8sKAFw=.30121098-2224-4bda-8a24-ef26fe2dcf07@github.com> On Fri, 20 Oct 2023 12:51:24 GMT, Matthias Baesken wrote: > > Does it have to be in shared code? > > Good question, maybe a move to os_aix.cpp os::print_dll_info would be possible ? On the other hand this would cause reloading also in the crash case and currently I am not sure if this is a good idea . OTOH in crash situations you want dll info to be up to date, for callstacks and whatnot. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16284#issuecomment-1772719302 From rrich at openjdk.org Fri Oct 20 13:18:58 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 20 Oct 2023 13:18:58 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v29] In-Reply-To: References: Message-ID: > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. 
The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea...
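The striped sharing of a large array described in the quoted PR text can be pictured with a small sketch. It assumes that stripes have a fixed width in cards and are dealt out round-robin, so that worker `i` owns stripes `i`, `i + n_stripes`, and so on. The function name `cards_for_worker` and its parameters are made up for this example, and the special rule for the array's tail reaching into the next stripe is deliberately not modeled; this is not the `PSCardTable` code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open range of card indices [first, second).
using CardRange = std::pair<std::size_t, std::size_t>;

// Hypothetical helper: returns the parts of the array's card range
// [array_begin, array_end) that fall into the stripes owned by the
// worker with index stripe_index. One "slice" is n_stripes consecutive
// stripes, so a worker's stripes repeat every slice_cards cards.
inline std::vector<CardRange> cards_for_worker(std::size_t array_begin,
                                               std::size_t array_end,
                                               std::size_t stripe_index,
                                               std::size_t n_stripes,
                                               std::size_t stripe_cards) {
  assert(n_stripes > 0 && stripe_cards > 0 && stripe_index < n_stripes);
  std::vector<CardRange> result;
  const std::size_t slice_cards = n_stripes * stripe_cards;
  for (std::size_t stripe_begin = stripe_index * stripe_cards;
       stripe_begin < array_end;
       stripe_begin += slice_cards) {
    const std::size_t stripe_end = stripe_begin + stripe_cards;
    // Intersect the worker's stripe with the array's card range.
    const std::size_t lo = std::max(stripe_begin, array_begin);
    const std::size_t hi = std::min(stripe_end, array_end);
    if (lo < hi) {
      result.emplace_back(lo, hi);
    }
  }
  return result;
}
```

With two workers, 8-card stripes and an array covering cards 10 to 50, the two workers receive interleaved pieces that together cover the array exactly once.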
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Accepting Thomas' (smaller) suggestions Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/7c20c9f1..67416b20 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=27-28 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From mbaesken at openjdk.org Fri Oct 20 13:21:28 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 20 Oct 2023 13:21:28 GMT Subject: RFR: JDK-8318587: refresh libraries cache on AIX in print_vm_info In-Reply-To: <3aU5mK7GZs1l4g06GIkOOwsOTsU7QxAqlwu4f8sKAFw=.30121098-2224-4bda-8a24-ef26fe2dcf07@github.com> References: <1tniFEVjN_i0eJyrRzkD6fzxw4YlnpSRZAYTjZe4Qig=.f34b9f02-6463-4b5a-8039-3cb2f1bc48a3@github.com> <3gubpK3BTEiil0crxODFplasqdYpq_A2tS0yQfg1hJE=.7869b753-f3cb-45f0-8357-420451f64189@github.com> <3aU5mK7GZs1l4g06GIkOOwsOTsU7QxAqlwu4f8sKAFw=.30121098-2224-4bda-8a24-ef26fe2dcf07@github.com> Message-ID: On Fri, 20 Oct 2023 13:13:24 GMT, Thomas Stuefe wrote: > OTOH in crash situations you want dll info to be up to date, for callstacks and whatnot. Yes true, of course the outdated lib cache is a problem in case of crashes (and not only in theory, also with some real world test crashes we faced recently on AIX). But currently the `LoadedLibraries::reload()` is doing quite a lot of different allocs, this can be problematic in some crash situations. 
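One possible compromise between the two concerns raised here (fresh library information versus avoiding allocation while crashing) is to make the refresh conditional. The sketch below is purely illustrative and not something the thread settled on: `LibraryCache`, `snapshot` and the `in_crash_handler` flag are invented names, not the AIX `LoadedLibraries` API.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cache of loaded library names.
class LibraryCache {
 public:
  using Loader = std::function<std::vector<std::string>()>;

  explicit LibraryCache(Loader loader) : loader_(std::move(loader)) {}

  // Rebuilds the cache from the loader. This allocates, which is the
  // reason for caution in crash situations.
  void reload() { libs_ = loader_(); }

  // Returns the entries to print. The cache is refreshed first unless
  // the caller reports that it runs inside a crash handler, in which
  // case the possibly outdated entries are returned as they are.
  const std::vector<std::string>& snapshot(bool in_crash_handler) {
    if (!in_crash_handler) {
      reload();
    }
    return libs_;
  }

 private:
  Loader loader_;
  std::vector<std::string> libs_;
};
```

The cost of this choice is exactly the one pointed out earlier in the thread: a crash report may then show stale library information.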
------------- PR Comment: https://git.openjdk.org/jdk/pull/16284#issuecomment-1772728418 From rrich at openjdk.org Fri Oct 20 13:27:57 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 20 Oct 2023 13:27:57 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v30] In-Reply-To: References: Message-ID: > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. 
The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea...
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Review Thomas (PSStripeShadowCardTable) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/67416b20..a4430424 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=28-29 Stats: 14 lines in 2 files changed: 5 ins; 5 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From rrich at openjdk.org Fri Oct 20 13:34:59 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 20 Oct 2023 13:34:59 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v31] In-Reply-To: References: Message-ID: > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. 
> > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. > > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the geometric mean of the duration of the last 5 young collections (lower is better). > > [BigArrayInOldGenRR.pdf](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.pdf)([BigArrayInOldGenRR.ods](https://cr.openjdk.org/~rrich/webrevs/8310031/BigArrayInOldGenRR.ods)) presents the benchmark results with 1 to 64 gc threads. > > Observations > > * JDK22 scales inversely. Adding gc threads prolongs young collections. With 32 threads young collections take ~15x longer than single threaded. > > * Fixed JDK22 scales well. Adding gc threads reduces the duration of young collections. With 32 threads young collections are 5x shorter than single threaded. > > * With just 1 gc thread there is a regression. Young collections are 1.5x longer with the fix. I assume the reason is that the iteration over the array elements is interrupted at the end of a stripe which makes it less efficient. The price for parallelization is paid without actually doing it. Also ParallelGC will use at lea... Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Forgot to move comment to PSStripeShadowCardTable.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/14846/files - new: https://git.openjdk.org/jdk/pull/14846/files/a4430424..71b08484 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14846&range=29-30 Stats: 6 lines in 1 file changed: 3 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14846.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14846/head:pull/14846 PR: https://git.openjdk.org/jdk/pull/14846 From eosterlund at openjdk.org Fri Oct 20 13:53:29 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 20 Oct 2023 13:53:29 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v2] In-Reply-To: References: Message-ID: <6lpIn9ZaugJ1TbQHiYHe_BUhWOjNeuIcGkWO0SEpr4g=.2d2975e7-c102-4ede-ad46-06bf44277614@github.com> On Tue, 10 Oct 2023 04:24:05 GMT, Dean Long wrote: > Wouldn't it be better to put a cross_modify_fence() at the end of BarrierSetNMethod::nmethod_entry_barrier()? I don't see any code patching after that. Then we don't need it in the platform-specific generate_method_entry_barrier(). And the cross_modify_fence() could be condition depending on if code was actually patched, which it sounds like doesn't happen on aarch64 unless Generational ZGC is used? That's a good point. Perhaps BarrierSetNMethod::nmethod_stub_entry_barrier is the better place though, as we sometimes call the nmethod entry barrier from the runtime. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14543#issuecomment-1772778461 From lucy at openjdk.org Fri Oct 20 14:00:35 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 20 Oct 2023 14:00:35 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v10] In-Reply-To: References: Message-ID: On Fri, 20 Oct 2023 08:38:22 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. 
This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. 
See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > bump up argument counts in TestLargeStub to their maximum s390 part looks good now. I did not check the shared code nor the other CPU architectures. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16201#pullrequestreview-1690103693 From lucy at openjdk.org Fri Oct 20 14:03:37 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 20 Oct 2023 14:03:37 GMT Subject: RFR: JDK-8318587: refresh libraries cache on AIX in print_vm_info In-Reply-To: <1tniFEVjN_i0eJyrRzkD6fzxw4YlnpSRZAYTjZe4Qig=.f34b9f02-6463-4b5a-8039-3cb2f1bc48a3@github.com> References: <1tniFEVjN_i0eJyrRzkD6fzxw4YlnpSRZAYTjZe4Qig=.f34b9f02-6463-4b5a-8039-3cb2f1bc48a3@github.com> Message-ID: On Fri, 20 Oct 2023 08:07:46 GMT, Matthias Baesken wrote: > print_vm_info outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing. LGTM. ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16284#pullrequestreview-1690110412 From stuefe at openjdk.org Fri Oct 20 14:07:33 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 20 Oct 2023 14:07:33 GMT Subject: RFR: JDK-8318587: refresh libraries cache on AIX in print_vm_info In-Reply-To: <1tniFEVjN_i0eJyrRzkD6fzxw4YlnpSRZAYTjZe4Qig=.f34b9f02-6463-4b5a-8039-3cb2f1bc48a3@github.com> References: <1tniFEVjN_i0eJyrRzkD6fzxw4YlnpSRZAYTjZe4Qig=.f34b9f02-6463-4b5a-8039-3cb2f1bc48a3@github.com> Message-ID: On Fri, 20 Oct 2023 08:07:46 GMT, Matthias Baesken wrote: > print_vm_info outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing. ok ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16284#pullrequestreview-1690119846 From rrich at openjdk.org Fri Oct 20 14:28:39 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 20 Oct 2023 14:28:39 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v11] In-Reply-To: References: Message-ID: On Wed, 27 Sep 2023 14:13:23 GMT, Thomas Schatzl wrote: >> Richard Reingruber has updated the pull request incrementally with five additional commits since the last revision: >> >> - Eliminate special case for scanning the large array end >> - First card of large array should be cleared if dirty >> - Do all large array scanning in separate method >> - Limit stripe size to 1m with at least 8 threads >> - Small clean-ups > > Hi, > >> > I experimented with the aforementioned read-only card table idea a bit and here is the draft: >> > https://github.com/openjdk/jdk/compare/master...albertnetymk:jdk:pgc-precise-obj-arr?expand=1 >> >> This looks very nice! The code is a lot easier to follow than the baseline and this pr. 
>> >> With your draft I found out too that the regressions with just 2 threads come from the remaining `object_start` calls. Larger stripes mean fewer of them. The caching used in your draft is surely better. >> >> So by default 1 card table byte per 512-byte card is needed. The shadow card table will require 2M per gigabyte of used old generation. I guess that's affordable. >> >> Would you think that your solution can be backported? > > I had a brief look at @albertnetymk's suggestion, a few comments: > > * it uses another card table - while "just" another 0.2% of the heap, we should try to avoid such regressions. G1 also does not need another card table... maybe some more effort should be put into optimizing that one away. > * obviously allocating and freeing during the pause is suboptimal wrt pause time so the prototype should be improved in that regard :) > * the copying will stay (if there is a second card table), I would be interested in pause time changes for more throughput'y applications (jbb2005, timefold/optaplanner https://timefold.ai/blog/2023/java-21-performance) > * anything can be backported, but the question is whether the individual maintainers of these versions are going to. It does have a good case though which may make it easier to convince maintainers. > > Hth, > Thomas Thanks a lot @tschatzl for reviewing. I hope I've covered all of your feedback. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1772842725 From igavrilin at openjdk.org Fri Oct 20 14:34:48 2023 From: igavrilin at openjdk.org (Ilya Gavrilin) Date: Fri, 20 Oct 2023 14:34:48 GMT Subject: Integrated: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics In-Reply-To: References: Message-ID: <2KZQtr80bQETNa18fg-tFW79W3iIAuq0AFTn2mzl2Fg=.727f8de4-0f69-45db-a179-f1b11269fd29@github.com> On Fri, 13 Oct 2023 15:36:56 GMT, Ilya Gavrilin wrote: > Hi all, please review these changes to the risc-v floating point copysign and signum intrinsics.
> CopySign - returns the first argument with the sign of the second. On risc-v we have the `fsgnj.x` instruction, which can implement this intrinsic.
> Signum - returns the input value if it is +/- 0.0 or NaN, otherwise returns 1.0 with the sign of the input value. On risc-v we can use `fclass.x` to determine the class of the input value and return the appropriate value.
>
> Tests:
> Performance tests on t-head board:
> With intrinsics:
>
> Benchmark                 (seed)   Mode  Cnt      Score      Error   Units
> MathBench.copySignDouble       0  thrpt    8  34156.580 ±   76.272  ops/ms
> MathBench.copySignFloat        0  thrpt    8  34181.731 ±   38.182  ops/ms
> MathBench.signumDouble         0  thrpt    8  31977.258 ± 1122.327  ops/ms
> MathBench.signumFloat          0  thrpt    8  31836.852 ±   56.013  ops/ms
>
> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`):
>
> Benchmark                 (seed)   Mode  Cnt      Score      Error   Units
> MathBench.copySignDouble       0  thrpt    8  31000.996 ±  943.094  ops/ms
> MathBench.copySignFloat        0  thrpt    8  30678.016 ±   28.087  ops/ms
> MathBench.signumDouble         0  thrpt    8  25435.010 ± 2047.085  ops/ms
> MathBench.signumFloat          0  thrpt    8  25257.058 ±   79.175  ops/ms
>
> Regression tests: tier1, hotspot:tier2 on risc-v board.
>
> Also, changed the name of one micro test: before we had `sigNumDouble` and `signumFloat` tests, which did not consistently use either `signum` or `sigNum`. Now both share the same part: `signum`.
> Performance tests has been changed a bit, to check intrinsics result better, diff to modify tests: > > diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java > index 6cd1353907e..0bee25366bf 100644 > --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java > +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java > @@ -143,12 +143,12 @@ public double ceilDouble() { > > @Benchmark > public double copySignDouble() { > - return Math.copySign(double81, doubleNegative12); > + return Math.copySign(double81, doubleNegative12) + Math.copySign(double81, double2) + Math.copySign(double4Dot1, doubleNegative12); > } > > @Benchmark > public float copySignFloat() { > - return Math.copySign(floatNegative99, float1); > + return Math.copySign(floatNegative99, float1) + Math.copySign(eFloat, float1) + Math.copySign... This pull request has now been integrated. Changeset: 5a97411f Author: Ilya Gavrilin Committer: Vladimir Kempik URL: https://git.openjdk.org/jdk/commit/5a97411f857b0bc9e70b417efa76a5fd5f887fe0 Stats: 90 lines in 5 files changed: 89 ins; 0 del; 1 mod 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics Reviewed-by: fyang, vkempik ------------- PR: https://git.openjdk.org/jdk/pull/16186 From eosterlund at openjdk.org Fri Oct 20 14:59:38 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 20 Oct 2023 14:59:38 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v2] In-Reply-To: References: Message-ID: On Tue, 10 Oct 2023 04:31:42 GMT, Dean Long wrote: > > The assumption is that if the nmethod immediate oops are patched first, and the guard value (immediate of the cmp instruction) is patched after, then if a thread sees the new cmp instruction, it will also see the new oop immediates. And that is indeed what the "asynchronous" cross modifying code description ensures will work in the AMD APM. So that all checks out. 
> > I guess this is a separate issue from this patch, but where does the AMD APM guarantee that? In the APM, volume 2 (cf. https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf), section 7.6.1 under "Asynchronous modification", it says "" > > The assumption is that if the nmethod immediate oops are patched first, and the guard value (immediate of the cmp instruction) is patched after, then if a thread sees the new cmp instruction, it will also see the new oop immediates. And that is indeed what the "asynchronous" cross modifying code description ensures will work in the AMD APM. So that all checks out. > > I guess this is a separate issue from this patch, but where does the AMD APM guarantee that? Hmm, it used to be in Volume 2, section 7.6.1. But in the latest revision, 3.41 from this summer, I can't find it any more. Strange. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14543#issuecomment-1772895987 From eosterlund at openjdk.org Fri Oct 20 14:59:41 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 20 Oct 2023 14:59:41 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v2] In-Reply-To: <0p2dyy8w_5WbR_oeNqBVnT0F8k0_02sj-DlZ4jBOXaM=.9da09560-9ae6-4e2c-86fa-d1882911eb31@github.com> References: <0p2dyy8w_5WbR_oeNqBVnT0F8k0_02sj-DlZ4jBOXaM=.9da09560-9ae6-4e2c-86fa-d1882911eb31@github.com> Message-ID: On Wed, 11 Oct 2023 01:13:59 GMT, Dean Long wrote: > In particular, I'm wondering about branch prediction. The "beyond reach of the thread's current control flow" to me only rules out pre-fetching code that is truly unreachable (ignoring unconditional branches). What about this scenario: > > 1. Thread 1 reaches 1st instruction of nmethod, and predicts that the entry barrier slow path branch will not be taken, so it loads the some number of instructions past the branch into the pipeline, including instructions with oop immediates. > 2. 
Before Thread 1 reaches the entry barrier compare, another thread calls the same nmethod, takes the slow path, patches oops in instructions, and disarms the entry barrier > 3. Thread 1 sees the disarmed conditional branch and continues to execute the previously fetched pipeline which contains stale oops. That is precisely what the asynchronous cross modifying code has to take care of, and was explicitly mentioned before revision 3.41. And that's precisely why we don't do like AArch64 where an explicit epoching scheme forces it to synchronous cross modifying code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14543#issuecomment-1772900035 From aph at openjdk.org Fri Oct 20 15:34:36 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 20 Oct 2023 15:34:36 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v2] In-Reply-To: References: Message-ID: <59q2I50auQj6g_46nsbgF8xPfVC-ahFPhXfz-C6jE8Q=.c5f3fdc2-b066-4d83-8f80-2c2ce6a1b888@github.com> On Fri, 20 Oct 2023 14:54:58 GMT, Erik Österlund wrote: > > > > I guess this is a separate issue from this patch, but where does the AMD APM guarantee that? > > Hmm, it used to be in Volume 2, section 7.6.1. But in the latest revision, 3.41 from this summer, I can't find it any more. Strange. I wonder if they may be making it up as they go along. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14543#issuecomment-1772960890 From eosterlund at openjdk.org Fri Oct 20 16:03:08 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 20 Oct 2023 16:03:08 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v2] In-Reply-To: References: Message-ID: On Tue, 20 Jun 2023 08:26:08 GMT, Erik Österlund wrote: >> In fact, there is a current race in the nmethod entry barriers, where what we are doing violates the AMD APM (cf. APM volume 2 section 7.6.1 https://www.amd.com/system/files/TechDocs/24593.pdf).
>> In particular, if the compare instruction of the nmethod entry barrier is not yet patched and we call a slow path on thread 1, then before taking the nmethod entry lock, another thread 2 could fix and disarm the nmethod. Then thread 1 will observe *data* suggesting the nmethod has been patched, but never re-executes the patched compare (which might indeed still be stale), hence not qualifying for asynchronous cross modifying code, and neither do we run a serializing cpuid instruction, qualifying for synchronous cross modifying code. In this scenario, we can indeed start executing the nmethod instructions, while observing inconsistent concurrent patching effects, where some instructions will be updated and some not. >> >> The following patch ensures that x86 nmethod entry barriers execute cross modifying fence after calling into the VM, where another thread could have disarmed the nmethod. I also ensured the other platforms perform their fencing after the VM call, instead of before - including a cross_modify_fence in the shared code for OSR nmethod entries. While fencing before will flush out the instruction pipeline, and it shouldn't be populated with problematic instructions until after we start executing the nmethod again, it feels unnecessary to fence on the wrong side of the modifications it wishes to guard, and hence not strictly following the synchronous cross modifying fence recipe. >> >> I'm currently running tier1-5 and running performance testing in aurora. In the interest of time, I'm opening this PR before getting the final result, and will report the results when they come in. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Typo in comment I moved the cross modify fence into the runtime, which reduces clutter in the patch.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14543#issuecomment-1773001883 From eosterlund at openjdk.org Fri Oct 20 16:03:08 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 20 Oct 2023 16:03:08 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v3] In-Reply-To: References: Message-ID: > In fact, there is a current race in the nmethod entry barriers, where what we are doing violates the AMD APM (cf. APM volume 2 section 7.6.1 https://www.amd.com/system/files/TechDocs/24593.pdf). > In particular, if the compare instruction of the nmethod entry barrier is not yet patched and we call a slow path on thread 1, then before taking the nmethod entry lock, another thread 2 could fix and disarm the nmethod. Then thread 1 will observe *data* suggesting the nmethod has been patched, but never re-executes the patched compare (which might indeed still be stale), hence not qualifying for asynchronous cross modifying code, and neither do we run a serializing cpuid instruction, qualifying for synchronous cross modifying code. In this scenario, we can indeed start executing the nmethod instructions, while observing inconsistent concurrent patching effects, where some instructions will be updated and some not. > > The following patch ensures that x86 nmethod entry barriers execute cross modifying fence after calling into the VM, where another thread could have disarmed the nmethod. I also ensured the other platforms perform their fencing after the VM call, instead of before - including a cross_modify_fence in the shared code for OSR nmethod entries. While fencing before will flush out the instruction pipeline, and it shouldn't be populated with problematic instructions until after we start executing the nmethod again, it feels unnecessary to fence on the wrong side of the modifications it wishes to guard, and hence not strictly following the synchronous cross modifying fence recipe. 
> > I'm currently running tier1-5 and running performance testing in aurora. In the interest of time, I'm opening this PR before getting the final result, and will report the results when they come in. Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Move cross modify fence to the runtime code - Merge branch 'master' into 8310239_cross_modify_nmethod_entry - Typo in comment - 8310239: Add missing cross modifying fence in nmethod entry barriers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14543/files - new: https://git.openjdk.org/jdk/pull/14543/files/329625d9..034c94be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14543&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14543&range=01-02 Stats: 327788 lines in 6892 files changed: 147438 ins; 113457 del; 66893 mod Patch: https://git.openjdk.org/jdk/pull/14543.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14543/head:pull/14543 PR: https://git.openjdk.org/jdk/pull/14543 From tschatzl at openjdk.org Fri Oct 20 16:07:40 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 20 Oct 2023 16:07:40 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v31] In-Reply-To: References: Message-ID: On Fri, 20 Oct 2023 13:34:59 GMT, Richard Reingruber wrote: >> This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC. This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems.
>> >> #### Implementation (Updated 2023-10-20) >> >> Comment copied from `PSCardTable::scavenge_contents_parallel`: >> >> ```c++ >> // Scavenging and accesses to the card table are strictly limited to the stripe. >> // In particular scavenging of an object crossing stripe boundaries is shared >> // among the threads assigned to the stripes it resides on. This reduces >> // complexity and enables shared scanning of large objects. >> // It requires preprocessing of the card table though where imprecise card marks of >> // objects crossing stripe boundaries are propagated to the first card of >> // each stripe covered by the individual object. >> >> >> The baseline was refactored to make use of a read-only copy of the card table. That "shadow" table (`PSStripeShadowCardTable`) separates reading, clearing and redirtying of table entries which allows for a much simpler implementation. >> >> Scanning of object arrays is limited to dirty card chunks. >> >> ## Everything below refers to the Outdated Initial Implementation >> >> The algorithm to share scanning large arrays is supposed to be a straight >> forward extension of the scheme implemented in >> `PSCardTable::scavenge_contents_parallel`. >> >> - A worker scans the part of a large array located in its stripe >> >> - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. >> >> - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) >> >> The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. >> >> #### Performance testing >> >> ##### BigArrayInOldGenRR.java >> >> [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. 
Creating new array elements triggers young collections. In each collection the large array is scanned because of its refere... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Forgot to move comment to PSStripeShadowCardTable. Ship it! :) ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14846#pullrequestreview-1690399343 From john.r.rose at oracle.com Fri Oct 20 20:00:08 2023 From: john.r.rose at oracle.com (John Rose) Date: Fri, 20 Oct 2023 13:00:08 -0700 Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v3] In-Reply-To: References: Message-ID: On 20 Oct 2023, at 9:03, Erik Österlund wrote: >> In fact, there is a current race in the nmethod entry barriers, where what we are doing violates the AMD APM (cf. APM volume 2 section 7.6.1 https://www.amd.com/system/files/TechDocs/24593.pdf). Pages 205 and 206 in the AMD doc talk about self-modifying code and then (what we care about) cross-modifying code. It then goes on to discuss asynchronous support for CMC (which is the part we care most about for high-performance code) and synchronous CMC. It's really well written; kudos to AMD. And it's friendly to us. Specifically, they seem to have worked hard to make the instruction fetcher read in a total store order, respecting the ordering of writes from whatever gremlin is modifying the code stream. Also, any derived state (such as decodings of fetched instructions) are invalidated the right way after I$ changes. All that makes easier our job, of running in the fast lane, which requires knowing exactly what are the boundaries and limits of the fast lane, so we don't fall off the icy cliff immediately to our left. By contrast, the Intel SDM, at 9.1.3 (Handling Self- and Cross-Modifying Code), only covers the synchronous case of CMC, and is rather short.
I know the Intel architects have thought about this problem too, of asynchronous CMC. And we may have verbally discussed with them rules like the ones AMD has published. I don't recall seeing a write-up from Intel on this specific subject, though. Maybe Sandhya or another Intel person can help me find it? -- John From dlong at openjdk.org Fri Oct 20 20:42:34 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 20 Oct 2023 20:42:34 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v3] In-Reply-To: References: Message-ID: <06jkAzl3X5Y61E07bhzOjGvNWEmrprldqY3XNfAlvJE=.db3b52ca-97db-40da-a023-441209cdca2e@github.com> On Fri, 20 Oct 2023 16:03:08 GMT, Erik Österlund wrote: >> In fact, there is a current race in the nmethod entry barriers, where what we are doing violates the AMD APM (cf. APM volume 2 section 7.6.1 https://www.amd.com/system/files/TechDocs/24593.pdf). >> In particular, if the compare instruction of the nmethod entry barrier is not yet patched and we call a slow path on thread 1, then before taking the nmethod entry lock, another thread 2 could fix and disarm the nmethod. Then thread 1 will observe *data* suggesting the nmethod has been patched, but never re-executes the patched compare (which might indeed still be stale), hence not qualifying for asynchronous cross modifying code, and neither do we run a serializing cpuid instruction, qualifying for synchronous cross modifying code. In this scenario, we can indeed start executing the nmethod instructions, while observing inconsistent concurrent patching effects, where some instructions will be updated and some not. >> >> The following patch ensures that x86 nmethod entry barriers execute cross modifying fence after calling into the VM, where another thread could have disarmed the nmethod. I also ensured the other platforms perform their fencing after the VM call, instead of before - including a cross_modify_fence in the shared code for OSR nmethod entries.
While fencing before will flush out the instruction pipeline, and it shouldn't be populated with problematic instructions until after we start executing the nmethod again, it feels unnecessary to fence on the wrong side of the modifications it wishes to guard, and hence not strictly following the synchronous cross modifying fence recipe. >> >> I'm currently running tier1-5 and running performance testing in aurora. In the interest of time, I'm opening this PR before getting the final result, and will report the results when they come in. > > Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Move cross modify fence to the runtime code > - Merge branch 'master' into 8310239_cross_modify_nmethod_entry > - Typo in comment > - 8310239: Add missing cross modifying fence in nmethod entry barriers This looks much better. Is the cross modify fence always needed, or can it be conditional based on nmethod_patching_type()? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14543#issuecomment-1773366181 From eosterlund at openjdk.org Fri Oct 20 20:55:37 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 20 Oct 2023 20:55:37 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v3] In-Reply-To: <06jkAzl3X5Y61E07bhzOjGvNWEmrprldqY3XNfAlvJE=.db3b52ca-97db-40da-a023-441209cdca2e@github.com> References: <06jkAzl3X5Y61E07bhzOjGvNWEmrprldqY3XNfAlvJE=.db3b52ca-97db-40da-a023-441209cdca2e@github.com> Message-ID: On Fri, 20 Oct 2023 20:39:42 GMT, Dean Long wrote: > This looks much better. Is the cross modify fence always needed, or can it be conditional based on nmethod_patching_type()?
It could be conditional, but the patching type abstraction is currently platform specific, and here we are in shared code. I'm not sure if elevating that to the shared level is worth it, given that I found it not to impact performance. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14543#issuecomment-1773379993 From dlong at openjdk.org Fri Oct 20 21:03:39 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 20 Oct 2023 21:03:39 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v3] In-Reply-To: References: Message-ID: On Fri, 20 Oct 2023 16:03:08 GMT, Erik Österlund wrote: >> In fact, there is a current race in the nmethod entry barriers, where what we are doing violates the AMD APM (cf. APM volume 2 section 7.6.1 https://www.amd.com/system/files/TechDocs/24593.pdf). >> In particular, if the compare instruction of the nmethod entry barrier is not yet patched and we call a slow path on thread 1, then before taking the nmethod entry lock, another thread 2 could fix and disarm the nmethod. Then thread 1 will observe *data* suggesting the nmethod has been patched, but never re-executes the patched compare (which might indeed still be stale), hence not qualifying for asynchronous cross modifying code, and neither do we run a serializing cpuid instruction, qualifying for synchronous cross modifying code. In this scenario, we can indeed start executing the nmethod instructions, while observing inconsistent concurrent patching effects, where some instructions will be updated and some not. >> >> The following patch ensures that x86 nmethod entry barriers execute cross modifying fence after calling into the VM, where another thread could have disarmed the nmethod. I also ensured the other platforms perform their fencing after the VM call, instead of before - including a cross_modify_fence in the shared code for OSR nmethod entries.
While fencing before will flush out the instruction pipeline, and it shouldn't be populated with problematic instructions until after we start executing the nmethod again, it feels unnecessary to fence on the wrong side of the modifications it wishes to guard, and hence not strictly following the synchronous cross modifying fence recipe. >> >> I'm currently running tier1-5 and running performance testing in aurora. In the interest of time, I'm opening this PR before getting the final result, and will report the results when they come in. > > Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Move cross modify fence to the runtime code > - Merge branch 'master' into 8310239_cross_modify_nmethod_entry > - Typo in comment > - 8310239: Add missing cross modifying fence in nmethod entry barriers If there is no impact on performance, especially on aarch64, then I'm OK with it as is, or maybe with a comment in the code saying it could be made conditional if needed.
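A minimal sketch of what "conditional if needed" might look like, for readers following the thread. Everything named here is hypothetical: `PatchingKind`, `needs_cross_modify_fence`, and the counting stub are stand-ins invented for illustration, not HotSpot's platform-specific patching type or its real fence, and the policy shown is an assumption rather than a statement about any platform.

```cpp
#include <cassert>

// Hypothetical stand-in for a platform's nmethod patching type; the real
// abstraction is platform specific and is not reproduced here.
enum class PatchingKind {
  InstructionsAndData,  // assumed: instruction bytes may be patched concurrently
  DataOnly              // assumed: only data read by the barrier is patched
};

// Counts how often the stand-in fence runs, so the behaviour is observable.
static int g_fence_count = 0;

// Stand-in for a serializing instruction; it only counts, it does not fence.
static void cross_modify_fence_stub() { ++g_fence_count; }

// Hypothetical policy: fence only when instructions themselves may have changed.
static bool needs_cross_modify_fence(PatchingKind kind) {
  return kind == PatchingKind::InstructionsAndData;
}

// Models the slow path: the VM call comes first, the fence comes after it,
// which is the ordering the patch description argues for.
static void entry_barrier_slow_path(PatchingKind kind) {
  // ... VM call during which another thread may have disarmed the nmethod ...
  if (needs_cross_modify_fence(kind)) {
    cross_modify_fence_stub();
  }
}

// Small usage check for the sketch.
static void example_usage() {
  g_fence_count = 0;
  entry_barrier_slow_path(PatchingKind::DataOnly);
  assert(g_fence_count == 0);
  entry_barrier_slow_path(PatchingKind::InstructionsAndData);
  assert(g_fence_count == 1);
}
```

The unconditional variant discussed in the thread corresponds to dropping the `if` and always calling the fence after the VM call.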
------------- PR Comment: https://git.openjdk.org/jdk/pull/14543#issuecomment-1773388444 From iklam at openjdk.org Fri Oct 20 23:04:47 2023 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 20 Oct 2023 23:04:47 GMT Subject: RFR: 8318484: Initial version of cdsConfig.hpp [v2] In-Reply-To: References: Message-ID: > This is the first step for [JDK-8318483 - Move CDS configuration management into cdsConfig.hpp](https://bugs.openjdk.org/browse/JDK-8318483) > > - Remove `Arguments::is_dumping_archive()` and `Arguments assert_is_dumping_archive()` > - Add the following new APIs > > > class CDSConfig { > static bool is_dumping_archive(); > static bool is_dumping_static_archive(); > static bool is_dumping_dynamic_archive(); > static bool is_dumping_heap(); > }; > > > - Convert some use of `DumpSharedSpaces` and `DynamicDumpSharedSpaces` to these new APIs > > (More APIs will be added in future sub tasks of [JDK-8318483](https://bugs.openjdk.org/browse/JDK-8318483)) Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' into 8318484-initial-version-of-cdsConfig-hpp - 8318484: Initial version of cdsConfig.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16257/files - new: https://git.openjdk.org/jdk/pull/16257/files/0a2f78b0..0d729778 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16257&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16257&range=00-01 Stats: 5386 lines in 163 files changed: 3808 ins; 820 del; 758 mod Patch: https://git.openjdk.org/jdk/pull/16257.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16257/head:pull/16257 PR: https://git.openjdk.org/jdk/pull/16257 From jvernee at openjdk.org Sat Oct 21 12:04:10 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Sat, 21 Oct 2023 12:04:10 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v11] In-Reply-To: References: Message-ID: <2mpVRiA1idUeB0AN8eAtghk_sGFu90tZyTvpYOPaBq4=.72ab5bb9-3095-4352-9aa6-9e59151e482e@github.com> > Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. > > The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. > > Components of this patch: > > - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. > - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. 
> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. > - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. > - The object/oop + offset is exposed as temporary address to native code. > - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). > - Only x64 and AArch64 for now. > - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 > - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. > - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` > > Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. > > Numbers for the included benchmark on my machine are: > > > Benchmark (size) Mode Cnt ... Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 46 commits: - Merge branch 'master' into AllowHeapNoLock - bump up argument counts in TestLargeStub to their maximum - s390 updates - add stub size stress test for allowHeap - RISC-V impl - remove leftover debug log line - add s390 support - add PPC impl - add missing file - Add xor benchmark - ... and 36 more: https://git.openjdk.org/jdk/compare/a876beb6...2e00beff ------------- Changes: https://git.openjdk.org/jdk/pull/16201/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=10 Stats: 2699 lines in 74 files changed: 1712 ins; 692 del; 295 mod Patch: https://git.openjdk.org/jdk/pull/16201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16201/head:pull/16201 PR: https://git.openjdk.org/jdk/pull/16201 From duke at openjdk.org Sat Oct 21 15:05:57 2023 From: duke at openjdk.org (Ismael Juma) Date: Sat, 21 Oct 2023 15:05:57 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v21] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: <0Yd0aqiJQYtnSe_WPTMt5RHHUtKlErdzZl6ooRm_FLs=.1e1e1ef2-4843-47c3-a546-87faa236c75e@github.com> On Thu, 6 Jul 2023 13:06:30 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? 
I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 22 commits: > > - Merge branch 'openjdk:master' into rematerialization-of-merges > - Addressing PR feedback. > - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Merge branch 'openjdk:master' into rematerialization-of-merges > - Rome minor refactorings. > - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > Catching up with master. > - Address PR review 6: debug format output & some refactoring. > - Catching up with master branch. > > Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Address PR review 6: refactoring around rematerialization & improve test cases. > - Address PR review 5: refactor on rematerialization & add tests. > - ... and 12 more: https://git.openjdk.org/jdk/compare/97e99f01...25b683d6 If I understand correctly, this change was backported to older Microsoft OpenJDK versions. Is there a plan to do the same for upstream? ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1773821824 From iklam at openjdk.org Sat Oct 21 15:46:45 2023 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 21 Oct 2023 15:46:45 GMT Subject: RFR: 8318484: Initial version of cdsConfig.hpp [v2] In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 06:58:22 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into 8318484-initial-version-of-cdsConfig-hpp >> - 8318484: Initial version of cdsConfig.hpp > > Initial refactoring looks good. One query below. > > Thanks Thanks @dholmes-ora @calvinccheung @sspitsyn for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16257#issuecomment-1773834764 From iklam at openjdk.org Sat Oct 21 15:46:46 2023 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 21 Oct 2023 15:46:46 GMT Subject: Integrated: 8318484: Initial version of cdsConfig.hpp In-Reply-To: References: Message-ID: On Thu, 19 Oct 2023 05:56:53 GMT, Ioi Lam wrote: > This is the first step for [JDK-8318483 - Move CDS configuration management into cdsConfig.hpp](https://bugs.openjdk.org/browse/JDK-8318483) > > - Remove `Arguments::is_dumping_archive()` and `Arguments assert_is_dumping_archive()` > - Add the following new APIs > > > class CDSConfig { > static bool is_dumping_archive(); > static bool is_dumping_static_archive(); > static bool is_dumping_dynamic_archive(); > static bool is_dumping_heap(); > }; > > > - Convert some use of `DumpSharedSpaces` and `DynamicDumpSharedSpaces` to these new APIs > > (More APIs will be added in future sub tasks of [JDK-8318483](https://bugs.openjdk.org/browse/JDK-8318483)) This pull request has now been integrated. 
Changeset: ecd25e7d Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/ecd25e7d6f9d69f9dbdbff0a4a9b9d6b19288593 Stats: 236 lines in 36 files changed: 125 ins; 16 del; 95 mod 8318484: Initial version of cdsConfig.hpp Reviewed-by: dholmes, ccheung, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/16257 From dholmes at openjdk.org Mon Oct 23 00:21:37 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 23 Oct 2023 00:21:37 GMT Subject: RFR: JDK-8318587: refresh libraries cache on AIX in print_vm_info In-Reply-To: <1tniFEVjN_i0eJyrRzkD6fzxw4YlnpSRZAYTjZe4Qig=.f34b9f02-6463-4b5a-8039-3cb2f1bc48a3@github.com> References: <1tniFEVjN_i0eJyrRzkD6fzxw4YlnpSRZAYTjZe4Qig=.f34b9f02-6463-4b5a-8039-3cb2f1bc48a3@github.com> Message-ID: On Fri, 20 Oct 2023 08:07:46 GMT, Matthias Baesken wrote: > print_vm_info outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing. I don't like the fact this is in shared code either. I'm surprised to see that `VMError::print_vm_info` is not actually used to report VM errors! ??? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16284#issuecomment-1774245154 From dholmes at openjdk.org Mon Oct 23 02:31:35 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 23 Oct 2023 02:31:35 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v6] In-Reply-To: References: <-ALHHMcYPfciG6g2sOT-XIEVTf1pA6XXa93eNXQamD4=.88329bf9-627b-4d78-93fc-299550fc2be0@github.com> Message-ID: On Fri, 20 Oct 2023 08:37:11 GMT, Afshin Zafari wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> first arg of `find` casted to `uint*` > > @dholmes-ora, any more comments? @afshin-zafari could you merge (not rebase!) 
with current master please to ensure I can see the absolute latest version of this patch in relation to master. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1774336591 From dholmes at openjdk.org Mon Oct 23 02:44:30 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 23 Oct 2023 02:44:30 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v2] In-Reply-To: References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: On Fri, 20 Oct 2023 12:00:50 GMT, Aleksey Shipilev wrote: > ...and on the other hand, it requires platforms to implement 8-byte CAS: Surely the `supports_cx8` issue must have been raised when the template atomics were put in place. I need to go back and see what was said then, to see how we got in this mess. I can accept the current proposal doesn't make the mess worse, it just shone a light on the fact there is a mess. If ARMv6 is broken then maybe it needs to be dropped? (Not in this PR of course.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1774345744 From jkratochvil at openjdk.org Mon Oct 23 03:31:58 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Mon, 23 Oct 2023 03:31:58 GMT Subject: RFR: 8317697: refactor-encapsulate x86 VM_Version::CpuidInfo [v2] In-Reply-To: References: Message-ID: <6KNR9g_eUuJS_Fyilrniq-qMhf2w3R41tcFOJirJ6Dk=.ca584320-efcd-4cb6-8e84-90c16cb3ca0c@github.com> > In OpenJDK project CRaC I had a [need to fetch new CpuidInfo without affecting the existing one](https://github.com/openjdk/crac/commit/ed4ad9ba31b77732dcede2eb743b2f389ec9a0fe#diff-6ed856c57ddbe33e49883adb7c52ec51ed377e5f697dfd6d8bea505a97bfc5a5R2743). > Which led me to encapsulate the object more and I think this no-functionality-change patch is even appropriate for JDK. > I am sure interested primarily to reduce the CRaC patchset boilerplate. 
Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Make CpuidInfo a class - Merge branch 'master' into flagsencaps - 8317697: refactor-encapsulate x86 VM_Version::CpuidInfo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16093/files - new: https://git.openjdk.org/jdk/pull/16093/files/683e048b..92568384 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16093&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16093&range=00-01 Stats: 23513 lines in 841 files changed: 14994 ins; 4870 del; 3649 mod Patch: https://git.openjdk.org/jdk/pull/16093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16093/head:pull/16093 PR: https://git.openjdk.org/jdk/pull/16093 From dholmes at openjdk.org Mon Oct 23 05:00:35 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 23 Oct 2023 05:00:35 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v3] In-Reply-To: References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: On Fri, 20 Oct 2023 12:03:55 GMT, Aleksey Shipilev wrote: >> See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. >> >> Unfortunately, we cannot test these apart from the existing gtest. 
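As a rough illustration of the kind of fallback being proposed, 64-bit add and exchange can each be built from an 8-byte compare-and-swap loop. The sketch below uses standard `std::atomic` rather than HotSpot's `Atomic` templates; the function names `add_via_cas` and `xchg_via_cas` are made up for this example, and the actual patch may be structured differently.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Adds add_value to dest using only compare-and-swap; returns the new value.
inline uint64_t add_via_cas(std::atomic<uint64_t>& dest, uint64_t add_value) {
  uint64_t old_value = dest.load(std::memory_order_relaxed);
  // On failure compare_exchange_weak reloads old_value with the current
  // contents, and the desired value is recomputed on the next iteration.
  while (!dest.compare_exchange_weak(old_value, old_value + add_value)) {
  }
  return old_value + add_value;
}

// Stores new_value into dest using only compare-and-swap; returns the old value.
inline uint64_t xchg_via_cas(std::atomic<uint64_t>& dest, uint64_t new_value) {
  uint64_t old_value = dest.load(std::memory_order_relaxed);
  while (!dest.compare_exchange_weak(old_value, new_value)) {
  }
  return old_value;
}

// Small usage check for the sketch.
inline void example_usage() {
  std::atomic<uint64_t> counter{40};
  assert(add_via_cas(counter, 2) == 42);
  assert(xchg_via_cas(counter, 7) == 42);
  assert(counter.load() == 7);
}
```

This only helps where an 8-byte CAS is itself available, which is the `supports_cx8` question raised in the replies.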
>> >> Additional testing: >> - [x] linux-x86-server-fastdebug, atomic tests pass >> - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Protect 64-bit tests with supports_cx8() From a related discussion in [JDK-8246770](https://bugs.openjdk.org/browse/JDK-8246770) I was under the impression that the kernel helper would be used to provide this support on ARMv6 - see code in ./cpu/arm/vm_version_arm_32.cpp. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1774437060 From dholmes at openjdk.org Mon Oct 23 05:43:29 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 23 Oct 2023 05:43:29 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v3] In-Reply-To: References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: On Fri, 20 Oct 2023 12:03:55 GMT, Aleksey Shipilev wrote: >> See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. >> >> Unfortunately, we cannot test these apart from the existing gtest. >> >> Additional testing: >> - [x] linux-x86-server-fastdebug, atomic tests pass >> - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Protect 64-bit tests with supports_cx8() Okay part of the (my) confusion here is that `supports_cx8` is mainly intended for protecting Java long/double variables - see the descriptions and mechanics in ./share/oops/accessBackend.*.
For C++ atomic ops we've typically steered the VM code away from needing 64-bit support if it doesn't natively exist as all accesses to such variables needs to work in with whatever locking scheme is used. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1774470632 From dholmes at openjdk.org Mon Oct 23 06:04:38 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 23 Oct 2023 06:04:38 GMT Subject: RFR: 8317697: refactor-encapsulate x86 VM_Version::CpuidInfo [v2] In-Reply-To: <6KNR9g_eUuJS_Fyilrniq-qMhf2w3R41tcFOJirJ6Dk=.ca584320-efcd-4cb6-8e84-90c16cb3ca0c@github.com> References: <6KNR9g_eUuJS_Fyilrniq-qMhf2w3R41tcFOJirJ6Dk=.ca584320-efcd-4cb6-8e84-90c16cb3ca0c@github.com> Message-ID: On Mon, 23 Oct 2023 03:31:58 GMT, Jan Kratochvil wrote: >> In OpenJDK project CRaC I had a [need to fetch new CpuidInfo without affecting the existing one](https://github.com/openjdk/crac/commit/ed4ad9ba31b77732dcede2eb743b2f389ec9a0fe#diff-6ed856c57ddbe33e49883adb7c52ec51ed377e5f697dfd6d8bea505a97bfc5a5R2743). >> Which led me to encapsulate the object more and I think this no-functionality-change patch is even appropriate for JDK. >> I am sure interested primarily to reduce the CRaC patchset boilerplate. > > Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Make CpuidInfo a class > - Merge branch 'master' into flagsencaps > - 8317697: refactor-encapsulate x86 VM_Version::CpuidInfo src/hotspot/cpu/x86/vm_version_x86.hpp line 527: > 525: }; > 526: > 527: class CpuidInfo : public _CpuidInfo { Why not just declare the original `CpuidInfo` as a class instead of extending the struct ??? 
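For readers following along, the pattern being questioned (a class deriving from a plain struct) can be sketched as below. This is an assumption-laden illustration: `RawInfo`, `Info`, and the two fields are hypothetical and are not the real `_CpuidInfo` layout, and the actual patch may achieve its goals differently. The sketch shows one thing such a wrapper can provide that a bare struct does not: a constructor that always value-initializes the inherited members.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain aggregate, standing in for the original struct.
struct RawInfo {
  uint32_t first_word;   // illustrative field, not a real CPUID register
  uint32_t second_word;  // illustrative field, not a real CPUID register
};

// Hypothetical wrapper class: the constructor value-initializes the base,
// so every inherited member starts at zero even for automatic variables.
class Info : public RawInfo {
 public:
  Info() : RawInfo{} {}
};

// Small usage check for the sketch.
inline void example_usage() {
  Info info;  // no explicit initializer needed at the use site
  assert(info.first_word == 0);
  assert(info.second_word == 0);
}
```

Declaring the original type directly as a class with such a constructor would give the same guarantee; the derived form keeps the raw aggregate available unchanged.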
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16093#discussion_r1368163276 From dholmes at openjdk.org Mon Oct 23 06:14:39 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 23 Oct 2023 06:14:39 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory [v5] In-Reply-To: References: Message-ID: On Fri, 20 Oct 2023 12:49:46 GMT, Johan Sjölen wrote: >> I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? >> >> 1. Moved all the nmt source code from services/ to nmt/ >> 2. Renamed all the include statements and sorted them >> 3. Fixed the include guards > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Fix messed up include Still good. (Your merge commit seems to have confused GitHub for some reason.) ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16276#pullrequestreview-1691770241 From jkratochvil at openjdk.org Mon Oct 23 06:30:38 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Mon, 23 Oct 2023 06:30:38 GMT Subject: RFR: 8317697: refactor-encapsulate x86 VM_Version::CpuidInfo [v2] In-Reply-To: References: <6KNR9g_eUuJS_Fyilrniq-qMhf2w3R41tcFOJirJ6Dk=.ca584320-efcd-4cb6-8e84-90c16cb3ca0c@github.com> Message-ID: On Mon, 23 Oct 2023 06:01:36 GMT, David Holmes wrote: >> Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: >> >> - Make CpuidInfo a class >> - Merge branch 'master' into flagsencaps >> - 8317697: refactor-encapsulate x86 VM_Version::CpuidInfo > > src/hotspot/cpu/x86/vm_version_x86.hpp line 527: > >> 525: }; >> 526: >> 527: class CpuidInfo : public _CpuidInfo { > > Why not just declare the original `CpuidInfo` as a class instead of extending the struct ??? This way the members are always zero-initialized. Which simplifies existing code -VM_Version::CpuidInfo VM_Version::_cpuid_info = { 0, }; +VM_Version::CpuidInfo VM_Version::_cpuid_info; And makes it more foolproof - therefore also fixing an existing bug (I haven't been aware of so far) in my code in the [CRaC](https://openjdk.org/projects/crac/) branch. Although the ` = { 0, }` initialization above was not needed as it is a static member anyway. That means I should change the name of this patch as it is no longer just a "refactor"-ization. Or should I remove the zero-initialization? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16093#discussion_r1368180078 From jwaters at openjdk.org Mon Oct 23 07:00:48 2023 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 23 Oct 2023 07:00:48 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn Message-ID: On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well. 
The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. Although not mentioned in the title, raise_fail_fast has also been fixed. RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method. All the changes mentioned above fix any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset ------------- Commit messages: - 8304939 Changes: https://git.openjdk.org/jdk/pull/16303/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16303&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304939 Stats: 39 lines in 5 files changed: 11 ins; 8 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/16303.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16303/head:pull/16303 PR: https://git.openjdk.org/jdk/pull/16303 From jsjolen at openjdk.org Mon Oct 23 08:34:59 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 23 Oct 2023 08:34:59 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory [v6] In-Reply-To: References: Message-ID: > I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? > > 1.
Moved all the nmt source code from services/ to nmt/ > 2. Renamed all the include statements and sorted them > 3. Fixed the include guards Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge remote-tracking branch 'origin/master' into move-nmt - Fix messed up include - Missed this include - Merge remote-tracking branch 'origin/master' into move-nmt - Fixed reviewed changes - Move NMT to its own subdirectory ------------- Changes: https://git.openjdk.org/jdk/pull/16276/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16276&range=05 Stats: 507 lines in 102 files changed: 214 ins; 212 del; 81 mod Patch: https://git.openjdk.org/jdk/pull/16276.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16276/head:pull/16276 PR: https://git.openjdk.org/jdk/pull/16276 From stefank at openjdk.org Mon Oct 23 08:41:39 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 23 Oct 2023 08:41:39 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory [v5] In-Reply-To: References: Message-ID: On Fri, 20 Oct 2023 12:49:46 GMT, Johan Sjölen wrote: >> I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? >> >> 1. Moved all the nmt source code from services/ to nmt/ >> 2. Renamed all the include statements and sorted them >> 3. Fixed the include guards > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Fix messed up include Changes requested by stefank (Reviewer). src/hotspot/share/nmt/memTracker.inline.hpp line 30: > 28: > 29: #include "nmt/mallocTracker.inline.hpp" > 30: #include "nmt/memTracker.hpp" Bonus points if you fix the include for this .inline.hpp file.
Suggestion: #include "nmt/memTracker.hpp" #include "nmt/mallocTracker.inline.hpp" src/hotspot/share/nmt/nmtPreInit.hpp line 35: > 33: #include "utilities/macros.hpp" > 34: > 35: #ifdef ASSERT The blank line at 34 is not following the style for our conditional includes. Remove it, or better yet skip conventionalize the include of runtime/atomic.hpp since it just adds to noise to the file. ------------- PR Review: https://git.openjdk.org/jdk/pull/16276#pullrequestreview-1691967144 PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1368296670 PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1368299406 From stefank at openjdk.org Mon Oct 23 08:44:42 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 23 Oct 2023 08:44:42 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory [v6] In-Reply-To: References: Message-ID: On Mon, 23 Oct 2023 08:34:59 GMT, Johan Sjölen wrote: >> I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? >> >> 1. Moved all the nmt source code from services/ to nmt/ >> 2. Renamed all the include statements and sorted them >> 3. Fixed the include guards > > Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge remote-tracking branch 'origin/master' into move-nmt > - Fix messed up include > - Missed this include > - Merge remote-tracking branch 'origin/master' into move-nmt > - Fixed reviewed changes > - Move NMT to its own subdirectory Changes requested by stefank (Reviewer).
src/hotspot/share/services/mallocLimit.cpp line 32: > 30: #include "runtime/globals.hpp" > 31: #include "services/mallocLimit.hpp" > 32: #include "nmt/nmtCommon.hpp" Sort order test/hotspot/gtest/nmt/test_nmt_cornercases.cpp line 30: > 28: #include "nmt/mallocTracker.hpp" > 29: #include "runtime/os.hpp" > 30: #include "nmt/memTracker.hpp" Sort order ------------- PR Review: https://git.openjdk.org/jdk/pull/16276#pullrequestreview-1692002562 PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1368319264 PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1368318877 From stefank at openjdk.org Mon Oct 23 08:44:43 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 23 Oct 2023 08:44:43 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory In-Reply-To: References: Message-ID: On Fri, 20 Oct 2023 11:31:11 GMT, Johan Sjölen wrote: > For the gtest source files I separated the includes in a consistent manner, they all look like this pattern now: That's not what I see in the latest patch. Could you revert that separation and then we can consider that style change in a separate RFE? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16276#issuecomment-1774703929 From thartmann at openjdk.org Mon Oct 23 09:03:37 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 23 Oct 2023 09:03:37 GMT Subject: RFR: 8317689: [JVMCI] include error message when CreateJavaVM in libgraal fails [v2] In-Reply-To: References: Message-ID: On Wed, 11 Oct 2023 21:11:41 GMT, Doug Simon wrote: >> Creating a new libgraal isolate can fail for a number of reasons. Currently, all that one sees on such a failure is a numeric error code.
For example: >> >> >> 2096 20291 4 java.lang.CharacterData::of (136 bytes) >> 2096 20291 4 java.lang.CharacterData::of (136 bytes) COMPILE SKIPPED: Error attaching to libjvmci (err: -1000000024) >> >> >> Native Image has been [enhanced](https://github.com/oracle/graal/blob/14ca57efd35941a3b60c6224285ad8153f77059c/substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jni/functions/JNIInvocationInterface.java#L209-L214) to return an error message along with an error code by a non-standard `_createvm_errorstr` argument passed to the `CreateJavaVM` JNI invocation interface function: >> >> >> |--------------------|-----------------------------------------------------------------------------------| >> | _createvm_errorstr | extraInfo is a "const char**" value. | >> | | If CreateJavaVM returns non-zero, then extraInfo is assigned a newly malloc'ed | >> | | 0-terminated C string describing the error if a description is available, | >> | | otherwise extraInfo is set to null. | >> |--------------------|-----------------------------------------------------------------------------------| >> >> >> This PR updates JVMCI to take advantage of this Native Image enhancement. >> >> This is sample `-XX:+PrintCompilation` output from testing this PR on libgraal: >> >> 2096 20291 4 java.lang.CharacterData::of (136 bytes) >> 2096 20291 4 java.lang.CharacterData::of (136 bytes) COMPILE SKIPPED: Error attaching to libjvmci (err: -1000000024, Image page size is incompatible with run-time page size. Rebuild image with -H:PageSize=[pagesize] to set appropriately.) > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > renamed _strerror to _createvm_errorstr Looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). 
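The `_createvm_errorstr` contract quoted above (the caller passes a `const char**` through `extraInfo`; on failure the callee stores a newly malloc'ed, 0-terminated string there, or null) can be illustrated with a small mock. This is only a sketch under stated assumptions: `VmOption`, `mock_create_vm` and `attach_failure_message` are hypothetical stand-ins, not the JNI or JVMCI code, and the error code and message are the ones from the sample output in the PR description.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for a VM option: a name plus an opaque pointer.
struct VmOption {
  const char* option_string;
  void* extra_info;
};

// Mock of a VM creation call that always fails (assumption: illustration only).
// If the caller passed "_createvm_errorstr", extra_info is treated as a
// "const char**" and receives a malloc'ed description of the failure.
inline int mock_create_vm(VmOption* options, int n_options) {
  const int err = -1000000024;
  for (int i = 0; i < n_options; i++) {
    if (std::strcmp(options[i].option_string, "_createvm_errorstr") == 0) {
      const char* text = "Image page size is incompatible with run-time page size.";
      char* copy = static_cast<char*>(std::malloc(std::strlen(text) + 1));
      if (copy != nullptr) {
        std::strcpy(copy, text);
      }
      // Stays null if no description could be allocated.
      *static_cast<const char**>(options[i].extra_info) = copy;
    }
  }
  return err;
}

// Caller side: request the error string, format it next to the numeric code,
// then free it, since the callee allocated it with malloc.
inline std::string attach_failure_message() {
  const char* errstr = nullptr;
  VmOption opt{"_createvm_errorstr", &errstr};
  int result = mock_create_vm(&opt, 1);
  assert(result != 0);  // the mock always fails
  std::string msg = "Error attaching to libjvmci (err: " + std::to_string(result);
  if (errstr != nullptr) {
    msg += ", ";
    msg += errstr;
    std::free(const_cast<char*>(errstr));
  }
  msg += ")";
  return msg;
}
```

The point of the sketch is the ownership rule: the string is optional, and when present it belongs to the caller, who must release it with `free`.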
PR Review: https://git.openjdk.org/jdk/pull/16086#pullrequestreview-1692043289 From xgong at openjdk.org Mon Oct 23 09:05:28 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 23 Oct 2023 09:05:28 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 09:45:01 GMT, Andrew Haley wrote: > This looks good. As far as I can tell the choice you've made of accuracy matches what we need to meet the spec. I'm very nervous about binding ourselves to a specific version of the SLEEF ABI, because Java releases are maintained for decades, and we don't want to be dependent on other projects. > > We'll have to make a plan for version evolution. Thanks! I agree that this is a somewhat short term close-the-gap solution, and bundling a good library makes more sense. Also, as John mentioned, implementing those math APIs using Vector API itself, sounds very desirable in the future. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1774740285 From xgong at openjdk.org Mon Oct 23 09:05:30 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 23 Oct 2023 09:05:30 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF In-Reply-To: References: <6OkPww11l85K0HiU01s6TE6n-1PLTHjZqHSJ1CTSJU8=.f1931736-312f-4c15-915e-25b2a6352c73@github.com> Message-ID: On Wed, 18 Oct 2023 09:37:52 GMT, Andrew Haley wrote: >> Sounds reasonable. Thanks a lot for the reminder! > > Hard-coding the libsleef ABI version into OpenJDK is a code smell. For now I suppose it'll do, but we need a better strategy going forward, perhaps involving a bundled library. Yes, that's true. I will consider it. Thanks for the advice! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1368348767 From xgong at openjdk.org Mon Oct 23 09:10:06 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 23 Oct 2023 09:10:06 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: Message-ID: > Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). > > SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. > > To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. > > Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. 
> > [1] https://github.com/openjdk/jdk/pull/3638 > [2] https://sleef.org/ > [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ > [4] https://packages.debian.org/bookworm/libsleef3 > [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Disable sleef by default - Merge 'jdk:master' into JDK-8312425 - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16234/files - new: https://git.openjdk.org/jdk/pull/16234/files/2a7730d6..f2098d4e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16234&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16234&range=00-01 Stats: 22082 lines in 862 files changed: 14460 ins; 3885 del; 3737 mod Patch: https://git.openjdk.org/jdk/pull/16234.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16234/head:pull/16234 PR: https://git.openjdk.org/jdk/pull/16234 From xgong at openjdk.org Mon Oct 23 09:10:06 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 23 Oct 2023 09:10:06 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v2] In-Reply-To: <6OkPww11l85K0HiU01s6TE6n-1PLTHjZqHSJ1CTSJU8=.f1931736-312f-4c15-915e-25b2a6352c73@github.com> References: <6OkPww11l85K0HiU01s6TE6n-1PLTHjZqHSJ1CTSJU8=.f1931736-312f-4c15-915e-25b2a6352c73@github.com> Message-ID: On Wed, 18 Oct 2023 08:18:45 GMT, Xiaohong Gong wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8555: >> >>> 8553: } else { >>> 8554: if (FLAG_IS_DEFAULT(UseSleefLib)) { >>> 8555: log_info(library)("Fail to load sleef library!"); >> >> The library name being looked up is probably useful here too. > > Thanks for the review! 
I will address this in next commit. It is disabled (i.e. `""`) by default. So the library name is not needed here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1368354227 From shade at openjdk.org Mon Oct 23 09:13:40 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Oct 2023 09:13:40 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v3] In-Reply-To: References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: On Fri, 20 Oct 2023 12:03:55 GMT, Aleksey Shipilev wrote: >> See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. >> >> Unfortunately, we cannot test these apart from the existing gtest. >> >> Additional testing: >> - [x] linux-x86-server-fastdebug, atomic tests pass >> - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Protect 64-bit tests with supports_cx8() Yes, it is confusing. `Atomic` docs still mention `supports_cx8` as the pre-check for using 64-bit atomics, which gets extra confusing if it was only intended to serve as availability check for Java accesses: https://github.com/openjdk/jdk/blob/729f4c5d141cdc272249c4c69efd05f96a654137/src/hotspot/share/runtime/atomic.hpp#L58-L62 So I would like to go forward with this PR. The only question that remains for me is whether to protect the gtest cases with `supports_cx8` (satisfying the `Atomic` contract), or drop the checks there, thus giving the early warning for platform maintainers that they are actually expected to implement 64-bit atomics regardless. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1774754901 From eosterlund at openjdk.org Mon Oct 23 09:35:55 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 23 Oct 2023 09:35:55 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v4] In-Reply-To: References: Message-ID: > In fact, there is a current race in the nmethod entry barriers, where what we are doing violates the AMD APM (cf. APM volume 2 section 7.6.1 https://www.amd.com/system/files/TechDocs/24593.pdf). > In particular, if the compare instruction of the nmethod entry barrier is not yet patched and we call a slow path on thread 1, then before taking the nmethod entry lock, another thread 2 could fix and disarm the nmethod. Then thread 1 will observe *data* suggesting the nmethod has been patched, but never re-executes the patched compare (which might indeed still be stale), hence not qualifying for asynchronous cross modifying code, and neither do we run a serializing cpuid instruction, qualifying for synchronous cross modifying code. In this scenario, we can indeed start executing the nmethod instructions, while observing inconsistent concurrent patching effects, where some instructions will be updated and some not. > > The following patch ensures that x86 nmethod entry barriers execute cross modifying fence after calling into the VM, where another thread could have disarmed the nmethod. I also ensured the other platforms perform their fencing after the VM call, instead of before - including a cross_modify_fence in the shared code for OSR nmethod entries. While fencing before will flush out the instruction pipeline, and it shouldn't be populated with problematic instructions until after we start executing the nmethod again, it feels unnecessary to fence on the wrong side of the modifications it wishes to guard, and hence not strictly following the synchronous cross modifying fence recipe. 
> > I'm currently running tier1-5 and running performance testing in aurora. In the interest of time, I'm opening this PR before getting the final result, and will report the results when they come in. Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: Add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14543/files - new: https://git.openjdk.org/jdk/pull/14543/files/034c94be..f3c15b91 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14543&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14543&range=02-03 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14543.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14543/head:pull/14543 PR: https://git.openjdk.org/jdk/pull/14543 From eosterlund at openjdk.org Mon Oct 23 09:35:57 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 23 Oct 2023 09:35:57 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v3] In-Reply-To: References: Message-ID: On Fri, 20 Oct 2023 21:00:49 GMT, Dean Long wrote: > If there is no impact on performance, especially on aarch64, then I'm OK with it as is, or maybe with a comment in the code saying it could be made conditional if needed. Thank you. I added a comment describing that it can be made conditional if needed in the future.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14543#issuecomment-1774793777 From azafari at openjdk.org Mon Oct 23 09:41:42 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Mon, 23 Oct 2023 09:41:42 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v6] In-Reply-To: References: <-ALHHMcYPfciG6g2sOT-XIEVTf1pA6XXa93eNXQamD4=.88329bf9-627b-4d78-93fc-299550fc2be0@github.com> Message-ID: <7kjUHAm2miHLca5yBML-XS86qncel6Bwne7gLbDitZI=.ed191be2-d263-4fed-807a-55489e86c0db@github.com> On Fri, 20 Oct 2023 08:37:11 GMT, Afshin Zafari wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> first arg of `find` casted to `uint*` > > @dholmes-ora, any more comments? > @afshin-zafari could you merge (not rebase!) with current master please to ensure I can see the absolute latest version of this patch in relation to master. Thanks. It is up to date with master and nothing to push. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1774803631 From eosterlund at openjdk.org Mon Oct 23 09:43:39 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 23 Oct 2023 09:43:39 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v3] In-Reply-To: References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: <3p9AqKonDPrvTKZw69QHvGDRychY1m9a5qdM8xY76-0=.51241702-8489-4592-a54b-78c30344f036@github.com> On Mon, 23 Oct 2023 09:10:50 GMT, Aleksey Shipilev wrote: > Yes, it is confusing. 
`Atomic` docs still mention `supports_cx8` as the pre-check for using 64-bit atomics, which gets extra confusing if it was only intended to serve as availability check for Java accesses: > > https://github.com/openjdk/jdk/blob/729f4c5d141cdc272249c4c69efd05f96a654137/src/hotspot/share/runtime/atomic.hpp#L58-L62 > > So I would like to go forward with this PR. The only question that remains for me is whether to protect the gtest cases with `supports_cx8` (satisfying the `Atomic` contract), or drop the checks there, thus giving the early warning for platform maintainers that they are actually expected to implement 64-bit atomics regardless. It would be nice if Atomic used a PlatformMutex or something internally, for platforms that don't have supports_cx8, so we could just assume that this stuff works across all platforms, just maybe "a bit" slower on some platforms. If we wanted to be really fancy, we could have an array that is say 8 * number of CPUs large, where addresses are hashed to a PlatformMutex to reduce contention a bit. But perhaps that's more than you bargained for with this PR! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1774807729 From kbarrett at openjdk.org Mon Oct 23 09:50:39 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 23 Oct 2023 09:50:39 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v3] In-Reply-To: References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: On Fri, 20 Oct 2023 12:03:55 GMT, Aleksey Shipilev wrote: >> See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. >> >> Unfortunately, we cannot test these apart from the existing gtest. 
>> >> Additional testing: >> - [x] linux-x86-server-fastdebug, atomic tests pass >> - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Protect 64-bit tests with supports_cx8() Throughout, Atomic::PlatformMumble is intended to be used in the implementation of Atomic::Mumble, and isn't intended to be used elsewhere. I think intent is even adhered to currently. This proposed change violates that in a bunch of places. ------------- Changes requested by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16252#pullrequestreview-1692131480 From simonis at openjdk.org Mon Oct 23 10:41:40 2023 From: simonis at openjdk.org (Volker Simonis) Date: Mon, 23 Oct 2023 10:41:40 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v3] In-Reply-To: References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: On Mon, 23 Oct 2023 09:47:21 GMT, Kim Barrett wrote: > Throughout, Atomic::PlatformMumble is intended to be used in the implementation of Atomic::Mumble, and isn't intended to be used elsewhere. I think intent is even adhered to currently. This proposed change violates that in a bunch of places. @kimbarrett, do you suggest to use `Atomic::cmpxchg()` instead of `PlatformCmpxchg()` in `Atomic::XchgUsingCmpxchg::operator()` and `Atomic::AddUsingCmpxchg::fetch_then_add()`? I think that would be reasonable. Otherwise I don't understand your comment as this PR follows the same pattern you've used when introducing [atomic bitset functions](https://bugs.openjdk.org/browse/JDK-8293117). Or am I missing something? 
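For readers following this thread, the fallback shape being debated (add and xchg built only on a public compare-exchange and load) can be sketched as below. This is a minimal sketch, not the HotSpot code: `std::atomic` stands in for the public `Atomic` API, and `fetch_then_add_using_cmpxchg` and `xchg_using_cmpxchg` are hypothetical names that only echo the ones mentioned above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Atomically adds add_value to *dest and returns the value held before the add.
// Built only on load and compare-exchange, so it works wherever those exist.
inline uint64_t fetch_then_add_using_cmpxchg(std::atomic<uint64_t>* dest, uint64_t add_value) {
  assert(dest != nullptr);
  uint64_t old_value = dest->load(std::memory_order_relaxed);
  // On failure compare_exchange_weak stores the current value into old_value,
  // so each retry recomputes the sum from fresh data.
  while (!dest->compare_exchange_weak(old_value, old_value + add_value)) {
  }
  return old_value;
}

// Atomically replaces *dest with new_value and returns the previous value.
inline uint64_t xchg_using_cmpxchg(std::atomic<uint64_t>* dest, uint64_t new_value) {
  assert(dest != nullptr);
  uint64_t old_value = dest->load(std::memory_order_relaxed);
  while (!dest->compare_exchange_weak(old_value, new_value)) {
  }
  return old_value;
}
```

Both loops go through the same public operations a normal caller would use, which is the layering question raised in the review.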
------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1774911476 From sjohanss at openjdk.org Mon Oct 23 10:43:42 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Mon, 23 Oct 2023 10:43:42 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v7] In-Reply-To: <-p-qDbsbPtk_7h79043ct8sTCcM6d-TiD62zh4c7Q0Q=.9691fe3d-0a85-4f6f-a876-9761a240f866@github.com> References: <-p-qDbsbPtk_7h79043ct8sTCcM6d-TiD62zh4c7Q0Q=.9691fe3d-0a85-4f6f-a876-9761a240f866@github.com> Message-ID: On Fri, 20 Oct 2023 05:54:06 GMT, Liming Liu wrote: >> As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). >> >> Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: >> >> >> >> >> >> >> >> >> >> >> >>
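>> | Kernel | -XX:-TransparentHugePages, Unpatched | -XX:-TransparentHugePages, Patched | -XX:+TransparentHugePages, Unpatched | -XX:+TransparentHugePages, Patched |
>> |--------|------|------|------|------|
>> | 4.18 | 11.30 | 11.30 | 0.25 | 0.25 |
>> | 5.13 | 0.22 | 0.22 | 3.42 | 3.42 |
>> | 6.1 | 0.27 | 0.33 | 3.54 | 0.33 |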
> > Liming Liu has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Make the jtreg test check the usage of THP I did check `numa_maps` and `smap` under `/proc/{PID}` when allocating a very large object with and without pretouching. And without pretouching the pages are not marked as dirty in `numa_maps` before the actual allocation and in `smaps` the `Private_Hugetlb:` field was not covering the whole heap when pretouch was not used and increased to the amount of the allocation once it happened. I also measured the time for doing the large allocation. There were some run to run variations but not big with large pages turned on. +UseLargePages +AlwaysPreTouch: 444ms +UseLargePages -AlwaysPreTouch: 474ms -UseLargePages +AlwaysPreTouch: 450ms (much higher variation) -UseLargePages -AlwaysPreTouch: 1.3s (also higher variation) So from what I can tell it is really needed @tstuefe. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15781#issuecomment-1774915217 From kbarrett at openjdk.org Mon Oct 23 10:51:35 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 23 Oct 2023 10:51:35 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v3] In-Reply-To: References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: On Mon, 23 Oct 2023 10:38:44 GMT, Volker Simonis wrote: > > Throughout, Atomic::PlatformMumble is intended to be used in the implementation of Atomic::Mumble, and isn't intended to be used elsewhere. I think intent is even adhered to currently. This proposed change violates that in a bunch of places. 
> > @kimbarrett, do you suggest to use `Atomic::cmpxchg()` instead of `PlatformCmpxchg()` in `Atomic::XchgUsingCmpxchg::operator()` and `Atomic::AddUsingCmpxchg::fetch_then_add()`? I think that would be reasonable. Otherwise I don't understand your comment as this PR follows the same pattern you've used when introducing [atomic bitset functions](https://bugs.openjdk.org/browse/JDK-8293117). Or am I missing something? Yes, and Atomic::load instead of PlatformLoad. So actually follow the model of the atomic bitset functions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1774927356 From stuefe at openjdk.org Mon Oct 23 10:57:41 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 23 Oct 2023 10:57:41 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v7] In-Reply-To: References: <-p-qDbsbPtk_7h79043ct8sTCcM6d-TiD62zh4c7Q0Q=.9691fe3d-0a85-4f6f-a876-9761a240f866@github.com> Message-ID: <9d6_walPOa7qdvzzavri4IwjxTG8ZmzpKZn530El-fA=.4e56f61c-c011-401d-8dc5-9ac324c2487a@github.com> On Mon, 23 Oct 2023 10:41:03 GMT, Stefan Johansson wrote: > I did check `numa_maps` and `smap` under `/proc/{PID}` when allocating a very large object with and without pretouching. And without pretouching the pages are not marked as dirty in `numa_maps` before the actual allocation and in `smaps` the `Private_Hugetlb:` field was not covering the whole heap when pretouch was not used and increased to the amount of the allocation once it happened. > > I also measured the time for doing the large allocation. There were some run to run variations but not big with large pages turned on. > > ``` > +UseLargePages +AlwaysPreTouch: 444ms > +UseLargePages -AlwaysPreTouch: 474ms > -UseLargePages +AlwaysPreTouch: 450ms (much higher variation) > -UseLargePages -AlwaysPreTouch: 1.3s (also higher variation) > ``` > > So from what I can tell it is really needed @tstuefe. Oh, okay. Good to know. Thank you for verifying. 
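The approach under review in the JDK-8315923 thread, populating a range with `madvise` and `MADV_POPULATE_WRITE` where the kernel supports it (5.14 or later) and touching pages otherwise, can be sketched roughly as follows. This is a hedged illustration for Linux only: `pretouch` is a hypothetical helper rather than HotSpot's implementation, and the fallback uses a plain volatile read and write per page instead of the atomic add of zero, so it is only safe when no other thread writes the range concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23  // value from the Linux UAPI headers, for older libc headers
#endif

// Pre-faults [start, start + bytes) for writing.
// Returns true if the kernel populated the range via madvise,
// false if the per-page fallback loop was used instead.
inline bool pretouch(void* start, size_t bytes) {
  assert(start != nullptr && bytes > 0);
  if (madvise(start, bytes, MADV_POPULATE_WRITE) == 0) {
    return true;
  }
  // Fallback (e.g. unsupported advice on older kernels): write back the byte
  // already present on each page so contents are preserved.
  const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  volatile char* p = static_cast<volatile char*>(start);
  for (size_t off = 0; off < bytes; off += page) {
    p[off] = p[off];
  }
  return false;
}
```

Either path leaves the contents of the range unchanged; the difference is whether the kernel faults the whole range in one call or one page at a time.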
------------- PR Comment: https://git.openjdk.org/jdk/pull/15781#issuecomment-1774935740 From mcimadamore at openjdk.org Mon Oct 23 11:20:43 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 23 Oct 2023 11:20:43 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v11] In-Reply-To: <2mpVRiA1idUeB0AN8eAtghk_sGFu90tZyTvpYOPaBq4=.72ab5bb9-3095-4352-9aa6-9e59151e482e@github.com> References: <2mpVRiA1idUeB0AN8eAtghk_sGFu90tZyTvpYOPaBq4=.72ab5bb9-3095-4352-9aa6-9e59151e482e@github.com> Message-ID: <4sBQQH1b4696lwpEUQHfClgixwFfXnhoS-_ERbEwzS0=.1ae52882-4b5e-43a9-abbd-7cde799285d5@github.com> On Sat, 21 Oct 2023 12:04:10 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. 
>> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 46 commits: > > - Merge branch 'master' into AllowHeapNoLock > - bump up argument counts in TestLargeStub to their maximum > - s390 updates > - add stub size stress test for allowHeap > - RISC-V impl > - remove leftover debug log line > - add s390 support > - add PPC impl > - add missing file > - Add xor benchmark > - ... and 36 more: https://git.openjdk.org/jdk/compare/a876beb6...2e00beff Looking more holistically at the Linker javadoc, there seems to be something missing in that we never say that passing heap segments to downcall is not supported? 
We have clarified the documentation w.r.t. by-value structs here: https://github.com/openjdk/panama-foreign/pull/881 But I can't find anything for by-reference structs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1774971391 From jvernee at openjdk.org Mon Oct 23 11:32:35 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 23 Oct 2023 11:32:35 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v11] In-Reply-To: <2mpVRiA1idUeB0AN8eAtghk_sGFu90tZyTvpYOPaBq4=.72ab5bb9-3095-4352-9aa6-9e59151e482e@github.com> References: <2mpVRiA1idUeB0AN8eAtghk_sGFu90tZyTvpYOPaBq4=.72ab5bb9-3095-4352-9aa6-9e59151e482e@github.com> Message-ID: On Sat, 21 Oct 2023 12:04:10 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. 
>> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 46 commits: > > - Merge branch 'master' into AllowHeapNoLock > - bump up argument counts in TestLargeStub to their maximum > - s390 updates > - add stub size stress test for allowHeap > - RISC-V impl > - remove leftover debug log line > - add s390 support > - add PPC impl > - add missing file > - Add xor benchmark > - ... and 36 more: https://git.openjdk.org/jdk/compare/a876beb6...2e00beff > Looking more holistically at the Linker javadoc, there seems to be something missing in that we never say that passing heap segments to downcall is not supported? 
We have clarified the documentation w.r.t. by-value structs here: > > [openjdk/panama-foreign#881](https://github.com/openjdk/panama-foreign/pull/881) > > But I can't find anything for by-reference structs. We added the check as part of: https://github.com/openjdk/panama-foreign/pull/737 I don't see any linker doc update there. I don't think we have anything. We could add a line to `downcallHandle` that says that the returned handle throws an `IllegalArgumentException` if a heap segment is passed, unless heap segments are explicitly allowed through the `critical(true)` option. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1774989773 From mcimadamore at openjdk.org Mon Oct 23 11:38:36 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 23 Oct 2023 11:38:36 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v11] In-Reply-To: References: <2mpVRiA1idUeB0AN8eAtghk_sGFu90tZyTvpYOPaBq4=.72ab5bb9-3095-4352-9aa6-9e59151e482e@github.com> Message-ID: On Mon, 23 Oct 2023 11:29:28 GMT, Jorn Vernee wrote: > We could add a line to `downcallHandle` that says that the returned handle throws an `IllegalArgumentException` if a heap segment is passed, unless heap segments are explicitly allowed through the `critical(true)` option. Something like that would be ok, yes.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1774999918 From shade at openjdk.org Mon Oct 23 14:17:58 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Oct 2023 14:17:58 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v4] In-Reply-To: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: <3jYeAkN2merJQyqdcuq7zbiAgTX1Md6EJqu9WZ_nGkA=.8da47a48-94cb-41f0-a4f2-378e611ebc0b@github.com> > See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. > > Unfortunately, we cannot test these apart from the existing gtest. > > Additional testing: > - [x] linux-x86-server-fastdebug, atomic tests pass > - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Use public methods instead of Platform* ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16252/files - new: https://git.openjdk.org/jdk/pull/16252/files/e7677da5..bbc64206 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16252&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16252&range=02-03 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16252.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16252/head:pull/16252 PR: https://git.openjdk.org/jdk/pull/16252 From shade at openjdk.org Mon Oct 23 14:29:39 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Oct 2023 14:29:39 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} 
on 32-bit platforms [v3] In-Reply-To: <3p9AqKonDPrvTKZw69QHvGDRychY1m9a5qdM8xY76-0=.51241702-8489-4592-a54b-78c30344f036@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> <3p9AqKonDPrvTKZw69QHvGDRychY1m9a5qdM8xY76-0=.51241702-8489-4592-a54b-78c30344f036@github.com> Message-ID: On Mon, 23 Oct 2023 09:40:47 GMT, Erik Österlund wrote: > It would be nice if Atomic used a PlatformMutex or something internally, for platforms that don't have supports_cx8, so we could just assume that this stuff works across all platforms, just maybe "a bit" slower on some platforms. Yes, providing the locked implementation was my initial intent as well, but now that I looked into it, it seems to be too much hassle for little, if any, gain. In other words, we can already assume 8-byte atomics work on all platforms we build for. Even Zero -- which runs on obscurest platforms imaginable -- goes into GCC built-ins for `PlatformCmpxchg<8>`, `PlatformXchg<8>` and `PlatformAdd<8>`! So there is already a better path than locking. Note that I only found ARMv6 "issue" due to a test bug. `PlatformXchg<8>` is actually implemented for ARM v6 with the help of kernel helpers. So if we do locking/gcc-builtins implementation for the case where kernel helpers are not available, it would require some massaging for how ARM32 platform atomics are arranged.
From which I hope to weasel out in this PR :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1775326377 From shade at openjdk.org Mon Oct 23 14:29:42 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Oct 2023 14:29:42 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v4] In-Reply-To: <3jYeAkN2merJQyqdcuq7zbiAgTX1Md6EJqu9WZ_nGkA=.8da47a48-94cb-41f0-a4f2-378e611ebc0b@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> <3jYeAkN2merJQyqdcuq7zbiAgTX1Md6EJqu9WZ_nGkA=.8da47a48-94cb-41f0-a4f2-378e611ebc0b@github.com> Message-ID: <7tmOeoOhzsuYhs_A3WwnAeiuHSOU_r8DyLbGTCEGNX0=.8c681160-da63-4d1a-a480-cf2dafd04a51@github.com> On Mon, 23 Oct 2023 14:17:58 GMT, Aleksey Shipilev wrote: >> See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. >> >> Unfortunately, we cannot test these apart from the existing gtest. >> >> Additional testing: >> - [x] linux-x86-server-fastdebug, atomic tests pass >> - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Use public methods instead of Platform* > Yes, and Atomic::load instead of PlatformLoad. So actually follow the model of the atomic bitset functions. Right. Done in new commit. The previous version was from the time I did these fallbacks straight in platform atomic files. But in the shared code, we should definitely go for the non-platform methods. I am re-running tests, but expect no trouble. 
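For readers who want the shape of such a fallback without opening the patch, here is a rough Java analogy, not the HotSpot code: 64-bit add and exchange built from nothing but an atomic load and a compare-and-set, retried until the compare-and-set succeeds. `AtomicLong` stands in for the public `Atomic::load`/`Atomic::cmpxchg` entry points mentioned above, and the class and method names are invented for this sketch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration class; mirrors the idea, not the patch itself.
class CasFallbackSketch {

    // Add built from load + compare-and-set; returns the updated value.
    static long addAndFetch(AtomicLong dest, long addValue) {
        while (true) {
            long old = dest.get();              // plain atomic load
            long updated = old + addValue;
            if (dest.compareAndSet(old, updated)) {
                return updated;                 // nobody raced us, done
            }
            // Another thread changed the value in between: retry.
        }
    }

    // Exchange built from load + compare-and-set; returns the previous value.
    static long xchg(AtomicLong dest, long newValue) {
        while (true) {
            long old = dest.get();
            if (dest.compareAndSet(old, newValue)) {
                return old;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong counter = new AtomicLong();
        long step = 1L << 32;                   // forces use of the high 32 bits
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) {
                    addAndFetch(counter, step);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("no lost updates: " + (counter.get() == 4L * 10_000 * step));
        System.out.println("previous value from xchg: " + xchg(counter, 0L));
    }
}
```

The loop only needs a working 8-byte compare-and-set, which is why no separate locked implementation is required on platforms that already provide one.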
------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1775329068 From mdoerr at openjdk.org Mon Oct 23 15:05:43 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 23 Oct 2023 15:05:43 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: <0sUz142mQSpOZo16aXReYJIC5DifiEeqA8XNsT1LDww=.6e36c9a4-dcd9-46c6-bd54-485af0e01349@github.com> References: <0sUz142mQSpOZo16aXReYJIC5DifiEeqA8XNsT1LDww=.6e36c9a4-dcd9-46c6-bd54-485af0e01349@github.com> Message-ID: On Fri, 13 Oct 2023 20:53:48 GMT, Dean Long wrote: >> Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Pass may_be_unordered information to lightweight_unlock. >> - Merge remote-tracking branch 'origin' into 8316746_lock_stack >> - Add x86_64 and aarch64 implementation. >> - 8316746: Top of lock-stack does not match the unlocked object > > If the locks are inflated then you won't hit the top of stack check in the fast path. > Can you reproduce the StepEvent problem with C1 using -XX:TieredStopAtLevel=3? @dean-long: I have attached a synthetic reproducer to the JBS issue which simply checks the lock order when reaching the OSR entry. I'd appreciate it if you could take a look. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1775409983 From mli at openjdk.org Mon Oct 23 15:51:44 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 23 Oct 2023 15:51:44 GMT Subject: RFR: 8318222: RISC-V: C2 CmpU3 Message-ID: Hi, Can you review the change to add intrinsic for CmpU3 and CmpUL3? Thanks! 
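As background for reviewers less familiar with these nodes: the intrinsic targets the three-way unsigned comparison behind `Integer.compareUnsigned` and `Long.compareUnsigned`. The sketch below only spells out that semantics in plain Java by flipping the sign bit so that a signed comparison gives the unsigned ordering. The names `cmpU3` and `cmpUL3` are borrowed from the node names purely for illustration; this is not the RISC-V code in the PR.

```java
// Hypothetical illustration class, not code from the PR.
class CompareUnsignedSketch {

    // Three-way unsigned compare of two ints: -1, 0 or 1.
    static int cmpU3(int x, int y) {
        int a = x ^ Integer.MIN_VALUE;   // flip sign bit: unsigned order becomes signed order
        int b = y ^ Integer.MIN_VALUE;
        return a < b ? -1 : (a == b ? 0 : 1);
    }

    // Three-way unsigned compare of two longs: -1, 0 or 1.
    static int cmpUL3(long x, long y) {
        long a = x ^ Long.MIN_VALUE;
        long b = y ^ Long.MIN_VALUE;
        return a < b ? -1 : (a == b ? 0 : 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        boolean agrees = true;
        for (int x : samples) {
            for (int y : samples) {
                // Compare signs only: the library method documents the sign, not the magnitude.
                agrees &= Integer.signum(Integer.compareUnsigned(x, y)) == cmpU3(x, y);
            }
        }
        System.out.println("matches Integer.compareUnsigned on samples: " + agrees);
    }
}
```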
## Test

### functionality
pass jtreg test:
jdk/java/lang/Long/Unsigned.java, jdk/java/lang/Integer/Unsigned.java

### performance
#### Long
**before**:
Benchmark                         (size)  Mode  Cnt     Score     Error  Units
Longs.compareUnsignedDirect          500  avgt    5  1454.789 ± 129.557  ns/op
Longs.compareUnsignedIndirect        500  avgt    5  1410.146 ± 120.017  ns/op

**after**:
Benchmark                         (size)  Mode  Cnt     Score     Error  Units
Longs.compareUnsignedDirect          500  avgt    5  1286.129 ±   8.441  ns/op
Longs.compareUnsignedIndirect        500  avgt    5   993.490 ±   0.840  ns/op

#### Integer
**before**:
Benchmark                         (size)  Mode  Cnt     Score     Error  Units
Integers.compareUnsignedDirect       500  avgt    5  1611.753 ±   0.700  ns/op
Integers.compareUnsignedIndirect     500  avgt    5  1775.093 ±   1.520  ns/op

**after**:
Benchmark                         (size)  Mode  Cnt     Score     Error  Units
Integers.compareUnsignedDirect       500  avgt    5  1159.351 ±   0.601  ns/op
Integers.compareUnsignedIndirect     500  avgt    5   776.185 ±   0.924  ns/op

-------------

Commit messages:
 - indent
 - rename from 'ui' to 'uw'
 - Merge branch 'master' into compareUnsigned
 - fix indent
 - Initial commit

Changes: https://git.openjdk.org/jdk/pull/16314/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16314&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8318222
  Stats: 67 lines in 3 files changed: 63 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/16314.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16314/head:pull/16314

PR: https://git.openjdk.org/jdk/pull/16314

From never at openjdk.org Mon Oct 23 16:44:35 2023
From: never at openjdk.org (Tom Rodriguez)
Date: Mon, 23 Oct 2023 16:44:35 GMT
Subject: RFR: 8317689: [JVMCI] include error message when CreateJavaVM in libgraal fails [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 11 Oct 2023 21:11:41 GMT, Doug Simon wrote:

>> Creating a new libgraal isolate can fail for a number of reasons. Currently, all that one sees on such a failure is a numeric error code.
For example: >> >> >> 2096 20291 4 java.lang.CharacterData::of (136 bytes) >> 2096 20291 4 java.lang.CharacterData::of (136 bytes) COMPILE SKIPPED: Error attaching to libjvmci (err: -1000000024) >> >> >> Native Image has been [enhanced](https://github.com/oracle/graal/blob/14ca57efd35941a3b60c6224285ad8153f77059c/substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jni/functions/JNIInvocationInterface.java#L209-L214) to return an error message along with an error code by a non-standard `_createvm_errorstr` argument passed to the `CreateJavaVM` JNI invocation interface function: >> >> >> |--------------------|-----------------------------------------------------------------------------------| >> | _createvm_errorstr | extraInfo is a "const char**" value. | >> | | If CreateJavaVM returns non-zero, then extraInfo is assigned a newly malloc'ed | >> | | 0-terminated C string describing the error if a description is available, | >> | | otherwise extraInfo is set to null. | >> |--------------------|-----------------------------------------------------------------------------------| >> >> >> This PR updates JVMCI to take advantage of this Native Image enhancement. >> >> This is sample `-XX:+PrintCompilation` output from testing this PR on libgraal: >> >> 2096 20291 4 java.lang.CharacterData::of (136 bytes) >> 2096 20291 4 java.lang.CharacterData::of (136 bytes) COMPILE SKIPPED: Error attaching to libjvmci (err: -1000000024, Image page size is incompatible with run-time page size. Rebuild image with -H:PageSize=[pagesize] to set appropriately.) > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > renamed _strerror to _createvm_errorstr Marked as reviewed by never (Reviewer). 
-------------

PR Review: https://git.openjdk.org/jdk/pull/16086#pullrequestreview-1693027258

From rehn at openjdk.org Mon Oct 23 17:04:27 2023
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 23 Oct 2023 17:04:27 GMT
Subject: RFR: 8318222: RISC-V: C2 CmpU3
In-Reply-To:
References:
Message-ID:

On Mon, 23 Oct 2023 15:45:39 GMT, Hamlin Li wrote:

> Hi,
> Can you review the change to add intrinsic for CmpU3 and CmpUL3?
> Thanks!
>
> ## Test
>
> ### functionality
> pass jtreg test:
> jdk/java/lang/Long/Unsigned.java, jdk/java/lang/Integer/Unsigned.java
>
> ### performance
> #### Long
> **before**:
> Benchmark                         (size)  Mode  Cnt     Score     Error  Units
> Longs.compareUnsignedDirect          500  avgt    5  1454.789 ± 129.557  ns/op
> Longs.compareUnsignedIndirect        500  avgt    5  1410.146 ± 120.017  ns/op
>
> **after**:
> Benchmark                         (size)  Mode  Cnt     Score     Error  Units
> Longs.compareUnsignedDirect          500  avgt    5  1286.129 ±   8.441  ns/op
> Longs.compareUnsignedIndirect        500  avgt    5   993.490 ±   0.840  ns/op
>
> #### Integer
> **before**:
> Benchmark                         (size)  Mode  Cnt     Score     Error  Units
> Integers.compareUnsignedDirect       500  avgt    5  1611.753 ±   0.700  ns/op
> Integers.compareUnsignedIndirect     500  avgt    5  1775.093 ±   1.520  ns/op
>
> **after**:
> Benchmark                         (size)  Mode  Cnt     Score     Error  Units
> Integers.compareUnsignedDirect       500  avgt    5  1159.351 ±   0.601  ns/op
> Integers.compareUnsignedIndirect     500  avgt    5   776.185 ±   0.924  ns/op

Thank you, looks good!

-------------

Marked as reviewed by rehn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16314#pullrequestreview-1693066472

From dlong at openjdk.org Mon Oct 23 18:05:37 2023
From: dlong at openjdk.org (Dean Long)
Date: Mon, 23 Oct 2023 18:05:37 GMT
Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 23 Oct 2023 09:35:55 GMT, Erik Österlund wrote:

>> In fact, there is a current race in the nmethod entry barriers, where what we are doing violates the AMD APM (cf.
APM volume 2 section 7.6.1 https://www.amd.com/system/files/TechDocs/24593.pdf). >> In particular, if the compare instruction of the nmethod entry barrier is not yet patched and we call a slow path on thread 1, then before taking the nmethod entry lock, another thread 2 could fix and disarm the nmethod. Then thread 1 will observe *data* suggesting the nmethod has been patched, but never re-executes the patched compare (which might indeed still be stale), hence not qualifying for asynchronous cross modifying code, and neither do we run a serializing cpuid instruction, qualifying for synchronous cross modifying code. In this scenario, we can indeed start executing the nmethod instructions, while observing inconsistent concurrent patching effects, where some instructions will be updated and some not. >> >> The following patch ensures that x86 nmethod entry barriers execute cross modifying fence after calling into the VM, where another thread could have disarmed the nmethod. I also ensured the other platforms perform their fencing after the VM call, instead of before - including a cross_modify_fence in the shared code for OSR nmethod entries. While fencing before will flush out the instruction pipeline, and it shouldn't be populated with problematic instructions until after we start executing the nmethod again, it feels unnecessary to fence on the wrong side of the modifications it wishes to guard, and hence not strictly following the synchronous cross modifying fence recipe. >> >> I'm currently running tier1-5 and running performance testing in aurora. In the interest of time, I'm opening this PR before getting the final result, and will report the results when they come in. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Add comment Marked as reviewed by dlong (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/14543#pullrequestreview-1693183713 From simonis at openjdk.org Mon Oct 23 18:32:30 2023 From: simonis at openjdk.org (Volker Simonis) Date: Mon, 23 Oct 2023 18:32:30 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v4] In-Reply-To: <3jYeAkN2merJQyqdcuq7zbiAgTX1Md6EJqu9WZ_nGkA=.8da47a48-94cb-41f0-a4f2-378e611ebc0b@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> <3jYeAkN2merJQyqdcuq7zbiAgTX1Md6EJqu9WZ_nGkA=.8da47a48-94cb-41f0-a4f2-378e611ebc0b@github.com> Message-ID: On Mon, 23 Oct 2023 14:17:58 GMT, Aleksey Shipilev wrote: >> See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. >> >> Unfortunately, we cannot test these apart from the existing gtest. >> >> Additional testing: >> - [x] linux-x86-server-fastdebug, atomic tests pass >> - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Use public methods instead of Platform* Thanks, looks good to me. ------------- Marked as reviewed by simonis (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16252#pullrequestreview-1693236508 From Matthew.Carter at microsoft.com Mon Oct 23 18:59:32 2023 From: Matthew.Carter at microsoft.com (Mat Carter) Date: Mon, 23 Oct 2023 18:59:32 +0000 Subject: RFR: 8317562: [JFR] Compilation queue statistics [v2] In-Reply-To: <1hW4B6WGVoZg_Dob78Xnwz33rME20-1HlINbOwY9YsM=.f33470c8-e33f-42ac-b98f-9f94078ca2c3@github.com> References: <-O9AczApq9UKq3h7GrInmyZ-5Eap0wuE-bGQqwOrySA=.a31a9076-4629-4c9b-8a0e-8fe2778b9617@github.com> <1hW4B6WGVoZg_Dob78Xnwz33rME20-1HlINbOwY9YsM=.f33470c8-e33f-42ac-b98f-9f94078ca2c3@github.com> Message-ID: If there are no concerns with the JFR event, its initial settings or the test, and given that the small compiler changes have already been reviewed, would someone be kind enough to review and sponsor the commit? Thanks in advance, Mat From: hotspot-jfr-dev on behalf of Mat Carter Date: Wednesday, October 18, 2023 at 11:12 AM To: hotspot-dev at openjdk.org , hotspot-jfr-dev at openjdk.org Subject: Re: RFR: 8317562: [JFR] Compilation queue statistics [v2] On Wed, 18 Oct 2023 17:35:59 GMT, Vladimir Kozlov wrote: >> Mat Carter has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed return type and changed NULL to nullptr > > src/hotspot/share/compiler/compileBroker.cpp line 530: > >> 528: return _c2_compile_queue; >> 529: } >> 530: > > Note, `*_compiler_queue` could be `nullptr` if VM is built without C2 or C1 or when run with `-XX:-TieredCompilation` (only C2 is used) or with `-XX:TieredStopAtLevel={1,2,3}` (only C1 is used). > > Make sure you check it in JFR event. Thank you!
The JFR event does check for NULL (now nullptr) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1364317003 From zgu at openjdk.org Mon Oct 23 19:30:27 2023 From: zgu at openjdk.org (Zhengyu Gu) Date: Mon, 23 Oct 2023 19:30:27 GMT Subject: RFR: 8317466: Enable interpreter oopMapCache for concurrent GCs [v2] In-Reply-To: References: <39f-0nlOdQBABHr1cQOq7jITuRLzM9yLDUEUm1--0N8=.f3aeed11-7151-4885-8376-7d91ca84e8a7@github.com> Message-ID: On Thu, 19 Oct 2023 16:11:56 GMT, Leela Mohan Venati wrote: >> src/hotspot/share/gc/shenandoah/shenandoahVMOperations.cpp line 64: >> >>> 62: OopMapCache::cleanup_old_entries(); >>> 63: } >>> 64: >> >> Do you think, VM_ShenandoahFinalMarkStartEvac walks the stack roots. If yes, i recommend adding OopMapCache::cleanup_old_entries() in VM_ShenandoahOperation::doit_epilogue(). And this would make the change simple and also revert the change in this [PR](https://github.com/openjdk/jdk/pull/15921) > > I stand corrected. > > My question is still relevant >>> Do you think, VM_ShenandoahFinalMarkStartEvac walks the stack roots. > > My recommendation is incorrect. No, `VM_ShenandoahFinalMarkStartEvac` does not walk the stack roots, it signals the end of mark phase.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16074#discussion_r1369175629 From zgu at openjdk.org Mon Oct 23 19:30:30 2023 From: zgu at openjdk.org (Zhengyu Gu) Date: Mon, 23 Oct 2023 19:30:30 GMT Subject: RFR: 8317466: Enable interpreter oopMapCache for concurrent GCs [v2] In-Reply-To: <39f-0nlOdQBABHr1cQOq7jITuRLzM9yLDUEUm1--0N8=.f3aeed11-7151-4885-8376-7d91ca84e8a7@github.com> References: <39f-0nlOdQBABHr1cQOq7jITuRLzM9yLDUEUm1--0N8=.f3aeed11-7151-4885-8376-7d91ca84e8a7@github.com> Message-ID: On Mon, 16 Oct 2023 19:56:52 GMT, Leela Mohan Venati wrote: >> Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: >> >> Cleanup old oop map cache entry after class redefinition > > src/hotspot/share/oops/method.cpp line 311: > >> 309: void Method::mask_for(int bci, InterpreterOopMap* mask) { >> 310: methodHandle h_this(Thread::current(), this); >> 311: method_holder()->mask_for(h_this, bci, mask); > > Removing this condition allows all the threads including java threads to use/mutate oopMapCache. > > For ex: Java threads calls [JVM_CallStackWalk](https://github.com/openjdk/jdk/blob/741ae06c55de65dcdfe38e328022bd8dde4fa007/src/hotspot/share/prims/jvm.cpp#L586) which walks the stack and calls locals() and expressions [here](https://github.com/openjdk/jdk/blob/741ae06c55de65dcdfe38e328022bd8dde4fa007/src/hotspot/share/prims/stackwalk.cpp#L345) which access oopMapCache. The `oopMapCache` now is fully concurrent, it can be used/modified by Java threads. 
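For readers following the "old entries" part of this discussion, the general pattern can be sketched as follows: a small lock-free table whose displaced entries are pushed onto a deferred-cleanup list, which some later caller drains. This is a loose Java analogy under stated assumptions, not the HotSpot `OopMapCache` code; the class, field and method names are invented, and in Java the "free" step is only counted because the garbage collector reclaims the objects.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical illustration class; the names do not come from HotSpot.
class DeferredCleanupCacheSketch {
    static final int SLOTS = 32;

    static final class Entry {
        final int key;
        final String value;
        Entry nextOld;                       // link used only on the old-entries list

        Entry(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final AtomicReferenceArray<Entry> table = new AtomicReferenceArray<>(SLOTS);
    private final AtomicReference<Entry> oldEntries = new AtomicReference<>();

    private static int slot(int key) {
        return Math.floorMod(key, SLOTS);
    }

    // Any thread may look up concurrently.
    String lookup(int key) {
        Entry e = table.get(slot(key));
        return (e != null && e.key == key) ? e.value : null;
    }

    // Any thread may insert; a displaced entry is not dropped on the spot,
    // because a concurrent reader may still be looking at it.
    void put(int key, String value) {
        Entry displaced = table.getAndSet(slot(key), new Entry(key, value));
        if (displaced != null) {
            enqueueForCleanup(displaced);
        }
    }

    // Lock-free push onto the old-entries list.
    private void enqueueForCleanup(Entry e) {
        Entry head;
        do {
            head = oldEntries.get();
            e.nextOld = head;
        } while (!oldEntries.compareAndSet(head, e));
    }

    // Detaches the whole list and releases it; returns how many entries were dropped.
    // Until someone calls this, displaced entries keep accumulating.
    int cleanupOldEntries() {
        Entry e = oldEntries.getAndSet(null);
        int released = 0;
        while (e != null) {
            Entry next = e.nextOld;
            e.nextOld = null;
            released++;
            e = next;
        }
        return released;
    }

    public static void main(String[] args) {
        DeferredCleanupCacheSketch cache = new DeferredCleanupCacheSketch();
        cache.put(1, "first");
        cache.put(1 + SLOTS, "second");      // same slot, displaces the first entry
        System.out.println("lookup(33): " + cache.lookup(1 + SLOTS));
        System.out.println("released: " + cache.cleanupOldEntries());
    }
}
```

The open question in the thread maps onto who calls the equivalent of `cleanupOldEntries()` when inserts can come from threads that are not part of a GC operation.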
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16074#discussion_r1369176823

From matsaave at openjdk.org Mon Oct 23 19:53:54 2023
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Mon, 23 Oct 2023 19:53:54 GMT
Subject: RFR: 8301997: Move method resolution information out of the cpCache [v4]
In-Reply-To:
References:
Message-ID:

> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes.
>
> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995.
>
> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code.
>
> To streamline the review, please consider these major areas that have been changed:
> 1. ResolvedMethodEntry class
> 2. Rewriter for initialization of the structure
> 3. cpCache for resolution
> 4. InterpreterRuntime, linkResolver, and templateTable
> 5. JVMCI
> 6. SA
>
> Verified with tier 1-9 tests.
>
> This change supports the following platforms: x86, aarch64

Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains six commits: - Merge branch 'master' into method_entry_8301997 - Added asserts for getters and fixed printing - Removed dead code in interpreters - Removed unused structures, improved set_method_handle and appendix_if_resolved - Removed some comments and relocated code - 8301997: Move method resolution information out of the cpCache ------------- Changes: https://git.openjdk.org/jdk/pull/15455/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=03 Stats: 2925 lines in 64 files changed: 937 ins; 1542 del; 446 mod Patch: https://git.openjdk.org/jdk/pull/15455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15455/head:pull/15455 PR: https://git.openjdk.org/jdk/pull/15455 From matsaave at openjdk.org Mon Oct 23 19:53:54 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 23 Oct 2023 19:53:54 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v3] In-Reply-To: References: <_s0uKVoQ0XjWB6GHNHUQ-rSCM4uVPgtQObwu_32MJz0=.cdb028d2-1b39-4774-978e-92f521169853@github.com> Message-ID: On Fri, 20 Oct 2023 08:49:22 GMT, Gui Cao wrote: > Hello, I'm preparing the riscv part of the support, but I noticed that the x86 backend is reporting errors. > > ``` > zifeihan at plct-c8:~/jdk$ make test TEST="tier1" JTREG="TIMEOUT_FACTOR=16" > Building target 'test' in configuration 'linux-x86_64-server-release' > Running tests using JTREG control variable 'TIMEOUT_FACTOR=16' > Test selection 'tier1', will run: > * jtreg:test/hotspot/jtreg:tier1 > * jtreg:test/jdk:tier1 > * jtreg:test/langtools:tier1 > * jtreg:test/jaxp:tier1 > * jtreg:test/lib-test:tier1 > > Running test 'jtreg:test/hotspot/jtreg:tier1' > An exception has occurred in the compiler (22-internal). Please file a bug against the Java compiler via the Java bug reporting page (https://bugreport.java.com) after checking the Bug Database (https://bugs.java.com) for duplicates. 
Include your program, the following diagnostic, and the parameters passed to the Java compiler in your report. Thank you. > java.lang.invoke.WrongMethodTypeException: cannot convert MethodHandle(VarHandle,byte[],int)long to (VarHandle,byte[],int)int > at java.base/java.lang.invoke.MethodHandle.asTypeUncached(MethodHandle.java:903) > at java.base/java.lang.invoke.MethodHandle.asType(MethodHandle.java:870) > at java.base/jdk.internal.util.ByteArray.getLong(ByteArray.java:188) > at java.base/java.io.DataInputStream.readLong(DataInputStream.java:408) > at jdk.compiler/com.sun.tools.javac.util.ByteBuffer.getLong(ByteBuffer.java:211) > at jdk.compiler/com.sun.tools.javac.jvm.PoolReader.resolve(PoolReader.java:260) > at jdk.compiler/com.sun.tools.javac.jvm.PoolReader$ImmutablePoolHelper.readIfNeeded(PoolReader.java:391) > at jdk.compiler/com.sun.tools.javac.jvm.PoolReader.getConstant(PoolReader.java:206) > at jdk.compiler/com.sun.tools.javac.jvm.ClassReader$3.read(ClassReader.java:855) > at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.readAttrs(ClassReader.java:1433) > at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.readMemberAttrs(ClassReader.java:1423) > at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.readField(ClassReader.java:2283) > at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.readClass(ClassReader.java:2645) > at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.readClassBuffer(ClassReader.java:2738) > at jdk.compiler/com.sun.tools.javac.jvm.ClassReader.readClassFile(ClassReader.java:2762) > at jdk.compiler/com.sun.tools.javac.code.ClassFinder.fillIn(ClassFinder.java:373) > at jdk.compiler/com.sun.tools.javac.code.ClassFinder.complete(ClassFinder.java:302) > at jdk.compiler/com.sun.tools.javac.code.Symtab$2.complete(Symtab.java:360) > at jdk.compiler/com.sun.tools.javac.code.Symbol.complete(Symbol.java:682) > at jdk.compiler/com.sun.tools.javac.code.Symbol$ClassSymbol.complete(Symbol.java:1418) > at 
jdk.compiler/com.sun.tools.javac.code.Symbol$ClassSymbol.flags(Symbol.java:1334) > at jdk.compiler/com.sun.tools.javac.code.Types$12.visitClassType(Types.java:2183) > at jdk.compiler/com.sun.tools.javac.code.Types$12.visitClassType(Types.java:2159) > at jdk.compiler/com.sun.tools.javac.code.Type$ClassType.accept(Type.java:1050) > at jdk.compiler/com.sun.tools.javac.code.Types$DefaultTypeVisitor.visit(Types.java:4894) > at jdk.compiler/com.sun.tools.javac.code.Types.asSuper(Types.java:2156) > at jdk.compiler/com.sun.tools.javac.code.Types$12.visitClassType(Types.java:2179) > at jdk.compiler/com.sun.tools.javac.code.Types$12.visitClassType(Types.java:2159) > at jdk.compiler/com.sun.tools.javac.code.Type$ClassType.accept(Type.java:1050) > ``` This issue was the result of a change to include the a union inside `ResolvedMethodEntry`. The most recent commit and merge should have addressed this problem and made the interpreter code stable. Thanks for your patience ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1775916150 From duke at openjdk.org Mon Oct 23 20:56:36 2023 From: duke at openjdk.org (Leela Mohan Venati) Date: Mon, 23 Oct 2023 20:56:36 GMT Subject: RFR: 8317466: Enable interpreter oopMapCache for concurrent GCs [v2] In-Reply-To: References: Message-ID: On Wed, 11 Oct 2023 18:50:14 GMT, Zhengyu Gu wrote: >> Interpreter oop maps are computed lazily during GC root scan and they are expensive to compute. >> >> GCs uses a small hash table per instance class to cache computed oop maps during STW root scan, but not for concurrent root scan. >> >> This patch is intended to enable `OopMapCache` for concurrent GCs. >> >> Test: >> tier1 and tier2 fastdebug and release on MacOSX, Linux 86_84 and Linux 86_32. 
> > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup old oop map cache entry after class redefinition Before of this change, VM_Operations which enabled "is_gc_active" and walks stack to collect roots had a call to OopMapCache::cleanup_old_entries() in its corresponding doit_epilogue(). This is to free the old entries which are collected during safepoint. Now scope of the change isn't clear. We seem to extend them to concurrent GCs during their concurrent phases (Not just safepoints of concurrent GCs). Calling OopMapCache::cleanup_old_entries() in doit_epilogue() would effectively cleanup old entries accumulated during concurrent phase of the GC and also during safepoint. But change also allows java threads to accumulate old entries. When/Who calls cleanup_old_entries() in this case ? These needs to wait until future GC which does cleanup in doit_epilogue(). However, at least theoretically, we can have large time windows without GCs. Old entries accumulated by java threads can be seen as a leak (until next GC) A separate thought to make change simpler. Can cleanup_old_entries become a [list of Cleanup tasks](https://github.com/openjdk/jdk/blob/8d9a4b43f4fff30fd217dab2c224e641cb913c18/src/hotspot/share/runtime/safepoint.hpp#L72) VM thread does during cleanup phase. ------------- PR Review: https://git.openjdk.org/jdk/pull/16074#pullrequestreview-1693431578 From duke at openjdk.org Mon Oct 23 20:56:38 2023 From: duke at openjdk.org (Leela Mohan Venati) Date: Mon, 23 Oct 2023 20:56:38 GMT Subject: RFR: 8317466: Enable interpreter oopMapCache for concurrent GCs [v2] In-Reply-To: References: <39f-0nlOdQBABHr1cQOq7jITuRLzM9yLDUEUm1--0N8=.f3aeed11-7151-4885-8376-7d91ca84e8a7@github.com> Message-ID: <5-zr90Y1fr5aTEKhVxgYJOgqiwGYtil5jW66h3mgE1w=.ae65f301-52ac-4ce1-93c2-a39b31d8d56f@github.com> On Mon, 23 Oct 2023 19:26:03 GMT, Zhengyu Gu wrote: >> I stand corrected. 
>> >> My question is still relevant >>>> Do you think, VM_ShenandoahFinalMarkStartEvac walks the stack roots. >> >> My recommendation is incorrect. > > No, `VM_ShenandoahFinalMarkStartEvac ` does not walk the stack roots, it signals the end of mark phase. Note we don't need to call OopMapCache::cleanup_old_entries() after this [PR](https://github.com/openjdk/jdk/pull/15921). VM_ShenandoahFinalMarkStartEvac is derived from VM_ShenandoahReferenceOperation. VM_ShenandoahReferenceOperation::doit_epilogue calls OopMapCache::cleanup_old_entries(); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16074#discussion_r1369207915 From duke at openjdk.org Mon Oct 23 20:56:41 2023 From: duke at openjdk.org (Leela Mohan Venati) Date: Mon, 23 Oct 2023 20:56:41 GMT Subject: RFR: 8317466: Enable interpreter oopMapCache for concurrent GCs [v2] In-Reply-To: References: <39f-0nlOdQBABHr1cQOq7jITuRLzM9yLDUEUm1--0N8=.f3aeed11-7151-4885-8376-7d91ca84e8a7@github.com> Message-ID: On Mon, 23 Oct 2023 19:27:26 GMT, Zhengyu Gu wrote: >> src/hotspot/share/oops/method.cpp line 311: >> >>> 309: void Method::mask_for(int bci, InterpreterOopMap* mask) { >>> 310: methodHandle h_this(Thread::current(), this); >>> 311: method_holder()->mask_for(h_this, bci, mask); >> >> Removing this condition allows all the threads including java threads to use/mutate oopMapCache. >> >> For ex: Java threads calls [JVM_CallStackWalk](https://github.com/openjdk/jdk/blob/741ae06c55de65dcdfe38e328022bd8dde4fa007/src/hotspot/share/prims/jvm.cpp#L586) which walks the stack and calls locals() and expressions [here](https://github.com/openjdk/jdk/blob/741ae06c55de65dcdfe38e328022bd8dde4fa007/src/hotspot/share/prims/stackwalk.cpp#L345) which access oopMapCache. > > The `oopMapCache` now is fully concurrent, it can be used/modified by Java threads. Which thread takes up responsibility of cleaning up old entries accumulated by java threads. 
Allowing Java threads definitely increases the scope beyond what is mentioned in the bug. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16074#discussion_r1369212468 From duke at openjdk.org Mon Oct 23 22:41:43 2023 From: duke at openjdk.org (Elif Aslan) Date: Mon, 23 Oct 2023 22:41:43 GMT Subject: RFR: 8318608: Enable parallelism in vmTestbase/nsk/stress/threads tests Message-ID: The commit includes changes to unblock parallelism for more `hotspot:tier4` tests, namely the `test/hotspot/jtreg/vmTestbase/nsk/stress/thread` tests. Below are the before and after test run comparisons: Before: time,count 15, 1 33, 1 48, 1 66, 1 72, 1 77, 1 Mean 51.83s Standard deviation 22.24s Total elapsed time 1m 17s After: time,count 19, 1 23, 1 29, 1 34, 1 48, 1 53, 1 Mean 34.33s Standard deviation 12.43s Total elapsed time 0m 53s ------------- Commit messages: - JDK-8315937: Enable parallelism in vmTestbase/nsk/stress/threads tests Changes: https://git.openjdk.org/jdk/pull/16327/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16327&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318608 Stats: 24 lines in 1 file changed: 0 ins; 24 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16327.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16327/head:pull/16327 PR: https://git.openjdk.org/jdk/pull/16327 From duke at openjdk.org Mon Oct 23 22:41:44 2023 From: duke at openjdk.org (Elif Aslan) Date: Mon, 23 Oct 2023 22:41:44 GMT Subject: RFR: 8318608: Enable parallelism in vmTestbase/nsk/stress/threads tests In-Reply-To: References: Message-ID: On Mon, 23 Oct 2023 22:30:42 GMT, Elif Aslan wrote: > The commit includes changes to unblock parallelism for more `hotspot:tier4` tests. in `test/hotspot/jtreg/vmTestbase/nsk/stress/thread `tests.
> > Below are the before and after test run comparisons: > > Before: > time,count > 15, 1 > 33, 1 > 48, 1 > 66, 1 > 72, 1 > 77, 1 > > Mean 51.83s > Standard deviation 22.24s > Total elapsed time 1m 17s > > After: > time,count > 19, 1 > 23, 1 > 29, 1 > 34, 1 > 48, 1 > 53, 1 > > Mean 34.33s > Standard deviation 12.43s > Total elapsed time 0m 53s @lmesnik could you please test this on your system and report the results? TIA ------------- PR Comment: https://git.openjdk.org/jdk/pull/16327#issuecomment-1776124746 From zgu at openjdk.org Mon Oct 23 23:48:34 2023 From: zgu at openjdk.org (Zhengyu Gu) Date: Mon, 23 Oct 2023 23:48:34 GMT Subject: RFR: 8317466: Enable interpreter oopMapCache for concurrent GCs [v2] In-Reply-To: References: Message-ID: <1u5Usl2mr-Q2jvGdBW0V0qVeC4u-Aqh0MdZS03lIUJk=.45e08b12-60f0-41b0-a76f-5ad2a0aff4ea@github.com> On Mon, 23 Oct 2023 20:53:39 GMT, Leela Mohan Venati wrote: > Before of this change, VM_Operations which enabled "is_gc_active" and walks stack to collect roots had a call to OopMapCache::cleanup_old_entries() in its corresponding doit_epilogue(). This is to free the old entries which are collected during safepoint. > > Now scope of the change isn't clear. > > We seem to extend them to concurrent GCs during their concurrent phases (Not just safepoints of concurrent GCs). Calling OopMapCache::cleanup_old_entries() in doit_epilogue() would effectively cleanup old entries accumulated during concurrent phase of the GC and also during safepoint. > > But change also allows java threads to accumulate old entries. When/Who calls cleanup_old_entries() in this case ? These needs to wait until future GC which does cleanup in doit_epilogue(). However, at least theoretically, we can have large time windows without GCs. Old entries accumulated by java threads can be seen as a leak (until next GC) > Yes, it could accumulate a few. But from what I saw, there are not many.
They are cleaned up by: - Serial, Parallel and G1: any `VM_GC_Operation`s - ZGC: any `VM_ZOperation`s - Shenandoah: I should have added `OopMapCache::cleanup_old_entries()` in `VM_ShenandoahOperation`, which should help the jdk11u backport. > A separate thought to make change simpler. Can cleanup_old_entries become a [list of Cleanup tasks](https://github.com/openjdk/jdk/blob/8d9a4b43f4fff30fd217dab2c224e641cb913c18/src/hotspot/share/runtime/safepoint.hpp#L72) VM thread does during cleanup phase. No. There is no point in adding safepoint latency when it can be done outside of a safepoint. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16074#issuecomment-1776227534 From kbarrett at openjdk.org Tue Oct 24 00:05:28 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 24 Oct 2023 00:05:28 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v4] In-Reply-To: <3jYeAkN2merJQyqdcuq7zbiAgTX1Md6EJqu9WZ_nGkA=.8da47a48-94cb-41f0-a4f2-378e611ebc0b@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> <3jYeAkN2merJQyqdcuq7zbiAgTX1Md6EJqu9WZ_nGkA=.8da47a48-94cb-41f0-a4f2-378e611ebc0b@github.com> Message-ID: On Mon, 23 Oct 2023 14:17:58 GMT, Aleksey Shipilev wrote: >> See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. >> >> Unfortunately, we cannot test these apart from the existing gtest.
>> >> Additional testing: >> - [x] linux-x86-server-fastdebug, atomic tests pass >> - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Use public methods instead of Platform* Do I understand correctly that all platforms support cas64 other than arm32, and even it does so if using a non-ancient hardware version or the kernel helper? If so, I'd be inclined to just say it must always be supported and be done with the supports_cx8 stuff around the Atomic class. So update the comments for Atomic accordingly, and remove the supports_cx8 checks in the tests. But I'm okay with the current version and leaving that for followup. Regarding the comment "inconsistency" for Atomic, I read those as being a general comment about 64bit operations not necessarily being supported, but that cas64 is an exception and must be supported. I'm guessing we didn't do what's in this PR and provide cas64-based implementations of other operations due to also needing load64/store64, though I suppose one could even implement those with cas64, horrid as that might be. The templatizing effort just maintained the status quo in this regard. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16252#pullrequestreview-1693730500 From lmesnik at openjdk.org Tue Oct 24 00:43:34 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 24 Oct 2023 00:43:34 GMT Subject: RFR: 8318608: Enable parallelism in vmTestbase/nsk/stress/threads tests In-Reply-To: References: Message-ID: On Mon, 23 Oct 2023 22:30:42 GMT, Elif Aslan wrote: > The commit includes changes to unblock parallelism for more `hotspot:tier4` tests. in `test/hotspot/jtreg/vmTestbase/nsk/stress/thread `tests. 
> > Below are the before and after test run comparisons: > > Before: > time,count > 15, 1 > 33, 1 > 48, 1 > 66, 1 > 72, 1 > 77, 1 > > Mean 51.83s > Standard deviation 22.24s > Total elapsed time 1m 17s > > After: > time,count > 19, 1 > 23, 1 > 29, 1 > 34, 1 > 48, 1 > 53, 1 > > Mean 34.33s > Standard deviation 12.43s > Total elapsed time 0m 53s Looks good. Testing passed. ------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16327#pullrequestreview-1693754613 From jwaters at openjdk.org Tue Oct 24 01:39:58 2023 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 24 Oct 2023 01:39:58 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v2] In-Reply-To: References: Message-ID: <3pQbdqLKqb5j9uCBATrYH7m8xITcqAJCxAcetD82-kg=.84f08d3b-6c00-4154-867a-051076a12f38@github.com> > On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. > > The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well. The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. > > Although not mentioned in the title, raise_fail_fast has also been fixed. 
RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method > > All the changes mentioned above fixes any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. > > This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Minor Style Change in os_windows.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16303/files - new: https://git.openjdk.org/jdk/pull/16303/files/c48a7b17..f7bda206 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16303&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16303&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16303.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16303/head:pull/16303 PR: https://git.openjdk.org/jdk/pull/16303 From dlong at openjdk.org Tue Oct 24 02:50:44 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 24 Oct 2023 02:50:44 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object The patch fails for me even though the locks are correct, because kptr->obj()->print_string() is not null-terminated. The patch needs to use strncmp or maybe something like kptr->obj()->klass()->name()->as_C_string(). ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1776414437 From fyang at openjdk.org Tue Oct 24 03:13:26 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 24 Oct 2023 03:13:26 GMT Subject: RFR: 8318222: RISC-V: C2 CmpU3 In-Reply-To: References: Message-ID: <1KMHfxdz_PQ194Psrr8eVNzF2sAvwzntIEXAeeB9VQ4=.39cc3651-ef9a-40a8-ac14-6334e859d017@github.com> On Mon, 23 Oct 2023 15:45:39 GMT, Hamlin Li wrote: > Hi, > Can you review the change to add intrinsic for CmpU3 and CmpUL3? > Thanks! > > ## Test > > ### functionality > pass jtreg test: > jdk/java/lang/Long/Unsigned.java, jdk/java/lang/Integer/Unsigned.java > > ### performance > #### Long > **before**: > Benchmark (size) Mode Cnt Score Error Units > Longs.compareUnsignedDirect 500 avgt 5 1454.789 ± 129.557 ns/op > Longs.compareUnsignedIndirect 500 avgt 5 1410.146 ± 120.017 ns/op > > **after**: > Benchmark (size) Mode Cnt Score Error Units > Longs.compareUnsignedDirect 500 avgt 5 1286.129 ± 8.441 ns/op > Longs.compareUnsignedIndirect 500 avgt 5 993.490 ± 0.840 ns/op > > #### Integer > **before**: > Benchmark (size) Mode Cnt Score Error Units > Integers.compareUnsignedDirect 500 avgt 5 1611.753 ± 0.700 ns/op > Integers.compareUnsignedIndirect 500 avgt 5 1775.093 ± 1.520 ns/op > > **after**: > Benchmark (size) Mode Cnt Score Error Units > Integers.compareUnsignedDirect 500 avgt 5 1159.351 ± 0.601 ns/op > Integers.compareUnsignedIndirect 500 avgt 5 776.185 ±
0.924 ns/op LGTM. May I ask on which platform was the JMH tested? Also I think it's safer to perform some regression tests like tier1-3. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16314#pullrequestreview-1693866847 From shade at openjdk.org Tue Oct 24 06:40:27 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 24 Oct 2023 06:40:27 GMT Subject: RFR: 8318608: Enable parallelism in vmTestbase/nsk/stress/threads tests In-Reply-To: References: Message-ID: On Mon, 23 Oct 2023 22:30:42 GMT, Elif Aslan wrote: > The commit includes changes to unblock parallelism for more `hotspot:tier4` tests. in `test/hotspot/jtreg/vmTestbase/nsk/stress/thread `tests. > > Below are the before and after test run comparisons: > > Before: > time,count > 15, 1 > 33, 1 > 48, 1 > 66, 1 > 72, 1 > 77, 1 > > Mean 51.83s > Standard deviation 22.24s > Total elapsed time 1m 17s > > After: > time,count > 19, 1 > 23, 1 > 29, 1 > 34, 1 > 48, 1 > 53, 1 > > Mean 34.33s > Standard deviation 12.43s > Total elapsed time 0m 53s Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16327#pullrequestreview-1694114868 From shade at openjdk.org Tue Oct 24 06:49:39 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 24 Oct 2023 06:49:39 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v4] In-Reply-To: References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> <3jYeAkN2merJQyqdcuq7zbiAgTX1Md6EJqu9WZ_nGkA=.8da47a48-94cb-41f0-a4f2-378e611ebc0b@github.com> Message-ID: On Tue, 24 Oct 2023 00:03:02 GMT, Kim Barrett wrote: > Do I understand correctly that all platforms support cas64 other than arm32, and even it does so if using a non-ancient hardware version or the kernel helper? That is my understanding after doing this work, yes. 
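For readers following the cas64 discussion, here is a minimal sketch of how add and xchg can be built on top of a 64-bit compare-and-swap alone. It is not the code from the PR: it uses standard `std::atomic` rather than HotSpot's `Atomic` class, and the helper names `add_via_cas` and `xchg_via_cas` are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not a HotSpot API): add built only from load + CAS.
// Returns the updated value. Unsigned arithmetic is used so that
// wrap-around is well defined.
inline uint64_t add_via_cas(std::atomic<uint64_t>& dest, uint64_t add_value) {
  uint64_t old_value = dest.load(std::memory_order_relaxed);
  // On failure compare_exchange_weak stores the current value of dest
  // into old_value, so the loop simply retries with fresh data.
  while (!dest.compare_exchange_weak(old_value, old_value + add_value)) {
  }
  return old_value + add_value;
}

// Hypothetical helper (not a HotSpot API): exchange built only from load + CAS.
// Returns the value that was replaced.
inline uint64_t xchg_via_cas(std::atomic<uint64_t>& dest, uint64_t new_value) {
  uint64_t old_value = dest.load(std::memory_order_relaxed);
  while (!dest.compare_exchange_weak(old_value, new_value)) {
  }
  return old_value;
}
```

The initial load is only a first guess for the CAS; a stale value just costs one extra loop iteration, which is why such fallbacks need nothing beyond cas64 and a plain 64-bit load.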
------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1776622343 From rrich at openjdk.org Tue Oct 24 07:09:00 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 24 Oct 2023 07:09:00 GMT Subject: RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v21] In-Reply-To: <-tM40FGW10TWooaxzhFrtU7Xx9sQga4nhvZdUqDRLnQ=.dc86d37b-3502-43b4-8818-6ed13454d31b@github.com> References: <1s9_eK30_SkOiLxFIRRv5w_JEbmEz93C3zsZpNaYK0Q=.9c91e63a-9122-4e81-82b3-93104d9444a2@github.com> <-tM40FGW10TWooaxzhFrtU7Xx9sQga4nhvZdUqDRLnQ=.dc86d37b-3502-43b4-8818-6ed13454d31b@github.com> Message-ID: On Fri, 13 Oct 2023 14:48:23 GMT, Albert Mingkun Yang wrote: >> Should https://bugs.openjdk.org/browse/JDK-8309960 be reverted? Better in a follow-up I guess. > >> Should https://bugs.openjdk.org/browse/JDK-8309960 be reverted? Better in a follow-up I guess. > > Better in its own PR, IMO. Thanks to @albertnetymk, the final version is a good deal simpler than the _baseline_. The extra effort was totally worth it. In the end, parallel processing of large arrays is like a side effect of the refactoring. Thanks everybody for the review work and feedback! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1776641794 From rrich at openjdk.org Tue Oct 24 07:09:01 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 24 Oct 2023 07:09:01 GMT Subject: Integrated: 8310031: Parallel: Implement better work distribution for large object arrays in old gen In-Reply-To: References: Message-ID: On Wed, 12 Jul 2023 08:05:59 GMT, Richard Reingruber wrote: > This pr introduces parallel scanning of large object arrays in the old generation containing roots for young collections of Parallel GC.
This allows for better distribution of the actual work (following the array references) as opposed to "stealing" from other task queues which can lead to inverse scaling demonstrated by small tests (attached to JDK-8310031) and also observed in gerrit production systems. > > #### Implementation (Updated 2023-10-20) > > Comment copied from `PSCardTable::scavenge_contents_parallel`: > > ```c++ > // Scavenging and accesses to the card table are strictly limited to the stripe. > // In particular scavenging of an object crossing stripe boundaries is shared > // among the threads assigned to the stripes it resides on. This reduces > // complexity and enables shared scanning of large objects. > // It requires preprocessing of the card table though where imprecise card marks of > // objects crossing stripe boundaries are propagated to the first card of > // each stripe covered by the individual object. > > > The baseline was refactored to make use of a read-only copy of the card table. That "shadow" table (`PSStripeShadowCardTable`) separates reading, clearing and redirtying of table entries which allows for a much simpler implementation. > > Scanning of object arrays is limited to dirty card chunks. > > ## Everything below refers to the Outdated Initial Implementation > > The algorithm to share scanning large arrays is supposed to be a straight > forward extension of the scheme implemented in > `PSCardTable::scavenge_contents_parallel`. > > - A worker scans the part of a large array located in its stripe > > - Except for the end of the large array reaching into a stripe which is scanned by the thread owning the previous stripe. This is just what the current implementation does: it skips objects crossing into the stripe. > > - For this it is necessary that large arrays cover at least 3 stripes (see `PSCardTable::large_obj_arr_min_words`) > > The implementation also makes use of the precise card marks for arrays. Only dirty regions are actually scanned. 
> > #### Performance testing > > ##### BigArrayInOldGenRR.java > > [BigArrayInOldGenRR.java](https://bugs.openjdk.org/secure/attachment/104422/BigArrayInOldGenRR.java) is a micro benchmark that assigns new objects to a large array in a loop. Creating new array elements triggers young collections. In each collection the large array is scanned because of its references to the new elements in the young generation. The benchmark score is the g... This pull request has now been integrated. Changeset: 4bfe2268 Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/4bfe226870a15306b1e015c38fe3835f26b41fe6 Stats: 334 lines in 5 files changed: 177 ins; 70 del; 87 mod 8310031: Parallel: Implement better work distribution for large object arrays in old gen Co-authored-by: Albert Mingkun Yang Reviewed-by: tschatzl, ayang ------------- PR: https://git.openjdk.org/jdk/pull/14846 From mbaesken at openjdk.org Tue Oct 24 07:12:44 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 24 Oct 2023 07:12:44 GMT Subject: Integrated: JDK-8318587: refresh libraries cache on AIX in print_vm_info In-Reply-To: <1tniFEVjN_i0eJyrRzkD6fzxw4YlnpSRZAYTjZe4Qig=.f34b9f02-6463-4b5a-8039-3cb2f1bc48a3@github.com> References: <1tniFEVjN_i0eJyrRzkD6fzxw4YlnpSRZAYTjZe4Qig=.f34b9f02-6463-4b5a-8039-3cb2f1bc48a3@github.com> Message-ID: On Fri, 20 Oct 2023 08:07:46 GMT, Matthias Baesken wrote: > print_vm_info outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing. This pull request has now been integrated. 
Changeset: cb383c05 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/cb383c05b23ef4f6992796bdc5b27eb8386c65d5 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod 8318587: refresh libraries cache on AIX in print_vm_info Reviewed-by: mdoerr, lucy, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/16284 From aboldtch at openjdk.org Tue Oct 24 07:23:38 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 24 Oct 2023 07:23:38 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v4] In-Reply-To: References: Message-ID: On Mon, 23 Oct 2023 09:35:55 GMT, Erik Österlund wrote: >> In fact, there is a current race in the nmethod entry barriers, where what we are doing violates the AMD APM (cf. APM volume 2 section 7.6.1 https://www.amd.com/system/files/TechDocs/24593.pdf). >> In particular, if the compare instruction of the nmethod entry barrier is not yet patched and we call a slow path on thread 1, then before taking the nmethod entry lock, another thread 2 could fix and disarm the nmethod. Then thread 1 will observe *data* suggesting the nmethod has been patched, but never re-executes the patched compare (which might indeed still be stale), hence not qualifying for asynchronous cross modifying code, and neither do we run a serializing cpuid instruction, qualifying for synchronous cross modifying code. In this scenario, we can indeed start executing the nmethod instructions, while observing inconsistent concurrent patching effects, where some instructions will be updated and some not. >> >> The following patch ensures that x86 nmethod entry barriers execute cross modifying fence after calling into the VM, where another thread could have disarmed the nmethod. I also ensured the other platforms perform their fencing after the VM call, instead of before - including a cross_modify_fence in the shared code for OSR nmethod entries.
While fencing before will flush out the instruction pipeline, and it shouldn't be populated with problematic instructions until after we start executing the nmethod again, it feels unnecessary to fence on the wrong side of the modifications it wishes to guard, and hence not strictly following the synchronous cross modifying fence recipe. >> >> I'm currently running tier1-5 and running performance testing in aurora. In the interest of time, I'm opening this PR before getting the final result, and will report the results when they come in. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Add comment Marked as reviewed by aboldtch (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14543#pullrequestreview-1694170469 From lkorinth at openjdk.org Tue Oct 24 07:49:30 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Tue, 24 Oct 2023 07:49:30 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v6] In-Reply-To: References: Message-ID: > Rename createJavaProcessBuilder so that it is not used by mistake instead of createTestJvm. > > I have used the following sed script: `find -name "*.java" | xargs -n 1 sed -i -e "s/createJavaProcessBuilder(/createJavaProcessBuilderIgnoreTestJavaOpts(/g"` > > Then I have manually modified ProcessTools.java. In that file I have moved one version of createJavaProcessBuilder so that it is close to the other version. Then I have added a javadoc comment in bold telling: > > /** > * Create ProcessBuilder using the java launcher from the jdk to > * be tested. > * > *

Please observe that you likely should use > * createTestJvm() instead of this method because createTestJvm() > * will add JVM options from "test.vm.opts" and "test.java.opts" > * and this method will not do that. > * > * @param command Arguments to pass to the java command. > * @return The ProcessBuilder instance representing the java command. > */ > > > I have used the name createJavaProcessBuilderIgnoreTestJavaOpts because of the name of Utils.prependTestJavaOpts that adds those VM flags. If you have a better name I could do a rename of the method. I kind of like that it is long and clumsy, that makes it harder to use... > > I have run tier 1 testing, and I have started more exhaustive testing. Leo Korinth has updated the pull request incrementally with six additional commits since the last revision: - static import - copyright - whitespace - whitespace - sed - fix test/lib/jdk/test/lib/process/ProcessTools.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15452/files - new: https://git.openjdk.org/jdk/pull/15452/files/f80dda8d..2f57a32d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15452&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15452&range=04-05 Stats: 1580 lines in 560 files changed: 44 ins; 34 del; 1502 mod Patch: https://git.openjdk.org/jdk/pull/15452.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15452/head:pull/15452 PR: https://git.openjdk.org/jdk/pull/15452 From lkorinth at openjdk.org Tue Oct 24 09:10:43 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Tue, 24 Oct 2023 09:10:43 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v6] In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 07:49:30 GMT, Leo Korinth wrote: >> Rename createJavaProcessBuilder so that it is not used by mistake instead of createTestJvm. 
>> >> I have used the following sed script: `find -name "*.java" | xargs -n 1 sed -i -e "s/createJavaProcessBuilder(/createJavaProcessBuilderIgnoreTestJavaOpts(/g"` >> >> Then I have manually modified ProcessTools.java. In that file I have moved one version of createJavaProcessBuilder so that it is close to the other version. Then I have added a javadoc comment in bold telling: >> >> /** >> * Create ProcessBuilder using the java launcher from the jdk to >> * be tested. >> * >> *

Please observe that you likely should use >> * createTestJvm() instead of this method because createTestJvm() >> * will add JVM options from "test.vm.opts" and "test.java.opts" >> * and this method will not do that. >> * >> * @param command Arguments to pass to the java command. >> * @return The ProcessBuilder instance representing the java command. >> */ >> >> >> I have used the name createJavaProcessBuilderIgnoreTestJavaOpts because of the name of Utils.prependTestJavaOpts that adds those VM flags. If you have a better name I could do a rename of the method. I kind of like that it is long and clumsy, that makes it harder to use... >> >> I have run tier 1 testing, and I have started more exhaustive testing. > > Leo Korinth has updated the pull request incrementally with six additional commits since the last revision: > > - static import > - copyright > - whitespace > - whitespace > - sed > - fix test/lib/jdk/test/lib/process/ProcessTools.java Hi, if you want to see what I have modified manually, you can do my sed commands and compare to this pull request: git switch -c _reproduce 15acf4b8d7cffcd0d74bf1b9c43cde9acaf31ea9 find -name "*.java" | xargs -n 1 sed -i -e "s/createJavaProcessBuilder(/createLimitedTestJavaProcessBuilder(/g" find -name "*.java" | xargs -n 1 sed -i -e "s/createTestJvm(/createTestJavaProcessBuilder(/g" find -name "*.java" | xargs -n 1 sed -i -e "s/import static jdk.test.lib.process.ProcessTools.createJavaProcessBuilder/import static jdk.test.lib.process.ProcessTools.createLimitedTestJavaProcessBuilder/g" find -name "*.java" | xargs -n 1 sed -i -e "s/import static jdk.test.lib.process.ProcessTools.createTestJvm/import static jdk.test.lib.process.ProcessTools.createTestJavaProcessBuilder/g" git add -u; git commit -m sed git diff-tree --no-commit-id --name-only -r 15acf4b8d7cffcd0d74bf1b9c43cde9acaf31ea9..HEAD | xargs sed -i -e "s%^( * Copyright (c) ....)[^[:alpha:]]*(Oracle.*)%\1, 2023, \2%" git ls-files -m | xargs sed -i -e "s%(Copyright (c) 
2023), 2023, (Oracle.*)%\1, \2%" git add -u; git commit -m copyright git diff HEAD 2f57a32df8d17da51a04177563327ca2a75e8061 It will give you an easier way to review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15452#issuecomment-1776817287 From dholmes at openjdk.org Tue Oct 24 09:33:39 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 24 Oct 2023 09:33:39 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: Message-ID: <94xEhtV9YjxUS5QN2oHOWCzwhFaKi05PO9o3Y5tieDI=.ecd425b8-a7c3-4c4a-9e7b-1ae099b92b52@github.com> On Mon, 23 Oct 2023 09:10:06 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. 
But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Disable sleef by default > - Merge 'jdk:master' into JDK-8312425 > - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8496: > 8494: // Get sleef stub routine addresses > 8495: char ebuf[1024]; > 8496: void* libsleef = os::dll_load(UseSleefLib, ebuf, sizeof ebuf); Shouldn't this check that UseSleefLib has been set to something other than "" ? (To save the failing `dll_load` call.) 
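The check suggested here could look roughly like the following sketch. It is not HotSpot code: `load_optional_library` and `LibraryLoader` are hypothetical names, and a loader callback stands in for `os::dll_load` so that the example is self-contained.

```cpp
#include <functional>

// Callback with a dlopen-style shape (name, error buffer, buffer length).
// It is an assumption standing in for os::dll_load in this sketch.
using LibraryLoader =
    std::function<void*(const char* path, char* ebuf, int ebuflen)>;

// Hypothetical helper: skip the load attempt entirely when no library name
// was given, so that a load call known to fail is never issued.
inline void* load_optional_library(const char* path,
                                   const LibraryLoader& loader,
                                   char* ebuf, int ebuflen) {
  if (path == nullptr || path[0] == '\0') {
    return nullptr;  // nothing requested: caller keeps the scalar fallback
  }
  return loader(path, ebuf, ebuflen);
}
```

Whether an empty string is the right "unset" marker depends on how the flag's default is defined in the patch, so the condition above is only one possible choice.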
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1369876834 From dholmes at openjdk.org Tue Oct 24 09:40:31 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 24 Oct 2023 09:40:31 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v6] In-Reply-To: <7kjUHAm2miHLca5yBML-XS86qncel6Bwne7gLbDitZI=.ed191be2-d263-4fed-807a-55489e86c0db@github.com> References: <-ALHHMcYPfciG6g2sOT-XIEVTf1pA6XXa93eNXQamD4=.88329bf9-627b-4d78-93fc-299550fc2be0@github.com> <7kjUHAm2miHLca5yBML-XS86qncel6Bwne7gLbDitZI=.ed191be2-d263-4fed-807a-55489e86c0db@github.com> Message-ID: On Mon, 23 Oct 2023 09:38:26 GMT, Afshin Zafari wrote: > It is up to date with master and nothing to push. That doesn't seem possible since you last merged with master on September 26. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1776866055 From jwaters at openjdk.org Tue Oct 24 09:46:51 2023 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 24 Oct 2023 09:46:51 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v3] In-Reply-To: References: Message-ID: > On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. > > The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well. The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. 
The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. For this, os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. > > Although not mentioned in the title, raise_fail_fast has also been fixed. RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method. > > All the changes mentioned above fix any Undefined Behaviour on Windows with regard to noreturn, by making all code marked noreturn (os::die(), os::exit(), etc.) never return. This is not a problem on other platforms since their implementations of os::abort or os::die etc. already call noreturn methods. > > This fix also fixes compilation with gcc on Windows, which warns about noreturn methods returning prior to this changeset. Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge branch 'openjdk:master' into noreturn - Minor Style Change in os_windows.cpp - 8304939 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16303/files - new: https://git.openjdk.org/jdk/pull/16303/files/f7bda206..c025c250 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16303&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16303&range=01-02 Stats: 3458 lines in 117 files changed: 1999 ins; 741 del; 718 mod Patch: https://git.openjdk.org/jdk/pull/16303.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16303/head:pull/16303 PR: https://git.openjdk.org/jdk/pull/16303 From stefank at openjdk.org Tue Oct 24 09:51:39 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 24 Oct 2023 09:51:39 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v6] In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 07:49:30 GMT, Leo Korinth wrote: >> Rename createJavaProcessBuilder so that it is not used by mistake instead of createTestJvm. >> >> I have used the following sed script: `find -name "*.java" | xargs -n 1 sed -i -e "s/createJavaProcessBuilder(/createJavaProcessBuilderIgnoreTestJavaOpts(/g"` >> >> Then I have manually modified ProcessTools.java. In that file I have moved one version of createJavaProcessBuilder so that it is close to the other version. Then I have added a javadoc comment in bold telling: >> >> /** >> * Create ProcessBuilder using the java launcher from the jdk to >> * be tested. >> * >> *

Please observe that you likely should use >> * createTestJvm() instead of this method because createTestJvm() >> * will add JVM options from "test.vm.opts" and "test.java.opts" >> * and this method will not do that. >> * >> * @param command Arguments to pass to the java command. >> * @return The ProcessBuilder instance representing the java command. >> */ >> >> >> I have used the name createJavaProcessBuilderIgnoreTestJavaOpts because of the name of Utils.prependTestJavaOpts that adds those VM flags. If you have a better name I could do a rename of the method. I kind of like that it is long and clumsy, that makes it harder to use... >> >> I have run tier 1 testing, and I have started more exhaustive testing. > > Leo Korinth has updated the pull request incrementally with six additional commits since the last revision: > > - static import > - copyright > - whitespace > - whitespace > - sed > - fix test/lib/jdk/test/lib/process/ProcessTools.java Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15452#pullrequestreview-1694437335 From dholmes at openjdk.org Tue Oct 24 09:54:31 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 24 Oct 2023 09:54:31 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v4] In-Reply-To: <3jYeAkN2merJQyqdcuq7zbiAgTX1Md6EJqu9WZ_nGkA=.8da47a48-94cb-41f0-a4f2-378e611ebc0b@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> <3jYeAkN2merJQyqdcuq7zbiAgTX1Md6EJqu9WZ_nGkA=.8da47a48-94cb-41f0-a4f2-378e611ebc0b@github.com> Message-ID: <2xg71Q2_ry63yQ3WhPoQd73FhELo5Yt-xe89opt0Yi0=.89f2add9-7117-4c77-9e8f-3f8d8a6076c2@github.com> On Mon, 23 Oct 2023 14:17:58 GMT, Aleksey Shipilev wrote: >> See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. 
I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. >> >> Unfortunately, we cannot test these apart from the existing gtest. >> >> Additional testing: >> - [x] linux-x86-server-fastdebug, atomic tests pass >> - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Use public methods instead of Platform* Note you cannot just provide a lock-based implementation of cmpxchg8 and think all is well with the world. All accesses to variables that could be updated by such a cmpxchg8 would have to be accessed in a way that is consistent/safe with the use of the lock (e.g. a raw read could see a partial update if done concurrently with a locked-cmpxchg8). Even on the Java side (via Unsafe) this is not fully in place but that issue is recognized and documented. I'm tempted to say the VM should simply fail to start if `supports_cx8` is not true - the kernel helper has been around for many years for ARMv6 so I can't imagine there would be platforms in use now that don't support it. 
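The caveat about lock-based emulation can be made concrete with a small sketch. Everything here is hypothetical (`g_cx8_lock`, `locked_cmpxchg8` and `locked_load8` are illustrative names, not HotSpot functions): it assumes a target where a 64-bit store may be split into two 32-bit stores, which is why a plain read that bypasses the lock could observe a half-updated value.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical global lock guarding all emulated 64-bit atomic accesses.
static std::mutex g_cx8_lock;

// Lock-based compare-and-exchange. Returns the value seen before the update.
int64_t locked_cmpxchg8(volatile int64_t* dest, int64_t compare, int64_t exchange) {
  std::lock_guard<std::mutex> guard(g_cx8_lock);
  int64_t old_value = *dest;
  if (old_value == compare) {
    *dest = exchange;  // may be two 32-bit stores on a 32-bit target
  }
  return old_value;
}

// Readers have to take the same lock; a raw `*src` elsewhere in the code
// would not be protected against seeing only half of the store above.
int64_t locked_load8(const volatile int64_t* src) {
  std::lock_guard<std::mutex> guard(g_cx8_lock);
  return *src;
}
```

The point of the sketch is that the lock only helps if every reader and writer of the variable goes through it, which is the consistency requirement described in the comment.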
------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1776884012 PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1776887812 From shade at openjdk.org Tue Oct 24 10:00:31 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 24 Oct 2023 10:00:31 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v4] In-Reply-To: <2xg71Q2_ry63yQ3WhPoQd73FhELo5Yt-xe89opt0Yi0=.89f2add9-7117-4c77-9e8f-3f8d8a6076c2@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> <3jYeAkN2merJQyqdcuq7zbiAgTX1Md6EJqu9WZ_nGkA=.8da47a48-94cb-41f0-a4f2-378e611ebc0b@github.com> <2xg71Q2_ry63yQ3WhPoQd73FhELo5Yt-xe89opt0Yi0=.89f2add9-7117-4c77-9e8f-3f8d8a6076c2@github.com> Message-ID: <40HasbfLGQ_3O0B3SnW9zr6xOVDk6R5teJ1pF74rrA4=.d4969258-5e76-4faa-bc5c-b131a4408945@github.com> On Tue, 24 Oct 2023 09:52:07 GMT, David Holmes wrote: > I'm tempted to say the VM should simply fail to start if `supports_cx8` is not true - the kernel helper has been around for many years for ARMv6 so I can't imagine there would be platforms in use now that don't support it. Same. That would also remove any need for lockers in Access API, AFAICS. Let's try that in a separate PR and see what happens? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1776896651 From dholmes at openjdk.org Tue Oct 24 10:05:30 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 24 Oct 2023 10:05:30 GMT Subject: RFR: 8317697: refactor-encapsulate x86 VM_Version::CpuidInfo [v2] In-Reply-To: References: <6KNR9g_eUuJS_Fyilrniq-qMhf2w3R41tcFOJirJ6Dk=.ca584320-efcd-4cb6-8e84-90c16cb3ca0c@github.com> Message-ID: On Mon, 23 Oct 2023 06:27:15 GMT, Jan Kratochvil wrote: >> src/hotspot/cpu/x86/vm_version_x86.hpp line 527: >> >>> 525: }; >>> 526: >>> 527: class CpuidInfo : public _CpuidInfo { >> >> Why not just declare the original `CpuidInfo` as a class instead of extending the struct ??? > > This way the members are always zero-initialized. Which simplifies existing code > > -VM_Version::CpuidInfo VM_Version::_cpuid_info = { 0, }; > +VM_Version::CpuidInfo VM_Version::_cpuid_info; > > And makes it more foolproof - therefore also fixing an existing bug (I haven't been aware of so far) in my code in the [CRaC](https://openjdk.org/projects/crac/) branch. > > Although the ` = { 0, }` initialization above was not needed as it is a static member anyway. > > Or should I remove the zero-initialization? Sorry I'm not familiar enough with C++ initialization rules for structs versus classes here. IIUC: - the original struct instance was explicitly zero-initialized. - If we just made it a class there would not be any zero initialization - If the class extends the struct then we (somehow) regain zero initialization ?? Maybe methods on a struct weren't so bad after all. I'd like to hear other views on this code change. (Just to be clear 2 reviews are needed for hotspot code changes anyway.) 
Thanks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16093#discussion_r1369915053 From jsjolen at openjdk.org Tue Oct 24 10:13:33 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 24 Oct 2023 10:13:33 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory [v5] In-Reply-To: References: Message-ID: On Mon, 23 Oct 2023 08:23:44 GMT, Stefan Karlsson wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix messed up include > > src/hotspot/share/nmt/nmtPreInit.hpp line 35: > >> 33: #include "utilities/macros.hpp" >> 34: >> 35: #ifdef ASSERT > > The blank line at 34 is not following the style for our conditional includes. Remove it, or better yet skip conventionalize the include of runtime/atomic.hpp since it just adds to noise to the file. Removed the blank line, I think the noise is meaningful in this case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1369924184 From rkennke at openjdk.org Tue Oct 24 10:35:29 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 24 Oct 2023 10:35:29 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v61] In-Reply-To: References: Message-ID: > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. > > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. > > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. 
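The gap described here is plain alignment arithmetic, which the following sketch works through. The sizes are assumptions made for illustration: an 8-byte HeapWord, a 4-byte length field, and a length field at offset 16 when running with -XX:-UseCompressedClassPointers (8-byte mark word plus 8-byte class pointer). The offset 8 case is the Lilliput layout mentioned in the message. The helper names are hypothetical.

```cpp
#include <cstddef>

// Round `offset` up to a multiple of `alignment` (a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
  return (offset + alignment - 1) & ~(alignment - 1);
}

constexpr std::size_t kHeapWordSize = 8;  // assumed: 64-bit VM
constexpr std::size_t kLengthSize   = 4;  // assumed: 32-bit array length field

// First element offset when the array base is rounded to HeapWord granularity.
constexpr std::size_t word_aligned_base(std::size_t length_offset) {
  return align_up(length_offset + kLengthSize, kHeapWordSize);
}

// First element offset when the base only needs the element's own alignment.
constexpr std::size_t element_aligned_base(std::size_t length_offset,
                                           std::size_t element_size) {
  return align_up(length_offset + kLengthSize, element_size);
}

// Uncompressed class pointers: the length ends at 20, the word-aligned base is 24.
static_assert(word_aligned_base(16) == 24, "4-byte gap before the elements");
static_assert(element_aligned_base(16, 4) == 20, "int[] can start right after the length");
static_assert(element_aligned_base(16, 8) == 24, "long[] still needs 8-byte alignment");

// Length field at offset 8: the gap is the range 12..16 named in the message.
static_assert(word_aligned_base(8) == 16, "gap between offsets 12 and 16");
static_assert(element_aligned_base(8, 1) == 12, "byte[] can start at offset 12");
```

Under these assumptions, only arrays whose elements are smaller than a word benefit; 8-byte elements land on the same offset either way.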
> > Testing: > - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] tier1 (x86_64, x86_32, aarch64, riscv) > - [x] tier2 (x86_64, aarch64, riscv) > - [x] tier3 (x86_64, riscv) Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits: - Merge branch 'master' into JDK-8139457 - Fix ARM build - Merge remote-tracking branch 'upstream/master' into JDK-8139457 - Various cleanups - RISC changes - Move gap init into allocate_header() (x86) - Fix gtest failure on x86 - Merge remote-tracking branch 'upstream/master' into JDK-8139457 - Fix comments - Fix inconsistencies in argument naming C1_MacroAssembler::allocate_array() - ... and 80 more: https://git.openjdk.org/jdk/compare/9bfa0829...7eaca124 ------------- Changes: https://git.openjdk.org/jdk/pull/11044/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=60 Stats: 626 lines in 33 files changed: 478 ins; 83 del; 65 mod Patch: https://git.openjdk.org/jdk/pull/11044.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11044/head:pull/11044 PR: https://git.openjdk.org/jdk/pull/11044 From jsjolen at openjdk.org Tue Oct 24 10:37:43 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 24 Oct 2023 10:37:43 GMT Subject: RFR: 8314644: Change "Rvalue references and move semantics" into an accepted feature [v2] In-Reply-To: References: Message-ID: <1umNl-OT5s7nHhWAsg1mUXwg6kPc_kOF_NQZmn6j8ik=.b46d688e-fc9e-40c4-8064-34d47da49d37@github.com> > Hi, > > I'd like to propose that rvalue references and move semantics are now considered permitted in the style guide. This change would allow for move constructors to be written. This enables more performant code, if the move ctr is less expensive than the copy ctr, but also more correct code. 
For the latter part, look at "8314571: GrowableArray should move its old data and not copy it". Here we can avoid using copy assignment, instead using move constructors, which more accurately reflects what is happening: The old elements are in fact moved, and not copied. > > Two useful std functions will become available to us with this change: > > 1. `std::move`, for explicitly moving a value. This is a slightly more powerful `static_cast<T&&>(T)`, in that it also handles `T&` correctly. > 2. `std::forward`, which simplifies the usage of perfect forwarding. Perfect forwarding is a technique wherein copying is minimized. To quote Scott Meyers ( https://cppandbeyond.com/2011/04/25/session-announcement-adventures-in-perfect-forwarding/ ): > >> Perfect forwarding is an important C++0x technique built atop rvalue references. It allows move semantics to be automatically applied, even when the source and the destination of a move are separated by intervening function calls. Common examples include constructors and setter functions that forward arguments they receive to the data members of the class they are initializing or setting, as well as standard library functions like make_shared, which "perfect-forwards" its arguments to the class constructor of whatever object the to-be-created shared_ptr is to point to. > > Looking forward to your feedback, thank you.
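A small sketch may help show what the two functions do. It is not taken from the style guide or from GrowableArray: `Tracked` and `construct_in` are hypothetical names, and the copy and move counters exist only to make the behaviour observable. The helper constructs an element in raw storage, loosely similar to how a growing array relocates its elements.

```cpp
#include <new>
#include <string>
#include <utility>

// Hypothetical element type that counts how it is copied and moved.
struct Tracked {
  static int copies;
  static int moves;
  std::string payload;

  explicit Tracked(std::string p) : payload(std::move(p)) {}
  Tracked(const Tracked& other) : payload(other.payload) { ++copies; }
  Tracked(Tracked&& other) noexcept : payload(std::move(other.payload)) { ++moves; }
};

int Tracked::copies = 0;
int Tracked::moves = 0;

// Perfect forwarding: construct a Tracked in raw storage, copying when the
// caller passes an lvalue and moving when the caller passes an rvalue.
template <typename U>
Tracked* construct_in(void* storage, U&& source) {
  return ::new (storage) Tracked(std::forward<U>(source));
}
```

Calling `construct_in(buf, element)` selects the copy constructor, while `construct_in(buf, std::move(element))` selects the move constructor; the intervening function call does not change which one is chosen.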
> Johan Johan Sj?len has updated the pull request incrementally with two additional commits since the last revision: - Add a (admittedly clunky) single asterisk - Expand on the feature ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15386/files - new: https://git.openjdk.org/jdk/pull/15386/files/a85b1d18..2cba5f23 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15386&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15386&range=00-01 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15386.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15386/head:pull/15386 PR: https://git.openjdk.org/jdk/pull/15386 From jkratochvil at openjdk.org Tue Oct 24 10:43:36 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Tue, 24 Oct 2023 10:43:36 GMT Subject: RFR: 8317697: refactor-encapsulate x86 VM_Version::CpuidInfo [v2] In-Reply-To: References: <6KNR9g_eUuJS_Fyilrniq-qMhf2w3R41tcFOJirJ6Dk=.ca584320-efcd-4cb6-8e84-90c16cb3ca0c@github.com> Message-ID: <3VjGgj9-bH8aRKuxWuyTImpKGZcZWJ4x-R-x6ZgI_Qg=.2d873004-c0c2-4a00-b834-e3c98d791c90@github.com> On Tue, 24 Oct 2023 10:02:42 GMT, David Holmes wrote: > * the original struct instance was explicitly zero-initialized. It had to be zero-initialized by user of the struct. Which is error prone if you forget the zero-initialization during each use of the struct type. > * If we just made it a class there would not be any zero initialization struct vs. class does not matter. > * If the class extends the struct then we (somehow) regain zero initialization > ?? zero-initialized: echo 'struct A { long a; }; void f() { A a=A(); }'|g++ -c -o 1.o -Wall -x c++ -;objdump -dC 1.o NOT zero-initialized: echo 'struct A { long a; }; void f() { A a ; }'|g++ -c -o 1.o -Wall -x c++ -;objdump -dC 1.o It is important if one calls the default constructor or not. 
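The distinction drawn by the two one-liners, and the wrapper technique discussed in this thread, can be sketched as below. `PlainInfo` and `ZeroedInfo` are hypothetical stand-ins for `_CpuidInfo` and `CpuidInfo`, and the helpers that dirty the storage first exist only so that the effect of each initialization form is visible.

```cpp
#include <cstring>
#include <new>

// Hypothetical stand-in for the inner struct: no user-provided constructor.
struct PlainInfo {
  long eax;
  long ebx;
};

// Hypothetical stand-in for the outer class. Its constructor value-initializes
// the base subobject, which zero-initializes the members of PlainInfo.
class ZeroedInfo : public PlainInfo {
 public:
  ZeroedInfo() : PlainInfo() {}
};

// Default-initialization (`new T`): for PlainInfo the members stay
// indeterminate, for ZeroedInfo the constructor above still runs.
template <typename T>
T* default_init_in_dirty_storage(void* storage) {
  std::memset(storage, 0xFF, sizeof(T));
  return ::new (storage) T;
}

// Value-initialization (`new T()`): PlainInfo is zero-initialized.
template <typename T>
T* value_init_in_dirty_storage(void* storage) {
  std::memset(storage, 0xFF, sizeof(T));
  return ::new (storage) T();
}
```

With the wrapper, a caller who writes `ZeroedInfo info;` gets zeroed members without having to remember the `= T()` form at each use.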
With one level of inheritance one can enforce the zero-initialization by explicitly calling the inner `_CpuidInfo` default-constructor from the outer class `CpuidInfo`. > Maybe methods on a struct weren't so bad after all. I'd like to hear other views on this code change. (Just to be clear 2 reviews are needed for hotspot code changes anyway.) struct vs. class does not matter here. I am just mixing two refactorings together. I can remove the zero-initialization (I can submit it as a separate merge request but then I do not need that part myself). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16093#discussion_r1369961265 From azafari at openjdk.org Tue Oct 24 10:48:01 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 24 Oct 2023 10:48:01 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v7] In-Reply-To: References: Message-ID: > The `find` method now is > ```C++ > template<typename T> > int find(T* token, bool f(T*, E)) const { > ... > > Any other functions which use this are also changed. > Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed. Afshin Zafari has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - Merge remote-tracking branch 'upstream/master' into _8314502 - first arg of `find` casted to `uint*` - Merge branch 'master' into _8314502 - changed the `E` param of find methods to `const E&`. - find_from_end and its caller are also updated.
- 8314502: Change the comparator taking version of GrowableArray::find to be a template method - 8314502: GrowableArray: Make find with comparator take template ------------- Changes: https://git.openjdk.org/jdk/pull/15418/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15418&range=06 Stats: 23 lines in 10 files changed: 2 ins; 1 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/15418.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15418/head:pull/15418 PR: https://git.openjdk.org/jdk/pull/15418 From azafari at openjdk.org Tue Oct 24 10:53:37 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 24 Oct 2023 10:53:37 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v6] In-Reply-To: References: <-ALHHMcYPfciG6g2sOT-XIEVTf1pA6XXa93eNXQamD4=.88329bf9-627b-4d78-93fc-299550fc2be0@github.com> <7kjUHAm2miHLca5yBML-XS86qncel6Bwne7gLbDitZI=.ed191be2-d263-4fed-807a-55489e86c0db@github.com> Message-ID: On Tue, 24 Oct 2023 09:37:56 GMT, David Holmes wrote: > > It is up to date with master and nothing to push. > > That doesn't seem possible since you last merged with master on September 26. Now it is up to date after sync'ing my fork. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1776975758 From jsjolen at openjdk.org Tue Oct 24 11:38:38 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 24 Oct 2023 11:38:38 GMT Subject: RFR: 8314644: Change "Rvalue references and move semantics" into an accepted feature [v2] In-Reply-To: <1umNl-OT5s7nHhWAsg1mUXwg6kPc_kOF_NQZmn6j8ik=.b46d688e-fc9e-40c4-8064-34d47da49d37@github.com> References: <1umNl-OT5s7nHhWAsg1mUXwg6kPc_kOF_NQZmn6j8ik=.b46d688e-fc9e-40c4-8064-34d47da49d37@github.com> Message-ID: On Tue, 24 Oct 2023 10:37:43 GMT, Johan Sj?len wrote: >> Hi, >> >> I'd like to propose that rvalue references and move semantics are now considered permitted in the style guide. 
This change would allow for move constructors to be written. This enables more performant code, if the move ctr is less expensive than the copy ctr, but also more correct code. For the latter part, look at "8314571: GrowableArray should move its old data and not copy it". Here we can avoid using copy assignment, instead using move constructors, which more accurately reflects what is happening: The old elements are in fact moved, and not copied. >> >> Two useful std functions will become available to us with this change: >> >> 1. `std::move`, for explicitly moving a value. This is a slightly more powerful `static_cast<T&&>(T)`, in that it also handles `T&` correctly. >> 2. `std::forward`, which simplifies the usage of perfect forwarding. Perfect forwarding is a technique wherein copying is minimized. To quote Scott Meyers ( https://cppandbeyond.com/2011/04/25/session-announcement-adventures-in-perfect-forwarding/ ): >> >>> Perfect forwarding is an important C++0x technique built atop rvalue references. It allows move semantics to be automatically applied, even when the source and the destination of a move are separated by intervening function calls. Common examples include constructors and setter functions that forward arguments they receive to the data members of the class they are initializing or setting, as well as standard library functions like make_shared, which "perfect-forwards" its arguments to the class constructor of whatever object the to-be-created shared_ptr is to point to. >> >> Looking forward to your feedback, thank you. >> Johan > > Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: > > - Add a (admittedly clunky) single asterisk > - Expand on the feature Hi, I've expanded the text on this a bit. I'm basically attempting to say that the usage of this should be limited and I give an example.
I also use "we" to refer to "HotSpot developers as a whole" and "you" to refer to "the reader of this document." The latter is previously established as a style choice, but the former is not AFAICS. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15386#issuecomment-1777034050 From jsjolen at openjdk.org Tue Oct 24 11:51:45 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 24 Oct 2023 11:51:45 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory [v7] In-Reply-To: References: Message-ID: <81yK2Yxh7AVOSjVoAzZwIlriUwHRfN5s5LLowgA-34o=.1ed62ff1-d3d6-4fc1-8e3e-6ca945d86468@github.com> > I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? > > 1. Moved all the nmt source code from services/ to nmt/ > 2. Renamed all the include statements and sorted them > 3. Fixed the include guards Johan Sj?len has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: - Merge remote-tracking branch 'upstream/master' into move-nmt - Fix stefank suggestions - Merge remote-tracking branch 'origin/master' into move-nmt - Fix messed up include - Missed this include - Merge remote-tracking branch 'origin/master' into move-nmt - Fixed reviewed changes - Move NMT to its own subdirectory ------------- Changes: https://git.openjdk.org/jdk/pull/16276/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16276&range=06 Stats: 508 lines in 102 files changed: 214 ins; 219 del; 75 mod Patch: https://git.openjdk.org/jdk/pull/16276.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16276/head:pull/16276 PR: https://git.openjdk.org/jdk/pull/16276 From jsjolen at openjdk.org Tue Oct 24 11:51:46 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 24 Oct 2023 11:51:46 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory In-Reply-To: References: Message-ID: On Mon, 23 Oct 2023 08:41:56 GMT, Stefan Karlsson wrote: > > For the gtest source files I separated the includes in a consistent manner, they all look like this pattern now: > > That's not what I see in the latest patch. Could you revert that separation and then we can consider that style change in a separate RFE? Sure, this could be done for all tests (not only NMT). ------------- PR Comment: https://git.openjdk.org/jdk/pull/16276#issuecomment-1777053794 From mdoerr at openjdk.org Tue Oct 24 12:24:42 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 24 Oct 2023 12:24:42 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 02:47:55 GMT, Dean Long wrote: > The patch fails for me even though the locks are correct, because kptr->obj()->print_string() is not null-terminated. The patch needs to use strncmp or maybe something like kptr->obj()->klass()->name()->as_C_string(). 
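The suggestion to use a bounded comparison can be illustrated with a short sketch. It does not reproduce the test patch under discussion: `buffer_starts_with` is a hypothetical helper, and it assumes the caller knows how many bytes of the buffer are valid.

```cpp
#include <cstddef>
#include <cstring>

// Returns true if the first `len` valid bytes of `buffer` begin with the
// null-terminated string `expected`. `buffer` itself need not be terminated,
// so the comparison never reads beyond `len` bytes of it.
bool buffer_starts_with(const char* buffer, std::size_t len, const char* expected) {
  const std::size_t n = std::strlen(expected);
  return len >= n && std::strncmp(buffer, expected, n) == 0;
}
```

An unbounded `strcmp` on such a buffer could run past its end, which would match the kind of failure described in the comment.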
Never mind, I have found a new solution. Thanks for looking into it! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1777100373 From mdoerr at openjdk.org Tue Oct 24 12:24:46 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 24 Oct 2023 12:24:46 GMT Subject: RFR: 8316746: Top of lock-stack does not match the unlocked object [v3] In-Reply-To: References: Message-ID: On Thu, 28 Sep 2023 10:38:52 GMT, Martin Doerr wrote: >> I think we need to support "Top of lock-stack does not match the unlocked object" and take the slow path. Reason: see JBS issue. >> Currently only PPC64, x86_64 and aarch64 code. I'd like to get feedback before implementing it for other platforms (needed for all platforms). > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Pass may_be_unordered information to lightweight_unlock. > - Merge remote-tracking branch 'origin' into 8316746_lock_stack > - Add x86_64 and aarch64 implementation. > - 8316746: Top of lock-stack does not match the unlocked object Closing in favor of https://github.com/openjdk/jdk/pull/16345. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15903#issuecomment-1777100688 From mli at openjdk.org Tue Oct 24 13:20:47 2023 From: mli at openjdk.org (Hamlin Li) Date: Tue, 24 Oct 2023 13:20:47 GMT Subject: RFR: 8318222: RISC-V: C2 CmpU3 In-Reply-To: References: Message-ID: On Mon, 23 Oct 2023 15:45:39 GMT, Hamlin Li wrote: > Hi, > Can you review the change to add intrinsic for CmpU3 and CmpUL3? > Thanks! > > ## Test > > ### functionality > pass jtreg test: > jdk/java/lang/Long/Unsigned.java, jdk/java/lang/Integer/Unsigned.java > > ### performance > #### Long > **before**: > Benchmark (size) Mode Cnt Score Error Units > Longs.compareUnsignedDirect 500 avgt 5 1454.789 ±
129.557 ns/op > Longs.compareUnsignedIndirect 500 avgt 5 1410.146 ± 120.017 ns/op > > **after**: > Benchmark (size) Mode Cnt Score Error Units > Longs.compareUnsignedDirect 500 avgt 5 1286.129 ± 8.441 ns/op > Longs.compareUnsignedIndirect 500 avgt 5 993.490 ± 0.840 ns/op > > #### Integer > **before**: > Benchmark (size) Mode Cnt Score Error Units > Integers.compareUnsignedDirect 500 avgt 5 1611.753 ± 0.700 ns/op > Integers.compareUnsignedIndirect 500 avgt 5 1775.093 ± 1.520 ns/op > > **after**: > Benchmark (size) Mode Cnt Score Error Units > Integers.compareUnsignedDirect 500 avgt 5 1159.351 ± 0.601 ns/op > Integers.compareUnsignedIndirect 500 avgt 5 776.185 ± 0.924 ns/op Thanks @robehn @RealFYang for your reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16314#issuecomment-1777184332 From mli at openjdk.org Tue Oct 24 13:20:48 2023 From: mli at openjdk.org (Hamlin Li) Date: Tue, 24 Oct 2023 13:20:48 GMT Subject: Integrated: 8318222: RISC-V: C2 CmpU3 In-Reply-To: References: Message-ID: On Mon, 23 Oct 2023 15:45:39 GMT, Hamlin Li wrote: > Hi, > Can you review the change to add intrinsic for CmpU3 and CmpUL3? > Thanks! > > ## Test > > ### functionality > pass jtreg test: > jdk/java/lang/Long/Unsigned.java, jdk/java/lang/Integer/Unsigned.java > > ### performance > #### Long > **before**: > Benchmark (size) Mode Cnt Score Error Units > Longs.compareUnsignedDirect 500 avgt 5 1454.789 ± 129.557 ns/op > Longs.compareUnsignedIndirect 500 avgt 5 1410.146 ± 120.017 ns/op > > **after**: > Benchmark (size) Mode Cnt Score Error Units > Longs.compareUnsignedDirect 500 avgt 5 1286.129 ± 8.441 ns/op > Longs.compareUnsignedIndirect 500 avgt 5 993.490 ± 0.840 ns/op > > #### Integer > **before**: > Benchmark (size) Mode Cnt Score Error Units > Integers.compareUnsignedDirect 500 avgt 5 1611.753 ± 0.700 ns/op > Integers.compareUnsignedIndirect 500 avgt 5 1775.093 ±
1.520 ns/op > > **after**: > Benchmark (size) Mode Cnt Score Error Units > Integers.compareUnsignedDirect 500 avgt 5 1159.351 ± 0.601 ns/op > Integers.compareUnsignedIndirect 500 avgt 5 776.185 ± 0.924 ns/op This pull request has now been integrated. Changeset: f9795d0d Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/f9795d0d09a82cafb3e79ad8667e505c194d745b Stats: 67 lines in 3 files changed: 63 ins; 0 del; 4 mod 8318222: RISC-V: C2 CmpU3 8318223: RISC-V: C2 CmpUL3 Reviewed-by: rehn, fyang ------------- PR: https://git.openjdk.org/jdk/pull/16314 From mli at openjdk.org Tue Oct 24 13:20:45 2023 From: mli at openjdk.org (Hamlin Li) Date: Tue, 24 Oct 2023 13:20:45 GMT Subject: RFR: 8318222: RISC-V: C2 CmpU3 In-Reply-To: <1KMHfxdz_PQ194Psrr8eVNzF2sAvwzntIEXAeeB9VQ4=.39cc3651-ef9a-40a8-ac14-6334e859d017@github.com> References: <1KMHfxdz_PQ194Psrr8eVNzF2sAvwzntIEXAeeB9VQ4=.39cc3651-ef9a-40a8-ac14-6334e859d017@github.com> Message-ID: <0d-8p7WNMB27Tc8fbK9goeMzrlVtfffTPWTvPqnEDDM=.5bf54afd-95d6-4284-893e-22e1239b2a0e@github.com> On Tue, 24 Oct 2023 03:10:24 GMT, Fei Yang wrote: > LGTM. May I ask on which platform was the JMH tested? It's `StarFive VisionFive 2 v1.3B` > Also I think it's safer to perform some regression tests like tier1-3.
Thanks for reminding, I've run the hotspot compiler tests, and related jdk tests found by `grep -nr test/jdk -we Long.compareUnsigned -we Integer.compareUnsigned` ------------- PR Comment: https://git.openjdk.org/jdk/pull/16314#issuecomment-1777179822 From stuefe at openjdk.org Tue Oct 24 13:27:42 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 24 Oct 2023 13:27:42 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v7] In-Reply-To: <8m3vJwf85Nh4LSy-ry1oM3uJEkZifxnAhunZX6hWWso=.691eb01e-8337-4bf9-a044-91d54d6956bb@github.com> References: <8m3vJwf85Nh4LSy-ry1oM3uJEkZifxnAhunZX6hWWso=.691eb01e-8337-4bf9-a044-91d54d6956bb@github.com> Message-ID: On Fri, 20 Oct 2023 06:04:53 GMT, Liming Liu wrote: >> src/hotspot/share/runtime/os.cpp line 2108: >> >>> 2106: // granularity, so we can touch anywhere in a page. Touch at the >>> 2107: // beginning of each page to simplify iteration. >>> 2108: void* first = align_down(start, page_size); >> >> minor nit, since you are touching this, could you make it const too? (void* const) > > Touch needs a write anyway, and all related functions also do not use const here. So I would not add const for it. I was talking about void* const, not const void*. But nevermind, its not important. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1370151316 From dnsimon at openjdk.org Tue Oct 24 13:34:51 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 24 Oct 2023 13:34:51 GMT Subject: RFR: 8317689: [JVMCI] include error message when CreateJavaVM in libgraal fails [v2] In-Reply-To: References: Message-ID: On Wed, 11 Oct 2023 21:11:41 GMT, Doug Simon wrote: >> Creating a new libgraal isolate can fail for a number of reasons. Currently, all that one sees on such a failure is a numeric error code. 
For example: >> >> >> 2096 20291 4 java.lang.CharacterData::of (136 bytes) >> 2096 20291 4 java.lang.CharacterData::of (136 bytes) COMPILE SKIPPED: Error attaching to libjvmci (err: -1000000024) >> >> >> Native Image has been [enhanced](https://github.com/oracle/graal/blob/14ca57efd35941a3b60c6224285ad8153f77059c/substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jni/functions/JNIInvocationInterface.java#L209-L214) to return an error message along with an error code by a non-standard `_createvm_errorstr` argument passed to the `CreateJavaVM` JNI invocation interface function: >> >> >> |--------------------|-----------------------------------------------------------------------------------| >> | _createvm_errorstr | extraInfo is a "const char**" value. | >> | | If CreateJavaVM returns non-zero, then extraInfo is assigned a newly malloc'ed | >> | | 0-terminated C string describing the error if a description is available, | >> | | otherwise extraInfo is set to null. | >> |--------------------|-----------------------------------------------------------------------------------| >> >> >> This PR updates JVMCI to take advantage of this Native Image enhancement. >> >> This is sample `-XX:+PrintCompilation` output from testing this PR on libgraal: >> >> 2096 20291 4 java.lang.CharacterData::of (136 bytes) >> 2096 20291 4 java.lang.CharacterData::of (136 bytes) COMPILE SKIPPED: Error attaching to libjvmci (err: -1000000024, Image page size is incompatible with run-time page size. Rebuild image with -H:PageSize=[pagesize] to set appropriately.) > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > renamed _strerror to _createvm_errorstr Thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16086#issuecomment-1777212793 From dnsimon at openjdk.org Tue Oct 24 13:34:53 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 24 Oct 2023 13:34:53 GMT Subject: Integrated: 8317689: [JVMCI] include error message when CreateJavaVM in libgraal fails In-Reply-To: References: Message-ID: On Fri, 6 Oct 2023 22:25:48 GMT, Doug Simon wrote: > Creating a new libgraal isolate can fail for a number of reasons. Currently, all that one sees on such a failure is a numeric error code. For example: > > > 2096 20291 4 java.lang.CharacterData::of (136 bytes) > 2096 20291 4 java.lang.CharacterData::of (136 bytes) COMPILE SKIPPED: Error attaching to libjvmci (err: -1000000024) > > > Native Image has been [enhanced](https://github.com/oracle/graal/blob/14ca57efd35941a3b60c6224285ad8153f77059c/substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jni/functions/JNIInvocationInterface.java#L209-L214) to return an error message along with an error code by a non-standard `_createvm_errorstr` argument passed to the `CreateJavaVM` JNI invocation interface function: > > > |--------------------|-----------------------------------------------------------------------------------| > | _createvm_errorstr | extraInfo is a "const char**" value. | > | | If CreateJavaVM returns non-zero, then extraInfo is assigned a newly malloc'ed | > | | 0-terminated C string describing the error if a description is available, | > | | otherwise extraInfo is set to null. | > |--------------------|-----------------------------------------------------------------------------------| > > > This PR updates JVMCI to take advantage of this Native Image enhancement. 
> > This is sample `-XX:+PrintCompilation` output from testing this PR on libgraal: > > 2096 20291 4 java.lang.CharacterData::of (136 bytes) > 2096 20291 4 java.lang.CharacterData::of (136 bytes) COMPILE SKIPPED: Error attaching to libjvmci (err: -1000000024, Image page size is incompatible with run-time page size. Rebuild image with -H:PageSize=[pagesize] to set appropriately.) This pull request has now been integrated. Changeset: 8879c78d Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/8879c78d62e3c1f325def56d131f62c479bfdaa9 Stats: 33 lines in 6 files changed: 20 ins; 0 del; 13 mod 8317689: [JVMCI] include error message when CreateJavaVM in libgraal fails Reviewed-by: phofer, thartmann, never ------------- PR: https://git.openjdk.org/jdk/pull/16086 From stuefe at openjdk.org Tue Oct 24 13:54:46 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 24 Oct 2023 13:54:46 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v7] In-Reply-To: <-p-qDbsbPtk_7h79043ct8sTCcM6d-TiD62zh4c7Q0Q=.9691fe3d-0a85-4f6f-a876-9761a240f866@github.com> References: <-p-qDbsbPtk_7h79043ct8sTCcM6d-TiD62zh4c7Q0Q=.9691fe3d-0a85-4f6f-a876-9761a240f866@github.com> Message-ID: On Fri, 20 Oct 2023 05:54:06 GMT, Liming Liu wrote: >> As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). >> >> Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: >> >> >> >> >> >> >> >> >> >> >> >>
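>> | Kernel | -XX:-TransparentHugePages, Unpatched | -XX:-TransparentHugePages, Patched | -XX:+TransparentHugePages, Unpatched | -XX:+TransparentHugePages, Patched |
>> |--------|--------------------------------------|------------------------------------|--------------------------------------|------------------------------------|
>> | 4.18   | 11.30                                | 11.30                              | 0.25                                 | 0.25                               |
>> | 5.13   | 0.22                                 | 0.22                               | 3.42                                 | 3.42                               |
>> | 6.1    | 0.27                                 | 0.33                               | 3.54                                 | 0.33                               |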
> > Liming Liu has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Make the jtreg test check the usage of THP I am still a bit worried about concurrent usage of the touched memory. Not sure what to do there other than testing. Other than that, the patch is mechanically fine. src/hotspot/share/runtime/os.cpp line 2119: > 2117: void os::pretouch_memory_common(void* first, void* last, size_t page_size) { > 2118: assert(is_aligned(first, page_size), "pointer " PTR_FORMAT " is not page-aligned by %zu", p2i(first), page_size); > 2119: assert(is_aligned(last, page_size), "pointer " PTR_FORMAT " is not page-aligned by %zu", p2i(last), page_size); New assertions, right? Affects all platforms, not only Linux. If possible, please don't change behavior on other platforms. Have you checked all users of this function? AFAICT calling this with unaligned pointers would have been possible and would have worked. test/hotspot/jtreg/runtime/os/TestTransparentHugePageUsage.java line 37: > 35: * -Xms24G -Xmx24G -XX:+AlwaysPreTouch > 36: * runtime.os.TestTransparentHugePageUsage > 37: */ Requiring the test to need 24G is a lot... why does it need to be so large? What does the test test? The problem was that pre-touching the memory would allocate small pages, and then later khugepaged would fold them into large pages at its own leisure. Your patch prevents that, so now huge pages form faster? So, the success of the patch can be described by timing? test/hotspot/jtreg/runtime/os/TestTransparentHugePageUsage.java line 74: > 72: // cover all cases considered to be failures, but we can > 73: // just say the non-usage of THP failes for sure. > 74: System.exit(1); Please throw RuntimeException for a test error. 
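
To illustrate the last point, here is a minimal sketch of a check that fails by throwing instead of calling `System.exit(1)`. The class and method names are hypothetical, and the assumption that the test inspects `AnonHugePages:` entries from `/proc/self/smaps` is mine; the actual test in the PR may look quite different.

```java
import java.util.List;

// Hypothetical helper, not taken from the PR under review.
final class ThpCheckSketch {
    // Sums the "AnonHugePages:" values (in kB) found in smaps-style lines.
    static long anonHugePagesKb(List<String> smapsLines) {
        long total = 0;
        for (String line : smapsLines) {
            if (line.startsWith("AnonHugePages:")) {
                // Expected line shape: "AnonHugePages:      2048 kB"
                String[] parts = line.trim().split("\\s+");
                total += Long.parseLong(parts[1]);
            }
        }
        return total;
    }

    // Reports failure by throwing, so the harness sees a message and a stack trace.
    static void requireThpUsage(List<String> smapsLines) {
        if (anonHugePagesKb(smapsLines) == 0) {
            throw new RuntimeException("No transparent huge pages in use after pretouch");
        }
    }

    public static void main(String[] args) {
        requireThpUsage(List.of("Rss:               4096 kB", "AnonHugePages:      2048 kB"));
        System.out.println("THP usage detected");
    }
}
```

An uncaught exception carries a message and stack trace into the test log, whereas an exit code gives the harness nothing to report beyond the failure itself.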
------------- PR Review: https://git.openjdk.org/jdk/pull/15781#pullrequestreview-1694848326 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1370156676 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1370191445 PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1370161355 From shade at openjdk.org Tue Oct 24 14:39:40 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 24 Oct 2023 14:39:40 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v4] In-Reply-To: <3jYeAkN2merJQyqdcuq7zbiAgTX1Md6EJqu9WZ_nGkA=.8da47a48-94cb-41f0-a4f2-378e611ebc0b@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> <3jYeAkN2merJQyqdcuq7zbiAgTX1Md6EJqu9WZ_nGkA=.8da47a48-94cb-41f0-a4f2-378e611ebc0b@github.com> Message-ID: On Mon, 23 Oct 2023 14:17:58 GMT, Aleksey Shipilev wrote: >> See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. >> >> Unfortunately, we cannot test these apart from the existing gtest. >> >> Additional testing: >> - [x] linux-x86-server-fastdebug, atomic tests pass >> - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Use public methods instead of Platform* If there are no other comments/complaints, I am going to integrate this soon. Last call! 
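
For readers unfamiliar with the idea, one common way to build such fallbacks is a compare-and-exchange retry loop. That this PR does exactly that is an assumption on my part, and the sketch below is a Java analogy using `AtomicLong.compareAndSet`, not HotSpot's `Atomic` code; the class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Java analogy only; HotSpot's Atomic:: templates are C++ and look different.
final class CasFallbackSketch {
    // add built from compare-and-set: retry until no other thread raced us.
    static long addAndFetch(AtomicLong cell, long delta) {
        while (true) {
            long old = cell.get();
            long updated = old + delta;
            if (cell.compareAndSet(old, updated)) {
                return updated;
            }
        }
    }

    // xchg built the same way: install the new value, return the previous one.
    static long exchange(AtomicLong cell, long newValue) {
        while (true) {
            long old = cell.get();
            if (cell.compareAndSet(old, newValue)) {
                return old;
            }
        }
    }

    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong(40);
        System.out.println(addAndFetch(counter, 2)); // prints 42
        System.out.println(exchange(counter, 7));    // prints 42 (the old value)
        System.out.println(counter.get());           // prints 7
    }
}
```

The loop costs a retry under contention, which is usually acceptable for a fallback path that only exists where no native 64-bit add or exchange is available.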
------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1777359607 From jvernee at openjdk.org Tue Oct 24 15:09:57 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 24 Oct 2023 15:09:57 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v12] In-Reply-To: References: Message-ID: > Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. > > The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. > > Components of this patch: > > - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. > - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. > - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. > - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. > - The object/oop + offset is exposed as temporary address to native code. > - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). > - Only x64 and AArch64 for now. 
> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 > - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. > - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` > > Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. > > Numbers for the included benchmark on my machine are: > > > Benchmark (size) Mode Cnt ... Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: - a -> an - add note to downcallHandle about passing heap segments by-reference ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16201/files - new: https://git.openjdk.org/jdk/pull/16201/files/2e00beff..bf850299 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=10-11 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16201/head:pull/16201 PR: https://git.openjdk.org/jdk/pull/16201 From mli at openjdk.org Tue Oct 24 15:29:04 2023 From: mli at openjdk.org (Hamlin Li) Date: Tue, 24 Oct 2023 15:29:04 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL Message-ID: Hi, Can you review the change to add intrinsic for UDivI and UDivL? Thanks! 
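
As background, the operations being intrinsified correspond to `Long.divideUnsigned` and `Integer.divideUnsigned`. The sketch below only shows how unsigned division differs from signed division on the same bit patterns; the class and helper names are made up for illustration and say nothing about how the RISC-V intrinsic is implemented.

```java
// Illustration of the Java-level semantics only.
final class UnsignedDivideSketch {
    static long signedQuotient(long dividend, long divisor) {
        return dividend / divisor;
    }

    static long unsignedQuotient(long dividend, long divisor) {
        // Both operands are interpreted as unsigned 64-bit values.
        return Long.divideUnsigned(dividend, divisor);
    }

    public static void main(String[] args) {
        // -1L is 0xFFFF_FFFF_FFFF_FFFF, i.e. 2^64 - 1 when read as unsigned.
        System.out.println(signedQuotient(-1L, 2L));      // prints 0
        System.out.println(unsignedQuotient(-1L, 2L));    // prints 9223372036854775807
        System.out.println(Integer.divideUnsigned(-1, 2)); // prints 2147483647
    }
}
```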
## Tests ### Functionality Run tests successfully found via `grep -nr test/jdk/ -we divideUnsigned` and `grep -nr test/hotspot/ -we divideUnsigned` ### Performance #### Long ** Before ** LongDivMod.testDivideUnsigned 1024 mixed avgt 2 19852.277 ns/op LongDivMod.testDivideUnsigned 1024 positive avgt 2 29155.681 ns/op LongDivMod.testDivideUnsigned 1024 negative avgt 2 6385.280 ns/op ** After ** LongDivMod.testDivideUnsigned 1024 mixed avgt 2 11776.806 ns/op LongDivMod.testDivideUnsigned 1024 positive avgt 2 16101.940 ns/op LongDivMod.testDivideUnsigned 1024 negative avgt 2 6433.223 ns/op #### Integer ** Before ** IntegerDivMod.testDivideUnsigned 1024 mixed avgt 2 23498.570 ns/op IntegerDivMod.testDivideUnsigned 1024 positive avgt 2 16875.614 ns/op IntegerDivMod.testDivideUnsigned 1024 negative avgt 2 30310.243 ns/op ** After ** IntegerDivMod.testDivideUnsigned 1024 mixed avgt 2 23327.997 ns/op IntegerDivMod.testDivideUnsigned 1024 positive avgt 2 16708.209 ns/op IntegerDivMod.testDivideUnsigned 1024 negative avgt 2 30162.153 ns/op ------------- Commit messages: - Modify tests to run for riscv64 - reuse riscv_enc_divuw and corrected_idiv - space and comments - Initial commit Changes: https://git.openjdk.org/jdk/pull/16346/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16346&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318723 Stats: 74 lines in 5 files changed: 62 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/16346.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16346/head:pull/16346 PR: https://git.openjdk.org/jdk/pull/16346 From psandoz at openjdk.org Tue Oct 24 15:31:12 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 24 Oct 2023 15:31:12 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v55] In-Reply-To: <6oWO6eKsZasmfvGT45D6hWW1ioDM1Qwej3KqSUoOxLM=.d30979bc-478f-4ce8-905c-16f91f4541d8@github.com> References: 
<6oWO6eKsZasmfvGT45D6hWW1ioDM1Qwej3KqSUoOxLM=.d30979bc-478f-4ce8-905c-16f91f4541d8@github.com> Message-ID: On Wed, 13 Sep 2023 14:14:43 GMT, Roman Kennke wrote: >> There's gtest a failure in the GHA run: >> >> [ RUN ] arrayOopDesc.double_vm >> /home/runner/work/jdk/jdk/test/hotspot/gtest/oops/test_arrayOop.cpp:51: Failure >> check_max_length_overflow(T_DOUBLE) evaluates to false, where >> T_DOUBLE evaluates to  >> >> [ FAILED ] arrayOopDesc.double_vm (0 ms) >> [ RUN ] arrayOopDesc.byte_vm >> [ OK ] arrayOopDesc.byte_vm (0 ms) >> [ RUN ] arrayOopDesc.short_vm >> [ OK ] arrayOopDesc.short_vm (0 ms) >> [ RUN ] arrayOopDesc.int_vm >> [ OK ] arrayOopDesc.int_vm (0 ms) >> [ RUN ] arrayOopDesc.long_vm >> /home/runner/work/jdk/jdk/test/hotspot/gtest/oops/test_arrayOop.cpp:67: Failure >> check_max_length_overflow(T_LONG) evaluates to false, where >> T_LONG evaluates to >> >> >> [ FAILED ] arrayOopDesc.long_vm (0 ms) > >> There's gtest a failure in the GHA run: >> >> ``` >> [ RUN ] arrayOopDesc.double_vm >> /home/runner/work/jdk/jdk/test/hotspot/gtest/oops/test_arrayOop.cpp:51: Failure >> check_max_length_overflow(T_DOUBLE) evaluates to false, where >> T_DOUBLE evaluates to ? >> >> [ FAILED ] arrayOopDesc.double_vm (0 ms) >> [ RUN ] arrayOopDesc.byte_vm >> [ OK ] arrayOopDesc.byte_vm (0 ms) >> [ RUN ] arrayOopDesc.short_vm >> [ OK ] arrayOopDesc.short_vm (0 ms) >> [ RUN ] arrayOopDesc.int_vm >> [ OK ] arrayOopDesc.int_vm (0 ms) >> [ RUN ] arrayOopDesc.long_vm >> /home/runner/work/jdk/jdk/test/hotspot/gtest/oops/test_arrayOop.cpp:67: Failure >> check_max_length_overflow(T_LONG) evaluates to false, where >> T_LONG evaluates to ? >> >> [ FAILED ] arrayOopDesc.long_vm (0 ms) >> ``` > > Aww, this max_array_length() method and 32bit builds. :-/ > We should re-write this method altogether and special-case it for !_LP64 and maybe simply make it a switch on the incoming type, with hard-coded values. This might be easier to understand than getting the logic absolutely right. 
Also, with this change, and even more so with upcoming Lilliput changes, this method is a little too conservative and we could offer somewhat increased array lengths. Alternatively, we could do what the comments suggests and fix up all the uses of the method to use sensible types (size_t?) and make it simple and obvious.

@rkennke apologies for the delay. I spent some time yesterday pondering what to do.

Since Lilliput will change array objects to be aligned on < 8 byte boundaries, I don't think we should in any way rely on stability at, say, 4 byte boundaries for `byte[]`. We have to generally assume it could reduce further to a smaller value, e.g., 2 or even 1.

In that respect I generally recommend for Lilliput we do the following (some of which is perhaps broader than this issue):

1. Unsafe
   - Document limitations, e.g., in the JEP and/or a release note.
2. `ByteBuffer::alignedSlice` and `ByteBuffer::alignmentOffset`
   - Deprecate, not for removal, explaining limitations and referring to use of direct buffers, native memory segments, or heap memory segments covering `long[]` arrays.
   - Update implementations to replace the hard-coded unit size of 8 with a runtime query for `byte[]`. Can we use `Unsafe.arrayBaseOffset`? (This is the only recommended implementation update.)
   - Update documentation for implementation notes and for `throws UnsupportedOperationException`.
3. `MethodHandles::byteArrayViewVarHandle` and `MethodHandles::byteBufferViewVarHandle`
   - Deprecate, not for removal, explaining limitations and referring to use of direct buffers, native memory segments, or heap memory segments covering `long[]` arrays.
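
To make the user-visible side of item 2 concrete, here is a small sketch of code that probes whether an aligned slice is available rather than assuming it. The class and helper names are hypothetical. For heap buffers the outcome with a unit size above 1 depends on the JDK version and VM configuration, so the sketch treats `UnsupportedOperationException` as a normal answer.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of any proposed change.
final class AlignmentSketch {
    // True if the buffer can hand out a slice aligned to unitSize bytes.
    static boolean supportsAlignedSlice(ByteBuffer buffer, int unitSize) {
        try {
            ByteBuffer slice = buffer.alignedSlice(unitSize);
            // An aligned slice starts at an address that is a multiple of unitSize.
            return slice.capacity() == 0 || slice.alignmentOffset(0, unitSize) == 0;
        } catch (UnsupportedOperationException e) {
            // The platform gives no stable alignment guarantee for this kind of buffer.
            return false;
        }
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(64);
        ByteBuffer heap = ByteBuffer.allocate(64);
        System.out.println("direct, unit 8: " + supportsAlignedSlice(direct, 8));
        // Result for the heap buffer varies by JDK version and VM flags.
        System.out.println("heap,   unit 8: " + supportsAlignedSlice(heap, 8));
    }
}
```

Code written this way keeps working whichever way the heap-buffer behaviour ends up being specified, and direct buffers remain the portable choice when aligned access is required.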
------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1777485823 From mcimadamore at openjdk.org Tue Oct 24 15:38:43 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 24 Oct 2023 15:38:43 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v12] In-Reply-To: References: Message-ID: <91QJMyPdnoSB8zosYyJrCv2s3-aKOItLRm7Q1pLfmko=.dce34b03-9db6-46f6-afa5-be64d4ccf1ce@github.com> On Tue, 24 Oct 2023 15:09:57 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. 
>> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - a -> an > - add note to downcallHandle about passing heap segments by-reference New javadoc note looks good ------------- Marked as reviewed by mcimadamore (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16201#pullrequestreview-1695254853 From jvernee at openjdk.org Tue Oct 24 16:03:01 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 24 Oct 2023 16:03:01 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v55] In-Reply-To: References: <6oWO6eKsZasmfvGT45D6hWW1ioDM1Qwej3KqSUoOxLM=.d30979bc-478f-4ce8-905c-16f91f4541d8@github.com> Message-ID: On Tue, 24 Oct 2023 15:27:38 GMT, Paul Sandoz wrote: > 2. 
`ByteBuffer::alignedSlice` and `ByteBuffer::alignmentOffset` > -Deprecate, not for removal, explaining limitations and referring to use of direct buffers, native memory segments, or heap memory segments covering `long[]` arrays. > -Update implementations to replace hard coded unit size of 8 to a runtime query for `byte[]`. Can we use `Unsafe.arrayBaseOffset`? (This is the only recommended implementation update.) > -Update documentation for implementation notes and for `throws UnsupportedOperationException`. The restrictions on `alignedSlice` are inherited from `alignmentOffset`. The documentation of the latter has this: * @throws UnsupportedOperationException * If the native platform does not guarantee stable alignment offset * values for the given unit size when managing the memory regions * of buffers of the same kind as this buffer (direct or * non-direct). For example, if garbage collection would result * in the moving of a memory region covered by a non-direct buffer * from one location to another and both locations have different * alignment characteristics. Since "guarantee" is used here, I think the current acceptance of `unitSize > 1` for heap buffers can be framed as a bug in the implementation. Since, there is no guarantee on the alignment beyond the natural alignment of the array elements (which for `byte` is 1). I think uniformly rejecting `unitSize > 1` for heap buffers in all configurations would be preferable, though, rather than relying on the alignment the VM chooses, which can change based on the used VM flags (?). > > 3. `MethodHandles::byteArrayViewVarHandle` and `MethodHandles::byteBufferViewVarHandle` > -Deprecate, not for removal, explaining limitations and referring to use of direct buffers, native memory segments, or heap memory segments covering `long[]` arrays. For both 2. and 3., I think deprecation is perhaps a bit too strong... 
`alignedSlice` and `alignmentOffset` are well-formed for off-heap buffers, and `byteArrayViewVarHandle` and `byteBufferViewVarHandle` work well with plain access modes. I think we don't want to discourage using these APIs for those well-formed use cases? Also, since we are already changing the implementation, and changing the observable behavior, maybe we should just change it to do the right thing once and for all (and update the spec accordingly)? ------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1777543126 From rkennke at openjdk.org Tue Oct 24 16:08:58 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 24 Oct 2023 16:08:58 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v55] In-Reply-To: <6oWO6eKsZasmfvGT45D6hWW1ioDM1Qwej3KqSUoOxLM=.d30979bc-478f-4ce8-905c-16f91f4541d8@github.com> References: <6oWO6eKsZasmfvGT45D6hWW1ioDM1Qwej3KqSUoOxLM=.d30979bc-478f-4ce8-905c-16f91f4541d8@github.com> Message-ID: On Wed, 13 Sep 2023 14:14:43 GMT, Roman Kennke wrote: >> There's gtest a failure in the GHA run: >> >> [ RUN ] arrayOopDesc.double_vm >> /home/runner/work/jdk/jdk/test/hotspot/gtest/oops/test_arrayOop.cpp:51: Failure >> check_max_length_overflow(T_DOUBLE) evaluates to false, where >> T_DOUBLE evaluates to  >> >> [ FAILED ] arrayOopDesc.double_vm (0 ms) >> [ RUN ] arrayOopDesc.byte_vm >> [ OK ] arrayOopDesc.byte_vm (0 ms) >> [ RUN ] arrayOopDesc.short_vm >> [ OK ] arrayOopDesc.short_vm (0 ms) >> [ RUN ] arrayOopDesc.int_vm >> [ OK ] arrayOopDesc.int_vm (0 ms) >> [ RUN ] arrayOopDesc.long_vm >> /home/runner/work/jdk/jdk/test/hotspot/gtest/oops/test_arrayOop.cpp:67: Failure >> check_max_length_overflow(T_LONG) evaluates to false, where >> T_LONG evaluates to >> >> >> [ FAILED ] arrayOopDesc.long_vm (0 ms) > >> There's gtest a failure in the GHA run: >> >> ``` >> [ RUN ] arrayOopDesc.double_vm >> /home/runner/work/jdk/jdk/test/hotspot/gtest/oops/test_arrayOop.cpp:51: Failure >> 
check_max_length_overflow(T_DOUBLE) evaluates to false, where >> T_DOUBLE evaluates to ? >> >> [ FAILED ] arrayOopDesc.double_vm (0 ms) >> [ RUN ] arrayOopDesc.byte_vm >> [ OK ] arrayOopDesc.byte_vm (0 ms) >> [ RUN ] arrayOopDesc.short_vm >> [ OK ] arrayOopDesc.short_vm (0 ms) >> [ RUN ] arrayOopDesc.int_vm >> [ OK ] arrayOopDesc.int_vm (0 ms) >> [ RUN ] arrayOopDesc.long_vm >> /home/runner/work/jdk/jdk/test/hotspot/gtest/oops/test_arrayOop.cpp:67: Failure >> check_max_length_overflow(T_LONG) evaluates to false, where >> T_LONG evaluates to ? >> >> [ FAILED ] arrayOopDesc.long_vm (0 ms) >> ``` > > Aww, this max_array_length() method and 32bit builds. :-/ > We should re-write this method altogether and special-case it for !_LP64 and maybe simply make it a switch on the incoming type, with hard-coded values. This might be easier to understand than getting the logic absolutely right. Also, with this change, and even more so with upcoming Lilliput changes, this method is a little too conservative and we could offer somewhat increased array lengths. Alternatively, we could do what the comments suggests and fix up all the uses of the method to use sensible types (size_t?) and make it simple and obvious. > @rkennke apologies for the delay. I spent some time yesterday pondering what to do. > > Since Lilliput will change array objects to be aligned on < 8 byte boundaries, then I don?t think we should in anyway rely on stability say at 4 byte boundaries for `byte[]`. We have to generally assume it could further reduce down to a smaller value e.g., 2 or even 1. My current intention is to maintain stability at 4 byte boundaries. (In-fact, when object header are down to 4 bytes, and we still use 4 bytes for length, we will be back to 8-byte boundary for arrays.) I am not sure we can realistically reduce header sizes even more, tbh. > In that respect I generally recommend for Lilliput we do the following (some of which is perhaps broader than this issue): > > 1. 
Unsafe > -Document limitations e.g., in JEP and/or release note. Ok. > 2. `ByteBuffer::alignedSlice` and `ByteBuffer::alignmentOffset` > -Deprecate, not for removal, explaining limitations and referring to use of direct buffers, native memory segments, or heap memory segments covering `long[]` arrays. Why should we deprecate those methods? See also next point. > -Update implementations to replace hard coded unit size of 8 to a runtime query for `byte[]`. Can we use `Unsafe.arrayBaseOffset`? (This is the only recommended implementation update.) I believe that ByteBuffer et al already do use Unsafe.arrayBaseOffset(). I noted elsewhere that all relevant Buffer and VarHandles tests which test alignmentOffset() and alignedSlice() are passing with this change, and also with the greater Lilliput updates, because they already do the right thing, afaict. > -Update documentation for implementation notes and for `throws UnsupportedOperationException`. Ok. > 3. `MethodHandles::byteArrayViewVarHandle` and `MethodHandles::byteBufferViewVarHandle` > -Deprecate, not for removal, explaining limitations and referring to use of direct buffers, native memory segments, or heap memory segments covering `long[]` arrays. OK, but see above. BTW, another compatibility-related issue might be JVMCI/Graal. Graal JIT would need some updates for the changed object/array layout. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1777559251 From lmesnik at openjdk.org Tue Oct 24 17:04:41 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 24 Oct 2023 17:04:41 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v6] In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 07:49:30 GMT, Leo Korinth wrote: >> This pull request renames `createJavaProcessBuilder` to `createLimitedTestJavaProcessBuilder` and renames `createTestJvm` to `createTestJavaProcessBuilder`. Both are implemented through a private `createJavaProcessBuilder`. It also updates the java doc. 
>> >> This is so that it should be harder to by mistake use `createLimitedTestJavaProcessBuilder` that is problematic because it will not forward JVM flags to the tested JVM. > > Leo Korinth has updated the pull request incrementally with six additional commits since the last revision: > > - static import > - copyright > - whitespace > - whitespace > - sed > - fix test/lib/jdk/test/lib/process/ProcessTools.java Marked as reviewed by lmesnik (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15452#pullrequestreview-1695447658 From psandoz at openjdk.org Tue Oct 24 17:40:58 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 24 Oct 2023 17:40:58 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v61] In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 10:35:29 GMT, Roman Kennke wrote: >> See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. >> >> Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. >> >> Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. >> >> Testing: >> - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] tier1 (x86_64, x86_32, aarch64, riscv) >> - [x] tier2 (x86_64, aarch64, riscv) >> - [x] tier3 (x86_64, riscv) > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 90 commits: > > - Merge branch 'master' into JDK-8139457 > - Fix ARM build > - Merge remote-tracking branch 'upstream/master' into JDK-8139457 > - Various cleanups > - RISC changes > - Move gap init into allocate_header() (x86) > - Fix gtest failure on x86 > - Merge remote-tracking branch 'upstream/master' into JDK-8139457 > - Fix comments > - Fix inconsistencies in argument naming C1_MacroAssembler::allocate_array() > - ... and 80 more: https://git.openjdk.org/jdk/compare/9bfa0829...7eaca124 You are correct that the access methods function correctly (and with a tweak so can the align* methods when Lilliput is enabled). However, the change in object array layout can result in exceptions that previously would never occur and that could break existing code. The likelihood is the four methods referenced above are used for exotic atomic access, since plain accesses are already supported using direct array access or using methods on buffer. It's hard to know what percentages of those exotic accesses operate on `byte[]` and further will break when Lilliput is enabled. So out of an abundance of caution I suggested we deprecate (not for removal), on the presumption that existing code can keep running if Lilliput is disabled or object/array alignment is explicitly configured (via a VM flag). Unfortunately the deprecation signal is wider than we would like, since the methods function correctly for direct buffers (and for plain accesses, although i think that is less important). However, I argue it gives an opportunity to explain the situation and to point to better alternatives in a way that is much more visible than `@see` and release notes (which i think we should do even if we don't use the deprecate signal). Jorn, you proposal to uniformly reject for `unitSize > 1` for methods `alignedSlice` and `alignmentOffset` gives no workaround for the user whose code is now broken. It's hard to assess the impact. 
In these known unknown cases i often err on the side of caution. One approach we could take is to uniformly reject only when Lilliput is enabled. I suggest to follow up on JVMCI as a separate issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1777714766 From jvernee at openjdk.org Tue Oct 24 17:47:55 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 24 Oct 2023 17:47:55 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v61] In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 17:37:28 GMT, Paul Sandoz wrote: > Jorn, you proposal to uniformly reject for `unitSize > 1` for methods `alignedSlice` and `alignmentOffset` gives no workaround for the user whose code is now broken. It's hard to assess the impact. In these known unknown cases i often err on the side of caution. One approach we could take is to uniformly reject only when Lilliput is enabled. Ok, I think I see where you're coming from now. If we change the array element alignment, we currently only break the users code in a subset of use cases: only when Lilliput or `-XX:-UseCompressedClassPointers` is used. So, let's not break the other users, but let's warn them of the dangers through the deprecation. I think that is a fair way of going about this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1777726878 From psandoz at openjdk.org Tue Oct 24 17:54:08 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 24 Oct 2023 17:54:08 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v61] In-Reply-To: References: Message-ID: <3bwqPhGMR5ZaPhQOCtxpzHfXFKiodI8tgJyKVn0Mcgo=.540de0dd-23e2-46b6-8dc6-b6718c626218@github.com> On Tue, 24 Oct 2023 17:45:12 GMT, Jorn Vernee wrote: > > Jorn, you proposal to uniformly reject for `unitSize > 1` for methods `alignedSlice` and `alignmentOffset` gives no workaround for the user whose code is now broken. It's hard to assess the impact. 
In these known unknown cases i often err on the side of caution. One approach we could take is to uniformly reject only when Lilliput is enabled. > > Ok, I think I see where you're coming from now. If we change the array element alignment, we currently only break the users code in a subset of use cases: only when Lilliput or `-XX:-UseCompressedClassPointers` is used. So, let's not break the other users, but let's warn them of the dangers through the deprecation. > Yes. We could appeal Joe (Mr. CSR) on whether it is an appropriate use of deprecation or not, his review of the CSR will be valuable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1777736045 From rriggs at openjdk.org Tue Oct 24 19:42:43 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Tue, 24 Oct 2023 19:42:43 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v6] In-Reply-To: References: Message-ID: <6qyfRp98A2DA7Q7YhscmdmokkIvVVn9HxB_XjRdM47g=.615d7a65-e109-41db-b955-31c0a934debd@github.com> On Tue, 24 Oct 2023 07:49:30 GMT, Leo Korinth wrote: >> This pull request renames `createJavaProcessBuilder` to `createLimitedTestJavaProcessBuilder` and renames `createTestJvm` to `createTestJavaProcessBuilder`. Both are implemented through a private `createJavaProcessBuilder`. It also updates the java doc. >> >> This is so that it should be harder to by mistake use `createLimitedTestJavaProcessBuilder` that is problematic because it will not forward JVM flags to the tested JVM. 
> > Leo Korinth has updated the pull request incrementally with six additional commits since the last revision: > > - static import > - copyright > - whitespace > - whitespace > - sed > - fix test/lib/jdk/test/lib/process/ProcessTools.java test/lib/jdk/test/lib/process/ProcessTools.java line 506: > 504: */ > 505: public static ProcessBuilder createTestJavaProcessBuilder(List<String> command) { > 506: return createTestJavaProcessBuilder(command.toArray(String[]::new)); The javadoc should describe all of the options being added to the ProcessBuilder. They were inadequately described previously and still are. The other options (judging from the code) seem to be test.noclasspath, java.class.path, and test.thread.factory. The descriptions of test.thread.factory and the addTestThreadFactoryArgs method seem inadequate. test/lib/jdk/test/lib/process/ProcessTools.java line 527: > 525: * Create ProcessBuilder using the java launcher from the jdk to > 526: * be tested. > 527: * As above, this should describe the limited options that are added to the ProcessBuilder, the same as for `createTestJavaProcessBuilder(...)` test/lib/jdk/test/lib/process/ProcessTools.java line 549: > 547: * Create ProcessBuilder using the java launcher from the jdk to > 548: * be tested. > 549: * As above, this should describe the limited options that are added to the ProcessBuilder, the same as for `createTestJavaProcessBuilder(...)` test/lib/jdk/test/lib/process/ProcessTools.java line 599: > 597: */ > 598: public static OutputAnalyzer executeTestJvm(String... cmds) throws Exception { > 599: ProcessBuilder pb = createTestJavaProcessBuilder(cmds); This should also describe *all* of the options being set in the ProcessBuilder before executing the process.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15452#discussion_r1370728371 PR Review Comment: https://git.openjdk.org/jdk/pull/15452#discussion_r1370729609 PR Review Comment: https://git.openjdk.org/jdk/pull/15452#discussion_r1370729925 PR Review Comment: https://git.openjdk.org/jdk/pull/15452#discussion_r1370730637 From xgong at openjdk.org Wed Oct 25 01:27:38 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 25 Oct 2023 01:27:38 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v2] In-Reply-To: <94xEhtV9YjxUS5QN2oHOWCzwhFaKi05PO9o3Y5tieDI=.ecd425b8-a7c3-4c4a-9e7b-1ae099b92b52@github.com> References: <94xEhtV9YjxUS5QN2oHOWCzwhFaKi05PO9o3Y5tieDI=.ecd425b8-a7c3-4c4a-9e7b-1ae099b92b52@github.com> Message-ID: On Tue, 24 Oct 2023 09:31:13 GMT, David Holmes wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Disable sleef by default >> - Merge 'jdk:master' into JDK-8312425 >> - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8496: > >> 8494: // Get sleef stub routine addresses >> 8495: char ebuf[1024]; >> 8496: void* libsleef = os::dll_load(UseSleefLib, ebuf, sizeof ebuf); > > Shouldn't this check that UseSleefLib has been set to something other than "" ? (To save the failing `dll_load` call.) Yeah, it's better to do that. Currently it returns "nullptr" without any errors. But I agree that having a pre-check is better. Thanks! 
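The pre-check agreed on above could look roughly like the following. This is only a sketch under stated assumptions: `load_library` is a hypothetical stand-in for `os::dll_load` that always fails, and `load_if_configured` is an invented helper name; neither is taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for os::dll_load: counts calls and always fails,
// writing a message into the caller's error buffer.
static int g_load_attempts = 0;

static void* load_library(const char* name, char* ebuf, std::size_t ebuflen) {
  (void)name;
  ++g_load_attempts;
  std::snprintf(ebuf, ebuflen, "%s", "library not found");
  return nullptr;
}

// Skip the load attempt entirely when no library name was configured,
// which is the case the review comment asks to short-circuit.
static void* load_if_configured(const char* lib_name) {
  if (lib_name == nullptr || lib_name[0] == '\0') {
    return nullptr;  // nothing configured: do not call the loader at all
  }
  char ebuf[1024];
  return load_library(lib_name, ebuf, sizeof ebuf);
}
```

With an empty name the loader is never invoked, so the default configuration pays nothing for the feature.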
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1370993053 From dholmes at openjdk.org Wed Oct 25 04:01:41 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 25 Oct 2023 04:01:41 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v7] In-Reply-To: References: Message-ID: <12Jiijy5Y9tn-_eHyE01oDMVUomgXbEiuEgDPgY8GU0=.af4948be-6026-4b35-bcd1-b281855e3ede@github.com> On Tue, 24 Oct 2023 10:48:01 GMT, Afshin Zafari wrote: >> The `find` method now is >> ```C++ >> template >> int find(T* token, bool f(T*, E)) const { >> ... >> >> Any other functions which use this are also changed. >> Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed. > > Afshin Zafari has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge remote-tracking branch 'upstream/master' into _8314502 > - first arg of `find` casted to `uint*` > - Merge branch 'master' into _8314502 > - changed the `E` param of find methods to `const E&`. > - find_from_end and its caller are also updated. > - 8314502: Change the comparator taking version of GrowableArray::find to be a template method > - 8314502: GrowableArray: Make find with comparator take template src/hotspot/share/utilities/growableArray.hpp line 213: > 211: > 212: template > 213: int find(T* token, bool f(T*, const E&)) const { What is the advantage of a const reference here? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1371111621 From qamai at openjdk.org Wed Oct 25 04:09:39 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 25 Oct 2023 04:09:39 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v7] In-Reply-To: <12Jiijy5Y9tn-_eHyE01oDMVUomgXbEiuEgDPgY8GU0=.af4948be-6026-4b35-bcd1-b281855e3ede@github.com> References: <12Jiijy5Y9tn-_eHyE01oDMVUomgXbEiuEgDPgY8GU0=.af4948be-6026-4b35-bcd1-b281855e3ede@github.com> Message-ID: On Wed, 25 Oct 2023 03:59:06 GMT, David Holmes wrote: >> Afshin Zafari has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: >> >> - Merge remote-tracking branch 'upstream/master' into _8314502 >> - first arg of `find` casted to `uint*` >> - Merge branch 'master' into _8314502 >> - changed the `E` param of find methods to `const E&`. >> - find_from_end and its caller are also updated. >> - 8314502: Change the comparator taking version of GrowableArray::find to be a template method >> - 8314502: GrowableArray: Make find with comparator take template > > src/hotspot/share/utilities/growableArray.hpp line 213: > >> 211: >> 212: template >> 213: int find(T* token, bool f(T*, const E&)) const { > > What is the advantage of a const reference here? You can bind a non-const reference to a const one but not the other way. 
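To illustrate the point about `const E&`, here is a simplified sketch, not the GrowableArray code: `find_index` and `same_length` are hypothetical names, and `std::vector` stands in for the array. Because the container is accessed through a const reference (as in a `const` member function), its elements are const, and a comparator taking `const E&` can accept them without copying.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for a templated find with a comparator:
// T is the token type, E the element type.
template <typename T, typename E>
int find_index(const std::vector<E>& elems, T* token, bool f(T*, const E&)) {
  for (std::size_t i = 0; i < elems.size(); i++) {
    // elems[i] is a const E here; it binds to const E& without a copy.
    if (f(token, elems[i])) {
      return static_cast<int>(i);
    }
  }
  return -1;  // not found
}

// Hypothetical comparator: matches the first string of the given length.
static bool same_length(std::size_t* len, const std::string& s) {
  return s.size() == *len;
}
```

Had the comparator taken `std::string&` instead, the call inside the loop would not compile, since a const element cannot bind to a non-const reference.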
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1371115254 From dholmes at openjdk.org Wed Oct 25 04:11:40 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 25 Oct 2023 04:11:40 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v4] In-Reply-To: <3jYeAkN2merJQyqdcuq7zbiAgTX1Md6EJqu9WZ_nGkA=.8da47a48-94cb-41f0-a4f2-378e611ebc0b@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> <3jYeAkN2merJQyqdcuq7zbiAgTX1Md6EJqu9WZ_nGkA=.8da47a48-94cb-41f0-a4f2-378e611ebc0b@github.com> Message-ID: On Mon, 23 Oct 2023 14:17:58 GMT, Aleksey Shipilev wrote: >> See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. >> >> Unfortunately, we cannot test these apart from the existing gtest. >> >> Additional testing: >> - [x] linux-x86-server-fastdebug, atomic tests pass >> - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Use public methods instead of Platform* Okay. I will file a RFE to make `supports_cx8` a requirement and open a discussion about that. 
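The patch itself is not quoted in this thread. As a rough illustration of how add and xchg can fall back on a compare-and-swap when no direct instruction exists, here is a sketch using `std::atomic` rather than HotSpot's `Atomic` class; the function names `add_via_cas` and `xchg_via_cas` are invented for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Add built from compare-exchange: retry until no other thread has
// changed the value between our read and our write. Returns the new value.
inline std::uint64_t add_via_cas(std::atomic<std::uint64_t>& dest,
                                 std::uint64_t delta) {
  std::uint64_t old_value = dest.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak refreshes old_value with the
  // current contents, so the loop body can stay empty.
  while (!dest.compare_exchange_weak(old_value, old_value + delta)) {
  }
  return old_value + delta;
}

// Exchange built the same way. Returns the previous value.
inline std::uint64_t xchg_via_cas(std::atomic<std::uint64_t>& dest,
                                  std::uint64_t new_value) {
  std::uint64_t old_value = dest.load(std::memory_order_relaxed);
  while (!dest.compare_exchange_weak(old_value, new_value)) {
  }
  return old_value;
}
```

Both helpers only need a working 64-bit CAS, which is why requiring `supports_cx8` would be enough to make them available everywhere.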
------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1778474046 From jwaters at openjdk.org Wed Oct 25 05:57:38 2023 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 25 Oct 2023 05:57:38 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v3] In-Reply-To: References: Message-ID: <8YiWcWXdNa0KjJ1HpSMIF-MwPtEAJhWxh194DWAjCT0=.b998c8a3-ae11-428e-a66a-e419bedb1538@github.com> On Tue, 24 Oct 2023 09:46:51 GMT, Julian Waters wrote: >> On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. >> >> The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well. The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. >> >> Although not mentioned in the title, raise_fail_fast has also been fixed. RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method >> >> All the changes mentioned above fixes any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. 
This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. >> >> This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'openjdk:master' into noreturn > - Minor Style Change in os_windows.cpp > - 8304939 Bumping, anyone? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16303#issuecomment-1778562612 From fyang at openjdk.org Wed Oct 25 07:01:36 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 25 Oct 2023 07:01:36 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 15:21:08 GMT, Hamlin Li wrote: > Hi, > Can you review the change to add intrinsic for UDivI and UDivL? > Thanks! > > > ## Tests > > ### Functionality > Run tests successfully found via `grep -nr test/jdk/ -we divideUnsigned` and `grep -nr test/hotspot/ -we divideUnsigned` > > ### Performance > NOTE: there are another 2 related issues: https://bugs.openjdk.org/browse/JDK-8318225, https://bugs.openjdk.org/browse/JDK-8318226, the pr of which will be subseqently sent out after this one finished. 
> > #### Long > ** Before ** > > LongDivMod.testDivideUnsigned 1024 mixed avgt 2 19852.277 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 2 29155.681 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 2 6385.280 ns/op > > > ** After ** > > LongDivMod.testDivideUnsigned 1024 mixed avgt 2 11776.806 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 2 16101.940 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 2 6433.223 ns/op > > > #### Integer > ** Before ** > > IntegerDivMod.testDivideUnsigned 1024 mixed avgt 2 23498.570 ns/op > IntegerDivMod.testDivideUnsigned 1024 positive avgt 2 16875.614 ns/op > IntegerDivMod.testDivideUnsigned 1024 negative avgt 2 30310.243 ns/op > > > ** After ** > > IntegerDivMod.testDivideUnsigned 1024 mixed avgt 2 23327.997 ns/op > IntegerDivMod.testDivideUnsigned 1024 positive avgt 2 16708.209 ns/op > IntegerDivMod.testDivideUnsigned 1024 negative avgt 2 30162.153 ns/op src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 2436: > 2434: } else { > 2435: Label Lltz, Ldone; > 2436: bltz(rs2, Lltz); I am not quite sure what this `bltz` branch is for. Is this a minor performance tunning here? And How would this make a difference then if that's true? I didn't see much difference from the LongDivMod.testDivideUnsigned `negative` jmh test result. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16346#discussion_r1371241844 From david.holmes at oracle.com Wed Oct 25 07:12:47 2023 From: david.holmes at oracle.com (David Holmes) Date: Wed, 25 Oct 2023 17:12:47 +1000 Subject: RFC: 8318776: Require supports_cx8 to always be true Message-ID: <517d7c08-e9a6-481a-89e5-2533c5d41724@oracle.com> From https://bugs.openjdk.org/browse/JDK-8318776 Regardless of platform size (32-bit or 64-bit) the Java language has always required that the underlying platform (or the VM) provides a means of performing atomic load and store of 64-bit values, for volatile long and double support. 
Since Java 5 the java.util.concurrent.atomic package introduced APIs that provide a range of atomic operations, the most fundamental being a compare-and-swap (CAS), also known as a compare-exchange, out of which other atomic operations can be constructed if there is no direct platform support. This capability was later extended to the VarHandle API as well. While all platforms needed a mechanism for 64-bit load and store, not all platforms support a 64-bit CAS, internally known as cmpxchg8. To address that the supports_cx8 flag was introduced so that on platforms without cmpxchg8 native support, it could be emulated via other techniques e.g. locking. (Note this is not without its own issues as all accesses to the field must be done in a way that is consistent with the use of locking by cmpxchg8 - word-tearing is a real risk). Internal to the VM we also have use of lock-free algorithms and atomic operations, with the latter defined via atomic.hpp. Originally in that code we needed to check supports_cx8 for platforms without 64-bit support, but in practice we tended to avoid using 64-bit fields in such cases so we could avoid the complexity of introducing lock-based emulation. Unfortunately, when the atomic interface in the VM was templatized and redesigned, it appears that the fact cmpxchg8 may not be available was overlooked and supports_cx8 is not consulted. Consequently if someone introduced an atomic operation on a 64-bit field they would get a linkage error on platforms without cmpxchg8 - so again if this happened we tended to back away from using a 64-bit field. Along the way the access API in the VM was introduced, which also provided atomic ops on oops and did consult supports_cx8 with a lock-based fallback. We have now reached a point where there are cases where we do want 64-bit atomic operations but we don't want the complexity of dealing with platforms that don't support it. 
So we want to require that supports_cx8 always be assumed true (the VM could abort at runtime if run on a platform where it is not true) and we can then proceed with 64-bit atomics in the VM and also remove all the lock-based fallbacks in the access API and in the Java APIs. The OpenJDK has limited support for 32-bit platforms these days: PPC32 was dropped a long time ago; Windows 32-bit is now a deprecated port (but supports cmpxchg8 anyway); leaving only ARM32 as a platform of potential concern. But even then we support cmpxchg8 in all known modern implementations, as described in os_cpu/linux_arm/atomic_linux_arm.hpp: /* * Atomic long operations on 32-bit ARM * ARM v7 supports LDREXD/STREXD synchronization instructions so no problem. * ARM < v7 does not have explicit 64 atomic load/store capability. * However, gcc emits LDRD/STRD instructions on v5te and LDM/STM on v5t * when loading/storing 64 bits. * For non-MP machines (which is all we support for ARM < v7) * under current Linux distros these instructions appear atomic. * See section A3.5.3 of ARM Architecture Reference Manual for ARM v7. * Also, for cmpxchg64, if ARM < v7 we check for cmpxchg64 support in the * Linux kernel using _kuser_helper_version. See entry-armv.S in the Linux * kernel source or kernel_user_helpers.txt in Linux Doc. */ So the practical reality is that we do not expect to encounter any mainstream OpenJDK platform where we don't in fact have support for cmpxchg8. ------- Before I proceed with this does anyone have any strong and reasonable objections? Is there some platform support aspect that has been overlooked? Note the JDK part could be (probably should be) done as a follow up RFE to simplify the review and approval process. 
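For readers unfamiliar with the lock-based emulation mentioned above, a minimal sketch follows. It is not VM code: `LockedInt64` is a hypothetical class, and `std::mutex` stands in for whatever lock the VM would use. It also shows the consistency caveat: plain loads and stores must take the same lock as the emulated cmpxchg, otherwise word-tearing is possible.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical 64-bit cell for a platform without a native 64-bit CAS.
// Every access goes through the same lock.
class LockedInt64 {
 public:
  explicit LockedInt64(std::int64_t initial) : _value(initial) {}

  std::int64_t load() {
    std::lock_guard<std::mutex> guard(_lock);
    return _value;
  }

  void store(std::int64_t v) {
    std::lock_guard<std::mutex> guard(_lock);
    _value = v;
  }

  // Emulated cmpxchg: writes desired only if the current value equals
  // expected. Returns the value observed before the operation.
  std::int64_t cmpxchg(std::int64_t expected, std::int64_t desired) {
    std::lock_guard<std::mutex> guard(_lock);
    std::int64_t old_value = _value;
    if (old_value == expected) {
      _value = desired;
    }
    return old_value;
  }

 private:
  std::mutex _lock;
  std::int64_t _value;
};
```

Requiring `supports_cx8` would make this kind of wrapper unnecessary, which is the simplification the proposal is after.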
Thanks, David From duke at openjdk.org Wed Oct 25 07:22:14 2023 From: duke at openjdk.org (Liming Liu) Date: Wed, 25 Oct 2023 07:22:14 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v8] In-Reply-To: References: Message-ID: > As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). > > Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: > > > > > > > > > > > >
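> | Kernel | -XX:-TransparentHugePages, Unpatched | -XX:-TransparentHugePages, Patched | -XX:+TransparentHugePages, Unpatched | -XX:+TransparentHugePages, Patched |
> |--------|------|------|------|------|
> | 4.18 | 11.30 | 11.30 | 0.25 | 0.25 |
> | 5.13 | 0.22 | 0.22 | 3.42 | 3.42 |
> | 6.1 | 0.27 | 0.33 | 3.54 | 0.33 |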
Liming Liu has updated the pull request incrementally with one additional commit since the last revision: Make the test use a smaller heap and exit properly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15781/files - new: https://git.openjdk.org/jdk/pull/15781/files/ed2c9da7..d1a33373 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=06-07 Stats: 51 lines in 1 file changed: 28 ins; 20 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15781/head:pull/15781 PR: https://git.openjdk.org/jdk/pull/15781 From stefank at openjdk.org Wed Oct 25 07:27:32 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 25 Oct 2023 07:27:32 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v3] In-Reply-To: References: <5o5B7LbCQN_C9xzd1EvrvTp04-6Atr0gih5WH69LeK4=.3a977034-8fe9-4da8-a167-f5dad3a97d75@github.com> Message-ID: On Tue, 5 Sep 2023 18:05:34 GMT, Roger Riggs wrote: >> I have created an alternative that uses enums to force the user to make a decision: https://github.com/openjdk/jdk/compare/master...lkorinth:jdk:+process_tools . Another alternative is to do the same but instead using an enum (I think it is not as good). A third alternative is to use the current pull request with a better name. >> >> What do you prefer? Do you have a better alternative? Do someone still think the current code is good? I think what we have today is inferior to all these improvements, and I would like to make it harder to develop bad test cases. > >> What do you prefer? Do you have a better alternative? Do someone still think the current code is good? I think what we have today is inferior to all these improvements, and I would like to make it harder to develop bad test ca > > The current API (name) is fine and fit for purpose; it does not promise or hide extra functionality under a simple name. 
> > There needs to be an explicit intention in the test(s) to support after the fact that arbitrary flags can be added. > @AlanBateman's proposal for naming [above](https://github.com/openjdk/jdk/pull/15452#issuecomment-1700459277) (or similar) would capture more clearly that test options are propagated to the child process. > Every test writer should be aware that additional command line options may be mixed in. > > There are many cases in which the ProcessTools APIs are not used to create child processes and do not need to be used in writing tests. They provide some convenience but also add a dependency and another API layer to work through in the case of failures. > > As far as I'm aware, there is no general guidance or design pattern outside of hotspot tests to propagate flags or use ProcessTools. Adding that as a requirement will need a different level of communication and change. @RogerRiggs You seem to know what you want w.r.t. the extra java doc comments. Could you help write those? Could we also do that as a separate RFE? I think that would make it easier to get this PR and the javadoc update through the door. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15452#issuecomment-1778669353 From duke at openjdk.org Wed Oct 25 07:32:46 2023 From: duke at openjdk.org (Liming Liu) Date: Wed, 25 Oct 2023 07:32:46 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v7] In-Reply-To: References: <-p-qDbsbPtk_7h79043ct8sTCcM6d-TiD62zh4c7Q0Q=.9691fe3d-0a85-4f6f-a876-9761a240f866@github.com> Message-ID: On Tue, 24 Oct 2023 13:28:26 GMT, Thomas Stuefe wrote: >> Liming Liu has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: >> >> Make the jtreg test check the usage of THP > > src/hotspot/share/runtime/os.cpp line 2119: > >> 2117: void os::pretouch_memory_common(void* first, void* last, size_t page_size) { >> 2118: assert(is_aligned(first, page_size), "pointer " PTR_FORMAT " is not page-aligned by %zu", p2i(first), page_size); >> 2119: assert(is_aligned(last, page_size), "pointer " PTR_FORMAT " is not page-aligned by %zu", p2i(last), page_size); > > New assertions, right? Affects all platforms, not only Linux. If possible, please don't change behavior on other platforms. > > Have you checked all users of this function? AFAICT calling this with unaligned pointers would have been possible and would have worked. The assertions were suggested by Kim Barrett [here](https://github.com/openjdk/jdk/pull/15781#discussion_r1340872840). The function is private in os, and supposed to be called by pd_pretouch_memory. I'm fine without the assertions personally. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1371278390 From luhenry at openjdk.org Wed Oct 25 07:37:28 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 25 Oct 2023 07:37:28 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 06:55:18 GMT, Fei Yang wrote: >> Hi, >> Can you review the change to add intrinsic for UDivI and UDivL? >> Thanks! >> >> >> ## Tests >> >> ### Functionality >> Run tests successfully found via `grep -nr test/jdk/ -we divideUnsigned` and `grep -nr test/hotspot/ -we divideUnsigned` >> >> ### Performance >> NOTE: there are another 2 related issues: https://bugs.openjdk.org/browse/JDK-8318225, https://bugs.openjdk.org/browse/JDK-8318226, the pr of which will be subseqently sent out after this one finished. 
>> >> #### Long >> ** Before ** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 2 19852.277 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 2 29155.681 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 2 6385.280 ns/op >> >> >> ** After ** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 2 11776.806 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 2 16101.940 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 2 6433.223 ns/op >> >> >> #### Integer >> ** Before ** >> >> IntegerDivMod.testDivideUnsigned 1024 mixed avgt 2 23498.570 ns/op >> IntegerDivMod.testDivideUnsigned 1024 positive avgt 2 16875.614 ns/op >> IntegerDivMod.testDivideUnsigned 1024 negative avgt 2 30310.243 ns/op >> >> >> ** After ** >> >> IntegerDivMod.testDivideUnsigned 1024 mixed avgt 2 23327.997 ns/op >> IntegerDivMod.testDivideUnsigned 1024 positive avgt 2 16708.209 ns/op >> IntegerDivMod.testDivideUnsigned 1024 negative avgt 2 30162.153 ns/op > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 2436: > >> 2434: } else { >> 2435: Label Lltz, Ldone; >> 2436: bltz(rs2, Lltz); > > I am not quite sure what this `bltz` branch is for. Is this a minor performance tunning here? And How would this make a difference then if that's true? I didn't see much difference from the LongDivMod.testDivideUnsigned `negative` jmh test result. +1. It's also the only test case where there is a regression on the JMH numbers, or at least not a clear improvement (before: 6385.280, after: 6433.223) On your JMH numbers, how many iterations have you run for each benchmark? I don't see the standard deviation which would be useful to better understand noise. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16346#discussion_r1371283500 From luhenry at openjdk.org Wed Oct 25 07:43:37 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 25 Oct 2023 07:43:37 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 15:21:08 GMT, Hamlin Li wrote: > Hi, > Can you review the change to add intrinsic for UDivI and UDivL? > Thanks! > > > ## Tests > > ### Functionality > Run tests successfully found via `grep -nr test/jdk/ -we divideUnsigned` and `grep -nr test/hotspot/ -we divideUnsigned` > > ### Performance > NOTE: there are another 2 related issues: https://bugs.openjdk.org/browse/JDK-8318225, https://bugs.openjdk.org/browse/JDK-8318226, the pr of which will be subseqently sent out after this one finished. > > #### Long > ** Before ** > > LongDivMod.testDivideUnsigned 1024 mixed avgt 2 19852.277 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 2 29155.681 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 2 6385.280 ns/op > > > ** After ** > > LongDivMod.testDivideUnsigned 1024 mixed avgt 2 11776.806 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 2 16101.940 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 2 6433.223 ns/op > > > #### Integer > ** Before ** > > IntegerDivMod.testDivideUnsigned 1024 mixed avgt 2 23498.570 ns/op > IntegerDivMod.testDivideUnsigned 1024 positive avgt 2 16875.614 ns/op > IntegerDivMod.testDivideUnsigned 1024 negative avgt 2 30310.243 ns/op > > > ** After ** > > IntegerDivMod.testDivideUnsigned 1024 mixed avgt 2 23327.997 ns/op > IntegerDivMod.testDivideUnsigned 1024 positive avgt 2 16708.209 ns/op > IntegerDivMod.testDivideUnsigned 1024 negative avgt 2 30162.153 ns/op src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 244: > 242: // idiv variant which deals with MINLONG as dividend and -1 as divisor > 243: int corrected_idivl(Register result, Register rs1, Register rs2, > 244: bool want_remainder, 
bool is_signed = true); Could you drop the default value (`true`) for `is_signed`, so that each call site states explicitly which case it is? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16346#discussion_r1371287339 From stuefe at openjdk.org Wed Oct 25 07:45:43 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 25 Oct 2023 07:45:43 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v7] In-Reply-To: References: <-p-qDbsbPtk_7h79043ct8sTCcM6d-TiD62zh4c7Q0Q=.9691fe3d-0a85-4f6f-a876-9761a240f866@github.com> Message-ID: <7gRzpm2TJKurZ-tF2EPgXEDHQxp2Iaurx5kip3d9ZJI=.6cfe5388-772b-491a-82af-8afce2ff7e42@github.com> On Wed, 25 Oct 2023 07:29:48 GMT, Liming Liu wrote: >> src/hotspot/share/runtime/os.cpp line 2119: >> >>> 2117: void os::pretouch_memory_common(void* first, void* last, size_t page_size) { >>> 2118: assert(is_aligned(first, page_size), "pointer " PTR_FORMAT " is not page-aligned by %zu", p2i(first), page_size); >>> 2119: assert(is_aligned(last, page_size), "pointer " PTR_FORMAT " is not page-aligned by %zu", p2i(last), page_size); >> >> New assertions, right? Affects all platforms, not only Linux. If possible, please don't change behavior on other platforms. >> >> Have you checked all users of this function? AFAICT calling this with unaligned pointers would have been possible and would have worked. > > The assertions were suggested by Kim Barrett [here](https://github.com/openjdk/jdk/pull/15781#discussion_r1340872840). The function is private in os, and supposed to be called by pd_pretouch_memory. I'm fine without the assertions personally. Okay, in that case it's fine.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1371290527 From duke at openjdk.org Wed Oct 25 07:45:46 2023 From: duke at openjdk.org (Liming Liu) Date: Wed, 25 Oct 2023 07:45:46 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v7] In-Reply-To: References: <-p-qDbsbPtk_7h79043ct8sTCcM6d-TiD62zh4c7Q0Q=.9691fe3d-0a85-4f6f-a876-9761a240f866@github.com> Message-ID: On Tue, 24 Oct 2023 13:49:55 GMT, Thomas Stuefe wrote: >> Liming Liu has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Make the jtreg test check the usage of THP > > test/hotspot/jtreg/runtime/os/TestTransparentHugePageUsage.java line 37: > >> 35: * -Xms24G -Xmx24G -XX:+AlwaysPreTouch >> 36: * runtime.os.TestTransparentHugePageUsage >> 37: */ > > Requiring the test to need 24G is a lot... why does it need to be so large? > > What does the test test? The problem was that pre-touching the memory would allocate small pages, and then later khugepaged would fold them into large pages at its own leisure. Your patch prevents that, so now huge pages form faster? So, the success of the patch can be described by timing? I changed the size to 1G which means two thp on aarch64 with 64KB page sizes. The test now checks the usage of thp, and will fail when the policy is madvise and the method of pretouch is atomic-add on aarch64. Timing is a factor, but have to be checked manually from the log, as there is no standard on how fast it should be. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1371292758 From stuefe at openjdk.org Wed Oct 25 07:52:44 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 25 Oct 2023 07:52:44 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v7] In-Reply-To: References: <-p-qDbsbPtk_7h79043ct8sTCcM6d-TiD62zh4c7Q0Q=.9691fe3d-0a85-4f6f-a876-9761a240f866@github.com> Message-ID: On Wed, 25 Oct 2023 07:42:23 GMT, Liming Liu wrote: >> test/hotspot/jtreg/runtime/os/TestTransparentHugePageUsage.java line 37: >> >>> 35: * -Xms24G -Xmx24G -XX:+AlwaysPreTouch >>> 36: * runtime.os.TestTransparentHugePageUsage >>> 37: */ >> >> Requiring the test to need 24G is a lot... why does it need to be so large? >> >> What does the test test? The problem was that pre-touching the memory would allocate small pages, and then later khugepaged would fold them into large pages at its own leisure. Your patch prevents that, so now huge pages form faster? So, the success of the patch can be described by timing? > > I changed the size to 1G which means two thp on aarch64 with 64KB page sizes. The test now checks the usage of thp, and will fail when the policy is madvise and the method of pretouch is atomic-add on aarch64. Timing is a factor, but have to be checked manually from the log, as there is no standard on how fast it should be. Ah, ok, now I get it. You use the size to pick out the heap from the mappings. The problem here is that 1G is a really common size, and you may e.g. find the class space first, which is 1G in default size. May I suggest two alternatives? Alternative 1: call the JVM with "-Xlog:pagesize". That will print out the heap location too, at least for G1, the default allocator: thomas at starfish:/shared/projects/openjdk/jdk-jdk/output-fastdebug$ ./images/jdk/bin/java -Xlog:pagesize ... 
[0.012s][info][pagesize] Heap: min=8M max=16064M base=0x0000000414000000 size=16064M page_size=4K You can then parse the heap base from the output and use that to find the mapping. Alternative 2: Pass a really crooked size to -Xmx. Not as safe as alternative 1, but should work reasonably well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1371299789 From stuefe at openjdk.org Wed Oct 25 07:52:45 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 25 Oct 2023 07:52:45 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v7] In-Reply-To: References: <-p-qDbsbPtk_7h79043ct8sTCcM6d-TiD62zh4c7Q0Q=.9691fe3d-0a85-4f6f-a876-9761a240f866@github.com> Message-ID: On Wed, 25 Oct 2023 07:47:57 GMT, Thomas Stuefe wrote: >> I changed the size to 1G which means two thp on aarch64 with 64KB page sizes. The test now checks the usage of thp, and will fail when the policy is madvise and the method of pretouch is atomic-add on aarch64. Timing is a factor, but have to be checked manually from the log, as there is no standard on how fast it should be. > > Ah, ok, now I get it. You use the size to pick out the heap from the mappings. The problem here is that 1G is a really common size, and you may e.g. find the class space first, which is 1G in default size. > > May I suggest two alternatives? > > > Alternative 1: > > call the JVM with "-Xlog:pagesize". That will print out the heap location too, at least for G1, the default allocator: > > > thomas at starfish:/shared/projects/openjdk/jdk-jdk/output-fastdebug$ ./images/jdk/bin/java -Xlog:pagesize > ... > [0.012s][info][pagesize] Heap: min=8M max=16064M base=0x0000000414000000 size=16064M page_size=4K > > > You can then parse the heap base from the output and use that to find the mapping. > > Alternative 2: > > Pass a really crooked size to -Xmx. Not as safe as alternative 1, but should work reasonably well. 
If you use Alternative 1, you can also scan for UseTransparentHugePages=1 in the output and replace the mxBean call with that: [0.001s][info][pagesize] UseLargePages=1, UseTransparentHugePages=1 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15781#discussion_r1371302522 From shade at openjdk.org Wed Oct 25 08:32:48 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 25 Oct 2023 08:32:48 GMT Subject: RFR: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms [v4] In-Reply-To: <3jYeAkN2merJQyqdcuq7zbiAgTX1Md6EJqu9WZ_nGkA=.8da47a48-94cb-41f0-a4f2-378e611ebc0b@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> <3jYeAkN2merJQyqdcuq7zbiAgTX1Md6EJqu9WZ_nGkA=.8da47a48-94cb-41f0-a4f2-378e611ebc0b@github.com> Message-ID: On Mon, 23 Oct 2023 14:17:58 GMT, Aleksey Shipilev wrote: >> See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. >> >> Unfortunately, we cannot test these apart from the existing gtest. >> >> Additional testing: >> - [x] linux-x86-server-fastdebug, atomic tests pass >> - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Use public methods instead of Platform* Thanks all!
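For readers unfamiliar with how a missing 64-bit add or exchange can be emulated, here is a minimal sketch of the usual compare-and-swap retry loop. It is an assumption that the integrated change takes this general shape; the sketch uses `std::atomic` rather than HotSpot's `Atomic` class, and the names `add_via_cmpxchg` and `xchg_via_cmpxchg` are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper, not HotSpot code: atomic add built only from
// compare-and-swap. Returns the updated value.
inline uint64_t add_via_cmpxchg(std::atomic<uint64_t>& dest, uint64_t add_value) {
  uint64_t old_value = dest.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak reloads old_value with the current
  // contents of dest, so the loop simply retries with fresh data.
  while (!dest.compare_exchange_weak(old_value, old_value + add_value)) {
  }
  return old_value + add_value;
}

// Hypothetical helper, not HotSpot code: atomic exchange built only from
// compare-and-swap. Returns the value that was replaced.
inline uint64_t xchg_via_cmpxchg(std::atomic<uint64_t>& dest, uint64_t new_value) {
  uint64_t old_value = dest.load(std::memory_order_relaxed);
  while (!dest.compare_exchange_weak(old_value, new_value)) {
  }
  return old_value;
}
```

The trade-off of this approach is that it only needs a working 64-bit compare-and-swap on the platform, at the cost of possible retries under contention.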
------------- PR Comment: https://git.openjdk.org/jdk/pull/16252#issuecomment-1778771140 From shade at openjdk.org Wed Oct 25 08:32:50 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 25 Oct 2023 08:32:50 GMT Subject: Integrated: 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms In-Reply-To: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> References: <4e1-HB2PjjXqNd_3QQfuxkzCywjEUBfLcbKL7hFd3Og=.fcfb0435-57c2-48cb-a4b9-f9d60fe1fffa@github.com> Message-ID: On Wed, 18 Oct 2023 18:58:58 GMT, Aleksey Shipilev wrote: > See the bug for rationale. Looks like there is enough infrastructure to achieve what we want without significant fan-out. I checked all `atomic_*.hpp` headers for unimplemented `PlatformAdd<8>` and `PlatformXchg<8>`, and only these seem to be affected. > > Unfortunately, we cannot test these apart from the existing gtest. > > Additional testing: > - [x] linux-x86-server-fastdebug, atomic tests pass > - [x] linux-arm-server-fastdebug, atomic tests pass (with #16269 applied) This pull request has now been integrated. Changeset: ba7d08b8 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/ba7d08b8199172058bd369d880d2d6a9f9649319 Stats: 108 lines in 5 files changed: 97 ins; 8 del; 3 mod 8316961: Fallback implementations for 64-bit Atomic::{add,xchg} on 32-bit platforms Reviewed-by: eosterlund, dholmes, kbarrett, simonis ------------- PR: https://git.openjdk.org/jdk/pull/16252 From lkorinth at openjdk.org Wed Oct 25 08:44:29 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 25 Oct 2023 08:44:29 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v7] In-Reply-To: References: Message-ID: > This pull request renames `createJavaProcessBuilder` to `createLimitedTestJavaProcessBuilder` and renames `createTestJvm` to `createTestJavaProcessBuilder`. Both are implemented through a private `createJavaProcessBuilder`. It also updates the java doc. 
> > This is so that it should be harder to by mistake use `createLimitedTestJavaProcessBuilder` that is problematic because it will not forward JVM flags to the tested JVM. Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: fix copyright year and indentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15452/files - new: https://git.openjdk.org/jdk/pull/15452/files/2f57a32d..4cc3865a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15452&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15452&range=05-06 Stats: 23 lines in 1 file changed: 0 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/15452.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15452/head:pull/15452 PR: https://git.openjdk.org/jdk/pull/15452 From mli at openjdk.org Wed Oct 25 09:16:38 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 25 Oct 2023 09:16:38 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 07:34:25 GMT, Ludovic Henry wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 2436: >> >>> 2434: } else { >>> 2435: Label Lltz, Ldone; >>> 2436: bltz(rs2, Lltz); >> >> I am not quite sure what this `bltz` branch is for. Is this a minor performance tunning here? And How would this make a difference then if that's true? I didn't see much difference from the LongDivMod.testDivideUnsigned `negative` jmh test result. > > +1. It's also the only test case where there is a regression on the JMH numbers, or at least not a clear improvement (before: 6385.280, after: 6433.223) > > On your JMH numbers, how many iterations have you run for each benchmark? I don't see the standard deviation which would be useful to better understand noise. `For the algorithm details, check j.l.Long::divideUnsigned` in the jdk lib source, it mentions this algorithm, I also pointed to it in this patch. 
It's not about the difference between the negative and positive test cases; it's about the cost of the divxx instructions. Compared to lines 2440 ~ 2443 in src/hotspot/cpu/riscv/macroAssembler_riscv.cpp, the cost of divu for a negative value is still very high. int_def ALU_COST ( 100, 1 * DEFAULT_COST); int_def BRANCH_COST ( 200, 2 * DEFAULT_COST); int_def IDIVDI_COST ( 6600, 66 * DEFAULT_COST); I have also re-run the benchmark with more warmup (5) and measurement (10) iterations; please check the data in the PR description. I have also attached the diff between the v1 and v2 intrinsics. v2 is this patch; v1 is a diff on top of v2 that just uses the RISC-V divxx instructions directly, without the optimization for negative values brought by the algorithm (i.e. without the bltz and the related code). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16346#discussion_r1371415242 From mli at openjdk.org Wed Oct 25 09:16:39 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 25 Oct 2023 09:16:39 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 09:09:21 GMT, Hamlin Li wrote: >> +1. It's also the only test case where there is a regression on the JMH numbers, or at least not a clear improvement (before: 6385.280, after: 6433.223) >> >> On your JMH numbers, how many iterations have you run for each benchmark? I don't see the standard deviation which would be useful to better understand noise. > > `For the algorithm details, check j.l.Long::divideUnsigned` in the jdk lib source, it mentions this algorithm, I also pointed to it in this patch. > > It's not related to the difference between negative and positive test cases, it's related to the cost of divxx instructions, compared to the lines between 2440 ~ 2443 in src/hotspot/cpu/riscv/macroAssembler_riscv.cpp, the divu cost for negative value is still very high.
> > > int_def ALU_COST ( 100, 1 * DEFAULT_COST); > int_def BRANCH_COST ( 200, 2 * DEFAULT_COST); > int_def IDIVDI_COST ( 6600, 66 * DEFAULT_COST); > > > I have also re-run the benchmark with more warmup (5) and iteration (10), please check the data in pr desc. > I also attach the diff between v1 and v2 intrinsic. v2 is this patch. v1 is diff based on v2, it just use riscv divxx directly without optimization for negative value brong by the algorithm (i.e. without the bltz and related other codes). I don't know why the previous jmh data has no `error` part, maybe because it's too low to show. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16346#discussion_r1371421207 From eosterlund at openjdk.org Wed Oct 25 09:35:06 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 25 Oct 2023 09:35:06 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v5] In-Reply-To: References: Message-ID: > In fact, there is a current race in the nmethod entry barriers, where what we are doing violates the AMD APM (cf. APM volume 2 section 7.6.1 https://www.amd.com/system/files/TechDocs/24593.pdf). > In particular, if the compare instruction of the nmethod entry barrier is not yet patched and we call a slow path on thread 1, then before taking the nmethod entry lock, another thread 2 could fix and disarm the nmethod. Then thread 1 will observe *data* suggesting the nmethod has been patched, but never re-executes the patched compare (which might indeed still be stale), hence not qualifying for asynchronous cross modifying code, and neither do we run a serializing cpuid instruction, qualifying for synchronous cross modifying code. In this scenario, we can indeed start executing the nmethod instructions, while observing inconsistent concurrent patching effects, where some instructions will be updated and some not. 
> > The following patch ensures that x86 nmethod entry barriers execute cross modifying fence after calling into the VM, where another thread could have disarmed the nmethod. I also ensured the other platforms perform their fencing after the VM call, instead of before - including a cross_modify_fence in the shared code for OSR nmethod entries. While fencing before will flush out the instruction pipeline, and it shouldn't be populated with problematic instructions until after we start executing the nmethod again, it feels unnecessary to fence on the wrong side of the modifications it wishes to guard, and hence not strictly following the synchronous cross modifying fence recipe. > > I'm currently running tier1-5 and running performance testing in aurora. In the interest of time, I'm opening this PR before getting the final result, and will report the results when they come in. Erik ?sterlund has updated the pull request incrementally with one additional commit since the last revision: Update barrierSetNMethod.cpp Co-authored-by: Andrew Haley ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14543/files - new: https://git.openjdk.org/jdk/pull/14543/files/f3c15b91..87fa122e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14543&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14543&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14543.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14543/head:pull/14543 PR: https://git.openjdk.org/jdk/pull/14543 From aph at openjdk.org Wed Oct 25 09:35:08 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 25 Oct 2023 09:35:08 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v4] In-Reply-To: References: Message-ID: On Mon, 23 Oct 2023 09:35:55 GMT, Erik ?sterlund wrote: >> In fact, there is a current race in the nmethod entry barriers, where what we are doing violates the AMD APM (cf. 
APM volume 2 section 7.6.1 https://www.amd.com/system/files/TechDocs/24593.pdf). >> In particular, if the compare instruction of the nmethod entry barrier is not yet patched and we call a slow path on thread 1, then before taking the nmethod entry lock, another thread 2 could fix and disarm the nmethod. Then thread 1 will observe *data* suggesting the nmethod has been patched, but never re-executes the patched compare (which might indeed still be stale), hence not qualifying for asynchronous cross modifying code, and neither do we run a serializing cpuid instruction, qualifying for synchronous cross modifying code. In this scenario, we can indeed start executing the nmethod instructions, while observing inconsistent concurrent patching effects, where some instructions will be updated and some not. >> >> The following patch ensures that x86 nmethod entry barriers execute cross modifying fence after calling into the VM, where another thread could have disarmed the nmethod. I also ensured the other platforms perform their fencing after the VM call, instead of before - including a cross_modify_fence in the shared code for OSR nmethod entries. While fencing before will flush out the instruction pipeline, and it shouldn't be populated with problematic instructions until after we start executing the nmethod again, it feels unnecessary to fence on the wrong side of the modifications it wishes to guard, and hence not strictly following the synchronous cross modifying fence recipe. >> >> I'm currently running tier1-5 and running performance testing in aurora. In the interest of time, I'm opening this PR before getting the final result, and will report the results when they come in. 
> > Erik ?sterlund has updated the pull request incrementally with one additional commit since the last revision: > > Add comment src/hotspot/share/gc/shared/barrierSetNMethod.cpp line 190: > 188: > 189: // In case a concurrent thread disarmed the nmethod, we need to ensure the new instructions > 190: // are made visible using a cross modify fence. Note that this is synchronous cross modifying Suggestion: // are made visible, by using a cross modify fence. Note that this is synchronous cross modifying ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14543#discussion_r1371438691 From aph at openjdk.org Wed Oct 25 09:39:41 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 25 Oct 2023 09:39:41 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v5] In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 09:35:06 GMT, Erik ?sterlund wrote: >> In fact, there is a current race in the nmethod entry barriers, where what we are doing violates the AMD APM (cf. APM volume 2 section 7.6.1 https://www.amd.com/system/files/TechDocs/24593.pdf). >> In particular, if the compare instruction of the nmethod entry barrier is not yet patched and we call a slow path on thread 1, then before taking the nmethod entry lock, another thread 2 could fix and disarm the nmethod. Then thread 1 will observe *data* suggesting the nmethod has been patched, but never re-executes the patched compare (which might indeed still be stale), hence not qualifying for asynchronous cross modifying code, and neither do we run a serializing cpuid instruction, qualifying for synchronous cross modifying code. In this scenario, we can indeed start executing the nmethod instructions, while observing inconsistent concurrent patching effects, where some instructions will be updated and some not. 
>> >> The following patch ensures that x86 nmethod entry barriers execute cross modifying fence after calling into the VM, where another thread could have disarmed the nmethod. I also ensured the other platforms perform their fencing after the VM call, instead of before - including a cross_modify_fence in the shared code for OSR nmethod entries. While fencing before will flush out the instruction pipeline, and it shouldn't be populated with problematic instructions until after we start executing the nmethod again, it feels unnecessary to fence on the wrong side of the modifications it wishes to guard, and hence not strictly following the synchronous cross modifying fence recipe. >> >> I'm currently running tier1-5 and running performance testing in aurora. In the interest of time, I'm opening this PR before getting the final result, and will report the results when they come in. > > Erik ?sterlund has updated the pull request incrementally with one additional commit since the last revision: > > Update barrierSetNMethod.cpp > > Co-authored-by: Andrew Haley Marked as reviewed by aph (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/14543#pullrequestreview-1696832389 From simonis at openjdk.org Wed Oct 25 11:30:44 2023 From: simonis at openjdk.org (Volker Simonis) Date: Wed, 25 Oct 2023 11:30:44 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v27] In-Reply-To: References: <0AHhZD1JncFwC7z5rp1uufF4FAMrKR8mQXdDuwKNL4s=.02114c12-5a6b-4d3e-8b7a-421bb3eb8d47@github.com> Message-ID: <2leJ4ArjJxwon1DwmNr470VEWeXBt6AH780uIY5orcU=.00e89ec2-6a4d-450c-9d8d-58765d5916b1@github.com> On Thu, 5 Oct 2023 03:04:39 GMT, Jonathan Joo wrote: >> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: >> >> add comment and change if defined to ifdef > > Resolved comments and sanity checks pass on all builds: https://github.com/jjoo172/jdk/actions/runs/6411637099 > > I believe this PR should be RFR once again. @jjoo172, https://github.com/openjdk/jdk/pull/16252 has now been integrated. Can you please merge it in and use the 64-bit Atomics for incrementing the counters? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1779058633 From simonis at openjdk.org Wed Oct 25 11:53:45 2023 From: simonis at openjdk.org (Volker Simonis) Date: Wed, 25 Oct 2023 11:53:45 GMT Subject: RFR: 8318811: Compiler directives parser swallows a character after line comments Message-ID: Currently, the following valid compiler directive file: [{ match: "*::*", c2: { Exclude: true } // c1 only for startup }] will be rejected by the parser: Syntax error on line 4 byte 2: Expected value separator or object end (one of ',}'). At ']'. }] Parsing of compiler directives failed This is because `JSON::skip_line_comment()`, in contradiction to its specification, does **not** "*return the first token after the line comment without consuming it*" but does consumes it. 
The fix is trivial: --- a/src/hotspot/share/utilities/json.cpp +++ b/src/hotspot/share/utilities/json.cpp @@ -580,7 +580,7 @@ u_char JSON::skip_line_comment() { return 0; } next(); - return next(); + return peek(); } ------------- Commit messages: - 8318811: Compiler directives parser swallows a character after line comments Changes: https://git.openjdk.org/jdk/pull/16359/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16359&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318811 Stats: 21 lines in 2 files changed: 20 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16359.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16359/head:pull/16359 PR: https://git.openjdk.org/jdk/pull/16359 From shade at openjdk.org Wed Oct 25 12:39:35 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 25 Oct 2023 12:39:35 GMT Subject: RFR: 8318811: Compiler directives parser swallows a character after line comments In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 11:46:10 GMT, Volker Simonis wrote: > Currently, the following valid compiler directive file: > > [{ > match: "*::*", > c2: { Exclude: true } // c1 only for startup > }] > > will be rejected by the parser: > > Syntax error on line 4 byte 2: Expected value separator or object end (one of ',}'). > At ']'. > }] > > Parsing of compiler directives failed > > > This is because `JSON::skip_line_comment()`, in contradiction to its specification, does **not** "*return the first token after the line comment without consuming it*" but does consumes it. > > The fix is trivial: > > --- a/src/hotspot/share/utilities/json.cpp > +++ b/src/hotspot/share/utilities/json.cpp > @@ -580,7 +580,7 @@ u_char JSON::skip_line_comment() { > return 0; > } > next(); > - return next(); > + return peek(); > } This looks okay and matching what `JSON::skip_block_comment` -- which is specified similarly -- is doing. ------------- Marked as reviewed by shade (Reviewer). 
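The difference between returning `next()` and returning `peek()` can be seen in a small sketch. The `Cursor` type below is a hypothetical stand-in, not HotSpot's `JSON` class, and it assumes that `next()` consumes and returns the current character while `peek()` only looks at it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a parser cursor; not HotSpot's JSON class.
struct Cursor {
  std::string text;
  std::size_t pos;

  explicit Cursor(const std::string& s) : text(s), pos(0) {}

  // Look at the current character without consuming it; 0 means end of input.
  char peek() const { return pos < text.size() ? text[pos] : 0; }

  // Consume the current character and return it; 0 means end of input.
  char next() { return pos < text.size() ? text[pos++] : 0; }
};

// Consume "//" and everything up to, but not including, the newline.
// Returns false if the input ends before a newline is found.
inline bool skip_to_line_end(Cursor& c) {
  assert(c.peek() == '/');
  c.next();
  c.next();
  while (c.peek() != 0 && c.peek() != '\n') {
    c.next();
  }
  return c.peek() != 0;
}

// Fixed behaviour: return the first character after the comment
// WITHOUT consuming it.
inline char skip_line_comment(Cursor& c) {
  if (!skip_to_line_end(c)) return 0;
  c.next();         // consume the newline
  return c.peek();  // leave the following character for the caller
}

// Buggy behaviour: the character after the comment is consumed, so the
// caller never gets to parse it.
inline char skip_line_comment_buggy(Cursor& c) {
  if (!skip_to_line_end(c)) return 0;
  c.next();         // consume the newline
  return c.next();  // swallows the following character
}
```

With the input `// c1 only\n}]`, both variants return `}`, but after the buggy variant the cursor already points at `]`, which is consistent with the "At ']'" position in the reported syntax error.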
PR Review: https://git.openjdk.org/jdk/pull/16359#pullrequestreview-1697220929 From jkratochvil at openjdk.org Wed Oct 25 12:47:47 2023 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Wed, 25 Oct 2023 12:47:47 GMT Subject: RFR: 8317697: refactor-encapsulate x86 VM_Version::CpuidInfo [v3] In-Reply-To: References: Message-ID: > In OpenJDK project CRaC I had a [need to fetch new CpuidInfo without affecting the existing one](https://github.com/openjdk/crac/commit/ed4ad9ba31b77732dcede2eb743b2f389ec9a0fe#diff-6ed856c57ddbe33e49883adb7c52ec51ed377e5f697dfd6d8bea505a97bfc5a5R2743). > Which led me to encapsulate the object more and I think this no-functionality-change patch is even appropriate for JDK. > I am sure interested primarily to reduce the CRaC patchset boilerplate. Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Remove the automatic zero-initialization ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16093/files - new: https://git.openjdk.org/jdk/pull/16093/files/92568384..beee275a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16093&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16093&range=01-02 Stats: 9 lines in 2 files changed: 1 ins; 6 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16093/head:pull/16093 PR: https://git.openjdk.org/jdk/pull/16093 From zgu at openjdk.org Wed Oct 25 12:51:58 2023 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 25 Oct 2023 12:51:58 GMT Subject: RFR: 8317466: Enable interpreter oopMapCache for concurrent GCs [v3] In-Reply-To: References: Message-ID: <3oEH8zFRDSOC9kcENyJXxj0Gb2rbk2Dimnzr__4d4jE=.ccfde890-e51b-4b1c-83c8-34516307c5cb@github.com> > Interpreter oop maps are computed lazily during GC root scan and they are expensive to compute. 
> > GCs uses a small hash table per instance class to cache computed oop maps during STW root scan, but not for concurrent root scan. > > This patch is intended to enable `OopMapCache` for concurrent GCs. > > Test: > tier1 and tier2 fastdebug and release on MacOSX, Linux 86_84 and Linux 86_32. Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Refactor ShenandoahOperation to be symmetric to other GCs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16074/files - new: https://git.openjdk.org/jdk/pull/16074/files/015d4fb3..b63a105e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16074&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16074&range=01-02 Stats: 6 lines in 2 files changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16074.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16074/head:pull/16074 PR: https://git.openjdk.org/jdk/pull/16074 From jvernee at openjdk.org Wed Oct 25 13:13:37 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 25 Oct 2023 13:13:37 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v12] In-Reply-To: References: Message-ID: <4qn-CzvnxaLi75tXebhHFx50TZAaK092MvUwlmEZxsQ=.dd48a7ff-7f9f-4ea4-93a2-636c6147bcfd@github.com> On Tue, 24 Oct 2023 15:09:57 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. 
Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... 
> > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - a -> an > - add note to downcallHandle about passing heap segments by-reference src/hotspot/share/prims/foreignGlobals.cpp line 150: > 148: if (signature[i] != T_VOID) { > 149: out_regs.push(as_VMStorage(pair.first(), signature[i])); > 150: } Since we only care about registers-sized values. It is safe to ignore upper halves of `T_LONG` and `T_DOUBLE` (i.e. `T_VOID`). src/hotspot/share/prims/nativeEntryPoint.cpp line 55: > 53: for (int i = 0, bt_idx = 0; i < pcount; i++) { > 54: oop type_oop = java_lang_invoke_MethodType::ptype(type, i); > 55: BasicType bt = java_lang_Class::as_BasicType(type_oop); We can now see `T_OBJECT` here as well, so we just look at the basic type. src/hotspot/share/prims/nativeEntryPoint.cpp line 60: > 58: if (reg_oop != nullptr) { > 59: input_regs.push(ForeignGlobals::parse_vmstorage(reg_oop)); > 60: } Some registers are `null`, which indicates that the corresponding value passed to the downcall stub should not be forwarded to the native call, but is instead used directly in the downcall stub (e.g. the offset of an oop). src/hotspot/share/prims/upcallLinker.cpp line 182: > 180: return (jlong) UpcallLinker::make_upcall_stub( > 181: mh_j, entry, out_sig_bt, total_out_args, ret_type, > 182: abi, conv, needs_return_buffer, checked_cast(ret_buf_size)); Note that we no longer pass the input signature here, which is instead derived from the output signature in the implementation of `make_upcall_stub`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1371738783 PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1371734701 PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1371736101 PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1371734122 From luhenry at openjdk.org Wed Oct 25 13:17:35 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 25 Oct 2023 13:17:35 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 09:13:59 GMT, Hamlin Li wrote: >> For the algorithm details, check `j.l.Long::divideUnsigned` in the JDK library source; it mentions this algorithm, and I also pointed to it in this patch. >> >> It's not related to the difference between negative and positive test cases, it's related to the cost of divxx instructions. Compared to lines 2440 ~ 2443 in src/hotspot/cpu/riscv/macroAssembler_riscv.cpp, the divu cost for negative values is still very high. >> >> >> int_def ALU_COST ( 100, 1 * DEFAULT_COST); >> int_def BRANCH_COST ( 200, 2 * DEFAULT_COST); >> int_def IDIVDI_COST ( 6600, 66 * DEFAULT_COST); >> >> >> I have also re-run the benchmark with more warmup (5) and iterations (10), please check the data in the PR description. >> I also attach the diff between the v1 and v2 intrinsics. v2 is this patch. v1 is a diff based on v2; it just uses the RISC-V divxx instructions directly, without the optimization for negative values brought by the algorithm (i.e. without the bltz and the related code). > > I don't know why the previous jmh data has no `error` part, maybe because it's too low to show. IIUC, with the branch, the results are `6376.674 ± 16.869 ns/op`, and without the branch, they are `29518.033 ± 49.056`, correct? If so, the branch makes more sense, at least on the board you've tested.
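For readers following the discussion, the negative-divisor shortcut can be sketched in plain Java. This is only an illustration of the idea the thread attributes to `j.l.Long::divideUnsigned`; the class and method names are made up for this sketch and are not part of the patch.

```java
public class UnsignedDivSketch {
    // Assumption: the divisor has its top bit set (it is negative when read as
    // a signed long). Read as unsigned, it is then at least 2^63, so the
    // unsigned quotient can only be 0 or 1 and no divide instruction is needed.
    static long divideByNegativeDivisor(long dividend, long divisor) {
        // Subtract, invert, mask with the dividend, then move the top bit down.
        // The result is 1 exactly when dividend >= divisor as unsigned values.
        return (dividend & ~(dividend - divisor)) >>> 63;
    }

    public static void main(String[] args) {
        long[] dividends = {0L, 1L, Long.MAX_VALUE, Long.MIN_VALUE, -2L, -1L};
        long[] divisors = {Long.MIN_VALUE, -2L, -1L};
        for (long a : dividends) {
            for (long b : divisors) {
                // Compare against the library implementation.
                System.out.println(divideByNegativeDivisor(a, b) == Long.divideUnsigned(a, b));
            }
        }
    }
}
```

The four operations correspond to the subtract, not, and, shift-right sequence guarded by `bltz` in the diff quoted elsewhere in this thread.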
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16346#discussion_r1371743833 From jsjolen at openjdk.org Wed Oct 25 13:21:40 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 25 Oct 2023 13:21:40 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory In-Reply-To: References: Message-ID: On Mon, 23 Oct 2023 08:41:56 GMT, Stefan Karlsson wrote: >> Hi, >> >> Thank you for looking through these changes. I applied your comments and also did a run through to look for incorrectly ordered includes. For the gtest source files I separated the includes in a consistent manner, they all look like this pattern now: >> >> ```c++ >> #include "precompiled.hpp" >> #include "memory/allocation.hpp" >> #include "nmt/mallocHeader.inline.hpp" >> #include "nmt/memTracker.hpp" >> #include "runtime/os.hpp" >> >> #include "testutils.hpp" >> #include "unittest.hpp" > >> For the gtest source files I separated the includes in a consistent manner, they all look like this pattern now: > > That's not what I see in the latest patch. Could you revert that separation and then we can consider that style change in a separate RFE? @stefank, I believe that I've applied your suggestions, would you mind having another look? Thanks ------------- PR Comment: https://git.openjdk.org/jdk/pull/16276#issuecomment-1779260132 From mli at openjdk.org Wed Oct 25 13:42:37 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 25 Oct 2023 13:42:37 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 13:14:19 GMT, Ludovic Henry wrote: >> I don't know why the previous jmh data has no `error` part, maybe because it's too low to show. > > IIUC, with the branch, the results are `6376.674 ? 16.869 ns/op`, and without the branch, they are `29518.033 ? 49.056`, correct? If so, the branch makes more sense, at least of the board you've tested. 
Yes, for a negative divisor, unsigned div can go through a quick path which is much faster than the built-in instructions. This is demonstrated by the div cost in riscv.ad, and also verified by benchmark tests run on the board. ( I'm not sure whether built-in div will be faster in the future; if that turns out to be the case, we would also need to redefine the div cost in riscv.ad and revisit this intrinsic. ) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16346#discussion_r1371781221 From tschatzl at openjdk.org Wed Oct 25 14:21:05 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 25 Oct 2023 14:21:05 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 Message-ID: The JEP covers the idea very well, so I'm only covering some implementation details here: * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size as our filler objects, so we can use their headers for our fillers. The problem is that previously there was a restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray.
The code could be clever and handle this situation by splitting the to-be-filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and are automatically skipped. However, assuming that the pinning is of short length, we put them into the candidates when we can. * there is the problem that if an application pins a region for a long time, G1 will skip evacuating that region over and over. That may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and to wasted processing time (when the candidate stays in the retained candidates). The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: `GC(6) Pause Young (Normal) (Evacuation Failure) 1M->1M(22M) 36.16ms` there is a new tag `(Pinned)` that indicates that one or more pinned regions were encountered during gc. E.g. `GC(6) Pause Young (Normal) (Pinned) (Evacuation Failure) 1M->1M(22M) 36.16ms` The `Pinned` and `Evacuation Failure` tags are not exclusive. GC might have encountered both pinned regions and evacuation failed regions in the same collection or even in the same region.
(I am open to a better name for the `(Pinned)` tag) Testing: tier1-8 ------------- Commit messages: - Improve somewhat unstable test - Fix typo in src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/g1/HeapRegion.java so that resourcehogs/serviceability/sa/ClhsdbRegionDetailsScanOopsForG1.java does not fail - Fix minimal build - Region pinning in G1/JEP-423 Changes: https://git.openjdk.org/jdk/pull/16342/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318706 Stats: 1547 lines in 54 files changed: 922 ins; 422 del; 203 mod Patch: https://git.openjdk.org/jdk/pull/16342.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342 PR: https://git.openjdk.org/jdk/pull/16342 From tschatzl at openjdk.org Wed Oct 25 14:21:06 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 25 Oct 2023 14:21:06 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 09:56:57 GMT, Thomas Schatzl wrote: > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. 
The problem is that previously there was a restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to-be-filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and are automatically skipped. However, assuming that the pinning is of short length, we put them into the candidates when we can. > > * there is the problem that if an application pins a region for a long time, G1 will skip evacuating that region over and over. That may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and to wasted processing time (when the candidate stays in the retained candidates). > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P... The new [TestPinnedOldObjectsEvacuation.java](https://github.com/openjdk/jdk/pull/16342/files#diff-b141a3be9b9c2ba59c78f42ee1af4f65f04a32a15db245c1ed68711953939258) test isn't stable; otherwise the change passes tier1-8. No perf changes. I'm opening this PR for review despite the unstable test; it is not a blocker for review, and I will fix it later.
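As a reading aid, the pin-count and "age out" rules described in the quoted summary can be modelled in a few lines of Java. This is a toy model under stated assumptions, not G1 code: the class, its methods, and the constant standing in for `G1NumCollectionsKeepUnreclaimable` (including its value) are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PinnedRegionSketch {
    // Hypothetical stand-in for G1NumCollectionsKeepUnreclaimable; the value is arbitrary.
    static final int KEEP_UNRECLAIMABLE_COLLECTIONS = 8;

    private final AtomicInteger pinCount = new AtomicInteger();
    private int collectionsSeenPinned = 0; // total collections that found this region pinned
    private boolean candidate = true;      // still in the (retained) candidate set

    void pin() { pinCount.incrementAndGet(); }

    void unpin() {
        if (pinCount.decrementAndGet() < 0) {
            throw new IllegalStateException("unbalanced unpin");
        }
    }

    boolean isPinned() { return pinCount.get() != 0; }

    boolean isCandidate() { return candidate; }

    // Called once per collection. Returns true if the region may be evacuated now.
    boolean considerForEvacuation() {
        if (!candidate) {
            return false; // already aged out; wait for the next marking
        }
        if (!isPinned()) {
            return true;  // pin count is zero, safe to reclaim
        }
        // Pinned: skip it, and drop it from the candidates once it has been
        // seen pinned for the configured total number of collections.
        if (++collectionsSeenPinned >= KEEP_UNRECLAIMABLE_COLLECTIONS) {
            candidate = false;
        }
        return false;
    }
}
```

A region that is pinned only briefly is evacuated at the next collection after `unpin()`, while one that stays pinned is eventually dropped from the candidates instead of being retried forever.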
------------- PR Comment: https://git.openjdk.org/jdk/pull/16342#issuecomment-1779381807 From fjiang at openjdk.org Wed Oct 25 14:48:48 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 25 Oct 2023 14:48:48 GMT Subject: RFR: 8318827: RISC-V: Improve readability of fclass result testing Message-ID: Hi, please consider. Currently, we test the results of the `fclass` instruction with hard-coded bits, which hurts readability. This patch adds an enumeration of the fclass mask bits for ease of use. Testing: - [ ] tier1 with release build ------------- Commit messages: - Add FCLASS_MASK enum for better readability Changes: https://git.openjdk.org/jdk/pull/16362/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16362&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318827 Stats: 32 lines in 4 files changed: 19 ins; 4 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/16362.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16362/head:pull/16362 PR: https://git.openjdk.org/jdk/pull/16362 From luhenry at openjdk.org Wed Oct 25 14:50:36 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 25 Oct 2023 14:50:36 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 13:39:35 GMT, Hamlin Li wrote: >> IIUC, with the branch, the results are `6376.674 ± 16.869 ns/op`, and without the branch, they are `29518.033 ± 49.056`, correct? If so, the branch makes more sense, at least on the board you've tested. > > Yes, for a negative divisor, unsigned div can go through a quick path which is much faster than the built-in instructions. > This is demonstrated by the div cost in riscv.ad, and also verified by benchmark tests run on the board. > > ( I'm not sure whether built-in div will be faster in the future; if that turns out to be the case, we would also need to redefine the div cost in riscv.ad and revisit this intrinsic. ) Ok, so let's keep the version with the branch.
You should add a comment at https://github.com/openjdk/jdk/pull/16346/files#diff-7a5c3ed05b6f3f06ed1c59f5fc2a14ec566a6a5bd1d09606115767daa99115bdR2435 explaining just that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16346#discussion_r1371898530 From duke at openjdk.org Wed Oct 25 15:03:45 2023 From: duke at openjdk.org (Elif Aslan) Date: Wed, 25 Oct 2023 15:03:45 GMT Subject: Integrated: 8318608: Enable parallelism in vmTestbase/nsk/stress/threads tests In-Reply-To: References: Message-ID: On Mon, 23 Oct 2023 22:30:42 GMT, Elif Aslan wrote: > The commit includes changes to unblock parallelism for more `hotspot:tier4` tests. in `test/hotspot/jtreg/vmTestbase/nsk/stress/thread `tests. > > Below are the before and after test run comparisons: > > Before: > time,count > 15, 1 > 33, 1 > 48, 1 > 66, 1 > 72, 1 > 77, 1 > > Mean 51.83s > Standard deviation 22.24s > Total elapsed time 1m 17s > > After: > time,count > 19, 1 > 23, 1 > 29, 1 > 34, 1 > 48, 1 > 53, 1 > > Mean 34.33s > Standard deviation 12.43s > Total elapsed time 0m 53s This pull request has now been integrated. Changeset: cee44a62 Author: Elif Aslan Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/cee44a625594fd805a05c4a69033eb677a5a6f17 Stats: 24 lines in 1 file changed: 0 ins; 24 del; 0 mod 8318608: Enable parallelism in vmTestbase/nsk/stress/threads tests Reviewed-by: lmesnik, shade ------------- PR: https://git.openjdk.org/jdk/pull/16327 From luhenry at openjdk.org Wed Oct 25 15:24:35 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 25 Oct 2023 15:24:35 GMT Subject: RFR: 8318827: RISC-V: Improve readability of fclass result testing In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 14:42:07 GMT, Feilong Jiang wrote: > Hi, please consider. > > Currently, we test results of `fclass` instruction with hard-coded bits which has bad readability. This patch adds an enumeration of the flcass mask bits for ease of use. 
> > Testing: > > - [ ] tier1 with release build src/hotspot/cpu/riscv/assembler_riscv.hpp line 1094: > 1092: FCLASS_NAN = FCLASS_SNAN | FCLASS_QNAN, > 1093: FCLASS_FINITE = FCLASS_ZERO | FCLASS_SUBNORM | FCLASS_NORM, > 1094: }; We use lower-case and no prefix for similar enums (ex: https://github.com/openjdk/jdk/blob/069bb87693ec7944f08062e3a7e460653324e6cb/src/hotspot/cpu/riscv/assembler_riscv.hpp#L1119-L1145) Suggestion: enum fclass_mask { minf = 1 << 0, // negative infinite mnorm = 1 << 1, // negative normal number msubnorm = 1 << 2, // negative subnormal number mzero = 1 << 3, // negative zero pzero = 1 << 4, // positive zero psubnorm = 1 << 5, // positive subnormal number pnorm = 1 << 6, // positive normal number pinf = 1 << 7, // positive infinite snan = 1 << 8, // signaling NaN qnan = 1 << 9, // quiet NaN zero = mzero | pzero, subnorm = msubnorm | psubnorm, norm = mnorm | pnorm, inf = minf | pinf, nan = snan | qnan, finite = zero | subnorm | norm, }; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16362#discussion_r1371949559 From mli at openjdk.org Wed Oct 25 15:37:37 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 25 Oct 2023 15:37:37 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 07:37:48 GMT, Ludovic Henry wrote: >> Hi, >> Can you review the change to add intrinsic for UDivI and UDivL? >> Thanks! >> >> >> ## Tests >> >> ### Functionality >> Run tests successfully found via `grep -nr test/jdk/ -we divideUnsigned` and `grep -nr test/hotspot/ -we divideUnsigned` >> >> ### Performance >> ( NOTE: there are another 2 related issues: https://bugs.openjdk.org/browse/JDK-8318225, https://bugs.openjdk.org/browse/JDK-8318226, the pr of which will be subseqently sent out after this one finished. ) >> >> #### Long >> **Before** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 19704.317 ? 64.078 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 28856.859 ? 
14.901 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 6364.974 ? 2.465 ns/op >> >> >> **After v1** >> (This is a simpler version, please check the diff from `After v2` below) >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 22668.228 ? 74.161 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 15966.320 ? 14.985 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 29518.033 ? 49.056 ns/op >> >> >> **After v2** >> (This is the current patch, **This version has a huge regression for negative values!!!**) >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 11432.738 ? 95.785 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 15969.044 ? 19.492 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 6376.674 ? 16.869 ns/op >> >> >> ##### Diff of v1 from v2 >> >> diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp >> index b96f7611133..dfb40e171e7 100644 >> --- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp >> +++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp >> @@ -2432,16 +2432,7 @@ int MacroAssembler::corrected_idivq(Register result, Register rs1, Register rs2, >> if (is_signed) { >> div(result, rs1, rs2); >> } else { >> - Label Lltz, Ldone; >> - bltz(rs2, Lltz); >> divu(result, rs1, rs2); >> - j(Ldone); >> - bind(Lltz); // For the algorithm details, check j.l.Long::divideUnsigned >> - sub(result, rs1, rs2); >> - notr(result, result); >> - andr(result, result, rs1); >> - srli(result, result, 63... > > src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 244: > >> 242: // idiv variant which deals with MINLONG as dividend and -1 as divisor >> 243: int corrected_idivl(Register result, Register rs1, Register rs2, >> 244: bool want_remainder, bool is_signed = true); > > Could you not set the default value of `is_signed` to `true`, to make it clear which case it is at the callsite. 
The reason I use a default value for `is_signed` is that both corrected_idivx variants are also used in cpu/riscv/c1_LIRAssembler_arith_riscv.cpp, which I don't want to touch in this PR. But if you insist, I can remove the default. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16346#discussion_r1371967043 From stefank at openjdk.org Wed Oct 25 16:58:42 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 25 Oct 2023 16:58:42 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory [v7] In-Reply-To: <81yK2Yxh7AVOSjVoAzZwIlriUwHRfN5s5LLowgA-34o=.1ed62ff1-d3d6-4fc1-8e3e-6ca945d86468@github.com> References: <81yK2Yxh7AVOSjVoAzZwIlriUwHRfN5s5LLowgA-34o=.1ed62ff1-d3d6-4fc1-8e3e-6ca945d86468@github.com> Message-ID: On Tue, 24 Oct 2023 11:51:45 GMT, Johan Sjölen wrote: >> I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? >> >> 1. Moved all the nmt source code from services/ to nmt/ >> 2. Renamed all the include statements and sorted them >> 3. Fixed the include guards > > Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge remote-tracking branch 'upstream/master' into move-nmt > - Fix stefank suggestions > - Merge remote-tracking branch 'origin/master' into move-nmt > - Fix messed up include > - Missed this include > - Merge remote-tracking branch 'origin/master' into move-nmt > - Fixed reviewed changes > - Move NMT to its own subdirectory I've approved the patch, but I think you should revert the two gtest changes I mention below. I have given suggestions for two cleanups that you might want to do. src/hotspot/share/memory/resourceArea.inline.hpp line 30: > 28: #include "memory/resourceArea.hpp" > 29: > 30: #include "nmt/memTracker.hpp" Another new line that should not be here. Bonus points if it gets removed.
src/hotspot/share/nmt/mallocHeader.cpp line 29: > 27: #include "nmt/mallocHeader.inline.hpp" > 28: > 29: #include "nmt/mallocSiteTable.hpp" There should be no newline between these two includes. If you make another round of changes I think it would be good to get rid of it. test/hotspot/gtest/nmt/test_nmt_buffer_overflow_detection.cpp line 33: > 31: #include "testutils.hpp" > 32: #include "unittest.hpp" > 33: It is inclear to me if there is an ordering requirement between testutils and unittest. I'd prefer if you didn't change those unit test header files in this patch and make these cleanups separately. test/hotspot/gtest/nmt/test_nmt_locationprinting.cpp line 35: > 33: // Uncomment to get test output > 34: //#define LOG_PLEASE > 35: There seems to be an ordering requirement between LOG_PLEASE and the inclusion of LOG_PLEASE. I think you break it with this change. ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16276#pullrequestreview-1697824750 PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1372058081 PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1372057335 PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1372053129 PR Review Comment: https://git.openjdk.org/jdk/pull/16276#discussion_r1372051716 From aph at openjdk.org Wed Oct 25 17:55:36 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 25 Oct 2023 17:55:36 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 14:48:15 GMT, Ludovic Henry wrote: >> Yes, for negtive divisor, unsigned div can go through a quick path which is must faster than built-in instructions. >> And this is demonstrated in the div cost in riscv.ad, and also verified by benchmark tests run on the board. 
>> >> ( I'm not sure if in the future built-in div will be faster, if it turns out in the future, we should also need to redefine the div cost in riscv.ad, and re-visit this intrinsic. ) > > Ok, so let's keep the version with branch. You should add a comment at https://github.com/openjdk/jdk/pull/16346/files#diff-7a5c3ed05b6f3f06ed1c59f5fc2a14ec566a6a5bd1d09606115767daa99115bdR2435 explaining just that. But you're putting a conditional branch in the way of the common cases, and you're greatly increasing icache pressure, for the sake of a rare case. How does that make any sense? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16346#discussion_r1372124307 From jjoo at openjdk.org Wed Oct 25 18:30:07 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Wed, 25 Oct 2023 18:30:07 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v31] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 31 additional commits since the last revision: - Merge branch 'openjdk:master' into master - Add call to publish in parallel gc and update counter names - Add Copyright header to test and formatting changes - Fix test - add comment and change if defined to ifdef - Remove header and fix long to jlong - Update logic to use cmpxchg rather than add - Fix build issues - Fix logic for publishing total cpu time and convert atomic jlong to long - Fix more broken headers for sanity checks - ... 
and 21 more: https://git.openjdk.org/jdk/compare/a3b77dde...b57aa467 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/19fe9b3a..b57aa467 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=29-30 Stats: 128020 lines in 3548 files changed: 59061 ins; 26067 del; 42892 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From qamai at openjdk.org Wed Oct 25 18:57:26 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 25 Oct 2023 18:57:26 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 17:53:15 GMT, Andrew Haley wrote: >> Ok, so let's keep the version with branch. You should add a comment at https://github.com/openjdk/jdk/pull/16346/files#diff-7a5c3ed05b6f3f06ed1c59f5fc2a14ec566a6a5bd1d09606115767daa99115bdR2435 explaining just that. > > But you're putting a conditional branch in the way of the common cases, and you're greatly increasing icache pressure, for the sake of a rare case. How does that make any sense? I agree with @theRealAph here, this seems like a micro optimisation for the sake of microbenchmark that will not be beneficial in general. And if a considerable portion of divisors does lie in this range, the optimisation can always be applied from the caller side. Furthermore, by doing so we will even have the benefit of branch profiling, which will help achieve better results. Another note is that I do not know any compiler that does this premature optimisation. Thanks. 
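To make the "applied from the caller side" remark concrete, here is a minimal Java sketch of a caller that special-cases divisors with the top bit set before falling back to the library routine. The class and method names are hypothetical, and the sketch assumes the caller already knows such divisors are common in its workload.

```java
public class CallerSideUnsignedDiv {
    // Hypothetical helper: the caller, not the intrinsic, handles the rare case.
    static long divide(long dividend, long divisor) {
        if (divisor < 0) {
            // As unsigned values the divisor is at least 2^63, so the quotient
            // is 1 when dividend >= divisor (unsigned) and 0 otherwise.
            return Long.compareUnsigned(dividend, divisor) >= 0 ? 1L : 0L;
        }
        // Common case: plain unsigned division (throws ArithmeticException on zero).
        return Long.divideUnsigned(dividend, divisor);
    }
}
```

Because the branch lives in Java code, it is visible to the JIT's branch profiling, which is the benefit the comment above points out.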
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16346#discussion_r1372194253 From jjoo at openjdk.org Wed Oct 25 20:37:46 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Wed, 25 Oct 2023 20:37:46 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v32] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Use 64-bit atomic add for incrementing counters ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/b57aa467..ebafa2b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=30-31 Stats: 13 lines in 1 file changed: 0 ins; 13 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From lmesnik at openjdk.org Wed Oct 25 21:14:55 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 25 Oct 2023 21:14:55 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions Message-ID: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> The jtreg starts the main thread in a separate ThreadGroup and checks unhandled exceptions for this group. However, it doesn't catch all unhandled exceptions. There is a jtreg issue for this https://bugs.openjdk.org/browse/CODETOOLS-7903526. Catching such issues for virtual threads is important because they are not included in any groups. So this fix implements the handler for the test thread factory. A few tests start failing. The test serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java has testcases for platform and virtual threads. 
So there's no need to run it with the thread factory. The test java/lang/Thread/virtual/ThreadAPI.java tests UncaughtExceptionHandler and virtual threads. No need to run it with a thread factory. Test test/jdk/java/util/concurrent/tck/ThreadTest.java is updated to not check the default empty handler. We probably need some common approach to dealing with the UncaughtExceptionHandler in jtreg. ------------- Commit messages: - 8318839: Update test thread factory to catch all exceptions Changes: https://git.openjdk.org/jdk/pull/16369/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16369&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318839 Stats: 18 lines in 4 files changed: 15 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16369.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16369/head:pull/16369 PR: https://git.openjdk.org/jdk/pull/16369 From jvernee at openjdk.org Wed Oct 25 21:22:23 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 25 Oct 2023 21:22:23 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v12] In-Reply-To: References: Message-ID: <6Ywn5vjecuN2UtqWT6IdM_EyK4xP4foammAeoJCvxx8=.2ab7ba6f-fb37-4620-96b1-9e39b4e939e4@github.com> On Tue, 24 Oct 2023 15:09:57 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and that currently rely on Get-/ReleasePrimitiveArrayCritical from JNI, where making a copy of the array would be prohibitively expensive.
>> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... 
> > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - a -> an > - add note to downcallHandle about passing heap segments by-reference src/hotspot/cpu/x86/downcallLinker_x86_64.cpp line 110: > 108: __ mov(rsp, r12); // restore sp > 109: __ reinit_heapbase(); > 110: } This is a minor cleanup to share this code for the three use sites below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1371750211 From manc at openjdk.org Wed Oct 25 21:25:38 2023 From: manc at openjdk.org (Man Cao) Date: Wed, 25 Oct 2023 21:25:38 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v32] In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 20:37:46 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Use 64-bit atomic add for incrementing counters The change LGTM except a small suggestion. Could some one from hotspot-gc-dev also take a look as @albertnetymk added the tag? src/hotspot/share/gc/shared/stringdedup/stringDedupProcessor.cpp line 200: > 198: log_statistics(); > 199: if (UsePerfData && os::is_thread_cpu_time_supported()) { > 200: ThreadTotalCPUTimeClosure tttc(_concurrent_dedup_thread_cpu_time, true); I think it is better to not classify StringDedup thread as a GC thread, so remove the "true" parameter. Although StringDedup requests are initiated during GC (search `_string_dedup_requests.add`), they are not part of the GC. ------------- Marked as reviewed by manc (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1698265647 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1372333757 From rriggs at openjdk.org Wed Oct 25 22:00:31 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Wed, 25 Oct 2023 22:00:31 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v7] In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 08:44:29 GMT, Leo Korinth wrote: >> This pull request renames `createJavaProcessBuilder` to `createLimitedTestJavaProcessBuilder` and renames `createTestJvm` to `createTestJavaProcessBuilder`. Both are implemented through a private `createJavaProcessBuilder`. It also updates the java doc. >> >> This is so that it should be harder to by mistake use `createLimitedTestJavaProcessBuilder` that is problematic because it will not forward JVM flags to the tested JVM. > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > fix copyright year and indentation Suggestions to complete the descriptions of the createXXXJavaProcessBuilder methods. test/lib/jdk/test/lib/process/ProcessTools.java line 505: > 503: * @return The ProcessBuilder instance representing the java command. > 504: */ > 505: public static ProcessBuilder createTestJavaProcessBuilder(List command) { Include the same description of other properties that are included in creating the ProcessBuilder... ``` * Unless the "test.noclasspath" property is "true" * the classpath property "java.class.path" is appended to the command line and * the environment of the ProcessBuilder is modified to remove "CLASSPATH". * If the property "test.thread.factory" is provided the command args are * updated and appended to invoke ProcessTools main() and provide the * name of the thread factory. * The "-Dtest.thread.factory" is appended to the arguments with the thread factory value. 
* The remaining command args are scanned for unsupported options and * are appended to the ProcessBuilder. test/lib/jdk/test/lib/process/ProcessTools.java line 520: > 518: * @return The ProcessBuilder instance representing the java command. > 519: */ > 520: public static ProcessBuilder createTestJavaProcessBuilder(String... command) { Include the same description of other properties that are included in creating the ProcessBuilder... * Unless the "test.noclasspath" property is "true" * the classpath property "java.class.path" is appended to the command line and * the environment of the ProcessBuilder is modified to remove "CLASSPATH". * If the property "test.thread.factory" is provided the command args are * updated and appended to invoke ProcessTools main() and provide the * name of the thread factory. * The "-Dtest.thread.factory" is appended to the arguments with the thread factory value. * The remaining command args are scanned for unsupported options and * are appended to the ProcessBuilder. test/lib/jdk/test/lib/process/ProcessTools.java line 538: > 536: * it in combination with @requires vm.flagless JTREG > 537: * anotation as to not waste energy and test resources. > 538: * Consider adding this description of what this method does. Suggestion: * Unless the "test.noclasspath" property is "true" * the classpath property "java.class.path" is appended to the command line and * the environment of the ProcessBuilder is modified to remove "CLASSPATH". * If the property "test.thread.factory" is provided the command args are * updated and appended to invoke ProcessTools main() and provide the * name of the thread factory. * The "-Dtest.thread.factory" is appended to the arguments with the thread factory value. * The remaining command args are scanned for unsupported options and * are appended to the ProcessBuilder. 
test/lib/jdk/test/lib/process/ProcessTools.java line 560: > 558: * it in combination with @requires vm.flagless JTREG > 559: * anotation as to not waste energy and test resources. > 560: * Suggestion: * Unless the "test.noclasspath" property is "true" * the classpath property "java.class.path" is appended to the command line and * the environment of the ProcessBuilder is modified to remove "CLASSPATH". * If the property "test.thread.factory" is provided the command args are * updated and appended to invoke ProcessTools main() and provide the * name of the thread factory. * The "-Dtest.thread.factory" is appended to the arguments with the thread factory value. * The remaining command args are scanned for unsupported options and * are appended to the ProcessBuilder. ------------- PR Review: https://git.openjdk.org/jdk/pull/15452#pullrequestreview-1698308785 PR Review Comment: https://git.openjdk.org/jdk/pull/15452#discussion_r1372364800 PR Review Comment: https://git.openjdk.org/jdk/pull/15452#discussion_r1372364171 PR Review Comment: https://git.openjdk.org/jdk/pull/15452#discussion_r1372361862 PR Review Comment: https://git.openjdk.org/jdk/pull/15452#discussion_r1372362333 From phh at openjdk.org Wed Oct 25 22:32:29 2023 From: phh at openjdk.org (Paul Hohensee) Date: Wed, 25 Oct 2023 22:32:29 GMT Subject: RFR: 8318811: Compiler directives parser swallows a character after line comments In-Reply-To: References: Message-ID: <_BPO8cSQtS8QIopz409yokVYKED1UHAlj5O5RoLwdck=.6698bb72-f181-4f7b-ac9f-33655c467272@github.com> On Wed, 25 Oct 2023 11:46:10 GMT, Volker Simonis wrote: > Currently, the following valid compiler directive file: > > [{ > match: "*::*", > c2: { Exclude: true } // c1 only for startup > }] > > will be rejected by the parser: > > Syntax error on line 4 byte 2: Expected value separator or object end (one of ',}'). > At ']'. 
> }] > > Parsing of compiler directives failed > > > This is because `JSON::skip_line_comment()`, in contradiction to its specification, does **not** "*return the first token after the line comment without consuming it*" but does consumes it. > > The fix is trivial: > > --- a/src/hotspot/share/utilities/json.cpp > +++ b/src/hotspot/share/utilities/json.cpp > @@ -580,7 +580,7 @@ u_char JSON::skip_line_comment() { > return 0; > } > next(); > - return next(); > + return peek(); > } Marked as reviewed by phh (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16359#pullrequestreview-1698344513 From dholmes at openjdk.org Thu Oct 26 02:27:34 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 26 Oct 2023 02:27:34 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v7] In-Reply-To: References: <12Jiijy5Y9tn-_eHyE01oDMVUomgXbEiuEgDPgY8GU0=.af4948be-6026-4b35-bcd1-b281855e3ede@github.com> Message-ID: On Wed, 25 Oct 2023 04:07:12 GMT, Quan Anh Mai wrote: >> src/hotspot/share/utilities/growableArray.hpp line 213: >> >>> 211: >>> 212: template >>> 213: int find(T* token, bool f(T*, const E&)) const { >> >> What is the advantage of a const reference here? > > You can bind a non-const reference to a const one but not the other way. Sorry I was unclear: what is the advantage of a reference here? Is it just to avoid copying ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1372500458 From david.holmes at oracle.com Thu Oct 26 02:28:22 2023 From: david.holmes at oracle.com (David Holmes) Date: Thu, 26 Oct 2023 12:28:22 +1000 Subject: RFC: 8318776: Require supports_cx8 to always be true In-Reply-To: <517d7c08-e9a6-481a-89e5-2533c5d41724@oracle.com> References: <517d7c08-e9a6-481a-89e5-2533c5d41724@oracle.com> Message-ID: Adding in porters-dev On 25/10/2023 5:12 pm, David Holmes wrote: > From? 
https://bugs.openjdk.org/browse/JDK-8318776 > > Regardless of platform size (32-bit or 64-bit) the Java language has > always required that the underlying platform (or the VM) provides a > means of performing atomic load and store of 64-bit values, for volatile > long and double support. > > Since Java 5 the java.util.concurrent.atomic package introduced APIs > that provide a range of atomic operations, the most fundamental being a > compare-and-swap (CAS), also known as a compare-exchange, out of which > other atomic operations can be constructed if there is no direct > platform support. This capability was later extended to the VarHandle > API as well. > > While all platforms needed a mechanism for 64-bit load and store, not > all platforms support a 64-bit CAS, internally known as cmpxchg8. To > address that the supports_cx8 flag was introduced so that on platforms > without cmpxchg8 native support, it could be emulated via other > techniques e.g. locking. (Note this is not without its own issues as all > accesses to the field must be done in a way that is consistent with the > use of locking by cmpxchg8 - word-tearing is a real risk). > > Internal to the VM we also have use of lock-free algorithms and atomic > operations, with the latter defined via atomic.hpp. Originally in that > code we needed to check supports_cx8 for platforms without 64-bit > support, but in practice we tended to avoid using 64-bit fields in such > cases so we could avoid the complexity of introducing lock-based emulation. > > Unfortunately, when the atomic interface in the VM was templatized and > redesigned, it appears that the fact cmpxchg8 may not be available was > overlooked and supports_cx8 is not consulted. Consequently if someone > introduced an atomic operation on a 64-bit field they would get a > linkage error on platforms without cmpxchg8 - so again if this happened > we tended to back away from using a 64-bit field. 
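For readers who want to picture the lock-based emulation mentioned above, here is a minimal sketch in Java. It is not the HotSpot implementation; the class `LockedLongCell` and its method names are hypothetical and exist only for illustration. It shows why every access to the field, including plain reads and writes, has to take the same lock as the emulated compare-and-exchange.

```java
/**
 * Hypothetical sketch: emulating a 64-bit compare-and-exchange with a lock,
 * as a platform without a native cmpxchg8 might have to do.
 * This is NOT HotSpot code; all names here are made up for illustration.
 */
final class LockedLongCell {
    private final Object lock = new Object();
    // Deliberately not volatile: JLS 17.7 allows a non-volatile long write to be
    // split into two 32-bit writes, so unlocked access could see a torn value.
    private long value;

    LockedLongCell(long initial) {
        this.value = initial;
    }

    // Reads and writes take the same lock as compareAndExchange, so they can
    // never interleave with the middle of an emulated CAS.
    long get() {
        synchronized (lock) {
            return value;
        }
    }

    void set(long newValue) {
        synchronized (lock) {
            value = newValue;
        }
    }

    /** Returns the value witnessed before the operation (cmpxchg convention). */
    long compareAndExchange(long expected, long newValue) {
        synchronized (lock) {
            long witness = value;
            if (witness == expected) {
                value = newValue;
            }
            return witness;
        }
    }

    public static void main(String[] args) {
        LockedLongCell cell = new LockedLongCell(5L);
        // Succeeds: the witness equals the expected value, so the store happens.
        System.out.println(cell.compareAndExchange(5L, 1L << 40));
        // Fails: the witness is the current value and the cell is unchanged.
        System.out.println(cell.compareAndExchange(5L, 7L));
    }
}
```

The cost of this scheme is that a single unlocked access anywhere in the code base silently breaks it, which is part of what requiring native 64-bit atomics would avoid.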
> > Along the way the access API in the VM was introduced, which also > provided atomic ops on oops and did consult supports_cx8 with a > lock-based fallback. > > We have now reached a point where there are cases where we do want > 64-bit atomic operations but we don't want the complexity of dealing > with platforms that don't support it. So we want to require that > supports_cx8 always be assumed true (the VM could abort at runtime if > run on a platform where it is not true) and we can then proceed with > 64-bit atomics in the VM and also remove all the lock-based fallbacks in > the access API and in the Java APIs. > > The OpenJDK has limited support for 32-bit platforms these days: PPC32 > was dropped a long time ago; Windows 32-bit is now a deprecated port > (but supports cmpxchg8 anyway); leaving only ARM32 as a platform of > potential concern. But even then we support cmpxchg8 in all known modern > implementations, as described in os_cpu/linux_arm/atomic_linux_arm.hpp: > > /* > * Atomic long operations on 32-bit ARM > * ARM v7 supports LDREXD/STREXD synchronization instructions so no > problem. > * ARM < v7 does not have explicit 64 atomic load/store capability. > * However, gcc emits LDRD/STRD instructions on v5te and LDM/STM on v5t > * when loading/storing 64 bits. > * For non-MP machines (which is all we support for ARM < v7) > * under current Linux distros these instructions appear atomic. > * See section A3.5.3 of ARM Architecture Reference Manual for ARM v7. > * Also, for cmpxchg64, if ARM < v7 we check for cmpxchg64 support in the > * Linux kernel using _kuser_helper_version. See entry-armv.S in the Linux > * kernel source or kernel_user_helpers.txt in Linux Doc. > */ > > So the practical reality is that we do not expect to encounter any > mainstream OpenJDK platform where we don't in fact have support for > cmpxchg8. > > ------- > > Before I proceed with this does anyone have any strong and reasonable > objections?
Is there some platform support aspect that has been overlooked? > > Note the JDK part could be (probably should be) done as a follow up RFE > to simplify the review and approval process. > > Thanks, > David From dholmes at openjdk.org Thu Oct 26 02:59:32 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 26 Oct 2023 02:59:32 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v3] In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 09:46:51 GMT, Julian Waters wrote: >> On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. >> >> The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well. The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. >> >> Although not mentioned in the title, raise_fail_fast has also been fixed. RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method >> >> All the changes mentioned above fixes any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. 
This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. >> >> This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'openjdk:master' into noreturn > - Minor Style Change in os_windows.cpp > - 8304939 On my to-do list ... > moved out of the os::win32 This doesn't seem to add any value. Seemed fine to me in os::win32 versus os class. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16303#issuecomment-1780336868 PR Comment: https://git.openjdk.org/jdk/pull/16303#issuecomment-1780337907 From dholmes at openjdk.org Thu Oct 26 03:12:36 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 26 Oct 2023 03:12:36 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v3] In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 09:46:51 GMT, Julian Waters wrote: >> On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. >> >> The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well. The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. 
The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. >> >> Although not mentioned in the title, raise_fail_fast has also been fixed. RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method >> >> All the changes mentioned above fixes any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. >> >> This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'openjdk:master' into noreturn > - Minor Style Change in os_windows.cpp > - 8304939 Supporting no_return seems a reasonable thing to do. General complaint is that there is too much incidental change in the PR that to me adds no value. Some specific objections below. Thanks src/hotspot/os/posix/os_posix.cpp line 894: > 892: sleep: // sleep forever ... > 893: ::sleep(100); // ... 100 seconds at a time > 894: goto sleep; I don't recall now why this was written the way it was, but I certainly do not understand why you rewrote it this way with a goto! src/hotspot/os/windows/os_windows.cpp line 3935: > 3933: sleep: // sleep forever ... > 3934: Sleep(100000); // ... 
100 seconds at a time > 3935: goto sleep; Again there needs to be a very strong justification for doing it this way. src/hotspot/os/windows/os_windows.cpp line 4118: > 4116: } > 4117: > 4118: static void exit_process_or_thread(Ept what, int code) { exit_code was clearer and avoids changes below src/hotspot/os/windows/vmError_windows.cpp line 71: > 69: void VMError::raise_fail_fast(void* exrecord, void* context) { > 70: DWORD flags = (exrecord == nullptr) ? FAIL_FAST_GENERATE_EXCEPTION_ADDRESS : 0; > 71: RaiseFailFastException(static_cast(exrecord), static_cast(context), flags); Curious that MS do not declare this as no_return themselves. ------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16303#pullrequestreview-1698641909 PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1372520252 PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1372521965 PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1372522384 PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1372525405 From dholmes at openjdk.org Thu Oct 26 03:21:32 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 26 Oct 2023 03:21:32 GMT Subject: RFR: 8317697: refactor-encapsulate x86 VM_Version::CpuidInfo [v3] In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 12:47:47 GMT, Jan Kratochvil wrote: >> In OpenJDK project CRaC I had a [need to fetch new CpuidInfo without affecting the existing one](https://github.com/openjdk/crac/commit/ed4ad9ba31b77732dcede2eb743b2f389ec9a0fe#diff-6ed856c57ddbe33e49883adb7c52ec51ed377e5f697dfd6d8bea505a97bfc5a5R2743). >> Which led me to encapsulate the object more and I think this no-functionality-change patch is even appropriate for JDK. >> I am sure interested primarily to reduce the CRaC patchset boilerplate. 
> > Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: > > Remove the automatic zero-initialization Okay this basic refactoring seems simpler/clearer to me. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16093#pullrequestreview-1698654611 From jwaters at openjdk.org Thu Oct 26 04:26:34 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 26 Oct 2023 04:26:34 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v3] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 02:58:53 GMT, David Holmes wrote: >> Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into noreturn >> - Minor Style Change in os_windows.cpp >> - 8304939 > > src/hotspot/os/posix/os_posix.cpp line 894: > >> 892: sleep: // sleep forever ... >> 893: ::sleep(100); // ... 100 seconds at a time >> 894: goto sleep; > > I don't recall now why this was written the way it was, but I certainly do not understand why you rewrote it this way with a goto! ah, when I was searching for functions to implement the noreturn with and stumbled across this one, I thought it could do with a goto instead of a while true since the intent seemed to be clearer. 
I can revert this if need be > src/hotspot/os/windows/os_windows.cpp line 4118: > >> 4116: } >> 4117: >> 4118: static void exit_process_or_thread(Ept what, int code) { > > exit_code was clearer and avoids changes below Noted, will return the name to exit_code ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1372564378 PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1372564195 From jwaters at openjdk.org Thu Oct 26 04:29:29 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 26 Oct 2023 04:29:29 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v3] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 02:56:31 GMT, David Holmes wrote: > > moved out of the os::win32 > > This doesn't seem to add any value. Seemed fine to me in os::win32 versus os class. I noticed that the method wasn't used anywhere outside of os_windows.cpp, and since I was rewriting it to be noreturn, I figured that a little refactoring of scope (from os::win32 to os_windows.cpp file scope) could help here ------------- PR Comment: https://git.openjdk.org/jdk/pull/16303#issuecomment-1780395197 From jpai at openjdk.org Thu Oct 26 06:13:37 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Thu, 26 Oct 2023 06:13:37 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions In-Reply-To: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> Message-ID: On Wed, 25 Oct 2023 21:08:01 GMT, Leonid Mesnik wrote: > The jtreg starts the main thread in a separate ThreadGroup and checks unhandled exceptions for this group. However, it doesn't catch all unhandled exceptions. There is a jtreg issue for this https://bugs.openjdk.org/browse/CODETOOLS-7903526. 
> Catching such issues for virtual threads is important because they are not included in any groups. So this fix implements the handler for the test thread factory. > > A few tests start failing. > > The test > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java > has testcases for platform and virtual threads. So, there is there's no need to run it with the thread factory. > > The test > java/lang/Thread/virtual/ThreadAPI.java > tests UncaughtExceptionHandler and virtual threads. No need to run it with a thread factory. > > Test > test/jdk/java/util/concurrent/tck/ThreadTest.java is updated to not check the default empty handler. > > Probably, we need some common approach about dealing with the UncaughtExceptionHandler in jtreg. Hello Leonid, looking at the changes in this PR, what's being proposed is that when jtreg lauches tests through a virtual thread, then this wrapping code will set a JVM level `UncaughtExceptionHandler` by calling `Thread.setDefaultUncaughtExceptionHandler(...)`. The implementation of this `UncaughtExceptionHandler` calls `System.exit(1)`. Wouldn't that kill the test VM? I think that would then impact everything else including jtreg report generation and such for the test, isn't it? I had a look at https://bugs.openjdk.org/browse/JDK-8318839 but it doesn't have enough details to help understand what currently happens when a test launched through a virtual thread from jtreg throws an uncaught exception. How/what gets reported for that test execution? test/jdk/java/util/concurrent/tck/ThreadTest.java line 79: > 77: */ > 78: public void testGetAndSetDefaultUncaughtExceptionHandler() { > 79: assertNull(Thread.getDefaultUncaughtExceptionHandler()); I think this is a very broad change in this test case and either shouldn't be done or should be done conditionally (I can't think of the right condition now because I haven't fully grasped the context of this PR). 
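To make the concern about `assertNull(Thread.getDefaultUncaughtExceptionHandler())` concrete, here is a small sketch of how a JVM-wide default handler behaves. It uses an ordinary platform thread so that it runs on older JDKs too, and it records the exception instead of calling `System.exit(1)`. The class `DefaultHandlerSketch` and its methods are hypothetical; this is not the code in `Virtual.java`.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical sketch: a JVM-wide default UncaughtExceptionHandler sees
 * exceptions from threads that have no handler of their own, and once it is
 * installed getDefaultUncaughtExceptionHandler() no longer returns null.
 */
final class DefaultHandlerSketch {

    /** Installs a recording handler, lets a thread die, restores the old handler. */
    static Throwable runAndCapture() {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> seen.set(error));
        try {
            Thread worker = new Thread(() -> {
                throw new IllegalStateException("boom");
            });
            worker.start();
            // The handler runs in the dying thread, so it has finished by the time join() returns.
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            Thread.setDefaultUncaughtExceptionHandler(previous);
        }
        return seen.get();
    }

    /** True while a default handler is installed; this is what an assertNull check would trip over. */
    static boolean defaultHandlerVisibleWhileInstalled() {
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> { });
        try {
            return Thread.getDefaultUncaughtExceptionHandler() != null;
        } finally {
            Thread.setDefaultUncaughtExceptionHandler(previous);
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndCapture());
        System.out.println(defaultHandlerVisibleWhileInstalled());
    }
}
```

The sketch assumes the worker's thread group does not override `uncaughtException`, which holds for a plain `java` launch but not necessarily under jtreg, where the harness supplies its own `ThreadGroup`.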
test/jtreg_test_thread_factory/src/share/classes/Virtual.java line 37: > 35: // The virtual threads don't belong to any group and need global handler. > 36: Thread.setDefaultUncaughtExceptionHandler((t, e) -> { > 37: if (e instanceof ThreadDeath) { `ThreadDeath` has been deprecated for removal since Java 20, so this should no longer be needed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16369#issuecomment-1780470987 PR Review Comment: https://git.openjdk.org/jdk/pull/16369#discussion_r1372631138 PR Review Comment: https://git.openjdk.org/jdk/pull/16369#discussion_r1372630017 From fyang at openjdk.org Thu Oct 26 06:23:32 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 26 Oct 2023 06:23:32 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 15:21:08 GMT, Hamlin Li wrote: > Hi, > Can you review the change to add intrinsic for UDivI and UDivL? > Thanks! > > > ## Tests > > ### Functionality > Run tests successfully found via `grep -nr test/jdk/ -we divideUnsigned` and `grep -nr test/hotspot/ -we divideUnsigned` > > ### Performance > ( NOTE: there are another 2 related issues: https://bugs.openjdk.org/browse/JDK-8318225, https://bugs.openjdk.org/browse/JDK-8318226, the PRs for which will be sent out subsequently, after this one is finished. ) > > #### Long > **Before** > > LongDivMod.testDivideUnsigned 1024 mixed avgt 10 19704.317 ± 64.078 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 10 28856.859 ± 14.901 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 10 6364.974 ± 2.465 ns/op > > > **After v1** > (This is a simpler version, please check the diff from `After v2` below) > > LongDivMod.testDivideUnsigned 1024 mixed avgt 10 22668.228 ± 74.161 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 10 15966.320 ± 14.985 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 10 29518.033 ±
49.056 ns/op > > > **After v2** > (This is the current patch, **This version has a huge regression for negative values!!!**) > > LongDivMod.testDivideUnsigned 1024 mixed avgt 10 11432.738 ± 95.785 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 10 15969.044 ± 19.492 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 10 6376.674 ± 16.869 ns/op > > > ##### Diff of v1 from v2 > > diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp > index b96f7611133..dfb40e171e7 100644 > --- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp > +++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp > @@ -2432,16 +2432,7 @@ int MacroAssembler::corrected_idivq(Register result, Register rs1, Register rs2, > if (is_signed) { > div(result, rs1, rs2); > } else { > - Label Lltz, Ldone; > - bltz(rs2, Lltz); > divu(result, rs1, rs2); > - j(Ldone); > - bind(Lltz); // For the algorithm details, check j.l.Long::divideUnsigned > - sub(result, rs1, rs2); > - notr(result, result); > - andr(result, result, rs1); > - srli(result, result, 63); > - bind(Ldone); > } > } else { > rem(result, rs1, rs2); // result = rs1 % rs2; > > > > #### Integer > **B... So I tried this on a HiFive Unmatched board. Unfortunately, the JMH test shows some regression for the LongDivMod.testDivideUnsigned `negative` case. Before: LongDivMod.testDivideUnsigned 1024 mixed avgt 15 24909.748 ± 17.915 ns/op LongDivMod.testDivideUnsigned 1024 positive avgt 15 36257.181 ± 33.615 ns/op LongDivMod.testDivideUnsigned 1024 negative avgt 15 6720.904 ± 8.522 ns/op <==== After: LongDivMod.testDivideUnsigned 1024 mixed avgt 15 13650.002 ± 52.788 ns/op LongDivMod.testDivideUnsigned 1024 positive avgt 15 18784.942 ± 18.258 ns/op LongDivMod.testDivideUnsigned 1024 negative avgt 15 7168.625 ±
17.019 ns/op <==== ------------- PR Comment: https://git.openjdk.org/jdk/pull/16346#issuecomment-1780482379 From alanb at openjdk.org Thu Oct 26 08:37:34 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 26 Oct 2023 08:37:34 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions In-Reply-To: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> Message-ID: On Wed, 25 Oct 2023 21:08:01 GMT, Leonid Mesnik wrote: > The jtreg starts the main thread in a separate ThreadGroup and checks unhandled exceptions for this group. However, it doesn't catch all unhandled exceptions. There is a jtreg issue for this https://bugs.openjdk.org/browse/CODETOOLS-7903526. > Catching such issues for virtual threads is important because they are not included in any groups. So this fix implements the handler for the test thread factory. > > A few tests start failing. > > The test > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java > has testcases for platform and virtual threads. So, there is there's no need to run it with the thread factory. > > The test > java/lang/Thread/virtual/ThreadAPI.java > tests UncaughtExceptionHandler and virtual threads. No need to run it with a thread factory. > > Test > test/jdk/java/util/concurrent/tck/ThreadTest.java is updated to not check the default empty handler. > > Probably, we need some common approach about dealing with the UncaughtExceptionHandler in jtreg. Having a UHE invoke System.exit is surprising. Are you saying that this is only for cases where a test launches a child VM with the thread factory set? Stepping back a bit. ThreadGroup is legacy and we eventually want it to go away. We've been deprecating and degrading it very slowly over many releases. So I think jtreg will eventually need to change. 
Right now, it creates AgentVMThreadGroup for agent VM mode so it controls the UHE where it starts the "main thread". I think it will eventually need to change this to set the system-wide UHE, but this means it would be handling uncaught exceptions thrown by "system threads", so we may have to audit some of the exception handling if things come out of the woodwork. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16369#issuecomment-1780660022 From mli at openjdk.org Thu Oct 26 09:03:01 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 26 Oct 2023 09:03:01 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL [v2] In-Reply-To: References: Message-ID: > Hi, > Can you review the change to add intrinsic for UDivI and UDivL? > Thanks! > > > ## Tests > > ### Functionality > Run tests successfully found via `grep -nr test/jdk/ -we divideUnsigned` and `grep -nr test/hotspot/ -we divideUnsigned` > > ### Performance > ( NOTE: there are another 2 related issues: https://bugs.openjdk.org/browse/JDK-8318225, https://bugs.openjdk.org/browse/JDK-8318226, the PRs for which will be sent out subsequently, after this one is finished. ) > > #### Long > **Before** > > LongDivMod.testDivideUnsigned 1024 mixed avgt 10 19704.317 ± 64.078 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 10 28856.859 ± 14.901 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 10 6364.974 ± 2.465 ns/op > > > **After v1** > (This is a simpler version, please check the diff from `After v2` below) > > LongDivMod.testDivideUnsigned 1024 mixed avgt 10 22668.228 ± 74.161 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 10 15966.320 ± 14.985 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 10 29518.033 ± 49.056 ns/op > > > **After v2** > (This is the current patch, **This version has a huge regression for negative values!!!**) > > LongDivMod.testDivideUnsigned 1024 mixed avgt 10 11432.738 ± 95.785 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 10 15969.044 ±
19.492 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 10 6376.674 ? 16.869 ns/op > > > ##### Diff of v1 from v2 > > diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp > index b96f7611133..dfb40e171e7 100644 > --- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp > +++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp > @@ -2432,16 +2432,7 @@ int MacroAssembler::corrected_idivq(Register result, Register rs1, Register rs2, > if (is_signed) { > div(result, rs1, rs2); > } else { > - Label Lltz, Ldone; > - bltz(rs2, Lltz); > divu(result, rs1, rs2); > - j(Ldone); > - bind(Lltz); // For the algorithm details, check j.l.Long::divideUnsigned > - sub(result, rs1, rs2); > - notr(result, result); > - andr(result, result, rs1); > - srli(result, result, 63); > - bind(Ldone); > } > } else { > rem(result, rs1, rs2); // result = rs1 % rs2; > > > > #### Integer > **B... Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: use divu instead of quick path opt; use explicit param for is_signed instead of default value(true) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16346/files - new: https://git.openjdk.org/jdk/pull/16346/files/1d276b31..319be70a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16346&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16346&range=00-01 Stats: 28 lines in 5 files changed: 4 ins; 9 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/16346.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16346/head:pull/16346 PR: https://git.openjdk.org/jdk/pull/16346 From mli at openjdk.org Thu Oct 26 09:06:32 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 26 Oct 2023 09:06:32 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL [v2] In-Reply-To: References: Message-ID: <0SG1LFhb6ycUkbi0U0F6RhApA4Q52br2Pjf_ribOLXI=.602dc2c1-ae06-4b2d-99b3-f630f365ab37@github.com> On Wed, 25 Oct 2023 17:53:15 GMT, Andrew Haley 
wrote: >> Ok, so let's keep the version with branch. You should add a comment at https://github.com/openjdk/jdk/pull/16346/files#diff-7a5c3ed05b6f3f06ed1c59f5fc2a14ec566a6a5bd1d09606115767daa99115bdR2435 explaining just that. > > But you're putting a conditional branch in the way of the common cases, and you're greatly increasing icache pressure, for the sake of a rare case. How does that make any sense? Thanks @theRealAph @merykitty for the comments, I agree with you. I have used `divu` instead of introducing a conditional branch here. We will treat the regression in the negative (and mixed) test cases as a rare case, and just make sure the common case (the positive one) gets optimized. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16346#discussion_r1372829871 From mli at openjdk.org Thu Oct 26 09:06:35 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 26 Oct 2023 09:06:35 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL [v2] In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 15:34:43 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 244: >> >>> 242: // idiv variant which deals with MINLONG as dividend and -1 as divisor >>> 243: int corrected_idivl(Register result, Register rs1, Register rs2, >>> 244: bool want_remainder, bool is_signed = true); >> >> Could you not set the default value of `is_signed` to `true`, to make it clear which case it is at the callsite? > > The reason I use a default value for `is_signed` is that both corrected_idivx are also used in cpu/riscv/c1_LIRAssembler_arith_riscv.cpp, which I don't want to touch in this PR. > But if you insist, I can remove the default. I have removed the default values.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16346#discussion_r1372831046 From mli at openjdk.org Thu Oct 26 09:09:32 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 26 Oct 2023 09:09:32 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 06:20:30 GMT, Fei Yang wrote: >> Hi, >> Can you review the change to add intrinsic for UDivI and UDivL? >> Thanks! >> >> >> ## Tests >> >> ### Functionality >> Run tests successfully found via `grep -nr test/jdk/ -we divideUnsigned` and `grep -nr test/hotspot/ -we divideUnsigned` >> >> ### Performance >> ( NOTE: there are another 2 related issues: https://bugs.openjdk.org/browse/JDK-8318225, https://bugs.openjdk.org/browse/JDK-8318226, the pr of which will be subseqently sent out after this one finished. ) >> >> #### Long >> **Before** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 19704.317 ? 64.078 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 28856.859 ? 14.901 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 6364.974 ? 2.465 ns/op >> >> >> **After v1** >> (This is a simpler version, please check the diff from `After v2` below) >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 22668.228 ? 74.161 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 15966.320 ? 14.985 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 29518.033 ? 49.056 ns/op >> >> >> **After v2** >> (This is the current patch, **This version has a huge regression for negative values!!!**) >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 11432.738 ? 95.785 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 15969.044 ? 19.492 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 6376.674 ? 
16.869 ns/op >> >> >> ##### Diff of v1 from v2 >> >> diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp >> index b96f7611133..dfb40e171e7 100644 >> --- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp >> +++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp >> @@ -2432,16 +2432,7 @@ int MacroAssembler::corrected_idivq(Register result, Register rs1, Register rs2, >> if (is_signed) { >> div(result, rs1, rs2); >> } else { >> - Label Lltz, Ldone; >> - bltz(rs2, Lltz); >> divu(result, rs1, rs2); >> - j(Ldone); >> - bind(Lltz); // For the algorithm details, check j.l.Long::divideUnsigned >> - sub(result, rs1, rs2); >> - notr(result, result); >> - andr(result, result, rs1); >> - srli(result, result, 63... > > So I tried this on Hifive Unmatched board. Unforunately, JMH test shows some regression for the LongDivMod.testDivideUnsigned `negative` case. > > Before: > > LongDivMod.testDivideUnsigned 1024 mixed avgt 15 24909.748 ? 17.915 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 15 36257.181 ? 33.615 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 15 6720.904 ? 8.522 ns/op <==== > > > After: > > LongDivMod.testDivideUnsigned 1024 mixed avgt 15 13650.002 ? 52.788 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 15 18784.942 ? 18.258 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 15 7168.625 ? 17.019 ns/op <==== Thanks @RealFYang for testing, I have used the divu instead of introducing the cond branch, and will consider negative case as rare case, only make sure positive get optimized. 
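For readers following the discussion, here is a minimal C++ sketch of the quick path that the diff above removes. It is not HotSpot code: the function name `udiv_with_quick_path` is hypothetical, and the sketch only mirrors the `sub` / `notr` / `andr` / `srli` sequence from the diff. When the divisor has its top bit set (a "negative" divisor when viewed as signed), the unsigned quotient can only be 0 or 1, so it can be computed without a divide instruction. The caller is assumed to have excluded a zero divisor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration only; the name is not taken from the JDK sources.
// Precondition (assumed): divisor != 0.
constexpr std::uint64_t udiv_with_quick_path(std::uint64_t dividend,
                                             std::uint64_t divisor) {
    if (divisor >> 63) {
        // Divisor >= 2^63, so the quotient is 1 exactly when
        // dividend >= divisor, and 0 otherwise.
        // Mirrors: sub result, rs1, rs2; not; and with rs1; srli by 63.
        return (dividend & ~(dividend - divisor)) >> 63;
    }
    // Common case: plain unsigned division (divu in the patch).
    return dividend / divisor;
}

// Compile-time spot checks of both paths.
static_assert(udiv_with_quick_path(0xFFFFFFFFFFFFFFFFULL, 0x8000000000000000ULL) == 1, "quotient 1");
static_assert(udiv_with_quick_path(0x7FFFFFFFFFFFFFFFULL, 0x8000000000000000ULL) == 0, "quotient 0");
static_assert(udiv_with_quick_path(0x8000000000000000ULL, 0x8000000000000001ULL) == 0, "quotient 0");
static_assert(udiv_with_quick_path(100, 7) == 14, "common case");
```

The sketch makes the trade-off in the thread visible: the quick path costs an extra compare and branch on every call, including the common case where the divisor is small.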
------------- PR Comment: https://git.openjdk.org/jdk/pull/16346#issuecomment-1780711483 From aph at openjdk.org Thu Oct 26 09:14:31 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Oct 2023 09:14:31 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL In-Reply-To: References: Message-ID: <460H-XH6Dc21M7PD7AkmCtYqVRFDmcbAxhIi9d7Se6Y=.69b83f35-3b02-4a04-8292-6b2523f6da33@github.com> On Thu, 26 Oct 2023 06:20:30 GMT, Fei Yang wrote: > So I tried this on a HiFive Unmatched board. Unfortunately, the JMH test shows some regression for the LongDivMod.testDivideUnsigned `negative` case. But that case is going to be rare. The larger a number is, the less common it is. The uniform distribution of this benchmark, in which 0 is as common as 0xb43a61c853a2af20, is grossly unrepresentative of real-world divisors. In practice, numbers follow some kind of log-normal distribution. Don't fall into the trap of optimizing for a benchmark. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16346#issuecomment-1780720717 From mli at openjdk.org Thu Oct 26 09:43:33 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 26 Oct 2023 09:43:33 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL [v2] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 09:03:01 GMT, Hamlin Li wrote: >> Hi, >> Can you review the change to add intrinsics for UDivI and UDivL? >> Thanks! >> >> >> ## Tests >> >> ### Functionality >> Successfully ran the tests found via `grep -nr test/jdk/ -we divideUnsigned` and `grep -nr test/hotspot/ -we divideUnsigned` >> >> ### Performance >> ( NOTE: there are 2 other related issues: https://bugs.openjdk.org/browse/JDK-8318225, https://bugs.openjdk.org/browse/JDK-8318226, the PRs for which will be sent out after this one is finished. ) >> >> #### Long >> **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case** >> >> **Before** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 19684.873 ±
21.882 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 28853.041 ? 6.425 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 6367.239 ? 16.011 ns/op >> >> >> **After** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 22622.133 ? 7.158 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 15957.272 ? 3.174 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 29499.721 ? 10.404 ns/op >> >> >> #### Integer >> **Before** >> >> IntegerDivMod.testDivideUnsigned 1024 mixed avgt 10 23397.267 ? 36.980 ns/op >> IntegerDivMod.testDivideUnsigned 1024 positive avgt 10 16792.414 ? 5.869 ns/op >> IntegerDivMod.testDivideUnsigned 1024 negative avgt 10 30184.357 ? 55.464 ns/op >> >> >> **After** >> >> IntegerDivMod.testDivideUnsigned 1024 mixed avgt 10 23210.437 ? 4.463 ns/op >> IntegerDivMod.testDivideUnsigned 1024 positive avgt 10 16622.342 ? 4.047 ns/op >> IntegerDivMod.testDivideUnsigned 1024 negative avgt 10 30013.414 ? 48.695 ns/op >> >> >> >> /************ following is just backup: quick path for negative divisor *************/ >> #### Long >> **Before** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 19704.317 ? 64.078 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 28856.859 ? 14.901 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 6364.974 ? 2.465 ns/op >> >> >> **After v1** >> (This is a simp... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > use divu instead of quick path opt; use explicit param for is_signed instead of default value(true) `Don't fall into the trap of optimizing for a benchmark.` Agree I have updated the test result too, not too much change as previous v1 implementation. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16346#issuecomment-1780766080 PR Comment: https://git.openjdk.org/jdk/pull/16346#issuecomment-1780767853 From jsjolen at openjdk.org Thu Oct 26 09:51:15 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 26 Oct 2023 09:51:15 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory [v8] In-Reply-To: References: Message-ID: > I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? > > 1. Moved all the nmt source code from services/ to nmt/ > 2. Renamed all the include statements and sorted them > 3. Fixed the include guards Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Stefan's suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16276/files - new: https://git.openjdk.org/jdk/pull/16276/files/70b39e41..bb72984b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16276&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16276&range=06-07 Stats: 6 lines in 4 files changed: 2 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16276.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16276/head:pull/16276 PR: https://git.openjdk.org/jdk/pull/16276 From jsjolen at openjdk.org Thu Oct 26 09:51:16 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 26 Oct 2023 09:51:16 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory [v7] In-Reply-To: <81yK2Yxh7AVOSjVoAzZwIlriUwHRfN5s5LLowgA-34o=.1ed62ff1-d3d6-4fc1-8e3e-6ca945d86468@github.com> References: <81yK2Yxh7AVOSjVoAzZwIlriUwHRfN5s5LLowgA-34o=.1ed62ff1-d3d6-4fc1-8e3e-6ca945d86468@github.com> Message-ID: On Tue, 24 Oct 2023 11:51:45 GMT, Johan Sjölen wrote: >> I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? >> >> 1.
Moved all the nmt source code from services/ to nmt/ >> 2. Renamed all the include statements and sorted them >> 3. Fixed the include guards > > Johan Sj?len has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge remote-tracking branch 'upstream/master' into move-nmt > - Fix stefank suggestions > - Merge remote-tracking branch 'origin/master' into move-nmt > - Fix messed up include > - Missed this include > - Merge remote-tracking branch 'origin/master' into move-nmt > - Fixed reviewed changes > - Move NMT to its own subdirectory Thanks Stefan, good catch on the ordering requirements. I'll integrate this ASAP. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16276#issuecomment-1780775333 From stefank at openjdk.org Thu Oct 26 10:03:34 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 26 Oct 2023 10:03:34 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory [v8] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 09:51:15 GMT, Johan Sj?len wrote: >> I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? >> >> 1. Moved all the nmt source code from services/ to nmt/ >> 2. Renamed all the include statements and sorted them >> 3. Fixed the include guards > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Stefan's suggestions Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16276#pullrequestreview-1699205834 From fyang at openjdk.org Thu Oct 26 10:07:33 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 26 Oct 2023 10:07:33 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL [v2] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 09:03:01 GMT, Hamlin Li wrote: >> Hi, >> Can you review the change to add intrinsic for UDivI and UDivL? >> Thanks! 
>> >> >> ## Tests >> >> ### Functionality >> Run tests successfully found via `grep -nr test/jdk/ -we divideUnsigned` and `grep -nr test/hotspot/ -we divideUnsigned` >> >> ### Performance >> ( NOTE: there are another 2 related issues: https://bugs.openjdk.org/browse/JDK-8318225, https://bugs.openjdk.org/browse/JDK-8318226, the pr of which will be subseqently sent out after this one finished. ) >> >> #### Long >> **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case** >> >> **Before** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 19684.873 ? 21.882 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 28853.041 ? 6.425 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 6367.239 ? 16.011 ns/op >> >> >> **After** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 22622.133 ? 7.158 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 15957.272 ? 3.174 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 29499.721 ? 10.404 ns/op >> >> >> #### Integer >> **Before** >> >> IntegerDivMod.testDivideUnsigned 1024 mixed avgt 10 23397.267 ? 36.980 ns/op >> IntegerDivMod.testDivideUnsigned 1024 positive avgt 10 16792.414 ? 5.869 ns/op >> IntegerDivMod.testDivideUnsigned 1024 negative avgt 10 30184.357 ? 55.464 ns/op >> >> >> **After** >> >> IntegerDivMod.testDivideUnsigned 1024 mixed avgt 10 23210.437 ? 4.463 ns/op >> IntegerDivMod.testDivideUnsigned 1024 positive avgt 10 16622.342 ? 4.047 ns/op >> IntegerDivMod.testDivideUnsigned 1024 negative avgt 10 30013.414 ? 48.695 ns/op >> >> >> >> /************ following is just backup: quick path for negative divisor *************/ >> #### Long >> **Before** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 19704.317 ? 64.078 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 28856.859 ? 14.901 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 6364.974 ? 2.465 ns/op >> >> >> **After v1** >> (This is a simp... 
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > use divu instead of quick path opt; use explicit param for is_signed instead of default value(true) Changes requested by fyang (Reviewer). src/hotspot/cpu/riscv/riscv.ad line 6736: > 6734: "and $dst, $dst, $src1\n\t" > 6735: "srli $dst, $dst, 63\n\t" > 6736: "Ldone:\t#@UdivL" You might want to update this part to reflect the latest changes. ------------- PR Review: https://git.openjdk.org/jdk/pull/16346#pullrequestreview-1699212752 PR Review Comment: https://git.openjdk.org/jdk/pull/16346#discussion_r1372914398 From aph at openjdk.org Thu Oct 26 10:12:42 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Oct 2023 10:12:42 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v12] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. 
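The save-and-restore idea described above can be sketched in portable C++ as follows. This is not the code from the PR: `FpEnvGuard` and `load_preserving_fp_env` are hypothetical names, and the sketch assumes that the platform's `fegetenv`/`fesetenv` capture the control bits a `-ffast-math` library constructor changes (on x86-64 with glibc the saved environment includes MXCSR, but the C++ standard does not promise this for flush-to-zero bits). The loader callable stands in for whatever call ends up in `dlopen()`.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical RAII guard: snapshots the floating-point environment on
// construction and restores it on destruction.
class FpEnvGuard {
 public:
  FpEnvGuard() : saved_ok_(std::fegetenv(&saved_) == 0) {}
  ~FpEnvGuard() {
    if (saved_ok_) {
      std::fesetenv(&saved_);  // undo whatever the loaded code changed
    }
  }
  FpEnvGuard(const FpEnvGuard&) = delete;
  FpEnvGuard& operator=(const FpEnvGuard&) = delete;

 private:
  std::fenv_t saved_;
  bool saved_ok_;
};

// Runs `loader` (for example, a lambda wrapping dlopen) and restores the
// caller's floating-point environment afterwards.
template <typename Loader>
auto load_preserving_fp_env(Loader&& loader) -> decltype(loader()) {
  FpEnvGuard guard;
  return loader();
}
```

As the description notes, a guard like this only protects the call it wraps; a library that loads another library later, outside the guard, can still change the control word.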
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: s/Denormal/Subnormal/g ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10661/files - new: https://git.openjdk.org/jdk/pull/10661/files/7cba08d0..91eb9bb7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=10-11 Stats: 27 lines in 6 files changed: 1 ins; 0 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From mli at openjdk.org Thu Oct 26 10:13:46 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 26 Oct 2023 10:13:46 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL [v3] In-Reply-To: References: Message-ID: > Hi, > Can you review the change to add intrinsic for UDivI and UDivL? > Thanks! > > > ## Tests > > ### Functionality > Run tests successfully found via `grep -nr test/jdk/ -we divideUnsigned` and `grep -nr test/hotspot/ -we divideUnsigned` > > ### Performance > ( NOTE: there are another 2 related issues: https://bugs.openjdk.org/browse/JDK-8318225, https://bugs.openjdk.org/browse/JDK-8318226, the pr of which will be subseqently sent out after this one finished. ) > > #### Long > **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case** > > **Before** > > LongDivMod.testDivideUnsigned 1024 mixed avgt 10 19684.873 ? 21.882 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 10 28853.041 ? 6.425 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 10 6367.239 ? 16.011 ns/op > > > **After** > > LongDivMod.testDivideUnsigned 1024 mixed avgt 10 22622.133 ? 7.158 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 10 15957.272 ? 3.174 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 10 29499.721 ? 
10.404 ns/op > > > #### Integer > **Before** > > IntegerDivMod.testDivideUnsigned 1024 mixed avgt 10 23397.267 ? 36.980 ns/op > IntegerDivMod.testDivideUnsigned 1024 positive avgt 10 16792.414 ? 5.869 ns/op > IntegerDivMod.testDivideUnsigned 1024 negative avgt 10 30184.357 ? 55.464 ns/op > > > **After** > > IntegerDivMod.testDivideUnsigned 1024 mixed avgt 10 23210.437 ? 4.463 ns/op > IntegerDivMod.testDivideUnsigned 1024 positive avgt 10 16622.342 ? 4.047 ns/op > IntegerDivMod.testDivideUnsigned 1024 negative avgt 10 30013.414 ? 48.695 ns/op > > > > /************ following is just backup: quick path for negative divisor *************/ > #### Long > **Before** > > LongDivMod.testDivideUnsigned 1024 mixed avgt 10 19704.317 ? 64.078 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 10 28856.859 ? 14.901 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 10 6364.974 ? 2.465 ns/op > > > **After v1** > (This is a simpler version, please check the diff from `After v2` below) > > LongDivMod.testDivideUnsigned 1024 ... 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Fix format for UdivL ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16346/files - new: https://git.openjdk.org/jdk/pull/16346/files/319be70a..ba16fb12 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16346&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16346&range=01-02 Stats: 11 lines in 1 file changed: 0 ins; 10 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16346.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16346/head:pull/16346 PR: https://git.openjdk.org/jdk/pull/16346 From mli at openjdk.org Thu Oct 26 10:13:49 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 26 Oct 2023 10:13:49 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL [v2] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 10:04:56 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> use divu instead of quick path opt; use explicit param for is_signed instead of default value(true) > > src/hotspot/cpu/riscv/riscv.ad line 6736: > >> 6734: "and $dst, $dst, $src1\n\t" >> 6735: "srli $dst, $dst, 63\n\t" >> 6736: "Ldone:\t#@UdivL" > > You might want to update this part to reflect the latest changes. Thanks for catching it. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16346#discussion_r1372921549 From aph at openjdk.org Thu Oct 26 10:30:12 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Oct 2023 10:30:12 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v13] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. 
This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 36 commits: - Merge from head - s/Denormal/Subnormal/g - Review feedback - Merge branch 'JDK-8295159' of https://github.com/theRealAph/jdk into JDK-8295159 - Comments only. - Remove change to RestoreMXCSROnJNICalls - Review feedback - Add TestDenormalDouble.java - Merge branch 'JDK-8295159' of https://github.com/theRealAph/jdk into JDK-8295159 - Fix LLVM - ... 
and 26 more: https://git.openjdk.org/jdk/compare/3cea892b...393a4384 ------------- Changes: https://git.openjdk.org/jdk/pull/10661/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=12 Stats: 286 lines in 10 files changed: 284 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Thu Oct 26 10:30:12 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Oct 2023 10:30:12 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v11] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Thu, 19 Oct 2023 09:31:36 GMT, Andrew Haley wrote: >> src/hotspot/os/linux/os_linux.cpp line 1853: >> >>> 1851: >>> 1852: #ifndef IA32 >>> 1853: // Quickly test to make sure denormals are correctly handled. >> >> Nit: I recommend using "subnormal" rather than "denormal" for general terminology on this point. While "denormal" was used in the original IEEE 754 standard from 1985, subsequent iterations of the standard use "subnormal". The term "subnormal" has also been used for the last several editions of JLS and JVMS. > > I've long avoided "subnormal" because > > subnormal in British English, adjective > 3. [old-fashioned, offensive] a person of low intelligence All done. Good to go?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1372936648 From aph at openjdk.org Thu Oct 26 10:40:18 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Oct 2023 10:40:18 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v14] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Remove dead code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10661/files - new: https://git.openjdk.org/jdk/pull/10661/files/393a4384..c554746b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=12-13 Stats: 6 lines in 1 file changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Thu Oct 26 10:41:45 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Oct 2023 10:41:45 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v11] In-Reply-To: <0YkU44VdBbj08LTg9zw27GoMHMs1GNEz8U2nUb1wkpc=.29b8c5a2-2384-4566-9c5f-e483a1437ef2@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <0YkU44VdBbj08LTg9zw27GoMHMs1GNEz8U2nUb1wkpc=.29b8c5a2-2384-4566-9c5f-e483a1437ef2@github.com> Message-ID: On Wed, 18 Oct 2023 19:13:40 GMT, Vladimir Ivanov wrote: > > Meta-question and apologies if this was covered before, but why is this logic being added to stubRoutines.cpp? > > Because that's where @iwanowww asked me to put it. I don't much care. Hi @iwanowww, do you have any suggestion about where to move the remaining code that's in stubRoutines.cpp? @dholmes-ora?
------------- PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1780862336 From stuefe at openjdk.org Thu Oct 26 11:48:33 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 26 Oct 2023 11:48:33 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v2] In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 08:58:35 GMT, Aleksey Shipilev wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> fix -UseCCP case > > I see a few other places where `UseCompressedClassPointers` is used: `oopDesc::load_klass_raw` and `oopDesc::has_klass_gap`. Both seem to be only used from the diagnostic code, so their performance might not be relevant. Should those keep being omitted? > > All other uses I see are in either compiler code (where we can afford reading the flag directly) or non-perf-critical parts of runtime. @shipilev, @theRealAph: how should I continue? Should I scrap this? The improvements are there, albeit a bit hard to measure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15389#issuecomment-1780954266 From shade at openjdk.org Thu Oct 26 11:53:34 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 26 Oct 2023 11:53:34 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v8] In-Reply-To: References: Message-ID: On Tue, 10 Oct 2023 13:42:26 GMT, Thomas Stuefe wrote: >> Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`). >> >> Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift.
>> >> >> 8b7b62: 48 8d 05 7f 1b c3 00 lea 0xc31b7f(%rip),%rax # 14e96e8 >> 8b7b69: 0f b6 00 movzbl (%rax),%eax >> 8b7b6c: 84 c0 test %al,%al >> 8b7b6e: 0f 84 9c 00 00 00 je 8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260> >> 8b7b74: 48 8d 15 05 62 c6 00 lea 0xc66205(%rip),%rdx # 151dd80 <_ZN23CompressedKlassPointers6_shiftE> >> 8b7b7b: 8b 7b 08 mov 0x8(%rbx),%edi >> 8b7b7e: 8b 0a mov (%rdx),%ecx >> 8b7b80: 48 8d 15 01 62 c6 00 lea 0xc66201(%rip),%rdx # 151dd88 <_ZN23CompressedKlassPointers5_baseE> >> 8b7b87: 48 d3 e7 shl %cl,%rdi >> 8b7b8a: 48 03 3a add (%rdx),%rdi >> >> >> Patched: one load loads all three. Since shift occupies the lowest 8 bits, compiled code uses 8bit register; ditto the UseCompressedOops flag. >> >> >> 8ba302: 48 8d 05 97 9c c2 00 lea 0xc29c97(%rip),%rax # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE> >> 8ba309: 48 8b 08 mov (%rax),%rcx >> 8ba30c: f6 c5 01 test $0x1,%ch # use compressed klass pointers? >> 8ba30f: 0f 84 9b 00 00 00 je 8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260> >> 8ba315: 8b 7b 08 mov 0x8(%rbx),%edi >> 8ba318: 48 d3 e7 shl %cl,%rdi # shift >> 8ba31b: 66 31 c9 xor %cx,%cx # zero out lower 16 bits of base >> 8ba31e: 48 01 cf add %rcx,%rdi # add base >> 8ba321: 8b 4f 08 mov 0x8(%rdi),%ecx >> >> --- >> >> Performance measurements: >> >> G1, doing a full GC over a heap filled with 256 mio life j.l.Object instances. >> >> I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ?4%. Still, in general, numbers seemed to go down rather than up. >> >> --- >> >> Future extensions: >> >> This patch uses the fact that the encoding base is aligned to metaspace reser... 
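The single-load decoding shown in the patched disassembly above can be sketched in C++ roughly as follows. This is an illustration, not the patch itself: the bit layout (shift in bits 0-7, the "use compressed class pointers" flag in bit 8, the base in the remaining upper bits with its low 16 bits zero) is inferred from the `test $0x1,%ch`, `shl %cl,%rdi` and `xor %cx,%cx` instructions, and all names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout of the combined word, inferred from the disassembly.
constexpr std::uint64_t kShiftMask     = 0xFFULL;      // bits 0..7: shift
constexpr std::uint64_t kCompressedBit = 1ULL << 8;    // bit 8: flag
constexpr std::uint64_t kBaseMask      = ~0xFFFFULL;   // bits 16..63: base

// Packs base, shift and flag into one 64-bit word. The base is assumed to be
// aligned so that its low 16 bits are zero.
constexpr std::uint64_t make_combo(std::uint64_t base, unsigned shift,
                                   bool use_compressed) {
  return (base & kBaseMask) |
         (use_compressed ? kCompressedBit : 0ULL) |
         (shift & kShiftMask);
}

constexpr bool uses_compressed(std::uint64_t combo) {
  return (combo & kCompressedBit) != 0;
}

// Decodes a narrow klass value using only the single combined word.
constexpr std::uint64_t decode_klass(std::uint64_t combo,
                                     std::uint32_t narrow_klass) {
  return (combo & kBaseMask) +
         (static_cast<std::uint64_t>(narrow_klass) << (combo & kShiftMask));
}

static_assert(uses_compressed(make_combo(0x800000000ULL, 3, true)), "flag set");
static_assert(!uses_compressed(make_combo(0x800000000ULL, 3, false)), "flag clear");
static_assert(decode_klass(make_combo(0x800000000ULL, 3, true), 0x10) ==
              0x800000080ULL, "base + (narrow << shift)");
```

Packing the three values means the decoder touches one memory location instead of three, which is the reduction in loads the description refers to.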
> > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - simplify assert > - add comment This is a "papercut" issue: even though minor, it adds up on critical path. It would be more important with Lilliput landing. So I would still like to see this patch in. It's just because it is minor, it does not get enough attention review-wise. Keep it open. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15389#issuecomment-1780959761 From simonis at openjdk.org Thu Oct 26 12:10:38 2023 From: simonis at openjdk.org (Volker Simonis) Date: Thu, 26 Oct 2023 12:10:38 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v32] In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 20:37:46 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Use 64-bit atomic add for incrementing counters Thanks for your patience :) Looks good to me. ------------- Marked as reviewed by simonis (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1699438077 From aph at openjdk.org Thu Oct 26 12:18:37 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Oct 2023 12:18:37 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v8] In-Reply-To: References: Message-ID: On Tue, 10 Oct 2023 13:42:26 GMT, Thomas Stuefe wrote: >> Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`. >> >> Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift. 
>> >> 8b7b62: 48 8d 05 7f 1b c3 00 lea 0xc31b7f(%rip),%rax # 14e96e8 >> 8b7b69: 0f b6 00 movzbl (%rax),%eax >> 8b7b6c: 84 c0 test %al,%al >> 8b7b6e: 0f 84 9c 00 00 00 je 8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260> >> 8b7b74: 48 8d 15 05 62 c6 00 lea 0xc66205(%rip),%rdx # 151dd80 <_ZN23CompressedKlassPointers6_shiftE> >> 8b7b7b: 8b 7b 08 mov 0x8(%rbx),%edi >> 8b7b7e: 8b 0a mov (%rdx),%ecx >> 8b7b80: 48 8d 15 01 62 c6 00 lea 0xc66201(%rip),%rdx # 151dd88 <_ZN23CompressedKlassPointers5_baseE> >> 8b7b87: 48 d3 e7 shl %cl,%rdi >> 8b7b8a: 48 03 3a add (%rdx),%rdi >> >> >> Patched: one load loads all three. Since shift occupies the lowest 8 bits, compiled code uses an 8-bit register; ditto the UseCompressedClassPointers flag. >> >> >> 8ba302: 48 8d 05 97 9c c2 00 lea 0xc29c97(%rip),%rax # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE> >> 8ba309: 48 8b 08 mov (%rax),%rcx >> 8ba30c: f6 c5 01 test $0x1,%ch # use compressed klass pointers? >> 8ba30f: 0f 84 9b 00 00 00 je 8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260> >> 8ba315: 8b 7b 08 mov 0x8(%rbx),%edi >> 8ba318: 48 d3 e7 shl %cl,%rdi # shift >> 8ba31b: 66 31 c9 xor %cx,%cx # zero out lower 16 bits of base >> 8ba31e: 48 01 cf add %rcx,%rdi # add base >> 8ba321: 8b 4f 08 mov 0x8(%rdi),%ecx >> >> --- >> >> Performance measurements: >> >> G1, doing a full GC over a heap filled with 256 million live j.l.Object instances. >> >> I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ±4%. Still, in general, numbers seemed to go down rather than up. >> >> --- >> >> Future extensions: >> >> This patch uses the fact that the encoding base is aligned to metaspace reser...
> > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - simplify assert > - add comment src/hotspot/share/oops/compressedKlass.hpp line 68: > 66: // - Bit 8 UseCompressedClassPointers > 67: // - Bits [16-64] the base. > 68: static uint64_t _combo; Suggestion: static uint64_t _compressionInfo; Otherwise, IMO this patch is good to go. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1373063223 From luhenry at openjdk.org Thu Oct 26 12:22:36 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 26 Oct 2023 12:22:36 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL [v3] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 10:13:46 GMT, Hamlin Li wrote: >> Hi, >> Can you review the change to add intrinsic for UDivI and UDivL? >> Thanks! >> >> >> ## Tests >> >> ### Functionality >> Run tests successfully found via `grep -nr test/jdk/ -we divideUnsigned` and `grep -nr test/hotspot/ -we divideUnsigned` >> >> ### Performance >> (NOTE: there are 2 other related issues: https://bugs.openjdk.org/browse/JDK-8318225, https://bugs.openjdk.org/browse/JDK-8318226, the PRs for which will be sent out subsequently, after this one is finished.) >> >> #### Long >> **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case** >> >> **Before** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 19684.873 ± 21.882 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 28853.041 ± 6.425 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 6367.239 ± 16.011 ns/op >> >> >> **After** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 22622.133 ± 7.158 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 15957.272 ± 3.174 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 29499.721 ± 10.404 ns/op >> >> >> #### Integer >> **Before** >> >> IntegerDivMod.testDivideUnsigned 1024 mixed avgt 10 23397.267 ±
36.980 ns/op >> IntegerDivMod.testDivideUnsigned 1024 positive avgt 10 16792.414 ± 5.869 ns/op >> IntegerDivMod.testDivideUnsigned 1024 negative avgt 10 30184.357 ± 55.464 ns/op >> >> >> **After** >> >> IntegerDivMod.testDivideUnsigned 1024 mixed avgt 10 23210.437 ± 4.463 ns/op >> IntegerDivMod.testDivideUnsigned 1024 positive avgt 10 16622.342 ± 4.047 ns/op >> IntegerDivMod.testDivideUnsigned 1024 negative avgt 10 30013.414 ± 48.695 ns/op >> >> >> >> /************ following is just backup: quick path for negative divisor *************/ >> #### Long >> **Before** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 19704.317 ± 64.078 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 28856.859 ± 14.901 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 6364.974 ± 2.465 ns/op >> >> >> **After v1** >> (This is a simp... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Fix format for UdivL Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16346#pullrequestreview-1699487384 From aturbanov at openjdk.org Thu Oct 26 12:33:44 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Thu, 26 Oct 2023 12:33:44 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v12] In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 15:09:57 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call.
>> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. 
>> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - a -> an > - add note to downcallHandle about passing heap segments by-reference test/jdk/java/foreign/critical/TestStressAllowHeap.java line 105: > 103: try { > 104: doStep(handle, sequence); > 105: } catch(Throwable t) { nit Suggestion: } catch (Throwable t) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1373092754 From ayang at openjdk.org Thu Oct 26 12:52:42 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 26 Oct 2023 12:52:42 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v32] In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 20:37:46 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Use 64-bit atomic add for incrementing counters src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp line 138: > 136: _vtime_accum = (os::elapsedVTime() - _vtime_start); > 137: > 138: cm()->update_concurrent_mark_threads_cpu_time(); Is there some overlapping btw this and the existing `_vtime_accum`. If so, can they be consolidated somehow? I believe the purpose of calling `update_concurrent_mark_threads_cpu_time` in multiple places is to get more up-to-date conc-cpu-time. Reading through the JBS ticket, I don't see the motivation for maintaining such a "fresh" value. Finally, is CSR required for this feature? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1373116720 From aph at openjdk.org Thu Oct 26 13:52:35 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Oct 2023 13:52:35 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL [v3] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 10:13:46 GMT, Hamlin Li wrote: >> Hi, >> Can you review the change to add intrinsic for UDivI and UDivL? >> Thanks! >> >> >> ## Tests >> >> ### Functionality >> Run tests successfully found via `grep -nr test/jdk/ -we divideUnsigned` and `grep -nr test/hotspot/ -we divideUnsigned` >> >> ### Performance >> (NOTE: there are 2 other related issues: https://bugs.openjdk.org/browse/JDK-8318225, https://bugs.openjdk.org/browse/JDK-8318226, the PRs for which will be sent out subsequently, after this one is finished.) >> >> #### Long >> **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case** >> >> **Before** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 19684.873 ± 21.882 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 28853.041 ± 6.425 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 6367.239 ± 16.011 ns/op >> >> >> **After** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 22622.133 ± 7.158 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 15957.272 ± 3.174 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 29499.721 ± 10.404 ns/op >> >> >> #### Integer >> **Before** >> >> IntegerDivMod.testDivideUnsigned 1024 mixed avgt 10 23397.267 ± 36.980 ns/op >> IntegerDivMod.testDivideUnsigned 1024 positive avgt 10 16792.414 ± 5.869 ns/op >> IntegerDivMod.testDivideUnsigned 1024 negative avgt 10 30184.357 ± 55.464 ns/op >> >> >> **After** >> >> IntegerDivMod.testDivideUnsigned 1024 mixed avgt 10 23210.437 ± 4.463 ns/op >> IntegerDivMod.testDivideUnsigned 1024 positive avgt 10 16622.342 ± 4.047 ns/op >> IntegerDivMod.testDivideUnsigned 1024 negative avgt 10 30013.414 ±
48.695 ns/op >> >> >> >> /************ following is just backup: quick path for negative divisor *************/ >> #### Long >> **Before** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 19704.317 ± 64.078 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 28856.859 ± 14.901 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 6364.974 ± 2.465 ns/op >> >> >> **After v1** >> (This is a simp... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Fix format for UdivL Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16346#pullrequestreview-1699716686 From fjiang at openjdk.org Thu Oct 26 14:03:59 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 26 Oct 2023 14:03:59 GMT Subject: RFR: 8318827: RISC-V: Improve readability of fclass result testing [v2] In-Reply-To: References: Message-ID: <3TMmjVhSLSFkzuV5pftlyi6ii4TnkuFcjpBJ58wDdSg=.75d17a93-f200-4b8a-9312-d21af2b035af@github.com> > Hi, please consider. > > Currently, we test results of `fclass` instruction with hard-coded bits which has bad readability. This patch adds an enumeration of the fclass mask bits for ease of use.
> > Testing: > > - [ ] tier1 with release build Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision: adjust enum name style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16362/files - new: https://git.openjdk.org/jdk/pull/16362/files/069bb876..593913ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16362&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16362&range=00-01 Stats: 24 lines in 4 files changed: 0 ins; 0 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/16362.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16362/head:pull/16362 PR: https://git.openjdk.org/jdk/pull/16362 From jsjolen at openjdk.org Thu Oct 26 14:06:48 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 26 Oct 2023 14:06:48 GMT Subject: RFR: 8318447: Move NMT source code to own subdirectory [v8] In-Reply-To: References: Message-ID: <7AjGEXTYBkbRhxV8OiYMhbo-hCALbZI_Oa2ee9Pkk34=.035bcce9-eaec-4a65-a520-758b3723a598@github.com> On Thu, 26 Oct 2023 09:51:15 GMT, Johan Sjölen wrote: >> I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? >> >> 1. Moved all the nmt source code from services/ to nmt/ >> 2. Renamed all the include statements and sorted them >> 3. Fixed the include guards > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Stefan's suggestions Thank you for the reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/16276#issuecomment-1781194359 From jsjolen at openjdk.org Thu Oct 26 14:06:49 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 26 Oct 2023 14:06:49 GMT Subject: Integrated: 8318447: Move NMT source code to own subdirectory In-Reply-To: References: Message-ID: <1VfmMoDMZl4JslQojSgYubYs8wLQJpEhiJlmaJTTzvA=.ef7284a5-49e3-40fb-b2e1-0e3fa8ae5a98@github.com> On Thu, 19 Oct 2023 20:06:50 GMT, Johan Sjölen wrote: > I think that NMT is deserving of its own subdirectory. Can we do a review of the changes before I fix the merge conflicts? > > 1. Moved all the nmt source code from services/ to nmt/ > 2. Renamed all the include statements and sorted them > 3. Fixed the include guards This pull request has now been integrated. Changeset: 9864951d Author: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/9864951dceb0ddc4479ced04b6d5a2363f1e307d Stats: 506 lines in 102 files changed: 212 ins; 219 del; 75 mod 8318447: Move NMT source code to own subdirectory Reviewed-by: stefank, dholmes, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/16276 From fjiang at openjdk.org Thu Oct 26 14:09:35 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 26 Oct 2023 14:09:35 GMT Subject: RFR: 8318827: RISC-V: Improve readability of fclass result testing [v2] In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 15:22:10 GMT, Ludovic Henry wrote: >> Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision: >> >> adjust enum name style > > src/hotspot/cpu/riscv/assembler_riscv.hpp line 1094: > >> 1092: FCLASS_NAN = FCLASS_SNAN | FCLASS_QNAN, >> 1093: FCLASS_FINITE = FCLASS_ZERO | FCLASS_SUBNORM | FCLASS_NORM, >> 1094: }; > > We use lower-case and no prefix for similar enums (ex: https://github.com/openjdk/jdk/blob/069bb87693ec7944f08062e3a7e460653324e6cb/src/hotspot/cpu/riscv/assembler_riscv.hpp#L1119-L1145) > > Suggestion: > > enum
fclass_mask { > minf = 1 << 0, // negative infinite > mnorm = 1 << 1, // negative normal number > msubnorm = 1 << 2, // negative subnormal number > mzero = 1 << 3, // negative zero > pzero = 1 << 4, // positive zero > psubnorm = 1 << 5, // positive subnormal number > pnorm = 1 << 6, // positive normal number > pinf = 1 << 7, // positive infinite > snan = 1 << 8, // signaling NaN > qnan = 1 << 9, // quiet NaN > zero = mzero | pzero, > subnorm = msubnorm | psubnorm, > norm = mnorm | pnorm, > inf = minf | pinf, > nan = snan | qnan, > finite = zero | subnorm | norm, > }; Changed the enum to lowercase. But I think we should keep the `fclass_` prefix to make sure it's only used for `fclass`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16362#discussion_r1373240773 From aph at openjdk.org Thu Oct 26 14:17:02 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Oct 2023 14:17:02 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v15] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions.
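As a rough illustration of the "save and restore around the load" idea in the description above, a scoped guard could look like the sketch below. It is not the code from this pull request: `FenvGuard` and `load_preserving_fenv` are hypothetical names, the callable stands in for a real `dlopen()` call, and the sketch assumes a platform where `fenv_t` actually covers the control bits in question.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical RAII helper: remembers the floating-point environment on
// construction and puts it back on destruction, even on early return.
class FenvGuard {
 public:
  FenvGuard() {
    int rc = std::fegetenv(&_saved);
    assert(rc == 0);
    (void)rc;
  }
  ~FenvGuard() { std::fesetenv(&_saved); }
  FenvGuard(const FenvGuard&) = delete;
  FenvGuard& operator=(const FenvGuard&) = delete;

 private:
  std::fenv_t _saved;
};

// Runs `load` (a stand-in for a library-loading call) and restores the
// caller's floating-point environment once it has returned.
template <typename Loader>
auto load_preserving_fenv(Loader load) -> decltype(load()) {
  FenvGuard guard;
  return load();
}
```

A guard of this kind only protects the call it wraps, so a library that is loaded later by other native code would still need its own wrapper, which is the incompleteness the description points out.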
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Duh ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10661/files - new: https://git.openjdk.org/jdk/pull/10661/files/c554746b..2ca6f8c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=13-14 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From simonis at openjdk.org Thu Oct 26 14:34:40 2023 From: simonis at openjdk.org (Volker Simonis) Date: Thu, 26 Oct 2023 14:34:40 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v32] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 12:49:21 GMT, Albert Mingkun Yang wrote: >> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: >> >> Use 64-bit atomic add for incrementing counters > > src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp line 138: > >> 136: _vtime_accum = (os::elapsedVTime() - _vtime_start); >> 137: >> 138: cm()->update_concurrent_mark_threads_cpu_time(); > > Is there some overlapping btw this and the existing `_vtime_accum`. If so, can they be consolidated somehow? > > I believe the purpose of calling `update_concurrent_mark_threads_cpu_time` in multiple places is to get more up-to-date conc-cpu-time. Reading through the JBS ticket, I don't see the motivation for maintaining such a "fresh" value. > > Finally, is CSR required for this feature? 
@albertnetymk, the hsperf counters are a non-public API and the new counters have been added to the non-standard `sun.threads.cpu_time` name space which is "[unstable and unsupported](https://github.com/openjdk/jdk/blob/9864951dceb0ddc4479ced04b6d5a2363f1e307d/src/hotspot/share/runtime/perfData.cpp#L56)" so I don't think a CSR is required. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1373278962 From fyang at openjdk.org Thu Oct 26 15:03:38 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 26 Oct 2023 15:03:38 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL [v3] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 10:13:46 GMT, Hamlin Li wrote: >> Hi, >> Can you review the change to add intrinsic for UDivI and UDivL? >> Thanks! >> >> >> ## Tests >> >> ### Functionality >> Run tests successfully found via `grep -nr test/jdk/ -we divideUnsigned` and `grep -nr test/hotspot/ -we divideUnsigned` >> >> ### Performance >> (NOTE: there are 2 other related issues: https://bugs.openjdk.org/browse/JDK-8318225, https://bugs.openjdk.org/browse/JDK-8318226, the PRs for which will be sent out subsequently, after this one is finished.) >> >> #### Long >> **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case** >> >> **Before** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 19684.873 ± 21.882 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 28853.041 ± 6.425 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 6367.239 ± 16.011 ns/op >> >> >> **After** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 22622.133 ± 7.158 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 15957.272 ± 3.174 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 29499.721 ± 10.404 ns/op >> >> >> #### Integer >> **Before** >> >> IntegerDivMod.testDivideUnsigned 1024 mixed avgt 10 23397.267 ± 36.980 ns/op >> IntegerDivMod.testDivideUnsigned 1024 positive avgt 10 16792.414 ±
5.869 ns/op >> IntegerDivMod.testDivideUnsigned 1024 negative avgt 10 30184.357 ± 55.464 ns/op >> >> >> **After** >> >> IntegerDivMod.testDivideUnsigned 1024 mixed avgt 10 23210.437 ± 4.463 ns/op >> IntegerDivMod.testDivideUnsigned 1024 positive avgt 10 16622.342 ± 4.047 ns/op >> IntegerDivMod.testDivideUnsigned 1024 negative avgt 10 30013.414 ± 48.695 ns/op >> >> >> >> /************ following is just backup: quick path for negative divisor *************/ >> #### Long >> **Before** >> >> LongDivMod.testDivideUnsigned 1024 mixed avgt 10 19704.317 ± 64.078 ns/op >> LongDivMod.testDivideUnsigned 1024 positive avgt 10 28856.859 ± 14.901 ns/op >> LongDivMod.testDivideUnsigned 1024 negative avgt 10 6364.974 ± 2.465 ns/op >> >> >> **After v1** >> (This is a simp... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Fix format for UdivL Updated change LGTM. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16346#pullrequestreview-1699887036 From vkempik at openjdk.org Thu Oct 26 15:12:34 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 26 Oct 2023 15:12:34 GMT Subject: RFR: 8318827: RISC-V: Improve readability of fclass result testing [v2] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 14:05:58 GMT, Feilong Jiang wrote: >> src/hotspot/cpu/riscv/assembler_riscv.hpp line 1094: >> >>> 1092: FCLASS_NAN = FCLASS_SNAN | FCLASS_QNAN, >>> 1093: FCLASS_FINITE = FCLASS_ZERO | FCLASS_SUBNORM | FCLASS_NORM, >>> 1094: }; >> >> We use lower-case and no prefix for similar enums (ex: https://github.com/openjdk/jdk/blob/069bb87693ec7944f08062e3a7e460653324e6cb/src/hotspot/cpu/riscv/assembler_riscv.hpp#L1119-L1145) >> >> Suggestion: >> >> enum fclass_mask { >> minf = 1 << 0, // negative infinite >> mnorm = 1 << 1, // negative normal number >> msubnorm = 1 << 2, // negative subnormal number >> mzero = 1 << 3, // negative zero >> pzero = 1 << 4, // positive
zero >> psubnorm = 1 << 5, // positive subnormal number >> pnorm = 1 << 6, // positive normal number >> pinf = 1 << 7, // positive infinite >> snan = 1 << 8, // signaling NaN >> qnan = 1 << 9, // quiet NaN >> zero = mzero | pzero, >> subnorm = msubnorm | psubnorm, >> norm = mnorm | pnorm, >> inf = minf | pinf, >> nan = snan | qnan, >> finite = zero | subnorm | norm, >> }; > > Changed the enum to lowercase. But I think we should keep the `fclass_` prefix to make sure it's only used for `fclass`. Just use them as `Assembler::fclass_mask::msubnorm` (when the enum itself is given a name) and it's pretty clear what type it describes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16362#discussion_r1373336563 From lkorinth at openjdk.org Thu Oct 26 15:33:38 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Thu, 26 Oct 2023 15:33:38 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v7] In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 08:44:29 GMT, Leo Korinth wrote: >> This pull request renames `createJavaProcessBuilder` to `createLimitedTestJavaProcessBuilder` and renames `createTestJvm` to `createTestJavaProcessBuilder`. Both are implemented through a private `createJavaProcessBuilder`. It also updates the Javadoc. >> >> This is so that it is harder to mistakenly use `createLimitedTestJavaProcessBuilder`, which is problematic because it will not forward JVM flags to the tested JVM. > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > fix copyright year and indentation Would it be okay if we handle the new method documentation in a separate pull request? After applying your changes, I also noted that the existing description `The command line will be like: {test.jdk}/bin/java {test.vm.opts} {test.java.opts} cmds` is not only incorrect (or at least incomplete), but now also clashes with the added description.
I then removed the sentence, but after I did that I also found out that similar wording exists in `executeTestJvm` and I fear that if I continue to pull strings, I will create more and more changes that you will have opinions on. Is it all right if we push what we have now, and I create a new pull request with these improvements in documentation that are actually not related to the changes in this pull request? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15452#issuecomment-1781359450 From fjiang at openjdk.org Thu Oct 26 15:35:49 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 26 Oct 2023 15:35:49 GMT Subject: RFR: 8318827: RISC-V: Improve readability of fclass result testing [v3] In-Reply-To: References: Message-ID: > Hi, please consider. > > Currently, we test results of `fclass` instruction with hard-coded bits which has bad readability. This patch adds an enumeration of the fclass mask bits for ease of use. > > Testing: > > - [ ] tier1 with release build Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fclass-mask - remove 'fclass_' prefix - adjust enum name style - Add FCLASS_MASK enum for better readability ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16362/files - new: https://git.openjdk.org/jdk/pull/16362/files/593913ef..8ffc88a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16362&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16362&range=01-02 Stats: 4762 lines in 256 files changed: 2465 ins; 1682 del; 615 mod Patch: https://git.openjdk.org/jdk/pull/16362.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16362/head:pull/16362 PR: https://git.openjdk.org/jdk/pull/16362 From stuefe at openjdk.org Thu Oct 26 15:44:45 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 26 Oct 2023 15:44:45 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v15] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Thu, 26 Oct 2023 14:17:02 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Duh This looks good to me. 
One suggestion: to reduce code duplication and to make the code a bit safer against accidental returns prior to fesetenv, I would have used a mark object like this: struct FenvResetMark { fenv_t _fenv; FenvResetMark() { fegetenv(&_fenv); } ~FenvResetMark() { fesetenv(&_fenv); } }; and would have placed it above dlopen calls. Up to you. --- About the dlopen calls in the JDK, at SAP we were faced with similar problems for other libc APIs (how to apply a fix to all of them). Some of these issues we solved by redirecting all calls to libjvm. Others we solved manually, in-place, with a lot of duplication. None of these sound appealing, but I like the redirect-to-libjvm route somewhat, if Oracle can be convinced. A third option would be to use an interposition library with LD_PRELOAD. One that overwrites dlopen and redirects to the real one. I don't see this to be a practical solution but it may be valid for testing. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/10661#pullrequestreview-1699974743 From stuefe at openjdk.org Thu Oct 26 16:02:47 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 26 Oct 2023 16:02:47 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v15] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <3pqaeZsqg2XlpZvNMJ9qjAUq5PLPnOR3wCrDesXsvHM=.c6a7de5f-c93c-435a-8fed-78fe5abab22b@github.com> On Thu, 26 Oct 2023 14:17:02 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). 
This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Duh One more thought, it would be good to add the FTZ_mode_enabled check to `os::run_periodic_checks()`. We already do signal handler checks there, and it is the right place to check for "global things third party native code may mess up". It runs when one uses `-XX:CheckJNICalls`. If a native library messes with fenv, one will get a delayed assertion, with a hs-err file that lists all the shared objects. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1781408815 From stuefe at openjdk.org Thu Oct 26 16:11:04 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 26 Oct 2023 16:11:04 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v9] In-Reply-To: References: Message-ID: > Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`. > > Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift. 
>
>
> 8b7b62: 48 8d 05 7f 1b c3 00    lea    0xc31b7f(%rip),%rax   # 14e96e8
> 8b7b69: 0f b6 00                movzbl (%rax),%eax
> 8b7b6c: 84 c0                   test   %al,%al
> 8b7b6e: 0f 84 9c 00 00 00       je     8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
> 8b7b74: 48 8d 15 05 62 c6 00    lea    0xc66205(%rip),%rdx   # 151dd80 <_ZN23CompressedKlassPointers6_shiftE>
> 8b7b7b: 8b 7b 08                mov    0x8(%rbx),%edi
> 8b7b7e: 8b 0a                   mov    (%rdx),%ecx
> 8b7b80: 48 8d 15 01 62 c6 00    lea    0xc66201(%rip),%rdx   # 151dd88 <_ZN23CompressedKlassPointers5_baseE>
> 8b7b87: 48 d3 e7                shl    %cl,%rdi
> 8b7b8a: 48 03 3a                add    (%rdx),%rdi
>
>
> Patched: one load loads all three. Since shift occupies the lowest 8 bits, compiled code uses 8bit register; ditto the UseCompressedOops flag.
>
>
> 8ba302: 48 8d 05 97 9c c2 00    lea    0xc29c97(%rip),%rax   # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE>
> 8ba309: 48 8b 08                mov    (%rax),%rcx
> 8ba30c: f6 c5 01                test   $0x1,%ch              # use compressed klass pointers?
> 8ba30f: 0f 84 9b 00 00 00       je     8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
> 8ba315: 8b 7b 08                mov    0x8(%rbx),%edi
> 8ba318: 48 d3 e7                shl    %cl,%rdi              # shift
> 8ba31b: 66 31 c9                xor    %cx,%cx               # zero out lower 16 bits of base
> 8ba31e: 48 01 cf                add    %rcx,%rdi             # add base
> 8ba321: 8b 4f 08                mov    0x8(%rdi),%ecx
>
> ---
>
> Performance measurements:
>
> G1, doing a full GC over a heap filled with 256 million live j.l.Object instances.
>
> I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ±4%. Still, in general, numbers seemed to go down rather than up.
>
> ---
>
> Future extensions:
>
> This patch uses the fact that the encoding base is aligned to metaspace reserve alignment (16 Mb). We only use 16 of those 24 bits of alignment shadow and could us...

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 16 commits: - Renamed _combo - Merge branch 'master' into optimize-narrow-klass-decoding-in-c++ - simplify assert - add comment - Update src/hotspot/share/oops/compressedKlass.hpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/compressedKlass.cpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/compressedKlass.cpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/compressedKlass.cpp Co-authored-by: Aleksey Shipilëv - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++ - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++ - ... and 6 more: https://git.openjdk.org/jdk/compare/9864951d...56cde2a9 ------------- Changes: https://git.openjdk.org/jdk/pull/15389/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15389&range=08 Stats: 64 lines in 3 files changed: 38 ins; 13 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/15389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15389/head:pull/15389 PR: https://git.openjdk.org/jdk/pull/15389 From stuefe at openjdk.org Thu Oct 26 16:11:07 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 26 Oct 2023 16:11:07 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v4] In-Reply-To: References: Message-ID: On Mon, 4 Sep 2023 09:28:10 GMT, Andrew Haley wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++ >> - APH feedback >> - Merge branch 'master' into optimize-narrow-klass-decoding-in-c++ >> - fix -UseCCP case >> - use 16 bit alignment >> - with raw bit ops > > So, I was wondering if there is some reason to do all this manually? It looks like an obvious candidate for bitfields. Thanks @theRealAph, did the rename you suggested.
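
For readers following this thread, the single-load decoding described in the PR text can be sketched roughly as below. This is only an illustration: the bit layout (shift in bits 0..7, the "use compressed class pointers" flag in bit 8, base in the bits above the low 16) is inferred from the patched disassembly quoted in the PR description, and `pack_combo`, `uses_compressed` and `decode` are hypothetical names, not the functions in compressedKlass.hpp.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of the combined word (inferred from the disassembly, not
// taken from the HotSpot sources):
//   bits 0..7    shift
//   bit  8       "use compressed class pointers" flag
//   bits 16..63  encoding base (its low 16 bits must be zero)
constexpr uint64_t kShiftMask = 0xFF;
constexpr uint64_t kFlagBit   = uint64_t{1} << 8;
constexpr uint64_t kBaseMask  = ~uint64_t{0xFFFF};

// Pack base, shift and flag into one 64-bit word (hypothetical helper).
inline uint64_t pack_combo(uint64_t base, unsigned shift, bool enabled) {
  assert((base & ~kBaseMask) == 0 && "base must have its low 16 bits clear");
  assert(shift <= kShiftMask);
  return base | (enabled ? kFlagBit : 0) | shift;
}

// Test the flag without touching any other global.
inline bool uses_compressed(uint64_t combo) {
  return (combo & kFlagBit) != 0;
}

// Decode a narrow value from a single read of the combined word.
inline uint64_t decode(uint64_t combo, uint32_t narrow) {
  const unsigned shift = static_cast<unsigned>(combo & kShiftMask);
  const uint64_t base  = combo & kBaseMask;  // mirrors "xor %cx,%cx"
  return base + (static_cast<uint64_t>(narrow) << shift);
}
```

Whether this is written with raw masks as above or with bitfields, as asked in the quoted comment, is a style choice; the sketch uses masks only because they map directly onto the instructions in the listing.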
------------- PR Comment: https://git.openjdk.org/jdk/pull/15389#issuecomment-1781423723 From mli at openjdk.org Thu Oct 26 16:13:55 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 26 Oct 2023 16:13:55 GMT Subject: RFR: 8318723: RISC-V: C2 UDivL [v3] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 10:13:46 GMT, Hamlin Li wrote:
>> Hi,
>> Can you review the change to add intrinsic for UDivI and UDivL?
>> Thanks!
>>
>>
>> ## Tests
>>
>> ### Functionality
>> Run tests successfully found via `grep -nr test/jdk/ -we divideUnsigned` and `grep -nr test/hotspot/ -we divideUnsigned`
>>
>> ### Performance
>> ( NOTE: there are another 2 related issues: https://bugs.openjdk.org/browse/JDK-8318225, https://bugs.openjdk.org/browse/JDK-8318226, the pr of which will be subsequently sent out after this one finished. )
>>
>> #### Long
>> **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case**
>>
>> **Before**
>>
>> LongDivMod.testDivideUnsigned     1024 mixed     avgt 10 19684.873 ± 21.882 ns/op
>> LongDivMod.testDivideUnsigned     1024 positive  avgt 10 28853.041 ±  6.425 ns/op
>> LongDivMod.testDivideUnsigned     1024 negative  avgt 10  6367.239 ± 16.011 ns/op
>>
>>
>> **After**
>>
>> LongDivMod.testDivideUnsigned     1024 mixed     avgt 10 22622.133 ±  7.158 ns/op
>> LongDivMod.testDivideUnsigned     1024 positive  avgt 10 15957.272 ±  3.174 ns/op
>> LongDivMod.testDivideUnsigned     1024 negative  avgt 10 29499.721 ± 10.404 ns/op
>>
>>
>> #### Integer
>> **Before**
>>
>> IntegerDivMod.testDivideUnsigned  1024 mixed     avgt 10 23397.267 ± 36.980 ns/op
>> IntegerDivMod.testDivideUnsigned  1024 positive  avgt 10 16792.414 ±  5.869 ns/op
>> IntegerDivMod.testDivideUnsigned  1024 negative  avgt 10 30184.357 ± 55.464 ns/op
>>
>>
>> **After**
>>
>> IntegerDivMod.testDivideUnsigned  1024 mixed     avgt 10 23210.437 ±  4.463 ns/op
>> IntegerDivMod.testDivideUnsigned  1024 positive  avgt 10 16622.342 ±  4.047 ns/op
>> IntegerDivMod.testDivideUnsigned  1024 negative  avgt 10 30013.414 ± 48.695 ns/op
>>
>>
>>
>> /************ following is just backup: quick path for negative divisor *************/
>> #### Long
>> **Before**
>>
>> LongDivMod.testDivideUnsigned     1024 mixed     avgt 10 19704.317 ± 64.078 ns/op
>> LongDivMod.testDivideUnsigned     1024 positive  avgt 10 28856.859 ± 14.901 ns/op
>> LongDivMod.testDivideUnsigned     1024 negative  avgt 10  6364.974 ±  2.465 ns/op
>>
>>
>> **After v1**
>> (This is a simp...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix format for UdivL

@RealFYang @theRealAph @luhenry @merykitty Thanks for your review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16346#issuecomment-1781424921 From mli at openjdk.org Thu Oct 26 16:13:56 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 26 Oct 2023 16:13:56 GMT Subject: Integrated: 8318723: RISC-V: C2 UDivL In-Reply-To: References: Message-ID: <0y7vjdBlraK_BAdhfQedH-c9PTRxN-jGAsYWGVgd1Lk=.0dc069d4-a715-4fb3-b7e1-feb15246a677@github.com> On Tue, 24 Oct 2023 15:21:08 GMT, Hamlin Li wrote:
> Hi,
> Can you review the change to add intrinsic for UDivI and UDivL?
> Thanks!
>
>
> ## Tests
>
> ### Functionality
> Run tests successfully found via `grep -nr test/jdk/ -we divideUnsigned` and `grep -nr test/hotspot/ -we divideUnsigned`
>
> ### Performance
> ( NOTE: there are another 2 related issues: https://bugs.openjdk.org/browse/JDK-8318225, https://bugs.openjdk.org/browse/JDK-8318226, the pr of which will be subsequently sent out after this one finished. )
>
> #### Long
> **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case**
>
> **Before**
>
> LongDivMod.testDivideUnsigned     1024 mixed     avgt 10 19684.873 ± 21.882 ns/op
> LongDivMod.testDivideUnsigned     1024 positive  avgt 10 28853.041 ±  6.425 ns/op
> LongDivMod.testDivideUnsigned     1024 negative  avgt 10  6367.239 ± 16.011 ns/op
>
>
> **After**
>
> LongDivMod.testDivideUnsigned     1024 mixed     avgt 10 22622.133 ±  7.158 ns/op
> LongDivMod.testDivideUnsigned     1024 positive  avgt 10 15957.272 ±  3.174 ns/op
> LongDivMod.testDivideUnsigned     1024 negative  avgt 10 29499.721 ± 10.404 ns/op
>
>
> #### Integer
> **Before**
>
> IntegerDivMod.testDivideUnsigned  1024 mixed     avgt 10 23397.267 ± 36.980 ns/op
> IntegerDivMod.testDivideUnsigned  1024 positive  avgt 10 16792.414 ±  5.869 ns/op
> IntegerDivMod.testDivideUnsigned  1024 negative  avgt 10 30184.357 ± 55.464 ns/op
>
>
> **After**
>
> IntegerDivMod.testDivideUnsigned  1024 mixed     avgt 10 23210.437 ±  4.463 ns/op
> IntegerDivMod.testDivideUnsigned  1024 positive  avgt 10 16622.342 ±  4.047 ns/op
> IntegerDivMod.testDivideUnsigned  1024 negative  avgt 10 30013.414 ± 48.695 ns/op
>
>
>
> /************ following is just backup: quick path for negative divisor *************/
> #### Long
> **Before**
>
> LongDivMod.testDivideUnsigned     1024 mixed     avgt 10 19704.317 ± 64.078 ns/op
> LongDivMod.testDivideUnsigned     1024 positive  avgt 10 28856.859 ± 14.901 ns/op
> LongDivMod.testDivideUnsigned     1024 negative  avgt 10  6364.974 ±  2.465 ns/op
>
>
> **After v1**
> (This is a simpler version, please check the diff from `After v2` below)
>
> LongDivMod.testDivideUnsigned 1024 ...

This pull request has now been integrated.

Changeset: 40a3c35a
Author: Hamlin Li
URL: https://git.openjdk.org/jdk/commit/40a3c35aa5614be4505013d4e92ddb1b556a3622
Stats: 70 lines in 7 files changed: 47 ins; 0 del; 23 mod

8318723: RISC-V: C2 UDivL
8318224: RISC-V: C2 UDivI

Reviewed-by: fyang, luhenry, aph

------------- PR: https://git.openjdk.org/jdk/pull/16346 From duke at openjdk.org Thu Oct 26 16:25:41 2023 From: duke at openjdk.org (Thomas Obermeier) Date: Thu, 26 Oct 2023 16:25:41 GMT Subject: RFR: 8306561: Gtests crash with SIGSEGV in print_pointer_information on AIX Message-ID: MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region.
To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. ------------- Commit messages: - 8306561: copyright and problem listing - JDK-8306561: test canary Changes: https://git.openjdk.org/jdk/pull/16381/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16381&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306561 Stats: 9 lines in 2 files changed: 5 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16381.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16381/head:pull/16381 PR: https://git.openjdk.org/jdk/pull/16381 From rriggs at openjdk.org Thu Oct 26 16:29:40 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 26 Oct 2023 16:29:40 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v7] In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 08:44:29 GMT, Leo Korinth wrote: >> This pull request renames `createJavaProcessBuilder` to `createLimitedTestJavaProcessBuilder` and renames `createTestJvm` to `createTestJavaProcessBuilder`. Both are implemented through a private `createJavaProcessBuilder`. It also updates the java doc. >> >> This is so that it should be harder to by mistake use `createLimitedTestJavaProcessBuilder` that is problematic because it will not forward JVM flags to the tested JVM. 
> > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > fix copyright year and indentation ok, to correct the javadoc in a subsequent PR. ------------- Marked as reviewed by rriggs (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15452#pullrequestreview-1700061972 From stuefe at openjdk.org Thu Oct 26 16:30:31 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 26 Oct 2023 16:30:31 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 16:11:00 GMT, Thomas Obermeier wrote: > MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. > > We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. > > As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. @TOatGithub I changed the description in JBS since this is not AIX specific. Could you adjust the PR title please? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16381#issuecomment-1781454177 From stuefe at openjdk.org Thu Oct 26 16:38:30 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 26 Oct 2023 16:38:30 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 16:11:00 GMT, Thomas Obermeier wrote: > MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. > > We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. > > As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. I would fix it another way. The underlying assumption in MallocHeader::looks_valid() is that the header resides fully in readable memory. The caller must make sure of that. The better way to fix this would be in print_pointer_information(). It must make sure, before calling MallocHeader::looks_valid(), that the header is contained fully in readable memory. Something like:

- if (!os::is_readable_pointer(here)) {
+ if (!os::is_readable_pointer(here) || !os::is_readable_pointer(here + sizeof(MallocHeader))) {

or something similar. ------------- Changes requested by stuefe (Reviewer).
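
The caller-side check suggested above can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: `Header` stands in for MallocHeader (with the canary at offset 14, as described in the PR text), `Region` models a single readable range instead of `os::is_readable_pointer`, and the canary value is made up. The sketch tests the last byte of the header rather than the address one past it, which is a slight variation on the suggestion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a malloc header whose canary sits at offset 14.
struct Header {
  uint8_t  payload[14];
  uint16_t canary;
  bool looks_valid() const { return canary == 0xE99E; }  // made-up value
};
static_assert(offsetof(Header, canary) == 14, "canary expected at offset 14");

// Hypothetical model of "readable memory": one contiguous address range.
struct Region {
  uintptr_t start;
  size_t    size;
  bool is_readable(uintptr_t addr) const {
    return addr >= start && addr - start < size;
  }
};

// Inspect the header only if its first and last byte are both readable.
// This is sufficient here because the region is contiguous; with page-based
// checks it relies on the header being smaller than a page.
inline bool header_looks_valid(const Region& r, const uint8_t* here) {
  const uintptr_t first = reinterpret_cast<uintptr_t>(here);
  const uintptr_t last  = first + sizeof(Header) - 1;
  if (!r.is_readable(first) || !r.is_readable(last)) {
    return false;  // header would straddle the end of readable memory
  }
  Header h;
  std::memcpy(&h, here, sizeof(h));
  return h.looks_valid();
}
```

With this shape, `looks_valid()` itself keeps its assumption that the header is fully readable, and a pointer near the top of the region is rejected before any byte of the canary is touched.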
PR Review: https://git.openjdk.org/jdk/pull/16381#pullrequestreview-1700078017 From aph at openjdk.org Thu Oct 26 16:46:04 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Oct 2023 16:46:04 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v16] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Remove accidental include ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10661/files - new: https://git.openjdk.org/jdk/pull/10661/files/2ca6f8c4..bd51efde Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=14-15 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From duke at openjdk.org Thu Oct 26 16:48:31 2023 From: duke at openjdk.org (Thomas Obermeier) Date: Thu, 26 Oct 2023 16:48:31 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 16:11:00 GMT, Thomas Obermeier wrote: > MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. > > We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. > > As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. Agreed. Would you still prefer to blow up if the caller hands down bad pointers? 
In other words: should MallocHeader::looks_valid() not do my additional check? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16381#issuecomment-1781482111 From stuefe at openjdk.org Thu Oct 26 17:11:31 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 26 Oct 2023 17:11:31 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 16:11:00 GMT, Thomas Obermeier wrote: > MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. > > We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. > > As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. Yes, let it blow up then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16381#issuecomment-1781515319 From vkempik at openjdk.org Thu Oct 26 17:11:34 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 26 Oct 2023 17:11:34 GMT Subject: RFR: 8318827: RISC-V: Improve readability of fclass result testing [v3] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 15:35:49 GMT, Feilong Jiang wrote: >> Hi, please consider. >> >> Currently, we test results of `fclass` instruction with hard-coded bits which has bad readability. This patch adds an enumeration of the flcass mask bits for ease of use. 
>> >> Testing: >> >> - [ ] tier1 with release build > > Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fclass-mask > - remove 'fclass_' prefix > - adjust enum name style > - Add FCLASS_MASK enum for better readability Marked as reviewed by vkempik (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16362#pullrequestreview-1700130830 From luhenry at openjdk.org Thu Oct 26 17:11:35 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 26 Oct 2023 17:11:35 GMT Subject: RFR: 8318827: RISC-V: Improve readability of fclass result testing [v3] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 15:35:49 GMT, Feilong Jiang wrote: >> Hi, please consider. >> >> Currently, we test results of `fclass` instruction with hard-coded bits which has bad readability. This patch adds an enumeration of the flcass mask bits for ease of use. >> >> Testing: >> >> - [ ] tier1 with release build > > Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fclass-mask > - remove 'fclass_' prefix > - adjust enum name style > - Add FCLASS_MASK enum for better readability Marked as reviewed by luhenry (Committer). 
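
As background for the change under review: the RISC-V `fclass` instruction writes a one-hot 10-bit mask describing the class of its floating-point operand, which is why named mask bits read better than literals. The sketch below emulates that result in portable C++ for doubles. The bit positions follow the RISC-V specification; the namespace, enumerator names and the `classify` helper are hypothetical and are not the names used in the PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

namespace fclass_sketch {

// Result bits of fclass (bit positions per the RISC-V spec; names are
// illustrative only).
enum Mask : uint32_t {
  neg_inf       = 1u << 0,
  neg_normal    = 1u << 1,
  neg_subnormal = 1u << 2,
  neg_zero      = 1u << 3,
  pos_zero      = 1u << 4,
  pos_subnormal = 1u << 5,
  pos_normal    = 1u << 6,
  pos_inf       = 1u << 7,
  signaling_nan = 1u << 8,
  quiet_nan     = 1u << 9,
  // Convenience combinations, so callers avoid magic numbers.
  any_nan  = signaling_nan | quiet_nan,
  any_zero = neg_zero | pos_zero,
  any_inf  = neg_inf | pos_inf
};

// Software emulation of fclass.d for an IEEE 754 binary64 value.
inline uint32_t classify(double v) {
  uint64_t bits;
  std::memcpy(&bits, &v, sizeof(bits));
  const bool neg = (bits >> 63) != 0;
  switch (std::fpclassify(v)) {
    case FP_INFINITE:  return neg ? neg_inf : pos_inf;
    case FP_NORMAL:    return neg ? neg_normal : pos_normal;
    case FP_SUBNORMAL: return neg ? neg_subnormal : pos_subnormal;
    case FP_ZERO:      return neg ? neg_zero : pos_zero;
    default:           // NaN: the top mantissa bit marks a quiet NaN
      return (bits & (uint64_t{1} << 51)) ? quiet_nan : signaling_nan;
  }
}

}  // namespace fclass_sketch
```

A test such as `(classify(x) & fclass_sketch::any_nan) != 0` then states its intent directly, where a literal like `0x300` would not.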
------------- PR Review: https://git.openjdk.org/jdk/pull/16362#pullrequestreview-1700132271 From vlivanov at openjdk.org Thu Oct 26 17:45:44 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 26 Oct 2023 17:45:44 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v11] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <0YkU44VdBbj08LTg9zw27GoMHMs1GNEz8U2nUb1wkpc=.29b8c5a2-2384-4566-9c5f-e483a1437ef2@github.com> Message-ID: On Thu, 26 Oct 2023 10:39:04 GMT, Andrew Haley wrote: > do you have any suggestion about where to move the remaining code that's in stubRoutines.cpp? As it is now, `globalDefinitions.hpp/cpp` looks the most appropriate one to me. You could create new header file for it (under `share/utilities/`), but introducing one just for a single function may be an overkill. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1781559472 From vlivanov at openjdk.org Thu Oct 26 17:45:46 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 26 Oct 2023 17:45:46 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v16] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Thu, 26 Oct 2023 16:46:04 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. 
I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Remove accidental include make/test/JtregNativeHotspot.gmk line 854: > 852: BUILD_HOTSPOT_JTREG_EXECUTABLES_LIBS_exeFPRegs := -ldl > 853: BUILD_HOTSPOT_JTREG_LIBRARIES_LIBS_libAsyncGetCallTraceTest := -ldl > 854: BUILD_HOTSPOT_JTREG_LIBRARIES_LDFLAGS_libfast-math := -ffast-math Is the flag redundant by now? The test explicitly works with corresponding platform-specific registers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1373523640 From coleenp at openjdk.org Thu Oct 26 19:49:47 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 26 Oct 2023 19:49:47 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v4] In-Reply-To: References: Message-ID: On Mon, 23 Oct 2023 19:53:54 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. 
The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64 > > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge branch 'master' into method_entry_8301997 > - Added asserts for getters and fixed printing > - Removed dead code in interpreters > - Removed unused structures, improved set_method_handle and appendix_if_resolved > - Removed some comments and relocated code > - 8301997: Move method resolution information out of the cpCache This is really great work and a wonderful cleanup of all these special cases we've had to hack into this over the years. My comments are alignment nits, plus not including instanceKlass.hpp if you don't need to. Really nice. src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2470: > 2468: __ load_unsigned_short(method_or_table_index, Address(cache, in_bytes(ResolvedMethodEntry::table_index_offset()))); > 2469: __ bind(Done); > 2470: } I like how you broke this up. There's a little duplication but I don't see a nice way of doing away with it and this makes it clear which parts of the ResolvedMethodEntry is meaningful for each bytecode. If someday in the future we wanted to further specialize for invokeinterface for example, your structure makes that easy to do. src/hotspot/cpu/x86/templateTable_x86.cpp line 3800: > 3798: rbx, // Method* > 3799: rdx // flags > 3800: ); small nit: can you put ); on the line before? 
And in the call below it to load_resolved_method_entry_special_or_static? src/hotspot/share/interpreter/rewriter.hpp line 88: > 86: return _cp_cache_map.length() - _first_iteration_cp_cache_limit; > 87: } > 88: I'm so pleased this special case went away. We still need to add entries for invokespecial of JVM_CONSTANT_InterfaceMethodref but we don't need to keep track of where the original entries ended anymore with your changes. src/hotspot/share/interpreter/templateTable.hpp line 288: > 286: Register klass, > 287: Register method_or_table_index, > 288: Register flags); Nit: can you line up these arguments? You did move these. thank you. src/hotspot/share/oops/cpCache.cpp line 247: > 245: > 246: void ConstantPoolCache::set_itable_call(Bytecodes::Code invoke_code, > 247: int method_index, Nit: align parameters here. src/hotspot/share/oops/resolvedMethodEntry.cpp line 35: > 33: return !_method->is_old() && !_method->is_obsolete(); // old is always set for old and obsolete > 34: } else { > 35: return true; nit: indent 2. src/hotspot/share/oops/resolvedMethodEntry.hpp line 29: > 27: > 28: #include "interpreter/bytecodes.hpp" > 29: #include "oops/instanceKlass.hpp" Can you forward declare InstanceKlass rather than including it here? ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15455#pullrequestreview-1700425315 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1373666390 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1373678584 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1373695735 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1373697254 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1373704477 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1373732369 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1373735034 From manc at openjdk.org Thu Oct 26 20:36:40 2023 From: manc at openjdk.org (Man Cao) Date: Thu, 26 Oct 2023 20:36:40 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v32] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 14:31:23 GMT, Volker Simonis wrote: >> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp line 138: >> >>> 136: _vtime_accum = (os::elapsedVTime() - _vtime_start); >>> 137: >>> 138: cm()->update_concurrent_mark_threads_cpu_time(); >> >> Is there some overlapping btw this and the existing `_vtime_accum`. If so, can they be consolidated somehow? >> >> I believe the purpose of calling `update_concurrent_mark_threads_cpu_time` in multiple places is to get more up-to-date conc-cpu-time. Reading through the JBS ticket, I don't see the motivation for maintaining such a "fresh" value. >> >> Finally, is CSR required for this feature? > > @albertnetymk, the hsperf counters are a non-public API and the new counters have been added to the non-standard `sun.threads.cpu_time` name space which is "[unstable and unsupported](https://github.com/openjdk/jdk/blob/9864951dceb0ddc4479ced04b6d5a2363f1e307d/src/hotspot/share/runtime/perfData.cpp#L56)" so I don't think a CSR is required. > Is there some overlapping btw this and the existing _vtime_accum. If so, can they be consolidated somehow? 
We don't really like `_vtime_accum` for monitoring:

- `os::elapsedVTime()` could silently fall back from CPU time to wall time, according to os_linux.cpp. We'd rather have true CPU time, or nothing. Mixing up CPU time and wall time is confusing to users.
- `_vtime_accum` only tracks the time consumed by the concurrent marking main thread, but not the concurrent worker threads.

There's a `G1ConcurrentMarkThread::vtime_accum()` that seems to account for concurrent worker threads. It appears to be used only for logging. It might be possible to replace the existing logging code with the value of the `sun.threads.cpu_time.gc_conc_mark` hsperf counter. However, it is better to do that separately, and I'll create an RFE.

> I believe the purpose of calling update_concurrent_mark_threads_cpu_time in multiple places is to get more up-to-date conc-cpu-time. Reading through the JBS ticket, I don't see the motivation for maintaining such a "fresh" value.

Yes. The reason is that a concurrent mark cycle could take several minutes for a large heap. Tools like [AHS](https://mail.openjdk.org/pipermail/hotspot-dev/2022-September/064190.html) that read these CPU hsperf counters could sample the data every 1 to 5 seconds.

The current mechanism still needs to wait for the whole `G1ConcurrentMark::mark_from_roots()` to complete, which could still take minutes. I can create a separate RFE to make it update more frequently inside `G1CMConcurrentMarkingTask::work()`.
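The CPU-time versus wall-time distinction discussed above can be illustrated with a small Java sketch. This is not the HotSpot code under review: `os::elapsedVTime()` and the hsperf counters live in C++ inside the VM, and the class name `CpuVsWallTime` and method `measureSleep` are hypothetical names chosen for this example. It relies only on the standard `ThreadMXBean` API and returns nothing when thread CPU time is unavailable, rather than substituting wall time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical demo class, not part of the JDK or of the PR under review.
class CpuVsWallTime {
    /**
     * Sleeps for the given time and returns {cpuNanos, wallNanos} for the
     * current thread, or null when the JVM cannot report thread CPU time.
     */
    static long[] measureSleep(long millis) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
            return null; // report nothing rather than falling back to wall time
        }
        long cpuStart = bean.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        try {
            Thread.sleep(millis); // consumes wall time but almost no CPU time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sleeping", e);
        }
        long cpu = bean.getCurrentThreadCpuTime() - cpuStart;
        long wall = System.nanoTime() - wallStart;
        return new long[] {cpu, wall};
    }

    public static void main(String[] args) {
        long[] r = measureSleep(200);
        if (r == null) {
            System.out.println("thread CPU time is not available on this JVM");
        } else {
            System.out.printf("cpu=%d ns, wall=%d ns%n", r[0], r[1]);
        }
    }
}
```

A sleeping thread accumulates wall time but very little CPU time, so a counter that silently switched between the two clocks would report very different numbers for the same interval.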
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1373784494 From jjoo at openjdk.org Thu Oct 26 21:01:53 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 26 Oct 2023 21:01:53 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v33] In-Reply-To: References: Message-ID: <2VxMwwQKNjq7EeXnyQ7fo_aXYj2EGCJyW2MrAUkoSQs=.902af266-1470-43ed-8e4c-a8b7611cc154@github.com> > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Remove StringDedup from GC thread list ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/ebafa2b5..2fc508f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=31-32 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From lmesnik at openjdk.org Thu Oct 26 22:37:33 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 26 Oct 2023 22:37:33 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions In-Reply-To: References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> Message-ID: On Thu, 26 Oct 2023 06:09:34 GMT, Jaikiran Pai wrote: >> The jtreg starts the main thread in a separate ThreadGroup and checks unhandled exceptions for this group. However, it doesn't catch all unhandled exceptions. There is a jtreg issue for this https://bugs.openjdk.org/browse/CODETOOLS-7903526. >> Catching such issues for virtual threads is important because they are not included in any groups. So this fix implements the handler for the test thread factory. >> >> A few tests start failing. 
>> >> The test >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java >> has testcases for platform and virtual threads. So, there's no need to run it with the thread factory. >> >> The test >> java/lang/Thread/virtual/ThreadAPI.java >> tests UncaughtExceptionHandler and virtual threads. No need to run it with a thread factory. >> >> Test >> test/jdk/java/util/concurrent/tck/ThreadTest.java is updated to not check the default empty handler. >> >> Probably, we need some common approach about dealing with the UncaughtExceptionHandler in jtreg. > > test/jtreg_test_thread_factory/src/share/classes/Virtual.java line 37: > >> 35: // The virtual threads don't belong to any group and need global handler. >> 36: Thread.setDefaultUncaughtExceptionHandler((t, e) -> { >> 37: if (e instanceof ThreadDeath) { > > `ThreadDeath` has been deprecated for removal since Java 20, so this should no longer be needed. It is still used in tests and we should ignore it, as jtreg does. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16369#discussion_r1373886743 From manc at openjdk.org Thu Oct 26 22:46:36 2023 From: manc at openjdk.org (Man Cao) Date: Thu, 26 Oct 2023 22:46:36 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v33] In-Reply-To: <2VxMwwQKNjq7EeXnyQ7fo_aXYj2EGCJyW2MrAUkoSQs=.902af266-1470-43ed-8e4c-a8b7611cc154@github.com> References: <2VxMwwQKNjq7EeXnyQ7fo_aXYj2EGCJyW2MrAUkoSQs=.902af266-1470-43ed-8e4c-a8b7611cc154@github.com> Message-ID: On Thu, 26 Oct 2023 21:01:53 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Remove StringDedup from GC thread list Marked as reviewed by manc (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1700652459 From manc at openjdk.org Thu Oct 26 22:46:38 2023 From: manc at openjdk.org (Man Cao) Date: Thu, 26 Oct 2023 22:46:38 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v32] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 20:33:31 GMT, Man Cao wrote: >> @albertnetymk, the hsperf counters are a non-public API and the new counters have been added to the non-standard `sun.threads.cpu_time` name space which is "[unstable and unsupported](https://github.com/openjdk/jdk/blob/9864951dceb0ddc4479ced04b6d5a2363f1e307d/src/hotspot/share/runtime/perfData.cpp#L56)" so I don't think a CSR is required. > >> Is there some overlapping btw this and the existing _vtime_accum. If so, can they be consolidated somehow? > > We don't really like `_vtime_accum` for monitoring: > - `os::elapsedVTime()` could silently fall back from CPU time to wall time, according to os_linux.cpp. We'd rather to have true CPU time, or nothing. Mixing up CPU time and wall time is confusing to users. > - `_vtime_accum` only tracks the time consumed by the concurrent marking main thread, but not the concurrent worker threads. > > There's a `G1ConcurrentMarkThread::vtime_accum()` that seems to account for concurrent worker threads. It looks only used for logging. It might be possible to replace the existing logging code with the value of the `sun.threads.cpu_time.gc_conc_mark` hsperf counter. However, it is better to do that separately, and I'll create an RFE. > >> I believe the purpose of calling update_concurrent_mark_threads_cpu_time in multiple places is to get more up-to-date conc-cpu-time. Reading through the JBS ticket, I don't see the motivation for maintaining such a "fresh" value. > > Yes. The reason is that a concurrent mark cycle could take many seconds or even minutes for a large heap. 
For tools like [AHS](https://mail.openjdk.org/pipermail/hotspot-dev/2022-September/064190.html) that reads these CPU hsperf counters, they could read these CPU data every 1 to 5 seconds. > > The current mechanism still needs to wait for the whole `G1ConcurrentMark::mark_from_roots()` to complete, which could still take minutes. I can create a separate RFE to make it update more frequently inside `G1CMConcurrentMarkingTask::work()`. I've created https://bugs.openjdk.org/browse/JDK-8318937 and https://bugs.openjdk.org/browse/JDK-8318941. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1373893351 From lmesnik at openjdk.org Thu Oct 26 23:04:31 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 26 Oct 2023 23:04:31 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions In-Reply-To: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> Message-ID: On Wed, 25 Oct 2023 21:08:01 GMT, Leonid Mesnik wrote: > The jtreg starts the main thread in a separate ThreadGroup and checks unhandled exceptions for this group. However, it doesn't catch all unhandled exceptions. There is a jtreg issue for this https://bugs.openjdk.org/browse/CODETOOLS-7903526. > Catching such issues for virtual threads is important because they are not included in any groups. So this fix implements the handler for the test thread factory. > > A few tests start failing. > > The test > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java > has testcases for platform and virtual threads. So, there is there's no need to run it with the thread factory. > > The test > java/lang/Thread/virtual/ThreadAPI.java > tests UncaughtExceptionHandler and virtual threads. No need to run it with a thread factory. 
>
> Test
> test/jdk/java/util/concurrent/tck/ThreadTest.java is updated to not check the default empty handler.
>
> Probably, we need some common approach about dealing with the UncaughtExceptionHandler in jtreg.

> Hello Leonid, looking at the changes in this PR, what's being proposed is that when jtreg launches tests through a virtual thread, then this wrapping code will set a JVM level UncaughtExceptionHandler by calling Thread.setDefaultUncaughtExceptionHandler(...). The implementation of this UncaughtExceptionHandler calls System.exit(1). Wouldn't that kill the test VM? I think that would then impact everything else including jtreg report generation and such for the test, isn't it?

jtreg correctly reports such failures. It is expected that the JVM might fail. The only difference is that the reason for failure is System.exit(1) and the exception.

> I had a look at https://bugs.openjdk.org/browse/JDK-8318839 but it doesn't have enough details to help understand what currently happens when a test launched through a virtual thread from jtreg throws an uncaught exception. How/what gets reported for that test execution?

jtreg correctly catches and reports failures thrown by the main virtual thread. However, it ignores exceptions thrown by any other threads started by the test. For platform threads, jtreg uses a ThreadGroup (AgentVMThreadGroup or MainThreadGroup) to report failures in other threads. However, there is no way to use such an approach to catch exceptions for all test threads when virtual threads are used. For details, see: https://github.com/openjdk/jtreg/blob/ef3865581bdfc55c6315a8538222fc3a91b2b872/src/share/classes/com/sun/javatest/regtest/agent/MainWrapper.java#L72

I have a PR to implement the global exception handling here: https://github.com/openjdk/jtreg/pull/172 Right now it is the only proposal, and it might take a long time to complete. So the current fix helps us to find issues with the virtual thread test factory until jtreg is fixed.
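The mechanism described above, a process-wide handler that sees uncaught exceptions from virtual threads because they do not belong to a test-specific ThreadGroup, can be sketched as follows. This is only an illustration under stated assumptions, not the code in the PR: the class `VirtualThreadUehSketch` and the method `runAndCapture` are hypothetical names, the sketch records the exception instead of calling `System.exit(1)`, and it needs JDK 21 or later for `Thread.ofVirtual()` and `Thread.isVirtual()`.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not the Virtual.java thread factory from the PR.
class VirtualThreadUehSketch {
    /**
     * Runs the task in a virtual thread and returns the exception that
     * escaped it, or null if the task completed normally.
     */
    static Throwable runAndCapture(Runnable task) {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        // May be null when no default handler has been installed.
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((t, e) -> {
            if (t.isVirtual()) {
                seen.set(e); // the PR's handler terminates the VM here instead
            } else if (previous != null) {
                previous.uncaughtException(t, e); // leave platform threads to the old handler
            }
        });
        try {
            Thread vt = Thread.ofVirtual().start(task);
            vt.join(); // the handler has run by the time the thread has terminated
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the task", ie);
        } finally {
            Thread.setDefaultUncaughtExceptionHandler(previous); // restore global state
        }
        return seen.get();
    }

    public static void main(String[] args) {
        Throwable t = runAndCapture(() -> { throw new IllegalStateException("boom"); });
        System.out.println("captured: " + t);
    }
}
```

Recording the exception keeps the sketch side-effect free; whether a test harness should instead exit the VM is exactly the point under discussion in this thread.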
------------- PR Comment: https://git.openjdk.org/jdk/pull/16369#issuecomment-1782021953 From lmesnik at openjdk.org Thu Oct 26 23:08:32 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 26 Oct 2023 23:08:32 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions In-Reply-To: References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> Message-ID: <0pXMnIBqKI4R89gdQ6JOTWTzFkKc2ncwdzbzc6o2nlA=.3288cbe5-e389-4b95-ad79-2283a363d7d2@github.com> On Thu, 26 Oct 2023 08:34:39 GMT, Alan Bateman wrote: > Having a UHE invoke System.exit is surprising. Are you saying that this is only for cases where a test launches a child VM with the thread factory set? It is for cases when the test is started in a virtual thread. I don't see a better way to process unexpected exceptions right now. I don't want to complicate the interface between jtreg and the plugin for this temporary fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16369#issuecomment-1782028651 From lmesnik at openjdk.org Thu Oct 26 23:11:33 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 26 Oct 2023 23:11:33 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions In-Reply-To: References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> Message-ID: <7vUtv8R_TYLX6ECx5YiBSh7Vc59LF2rOFXjUNXE2Fpc=.b4dec742-93d8-405b-8594-45092abf9315@github.com> On Thu, 26 Oct 2023 08:34:39 GMT, Alan Bateman wrote: > Stepping back a bit. ThreadGroup is legacy and we eventually want it to go away. We've been deprecating and degrading it very slowly over many releases. So I think jtreg will eventually need to change. Right now, it creates AgentVMThreadGroup for agent VM mode so it controls the UHE where it starts the "main thread". 
I think it will eventually need to change this to set the system-wide UHE but this means it would be handling uncaught exceptions thrown by "system threads", we may have to audit some of the exception handling if things come out of the woodwork. I think we shouldn't allow any unhandled exceptions in system threads either. This might just hide issues. The tests that might erase such exceptions should be written to for VM. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16369#issuecomment-1782033148 From ysuenaga at openjdk.org Fri Oct 27 03:51:44 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Fri, 27 Oct 2023 03:51:44 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v12] In-Reply-To: <6Ywn5vjecuN2UtqWT6IdM_EyK4xP4foammAeoJCvxx8=.2ab7ba6f-fb37-4620-96b1-9e39b4e939e4@github.com> References: <6Ywn5vjecuN2UtqWT6IdM_EyK4xP4foammAeoJCvxx8=.2ab7ba6f-fb37-4620-96b1-9e39b4e939e4@github.com> Message-ID: On Wed, 25 Oct 2023 13:18:33 GMT, Jorn Vernee wrote: >> Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: >> >> - a -> an >> - add note to downcallHandle about passing heap segments by-reference > > src/hotspot/cpu/x86/downcallLinker_x86_64.cpp line 110: > >> 108: __ mov(rsp, r12); // restore sp >> 109: __ reinit_heapbase(); >> 110: } > > This is a minor cleanup to share this code for the three use sites below. Question: does `r12` not need to be saved and restored? According to [CallingSequences in OpenJDK Wiki](https://wiki.openjdk.org/display/HotSpot/CallingSequences), `r12` may be reserved for HeapBase if COOP is enabled. (`r12` is also used in other places in downcallLinker_x86_64.cpp without restoring...)
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1374042333 From dholmes at openjdk.org Fri Oct 27 04:19:43 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 27 Oct 2023 04:19:43 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v11] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <0YkU44VdBbj08LTg9zw27GoMHMs1GNEz8U2nUb1wkpc=.29b8c5a2-2384-4566-9c5f-e483a1437ef2@github.com> Message-ID: On Thu, 26 Oct 2023 17:41:52 GMT, Vladimir Ivanov wrote: > > do you have any suggestion about where to move the remaining code that's in stubRoutines.cpp? > > As it is now, `globalDefinitions.hpp/cpp` looks the most appropriate one to me. That seems okay to me too and much better than `stubRoutines.cpp`. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1782264665 From dholmes at openjdk.org Fri Oct 27 04:23:29 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 27 Oct 2023 04:23:29 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v3] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 04:23:42 GMT, Julian Waters wrote: >> src/hotspot/os/posix/os_posix.cpp line 894: >> >>> 892: sleep: // sleep forever ... >>> 893: ::sleep(100); // ... 100 seconds at a time >>> 894: goto sleep; >> >> I don't recall now why this was written the way it was, but I certainly do not understand why you rewrote it this way with a goto! > > ah, when I was searching for functions to implement the noreturn with and stumbled across this one, I thought it could do with a goto instead of a while true since the intent seemed to be clearer. I can revert this if need be Please revert - `while (true)` conveys the exact intent. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1374055747 From dholmes at openjdk.org Fri Oct 27 04:27:31 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 27 Oct 2023 04:27:31 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v3] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 04:26:45 GMT, Julian Waters wrote: > I figured that a little refactoring of scope (from os::win32 to os_windows.cpp file scope) could help here The very loose, not well followed, historical convention here is that the Windows specific os class contains the methods defined by os.hpp, while the implementation details go into the os::win32 class. In many cases the choice is somewhat arbitrary, but there should still be a good reason to move something around. Unnecessary refactoring just makes the PR harder to understand. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16303#issuecomment-1782270410 From dholmes at openjdk.org Fri Oct 27 04:35:41 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 27 Oct 2023 04:35:41 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v33] In-Reply-To: <2VxMwwQKNjq7EeXnyQ7fo_aXYj2EGCJyW2MrAUkoSQs=.902af266-1470-43ed-8e4c-a8b7611cc154@github.com> References: <2VxMwwQKNjq7EeXnyQ7fo_aXYj2EGCJyW2MrAUkoSQs=.902af266-1470-43ed-8e4c-a8b7611cc154@github.com> Message-ID: <6XB0v77G_T6Expnc20NX5dyeJMFuxdTrb_IsSyHurTo=.a7351987-1928-4f67-bb30-693b6bec60fd@github.com> On Thu, 26 Oct 2023 21:01:53 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Remove StringDedup from GC thread list > `os::elapsedVTime()` could silently fall back from CPU time to wall time, according to os_linux.cpp The only allowed failure modes for `getrusage` are: EFAULT usage points outside the accessible 
address space. EINVAL who is invalid. neither of which are going to occur in practice ------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1782275672 From jwaters at openjdk.org Fri Oct 27 04:38:47 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 27 Oct 2023 04:38:47 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v4] In-Reply-To: References: Message-ID: > On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. > > The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well. The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. > > Although not mentioned in the title, raise_fail_fast has also been fixed. RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method > > All the changes mentioned above fixes any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. 
> > This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset Julian Waters has updated the pull request incrementally with two additional commits since the last revision: - Revert os_windows.cpp - Revert os_posix.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16303/files - new: https://git.openjdk.org/jdk/pull/16303/files/c025c250..60ea51ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16303&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16303&range=02-03 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16303.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16303/head:pull/16303 PR: https://git.openjdk.org/jdk/pull/16303 From jwaters at openjdk.org Fri Oct 27 04:41:49 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 27 Oct 2023 04:41:49 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v5] In-Reply-To: References: Message-ID: > On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. > > The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well. The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. 
For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. > > Although not mentioned in the title, raise_fail_fast has also been fixed. RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method > > All the changes mentioned above fixes any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. > > This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Revert to exit_code in os_windows.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16303/files - new: https://git.openjdk.org/jdk/pull/16303/files/60ea51ab..6b81a926 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16303&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16303&range=03-04 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16303.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16303/head:pull/16303 PR: https://git.openjdk.org/jdk/pull/16303 From jwaters at openjdk.org Fri Oct 27 05:33:30 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 27 Oct 2023 05:33:30 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v5] In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 04:41:49 GMT, Julian Waters wrote: >> On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. 
Those os functions are marked as noreturn, so this implementation helper should also be noreturn. >> >> The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well. The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. >> >> Although not mentioned in the title, raise_fail_fast has also been fixed. RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method >> >> All the changes mentioned above fixes any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. >> >> This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Revert to exit_code in os_windows.cpp Addressed some of the review comments Side note: Should the Style Guide only permit noreturn for void methods? 
It's Undefined Behaviour when applied to something that returns int for instance, such as exit_process_or_thread here (which I had to refactor to void) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16303#issuecomment-1782319618 From dholmes at openjdk.org Fri Oct 27 05:58:32 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 27 Oct 2023 05:58:32 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions In-Reply-To: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> Message-ID: <1FD6HX13l4t-q6wuU-zlMVcS6CDPCTB3sEHwO4HzhLQ=.8024ce47-f9cc-4b2f-9702-752178546975@github.com> On Wed, 25 Oct 2023 21:08:01 GMT, Leonid Mesnik wrote: > The jtreg starts the main thread in a separate ThreadGroup and checks unhandled exceptions for this group. However, it doesn't catch all unhandled exceptions. There is a jtreg issue for this https://bugs.openjdk.org/browse/CODETOOLS-7903526. > Catching such issues for virtual threads is important because they are not included in any groups. So this fix implements the handler for the test thread factory. > > A few tests start failing. > > The test > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java > has testcases for platform and virtual threads. So, there is there's no need to run it with the thread factory. > > The test > java/lang/Thread/virtual/ThreadAPI.java > tests UncaughtExceptionHandler and virtual threads. No need to run it with a thread factory. > > Test > test/jdk/java/util/concurrent/tck/ThreadTest.java is updated to not check the default empty handler. > > Probably, we need some common approach about dealing with the UncaughtExceptionHandler in jtreg. Not at all sure this is the right approach ... an exception in an arbitrary thread should not terminate the VM. 
Sometimes we might expect a thread to terminate by exception. ------------- PR Review: https://git.openjdk.org/jdk/pull/16369#pullrequestreview-1701043216 From dholmes at openjdk.org Fri Oct 27 05:58:33 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 27 Oct 2023 05:58:33 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions In-Reply-To: References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> Message-ID: On Thu, 26 Oct 2023 22:34:24 GMT, Leonid Mesnik wrote: >> test/jtreg_test_thread_factory/src/share/classes/Virtual.java line 37: >> >>> 35: // The virtual threads don't belong to any group and need global handler. >>> 36: Thread.setDefaultUncaughtExceptionHandler((t, e) -> { >>> 37: if (e instanceof ThreadDeath) { >> >> `ThreadDeath` has been deprecated for removal since Java 20, so this should no longer be needed. > > It is still used in tests and we should ignore it, as jtreg does.

Shouldn't this code first retrieve the current default exception handler, and then check whether t is a virtual thread, and if so handle the exception as appropriate (not sure System.exit is appropriate ..). Then for non-virtual threads it delegates to the previous default handler.

    final UncaughtExceptionHandler originalUEH = Thread.getDefaultUncaughtExceptionHandler();
    Thread.setDefaultUncaughtExceptionHandler((t, e) -> {
        if (t.isVirtual()) {
            // ...
        } else {
            originalUEH.uncaughtException(t, e);
        }
    });

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16369#discussion_r1374102902 From fyang at openjdk.org Fri Oct 27 06:41:32 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 27 Oct 2023 06:41:32 GMT Subject: RFR: 8318827: RISC-V: Improve readability of fclass result testing [v3] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 15:35:49 GMT, Feilong Jiang wrote: >> Hi, please consider.
>> >> Currently, we test results of the `fclass` instruction with hard-coded bits, which hurts readability. This patch adds an enumeration of the fclass mask bits for ease of use. >> >> Testing: >> >> - [ ] tier1 with release build > > Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fclass-mask > - remove 'fclass_' prefix > - adjust enum name style > - Add FCLASS_MASK enum for better readability Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16362#pullrequestreview-1701087407 From fyang at openjdk.org Fri Oct 27 06:56:50 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 27 Oct 2023 06:56:50 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v4] In-Reply-To: References: Message-ID: On Mon, 23 Oct 2023 19:53:54 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely.
The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64 > > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge branch 'master' into method_entry_8301997 > - Added asserts for getters and fixed printing > - Removed dead code in interpreters > - Removed unused structures, improved set_method_handle and appendix_if_resolved > - Removed some comments and relocated code > - 8301997: Move method resolution information out of the cpCache src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3306: > 3304: // x86 uses a shift and mask or wings it with a shift plus assert > 3305: // the mask is not needed. aarch64 just uses bitfield extract > 3306: __ ubfxw(rscratch2, flags, ConstantPoolCacheEntry::tos_state_shift, ConstantPoolCacheEntry::tos_state_bits); Nit: You might want to remove the preceding 3-line code comment together. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1374143907 From gcao at openjdk.org Fri Oct 27 06:58:44 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 27 Oct 2023 06:58:44 GMT Subject: RFR: 8318953: RISC-V: Small refactoring for MacroAssembler::test_bit Message-ID: Hi, The current test_bit assembly function needs to accept a temporary register because it needs one if it goes to the andi else branch. 
However, in this case we can actually avoid calling andi and accomplish the same thing by logically shifting to the right and testing the lowest bit. The advantage is that it makes the test_bit function much simpler. Also, to reduce the number of instructions in a given case (consider the mv function), mv actually calls the li function, which generates more than one instruction when the parameter imm exceeds the 32-bit range. https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L804-L840 ### Testing: qemu 8.1.50: - [ ] Tier1 tests (fastdebug) - [ ] Tier2 tests (release) - [ ] Tier3 tests (release) ------------- Commit messages: - 8318953: RISC-V: Small refactoring for MacroAssembler::test_bit Changes: https://git.openjdk.org/jdk/pull/16391/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16391&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318953 Stats: 10 lines in 3 files changed: 6 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16391.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16391/head:pull/16391 PR: https://git.openjdk.org/jdk/pull/16391 From fyang at openjdk.org Fri Oct 27 07:22:36 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 27 Oct 2023 07:22:36 GMT Subject: RFR: 8318953: RISC-V: Small refactoring for MacroAssembler::test_bit In-Reply-To: References: Message-ID: <0KOeNWUib5P8FywvRMy6Pt6XjXQ6Jwm3bVf_s5AcmsM=.2678c2a6-248d-4e79-a65b-3b320d7227d0@github.com> On Fri, 27 Oct 2023 06:52:54 GMT, Gui Cao wrote: > Hi, The current test_bit assembly function needs to accept a temporary register because it needs one if it goes to the andi else branch. However, in this case we can actually avoid calling andi and accomplish the same thing by logically shifting to the right and testing the lowest bit. The advantage is that it makes the test_bit function much simpler. 
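The two ways of testing a bit described here can be sketched in plain Java. This is only an illustration of the arithmetic, not the HotSpot code: the class and helper names (`TestBitDemo`, `isSimm12`, `testBitWithMask`, `testBitWithShift`) are made up for this sketch, and `isSimm12` assumes the usual signed 12-bit immediate range of -2048..2047.

```java
public class TestBitDemo {
    // Hypothetical helper: true when imm fits a signed 12-bit immediate.
    static boolean isSimm12(long imm) {
        return imm >= -2048 && imm <= 2047;
    }

    // Variant 1: AND with the mask (1 << bitPos).
    // The result is either 0 or the mask itself.
    static long testBitWithMask(long value, int bitPos) {
        return value & (1L << bitPos);
    }

    // Variant 2: logical shift right, then keep only the lowest bit.
    // The result is either 0 or 1, and the only immediate involved is 1.
    static long testBitWithShift(long value, int bitPos) {
        return (value >>> bitPos) & 1L;
    }

    public static void main(String[] args) {
        long value = 0x8000_0000_0000_0801L; // bits 0, 11 and 63 are set
        for (int bit : new int[] {0, 1, 10, 11, 63}) {
            boolean viaMask = testBitWithMask(value, bit) != 0;
            boolean viaShift = testBitWithShift(value, bit) != 0;
            System.out.printf("bit %2d: mask fits simm12=%b, set=%b, variants agree=%b%n",
                    bit, isSimm12(1L << bit), viaShift, viaMask == viaShift);
        }
    }
}
```

Both variants agree on whether the bit is set; the mask only fits the 12-bit immediate range for bit positions up to 10, which is why the shift form is attractive for higher bits.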
Also, to reduce the number of instructions in a given case (consider the mv function), mv actually calls the li function, which generates more than one instruction when the parameter imm exceeds the 32-bit range. > https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L2009-L2017 > https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp#L730 > https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L804-L840 > > ### Testing: > qemu 8.1.50: > - [ ] Tier1 tests (fastdebug) > - [ ] Tier2 tests (release) > - [ ] Tier3 tests (release) src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4678: > 4676: int64_t imm = (int64_t)(1UL << bit_pos); > 4677: if (is_simm12(imm)) { > 4678: andi(Rd, Rs, imm); Since `imm` is guaranteed to be a signed 12-bit immediate in this block, we could call `and_imm12` directly instead of `andi`. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4681: > 4679: } else { > 4680: srli(Rd, Rs, bit_pos); > 4681: andi(Rd, Rd, 1); Similar here: call `and_imm12` directly instead of `andi` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16391#discussion_r1374166348 PR Review Comment: https://git.openjdk.org/jdk/pull/16391#discussion_r1374167066 From gcao at openjdk.org Fri Oct 27 07:30:03 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 27 Oct 2023 07:30:03 GMT Subject: RFR: 8318953: RISC-V: Small refactoring for MacroAssembler::test_bit [v2] In-Reply-To: References: Message-ID: <1k89pNxSnBzSLvOh070gs1_CwdrLxu9oEjpgwPM3XFo=.341da50d-6596-4f84-9771-bf03be86f6cf@github.com> > Hi, The current test_bit assembly function needs to accept a temporary register because it needs one if it goes to the andi else branch. 
However, in this case we can actually avoid calling andi and accomplish the same thing by logically shifting to the right and testing the lowest bit. The advantage is that it makes the test_bit function much simpler. Also, to reduce the number of instructions in a given case (consider the mv function), mv actually calls the li function, which generates more than one instruction when the parameter imm exceeds the 32-bit range. > https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L2009-L2017 > https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp#L730 > https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L804-L840 > > ### Testing: > qemu 8.1.50: > - [ ] Tier1 tests (fastdebug) > - [ ] Tier2 tests (release) > - [ ] Tier3 tests (release) Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Use and_imm12 to replace andi in test_bit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16391/files - new: https://git.openjdk.org/jdk/pull/16391/files/d14c03f2..a6e85a19 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16391&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16391&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16391.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16391/head:pull/16391 PR: https://git.openjdk.org/jdk/pull/16391 From gcao at openjdk.org Fri Oct 27 07:30:04 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 27 Oct 2023 07:30:04 GMT Subject: RFR: 8318953: RISC-V: Small refactoring for MacroAssembler::test_bit [v2] In-Reply-To: <0KOeNWUib5P8FywvRMy6Pt6XjXQ6Jwm3bVf_s5AcmsM=.2678c2a6-248d-4e79-a65b-3b320d7227d0@github.com> References: 
<0KOeNWUib5P8FywvRMy6Pt6XjXQ6Jwm3bVf_s5AcmsM=.2678c2a6-248d-4e79-a65b-3b320d7227d0@github.com> Message-ID: On Fri, 27 Oct 2023 07:19:00 GMT, Fei Yang wrote: >> Gui Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Use and_imm12 to replace andi in test_bit > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4678: > >> 4676: int64_t imm = (int64_t)(1UL << bit_pos); >> 4677: if (is_simm12(imm)) { >> 4678: andi(Rd, Rs, imm); > > Since `imm` is guaranteed to be a signed 12-bit immediate in this block, we could call `and_imm12` more directly instead of `andi`. Thanks for your review. fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16391#discussion_r1374174249 From fyang at openjdk.org Fri Oct 27 07:36:31 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 27 Oct 2023 07:36:31 GMT Subject: RFR: 8318953: RISC-V: Small refactoring for MacroAssembler::test_bit [v2] In-Reply-To: <1k89pNxSnBzSLvOh070gs1_CwdrLxu9oEjpgwPM3XFo=.341da50d-6596-4f84-9771-bf03be86f6cf@github.com> References: <1k89pNxSnBzSLvOh070gs1_CwdrLxu9oEjpgwPM3XFo=.341da50d-6596-4f84-9771-bf03be86f6cf@github.com> Message-ID: On Fri, 27 Oct 2023 07:30:03 GMT, Gui Cao wrote: >> Hi, The current test_bit assembly function needs to accept a temporary register because it needs one if it goes to the andi else branch. However, in this case we can actually avoid calling andi and accomplish the same thing by logically shifting to the right and testing the lowest bit. The advantage is that it makes the test_bit function much simpler. Also, to reduce the number of instructions in a given case (consider the mv function), mv actually calls the li function, which generates more than one instruction when the parameter imm exceeds the 32-bit range. 
>> https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L2009-L2017 >> https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp#L730 >> https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L804-L840 >> >> ### Testing: >> qemu 8.1.50: >> - [ ] Tier1 tests (fastdebug) >> - [ ] Tier2 tests (release) >> - [ ] Tier3 tests (release) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use and_imm12 to replace andi in test_bit Looks good. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16391#pullrequestreview-1701162701 From duke at openjdk.org Fri Oct 27 07:54:56 2023 From: duke at openjdk.org (Liming Liu) Date: Fri, 27 Oct 2023 07:54:56 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v9] In-Reply-To: References: Message-ID: > As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). > > Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: > > > > > > > > > > > >
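> | Kernel | -XX:-TransparentHugePages (Unpatched) | -XX:-TransparentHugePages (Patched) | -XX:+TransparentHugePages (Unpatched) | -XX:+TransparentHugePages (Patched) |
> |--------|-------|-------|------|------|
> | 4.18   | 11.30 | 11.30 | 0.25 | 0.25 |
> | 5.13   | 0.22  | 0.22  | 3.42 | 3.42 |
> | 6.1    | 0.27  | 0.33  | 3.54 | 0.33 |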
Liming Liu has updated the pull request incrementally with one additional commit since the last revision: Use address to find the mapping of the heap ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15781/files - new: https://git.openjdk.org/jdk/pull/15781/files/d1a33373..f59dff5d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=07-08 Stats: 79 lines in 1 file changed: 35 ins; 20 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/15781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15781/head:pull/15781 PR: https://git.openjdk.org/jdk/pull/15781 From duke at openjdk.org Fri Oct 27 07:59:59 2023 From: duke at openjdk.org (Liming Liu) Date: Fri, 27 Oct 2023 07:59:59 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v10] In-Reply-To: References: Message-ID: > As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). > > Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: > > > > > > > > > > > >
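> | Kernel | -XX:-TransparentHugePages (Unpatched) | -XX:-TransparentHugePages (Patched) | -XX:+TransparentHugePages (Unpatched) | -XX:+TransparentHugePages (Patched) |
> |--------|-------|-------|------|------|
> | 4.18   | 11.30 | 11.30 | 0.25 | 0.25 |
> | 5.13   | 0.22  | 0.22  | 3.42 | 3.42 |
> | 6.1    | 0.27  | 0.33  | 3.54 | 0.33 |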
Liming Liu has updated the pull request incrementally with one additional commit since the last revision: Remove the unneccessary class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15781/files - new: https://git.openjdk.org/jdk/pull/15781/files/f59dff5d..b33edafd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15781/head:pull/15781 PR: https://git.openjdk.org/jdk/pull/15781 From duke at openjdk.org Fri Oct 27 08:00:13 2023 From: duke at openjdk.org (Thomas Obermeier) Date: Fri, 27 Oct 2023 08:00:13 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information In-Reply-To: References: Message-ID: <9JV9_zHq4b5pT59EbkAPWl575P9wh1SpB6gVl0Fvc3k=.7fe37ac5-8a41-4d5b-b6c4-45fed36f7074@github.com> On Thu, 26 Oct 2023 16:11:00 GMT, Thomas Obermeier wrote: > MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. > > We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. > > As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. 
Thanks for pointing this out, and kudos to @JoKern65 for his help and insights; he confirms your point. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16381#issuecomment-1782469927 From duke at openjdk.org Fri Oct 27 08:00:12 2023 From: duke at openjdk.org (Thomas Obermeier) Date: Fri, 27 Oct 2023 08:00:12 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information [v2] In-Reply-To: References: Message-ID: > MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. > > We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. > > As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. 
Thomas Obermeier has updated the pull request incrementally with one additional commit since the last revision: 8306561: move solution to caller ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16381/files - new: https://git.openjdk.org/jdk/pull/16381/files/0e45bef8..1a078ab5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16381&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16381&range=00-01 Stats: 6 lines in 2 files changed: 1 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16381.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16381/head:pull/16381 PR: https://git.openjdk.org/jdk/pull/16381 From jkern at openjdk.org Fri Oct 27 08:25:31 2023 From: jkern at openjdk.org (Joachim Kern) Date: Fri, 27 Oct 2023 08:25:31 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 17:09:04 GMT, Thomas Stuefe wrote: >> MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. >> >> We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. >> >> As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. > > Yes, let it blow up then. 
@tstuefe: I helped analyze this problem by writing a plain C test program that mmaps a page and tries to read beyond it. On AIX, as expected, the very next byte after the requested region leads to a segmentation violation, but on Linux (both flavours, linuxintel and linuxppc64) I was able to read exactly 20 KB beyond it before running into a segmentation violation. This might be the reason why the developer of print_pointer_information() was not aware that the code could crash. Thomas, do you have an idea why Linux (and maybe other platforms) maps more memory than requested? It has nothing to do with memory pages. The additional memory does not end at the next memory page boundary, but exactly 20 KB after the end of the requested region. Astonishingly, at the lower end of the region there is no extra memory accessible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16381#issuecomment-1782503932 From vkempik at openjdk.org Fri Oct 27 08:32:32 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Fri, 27 Oct 2023 08:32:32 GMT Subject: RFR: 8318953: RISC-V: Small refactoring for MacroAssembler::test_bit [v2] In-Reply-To: <1k89pNxSnBzSLvOh070gs1_CwdrLxu9oEjpgwPM3XFo=.341da50d-6596-4f84-9771-bf03be86f6cf@github.com> References: <1k89pNxSnBzSLvOh070gs1_CwdrLxu9oEjpgwPM3XFo=.341da50d-6596-4f84-9771-bf03be86f6cf@github.com> Message-ID: On Fri, 27 Oct 2023 07:30:03 GMT, Gui Cao wrote: >> Hi, The current test_bit assembly function needs to accept a temporary register because it needs one if it goes to the andi else branch. However, in this case we can actually avoid calling andi and accomplish the same thing by logically shifting to the right and testing the lowest bit. The advantage is that it makes the test_bit function much simpler. Also, to reduce the number of instructions in a given case (consider the mv function), mv actually calls the li function, which generates more than one instruction when the parameter imm exceeds the 32-bit range.
>> https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L2009-L2017 >> https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp#L730 >> https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L804-L840 >> >> ### Testing: >> qemu 8.1.50: >> - [ ] Tier1 tests (fastdebug) >> - [ ] Tier2 tests (release) >> - [ ] Tier3 tests (release) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use and_imm12 to replace andi in test_bit Hello, do you have plans to backport this to 21u and 17u afterwards ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16391#issuecomment-1782513393 From lkorinth at openjdk.org Fri Oct 27 08:50:47 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Fri, 27 Oct 2023 08:50:47 GMT Subject: Integrated: 8315097: Rename createJavaProcessBuilder In-Reply-To: References: Message-ID: On Mon, 28 Aug 2023 15:54:08 GMT, Leo Korinth wrote: > This pull request renames `createJavaProcessBuilder` to `createLimitedTestJavaProcessBuilder` and renames `createTestJvm` to `createTestJavaProcessBuilder`. Both are implemented through a private `createJavaProcessBuilder`. It also updates the java doc. > > This is so that it should be harder to by mistake use `createLimitedTestJavaProcessBuilder` that is problematic because it will not forward JVM flags to the tested JVM. This pull request has now been integrated. 
Changeset: d52a995f Author: Leo Korinth URL: https://git.openjdk.org/jdk/commit/d52a995f35de26c2cc4074297a75141e4a363e1b Stats: 1574 lines in 560 files changed: 44 ins; 10 del; 1520 mod 8315097: Rename createJavaProcessBuilder Reviewed-by: lmesnik, dholmes, rriggs, stefank ------------- PR: https://git.openjdk.org/jdk/pull/15452 From lkorinth at openjdk.org Fri Oct 27 09:00:48 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Fri, 27 Oct 2023 09:00:48 GMT Subject: RFR: 8315097: Rename createJavaProcessBuilder [v7] In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 08:44:29 GMT, Leo Korinth wrote: >> This pull request renames `createJavaProcessBuilder` to `createLimitedTestJavaProcessBuilder` and renames `createTestJvm` to `createTestJavaProcessBuilder`. Both are implemented through a private `createJavaProcessBuilder`. It also updates the java doc. >> >> This is so that it should be harder to by mistake use `createLimitedTestJavaProcessBuilder` that is problematic because it will not forward JVM flags to the tested JVM. > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > fix copyright year and indentation Thanks, see: https://bugs.openjdk.org/browse/JDK-8318962 ------------- PR Comment: https://git.openjdk.org/jdk/pull/15452#issuecomment-1782552641 From jvernee at openjdk.org Fri Oct 27 09:20:44 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 27 Oct 2023 09:20:44 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v12] In-Reply-To: References: <6Ywn5vjecuN2UtqWT6IdM_EyK4xP4foammAeoJCvxx8=.2ab7ba6f-fb37-4620-96b1-9e39b4e939e4@github.com> Message-ID: On Fri, 27 Oct 2023 03:49:17 GMT, Yasumasa Suenaga wrote: >> src/hotspot/cpu/x86/downcallLinker_x86_64.cpp line 110: >> >>> 108: __ mov(rsp, r12); // restore sp >>> 109: __ reinit_heapbase(); >>> 110: } >> >> This is a minor cleanup to share this code for the three use sites below. 
> > Question: `r12` does not need to remember? > > According to [CallingSequences in OpenJDK Wiki](https://wiki.openjdk.org/display/HotSpot/CallingSequences), `r12` may be reserved for HeapBase if COOP is enabled. > (`r12` is also used in another places in downcallLinker_x86_64.cpp without restoring...) You mean `reinit_heapbase` can be removed? I'm not sure whether the caller expects it to be preserved. Note that `r12` is used in this case to save and restore `rsp`. This is needed since we access data in the frame relative to `rsp`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1374300821 From mli at openjdk.org Fri Oct 27 09:20:45 2023 From: mli at openjdk.org (Hamlin Li) Date: Fri, 27 Oct 2023 09:20:45 GMT Subject: RFR: 8318225: RISC-V: C2 UModI Message-ID: Hi, Can you review the change to add intrinsic for UModI and UModL? ( This is a quite similar patch to https://github.com/openjdk/jdk/pull/16346, which addresses UDivI and UDivL, so for the performance consideration please also check the discussion in that pr. ) Thanks! ## Tests ### Functionality Run tests successfully found via `grep -nr test/jdk/ -we remainderUnsigned` and `grep -nr test/hotspot/ -we remainderUnsigned` ### Performance #### Long **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case** **Before** LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 21222.911 ± 57.735 ns/op LongDivMod.testRemainderUnsigned 1024 positive avgt 10 28841.429 ± 6.294 ns/op LongDivMod.testRemainderUnsigned 1024 negative avgt 10 7733.038 ± 3.856 ns/op **After** LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 22666.448 ± 34.986 ns/op LongDivMod.testRemainderUnsigned 1024 positive avgt 10 15967.846 ± 24.805 ns/op LongDivMod.testRemainderUnsigned 1024 negative avgt 10 29507.865 ± 20.593 ns/op #### Integer **Before** IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23396.475 ±
24.065 ns/op IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16796.796 ± 3.389 ns/op IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30159.407 ± 6.716 ns/op **After** IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23216.710 ± 14.351 ns/op IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16621.374 ± 3.203 ns/op IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30002.088 ± 41.212 ns/op ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/16394/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16394&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318225 Stats: 46 lines in 2 files changed: 44 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16394.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16394/head:pull/16394 PR: https://git.openjdk.org/jdk/pull/16394 From rehn at openjdk.org Fri Oct 27 09:36:30 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 27 Oct 2023 09:36:30 GMT Subject: RFR: 8318225: RISC-V: C2 UModI In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 09:15:11 GMT, Hamlin Li wrote: > Hi, > Can you review the change to add intrinsic for UModI and UModL? > ( This is a quite similar patch to https://github.com/openjdk/jdk/pull/16346, which addresses UDivI and UDivL, so for the performance consideration please also check the discussion in that pr. ) > Thanks! > > > ## Tests > > ### Functionality > Run tests successfully found via `grep -nr test/jdk/ -we remainderUnsigned` and `grep -nr test/hotspot/ -we remainderUnsigned` > > ### Performance > > #### Long > **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case** > > **Before** > > LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 21222.911 ± 57.735 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 10 28841.429 ± 6.294 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 10 7733.038 ±
3.856 ns/op > > > **After** > > LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 22666.448 ± 34.986 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 10 15967.846 ± 24.805 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 10 29507.865 ± 20.593 ns/op > > > #### Integer > **Before** > > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23396.475 ± 24.065 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16796.796 ± 3.389 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30159.407 ± 6.716 ns/op > > > **After** > > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23216.710 ± 14.351 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16621.374 ± 3.203 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30002.088 ± 41.212 ns/op Looks good, but these numbers: LongDivMod.testRemainderUnsigned 1024 negative avgt 10 7733.038 ± 3.856 ns/op IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30159.407 ± 6.716 ns/op What up here ? How come in before case Long is 4-5x faster than Integer? I don't see an explanation in other PR either ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16394#issuecomment-1782602965 From mli at openjdk.org Fri Oct 27 10:07:32 2023 From: mli at openjdk.org (Hamlin Li) Date: Fri, 27 Oct 2023 10:07:32 GMT Subject: RFR: 8318225: RISC-V: C2 UModI In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 09:15:11 GMT, Hamlin Li wrote: > Hi, > Can you review the change to add intrinsic for UModI and UModL? > ( This is a quite similar patch to https://github.com/openjdk/jdk/pull/16346, which addresses UDivI and UDivL, so for the performance consideration please also check the discussion in that pr. ) > Thanks!
> > > ## Tests > > ### Functionality > Run tests successfully found via `grep -nr test/jdk/ -we remainderUnsigned` and `grep -nr test/hotspot/ -we remainderUnsigned` > > ### Performance > > #### Long > **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case** > > **Before** > > LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 21222.911 ± 57.735 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 10 28841.429 ± 6.294 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 10 7733.038 ± 3.856 ns/op > > > **After** > > LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 22666.448 ± 34.986 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 10 15967.846 ± 24.805 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 10 29507.865 ± 20.593 ns/op > > > #### Integer > **Before** > > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23396.475 ± 24.065 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16796.796 ± 3.389 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30159.407 ± 6.716 ns/op > > > **After** > > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23216.710 ± 14.351 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16621.374 ± 3.203 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30002.088 ± 41.212 ns/op Good find. Let me try to explain. For j.l.Long, remainderUnsigned has a quick path for negative divisor (This was discussed in https://github.com/openjdk/jdk/pull/16346), it's much quicker than the case of positive divisor. For j.l.Integer public static int remainderUnsigned(int dividend, int divisor) { // In lieu of tricky code, for now just use long arithmetic. return (int)(toUnsignedLong(dividend) % toUnsignedLong(divisor)); } it will call Long's rem, and the value of divisor passed to Long rem will be positive in the sense of Long.
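A small standalone sketch of the point about `Integer`: an `int` divisor that is negative as a signed value becomes a positive `long` once widened with `Integer.toUnsignedLong`, so the long remainder always sees a non-negative divisor. The class name `UnsignedRemainderDemo` and the helper `remViaLong` are made up for this illustration; the helper simply mirrors the long-arithmetic expression quoted above.

```java
public class UnsignedRemainderDemo {
    // Mirrors the quoted long-arithmetic approach: widen both operands as
    // unsigned values, take the signed long remainder, narrow back to int.
    static int remViaLong(int dividend, int divisor) {
        return (int) (Integer.toUnsignedLong(dividend) % Integer.toUnsignedLong(divisor));
    }

    public static void main(String[] args) {
        int divisor = -7; // negative when read as a signed int

        // Widened as unsigned, the same bits are 2^32 - 7, a positive long.
        long widened = Integer.toUnsignedLong(divisor);
        System.out.println("widened divisor = " + widened
                + ", positive as long: " + (widened > 0));

        // The helper agrees with the library method for a few sample inputs.
        int[] dividends = {100, -1, Integer.MIN_VALUE, 0};
        for (int dividend : dividends) {
            System.out.println(Integer.toUnsignedString(dividend) + " rem "
                    + Integer.toUnsignedString(divisor) + " -> helper="
                    + Integer.toUnsignedString(remViaLong(dividend, divisor))
                    + ", library="
                    + Integer.toUnsignedString(Integer.remainderUnsigned(dividend, divisor)));
        }
    }
}
```

Under this reading, the "negative" Integer benchmark never reaches a negative-divisor path in the long remainder, which would be consistent with the Integer numbers looking alike for positive and negative divisors.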
To verify this, you can compare Long's rem data before the patch: the positive case is much slower than the negative one. LongDivMod.testRemainderUnsigned 1024 positive avgt 10 28841.429 ± 6.294 ns/op LongDivMod.testRemainderUnsigned 1024 negative avgt 10 7733.038 ± 3.856 ns/op Hope this answers your question. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16394#issuecomment-1782646419 From gcao at openjdk.org Fri Oct 27 10:08:38 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 27 Oct 2023 10:08:38 GMT Subject: RFR: 8318953: RISC-V: Small refactoring for MacroAssembler::test_bit [v2] In-Reply-To: References: <1k89pNxSnBzSLvOh070gs1_CwdrLxu9oEjpgwPM3XFo=.341da50d-6596-4f84-9771-bf03be86f6cf@github.com> Message-ID: On Fri, 27 Oct 2023 08:29:33 GMT, Vladimir Kempik wrote: > Hello, do you have plans to backport this to 21u and 17u afterwards ? Yes, we'll do a backport later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16391#issuecomment-1782647174 From jvernee at openjdk.org Fri Oct 27 10:10:56 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 27 Oct 2023 10:10:56 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v61] In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 10:35:29 GMT, Roman Kennke wrote: >> See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. >> >> Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. >> >> Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16.
>> >> Testing: >> - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] tier1 (x86_64, x86_32, aarch64, riscv) >> - [x] tier2 (x86_64, aarch64, riscv) >> - [x] tier3 (x86_64, riscv) > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits: > > - Merge branch 'master' into JDK-8139457 > - Fix ARM build > - Merge remote-tracking branch 'upstream/master' into JDK-8139457 > - Various cleanups > - RISC changes > - Move gap init into allocate_header() (x86) > - Fix gtest failure on x86 > - Merge remote-tracking branch 'upstream/master' into JDK-8139457 > - Fix comments > - Fix inconsistencies in argument naming C1_MacroAssembler::allocate_array() > - ... and 80 more: https://git.openjdk.org/jdk/compare/9bfa0829...7eaca124 Just to check: object headers of arrays are still aligned to 8-bytes, and only the elements alignment of e.g. a `byte[]` is 4? If that's the case, than someone can always get back to 8-byte alignment simply by adding 4 bytes of offset when doing the access. Probably not great to rely on that though. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1782650313 From qamai at openjdk.org Fri Oct 27 10:26:43 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 27 Oct 2023 10:26:43 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v12] In-Reply-To: References: <6Ywn5vjecuN2UtqWT6IdM_EyK4xP4foammAeoJCvxx8=.2ab7ba6f-fb37-4620-96b1-9e39b4e939e4@github.com> Message-ID: On Fri, 27 Oct 2023 03:49:17 GMT, Yasumasa Suenaga wrote: >> src/hotspot/cpu/x86/downcallLinker_x86_64.cpp line 110: >> >>> 108: __ mov(rsp, r12); // restore sp >>> 109: __ reinit_heapbase(); >>> 110: } >> >> This is a minor cleanup to share this code for the three use sites below. > > Question: `r12` does not need to remember? 
> > According to [CallingSequences in OpenJDK Wiki](https://wiki.openjdk.org/display/HotSpot/CallingSequences), `r12` may be reserved for HeapBase if COOP is enabled. > (`r12` is also used in another places in downcallLinker_x86_64.cpp without restoring...) @YaSuenag `r12` is restored in `reinit_heapbase()` if needed and no, `r12` does not need remembering because it is a constant and can be restored from somewhere else. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1374370475 From jvernee at openjdk.org Fri Oct 27 10:34:59 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 27 Oct 2023 10:34:59 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v61] In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 10:35:29 GMT, Roman Kennke wrote: >> See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. >> >> Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. >> >> Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. >> >> Testing: >> - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] tier1 (x86_64, x86_32, aarch64, riscv) >> - [x] tier2 (x86_64, aarch64, riscv) >> - [x] tier3 (x86_64, riscv) > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 90 commits: > > - Merge branch 'master' into JDK-8139457 > - Fix ARM build > - Merge remote-tracking branch 'upstream/master' into JDK-8139457 > - Various cleanups > - RISC changes > - Move gap init into allocate_header() (x86) > - Fix gtest failure on x86 > - Merge remote-tracking branch 'upstream/master' into JDK-8139457 > - Fix comments > - Fix inconsistencies in argument naming C1_MacroAssembler::allocate_array() > - ... and 80 more: https://git.openjdk.org/jdk/compare/9bfa0829...7eaca124 Also, I think we can handle the issues with `ByteBuffer::alignedSlice`, `ByteBuffer::alignmentOffset`, `MethodHandles::byteArrayViewVarHandle` and `MethodHandles::byteBufferViewVarHandle` separately, ahead of this PR. There is already a bug in the spec of these methods, as they try to give alignment guarantees for heap memory that the VM spec does not support. I've filed a couple of JBS issues for these: https://bugs.openjdk.org/browse/JDK-8318967 & https://bugs.openjdk.org/browse/JDK-8318966 ------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1782681427 From ogillespie at openjdk.org Fri Oct 27 11:00:48 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Fri, 27 Oct 2023 11:00:48 GMT Subject: RFR: 8315559: Dacapo pmd regressions 5-20% on all platforms in 22-b11 Message-ID: Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. 
At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. When concurrent symbol table cleanup runs, it also drains the queue. In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. Thanks @shipilev and @coleenp for helping with this fix. ------------- Commit messages: - Delay temp symbol cleanup to reduce churn Changes: https://git.openjdk.org/jdk/pull/16398/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315559 Stats: 64 lines in 2 files changed: 64 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16398/head:pull/16398 PR: https://git.openjdk.org/jdk/pull/16398 From aph at openjdk.org Fri Oct 27 11:12:21 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Oct 2023 11:12:21 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v17] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. 
> > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Move IEE subnormal check to globalDefinitions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10661/files - new: https://git.openjdk.org/jdk/pull/10661/files/bd51efde..30a1381d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=15-16 Stats: 67 lines in 6 files changed: 33 ins; 30 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From dnsimon at openjdk.org Fri Oct 27 11:20:55 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 27 Oct 2023 11:20:55 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads Message-ID: This PR reduces the context in which a libjvmci CompilerThread can make a Java call. By default, a CompileThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. 
The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. ------------- Commit messages: - [skip ci] update CompilerThread::can_call_java to be false more often for libjvmci Changes: https://git.openjdk.org/jdk/pull/16383/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16383&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318694 Stats: 116 lines in 7 files changed: 91 ins; 10 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/16383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16383/head:pull/16383 PR: https://git.openjdk.org/jdk/pull/16383 From dnsimon at openjdk.org Fri Oct 27 11:20:57 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 27 Oct 2023 11:20:57 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 17:39:46 GMT, Doug Simon wrote: > This PR reduces the context in which a libjvmci CompilerThread can make a Java call. By default, a CompileThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). 
We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 224: > 222: jint __throw_res = env->ThrowNew(JNIJVMCI::name::clazz(), msg); \ > 223: if (__throw_res != JNI_OK) { \ > 224: tty->print_cr("Throwing " #name " in " caller " returned %d", __throw_res); \ The VM should prefer event logging over printing to the console. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 583: > 581: > 582: C2V_VMENTRY_NULL(jobject, lookupType, (JNIEnv* env, jobject, jstring jname, ARGUMENT_PAIR(accessing_klass), jint accessing_klass_loader, jboolean resolve)) > 583: CompilerThreadCanCallJava canCallJava(thread, resolve); // Resolution requires Java calls This is currently required by libgraal - it may be fixable in future. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 597: > 595: if (strstr(val, "") != nullptr) { > 596: tty->print_cr("CompilerToVM.lookupType: %s", str); > 597: } else if (strstr(str, val) != nullptr) { This fixes an existing bug: the test is meant to be whether `val` is a substring of `str`, not the other way around. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16383#discussion_r1373521606 PR Review Comment: https://git.openjdk.org/jdk/pull/16383#discussion_r1373846457 PR Review Comment: https://git.openjdk.org/jdk/pull/16383#discussion_r1373522668 From vkempik at openjdk.org Fri Oct 27 11:27:30 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Fri, 27 Oct 2023 11:27:30 GMT Subject: RFR: 8318225: RISC-V: C2 UModI In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 09:15:11 GMT, Hamlin Li wrote: > Hi, > Can you review the change to add intrinsic for UModI and UModL? > ( This is a quite similar patch to https://github.com/openjdk/jdk/pull/16346, which addresses UDivI and UDivL, so for the performance consideration please also check the discussion in that pr. ) > Thanks! 
>
>
> ## Tests
>
> ### Functionality
> Run tests successfully found via `grep -nr test/jdk/ -we remainderUnsigned` and `grep -nr test/hotspot/ -we remainderUnsigned`
>
> ### Performance
>
> #### Long
> **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case**
>
> **Before**
>
> LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 21222.911 ± 57.735 ns/op
> LongDivMod.testRemainderUnsigned 1024 positive avgt 10 28841.429 ± 6.294 ns/op
> LongDivMod.testRemainderUnsigned 1024 negative avgt 10 7733.038 ± 3.856 ns/op
>
>
> **After**
>
> LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 22666.448 ± 34.986 ns/op
> LongDivMod.testRemainderUnsigned 1024 positive avgt 10 15967.846 ± 24.805 ns/op
> LongDivMod.testRemainderUnsigned 1024 negative avgt 10 29507.865 ± 20.593 ns/op
>
>
> #### Integer
> **Before**
>
> IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23396.475 ± 24.065 ns/op
> IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16796.796 ± 3.389 ns/op
> IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30159.407 ± 6.716 ns/op
>
>
> **After**
>
> IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23216.710 ± 14.351 ns/op
> IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16621.374 ± 3.203 ns/op
> IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30002.088 ± 41.212 ns/op

So, you win 45% on
LongDivMod.testRemainderUnsigned 1024 positive
but lose almost 4x on
LongDivMod.testRemainderUnsigned 1024 negative

Does this intrinsic (for LONGs) make sense at all, or is it better to leave it as is?
-------------

PR Comment: https://git.openjdk.org/jdk/pull/16394#issuecomment-1782747969

From mdoerr at openjdk.org Fri Oct 27 11:31:40 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 27 Oct 2023 11:31:40 GMT
Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v12]
In-Reply-To:
References: <6Ywn5vjecuN2UtqWT6IdM_EyK4xP4foammAeoJCvxx8=.2ab7ba6f-fb37-4620-96b1-9e39b4e939e4@github.com>
Message-ID:

On Fri, 27 Oct 2023 10:23:43 GMT, Quan Anh Mai wrote:

>> Question: `r12` does not need to remember?
>>
>> According to [CallingSequences in OpenJDK Wiki](https://wiki.openjdk.org/display/HotSpot/CallingSequences), `r12` may be reserved for HeapBase if COOP is enabled.
>> (`r12` is also used in another places in downcallLinker_x86_64.cpp without restoring...)
>
> @YaSuenag `r12` is restored in `reinit_heapbase()` if needed and no, `r12` does not need remembering because it is a constant and can be restored from somewhere else.

I think your code is fine. Restoring `r12_heapbase` at this point is not bad because `runtime_call` is only for slow paths. I don't think it should be moved.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1374436854

From omikhaltcova at openjdk.org Fri Oct 27 11:42:40 2023
From: omikhaltcova at openjdk.org (Olga Mikhaltsova)
Date: Fri, 27 Oct 2023 11:42:40 GMT
Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics
Message-ID:

Please review this implementation of the roundD/roundF intrinsics for the RISC-V platform.

As shown below, the output of the RISC-V instructions and the Java methods differs only for a NaN argument.

| | RISC-V (FCVT.W.S) | RISC-V (FCVT.L.D) | Java (long round(double a)) | Java (int round(float a)) |
|---|---|---|---|---|
| Minimum valid input (after rounding) | -2^31 | -2^63 | Long.MIN_VALUE | Integer.MIN_VALUE |
| Maximum valid input (after rounding) | 2^31 - 1 | 2^63 - 1 | Long.MAX_VALUE | Integer.MAX_VALUE |
| Output for out-of-range negative input | -2^31 | -2^63 | Long.MIN_VALUE | Integer.MIN_VALUE |
| Output for -∞ | -2^31 | -2^63 | Long.MIN_VALUE | Integer.MIN_VALUE |
| Output for out-of-range positive input | 2^31 - 1 | 2^63 - 1 | Long.MAX_VALUE | Integer.MAX_VALUE |
| Output for +∞ | 2^31 - 1 | 2^63 - 1 | Long.MAX_VALUE | Integer.MAX_VALUE |
| Output for NaN | 2^31 - 1 | 2^63 - 1 | 0 | 0 |

The benchmark shows the following performance improvement:

**Before**

Benchmark (TESTSIZE) Mode Cnt Score Error Units
FpRoundingBenchmark.test_round_double 2048 thrpt 15 4.675 ± 0.259 ops/ms
FpRoundingBenchmark.test_round_float 2048 thrpt 15 4.549 ± 0.210 ops/ms

**After**

Benchmark (TESTSIZE) Mode Cnt Score Error Units
FpRoundingBenchmark.test_round_double 2048 thrpt 15 10.483 ± 0.681 ops/ms
FpRoundingBenchmark.test_round_float 2048 thrpt 15 10.475 ± 0.480 ops/ms

Testing: tier1 tests successfully passed on a RISC-V HiFive board with Linux.

-------------

Commit messages:
 - 8318158: RISC-V: implement roundD/roundF intrinsics

Changes: https://git.openjdk.org/jdk/pull/16382/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16382&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8318158
Stats: 31 lines in 3 files changed: 25 ins; 0 del; 6 mod
Patch: https://git.openjdk.org/jdk/pull/16382.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/16382/head:pull/16382

PR: https://git.openjdk.org/jdk/pull/16382

From stuefe at openjdk.org Fri Oct 27 11:46:30 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 27 Oct 2023 11:46:30 GMT
Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information
In-Reply-To:
References:
Message-ID:

On Thu, 26 Oct 2023 17:09:04 GMT, Thomas Stuefe wrote:

>> MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content.
_canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. >> >> We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. >> >> As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. > > Yes, let it blow up then. > @tstuefe: I helped analyzing this problem by writing a plain c test pgm mmaping a page and trying to read beyound. On AIX as expected the very next byte after the requested region leads to a segmentation violation, but on linux (both flavours, linuxintel and linuxppc64) I was able to read exactly 20 KB beyond, before running into segmentation violation. This might be the reason, why the developer of print_pointer_information() was not aware that he creates code that could crash. Thomas, do you have an idea why linux (and maybe other platforms) map more memory as requested? It has nothing to do with memory pages. The additional memory does not end at the next memory page boundary, but exactly 20KB after the end of the requested region. Astonishing is that at the lower end of the region there is no extra memory accessible. Plain bad luck and rare test execution. Whether or not you can read over the end of an mmaped segment depends on whether there are VMAs mapped beyond that. Linux kernel clusters VMAs to keep VMA fragmentation down. So you may have adjacent mappings. This is subject to ASLR of course but I always see the VMAs pretty much clustered regardless of ASLR. And though this sounds like it should be random, you can have a pretty consistent order of VMA allocation across many VM runs, and therefore similarly looking process memory maps. 
Just looking at /proc/pid/maps will be probably clearer to you than me explaining it :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16381#issuecomment-1782772667 From aph at openjdk.org Fri Oct 27 11:51:47 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Oct 2023 11:51:47 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v16] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Thu, 26 Oct 2023 17:42:39 GMT, Vladimir Ivanov wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove accidental include > > make/test/JtregNativeHotspot.gmk line 854: > >> 852: BUILD_HOTSPOT_JTREG_EXECUTABLES_LIBS_exeFPRegs := -ldl >> 853: BUILD_HOTSPOT_JTREG_LIBRARIES_LIBS_libAsyncGetCallTraceTest := -ldl >> 854: BUILD_HOTSPOT_JTREG_LIBRARIES_LDFLAGS_libfast-math := -ffast-math > > Is the flag redundant by now? The test explicitly works with corresponding platform-specific registers. it's belt-and braces (or belt-and-suspenders in American). This way, we can still do something useful for platforms not fully supported by mainline. I don't want to add ifdefs for every platform to the test. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10661#discussion_r1374458024 From aph at openjdk.org Fri Oct 27 11:51:44 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Oct 2023 11:51:44 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v15] In-Reply-To: <3pqaeZsqg2XlpZvNMJ9qjAUq5PLPnOR3wCrDesXsvHM=.c6a7de5f-c93c-435a-8fed-78fe5abab22b@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <3pqaeZsqg2XlpZvNMJ9qjAUq5PLPnOR3wCrDesXsvHM=.c6a7de5f-c93c-435a-8fed-78fe5abab22b@github.com> Message-ID: On Thu, 26 Oct 2023 15:59:08 GMT, Thomas Stuefe wrote: > One more thought, it would be good to add the FTZ_mode_enabled check to `os::run_periodic_checks()`. > > We already do signal handler checks there, and it is the right place to check for "global things third party native code may mess up". It runs when one uses `-XX:CheckJNICalls`. If a native library messes with fenv, one will get a delayed assertion, with a hs-err file that lists all the shared objects. That's a terrific idea, but maybe it'd want a CSR? ------------- PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1782780145 From stuefe at openjdk.org Fri Oct 27 11:55:33 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 27 Oct 2023 11:55:33 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 11:43:28 GMT, Thomas Stuefe wrote: > > @tstuefe: I helped analyzing this problem by writing a plain c test pgm mmaping a page and trying to read beyound. On AIX as expected the very next byte after the requested region leads to a segmentation violation, but on linux (both flavours, linuxintel and linuxppc64) I was able to read exactly 20 KB beyond, before running into segmentation violation. 
This might be the reason, why the developer of print_pointer_information() was not aware that he creates code that could crash. Thomas, do you have an idea why linux (and maybe other platforms) map more memory as requested? It has nothing to do with memory pages. The additional memory does not end at the next memory page boundary, but exactly 20KB after the end of the requested region. Astonishing is that at the lower end of the region there is no extra memory accessible. > > Plain bad luck and rare test execution. > > Whether or not you can read over the end of an mmaped segment depends on whether there are VMAs mapped beyond that. Linux kernel clusters VMAs to keep VMA fragmentation down. So you may have adjacent mappings. This is subject to ASLR of course but I always see the VMAs pretty much clustered regardless of ASLR. And though this sounds like it should be random, you can have a pretty consistent order of VMA allocation across many VM runs, and therefore similarly looking process memory maps. > > Just looking at /proc/pid/maps will be probably clearer to you than me explaining it :) Just read your answer and I see you wrote a little test program that shows that behavior, probably single threaded. The same explanation applies here though, unless there is a simple bug. For instance, if - after the mmap - you do something that allocates C-heap, the libc may allocate a new arena with mmap and place it just beyond your mmaped region. A lot of libc functions need C-heap under the hood (e.g. calling C assert() will allocate C-heap to assemble the assert line). 
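A quick way to see what sits directly behind a mapping is to parse `/proc/self/maps`. The following is a minimal Java sketch, assuming Linux and the usual `start-end perms offset dev inode [path]` line layout; the class and method names are hypothetical. Note that an adjacent mapping only makes a read succeed if its permissions allow it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class AdjacentMappings {
    // Parses the leading "start-end" hex range of one /proc/<pid>/maps line.
    static long[] parseRange(String line) {
        String range = line.substring(0, line.indexOf(' '));
        int dash = range.indexOf('-');
        return new long[] {
            Long.parseUnsignedLong(range.substring(0, dash), 16),
            Long.parseUnsignedLong(range.substring(dash + 1), 16)
        };
    }

    // True if some mapping in 'lines' starts exactly at address 'end',
    // i.e. the address space right behind 'end' is occupied.
    static boolean hasMappingStartingAt(List<String> lines, long end) {
        for (String line : lines) {
            if (parseRange(line)[0] == end) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of("/proc/self/maps"));
        for (String line : lines) {
            if (hasMappingStartingAt(lines, parseRange(line)[1])) {
                System.out.println("followed directly by another mapping: " + line);
            }
        }
    }
}
```

Running it a few times gives a feel for how often mappings end up back to back in a given process.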
------------- PR Comment: https://git.openjdk.org/jdk/pull/16381#issuecomment-1782784646 From aph at openjdk.org Fri Oct 27 11:59:59 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Oct 2023 11:59:59 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v18] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <6BxIuSlNcRQh3Lb2XhvRh0UejTh0AHa8wRLGNi7nQWI=.2ce874d6-475a-4044-9b1c-e9806484b760@github.com> > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. 
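The property at risk can be probed from Java. This is a minimal sketch with a hypothetical helper, not the check added by this PR: with IEEE 754 gradual underflow intact, halving the smallest normal double gives a nonzero subnormal, whereas with denormals disabled in the floating-point control word the result would collapse to zero. A JIT compiler may still fold such arithmetic at compile time, so treat it as an illustration only.

```java
final class SubnormalProbe {
    // The operand is a parameter so that javac cannot fold the arithmetic
    // into a constant at compile time.
    static boolean subnormalsPreserved(double smallestNormal) {
        double half = smallestNormal / 2;   // a subnormal under IEEE 754
        return half > 0.0 && half * 2 == smallestNormal;
    }

    public static void main(String[] args) {
        // Prints true when subnormal arithmetic behaves as Java requires.
        System.out.println(subnormalsPreserved(Double.MIN_NORMAL));
    }
}
```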
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Fix header ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10661/files - new: https://git.openjdk.org/jdk/pull/10661/files/30a1381d..53388f97 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=16-17 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From fjiang at openjdk.org Fri Oct 27 12:11:34 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 27 Oct 2023 12:11:34 GMT Subject: RFR: 8318953: RISC-V: Small refactoring for MacroAssembler::test_bit [v2] In-Reply-To: <1k89pNxSnBzSLvOh070gs1_CwdrLxu9oEjpgwPM3XFo=.341da50d-6596-4f84-9771-bf03be86f6cf@github.com> References: <1k89pNxSnBzSLvOh070gs1_CwdrLxu9oEjpgwPM3XFo=.341da50d-6596-4f84-9771-bf03be86f6cf@github.com> Message-ID: On Fri, 27 Oct 2023 07:30:03 GMT, Gui Cao wrote: >> Hi, The current test_bit assembly function needs to accept a temporary register because it needs one if it goes to the andi else branch. However, in this case we can actually avoid calling andi and accomplish the same thing by logically shifting to the right and testing the lowest bit. The advantage is that it makes the test_bit function much simpler. Also, to reduce the number of instructions in a given case (consider the mv function), mv actually calls the li function, which generates more than one instruction when the parameter imm exceeds the 32-bit range. 
>> https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L2009-L2017 >> https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp#L730 >> https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L804-L840 >> >> ### Testing: >> qemu 8.1.50: >> - [ ] Tier1 tests (fastdebug) >> - [ ] Tier2 tests (release) >> - [ ] Tier3 tests (release) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use and_imm12 to replace andi in test_bit Looks good, thanks! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/16391#pullrequestreview-1701637222 From simonis at openjdk.org Fri Oct 27 12:13:39 2023 From: simonis at openjdk.org (Volker Simonis) Date: Fri, 27 Oct 2023 12:13:39 GMT Subject: Integrated: 8318811: Compiler directives parser swallows a character after line comments In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 11:46:10 GMT, Volker Simonis wrote: > Currently, the following valid compiler directive file: > > [{ > match: "*::*", > c2: { Exclude: true } // c1 only for startup > }] > > will be rejected by the parser: > > Syntax error on line 4 byte 2: Expected value separator or object end (one of ',}'). > At ']'. > }] > > Parsing of compiler directives failed > > > This is because `JSON::skip_line_comment()`, in contradiction to its specification, does **not** "*return the first token after the line comment without consuming it*" but does consumes it. > > The fix is trivial: > > --- a/src/hotspot/share/utilities/json.cpp > +++ b/src/hotspot/share/utilities/json.cpp > @@ -580,7 +580,7 @@ u_char JSON::skip_line_comment() { > return 0; > } > next(); > - return next(); > + return peek(); > } This pull request has now been integrated. 
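The difference between the two return statements in the quoted fix can be reproduced with a toy scanner. This is a minimal Java sketch, not the HotSpot `JSON` class: it assumes `next()` consumes and returns the current character and `peek()` returns it without consuming, and all names are illustrative.

```java
final class LineCommentScanner {
    private final String src;
    private int pos;

    LineCommentScanner(String src) {
        this.src = src;
    }

    // Returns the current character without consuming it ('\0' at end of input).
    char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    // Consumes and returns the current character ('\0' at end of input).
    char next() {
        char c = peek();
        if (c != '\0') {
            pos++;
        }
        return c;
    }

    // Skips the rest of a line comment. With consumeFollowing == true this
    // mimics the old behaviour (the character after the comment is eaten);
    // with false it mimics the fixed behaviour.
    char skipLineComment(boolean consumeFollowing) {
        while (peek() != '\n' && peek() != '\0') {
            next();                       // advance to the end of the line
        }
        if (peek() == '\0') {
            return '\0';
        }
        next();                           // consume the newline itself
        return consumeFollowing ? next() : peek();
    }

    public static void main(String[] args) {
        LineCommentScanner fixed = new LineCommentScanner("// c1 only for startup\n}]");
        fixed.skipLineComment(false);
        System.out.println(fixed.peek()); // prints }  (still available to the parser)

        LineCommentScanner old = new LineCommentScanner("// c1 only for startup\n}]");
        old.skipLineComment(true);
        System.out.println(old.peek());   // prints ]  (the } was swallowed)
    }
}
```

The second case matches the reported error, where the parser complains at `]` because the closing `}` has already been consumed.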
Changeset: 141dae8b Author: Volker Simonis URL: https://git.openjdk.org/jdk/commit/141dae8b76d41accfa02a0250a1c24364cbf6f25 Stats: 21 lines in 2 files changed: 20 ins; 0 del; 1 mod 8318811: Compiler directives parser swallows a character after line comments Reviewed-by: shade, phh ------------- PR: https://git.openjdk.org/jdk/pull/16359 From duke at openjdk.org Fri Oct 27 12:30:45 2023 From: duke at openjdk.org (Thomas Obermeier) Date: Fri, 27 Oct 2023 12:30:45 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information [v3] In-Reply-To: References: Message-ID: > MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. > > We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. > > As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. 
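The out-of-bounds arithmetic from the description can be worked through with plain numbers. A minimal Java sketch follows, modelling addresses as `long` values; the 14-byte offset comes from the description, while the 2-byte canary size and all names are assumptions made for illustration.

```java
final class CanaryBounds {
    static final long CANARY_OFFSET = 14; // offset stated in the description
    static final long CANARY_SIZE = 2;    // assumption: a 16-bit canary field

    static long alignDown8(long addr) {
        return addr & ~7L;
    }

    // True if the canary of a header assumed to start at alignDown8(ptr)
    // lies entirely inside the region [regionStart, regionEnd).
    static boolean canaryReadable(long ptr, long regionStart, long regionEnd) {
        long canary = alignDown8(ptr) + CANARY_OFFSET;
        return canary >= regionStart && canary + CANARY_SIZE <= regionEnd;
    }

    public static void main(String[] args) {
        long start = 0x10000L;
        long end = 0x20000L;
        // Highest address in the region: aligned down to 0x1FFF8, the canary
        // would sit at 0x20006, which is past the end of the region.
        System.out.println(canaryReadable(end - 1, start, end)); // false
        System.out.println(canaryReadable(start, start, end));   // true
    }
}
```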
Thomas Obermeier has updated the pull request incrementally with one additional commit since the last revision: 8306561: forgot to remove include at revert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16381/files - new: https://git.openjdk.org/jdk/pull/16381/files/1a078ab5..8df7c091 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16381&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16381&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16381.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16381/head:pull/16381 PR: https://git.openjdk.org/jdk/pull/16381 From jkern at openjdk.org Fri Oct 27 12:39:32 2023 From: jkern at openjdk.org (Joachim Kern) Date: Fri, 27 Oct 2023 12:39:32 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 11:52:50 GMT, Thomas Stuefe wrote: >>> @tstuefe: I helped analyze this problem by writing a plain C test program that mmaps a page and tries to read beyond it. On AIX, as expected, the very next byte after the requested region leads to a segmentation violation, but on Linux (both flavours, linuxintel and linuxppc64) I was able to read exactly 20 KB beyond, before running into a segmentation violation. This might be the reason why the developer of print_pointer_information() was not aware that he was creating code that could crash. Thomas, do you have an idea why Linux (and maybe other platforms) maps more memory than requested? It has nothing to do with memory pages. The additional memory does not end at the next memory page boundary, but exactly 20 KB after the end of the requested region. What is astonishing is that at the lower end of the region there is no extra memory accessible. >> >> Plain bad luck and rare test execution. >> >> Whether or not you can read over the end of an mmaped segment depends on whether there are VMAs mapped beyond that.
Linux kernel clusters VMAs to keep VMA fragmentation down. So you may have adjacent mappings. This is subject to ASLR of course but I always see the VMAs pretty much clustered regardless of ASLR. And though this sounds like it should be random, you can have a pretty consistent order of VMA allocation across many VM runs, and therefore similar-looking process memory maps. >> >> Just looking at /proc/pid/maps will probably be clearer to you than me explaining it :) > >> > @tstuefe: I helped analyze this problem by writing a plain C test program that mmaps a page and tries to read beyond it. On AIX, as expected, the very next byte after the requested region leads to a segmentation violation, but on Linux (both flavours, linuxintel and linuxppc64) I was able to read exactly 20 KB beyond, before running into a segmentation violation. This might be the reason why the developer of print_pointer_information() was not aware that he was creating code that could crash. Thomas, do you have an idea why Linux (and maybe other platforms) maps more memory than requested? It has nothing to do with memory pages. The additional memory does not end at the next memory page boundary, but exactly 20 KB after the end of the requested region. What is astonishing is that at the lower end of the region there is no extra memory accessible. >> >> Plain bad luck and rare test execution. >> >> Whether or not you can read over the end of an mmaped segment depends on whether there are VMAs mapped beyond that. Linux kernel clusters VMAs to keep VMA fragmentation down. So you may have adjacent mappings. This is subject to ASLR of course but I always see the VMAs pretty much clustered regardless of ASLR. And though this sounds like it should be random, you can have a pretty consistent order of VMA allocation across many VM runs, and therefore similar-looking process memory maps.
>> >> Just looking at /proc/pid/maps will be probably clearer to you than me explaining it :) > > Just read your answer and I see you wrote a little test program that shows that behavior, probably single threaded. The same explanation applies here though, unless there is a simple bug. > > For instance, if - after the mmap - you do something that allocates C-heap, the libc may allocate a new arena with mmap and place it just beyond your mmaped region. A lot of libc functions need C-heap under the hood (e.g. calling C assert() will allocate C-heap to assemble the assert line). @tstuefe: Meanwhile I had also asked Klaus M?ser who gave me a similar explanation and we both together examined /proc/pid/maps to understand that a runtime library had allocated the memory beyond my mmap region. Klaus also told me that AIX has an mmap alignment of 256GB, so it is very likely that the addresses beyond and below your mapped region are not used, explaining the segmentation violation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16381#issuecomment-1782843697 From stuefe at openjdk.org Fri Oct 27 12:43:46 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 27 Oct 2023 12:43:46 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v15] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <3pqaeZsqg2XlpZvNMJ9qjAUq5PLPnOR3wCrDesXsvHM=.c6a7de5f-c93c-435a-8fed-78fe5abab22b@github.com> Message-ID: <-l90x8BnvO4iPfXfjKqqLRqA1U-J3soXjgjeRFSPTd4=.d74a7085-9506-4b98-8295-1733d3947720@github.com> On Fri, 27 Oct 2023 11:49:16 GMT, Andrew Haley wrote: > > One more thought, it would be good to add the FTZ_mode_enabled check to `os::run_periodic_checks()`. > > We already do signal handler checks there, and it is the right place to check for "global things third party native code may mess up". It runs when one uses `-XX:CheckJNICalls`. 
If a native library messes with fenv, one will get a delayed assertion, with a hs-err file that lists all the shared objects. > > That's a terrific idea, but maybe it'd want a CSR? Maybe. CheckJNICalls is a product switch and its behavior has not changed since initial check-in AFAICS. Since CSR-freeze is close for JDK 22, maybe this is better as a separate RFE. If you want, I can take care of it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1782849255 From duke at openjdk.org Fri Oct 27 12:44:49 2023 From: duke at openjdk.org (Thomas Obermeier) Date: Fri, 27 Oct 2023 12:44:49 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information [v4] In-Reply-To: References: Message-ID: > MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. > > We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. > > As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. Thomas Obermeier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into JDK-8306561 - 8306561: forgot to remove include at revert - 8306561: move solution to caller - 8306561: copyright and problem listing - JDK-8306561: test canary ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16381/files - new: https://git.openjdk.org/jdk/pull/16381/files/8df7c091..ddd8661b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16381&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16381&range=02-03 Stats: 9523 lines in 818 files changed: 4241 ins; 2070 del; 3212 mod Patch: https://git.openjdk.org/jdk/pull/16381.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16381/head:pull/16381 PR: https://git.openjdk.org/jdk/pull/16381 From stuefe at openjdk.org Fri Oct 27 12:51:37 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 27 Oct 2023 12:51:37 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information [v4] In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 12:44:49 GMT, Thomas Obermeier wrote: >> MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. >> >> We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. 
>> >> As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. > > Thomas Obermeier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into JDK-8306561 > - 8306561: forgot to remove include at revert > - 8306561: move solution to caller > - 8306561: copyright and problem listing > - JDK-8306561: test canary Okay (if tests are green). ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16381#pullrequestreview-1701715239 From clanger at openjdk.org Fri Oct 27 13:21:38 2023 From: clanger at openjdk.org (Christoph Langer) Date: Fri, 27 Oct 2023 13:21:38 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information [v4] In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 12:44:49 GMT, Thomas Obermeier wrote: >> MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. >> >> We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. >> >> As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. 
> > Thomas Obermeier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into JDK-8306561 > - 8306561: forgot to remove include at revert > - 8306561: move solution to caller > - 8306561: copyright and problem listing > - JDK-8306561: test canary LGTM but wait for the nightly testing results. src/hotspot/share/nmt/mallocHeader.inline.hpp line 3: > 1: /* > 2: * Copyright (c) 2014, 2023, Oracle and/or its affiliates. All rights reserved. > 3: * Copyright (c) 2021, 2023 SAP SE. All rights reserved. The copyright change in this file is obsolete now. ------------- Marked as reviewed by clanger (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16381#pullrequestreview-1701770444 PR Review Comment: https://git.openjdk.org/jdk/pull/16381#discussion_r1374563541 From aph at openjdk.org Fri Oct 27 13:39:44 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Oct 2023 13:39:44 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v15] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Thu, 26 Oct 2023 15:41:35 GMT, Thomas Stuefe wrote: > This looks good to me. > > One suggestion: to reduce code duplication and to make the code a bit safer against accidental returns prior to fesetenv, I would have used a mark object like this: Thanks. I take your point, but I think that might be something of a premature generalization right now. It would further complicate the code. > About the dlopen calls in the JDK, at SAP we were faced with similar problems for other libc APIs (how to apply a fix to all of them). Some of these issues we solved by redirecting all calls to libjvm. 
Others we solved manually, in-place, with a lot of duplication. None of these sound appealing, but I like the redirect-to-libjvm route somewhat, if Oracle can be convinced. > > A third option would be to use an interposition library with LD_PRELOAD. One that overwrites dlopen and redirects to the real one. I don't see this to be a practical solution but it may be valid for testing. Thanks, interesting stuff. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1782933843 From matsaave at openjdk.org Fri Oct 27 13:56:00 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 27 Oct 2023 13:56:00 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v5] In-Reply-To: References: Message-ID: <9upaBNtcWpIBCafV9sVSVV3f7ZYGnkbIC1Zl5uZ8NTA=.4e617a4d-ebbd-48f8-adff-364d8605d21c@github.com> > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1.
ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64 Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Coleen and Fei comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15455/files - new: https://git.openjdk.org/jdk/pull/15455/files/cd867fa9..ff7b3db9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=03-04 Stats: 23 lines in 6 files changed: 1 ins; 6 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/15455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15455/head:pull/15455 PR: https://git.openjdk.org/jdk/pull/15455 From duke at openjdk.org Fri Oct 27 13:57:56 2023 From: duke at openjdk.org (Thomas Obermeier) Date: Fri, 27 Oct 2023 13:57:56 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information [v4] In-Reply-To: References: Message-ID: <7k7nxijePhWhaTz7COYA9HHcoVy-00Gj4dZSW3WSEb4=.2c3b94ab-ce84-4cbd-8ee6-78173276d791@github.com> On Fri, 27 Oct 2023 13:17:44 GMT, Christoph Langer wrote: >> Thomas Obermeier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/master' into JDK-8306561 >> - 8306561: forgot to remove include at revert >> - 8306561: move solution to caller >> - 8306561: copyright and problem listing >> - JDK-8306561: test canary > > src/hotspot/share/nmt/mallocHeader.inline.hpp line 3: > >> 1: /* >> 2: * Copyright (c) 2014, 2023, Oracle and/or its affiliates. All rights reserved. 
>> 3: * Copyright (c) 2021, 2023 SAP SE. All rights reserved. > > The copyright change in this file is obsolete now. done, thanks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16381#discussion_r1374621745 From duke at openjdk.org Fri Oct 27 13:57:53 2023 From: duke at openjdk.org (Thomas Obermeier) Date: Fri, 27 Oct 2023 13:57:53 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information [v5] In-Reply-To: References: Message-ID: > MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. > > We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. > > As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. 
Thomas Obermeier has updated the pull request incrementally with one additional commit since the last revision: Update mallocHeader.inline.hpp - revert obsolete copyright change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16381/files - new: https://git.openjdk.org/jdk/pull/16381/files/ddd8661b..c831830d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16381&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16381&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16381.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16381/head:pull/16381 PR: https://git.openjdk.org/jdk/pull/16381 From ayang at openjdk.org Fri Oct 27 14:03:37 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 27 Oct 2023 14:03:37 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v33] In-Reply-To: <2VxMwwQKNjq7EeXnyQ7fo_aXYj2EGCJyW2MrAUkoSQs=.902af266-1470-43ed-8e4c-a8b7611cc154@github.com> References: <2VxMwwQKNjq7EeXnyQ7fo_aXYj2EGCJyW2MrAUkoSQs=.902af266-1470-43ed-8e4c-a8b7611cc154@github.com> Message-ID: <9tV7khSiXH_2_Ju1_egmea6dyYQMC6HKSmcfblg0xSw=.18b97f9d-0fda-4043-8d71-eac0e857fd19@github.com> On Thu, 26 Oct 2023 21:01:53 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Remove StringDedup from GC thread list > For tools like AHS that reads these CPU hsperf counters, they could read these CPU data every 1 to 5 seconds. Okay, these counters can be accessed frequently, but is it necessary for them to provide up-to-date information on every access? If not, what level of delay is acceptable? I assume this depends on how often AHS resizes the heap. (In the Parallel case, I believe the counters can be outdated for the duration of the full-gc.) 
My primary concern is that the change in G1 is too intrusive -- the logic for tracking/collecting thread-CPU is scattered in many places. Additionally, the situation is expected to worsen in the near future, based on the statement "I can create a separate RFE to make it update more frequently..." Also, why isn't `G1ServiceThread` part of the change? I would expect all subclasses of `ConcurrentGCThread` to be taken into account. Is this omission intentional? Finally, thread-CPU time is something tracked at the OS level, so it's a bit odd that one has to instrument the VM to get that information. > - Looking into the `/proc/` method, it also is not super intuitive as to which value corresponds to GC time, and it might be difficult to reliably identify all PIDs of the relevant GC threads. According to https://man7.org/linux/man-pages/man5/proc.5.html, "(14) utime" + "(15) stime %lu" == thread-cpu-time. `cat /proc//task/*/stat` lists all VM internal threads, including GC, JIT, and etc. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1782969884 From aph at openjdk.org Fri Oct 27 14:22:36 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Oct 2023 14:22:36 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v9] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 16:11:04 GMT, Thomas Stuefe wrote: >> Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`. >> >> Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift. 
>>
>>
>> 8b7b62: 48 8d 05 7f 1b c3 00 lea 0xc31b7f(%rip),%rax # 14e96e8
>> 8b7b69: 0f b6 00 movzbl (%rax),%eax
>> 8b7b6c: 84 c0 test %al,%al
>> 8b7b6e: 0f 84 9c 00 00 00 je 8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
>> 8b7b74: 48 8d 15 05 62 c6 00 lea 0xc66205(%rip),%rdx # 151dd80 <_ZN23CompressedKlassPointers6_shiftE>
>> 8b7b7b: 8b 7b 08 mov 0x8(%rbx),%edi
>> 8b7b7e: 8b 0a mov (%rdx),%ecx
>> 8b7b80: 48 8d 15 01 62 c6 00 lea 0xc66201(%rip),%rdx # 151dd88 <_ZN23CompressedKlassPointers5_baseE>
>> 8b7b87: 48 d3 e7 shl %cl,%rdi
>> 8b7b8a: 48 03 3a add (%rdx),%rdi
>>
>>
>> Patched: one load loads all three. Since shift occupies the lowest 8 bits, compiled code uses an 8-bit register; ditto the UseCompressedOops flag.
>>
>>
>> 8ba302: 48 8d 05 97 9c c2 00 lea 0xc29c97(%rip),%rax # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE>
>> 8ba309: 48 8b 08 mov (%rax),%rcx
>> 8ba30c: f6 c5 01 test $0x1,%ch # use compressed klass pointers?
>> 8ba30f: 0f 84 9b 00 00 00 je 8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
>> 8ba315: 8b 7b 08 mov 0x8(%rbx),%edi
>> 8ba318: 48 d3 e7 shl %cl,%rdi # shift
>> 8ba31b: 66 31 c9 xor %cx,%cx # zero out lower 16 bits of base
>> 8ba31e: 48 01 cf add %rcx,%rdi # add base
>> 8ba321: 8b 4f 08 mov 0x8(%rdi),%ecx
>>
>> ---
>>
>> Performance measurements:
>>
>> G1, doing a full GC over a heap filled with 256 million live j.l.Object instances.
>>
>> I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ±4%. Still, in general, numbers seemed to go down rather than up.
>>
>> ---
>>
>> Future extensions:
>>
>> This patch uses the fact that the encoding base is aligned to metaspace reser...
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 16 commits: > > - Renamed _combo > - Merge branch 'master' into optimize-narrow-klass-decoding-in-c++ > - simplify assert > - add comment > - Update src/hotspot/share/oops/compressedKlass.hpp > > Co-authored-by: Aleksey Shipilëv > - Update src/hotspot/share/oops/compressedKlass.cpp > > Co-authored-by: Aleksey Shipilëv > - Update src/hotspot/share/oops/compressedKlass.cpp > > Co-authored-by: Aleksey Shipilëv > - Update src/hotspot/share/oops/compressedKlass.cpp > > Co-authored-by: Aleksey Shipilëv > - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++ > - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++ > - ... and 6 more: https://git.openjdk.org/jdk/compare/9864951d...56cde2a9 Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15389#pullrequestreview-1701914463 From luhenry at openjdk.org Fri Oct 27 14:27:31 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 27 Oct 2023 14:27:31 GMT Subject: RFR: 8318225: RISC-V: C2 UModI In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 09:15:11 GMT, Hamlin Li wrote: > Hi, > Can you review the change to add intrinsic for UModI and UModL? > ( This is a quite similar patch to https://github.com/openjdk/jdk/pull/16346, which addresses UDivI and UDivL, so for the performance consideration please also check the discussion in that pr. ) > Thanks! > > > ## Tests > > ### Functionality > Run tests successfully found via `grep -nr test/jdk/ -we remainderUnsigned` and `grep -nr test/hotspot/ -we remainderUnsigned` > > ### Performance > > #### Long > **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case** > > **Before** > > LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 21222.911 ± 57.735 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 10 28841.429 ± 6.294 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 10 7733.038 ±
3.856 ns/op > > > **After** > > LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 22666.448 ± 34.986 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 10 15967.846 ± 24.805 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 10 29507.865 ± 20.593 ns/op > > > #### Integer > **Before** > > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23396.475 ± 24.065 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16796.796 ± 3.389 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30159.407 ± 6.716 ns/op > > > **After** > > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23216.710 ± 14.351 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16621.374 ± 3.203 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30002.088 ± 41.212 ns/op Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16394#pullrequestreview-1701927677 From ysuenaga at openjdk.org Fri Oct 27 14:41:46 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Fri, 27 Oct 2023 14:41:46 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v12] In-Reply-To: References: <6Ywn5vjecuN2UtqWT6IdM_EyK4xP4foammAeoJCvxx8=.2ab7ba6f-fb37-4620-96b1-9e39b4e939e4@github.com> Message-ID: On Fri, 27 Oct 2023 11:28:29 GMT, Martin Doerr wrote: >> @YaSuenag `r12` is restored in `reinit_heapbase()` if needed and no, `r12` does not need remembering because it is a constant and can be restored from somewhere else. > > I think your code is fine. Restoring `r12_heapbase` at this point is not bad because `runtime_call` is only for slow paths. I don't think it should be moved. Thanks everyone! I understood `r12` is restored in `reinit_heapbase()`. It is necessary. I have no comments.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1374678254 From rehn at openjdk.org Fri Oct 27 14:51:32 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 27 Oct 2023 14:51:32 GMT Subject: RFR: 8318225: RISC-V: C2 UModI In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 09:15:11 GMT, Hamlin Li wrote: > Hi, > Can you review the change to add intrinsic for UModI and UModL? > ( This is a quite similar patch to https://github.com/openjdk/jdk/pull/16346, which addresses UDivI and UDivL, so for the performance consideration please also check the discussion in that pr. ) > Thanks! > > > ## Tests > > ### Functionality > Run tests successfully found via `grep -nr test/jdk/ -we remainderUnsigned` and `grep -nr test/hotspot/ -we remainderUnsigned` > > ### Performance > > #### Long > **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case** > > **Before** > > LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 21222.911 ± 57.735 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 10 28841.429 ± 6.294 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 10 7733.038 ± 3.856 ns/op > > > **After** > > LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 22666.448 ± 34.986 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 10 15967.846 ± 24.805 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 10 29507.865 ± 20.593 ns/op > > > #### Integer > **Before** > > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23396.475 ± 24.065 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16796.796 ± 3.389 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30159.407 ± 6.716 ns/op > > > **After** > > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23216.710 ± 14.351 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16621.374 ± 3.203 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30002.088 ±
41.212 ns/op As you already discussed all other things in other PR, thanks! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16394#pullrequestreview-1701975487 From aph-open at littlepinkcloud.com Fri Oct 27 14:52:30 2023 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Fri, 27 Oct 2023 15:52:30 +0100 Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v15] In-Reply-To: <-l90x8BnvO4iPfXfjKqqLRqA1U-J3soXjgjeRFSPTd4=.d74a7085-9506-4b98-8295-1733d3947720@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <3pqaeZsqg2XlpZvNMJ9qjAUq5PLPnOR3wCrDesXsvHM=.c6a7de5f-c93c-435a-8fed-78fe5abab22b@github.com> <-l90x8BnvO4iPfXfjKqqLRqA1U-J3soXjgjeRFSPTd4=.d74a7085-9506-4b98-8295-1733d3947720@github.com> Message-ID: <0ea2a93f-4751-4c9b-b619-23897eb49e3e@littlepinkcloud.com> On 10/27/23 13:43, Thomas Stuefe wrote: > On Fri, 27 Oct 2023 11:49:16 GMT, Andrew Haley wrote: > >>> One more thought, it would be good to add the FTZ_mode_enabled check to `os::run_periodic_checks()`. >>> We already do signal handler checks there, and it is the right place to check for "global things third party native code may mess up". It runs when one uses `-XX:CheckJNICalls`. If a native library messes with fenv, one will get a delayed assertion, with a hs-err file that lists all the shared objects. >> >> That's a terrific idea, but maybe it'd want a CSR? > > Maybe. CheckJNICalls is a product switch and its behavior has not changed since initial check-in AFAICS. > > Since CSR-freeze is close for JDK 22, maybe this is better as a separate RFE. If you want, I can take care of it. I think so. This patch has already suffered greatly from "Wouldn't it be nice if..." 
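As a rough illustration of what a flush-to-zero check of the kind discussed in this thread could look like, the sketch below halves the smallest normal double and tests whether the subnormal result survives. This is an assumption-laden stand-in, not the `FTZ_mode_enabled` code from the patch; `flush_to_zero_seems_enabled` is a hypothetical name, and the `volatile` qualifiers are there only to discourage compile-time folding.

```cpp
#include <limits>

// Returns true if halving the smallest normal double yields zero, which is
// what one would observe when subnormal results are being flushed to zero.
// Under default IEEE 754 handling the result is a nonzero subnormal.
inline bool flush_to_zero_seems_enabled() {
  volatile double smallest_normal = std::numeric_limits<double>::min();
  volatile double half = smallest_normal / 2.0;  // subnormal unless flushed
  return half == 0.0;
}
```

A periodic check could call such a probe and report when it starts returning true, for instance after a native library built with -ffast-math has been loaded.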
From luhenry at openjdk.org Fri Oct 27 15:09:33 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 27 Oct 2023 15:09:33 GMT Subject: RFR: 8318953: RISC-V: Small refactoring for MacroAssembler::test_bit [v2] In-Reply-To: <1k89pNxSnBzSLvOh070gs1_CwdrLxu9oEjpgwPM3XFo=.341da50d-6596-4f84-9771-bf03be86f6cf@github.com> References: <1k89pNxSnBzSLvOh070gs1_CwdrLxu9oEjpgwPM3XFo=.341da50d-6596-4f84-9771-bf03be86f6cf@github.com> Message-ID: On Fri, 27 Oct 2023 07:30:03 GMT, Gui Cao wrote: >> Hi, The current test_bit assembly function needs to accept a temporary register because it needs one if it goes to the andi else branch. However, in this case we can actually avoid calling andi and accomplish the same thing by logically shifting to the right and testing the lowest bit. The advantage is that it makes the test_bit function much simpler. Also, to reduce the number of instructions in a given case (consider the mv function), mv actually calls the li function, which generates more than one instruction when the parameter imm exceeds the 32-bit range. >> https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L2009-L2017 >> https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp#L730 >> https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L804-L840 >> >> ### Testing: >> qemu 8.1.50: >> - [ ] Tier1 tests (fastdebug) >> - [ ] Tier2 tests (release) >> - [ ] Tier3 tests (release) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use and_imm12 to replace andi in test_bit Marked as reviewed by luhenry (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/16391#pullrequestreview-1702008218 From luhenry at openjdk.org Fri Oct 27 15:21:35 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 27 Oct 2023 15:21:35 GMT Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics In-Reply-To: References: Message-ID: <5eO6y-Uz08yJut6hUzmlIX19mqnZJwmJ9QyWFegQqYs=.eb1544fa-30cd-4104-b267-92b815c93765@github.com> On Thu, 26 Oct 2023 17:20:49 GMT, Olga Mikhaltsova wrote: > Please, review this Implementation of the roundD/roundF intrinsics for RISC-V platform. > As shown below the output for RISC-V instructions and Java methods differs only for NaN argument.
>
>                                          RISC-V                   Java
>                                          (FCVT.W.S)  (FCVT.L.D)   (long round(double a))  (int round(float a))
> Minimum valid input (after rounding)     -2^31       -2^63        Long.MIN_VALUE          Integer.MIN_VALUE
> Maximum valid input (after rounding)     2^31 - 1    2^63 - 1     Long.MAX_VALUE          Integer.MAX_VALUE
> Output for out-of-range negative input   -2^31       -2^63        Long.MIN_VALUE          Integer.MIN_VALUE
> Output for -∞                            -2^31       -2^63        Long.MIN_VALUE          Integer.MIN_VALUE
> Output for out-of-range positive input   2^31 - 1    2^63 - 1     Long.MAX_VALUE          Integer.MAX_VALUE
> Output for +∞                            2^31 - 1    2^63 - 1     Long.MAX_VALUE          Integer.MAX_VALUE
> Output for NaN                           2^31 - 1    2^63 - 1     0                       0
>
> The benchmark shows the following performance improvement:
>
> **Before**
>
> Benchmark (TESTSIZE) Mode Cnt Score Error Units
> FpRoundingBenchmark.test_round_double 2048 thrpt 15 4.675 ± 0.259 ops/ms
> FpRoundingBenchmark.test_round_float 2048 thrpt 15 4.549 ± 0.210 ops/ms
>
>
> **After**
>
> Benchmark (TESTSIZE) Mode Cnt Score Error Units
> FpRoundingBenchmark.test_round_double 2048 thrpt 15 10.483 ± 0.681 ops/ms
> FpRoundingBenchmark.test_round_float 2048 thrpt 15 10.475 ± 0.480 ops/ms
>
>
> Testing: tier1 tests successfully passed on a RISC-V HiFive board with Linux.
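One way to probe whether round-to-nearest-even and a ties-toward-positive-infinity rule agree is to compare them on exact ties. The sketch below is written in C++ rather than Java and rests on two assumptions: it models the `Math.round` tie rule as `floor(x + 0.5)`, which is only meant for small inputs where `x + 0.5` is exactly representable and says nothing about the JDK implementation, and it uses `std::nearbyint` under `FE_TONEAREST` as a stand-in for the RNE rounding mode. All function names are hypothetical.

```cpp
#include <cfenv>
#include <cmath>

// Ties toward positive infinity, modelled as floor(x + 0.5).
// Assumption: only used where x + 0.5 is exactly representable.
inline double round_ties_up(double x) {
  return std::floor(x + 0.5);
}

// Round to nearest, ties to even: nearbyint honours the current rounding
// mode, so force FE_TONEAREST for the call and restore the previous mode.
inline double round_ties_even(double x) {
  const int saved = std::fegetround();
  std::fesetround(FE_TONEAREST);
  const double r = std::nearbyint(x);
  std::fesetround(saved);
  return r;
}

// True when the two tie rules produce different results for x.
inline bool tie_rules_differ(double x) {
  return round_ties_up(x) != round_ties_even(x);
}
```

Under these assumptions 0.5, 2.5 and -1.5 round differently under the two rules, while 1.5 and -2.5 agree, so a test that only samples non-tie values would not distinguish them.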
On wording, `RoundingMode::rne` says "round to Nearest, ties to Even", while `Math.round(float)` says "round to Nearest, ties to positive infinity". Are these equivalent? Do we have a test covering that? ------------- PR Review: https://git.openjdk.org/jdk/pull/16382#pullrequestreview-1702031459 From aph at openjdk.org Fri Oct 27 15:21:36 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Oct 2023 15:21:36 GMT Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics In-Reply-To: References: Message-ID: <5xO3wgLcq5NQto-WHDIQDvWiGwUmQkLenL2Gzmm-IdQ=.096879a4-c518-4ead-812f-fa2242f39fc9@github.com> On Thu, 26 Oct 2023 17:20:49 GMT, Olga Mikhaltsova wrote: > Please, review this Implementation of the roundD/roundF intrinsics for RISC-V platform. As shown below the output for RISC-V instructions and Java methods differs only for NaN argument. I doubt that. Check the result for all x in float, x < 0 && abs(x) < 0x1.0p23f ------------- PR Comment: https://git.openjdk.org/jdk/pull/16382#issuecomment-1783093253 From lmesnik at openjdk.org Fri Oct 27 16:19:30 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 27 Oct 2023 16:19:30 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions In-Reply-To: References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> Message-ID: On Fri, 27 Oct 2023 05:55:43 GMT, David Holmes wrote: >> It is still used in tests and we should ignore it as jtreg does. > > Shouldn't this code first retrieve the current default exception handler, and then check whether t is a virtual thread, and if so handle the exception as appropriate (not sure System.exit is appropriate ..). Then for non-virtual threads it delegates to the previous default handler. > > final UncaughtExceptionHandler originalUEH = Thread.getDefaultUncaughtExceptionHandler(); > Thread.setDefaultUncaughtExceptionHandler((t, e) -> { > if (t.isVirtual()) { > // ...
> } else { > originalUEH.uncaughtException(t, e); > } > }); There shouldn't be an original UHE. jtreg doesn't set it. The goal is to add this handler for all threads, not only virtual ones. Please note that the plan is to keep it only until the similar problem in jtreg is completely fixed. I thought of adding something like Thread.setDefaultUncaughtExceptionHandler((t, e) -> { UHE.exceptionThrown = e; }); ... @Override public Thread newThread(Runnable task) { return VIRTUAL_TF.newThread(() -> { task.run(); if (UHE.exceptionThrown != null) { throw new RuntimeException(UHE.exceptionThrown); } }); } So the test actually throws exceptions. It might miss exceptions from threads that finish later than the main thread, but that might be OK. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16369#discussion_r1374790707 From matsaave at openjdk.org Fri Oct 27 16:44:50 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 27 Oct 2023 16:44:50 GMT Subject: RFR: 8315890: Attempts to load from nullptr in instanceKlass.cpp and unsafe.cpp Message-ID: Calls in instanceKlass.cpp and unsafe.cpp try to call an atomic load on method calls that could return nullptr. This patch ensures that nullptr is not passed into the load. Verified with tier1-5 tests.
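Picking up the test thread factory idea from the 8318839 thread above: the following is a minimal, self-contained Java sketch of a factory that records the first uncaught exception through the default handler and rethrows it from the wrapped task. It is an assumption-laden illustration, not the actual test thread factory. All names are hypothetical, a plain `Thread::new` delegate stands in for the virtual thread factory so that the sketch does not depend on a particular JDK release, and an `AtomicReference` replaces the plain field from the mail.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical names throughout; this is not the actual test thread factory.
final class RecordingThreadFactorySketch implements ThreadFactory {

    // First uncaught exception seen by the default handler, if any.
    static final AtomicReference<Throwable> FIRST_UNCAUGHT = new AtomicReference<>();

    // Stand-in for the virtual thread factory mentioned in the discussion.
    private final ThreadFactory delegate;

    RecordingThreadFactorySketch(ThreadFactory delegate) {
        this.delegate = delegate;
    }

    // Installs a handler for all threads; it only records the first exception.
    static void installHandler() {
        Thread.setDefaultUncaughtExceptionHandler(
                (thread, error) -> FIRST_UNCAUGHT.compareAndSet(null, error));
    }

    @Override
    public Thread newThread(Runnable task) {
        return delegate.newThread(() -> {
            task.run();
            // After the task finishes, surface anything another thread left behind.
            Throwable seen = FIRST_UNCAUGHT.get();
            if (seen != null) {
                throw new RuntimeException(seen);
            }
        });
    }

    // Runs a failing helper thread, then a wrapped "main" task,
    // and returns whatever the handler recorded.
    static Throwable runDemo() {
        FIRST_UNCAUGHT.set(null);
        installHandler();
        try {
            Thread helper = new Thread(() -> {
                throw new IllegalStateException("boom in helper");
            });
            helper.start();
            helper.join();

            Thread wrappedMain = new RecordingThreadFactorySketch(Thread::new).newThread(() -> { });
            wrappedMain.start();
            wrappedMain.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return FIRST_UNCAUGHT.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

As noted in the mail, an exception from a helper thread that finishes after the wrapped task has already checked the record would still be missed by this approach.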
------------- Commit messages: - 8315890: Attempts to load from nullptr in instanceKlass.cpp and unsafe.cpp Changes: https://git.openjdk.org/jdk/pull/16405/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16405&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315890 Stats: 4 lines in 2 files changed: 2 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16405.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16405/head:pull/16405 PR: https://git.openjdk.org/jdk/pull/16405 From cslucas at openjdk.org Fri Oct 27 16:57:49 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 27 Oct 2023 16:57:49 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges [v2] In-Reply-To: References: Message-ID: > ### Description > > Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", or objects returned from method calls, etc. Please review this Pull Request that improves Reduce Allocation Merge implementation so that it can reduce at least some of these allocation merges. > > Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP. > > The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing. > > ### Benchmarking > > **Note:** In some of these tests no reduction happens. I left them in to validate that no perf. regression happens in that case. > **Note 2:** Margin of error was negligible.
> > | Benchmark | No RAM (ms/op) | Yes RAM (ms/op) | > |--------------------------------------|------------------|-------------------| > | TestTrapAfterMerge | 19.515 | 13.386 | > | TestArgEscape | 33.165 | 33.254 | > | TestCallTwoSide | 70.547 | 69.427 | > | TestCmpAfterMerge | 16.400 | 2.984 | > | TestCmpMergeWithNull_Second | 27.204 | 27.293 | > | TestCmpMergeWithNull | 8.248 | 4.920 | > | TestCondAfterMergeWithAllocate | 12.890 | 5.252 | > | TestCondAfterMergeWithNull | 6.265 | 5.078 | > | TestCondLoadAfterMerge | 12.713 | 5.163 | > | TestConsecutiveSimpleMerge | 30.863 | 4.068 | > | TestDoubleIfElseMerge | 16.069 | 2.444 | > | TestEscapeInCallAfterMerge | 23.111 | 22.924 | > | TestGlobalEscape | 14.459 | 14.425 | > | TestIfElseInLoop | 246.061 | 42.786 | > | TestLoadAfterLoopAlias | 45.808 | 45.812 | > | TestLoadAfterTrap | 28.370 | ... Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Fix to prevent creating NULL ConNKlass constants. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15825/files - new: https://git.openjdk.org/jdk/pull/15825/files/257e0447..03016c96 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15825&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15825&range=00-01 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15825.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15825/head:pull/15825 PR: https://git.openjdk.org/jdk/pull/15825 From dnsimon at openjdk.org Fri Oct 27 17:14:58 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 27 Oct 2023 17:14:58 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v2] In-Reply-To: References: Message-ID: > This PR reduces the context in which a libjvmci CompilerThread can make a Java call. By default, a CompileThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). 
When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: CompilerThreadCanCallJava scope must enclose the JVMCIEnv scope ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16383/files - new: https://git.openjdk.org/jdk/pull/16383/files/9fc5ad9e..7f2285ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16383&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16383&range=00-01 Stats: 6 lines in 1 file changed: 5 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16383/head:pull/16383 PR: https://git.openjdk.org/jdk/pull/16383 From coleenp at openjdk.org Fri Oct 27 18:18:33 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 27 Oct 2023 18:18:33 GMT Subject: RFR: 8315890: Attempts to load from nullptr in instanceKlass.cpp and unsafe.cpp In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 15:40:58 GMT, Matias Saavedra Silva wrote: > Calls in instanceKlass.cpp and unsafe.cpp try to call an atomic load on method calls that could return nullptr. This patch ensures that nullptr is not passed into the load. Verified with tier1-5 tests. Looks good. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16405#pullrequestreview-1702314539 From dnsimon at openjdk.org Fri Oct 27 18:19:33 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 27 Oct 2023 18:19:33 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v2] In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 17:14:58 GMT, Doug Simon wrote: >> This PR reduces the context in which a libjvmci CompilerThread can make a Java call. By default, a CompileThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > CompilerThreadCanCallJava scope must enclose the JVMCIEnv scope src/hotspot/share/classfile/javaClasses.cpp line 2456: > 2454: print_stack_element_to_stream(st, bte._mirror, bte._method_id, bte._version, bte._bci, bte._name); > 2455: } > 2456: if (THREAD->can_call_java()) { This allows `java_lang_Throwable::print_stack_trace` to be used in contexts where Java calls cannot be made. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16383#discussion_r1374905848 From dnsimon at openjdk.org Fri Oct 27 18:29:45 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 27 Oct 2023 18:29:45 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v3] In-Reply-To: References: Message-ID: > This PR reduces the context in which a libjvmci CompilerThread can make a Java call. By default, a CompileThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: improved error message ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16383/files - new: https://git.openjdk.org/jdk/pull/16383/files/7f2285ed..af42062f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16383&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16383&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16383/head:pull/16383 PR: https://git.openjdk.org/jdk/pull/16383 From coleenp at openjdk.org Fri Oct 27 19:20:57 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 27 Oct 2023 19:20:57 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v61] In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 10:31:22 GMT, Jorn Vernee wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits: >> >> - Merge branch 'master' into JDK-8139457 >> - Fix ARM build >> - Merge remote-tracking branch 'upstream/master' into JDK-8139457 >> - Various cleanups >> - RISC changes >> - Move gap init into allocate_header() (x86) >> - Fix gtest failure on x86 >> - Merge remote-tracking branch 'upstream/master' into JDK-8139457 >> - Fix comments >> - Fix inconsistencies in argument naming C1_MacroAssembler::allocate_array() >> - ... and 80 more: https://git.openjdk.org/jdk/compare/9bfa0829...7eaca124 > > Also, I think we can handle the issues with `ByteBuffer::alignedSlice`, `ByteBuffer::alignmentOffset`, `MethodHandles::byteArrayViewVarHandle` and `MethodHandles::byteBufferViewVarHandle` separately, ahead of this PR. > > There is already a bug in the spec of these methods, as they try to give alignment guarantees for heap memory that the VM spec does not support. 
> > I've filed a couple of JBS issues for these: https://bugs.openjdk.org/browse/JDK-8318967 & https://bugs.openjdk.org/browse/JDK-8318966 @JornVernee thank you for filing these separate issues. @rkennke I noticed that the two files below require copyright updates. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/memory/Universe.java src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Array.java ------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1783392459 From bstafford at openjdk.org Fri Oct 27 21:56:31 2023 From: bstafford at openjdk.org (Brian Stafford) Date: Fri, 27 Oct 2023 21:56:31 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v2] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 18:11:45 GMT, Mat Carter wrote: >> Adding a new periodic jfr event to monitor and output statistics for the compiler queues. You will see one event per compiler queue (c1 and c2) >> >> Passes tier1 on linux (x86) and mac (aarch64) > > Mat Carter has updated the pull request incrementally with one additional commit since the last revision: > > fixed return type and changed NULL to nullptr test/jdk/jdk/jfr/event/compiler/TestCompilerQueueUtilization.java line 48: > 46: recording.enable(EVENT_NAME); > 47: recording.start(); > 48: recording.stop(); Would it be useful to delay the recording stop in order to give more of an opportunity for compiler events to occur? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1375076209 From dlong at openjdk.org Fri Oct 27 22:53:32 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 27 Oct 2023 22:53:32 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information [v5] In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 13:57:53 GMT, Thomas Obermeier wrote: >> MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. 
To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. >> >> We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. >> >> As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. > > Thomas Obermeier has updated the pull request incrementally with one additional commit since the last revision: > > Update mallocHeader.inline.hpp - revert obsolete copyright change src/hotspot/share/nmt/mallocTracker.cpp line 215: > 213: for (; here >= end; here -= smallest_possible_alignment) { > 214: // JDK-8306561: cast to a MallocHeader needs to guarantee it can reside in readable memory > 215: if (!os::is_readable_pointer(here) || !os::is_readable_pointer(here + sizeof(MallocHeader) - 1)) { Would os::is_readable_range be the better choice here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16381#discussion_r1375105235 From cslucas at openjdk.org Fri Oct 27 23:04:47 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 27 Oct 2023 23:04:47 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges [v3] In-Reply-To: References: Message-ID: > ### Description > > Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", or objects returned from method calls, etc. Please review this Pull Request that improves Reduce Allocation Merge implementation so that it can reduce at least some of these allocation merges. 
> > Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP. > > The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing. > > ### Benchmarking > > **Note:** In some of these tests no reduction happens. I left them in to validate that no perf. regression happens in that case. > **Note 2:** Margin of error was negligible. > > | Benchmark | No RAM (ms/op) | Yes RAM (ms/op) | > |--------------------------------------|------------------|-------------------| > | TestTrapAfterMerge | 19.515 | 13.386 | > | TestArgEscape | 33.165 | 33.254 | > | TestCallTwoSide | 70.547 | 69.427 | > | TestCmpAfterMerge | 16.400 | 2.984 | > | TestCmpMergeWithNull_Second | 27.204 | 27.293 | > | TestCmpMergeWithNull | 8.248 | 4.920 | > | TestCondAfterMergeWithAllocate | 12.890 | 5.252 | > | TestCondAfterMergeWithNull | 6.265 | 5.078 | > | TestCondLoadAfterMerge | 12.713 | 5.163 | > | TestConsecutiveSimpleMerge | 30.863 | 4.068 | > | TestDoubleIfElseMerge | 16.069 | 2.444 | > | TestEscapeInCallAfterMerge | 23.111 | 22.924 | > | TestGlobalEscape | 14.459 | 14.425 | > | TestIfElseInLoop | 246.061 | 42.786 | > | TestLoadAfterLoopAlias | 45.808 | 45.812 | > | TestLoadAfterTrap | 28.370 | ...
Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Fix to prevent reducing already reduced Phi ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15825/files - new: https://git.openjdk.org/jdk/pull/15825/files/03016c96..9d09d872 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15825&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15825&range=01-02 Stats: 23 lines in 2 files changed: 23 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15825.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15825/head:pull/15825 PR: https://git.openjdk.org/jdk/pull/15825 From egahlin at openjdk.org Fri Oct 27 23:22:34 2023 From: egahlin at openjdk.org (Erik Gahlin) Date: Fri, 27 Oct 2023 23:22:34 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v2] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 18:11:45 GMT, Mat Carter wrote: >> Adding a new periodic jfr event to monitor and output statistics for the compiler queues. You will see one event per compiler queue (c1 and c2) >> >> Passes tier1 on linux (x86) and mac (aarch64) > > Mat Carter has updated the pull request incrementally with one additional commit since the last revision: > > fixed return type and changed NULL to nullptr src/hotspot/share/jfr/metadata/metadata.xml line 857: > 855: > 856: > 857: There is no contentType called count. Please remove. No need to use a description, if it's just a repetition of the name. test/jdk/jdk/jfr/event/compiler/TestCompilerQueueUtilization.java line 2: > 1: /* > 2: * Copyright (c) 2013, 2023, Oracle and/or its affiliates. All rights reserved. Why copyright 2013, the file is new. 
test/jdk/jdk/jfr/event/compiler/TestCompilerQueueUtilization.java line 45: > 43: > 44: public static void main(String[] args) throws Exception { > 45: Recording recording = new Recording(); Use try-with-resources ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1366464679 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1366477481 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1366464793 From macarte at openjdk.org Sat Oct 28 01:52:08 2023 From: macarte at openjdk.org (Mat Carter) Date: Sat, 28 Oct 2023 01:52:08 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v3] In-Reply-To: References: Message-ID: > Adding a new periodic jfr event to monitor and output statistics for the compiler queues. You will see one event per compiler queue (c1 and c2) > > Passes tier1 on linux (x86) and mac (aarch64) Mat Carter has updated the pull request incrementally with one additional commit since the last revision: Addressed review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16211/files - new: https://git.openjdk.org/jdk/pull/16211/files/6c0b1670..310ce342 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16211&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16211&range=01-02 Stats: 26 lines in 2 files changed: 1 ins; 0 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/16211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16211/head:pull/16211 PR: https://git.openjdk.org/jdk/pull/16211 From macarte at openjdk.org Sat Oct 28 01:52:08 2023 From: macarte at openjdk.org (Mat Carter) Date: Sat, 28 Oct 2023 01:52:08 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v2] In-Reply-To: References: Message-ID: On Fri, 20 Oct 2023 04:55:24 GMT, Erik Gahlin wrote: >> Mat Carter has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed return type and changed NULL to nullptr > > 
src/hotspot/share/jfr/metadata/metadata.xml line 857: > >> 855: >> 856: >> 857: > > There is no contentType called count. Please remove. > > No need to use a description, if it's just a repetition of the name. - removed contentType="count" - removed superfluous descriptions > test/jdk/jdk/jfr/event/compiler/TestCompilerQueueUtilization.java line 2: > >> 1: /* >> 2: * Copyright (c) 2013, 2023, Oracle and/or its affiliates. All rights reserved. > > Why copyright 2013, the file is new. Copy-paste mistake (fixed) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1375145353 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1375145237 From macarte at openjdk.org Sat Oct 28 01:57:35 2023 From: macarte at openjdk.org (Mat Carter) Date: Sat, 28 Oct 2023 01:57:35 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v2] In-Reply-To: References: Message-ID: <1pncdEjKopXaKhU4pvL2Q4M_rR8lOMNjojOxkuArDxo=.550efc65-a08b-4dfe-a4d1-d2467816de24@github.com> On Fri, 20 Oct 2023 04:55:44 GMT, Erik Gahlin wrote: >> Mat Carter has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed return type and changed NULL to nullptr > > test/jdk/jdk/jfr/event/compiler/TestCompilerQueueUtilization.java line 45: > >> 43: >> 44: public static void main(String[] args) throws Exception { >> 45: Recording recording = new Recording(); > > Use try-with-resources Added ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1375145987 From macarte at openjdk.org Sat Oct 28 01:57:37 2023 From: macarte at openjdk.org (Mat Carter) Date: Sat, 28 Oct 2023 01:57:37 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v2] In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 21:53:38 GMT, Brian Stafford wrote: >> Mat Carter has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed return type and changed NULL to
nullptr > > test/jdk/jdk/jfr/event/compiler/TestCompilerQueueUtilization.java line 48: > >> 46: recording.enable(EVENT_NAME); >> 47: recording.start(); >> 48: recording.stop(); > > Would it be useful to delay the recording stop in order to give more of an opportunity for compiler events to occur? I've followed the pattern used for other events that are emitted periodically, and I don't want to add unnecessary time, as that would make the test suite take longer. The test will fail if there are no events because of the line 'Events.hasEvents(events)'. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1375145759 From ccheung at openjdk.org Sat Oct 28 06:24:31 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Sat, 28 Oct 2023 06:24:31 GMT Subject: RFR: 8315890: Attempts to load from nullptr in instanceKlass.cpp and unsafe.cpp In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 15:40:58 GMT, Matias Saavedra Silva wrote: > Calls in instanceKlass.cpp and unsafe.cpp try to call an atomic load on method calls that could return nullptr. This patch ensures that nullptr is not passed into the load. Verified with tier1-5 tests. LGTM ------------- Marked as reviewed by ccheung (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16405#pullrequestreview-1702715503 From stuefe at openjdk.org Sat Oct 28 06:26:34 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 28 Oct 2023 06:26:34 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information [v5] In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 22:50:50 GMT, Dean Long wrote: >> Thomas Obermeier has updated the pull request incrementally with one additional commit since the last revision: >> >> Update mallocHeader.inline.hpp - revert obsolete copyright change > > src/hotspot/share/nmt/mallocTracker.cpp line 215: > >> 213: for (; here >= end; here -= smallest_possible_alignment) { >> 214: // JDK-8306561: cast to a MallocHeader needs to guarantee it can reside in readable memory >> 215: if (!os::is_readable_pointer(here) || !os::is_readable_pointer(here + sizeof(MallocHeader) - 1)) { > > Would os::is_readable_range be the better choice here? That would work too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16381#discussion_r1375182306 From fyang at openjdk.org Sat Oct 28 07:58:29 2023 From: fyang at openjdk.org (Fei Yang) Date: Sat, 28 Oct 2023 07:58:29 GMT Subject: RFR: 8318225: RISC-V: C2 UModI In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 09:15:11 GMT, Hamlin Li wrote: > Hi, > Can you review the change to add intrinsic for UModI and UModL? > ( This is a quite similar patch to https://github.com/openjdk/jdk/pull/16346, which addresses UDivI and UDivL, so for the performance consideration please also check the discussion in that pr. ) > Thanks! > > > ## Tests > > ### Functionality > Run tests successfully found via `grep -nr test/jdk/ -we remainderUnsigned` and `grep -nr test/hotspot/ -we remainderUnsigned` > > ### Performance > > #### Long > **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case** > > **Before** > > LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 21222.911 ±
57.735 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 10 28841.429 ± 6.294 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 10 7733.038 ± 3.856 ns/op > > > **After** > > LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 22666.448 ± 34.986 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 10 15967.846 ± 24.805 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 10 29507.865 ± 20.593 ns/op > > > #### Integer > **Before** > > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23396.475 ± 24.065 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16796.796 ± 3.389 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30159.407 ± 6.716 ns/op > > > **After** > > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23216.710 ± 14.351 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16621.374 ± 3.203 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30002.088 ± 41.212 ns/op And JMH result on hifive unmatched board for reference: Before: LongDivMod.testRemainderUnsigned 1024 mixed avgt 15 26377.726 ± 535.884 ns/op LongDivMod.testRemainderUnsigned 1024 positive avgt 15 36251.980 ± 23.214 ns/op LongDivMod.testRemainderUnsigned 1024 negative avgt 15 8441.098 ± 9.375 ns/op IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 15 28501.570 ± 14.628 ns/op IntegerDivMod.testRemainderUnsigned 1024 positive avgt 15 20226.145 ± 19.112 ns/op IntegerDivMod.testRemainderUnsigned 1024 negative avgt 15 37002.906 ± 34.361 ns/op After: LongDivMod.testRemainderUnsigned 1024 mixed avgt 15 27140.089 ± 17.391 ns/op LongDivMod.testRemainderUnsigned 1024 positive avgt 15 18772.742 ± 18.943 ns/op LongDivMod.testRemainderUnsigned 1024 negative avgt 15 35780.310 ± 28.929 ns/op IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 15 27864.893 ± 15.551 ns/op IntegerDivMod.testRemainderUnsigned 1024 positive avgt 15 19585.612 ±
17.651 ns/op IntegerDivMod.testRemainderUnsigned 1024 negative avgt 15 36366.167 ± 18.933 ns/op ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16394#pullrequestreview-1702731475 From mli at openjdk.org Sat Oct 28 09:15:39 2023 From: mli at openjdk.org (Hamlin Li) Date: Sat, 28 Oct 2023 09:15:39 GMT Subject: RFR: 8318225: RISC-V: C2 UModI In-Reply-To: References: Message-ID: On Sat, 28 Oct 2023 07:55:36 GMT, Fei Yang wrote: >> Hi, >> Can you review the change to add intrinsic for UModI and UModL? >> ( This is a quite similar patch to https://github.com/openjdk/jdk/pull/16346, which addresses UDivI and UDivL, so for the performance consideration please also check the discussion in that pr. ) >> Thanks! >> >> >> ## Tests >> >> ### Functionality >> Run tests successfully found via `grep -nr test/jdk/ -we remainderUnsigned` and `grep -nr test/hotspot/ -we remainderUnsigned` >> >> ### Performance >> >> #### Long >> **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case** >> >> **Before** >> >> LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 21222.911 ± 57.735 ns/op >> LongDivMod.testRemainderUnsigned 1024 positive avgt 10 28841.429 ± 6.294 ns/op >> LongDivMod.testRemainderUnsigned 1024 negative avgt 10 7733.038 ± 3.856 ns/op >> >> >> **After** >> >> LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 22666.448 ± 34.986 ns/op >> LongDivMod.testRemainderUnsigned 1024 positive avgt 10 15967.846 ± 24.805 ns/op >> LongDivMod.testRemainderUnsigned 1024 negative avgt 10 29507.865 ± 20.593 ns/op >> >> >> #### Integer >> **Before** >> >> IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23396.475 ± 24.065 ns/op >> IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16796.796 ± 3.389 ns/op >> IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30159.407 ± 6.716 ns/op >> >> >> **After** >> >> IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23216.710 ±
14.351 ns/op >> IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16621.374 ± 3.203 ns/op >> IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30002.088 ± 41.212 ns/op > > And JMH result on hifive unmatched board for reference: > Before: > > LongDivMod.testRemainderUnsigned 1024 mixed avgt 15 26377.726 ± 535.884 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 15 36251.980 ± 23.214 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 15 8441.098 ± 9.375 ns/op > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 15 28501.570 ± 14.628 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 15 20226.145 ± 19.112 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 15 37002.906 ± 34.361 ns/op > > After: > > LongDivMod.testRemainderUnsigned 1024 mixed avgt 15 27140.089 ± 17.391 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 15 18772.742 ± 18.943 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 15 35780.310 ± 28.929 ns/op > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 15 27864.893 ± 15.551 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 15 19585.612 ± 17.651 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 15 36366.167 ± 18.933 ns/op @RealFYang Thanks for testing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16394#issuecomment-1783755834 From mli at openjdk.org Sat Oct 28 09:15:40 2023 From: mli at openjdk.org (Hamlin Li) Date: Sat, 28 Oct 2023 09:15:40 GMT Subject: RFR: 8318225: RISC-V: C2 UModI In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 09:15:11 GMT, Hamlin Li wrote: > Hi, > Can you review the change to add intrinsic for UModI and UModL? > ( This is a quite similar patch to https://github.com/openjdk/jdk/pull/16346, which addresses UDivI and UDivL, so for the performance consideration please also check the discussion in that pr. ) > Thanks!
> > > ## Tests > > ### Functionality > Run tests successfully found via `grep -nr test/jdk/ -we remainderUnsigned` and `grep -nr test/hotspot/ -we remainderUnsigned` > > ### Performance > > #### Long > **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case** > > **Before** > > LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 21222.911 ± 57.735 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 10 28841.429 ± 6.294 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 10 7733.038 ± 3.856 ns/op > > > **After** > > LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 22666.448 ± 34.986 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 10 15967.846 ± 24.805 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 10 29507.865 ± 20.593 ns/op > > > #### Integer > **Before** > > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23396.475 ± 24.065 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16796.796 ± 3.389 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30159.407 ± 6.716 ns/op > > > **After** > > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23216.710 ± 14.351 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16621.374 ± 3.203 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30002.088 ± 41.212 ns/op Thanks all for your reviewing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16394#issuecomment-1783756058 From mli at openjdk.org Sat Oct 28 09:15:41 2023 From: mli at openjdk.org (Hamlin Li) Date: Sat, 28 Oct 2023 09:15:41 GMT Subject: Integrated: 8318225: RISC-V: C2 UModI In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 09:15:11 GMT, Hamlin Li wrote: > Hi, > Can you review the change to add intrinsic for UModI and UModL?
> ( This is a quite similar patch to https://github.com/openjdk/jdk/pull/16346, which addresses UDivI and UDivL, so for the performance consideration please also check the discussion in that pr. ) > Thanks! > > > ## Tests > > ### Functionality > Run tests successfully found via `grep -nr test/jdk/ -we remainderUnsigned` and `grep -nr test/hotspot/ -we remainderUnsigned` > > ### Performance > > #### Long > **NOTE: for positive divisor, it's the common case; for negative divisor, it's a rare case** > > **Before** > > LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 21222.911 ? 57.735 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 10 28841.429 ? 6.294 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 10 7733.038 ? 3.856 ns/op > > > **After** > > LongDivMod.testRemainderUnsigned 1024 mixed avgt 10 22666.448 ? 34.986 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 10 15967.846 ? 24.805 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 10 29507.865 ? 20.593 ns/op > > > #### Integer > **Before** > > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23396.475 ? 24.065 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16796.796 ? 3.389 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30159.407 ? 6.716 ns/op > > > **After** > > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 10 23216.710 ? 14.351 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 10 16621.374 ? 3.203 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 10 30002.088 ? 41.212 ns/op This pull request has now been integrated. 
Changeset: 1ec0d027 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/1ec0d02717b6be4faeb13cd0596d80eea90e81ed Stats: 46 lines in 2 files changed: 44 ins; 0 del; 2 mod 8318225: RISC-V: C2 UModI 8318226: RISC-V: C2 UModL Reviewed-by: luhenry, rehn, fyang ------------- PR: https://git.openjdk.org/jdk/pull/16394 From qamai at openjdk.org Sat Oct 28 15:29:32 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 28 Oct 2023 15:29:32 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v7] In-Reply-To: References: <12Jiijy5Y9tn-_eHyE01oDMVUomgXbEiuEgDPgY8GU0=.af4948be-6026-4b35-bcd1-b281855e3ede@github.com> Message-ID: On Thu, 26 Oct 2023 02:24:49 GMT, David Holmes wrote: >> You can bind a non-const reference to a const one but not the other way. > > Sorry I was unclear: what is the advantage of a reference here? Is it just to avoid copying @dholmes-ora Yes it helps avoid copying, especially if the copy constructor is non-trivial. And I think it is more idiomatic in C++ to use references here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1375271244 From kbarrett at openjdk.org Sun Oct 29 07:54:36 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 29 Oct 2023 07:54:36 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v7] In-Reply-To: References: <12Jiijy5Y9tn-_eHyE01oDMVUomgXbEiuEgDPgY8GU0=.af4948be-6026-4b35-bcd1-b281855e3ede@github.com> Message-ID: On Sat, 28 Oct 2023 15:27:20 GMT, Quan Anh Mai wrote: >> Sorry I was unclear: what is the advantage of a reference here? Is it just to avoid copying > > @dholmes-ora Yes it helps avoid copying, especially if the copy constructor is non-trivial. And I think it is more idiomatic in C++ to use references here. Using a reference here leads to unnecessary overhead when `E` is small and trivially copyable, unless the predicate function is inlined. 
Pass by value is better in that case. Of course, as noted above, if `E` is "expensive" to copy or non-copyable then a reference is needed. Boost has this thing called `call_traits::param_type` for this issue, but I don't recommend we copy that. Idiomatic C++ makes the entire function a template parameter, as was suggested earlier in this PR. That dodges this question entirely, leaving the parameter passing decision to the predicate function where it belongs, rather than having it imposed by GrowableArray::find. The find function just imposes the requirement that the predicate satisfies the appropriate constraints, e.g. it is callable on an element reference and returns convertible to bool. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1375386257 From kbarrett at openjdk.org Sun Oct 29 08:12:36 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 29 Oct 2023 08:12:36 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v7] In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 10:48:01 GMT, Afshin Zafari wrote: >> The `find` method now is >> ```C++ >> template >> int find(T* token, bool f(T*, E)) const { >> ... >> >> Any other functions which use this are also changed. >> Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed. > > Afshin Zafari has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge remote-tracking branch 'upstream/master' into _8314502 > - first arg of `find` casted to `uint*` > - Merge branch 'master' into _8314502 > - changed the `E` param of find methods to `const E&`. > - find_from_end and its caller are also updated. 
> - 8314502: Change the comparator taking version of GrowableArray::find to be a template method > - 8314502: GrowableArray: Make find with comparator take template I would prefer the function template with a unary predicate approach suggested earlier by multiple commenters. Is there a reason not to just do that? ------------- Changes requested by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15418#pullrequestreview-1702913003 From kbarrett at openjdk.org Sun Oct 29 08:12:37 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 29 Oct 2023 08:12:37 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v3] In-Reply-To: References: Message-ID: On Tue, 29 Aug 2023 09:29:56 GMT, Johan Sjölen wrote: > I still approve of this patch as it's better than what we had before. There are a lot of suggested improvements that can be done either in this PR or in a future RFE. `git blame` shows that this hasn't been touched since 2008, so I don't think applying all suggestions now is in any sense critical :-). Not touched since 2008 suggests to me there might not be a rush to make the change as proposed, and instead take the (I think small) additional time to do the better thing, e.g. the unary-predicate suggestion made by several folks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1784029053 From fjiang at openjdk.org Sun Oct 29 10:18:41 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Sun, 29 Oct 2023 10:18:41 GMT Subject: RFR: 8318827: RISC-V: Improve readability of fclass result testing [v3] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 15:35:49 GMT, Feilong Jiang wrote: >> Hi, please consider. >> >> Currently, we test results of `fclass` instruction with hard-coded bits which has bad readability. This patch adds an enumeration of the fclass mask bits for ease of use. 
>> >> Testing: >> >> - [x] tier1 with release build > > Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-fclass-mask > - remove 'fclass_' prefix > - adjust enum name style > - Add FCLASS_MASK enum for better readability tier1 tests passed, going to integrate then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16362#issuecomment-1784056411 From fjiang at openjdk.org Sun Oct 29 10:18:43 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Sun, 29 Oct 2023 10:18:43 GMT Subject: Integrated: 8318827: RISC-V: Improve readability of fclass result testing In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 14:42:07 GMT, Feilong Jiang wrote: > Hi, please consider. > > Currently, we test results of `fclass` instruction with hard-coded bits which has bad readability. This patch adds an enumeration of the fclass mask bits for ease of use. > > Testing: > > - [x] tier1 with release build This pull request has now been integrated. Changeset: db340257 Author: Feilong Jiang URL: https://git.openjdk.org/jdk/commit/db3402577a2c14a41045753a1ffe2829a6bdda91 Stats: 32 lines in 4 files changed: 19 ins; 4 del; 9 mod 8318827: RISC-V: Improve readability of fclass result testing Reviewed-by: vkempik, luhenry, fyang ------------- PR: https://git.openjdk.org/jdk/pull/16362 From dnsimon at openjdk.org Sun Oct 29 11:37:52 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Sun, 29 Oct 2023 11:37:52 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v4] In-Reply-To: References: Message-ID: > This PR reduces the context in which a libjvmci CompilerThread can make a Java call. 
By default, a CompileThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. Doug Simon has updated the pull request incrementally with two additional commits since the last revision: - avoid class loading in HotSpotConstantPool.lookupField if current thread cannot call Java - removed unused method: JVMCIRuntime::get_field_by_index ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16383/files - new: https://git.openjdk.org/jdk/pull/16383/files/af42062f..a646620c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16383&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16383&range=02-03 Stats: 73 lines in 5 files changed: 12 ins; 55 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16383/head:pull/16383 PR: https://git.openjdk.org/jdk/pull/16383 From jpai at openjdk.org Sun Oct 29 14:13:31 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Sun, 29 Oct 2023 14:13:31 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions In-Reply-To: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> Message-ID: On Wed, 25 Oct 2023 21:08:01 GMT, Leonid Mesnik wrote: > The jtreg 
starts the main thread in a separate ThreadGroup and checks unhandled exceptions for this group. However, it doesn't catch all unhandled exceptions. There is a jtreg issue for this https://bugs.openjdk.org/browse/CODETOOLS-7903526. > Catching such issues for virtual threads is important because they are not included in any groups. So this fix implements the handler for the test thread factory. > > A few tests start failing. > > The test > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java > has testcases for platform and virtual threads. So, there's no need to run it with the thread factory. > > The test > java/lang/Thread/virtual/ThreadAPI.java > tests UncaughtExceptionHandler and virtual threads. No need to run it with a thread factory. > > Test > test/jdk/java/util/concurrent/tck/ThreadTest.java is updated to not check the default empty handler. > > Probably, we need some common approach to dealing with the UncaughtExceptionHandler in jtreg. Hello Leonid, in order to understand what exactly we are trying to solve here, I ran a few tests to see how things work without the changes being proposed in this PR. Here are my findings. A bit of background first. When jtreg runs, either in agent mode or othervm mode, it creates a specific thread within which the actual test code runs. In either of these modes, it uses a custom jtreg specific `ThreadGroup` instance for this thread which is running the test code. This instance of `ThreadGroup` has a specific overridden implementation of the `public void uncaughtException(Thread t, Throwable e)` API which keeps track of uncaught exceptions that might have been thrown by any threads that could have been spawned by the test. 
After `start()`ing the thread which runs the test code, the jtreg framework then waits for that thread to complete and once completed (either exceptionally or normally), the jtreg framework then queries a state on the custom `ThreadGroup` instance to see if any uncaught exception(s) were received during the lifetime of this thread which ran that test. If it finds any, then it marks the test as failed and reports such a failure appropriately in the test report. As noted, this applies for both the agent mode and other vm mode. Some important aspects of this implementation are that:
- The custom `ThreadGroup` which has the overridden implementation of the `uncaughtException(Thread t, Throwable e)` method doesn't interfere with the JVM level default exception handler.
- After the thread which ran the test code completes, the decision on whether to fail or pass a test is taken by checking the custom `ThreadGroup`'s state. Once this decision is done, the decision is immediately reported in relevant ways and the test status is marked (i.e. finalized) at this point.
- If this was running in agent vm mode, the agent vm mode continues to operate and isn't terminated and thus is available for subsequent tests. This point I think is important to remember for reasons that will be noted later in this comment.
Now coming to the part where in https://bugs.openjdk.org/browse/JDK-8303703 we introduced a way where jtreg instead of creating a platform thread (backed by its custom implementation of the `ThreadGroup`) to run the test code, now checks the presence of a `ThreadFactory` and if present, lets it create a `Thread` which runs this test code. In the JDK repo, we have an implementation of a `ThreadFactory` which creates a virtual thread instead of platform thread. Instances of virtual threads do belong to a `ThreadGroup`. 
But that instance of `ThreadGroup` cannot be controlled by user code and thus it cannot have a custom overridden implementation of `uncaughtException(Thread t, Throwable e)`. Any `Thread` created within the test code will have its uncaught exception handled by the virtual thread's `ThreadGroup` whose internal implementation just delegates to the "system" `ThreadGroup` which just writes it out to the `System.err` (of course, I am not considering `Thread`s which have been set with a specific uncaught exception handler). Effectively, what this means is when jtreg runs a test using a platform thread and if that test creates a `Thread` which throws an uncaught exception, then those tests will be marked and reported as failed. If however, jtreg runs a test using a virtual thread, then the same test will now be marked as passed. Clearly this is not a good thing. For reference, here's the test I used to verify this behaviour:

    import java.util.concurrent.atomic.AtomicBoolean;

    /*
     * @test
     * @run main FooTest
     */
    public class FooTest {
        public static void main(final String[] args) throws Exception {
            final Thread otherThread = new Thread(new AlwaysThrows());
            otherThread.setName("foo-bar");
            System.out.println("Starting " + otherThread.getName() + " thread");
            otherThread.start();
            try {
                otherThread.join(); // wait for it
            } catch (InterruptedException ie) {
                System.out.println(Thread.currentThread().getName() + " was interrupted");
            }
            // verify that the other thread was invoked
            if (!AlwaysThrows.invoked.get()) {
                throw new AssertionError(otherThread.getName() + " wasn't run");
            }
            System.out.println("Test execution done");
        }

        private static class AlwaysThrows implements Runnable {
            private static AtomicBoolean invoked = new AtomicBoolean();

            @Override
            public void run() {
                invoked.set(true);
                System.out.println("Throwing an exception from " + Thread.currentThread().getName());
                throw new RuntimeException("Intentionally thrown exception");
            }
        }
    }

Running this with jtreg in agentvm mode jtreg 
-report:files -verbose:summary -ea -esa -agentvm -jdk:build/macosx-aarch64/images/jdk test/jdk/java/lang/FooTest.java .... #section:main ----------messages:(7/204)---------- command: main FooTest reason: User specified action: run main FooTest started: Sun Oct 29 12:40:14 IST 2023 Mode: agentvm Agent id: 2 finished: Sun Oct 29 12:40:14 IST 2023 elapsed time (seconds): 0.191 ... ----------System.out:(4/109)---------- Starting foo-bar thread Throwing an exception from foo-bar AgentVMThread was interrupted Test execution done ----------System.err:(3/35)---------- JavaTest Message: Test complete. result: Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: Intentionally thrown exception test result: Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: Intentionally thrown exception Running in othervm mode: jtreg -report:files -verbose:summary -ea -esa -othervm -jdk:build/macosx-aarch64/images/jdk test/jdk/java/lang/FooTest.java ... #section:main ----------messages:(6/192)---------- command: main FooTest reason: User specified action: run main FooTest started: Sun Oct 29 12:41:22 IST 2023 Mode: othervm finished: Sun Oct 29 12:41:22 IST 2023 elapsed time (seconds): 0.073 ----------configuration:(0/0)---------- ----------System.out:(2/59)---------- Starting foo-bar thread Throwing an exception from foo-bar ----------System.err:(4/255)---------- java.lang.RuntimeException: Intentionally thrown exception at FooTest$AlwaysThrows.run(FooTest.java:32) at java.base/java.lang.Thread.run(Thread.java:1570) STATUS:Failed.`main' threw exception: java.lang.RuntimeException: Intentionally thrown exception So in both agentvm and othervm mode it gets marked as failed. 
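The platform-thread behaviour described above can be illustrated with a small, self-contained sketch. This is not jtreg's code: the class names `RecordingGroupSketch` and `RecordingGroup` and the method `runAndCollect` are hypothetical, and the sketch only assumes the standard `ThreadGroup.uncaughtException` hook. It shows why a thread spawned by the test body is still seen by the harness: the child inherits the group of the thread that created it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; jtreg's real classes are different.
final class RecordingGroupSketch {

    /** A ThreadGroup that remembers the first uncaught exception from any of its threads. */
    static final class RecordingGroup extends ThreadGroup {
        final AtomicReference<Throwable> firstUncaught = new AtomicReference<>();

        RecordingGroup(String name) {
            super(name);
        }

        @Override
        public void uncaughtException(Thread t, Throwable e) {
            // Record instead of printing; only the first failure is kept.
            firstUncaught.compareAndSet(null, e);
        }
    }

    /** Runs the body in a thread of a fresh group; returns the recorded exception or null. */
    static Throwable runAndCollect(Runnable testBody) {
        RecordingGroup group = new RecordingGroup("sketch-main-group");
        Thread mainThread = new Thread(group, testBody, "sketch-main");
        mainThread.start();
        try {
            mainThread.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ie);
        }
        return group.firstUncaught.get();
    }

    public static void main(String[] args) {
        Throwable failure = runAndCollect(() -> {
            // The child is created without an explicit group, so it joins the creator's group.
            Thread child = new Thread(() -> {
                throw new RuntimeException("Intentionally thrown exception");
            }, "foo-bar");
            child.start();
            try {
                child.join();
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println(failure == null ? "Passed" : "Failed: " + failure);
    }
}
```

When the body runs on a virtual thread instead, there is no user-supplied group to override, which is the gap the rest of this comment is about.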
Now running the same using the virtual thread feature of jtreg: jtreg -report:files -verbose:summary -ea -esa -agentvm -jdk:build/macosx-aarch64/images/jdk -testThreadFactoryPath:.../build/macosx-aarch64/support/test/jtreg_test_thread_factory/jtregTestThreadFactory.jar -testThreadFactory:Virtual test/jdk/java/lang/FooTest.java ... #section:main ----------messages:(7/204)---------- command: main FooTest reason: User specified action: run main FooTest started: Sun Oct 29 19:12:33 IST 2023 Mode: agentvm Agent id: 1 finished: Sun Oct 29 19:12:34 IST 2023 elapsed time (seconds): 0.211 ----------System.out:(3/79)---------- Starting foo-bar thread Throwing an exception from foo-bar Test execution done ----------System.err:(6/223)---------- Exception in thread "foo-bar" java.lang.RuntimeException: Intentionally thrown exception at FooTest$AlwaysThrows.run(FooTest.java:32) at java.base/java.lang.Thread.run(Thread.java:1570) JavaTest Message: Test complete. result: Passed. Execution successful test result: Passed. Execution successful jtreg -report:files -verbose:summary -ea -esa -othervm -jdk:build/macosx-aarch64/images/jdk -testThreadFactoryPath:.../build/macosx-aarch64/support/test/jtreg_test_thread_factory/jtregTestThreadFactory.jar -testThreadFactory:Virtual test/jdk/java/lang/FooTest.java ... #section:main ----------messages:(6/191)---------- command: main FooTest reason: User specified action: run main FooTest started: Sun Oct 29 19:13:22 IST 2023 Mode: othervm finished: Sun Oct 29 19:13:22 IST 2023 elapsed time (seconds): 0.07 ----------configuration:(0/0)---------- ----------System.out:(3/79)---------- Starting foo-bar thread Throwing an exception from foo-bar Test execution done ----------System.err:(4/203)---------- Exception in thread "foo-bar" java.lang.RuntimeException: Intentionally thrown exception at FooTest$AlwaysThrows.run(FooTest.java:32) at java.base/java.lang.Thread.run(Thread.java:1570) STATUS:Passed. 
In both agentvm and othervm mode it is now marked as passed, which clearly is a change in behaviour and not a good thing. So coming to the proposed solution in this PR. It proposes to do a `System.exit(1)` on an uncaught exception being thrown from within a `Thread` running in test code. I think doing a System.exit isn't a good thing, even if it is expected to be a temporary solution. I noted previously that when running in agentvm mode with platform thread, jtreg doesn't kill/exit the agent JVM if there's an uncaught exception from within the test code. That's one of the reasons why System.exit isn't a good idea here. The other reason is, even if we did go with this System.exit approach, I think that will abruptly bring down the test infrastructure without it having a way to properly report this test's status. I went ahead and applied this proposed patch (to `Virtual` class) locally and ran my test again and as expected it shows that the JVM was abruptly shut down and jtreg now reports: Starting foo-bar thread Throwing an exception from foo-bar ----------System.err:(3/158)---------- java.lang.RuntimeException: Intentionally thrown exception at FooTest$AlwaysThrows.run(FooTest.java:32) at java.base/java.lang.Thread.run(Thread.java:1570) result: Error. Agent communication error: java.io.EOFException; check console log for any additional details test result: Error. Agent communication error: java.io.EOFException; check console log for any additional details I think to solve this correctly, this entire thing (perhaps even the virtual thread creation) needs to be done within the jtreg project where there's much more access to the internals of jtreg and there might be ways to detect these uncaught exceptions and report them cleanly as test failures. 
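To make the contrast with `System.exit(1)` concrete, here is a rough sketch of a handler that records uncaught exceptions so a harness could inspect them after the test thread finishes, without terminating the JVM. Everything in it is an assumption for illustration: `RecordingHandlerSketch`, `install` and `runTest` are hypothetical names, it uses only the standard `Thread.setDefaultUncaughtExceptionHandler` API, and whether something along these lines is workable inside jtreg is not something I have verified.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical illustration only; not the patch under review and not jtreg code.
final class RecordingHandlerSketch {

    /** Uncaught exceptions seen since the last call to runTest. */
    static final Queue<Throwable> UNCAUGHT = new ConcurrentLinkedQueue<>();

    /** Installs a JVM-wide default handler that records instead of exiting. */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> UNCAUGHT.add(error));
    }

    /** Runs the body on its own thread and reports true only if nothing escaped uncaught. */
    static boolean runTest(Runnable testBody) {
        UNCAUGHT.clear();
        Thread testThread = new Thread(testBody, "sketch-test");
        testThread.start();
        try {
            testThread.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ie);
        }
        // The JVM stays alive, so the caller can still report the outcome normally.
        return UNCAUGHT.isEmpty();
    }

    public static void main(String[] args) {
        install();
        boolean passed = runTest(() -> {
            throw new RuntimeException("Intentionally thrown exception");
        });
        System.out.println(passed ? "Passed" : "Failed: " + UNCAUGHT.peek());
    }
}
```

A default handler is only consulted for threads that have no handler of their own and whose group does not intercept the exception, so this is a partial picture at best.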
------------- PR Comment: https://git.openjdk.org/jdk/pull/16369#issuecomment-1784125246 From gcao at openjdk.org Sun Oct 29 15:50:35 2023 From: gcao at openjdk.org (Gui Cao) Date: Sun, 29 Oct 2023 15:50:35 GMT Subject: RFR: 8318953: RISC-V: Small refactoring for MacroAssembler::test_bit [v2] In-Reply-To: <1k89pNxSnBzSLvOh070gs1_CwdrLxu9oEjpgwPM3XFo=.341da50d-6596-4f84-9771-bf03be86f6cf@github.com> References: <1k89pNxSnBzSLvOh070gs1_CwdrLxu9oEjpgwPM3XFo=.341da50d-6596-4f84-9771-bf03be86f6cf@github.com> Message-ID: On Fri, 27 Oct 2023 07:30:03 GMT, Gui Cao wrote: >> Hi, The current test_bit assembly function needs to accept a temporary register because it needs one if it goes to the andi else branch. However, in this case we can actually avoid calling andi and accomplish the same thing by logically shifting to the right and testing the lowest bit. The advantage is that it makes the test_bit function much simpler. Also, to reduce the number of instructions in a given case (consider the mv function), mv actually calls the li function, which generates more than one instruction when the parameter imm exceeds the 32-bit range. >> https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L2009-L2017 >> https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp#L730 >> https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L804-L840 >> >> ### Testing: >> qemu 8.1.50: >> - [x] Tier1 tests (fastdebug) >> - [x] Tier2 tests (release) >> - [x] Tier3 tests (release) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Use and_imm12 to replace andi in test_bit tier1-3 test passed. 
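The equivalence that the quoted description relies on (test a bit by shifting right and looking at the lowest bit, rather than and-ing with a single-bit mask) can be checked with a few lines of Java. This is only a sketch of the arithmetic, not the HotSpot assembler code, and the names `TestBitSketch`, `testBitWithMask` and `testBitWithShift` are made up for the illustration. The practical difference on RISC-V is that the shift form always uses the constant 1, whereas the mask `1 << bitPos` no longer fits the 12-bit signed immediate of `andi` once `bitPos` reaches 11.

```java
// Hypothetical illustration of the two ways to test a single bit.
final class TestBitSketch {

    /** Mask form: the constant grows with bitPos. */
    static boolean testBitWithMask(long value, int bitPos) {
        return (value & (1L << bitPos)) != 0;
    }

    /** Shift form: logical shift right, then test the lowest bit with the constant 1. */
    static boolean testBitWithShift(long value, int bitPos) {
        return ((value >>> bitPos) & 1L) != 0;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, 0x800L, Long.MIN_VALUE, 0x1234_5678_9ABC_DEF0L};
        for (long sample : samples) {
            for (int bitPos = 0; bitPos < 64; bitPos++) {
                if (testBitWithMask(sample, bitPos) != testBitWithShift(sample, bitPos)) {
                    throw new AssertionError("mismatch at bit " + bitPos);
                }
            }
        }
        System.out.println("mask and shift forms agree on all sampled values");
    }
}
```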
------------- PR Comment: https://git.openjdk.org/jdk/pull/16391#issuecomment-1784149248 From dnsimon at openjdk.org Sun Oct 29 16:03:54 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Sun, 29 Oct 2023 16:03:54 GMT Subject: RFR: 8318982: improve Exceptions::special_exception Message-ID: This PR consolidates the 2 almost identical versions of `Exceptions::special_exception` into a single method. If a special exception is thrown and `-Xlog:exceptions` is enabled, a log message is emitted and it indicates the special handling. Here's an example in the output from running `compiler/linkage/LinkageErrors.java` with `-Xlog:exceptions -Xcomp`: [0.194s][info][exceptions] Exception (java.util.Set, java.lang.String, java.util.Set, boolean)' (java.lang.module.ModuleDescriptor$1 and java.lang.module.ModuleDescriptor$Exports are in module java.base of loader 'bootstrap')> (0x0000000000000000) thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 591] for thread 0x000000011e18c600 thread cannot call Java, throwing pre-allocated exception: a 'java/lang/VirtualMachineError'{0x0000000772e06f00} The motivation for this change was work on [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) where it's useful to know when exceptions are thrown on a CompilerThread. 
------------- Commit messages: - improve Exceptions::special_exception Changes: https://git.openjdk.org/jdk/pull/16401/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16401&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318982 Stats: 45 lines in 2 files changed: 16 ins; 22 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16401.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16401/head:pull/16401 PR: https://git.openjdk.org/jdk/pull/16401 From dnsimon at openjdk.org Sun Oct 29 16:03:55 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Sun, 29 Oct 2023 16:03:55 GMT Subject: RFR: 8318982: improve Exceptions::special_exception In-Reply-To: References: Message-ID: <8fYpTiffRL76fPFWSk34cd2n15q7083XyGMiFonn_Bc=.879fc7d1-b2de-4c1c-8400-3cc91140a44b@github.com> On Fri, 27 Oct 2023 14:03:45 GMT, Doug Simon wrote: > This PR consolidates the 2 almost identical versions of `Exceptions::special_exception` into a single method. > If a special exception is thrown and `-Xlog:exceptions` is enabled, a log message is emitted and it indicates the special handling. > > Here's an example in the output from running `compiler/linkage/LinkageErrors.java` with `-Xlog:exceptions -Xcomp`: > > [0.194s][info][exceptions] Exception (java.util.Set, java.lang.String, java.util.Set, boolean)' (java.lang.module.ModuleDescriptor$1 and java.lang.module.ModuleDescriptor$Exports are in module java.base of loader 'bootstrap')> (0x0000000000000000) > thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 591] > for thread 0x000000011e18c600 > thread cannot call Java, throwing pre-allocated exception: a 'java/lang/VirtualMachineError'{0x0000000772e06f00} > > > The motivation for this change was work on [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) where it's useful to know when exceptions are thrown on a CompilerThread. 
src/hotspot/share/utilities/exceptions.cpp line 111: > 109: #endif // ASSERT > 110: > 111: if (!thread->can_call_java()) { If this method was called from `Exceptions::_throw`, a log message will have already been emitted. I think the duplication is acceptable for these special exceptions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16401#discussion_r1374651692 From dnsimon at openjdk.org Sun Oct 29 20:24:57 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Sun, 29 Oct 2023 20:24:57 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v5] In-Reply-To: References: Message-ID: > This PR reduces the context in which a libjvmci CompilerThread can make a Java call. By default, a CompileThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: unconditionally allow class loading in HotSpotConstantPool.callSystemExit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16383/files - new: https://git.openjdk.org/jdk/pull/16383/files/a646620c..f47527a7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16383&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16383&range=03-04 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16383/head:pull/16383 PR: https://git.openjdk.org/jdk/pull/16383 From dnsimon at openjdk.org Sun Oct 29 20:39:47 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Sun, 29 Oct 2023 20:39:47 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v6] In-Reply-To: References: Message-ID: > This PR reduces the context in which a libjvmci CompilerThread can make a Java call. By default, a CompileThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: allow JavaCalls in HotSpotConstantPool.callSystemExit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16383/files - new: https://git.openjdk.org/jdk/pull/16383/files/f47527a7..b7181d70 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16383&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16383&range=04-05 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16383/head:pull/16383 PR: https://git.openjdk.org/jdk/pull/16383 From egahlin at openjdk.org Sun Oct 29 20:45:38 2023 From: egahlin at openjdk.org (Erik Gahlin) Date: Sun, 29 Oct 2023 20:45:38 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v3] In-Reply-To: References: Message-ID: On Sat, 28 Oct 2023 01:52:08 GMT, Mat Carter wrote: >> Adding a new periodic jfr event to monitor and output statistics for the compiler queues. You will see one event per compiler queue (c1 and c2) >> >> Passes tier1 on linux (x86) and mac (aarch64) > > Mat Carter has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments src/hotspot/share/jfr/metadata/metadata.xml line 853: > 851: > 852: > 853: Could we use terms other than ingress and egress? Something in more general use. src/hotspot/share/jfr/metadata/metadata.xml line 855: > 853: > 854: > 855: Could the label be clearer? Queue Size? src/hotspot/share/jfr/metadata/metadata.xml line 856: > 854: > 855: > 856: peakQueueSize? src/hotspot/share/jfr/metadata/metadata.xml line 861: > 859: > 860: > 861: The label should be short and use headline-style capitalization, how about "Compiler Thread Count"?
https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/Label.html src/hotspot/share/jfr/periodic/jfrCompilerQueueUtilization.cpp line 53: > 51: return 0; > 52: } > 53: return ((current - old) * NANOSECS_PER_SEC) / interval.nanoseconds(); Shouldn't it be ticks per second here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1375505798 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1375508747 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1375508798 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1375505254 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1375508417 From gcao at openjdk.org Mon Oct 30 00:33:40 2023 From: gcao at openjdk.org (Gui Cao) Date: Mon, 30 Oct 2023 00:33:40 GMT Subject: Integrated: 8318953: RISC-V: Small refactoring for MacroAssembler::test_bit In-Reply-To: References: Message-ID: <5czTa7_o_Iq3UuCljP8I2RDwls6wsrMGDtLdp9KapXQ=.a4cf8b63-5e02-4f04-8902-380d0f7557d8@github.com> On Fri, 27 Oct 2023 06:52:54 GMT, Gui Cao wrote: > Hi, The current test_bit assembly function needs to accept a temporary register because it needs one if it goes to the andi else branch. However, in this case we can actually avoid calling andi and accomplish the same thing by logically shifting to the right and testing the lowest bit. The advantage is that it makes the test_bit function much simpler. Also, to reduce the number of instructions in a given case (consider the mv function), mv actually calls the li function, which generates more than one instruction when the parameter imm exceeds the 32-bit range. 
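The shift-based bit test described above can be illustrated in plain C++ rather than RISC-V assembly. This is a sketch of the arithmetic only, not the MacroAssembler code; the function names are made up, and the results are normalized to 0 or 1 for easy comparison (a masking instruction would produce the mask bit itself). The relevant background is that a RISC-V `andi` carries a 12-bit signed immediate, so a mask for a high bit position cannot be encoded directly.

```cpp
#include <cassert>
#include <cstdint>

// Mask form: needs the constant (1 << bit_pos). For large bit_pos that constant
// does not fit a small instruction immediate and would have to be materialized
// in a temporary register first.
inline uint64_t test_bit_mask(uint64_t value, unsigned bit_pos) {
  assert(bit_pos < 64);
  return (value & (uint64_t{1} << bit_pos)) != 0 ? 1 : 0;
}

// Shift form: logical right shift by bit_pos, then test the lowest bit.
// The only constant involved is 1, whatever the bit position.
inline uint64_t test_bit_shift(uint64_t value, unsigned bit_pos) {
  assert(bit_pos < 64);
  return (value >> bit_pos) & 1u;
}
```

Both forms agree for every bit position; the shift form simply never needs a wide constant, which is why no temporary register is required.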
> https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L2009-L2017 > https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp#L730 > https://github.com/openjdk/jdk/blob/9123961aaa47aa58ec436640590d2cceedb8cbb1/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L804-L840 > > ### Testing: > qemu 8.1.50: > - [x] Tier1 tests (fastdebug) > - [x] Tier2 tests (release) > - [x] Tier3 tests (release) This pull request has now been integrated. Changeset: 988e1dfe Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/988e1dfe6ec9b5e77d2e8a78eb792a127c6fe907 Stats: 10 lines in 3 files changed: 6 ins; 0 del; 4 mod 8318953: RISC-V: Small refactoring for MacroAssembler::test_bit Reviewed-by: fyang, fjiang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/16391 From duke at openjdk.org Mon Oct 30 03:35:12 2023 From: duke at openjdk.org (Liming Liu) Date: Mon, 30 Oct 2023 03:35:12 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v11] In-Reply-To: References: Message-ID: > As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). > > Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: > > > > > > > > > > > >
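> | Kernel | -XX:-TransparentHugePages, Unpatched | -XX:-TransparentHugePages, Patched | -XX:+TransparentHugePages, Unpatched | -XX:+TransparentHugePages, Patched |
> |--------|--------|--------|--------|--------|
> | 4.18   | 11.30  | 11.30  | 0.25   | 0.25   |
> | 5.13   | 0.22   | 0.22   | 3.42   | 3.42   |
> | 6.1    | 0.27   | 0.33   | 3.54   | 0.33   |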
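For readers unfamiliar with the mechanism being measured, here is a rough, Linux-only sketch of pretouching with `madvise(MADV_POPULATE_WRITE)` and a per-page fallback. It is not the HotSpot implementation: `pretouch_range` is a hypothetical helper, the fallback uses the GCC/Clang `__atomic_fetch_add` builtin to mimic the "atomic add of zero" idea from the issue title, and the numeric fallback definition of `MADV_POPULATE_WRITE` is taken from the Linux UAPI headers for older libc headers that lack it.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23  // Linux UAPI value; missing from older libc headers
#endif

// Pretouch [start, start + bytes). 'start' is assumed to be page-aligned.
// Returns true if the kernel populated the range via madvise, false if the
// per-page fallback was used (e.g. on kernels older than 5.14).
bool pretouch_range(void* start, size_t bytes) {
  assert(start != nullptr && bytes > 0);
  if (madvise(start, bytes, MADV_POPULATE_WRITE) == 0) {
    return true;
  }
  // Fallback: touch one byte per page with an atomic add of zero, which forces
  // a writable mapping without changing the contents.
  const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  char* base = static_cast<char*>(start);
  for (size_t off = 0; off < bytes; off += page) {
    __atomic_fetch_add(base + off, 0, __ATOMIC_RELAXED);
  }
  return false;
}
```

Which branch runs depends on the kernel, so callers should treat the return value as informational only.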
Liming Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: - Merge branch 'openjdk:master' into populate - Remove the unneccessary class - Use address to find the mapping of the heap - Make the test use a smaller heap and exit properly - Make the jtreg test check the usage of THP - Untabify - Improve the use of madvise for pretouching: 1. use madvise when THP is actually used; 2. remove the need of modifing page_size; 3. log the failure of madvise rather than warn. - Cuddle ptr-operators in pretouch_memory_common - Use pointer_delta to calculate the distance - Add a sanity check for MADV_POPULATE_WRITE - ... and 9 more: https://git.openjdk.org/jdk/compare/ee5dd8f6...f3503519 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15781/files - new: https://git.openjdk.org/jdk/pull/15781/files/b33edafd..f3503519 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=09-10 Stats: 83846 lines in 2680 files changed: 49990 ins; 19444 del; 14412 mod Patch: https://git.openjdk.org/jdk/pull/15781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15781/head:pull/15781 PR: https://git.openjdk.org/jdk/pull/15781 From amitkumar at openjdk.org Mon Oct 30 03:54:45 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 30 Oct 2023 03:54:45 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v5] In-Reply-To: <9upaBNtcWpIBCafV9sVSVV3f7ZYGnkbIC1Zl5uZ8NTA=.4e617a4d-ebbd-48f8-adff-364d8605d21c@github.com> References: <9upaBNtcWpIBCafV9sVSVV3f7ZYGnkbIC1Zl5uZ8NTA=.4e617a4d-ebbd-48f8-adff-364d8605d21c@github.com> Message-ID: On Fri, 27 Oct 2023 13:56:00 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, 
ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCacheEntry related to method resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64 > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Coleen and Fei comments src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3292: > 3290: > 3291: void TemplateTable::prepare_invoke(Register recv, // if caller wants to see it > 3292: Register flags // if caller wants to test it Hi All, I think the `flags` argument here is unnecessary, as it is not being used in this method. Similarly, the `invokestatic` and `invokespecial` methods get the flags value from `load_resolved_method_entry_special_or_static`, but it is not used.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1375640947 From duke at openjdk.org Mon Oct 30 06:20:07 2023 From: duke at openjdk.org (Liming Liu) Date: Mon, 30 Oct 2023 06:20:07 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v12] In-Reply-To: References: Message-ID: > As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). > > Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: > > > > > > > > > > > >
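> | Kernel | -XX:-TransparentHugePages, Unpatched | -XX:-TransparentHugePages, Patched | -XX:+TransparentHugePages, Unpatched | -XX:+TransparentHugePages, Patched |
> |--------|--------|--------|--------|--------|
> | 4.18   | 11.30  | 11.30  | 0.25   | 0.25   |
> | 5.13   | 0.22   | 0.22   | 3.42   | 3.42   |
> | 6.1    | 0.27   | 0.33   | 3.54   | 0.33   |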
Liming Liu has updated the pull request incrementally with one additional commit since the last revision: Update the name of the method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15781/files - new: https://git.openjdk.org/jdk/pull/15781/files/f3503519..e29ead09 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15781&range=10-11 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15781/head:pull/15781 PR: https://git.openjdk.org/jdk/pull/15781 From stuefe at openjdk.org Mon Oct 30 06:49:36 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 30 Oct 2023 06:49:36 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v12] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 06:20:07 GMT, Liming Liu wrote: >> As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). >> >> Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: >> >> >> >> >> >> >> >> >> >> >> >>
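>> | Kernel | -XX:-TransparentHugePages, Unpatched | -XX:-TransparentHugePages, Patched | -XX:+TransparentHugePages, Unpatched | -XX:+TransparentHugePages, Patched |
>> |--------|--------|--------|--------|--------|
>> | 4.18   | 11.30  | 11.30  | 0.25   | 0.25   |
>> | 5.13   | 0.22   | 0.22   | 3.42   | 3.42   |
>> | 6.1    | 0.27   | 0.33   | 3.54   | 0.33   |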
> > Liming Liu has updated the pull request incrementally with one additional commit since the last revision: > > Update the name of the method I don't have time to look at this for now and step back to wait for others. But I don't want to hold up this patch - if there are enough reviewers that ok it, please don't wait for me. As I wrote earlier, the patch itself is mechanically fine. The test has aspects I don't understand, and I am worried about concurrent usage of the about-to-be-pretouched area by other threads. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15781#issuecomment-1784575980 From dholmes at openjdk.org Mon Oct 30 07:18:32 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 30 Oct 2023 07:18:32 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v5] In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 05:30:41 GMT, Julian Waters wrote: > Side note: Should the Style Guide only permit noreturn for void methods? It's Undefined Behaviour when applied to something that returns int for instance, such as exit_process_or_thread here (which I had to refactor to void) I think it is implied that attributes should only be used in a way that is valid. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16303#issuecomment-1784605126 From dholmes at openjdk.org Mon Oct 30 07:24:35 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 30 Oct 2023 07:24:35 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v5] In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 04:41:49 GMT, Julian Waters wrote: >> On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. 
>> >> The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well. The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. >> >> Although not mentioned in the title, raise_fail_fast has also been fixed. RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method >> >> All the changes mentioned above fixes any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. >> >> This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Revert to exit_code in os_windows.cpp I think the refactoring could have been rolled back further, but okay. Needs second review. Thanks src/hotspot/os/windows/os_windows.cpp line 511: > 509: // Wrapper around _endthreadex(), exit() and _exit() > 510: [[noreturn]] > 511: static void exit_process_or_thread(Ept what, int code); Nit: code -> exit_code ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16303#pullrequestreview-1703400865 PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1375761562 From jwaters at openjdk.org Mon Oct 30 07:31:32 2023 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 30 Oct 2023 07:31:32 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v5] In-Reply-To: References: Message-ID: <35KjjHSH766W4uq4Z2OfH9KO13rpv01JJpVYTPs4zck=.427e4ff4-18ca-446d-9596-aea97441a910@github.com> On Fri, 27 Oct 2023 04:41:49 GMT, Julian Waters wrote: >> On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. >> >> The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well. The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. >> >> Although not mentioned in the title, raise_fail_fast has also been fixed. RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method >> >> All the changes mentioned above fixes any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. 
This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. >> >> This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Revert to exit_code in os_windows.cpp Paging for @kimbarrett, since the original issue was created by him ------------- PR Comment: https://git.openjdk.org/jdk/pull/16303#issuecomment-1784621390 From dholmes at openjdk.org Mon Oct 30 07:43:31 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 30 Oct 2023 07:43:31 GMT Subject: RFR: 8315890: Attempts to load from nullptr in instanceKlass.cpp and unsafe.cpp In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 15:40:58 GMT, Matias Saavedra Silva wrote: > Calls in instanceKlass.cpp and unsafe.cpp try to call an atomic load on method calls that could return nullptr. This patch ensures that nullptr is not passed into the load. Verified with tier1-5 tests. Changes requested by dholmes (Reviewer). src/hotspot/share/oops/instanceKlass.cpp line 2506: > 2504: // Use load_acquire due to competing with inserts > 2505: InstanceKlass* volatile* iklass = adr_implementor(); > 2506: InstanceKlass* impl = (iklass != nullptr) ? Atomic::load_acquire(iklass) : nullptr; This looks very klunky as we do a raw read, check it for null then re-read with acquire semantics. Cleaner IMO to do a raw read followed by a raw `OrderAccess::acquire()` and no need for a null check. src/hotspot/share/prims/unsafe.cpp line 234: > 232: T get_volatile() { > 233: GuardUnsafeAccess guard(_thread); > 234: assert(addr() != nullptr, "Attempting to load from null pointer"); I don't see how `addr()` can be null unless `_obj` was null - which would be a usage error. So asserting `_obj != nullptr` in the constructor would seem better to me. 
I mean no point checking `addr()` here but not in other functions where we dereference it! ------------- PR Review: https://git.openjdk.org/jdk/pull/16405#pullrequestreview-1703425012 PR Review Comment: https://git.openjdk.org/jdk/pull/16405#discussion_r1375776385 PR Review Comment: https://git.openjdk.org/jdk/pull/16405#discussion_r1375779895 From dholmes at openjdk.org Mon Oct 30 08:01:34 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 30 Oct 2023 08:01:34 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v6] In-Reply-To: References: Message-ID: On Sun, 29 Oct 2023 20:39:47 GMT, Doug Simon wrote: >> This PR reduces the context in which a libjvmci CompilerThread can make a Java call. By default, a CompileThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. > > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > allow JavaCalls in HotSpotConstantPool.callSystemExit Can't comment on all the details of the changes, but I don't see anything untoward in general. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16383#pullrequestreview-1703458518 From kbarrett at openjdk.org Mon Oct 30 08:08:41 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 30 Oct 2023 08:08:41 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v5] In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 04:41:49 GMT, Julian Waters wrote: >> On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. >> >> The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well. The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. >> >> Although not mentioned in the title, raise_fail_fast has also been fixed. RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method >> >> All the changes mentioned above fixes any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. 
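The pattern in the quoted description (a `void` helper marked `[[noreturn]]` that calls a terminating routine the compiler cannot prove never returns, followed by an infinite sleep) might look roughly like the sketch below. The names are hypothetical, and a function pointer stands in for calls such as `RaiseFailFastException`; this is not the code from os_windows.cpp.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Stand-in for a terminating API that is not itself declared noreturn.
using terminate_fn = void (*)(int);

// Never returns: sleeps forever, in the spirit of os::infinite_sleep.
[[noreturn]] void toy_infinite_sleep() {
  for (;;) {
    std::this_thread::sleep_for(std::chrono::seconds(1));
  }
}

// Returns void rather than int: returning normally from a [[noreturn]]
// function is undefined behaviour, so there is no value to hand back.
[[noreturn]] void toy_exit_process_or_thread(terminate_fn terminate, int exit_code) {
  assert(terminate != nullptr);
  terminate(exit_code);   // expected not to come back, but the compiler cannot know that
  toy_infinite_sleep();   // guarantees this function never returns even if terminate() did
}
```

Note that `[[noreturn]]` only rules out a normal return; leaving via an exception is still permitted, which is how a sketch like this can be exercised in a test without ending the process.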
>> >> This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Revert to exit_code in os_windows.cpp Changes requested by kbarrett (Reviewer). src/hotspot/os/windows/os_windows.cpp line 510: > 508: enum Ept { EPT_THREAD, EPT_PROCESS, EPT_PROCESS_DIE }; > 509: // Wrapper around _endthreadex(), exit() and _exit() > 510: [[noreturn]] We (currently) have ATTRIBUTE_NORETURN. However, this is Windows/MSVC-specific code, so I'm okay with using the portable attribute directly, since we require a sufficiently recent version of MSVC to have this. Note, however, that this won't work with older versions of clang, which is the raison d'etre for the attribute macro. Just so you know, since you've been trying to build for Windows with other compilers. src/hotspot/os/windows/os_windows.cpp line 515: > 513: // The handler passed to _beginthreadex(). > 514: // Called with the associated Thread* as the argument. > 515: static unsigned __stdcall thread_native_entry(void*); This forward declaration is being added for a function that is defined a few lines later, with no intervening references. That seems pointless. src/hotspot/os/windows/os_windows.cpp line 571: > 569: // let it proceed to exit normally > 570: exit_process_or_thread(EPT_THREAD, res); > 571: return res; exit_process_or_thread is now marked noreturn. So it looks like thread_native_entry never returns either now, making the return of res (and res itself) effectively unused, and the comment obsolete. I don't know what the implications of marking thread_native_entry noreturn might be. Passing it as a parameter to _beginthreadex might be a problem. But it kind of seems like thread_native_entry shouldn't be calling exit_process_or_thread? 
The semantics of the "start_address" function for _beginthreadex are rather lightly documented so far as I could find. ------------- PR Review: https://git.openjdk.org/jdk/pull/16303#pullrequestreview-1703425611 PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1375783845 PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1375780181 PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1375776786 From dholmes at openjdk.org Mon Oct 30 08:10:31 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 30 Oct 2023 08:10:31 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions In-Reply-To: References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> Message-ID: On Sun, 29 Oct 2023 14:10:32 GMT, Jaikiran Pai wrote: >> The jtreg starts the main thread in a separate ThreadGroup and checks unhandled exceptions for this group. However, it doesn't catch all unhandled exceptions. There is a jtreg issue for this https://bugs.openjdk.org/browse/CODETOOLS-7903526. >> Catching such issues for virtual threads is important because they are not included in any groups. So this fix implements the handler for the test thread factory. >> >> A few tests start failing. >> >> The test >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java >> has testcases for platform and virtual threads. So there's no need to run it with the thread factory. >> >> The test >> java/lang/Thread/virtual/ThreadAPI.java >> tests UncaughtExceptionHandler and virtual threads. No need to run it with a thread factory. >> >> Test >> test/jdk/java/util/concurrent/tck/ThreadTest.java is updated to not check the default empty handler. >> >> Probably, we need some common approach to dealing with the UncaughtExceptionHandler in jtreg.
> > Hello Leonid, in order to understand what exactly we are trying to solve here, I ran a few tests to see how things work without the changes being proposed in this PR. Here are my findings. > > A bit of background first. When jtreg runs, either in agent mode or othervm mode, it creates a specific thread within which the actual test code runs. In either of these modes, it uses a custom jtreg specific `ThreadGroup` instance for this thread which is running the test code. This instance of `ThreadGroup` has a specific overridden implementation of the `public void uncaughtException(Thread t, Throwable e)` API which keeps track of uncaught exceptions that might have been thrown by any threads that could have been spawned by the test. After `start()`ing the thread which runs the test code, the jtreg framework then waits for that thread to complete and once completed (either exceptionally or normally), the jtreg framework then queries a state on the custom `ThreadGroup` instance to see if any uncaught exception(s) were received during the lifetime of this thread which ran that test. If it finds any, then it marks the test as failed and reports such a failure appropriately in the test report. As noted, this applies for both the agent mode and othervm mode. Some important aspects of this implementation are that: > > - The custom `ThreadGroup` which has the overridden implementation of the `uncaughtException(Thread t, Throwable e)` method doesn't interfere with the JVM level default exception handler. > > - After the thread which ran the test code completes, the decision on whether to fail or pass a test is taken by checking the custom `ThreadGroup`'s state. Once this decision is done, the decision is immediately reported in relevant ways and the test status is marked (i.e. finalized) at this point. > > - If this was running in agent vm mode, the agent vm mode continues to operate and isn't terminated and thus is available for subsequent tests.
This point I think is important to remember for reasons that will be noted later in this comment. > > Now coming to the part where in https://bugs.openjdk.org/browse/JDK-8303703 we introduced a way where jtreg instead of creating a platform thread (backed by its custom implementation of the `ThreadGroup`) to run the test code, now checks the presence of a `ThreadFactory` and if present, lets it create a `Thread` which runs this test code. In the JDK repo, we have an implementation of a `ThreadFactory` which creates a virtual thread instead of platform... thanks for that detailed analysis @jaikiran ! I'm very uncomfortable with what is proposed in this PR. I would hope that when jtreg uses the virtual thread factory then we could provide a wrapper that will execute the real task in a try/catch and allow any uncaught exceptions to be processed in the way jtreg needs them to be. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16369#issuecomment-1784672682 From dholmes at openjdk.org Mon Oct 30 08:28:32 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 30 Oct 2023 08:28:32 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v5] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 07:36:43 GMT, Kim Barrett wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert to exit_code in os_windows.cpp > > src/hotspot/os/windows/os_windows.cpp line 571: > >> 569: // let it proceed to exit normally >> 570: exit_process_or_thread(EPT_THREAD, res); >> 571: return res; > > exit_process_or_thread is now marked noreturn. So it looks like thread_native_entry never returns either > now, making the return of res (and res itself) effectively unused, and the comment obsolete. > > I don't know what the implications of marking thread_native_entry noreturn might be. Passing it as a > parameter to _beginthreadex might be a problem. 
But it kind of seems like thread_native_entry shouldn't > be calling exit_process_or_thread? The semantics of the "start_address" function for _beginthreadex > are rather lightly documented so far as I could find. It has to call `exit_process_or_thread` because of the exit bug. That means we will directly call `_endthreadex` instead of implicitly doing that by having the entry function return. The `return res` is redundant but keeps the compiler happy given the required declaration of the entry method. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1375837737 From adinn at openjdk.org Mon Oct 30 08:29:45 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 30 Oct 2023 08:29:45 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v5] In-Reply-To: References: Message-ID: <_YII8Xum4F3dlHGHDsgZya7Bq-UJXC0wtQQo9Bdozno=.d76094ae-34ad-43a5-827c-ec527629596e@github.com> On Wed, 18 Oct 2023 07:36:49 GMT, Andrew Dinn wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Coleen and Fei comments > > src/hotspot/share/interpreter/bytecodeTracer.cpp line 527: > >> 525: assert(is_linked(), "invokehandle is only in rewritten methods"); >> 526: assert(cpcache_index >= 0, "must be"); >> 527: print_field_or_method(cp_index, st); > > I don't understand this code very well but it looks like this change means `print_field_or_method` gets called twice when we have an `invokehandle` bytecode, passing `cp_index` both times. Is that intended? This comment is still left to address, Matias. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1375838835 From eosterlund at openjdk.org Mon Oct 30 09:23:34 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 30 Oct 2023 09:23:34 GMT Subject: RFR: 8310239: Add missing cross modifying fence in nmethod entry barriers [v2] In-Reply-To: <59q2I50auQj6g_46nsbgF8xPfVC-ahFPhXfz-C6jE8Q=.c5f3fdc2-b066-4d83-8f80-2c2ce6a1b888@github.com> References: <59q2I50auQj6g_46nsbgF8xPfVC-ahFPhXfz-C6jE8Q=.c5f3fdc2-b066-4d83-8f80-2c2ce6a1b888@github.com> Message-ID: On Fri, 20 Oct 2023 15:31:38 GMT, Andrew Haley wrote: >>> > The assumption is that if the nmethod immediate oops are patched first, and the guard value (immediate of the cmp instruction) is patched after, then if a thread sees the new cmp instruction, it will also see the new oop immediates. And that is indeed what the "asynchronous" cross modifying code description ensures will work in the AMD APM. So that all checks out. >>> >>> I guess this is a separate issue from this patch, but where does the AMD APM guarantee that? >> >> In the APM, volume 2 (cf. https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf), section 7.6.1 under "Asynchronous modification", it says "" >> >>> > The assumption is that if the nmethod immediate oops are patched first, and the guard value (immediate of the cmp instruction) is patched after, then if a thread sees the new cmp instruction, it will also see the new oop immediates. And that is indeed what the "asynchronous" cross modifying code description ensures will work in the AMD APM. So that all checks out. >>> >>> I guess this is a separate issue from this patch, but where does the AMD APM guarantee that? >> >> Hmm, it used to be in Volume 2, section 7.6.1. But in the latest revision, 3.41 from this summer, I can't find it any more. Strange. 
> >> > >> > I guess this is a separate issue from this patch, but where does the AMD APM guarantee that? >> >> Hmm, it used to be in Volume 2, section 7.6.1. But in the latest revision, 3.41 from this summer, I can't find it any more. Strange. > > I wonder if they may be making it up as they go along. Thanks for the reviews, @theRealAph, @dean-long and @xmas92! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14543#issuecomment-1784787345 From ayang at openjdk.org Mon Oct 30 09:27:37 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 30 Oct 2023 09:27:37 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 09:56:57 GMT, Thomas Schatzl wrote: > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. 
The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. > > * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P... src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 444: > 442: } > 443: > 444: if (succeeded) { Can these two `if`s can be merged into one, `if (succeeded) { return result; }`? src/hotspot/share/gc/g1/g1EvacFailureRegions.hpp line 36: > 34: class HeapRegionClaimer; > 35: > 36: // This class records for every region on the heap whether it has to be retained I feel the term "retain" has two diff meanings in this PR: 1. retain == pinned or evac-fail 2. should_retain_evac_failed_region 1 is during scavenging while 2 is after scavenging. 
src/hotspot/share/gc/g1/g1FullGCPrepareTask.inline.hpp line 82: > 80: } else { > 81: assert(hr->containing_set() == nullptr, "already cleared by PrepareRegionsClosure"); > 82: if (hr->has_pinned_objects() || This `do_heap_region` method is hard to follow; there are multiple occurrences of the same predicates. I wonder if one can reorganize these if-else a bit. Inlining `should_compact` should make all `if` on the same level at least. src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 494: > 492: // undo_allocation() method too. > 493: undo_allocation(dest_attr, obj_ptr, word_sz, node_index); > 494: return handle_evacuation_failure_par(old, old_mark, word_sz, true /* cause_pinned */); Why is this `cause_pinned == true`? This obj can be arbitrary, not necessarily type-array. src/hotspot/share/gc/g1/heapRegion.cpp line 734: > 732: // ranges passed in here corresponding to the space between live objects, it is > 733: // possible that there is a pinned object that is not any more referenced by > 734: // Java code (only by native). Can such an obj become referenced by Java again later on? IOW, a pointer passed from native to Java. src/hotspot/share/gc/g1/heapRegion.inline.hpp line 262: > 260: } > 261: > 262: inline bool HeapRegion::can_reclaim() const { I'd suggest inlining this method into its callers, because "can reclaim" is something that depends on the caller's context.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1374835226 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1375035713 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1375023324 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1375009476 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1375304500 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1375030685 From kbarrett at openjdk.org Mon Oct 30 09:34:33 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 30 Oct 2023 09:34:33 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v5] In-Reply-To: References: Message-ID: <6IBb6-3QwTjIGZCq65Z_kbbebtxfEGUZoouJ2Cf1iWQ=.a3ea9a02-2b67-4864-a3de-edaa8117f94e@github.com> On Fri, 27 Oct 2023 04:41:49 GMT, Julian Waters wrote: >> On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. >> >> The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well. The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. 
>> >> Although not mentioned in the title, raise_fail_fast has also been fixed. RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method >> >> All the changes mentioned above fixes any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. >> >> This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Revert to exit_code in os_windows.cpp To speed along the process, I'm approving this even though I still want the pointless forward declaration of `thread_native_entry` removed. I'll trust you to do that without needing a re-review by me. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16303#pullrequestreview-1703650587 From kbarrett at openjdk.org Mon Oct 30 09:34:36 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 30 Oct 2023 09:34:36 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v5] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 08:26:09 GMT, David Holmes wrote: >> src/hotspot/os/windows/os_windows.cpp line 571: >> >>> 569: // let it proceed to exit normally >>> 570: exit_process_or_thread(EPT_THREAD, res); >>> 571: return res; >> >> exit_process_or_thread is now marked noreturn. So it looks like thread_native_entry never returns either >> now, making the return of res (and res itself) effectively unused, and the comment obsolete. >> >> I don't know what the implications of marking thread_native_entry noreturn might be. 
Passing it as a >> parameter to _beginthreadex might be a problem. But it kind of seems like thread_native_entry shouldn't >> be calling exit_process_or_thread? The semantics of the "start_address" function for _beginthreadex >> are rather lightly documented so far as I could find. > > It has to call `exit_process_or_thread` because of the exit bug. That means we will directly call `_endthreadex` instead of implicitly doing that by having the entry function return. The `return res` is redundant but keeps the compiler happy given the required declaration of the entry method. I've been perusing the exit bug info, and ugh! But okay. The `return res` might *not* make the compiler happy anymore, and might instead be cause for complaint by the compiler, now that `exit_process_or_thread` is marked noreturn. I guess use whichever form is needed to keep the compiler from complaining... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1375913280 From rkennke at openjdk.org Mon Oct 30 10:42:56 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 30 Oct 2023 10:42:56 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v61] In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 10:07:34 GMT, Jorn Vernee wrote: > Just to check: object headers of arrays are still aligned to 8-bytes, and only the elements alignment of e.g. a `byte[]` is 4? If that's the case, than someone can always get back to 8-byte alignment simply by adding 4 bytes of offset when doing the access. Probably not great to rely on that though. Object headers are always 8-byte-aligned and I have no plans to change that. Only elements of byte, boolean, short, char, int, float and compressed-oops elements will be 4-byte aligned (with -UseCompressedClassPointers and later with +UseCompactHeaders). long and double and uncompressed oops elements will remain 8-byte-aligned. 
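As an illustration only (this is not code from the PR), the alignment rule described above can be sketched as a small offset calculation. The names `align_up` and `array_base_offset` are hypothetical, and the header sizes used in the usage note are assumptions about a 64-bit layout rather than values taken from this thread.

```cpp
#include <cassert>
#include <cstddef>

// Round `offset` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: `length_end` is the offset just past the 4-byte
// array length field. Elements smaller than 8 bytes start at the next
// 4-byte boundary; 8-byte elements start at the next 8-byte boundary.
inline std::size_t array_base_offset(std::size_t length_end,
                                     std::size_t element_size) {
  assert(element_size == 1 || element_size == 2 ||
         element_size == 4 || element_size == 8);
  const std::size_t alignment = (element_size >= 8) ? 8 : 4;
  return align_up(length_end, alignment);
}
```

Assuming an 8-byte mark word, an 8-byte uncompressed class pointer and a 4-byte length (so the length field ends at offset 20), `array_base_offset(20, 1)` gives 20 for a `byte[]` and `array_base_offset(20, 8)` gives 24 for a `long[]`. Adding 4 bytes to the `byte[]` base lands on offset 24, which is the "get back to 8-byte alignment" trick mentioned in the quoted question.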
------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1784913173 From jwaters at openjdk.org Mon Oct 30 10:44:31 2023 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 30 Oct 2023 10:44:31 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v5] In-Reply-To: <6IBb6-3QwTjIGZCq65Z_kbbebtxfEGUZoouJ2Cf1iWQ=.a3ea9a02-2b67-4864-a3de-edaa8117f94e@github.com> References: <6IBb6-3QwTjIGZCq65Z_kbbebtxfEGUZoouJ2Cf1iWQ=.a3ea9a02-2b67-4864-a3de-edaa8117f94e@github.com> Message-ID: On Mon, 30 Oct 2023 09:32:19 GMT, Kim Barrett wrote: > To speed along the process, I'm approving this even though I still want the pointless forward declaration of `thread_native_entry` removed. I'll trust you to do that without needing a re-review by me. I'll do just that, but I have a couple of questions: What about the comment accompanying the declaration? And is os::infinite_sleep appropriate for this purpose? I couldn't seem to find anything else to make exit_process_or_thread nor raise_fail_fast noreturn qualified ------------- PR Comment: https://git.openjdk.org/jdk/pull/16303#issuecomment-1784917115 From duke at openjdk.org Mon Oct 30 11:23:33 2023 From: duke at openjdk.org (Thomas Obermeier) Date: Mon, 30 Oct 2023 11:23:33 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information [v5] In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 13:57:53 GMT, Thomas Obermeier wrote: >> MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. 
Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. >> >> We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. >> >> As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. > > Thomas Obermeier has updated the pull request incrementally with one additional commit since the last revision: > > Update mallocHeader.inline.hpp - revert obsolete copyright change tests passed in dbg build; opt build still faces a SIGILL error in GTestWrapper when executing AsyncLogTest, which to my understanding is unrelated; therefore, I created https://bugs.openjdk.org/browse/JDK-8319104 ------------- PR Comment: https://git.openjdk.org/jdk/pull/16381#issuecomment-1784979361 From rkennke at openjdk.org Mon Oct 30 11:29:15 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 30 Oct 2023 11:29:15 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v62] In-Reply-To: References: Message-ID: <4nY2b4KFUup0hXZTxltRg1aGg0-hYJDspAdrOhnZkZA=.ee307227-ee94-40a0-947d-3f8263da9e38@github.com> > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. > > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. > > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16.
> > Testing: > - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] tier1 (x86_64, x86_32, aarch64, riscv) > - [x] tier2 (x86_64, aarch64, riscv) > - [x] tier3 (x86_64, riscv) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Update copyright headers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11044/files - new: https://git.openjdk.org/jdk/pull/11044/files/7eaca124..89af2b9f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=61 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=60-61 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11044.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11044/head:pull/11044 PR: https://git.openjdk.org/jdk/pull/11044 From ogillespie at openjdk.org Mon Oct 30 12:52:47 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Mon, 30 Oct 2023 12:52:47 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v2] In-Reply-To: References: Message-ID: > Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). > > See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. > > This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. 
Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. > > The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. > > When concurrent symbol table cleanup runs, it also drains the queue. > > In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. > > Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: Remove assertions in queue destructor The queue destructor requires the queue to be empty, but this is not easy to achieve in cases like ours with a class-level queue. Not all uses need to have an empty queue at shutdown. This could also be made optional at instantiation time if we want to keep the assertions for other users (at the moment only g1 dirty card queue set). 
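For readers who want a concrete picture of the bounded queue described above, here is a minimal sketch under stated assumptions. It is not the code in this PR: `RecentSymbolQueue` and `SymbolRef` are hypothetical names, and `std::shared_ptr` merely stands in for HotSpot's own symbol reference counting.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a reference-counted symbol.
using SymbolRef = std::shared_ptr<const std::string>;

// Bounded FIFO that holds one extra reference to each recently created
// temporary symbol, so the symbol stays alive for a short while after
// its creator is done with it.
class RecentSymbolQueue {
 public:
  explicit RecentSymbolQueue(std::size_t capacity) : _capacity(capacity) {
    assert(capacity > 0);
  }

  // Keep `sym` alive; when the queue is full, the oldest entry is
  // displaced and the queue's reference to it is released.
  void add(SymbolRef sym) {
    if (_entries.size() == _capacity) {
      _entries.pop_front();
    }
    _entries.push_back(std::move(sym));
  }

  // Drop every held reference, as a table cleanup pass would.
  void drain() { _entries.clear(); }

  std::size_t size() const { return _entries.size(); }

 private:
  std::size_t _capacity;
  std::deque<SymbolRef> _entries;
};
```

With a capacity of 100, a symbol added to such a queue would survive until 100 newer entries have been added or until `drain()` is called, which mirrors the memory versus hit-rate tradeoff mentioned in the description.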
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16398/files - new: https://git.openjdk.org/jdk/pull/16398/files/a6fc2aa4..98cddcad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=00-01 Stats: 9 lines in 2 files changed: 0 ins; 8 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16398/head:pull/16398 PR: https://git.openjdk.org/jdk/pull/16398 From tschatzl at openjdk.org Mon Oct 30 12:53:36 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 30 Oct 2023 12:53:36 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 20:18:24 GMT, Albert Mingkun Yang wrote: >> The JEP covers the idea very well, so I'm only covering some implementation details here: >> >> * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. >> >> * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: >> >> * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. >> >> * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. 
The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). >> >> * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. >> >> * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) >> >> The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. >> >> I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. >> >> * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in a... > > src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 494: > >> 492: // undo_allocation() method too. >> 493: undo_allocation(dest_attr, obj_ptr, word_sz, node_index); >> 494: return handle_evacuation_failure_par(old, old_mark, word_sz, true /* cause_pinned */); > > Why is this `cause_pinned == true`? This obj can be arbitrary, not necessarily type-array. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1376162391 From jwaters at openjdk.org Mon Oct 30 12:56:36 2023 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 30 Oct 2023 12:56:36 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v5] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 07:40:28 GMT, Kim Barrett wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert to exit_code in os_windows.cpp > > src/hotspot/os/windows/os_windows.cpp line 515: > >> 513: // The handler passed to _beginthreadex(). >> 514: // Called with the associated Thread* as the argument. >> 515: static unsigned __stdcall thread_native_entry(void*); > > This forward declaration is being added for a function that is defined a few lines later, with no intervening > references. That seems pointless. I understand, but what about the useful (at least to me) comment? Should I move it to the definition of the method? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1376167186 From jwaters at openjdk.org Mon Oct 30 13:02:39 2023 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 30 Oct 2023 13:02:39 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v5] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 09:29:28 GMT, Kim Barrett wrote: >> It has to call `exit_process_or_thread` because of the exit bug. That means we will directly call `_endthreadex` instead of implicitly doing that by having the entry function return. The `return res` is redundant but keeps the compiler happy given the required declaration of the entry method. > > I've been perusing the exit bug info, and ugh! But okay. 
The `return res` might *not* make the compiler happy > anymore, and might instead be cause for complaint by the compiler, now that `exit_process_or_thread` is marked > noreturn. I guess use whichever form is needed to keep the compiler from complaining... I'm not too sure what to make of this, since I don't know what the exit bug is about (Also, the return res doesn't cause an issue on MSVC under any circumstance, and would only do so on gcc if thread_native_entry was marked noreturn, which it isn't). I simply changed the return value to keep the original semantics of the code unchanged, I guess I should take this to mean I should keep my current changes as is? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1376175889 From jwaters at openjdk.org Mon Oct 30 13:02:37 2023 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 30 Oct 2023 13:02:37 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v5] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 07:44:20 GMT, Kim Barrett wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert to exit_code in os_windows.cpp > > src/hotspot/os/windows/os_windows.cpp line 510: > >> 508: enum Ept { EPT_THREAD, EPT_PROCESS, EPT_PROCESS_DIE }; >> 509: // Wrapper around _endthreadex(), exit() and _exit() >> 510: [[noreturn]] > > We (currently) have ATTRIBUTE_NORETURN. However, this is Windows/MSVC-specific code, so I'm okay > with using the portable attribute directly, since we require a sufficiently recent version of MSVC to have this. > Note, however, that this won't work with older versions of clang, which is the raison d'etre for the attribute > macro. Just so you know, since you've been trying to build for Windows with other compilers. I'm not affected by this change, since I only compile using gcc, but Daniel, who uses clang, might be. 
I'll check with him to see ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1376178299 From jwaters at openjdk.org Mon Oct 30 13:03:36 2023 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 30 Oct 2023 13:03:36 GMT Subject: RFR: 8314488: Compile the JDK as C++17 In-Reply-To: References: Message-ID: <6-flXjSLU0oyiXbsL-iyLkgSc87DH5E2iMJp5tUxp4s=.7aeaf8c6-fafa-4f34-b961-a70863e103c7@github.com> On Mon, 24 Jul 2023 01:41:16 GMT, Julian Waters wrote: > Implementation of [JEP draft: Compile the JDK as C++17](https://bugs.openjdk.org/browse/JDK-8310260) Keeping alive ------------- PR Comment: https://git.openjdk.org/jdk/pull/14988#issuecomment-1785145814 From jwaters at openjdk.org Mon Oct 30 13:18:32 2023 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 30 Oct 2023 13:18:32 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v3] In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 04:24:54 GMT, David Holmes wrote: > > I figured that a little refactoring of scope (from os::win32 to os_windows.cpp file scope) could help here > > The very loose, not well followed, historical convention here is that the Windows specific os class contains the methods defined by os.hpp, while the implementation details go into the os::win32 class. In many cases the choice is somewhat arbitrary, but there should still be a good reason to move something around. Unnecessary refactoring just makes the PR harder to understand. 
Just noticed something in this review comment - I didn't move the method into the os class, I moved it into os_windows.cpp file scope ------------- PR Comment: https://git.openjdk.org/jdk/pull/16303#issuecomment-1785173490 From tschatzl at openjdk.org Mon Oct 30 13:19:39 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 30 Oct 2023 13:19:39 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 In-Reply-To: References: Message-ID: <7hjNlLCs4_Vwl-pz8bmXSCZV3EakunUIFNt8Q6yDIcs=.fe32d0f5-ba84-4b2a-a629-6491b99bc627@github.com> On Fri, 27 Oct 2023 20:38:19 GMT, Albert Mingkun Yang wrote: >> The JEP covers the idea very well, so I'm only covering some implementation details here: >> >> * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. >> >> * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: >> >> * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. >> >> * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. 
They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). >> >> * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. >> >> * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) >> >> The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. >> >> I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. >> >> * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in a... > > src/hotspot/share/gc/g1/g1FullGCPrepareTask.inline.hpp line 82: > >> 80: } else { >> 81: assert(hr->containing_set() == nullptr, "already cleared by PrepareRegionsClosure"); >> 82: if (hr->has_pinned_objects() || > > This `do_heap_region` method is hard to follow; there multiple occurrences of same predicates. I wonder if one can reorganize these if-else a bit. Inlining `should_compact` should make all `if` on the same level at least. Apart from having an early return in the `should_compact`-if, one option would be making `has_pinned_objects()` more clever by stating something like: bool has_pinned_objects() const { return pinned_count() > 0 || (is_continues_humongous() && humongous_start_region()->pinned_count() > 0); } Then this predicate would get shorter. 
Or add a local helper for that (as suggested in the next commit). Either is fine with me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1376208039 From tschatzl at openjdk.org Mon Oct 30 13:27:32 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 30 Oct 2023 13:27:32 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 20:53:29 GMT, Albert Mingkun Yang wrote: >> The JEP covers the idea very well, so I'm only covering some implementation details here: >> >> * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. >> >> * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: >> >> * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. >> >> * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). >> >> * G1 currently only ever actually evacuates young pinned regions. 
Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. >> >> * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) >> >> The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. >> >> I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. >> >> * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in a... > > src/hotspot/share/gc/g1/g1EvacFailureRegions.hpp line 36: > >> 34: class HeapRegionClaimer; >> 35: >> 36: // This class records for every region on the heap whether it has to be retained > > I feel the term "retain" has two diff meanings in this PR: > > 1. retain == pinned or evac-fail > 2. should_retain_evac_failed_region > > 1 is during scavenging while 2 is after scavenging. Maybe rename `should_retain_evac_failed_region` to `should_keep_retained_region[_in_old]` or something? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1376219339 From tschatzl at openjdk.org Mon Oct 30 13:50:14 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 30 Oct 2023 13:50:14 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v2] In-Reply-To: References: Message-ID: <3UQDBc7erChTvsbzlTxJNd5umBOxAVkSCiooAAJLGUo=.6a670f22-bb90-402e-9a85-e23d75056704@github.com> > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. 
Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. > > * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P... 
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang review1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16342/files - new: https://git.openjdk.org/jdk/pull/16342/files/b882dd60..e6646399 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=00-01 Stats: 69 lines in 8 files changed: 12 ins; 25 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/16342.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342 PR: https://git.openjdk.org/jdk/pull/16342 From tschatzl at openjdk.org Mon Oct 30 13:50:15 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 30 Oct 2023 13:50:15 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v2] In-Reply-To: References: Message-ID: On Sat, 28 Oct 2023 18:32:56 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> ayang review1 > > src/hotspot/share/gc/g1/heapRegion.cpp line 734: > >> 732: // ranges passed in here corresponding to the space between live objects, it is >> 733: // possible that there is a pinned object that is not any more referenced by >> 734: // Java code (only by native). > > Can such obj becomes referenced by java again later on? IOW, a pointer passed from native to java. I do not think so. I will do some more testing about this. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1376243855 From ayang at openjdk.org Mon Oct 30 13:50:14 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 30 Oct 2023 13:50:14 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v2] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 13:25:07 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1EvacFailureRegions.hpp line 36: >> >>> 34: class HeapRegionClaimer; >>> 35: >>> 36: // This class records for every region on the heap whether it has to be retained >> >> I feel the term "retain" has two diff meanings in this PR: >> >> 1. retain == pinned or evac-fail >> 2. should_retain_evac_failed_region >> >> 1 is during scavenging while 2 is after scavenging. > > Maybe rename `should_retain_evac_failed_region` to `should_keep_retained_region[_in_old]` or something? Is it possible to drop 1 so that an obj is evac-fail iff it's pinned or OOM? (I feel "retain" is on region-level.) >> src/hotspot/share/gc/g1/g1FullGCPrepareTask.inline.hpp line 82: >> >>> 80: } else { >>> 81: assert(hr->containing_set() == nullptr, "already cleared by PrepareRegionsClosure"); >>> 82: if (hr->has_pinned_objects() || >> >> This `do_heap_region` method is hard to follow; there multiple occurrences of same predicates. I wonder if one can reorganize these if-else a bit. Inlining `should_compact` should make all `if` on the same level at least. > > Apart from having an early return in the `should_compact`-if, one option would be making `has_pinned_objects()` more clever by stating something like: > > > bool has_pinned_objects() const { > return pinned_count() > 0 || (is_continues_humongous() && humongous_start_region()->pinned_count() > 0); > } > > > Then this predicate would get shorter. Or add a local helper for that (as suggested in the next commit). Either is fine with me. A local helper is possibly clearer here, IMO. 
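For illustration only, a local helper along the lines discussed above might look roughly like the sketch below. `Region` is a hypothetical, heavily simplified stand-in for `HeapRegion`, and the helper name `is_effectively_pinned` is an assumption made for this example; neither is taken from the PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for HotSpot's HeapRegion. Only the
// state needed to illustrate the predicate is modelled here.
struct Region {
  std::size_t pinned_count = 0;             // number of outstanding pins
  bool continues_humongous = false;         // true for humongous continuation regions
  const Region* humongous_start = nullptr;  // start region of the humongous object, if any
};

// Local helper (name is an assumption): a region is treated as pinned if it
// is pinned itself, or if it continues a humongous object whose start
// region is pinned.
static bool is_effectively_pinned(const Region& r) {
  if (r.pinned_count > 0) {
    return true;
  }
  if (r.continues_humongous) {
    assert(r.humongous_start != nullptr && "continuation region needs a start region");
    return r.humongous_start->pinned_count > 0;
  }
  return false;
}
```

With a helper like this, the combined condition is written once and the `if` in the caller only has to name it.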
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1376247318 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1376251206 From eosterlund at openjdk.org Mon Oct 30 14:01:48 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 30 Oct 2023 14:01:48 GMT Subject: Integrated: 8310239: Add missing cross modifying fence in nmethod entry barriers In-Reply-To: References: Message-ID: On Mon, 19 Jun 2023 15:26:37 GMT, Erik Österlund wrote: > In fact, there is a current race in the nmethod entry barriers, where what we are doing violates the AMD APM (cf. APM volume 2 section 7.6.1 https://www.amd.com/system/files/TechDocs/24593.pdf). > In particular, if the compare instruction of the nmethod entry barrier is not yet patched and we call a slow path on thread 1, then before taking the nmethod entry lock, another thread 2 could fix and disarm the nmethod. Then thread 1 will observe *data* suggesting the nmethod has been patched, but never re-executes the patched compare (which might indeed still be stale), hence not qualifying for asynchronous cross modifying code, and neither do we run a serializing cpuid instruction, qualifying for synchronous cross modifying code. In this scenario, we can indeed start executing the nmethod instructions, while observing inconsistent concurrent patching effects, where some instructions will be updated and some not. > > The following patch ensures that x86 nmethod entry barriers execute cross modifying fence after calling into the VM, where another thread could have disarmed the nmethod. I also ensured the other platforms perform their fencing after the VM call, instead of before - including a cross_modify_fence in the shared code for OSR nmethod entries.
While fencing before will flush out the instruction pipeline, and it shouldn't be populated with problematic instructions until after we start executing the nmethod again, it feels unnecessary to fence on the wrong side of the modifications it wishes to guard, and hence not strictly following the synchronous cross modifying fence recipe. > > I'm currently running tier1-5 and running performance testing in aurora. In the interest of time, I'm opening this PR before getting the final result, and will report the results when they come in. This pull request has now been integrated. Changeset: 4679e9aa Author: Erik Österlund URL: https://git.openjdk.org/jdk/commit/4679e9aa00c098cff715fb4deeb4d736e1063571 Stats: 10 lines in 1 file changed: 9 ins; 0 del; 1 mod 8310239: Add missing cross modifying fence in nmethod entry barriers Reviewed-by: aboldtch, dlong, aph ------------- PR: https://git.openjdk.org/jdk/pull/14543 From jsjolen at openjdk.org Mon Oct 30 15:01:42 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 30 Oct 2023 15:01:42 GMT Subject: RFR: 8319115: GrowableArray: Do not initialize up to capacity Message-ID: Hi, There's no reason to initialize the memory for the elements in the range `[_len, _capacity)` on resize, so let's not do it. Currently running testing tier1-3.
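To make the idea concrete, here is a minimal sketch of a growth path that constructs only the live elements and leaves the spare capacity as raw memory. `MiniArray` is a hypothetical toy class written for this example and is not HotSpot's `GrowableArray`; it assumes `T` needs no more than `malloc` alignment and that its move constructor does not throw.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>

// Hypothetical miniature growable array (not HotSpot's GrowableArray).
// Assumes T fits malloc's alignment and has a non-throwing move constructor.
template <typename T>
class MiniArray {
  T* _data = nullptr;
  std::size_t _len = 0;
  std::size_t _capacity = 0;

  void grow(std::size_t new_capacity) {
    T* new_data = static_cast<T*>(std::malloc(new_capacity * sizeof(T)));
    if (new_data == nullptr) {
      throw std::bad_alloc();
    }
    // One loop: move each live element across, then destroy the old copy.
    for (std::size_t i = 0; i < _len; i++) {
      ::new (static_cast<void*>(&new_data[i])) T(std::move(_data[i]));
      _data[i].~T();
    }
    // Slots in [_len, new_capacity) are deliberately left unconstructed.
    std::free(_data);
    _data = new_data;
    _capacity = new_capacity;
  }

 public:
  MiniArray() = default;
  MiniArray(const MiniArray&) = delete;
  MiniArray& operator=(const MiniArray&) = delete;

  ~MiniArray() {
    // Only [0, _len) ever held constructed objects, so only those are destroyed.
    for (std::size_t i = 0; i < _len; i++) {
      _data[i].~T();
    }
    std::free(_data);
  }

  // Taking the value by copy avoids dangling if it aliases an element of this array.
  void append(T value) {
    if (_len == _capacity) {
      grow(_capacity == 0 ? 2 : _capacity * 2);
    }
    ::new (static_cast<void*>(&_data[_len])) T(std::move(value));
    _len++;
  }

  T& at(std::size_t i) {
    assert(i < _len && "index out of bounds");
    return _data[i];
  }

  std::size_t length() const { return _len; }
  std::size_t capacity() const { return _capacity; }
};
```

The point of the sketch is that both the grow loop and the destructor stop at `_len`, so nothing in `[_len, _capacity)` is ever constructed or destroyed.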
------------- Commit messages: - Make into one loop - Do not init or destruct capacity Changes: https://git.openjdk.org/jdk/pull/16418/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16418&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319115 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16418.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16418/head:pull/16418 PR: https://git.openjdk.org/jdk/pull/16418 From stuefe at openjdk.org Mon Oct 30 15:11:34 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 30 Oct 2023 15:11:34 GMT Subject: RFR: 8319115: GrowableArray: Do not initialize up to capacity In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 14:49:25 GMT, Johan Sjölen wrote: > Hi, > > There's no reason to initialize the memory for the elements in the range `[_len, _capacity)` on resize, so let's not do it. > > Currently running testing tier1-3. Makes sense, but it may make sense to poison them in debug builds (we poison in os::realloc, but only if NMT is enabled). ------------- PR Comment: https://git.openjdk.org/jdk/pull/16418#issuecomment-1785425811 From jsjolen at openjdk.org Mon Oct 30 15:57:40 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 30 Oct 2023 15:57:40 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor Message-ID: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Hi, When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example.
This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. Currently running tier1-tier4. ------------- Commit messages: - Introduce functional filler Changes: https://git.openjdk.org/jdk/pull/16409/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319117 Stats: 39 lines in 2 files changed: 33 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16409/head:pull/16409 PR: https://git.openjdk.org/jdk/pull/16409 From tschatzl at openjdk.org Mon Oct 30 16:33:25 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 30 Oct 2023 16:33:25 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v3] In-Reply-To: References: Message-ID: > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. 
The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. > > * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P... 
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Move tests into gc.g1.pinnedobjs package ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16342/files - new: https://git.openjdk.org/jdk/pull/16342/files/e6646399..1b1d8ba9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=01-02 Stats: 11 lines in 5 files changed: 2 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/16342.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342 PR: https://git.openjdk.org/jdk/pull/16342 From ogillespie at openjdk.org Mon Oct 30 16:59:45 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Mon, 30 Oct 2023 16:59:45 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v3] In-Reply-To: References: Message-ID: > Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). > > See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. > > This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. 
> > The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. > > When concurrent symbol table cleanup runs, it also drains the queue. > > In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. > > Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: Fix failing tests TempNewSymbol now increments refcount again, messing with the expectations. This is not a complete fix, I will have to read the individual cases and make sure they are correct. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16398/files - new: https://git.openjdk.org/jdk/pull/16398/files/98cddcad..6d41aa0a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=01-02 Stats: 14 lines in 2 files changed: 1 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/16398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16398/head:pull/16398 PR: https://git.openjdk.org/jdk/pull/16398 From tschatzl at openjdk.org Mon Oct 30 17:14:32 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 30 Oct 2023 17:14:32 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v3] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 13:41:02 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/heapRegion.cpp line 734: >> >>> 732: // ranges passed in here corresponding to the space between live objects, it is >>> 733: // possible that there is a pinned object that is not any more referenced by >>> 734: // Java code (only by native). >> >> Can such obj becomes referenced by java again later on? 
IOW, a pointer passed from native to java. > > I do not think so. I will do some more testing about this. I (still) do not think it is possible after some more re-testing. There are the following situations I can think of: * string deduplication is a need-to-be-supported case where only the C code may have a reference to a pinned object: thread A enters a critical section on a string, gets the char array address, locking the region containing the char array. Then string dedup goes ahead and replaces the original char array with something else. Now the C code has the only reference to that char array. There is no API to convert a raw array pointer back to a Java object so destroying the header is fine; unpinning does not need the header. * there is some other case I can think of that could be problematic, but is actually a spec violation: the array is critical-locked by thread A, then shared with other C code (not critical-unlocked), and execution resumes with Java code that forgets that reference. At some point other C code accesses that locked memory and (hopefully) critically-unlocks it. Again, there is no API to convert a raw array pointer back to a Java object so destroying the header is fine. In all other cases I can think of there is always a reference to the encapsulating java object, either from the stack frame (when passing the object into the JNI function, it is part of the oop maps) or, if you create a new array object (via `NewArray`) and lock it, from the handle the VM adds to it. There is also no API to inspect the array header using the raw pointer (e.g. passing the raw pointer to `GetArrayLength` - doesn't compile as it expects a `jarray`, and in debug VMs there is actually a check that the passed argument is something that resembles a handle), so modifications are already invalid, and the change is fine imo.
hth, Thomas ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1376560372 From tschatzl at openjdk.org Mon Oct 30 17:41:32 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 30 Oct 2023 17:41:32 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v3] In-Reply-To: References: Message-ID: <8fZa33eLsVSrL-9U6EiETW9yRSQBN-YbefzFVD__DMs=.31e7e53d-547f-46bb-ab56-a49654fc9282@github.com> On Mon, 30 Oct 2023 13:46:21 GMT, Albert Mingkun Yang wrote: >> Apart from having an early return in the `should_compact`-if, one option would be making `has_pinned_objects()` more clever by stating something like: >> >> >> bool has_pinned_objects() const { >> return pinned_count() > 0 || (is_continues_humongous() && humongous_start_region()->pinned_count() > 0); >> } >> >> >> Then this predicate would get shorter. Or add a local helper for that (as suggested in the next commit). Either is fine with me. > > A local helper is possibly clearer here, IMO. Done in one of the latest commits. Resolving. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1376594853 From tschatzl at openjdk.org Mon Oct 30 17:58:34 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 30 Oct 2023 17:58:34 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v3] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 13:43:33 GMT, Albert Mingkun Yang wrote: >> Maybe rename `should_retain_evac_failed_region` to `should_keep_retained_region[_in_old]` or something? > > Is it possible to drop 1 so that an obj is evac-fail iff it's pinned or OOM? (I feel "retain" is on region-level.) The `G1EvacFailureRegions` class is on region level. There is a need for a term that encompasses both pinned and evac-failed regions. So far we used "retained" (i.e. contents partially not moved), also in logging. 
I am obviously open for suggestions, but I am not sure "OOM" in any variant is a good name to replace current "evac-fail" regions. Right now I don't see a good name. Maybe if we change `handle_evacuation_failure_par()` to something else? Like `handle_unmovable_object` or so to get rid of "evacuation failure" for objects? To some degree, we don't "fail" evacuation due to pinning, pinning is a conscious decision. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1376612711 From lmesnik at openjdk.org Mon Oct 30 18:32:35 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 30 Oct 2023 18:32:35 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions In-Reply-To: References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> Message-ID: On Sun, 29 Oct 2023 14:10:32 GMT, Jaikiran Pai wrote: >> The jtreg starts the main thread in a separate ThreadGroup and checks unhandled exceptions for this group. However, it doesn't catch all unhandled exceptions. There is a jtreg issue for this https://bugs.openjdk.org/browse/CODETOOLS-7903526. >> Catching such issues for virtual threads is important because they are not included in any groups. So this fix implements the handler for the test thread factory. >> >> A few tests start failing. >> >> The test >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java >> has testcases for platform and virtual threads. So, there is there's no need to run it with the thread factory. >> >> The test >> java/lang/Thread/virtual/ThreadAPI.java >> tests UncaughtExceptionHandler and virtual threads. No need to run it with a thread factory. >> >> Test >> test/jdk/java/util/concurrent/tck/ThreadTest.java is updated to not check the default empty handler. >> >> Probably, we need some common approach about dealing with the UncaughtExceptionHandler in jtreg. 
> > Hello Leonid, in order to understand what exactly we are trying to solve here, I ran a few tests to see how things work without the changes being proposed in this PR. Here's my findings. > > A bit of background first. When jtreg runs, either in agent mode or othervm mode, it creates a specific thread within which the actual test code runs. In either of these modes, it uses a custom jtreg specific `ThreadGroup` instance for this thread which is running the test code. This instance of `ThreadGroup` has specific overridden implementation of the `public void uncaughtException(Thread t, Throwable e)` API which keeps track of uncaught exception that might have been thrown by any threads that could have been spawned by the test. After `start()`ing the thread which runs the test code, the jtreg framework then waits for that thread to complete and once completed (either exceptionally or normally), jtreg framework then queries a state on the custom `ThreadGroup` instance to see if any uncaught exception(s) were received during the lifetime of this thread which ran that test. If it finds any, then it marks the test as failed and reports such a failure appropriately in the test report. As noted, this applies for both the agent mode and other vm mode. Some important aspects of this implementation is that: > > - The custom `ThreadGroup` which has the overridden implementation of the `uncaughtException(Thread t, Throwable e)` method doesn't interfere with the JVM level default exception handler. > > - After the thread which ran the test code completes, the decision on whether to fail or pass a test is taken by checking the custom `ThreadGroup`'s state. Once this decision is done, the decision is immediately reported in relevant ways and the test status is marked (i.e. finalized) at this point. > > - If this was running in agent vm mode, the agent vm mode continues to operate and isn't terminated and thus is available for subsequent tests.
This point I think is important to remember for reasons that will be noted later in this comment. > > Now coming to the part where in https://bugs.openjdk.org/browse/JDK-8303703 we introduced a way where jtreg instead of creating a platform thread (backed by its custom implementation of the `ThreadGroup`) to run the test code, now checks the presence of a `ThreadFactory` and if present, lets it create a `Thread` which runs this test code. In the JDK repo, we have an implementation of a `ThreadFactory` which creates a virtual thread instead of platform... @jaikiran, your analysis is correct. @jaikiran, @dholmes-ora jtreg is going to be fixed to handle all uncaught exceptions. See PR: https://github.com/openjdk/jtreg/pull/172 The problem might happen not only with the test thread factory, but with any virtual threads and possibly with system threads. That is a more generic solution and might take a lot of time to implement correctly, so this PR is just a temporary fix until jtreg is updated. I could withdraw this PR, but I am not sure what the risks/issues are if I integrate it. Are we just going to have ugly error reporting for uncaught exceptions when the test thread factory is used, or have I missed something? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16369#issuecomment-1785813862 From cslucas at openjdk.org Mon Oct 30 19:59:47 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 30 Oct 2023 19:59:47 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges [v4] In-Reply-To: References: Message-ID: > ### Description > > Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", or objects returned from method calls, etc. Please review this Pull Request that improves Reduce Allocation Merge implementation so that it can reduce at least some of these allocation merges.
> > Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP. > > The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing. > > ### Benchmarking > > **Note:** In some of these tests no reduction happens. I left them in to validate that no perf. regression happens in that case. > **Note 2:** Margin of error was negligible.
>
> | Benchmark | No RAM (ms/op) | Yes RAM (ms/op) |
> |--------------------------------------|------------------|-------------------|
> | TestTrapAfterMerge | 19.515 | 13.386 |
> | TestArgEscape | 33.165 | 33.254 |
> | TestCallTwoSide | 70.547 | 69.427 |
> | TestCmpAfterMerge | 16.400 | 2.984 |
> | TestCmpMergeWithNull_Second | 27.204 | 27.293 |
> | TestCmpMergeWithNull | 8.248 | 4.920 |
> | TestCondAfterMergeWithAllocate | 12.890 | 5.252 |
> | TestCondAfterMergeWithNull | 6.265 | 5.078 |
> | TestCondLoadAfterMerge | 12.713 | 5.163 |
> | TestConsecutiveSimpleMerge | 30.863 | 4.068 |
> | TestDoubleIfElseMerge | 16.069 | 2.444 |
> | TestEscapeInCallAfterMerge | 23.111 | 22.924 |
> | TestGlobalEscape | 14.459 | 14.425 |
> | TestIfElseInLoop | 246.061 | 42.786 |
> | TestLoadAfterLoopAlias | 45.808 | 45.812 |
> | TestLoadAfterTrap | 28.370 | ...

Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Ammend previous fix & add repro tests.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/15825/files - new: https://git.openjdk.org/jdk/pull/15825/files/9d09d872..ad6b9d1a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15825&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15825&range=02-03 Stats: 115 lines in 2 files changed: 98 ins; 5 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/15825.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15825/head:pull/15825 PR: https://git.openjdk.org/jdk/pull/15825 From matsaave at openjdk.org Mon Oct 30 20:19:45 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 30 Oct 2023 20:19:45 GMT Subject: RFR: 8315890: Attempts to load from nullptr in instanceKlass.cpp and unsafe.cpp [v2] In-Reply-To: References: Message-ID: > Calls in instanceKlass.cpp and unsafe.cpp try to call an atomic load on method calls that could return nullptr. This patch ensures that nullptr is not passed into the load. Verified with tier1-5 tests. Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Moved assert higher in call stack ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16405/files - new: https://git.openjdk.org/jdk/pull/16405/files/a546415a..0c91778f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16405&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16405&range=00-01 Stats: 4 lines in 3 files changed: 2 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16405.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16405/head:pull/16405 PR: https://git.openjdk.org/jdk/pull/16405 From matsaave at openjdk.org Mon Oct 30 20:27:02 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 30 Oct 2023 20:27:02 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v6] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for methods, 
ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64 Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 10 commits: - Merge branch 'master' into method_entry_8301997 - Removed flag arg from prepare_invoke on aarch - Fixed bytecode tracer - Coleen and Fei comments - Merge branch 'master' into method_entry_8301997 - Added asserts for getters and fixed printing - Removed dead code in interpreters - Removed unused structures, improved set_method_handle and appendix_if_resolved - Removed some comments and relocated code - 8301997: Move method resolution information out of the cpCache ------------- Changes: https://git.openjdk.org/jdk/pull/15455/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=05 Stats: 2950 lines in 64 files changed: 946 ins; 1558 del; 446 mod Patch: https://git.openjdk.org/jdk/pull/15455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15455/head:pull/15455 PR: https://git.openjdk.org/jdk/pull/15455 From coleenp at openjdk.org Mon Oct 30 20:28:31 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 30 Oct 2023 20:28:31 GMT Subject: RFR: 8315890: Attempts to load from nullptr in instanceKlass.cpp and unsafe.cpp [v2] In-Reply-To: References: Message-ID: <-1Lov-AqYoY3LCyoctox4fRMBQ_2BFoKyHb898eRaRE=.e5acf632-079f-4e90-877c-dfc36c2e0549@github.com> On Mon, 30 Oct 2023 20:19:45 GMT, Matias Saavedra Silva wrote: >> Calls in instanceKlass.cpp and unsafe.cpp try to call an atomic load on method calls that could return nullptr. This patch ensures that nullptr is not passed into the load. Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Moved assert higher in call stack Yes, this looks better. This was the source of the nullptr, except in these two cases, the pointer is never null. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16405#pullrequestreview-1705005437 From dlong at openjdk.org Mon Oct 30 21:08:34 2023 From: dlong at openjdk.org (Dean Long) Date: Mon, 30 Oct 2023 21:08:34 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: <33rkb93MSAYffTX3bYSyn9j6GLBcntqScDpLP9_PvNM=.bdc13397-86d2-43a2-b3db-010089aee397@github.com> On Sun, 29 Oct 2023 14:00:25 GMT, Johan Sjölen wrote: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. > > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4. How do you see this being used? Shouldn't the filler be part of the template type for the class, so it can be used in the ctor? Is there really a need for each call to at_put to use a different filler function? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16409#issuecomment-1786040753 From coleenp at openjdk.org Mon Oct 30 21:28:36 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 30 Oct 2023 21:28:36 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v3] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 16:59:45 GMT, Oli Gillespie wrote: >> Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678).
>> >> See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. >> >> This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. >> >> The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. >> >> When concurrent symbol table cleanup runs, it also drains the queue. >> >> In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. >> >> Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix failing tests > > TempNewSymbol now increments refcount again, messing with the > expectations. This is not a complete fix, I will have to read the > individual cases and make sure they are correct. I think this looks really good. Test idea enclosed. src/hotspot/share/oops/symbolHandle.hpp line 86: > 84: // or entries that are held elsewhere - it's a waste of effort. 
> 85: if (s != nullptr && s->refcount() == 1) { > 86: add_to_cleanup_delay_queue(s); This could add the comment that adding to the delay queue will increment the refcount again for the entry while in the queue (had to look again to verify that). src/hotspot/share/oops/symbolHandle.hpp line 114: > 112: // and this queue allows them to be reused instead of churning. > 113: void add_to_cleanup_delay_queue(Symbol* sym) { > 114: if (sym == nullptr) return; sym is never null here since you check it above. test/hotspot/gtest/classfile/test_symbolTable.cpp line 40: > 38: int abccount = abc->refcount(); > 39: TempNewSymbol ss = abc; > 40: // TODO: properly account for Symbol cleanup delay queue I wonder if you can programmatically change the queue length to zero and keep these counts. Then add a test with some loop of n being the queue length, and create n symbols and check that the first has been reclaimed? ------------- PR Review: https://git.openjdk.org/jdk/pull/16398#pullrequestreview-1705095684 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1376819360 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1376816877 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1376815796 From ccheung at openjdk.org Mon Oct 30 22:12:31 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Mon, 30 Oct 2023 22:12:31 GMT Subject: RFR: 8315890: Attempts to load from nullptr in instanceKlass.cpp and unsafe.cpp [v2] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 20:19:45 GMT, Matias Saavedra Silva wrote: >> Calls in instanceKlass.cpp and unsafe.cpp try to call an atomic load on method calls that could return nullptr. This patch ensures that nullptr is not passed into the load. Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Moved assert higher in call stack Updated changes look good. 
------------- Marked as reviewed by ccheung (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16405#pullrequestreview-1705152443 From dholmes at openjdk.org Mon Oct 30 22:49:33 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 30 Oct 2023 22:49:33 GMT Subject: RFR: 8315890: Attempts to load from nullptr in instanceKlass.cpp and unsafe.cpp [v2] In-Reply-To: References: Message-ID: <3VWRgiKm23AkPtbApPyYRYt_WV9XHc_Ah_F1gsIuDiA=.483fed9b-34e8-4e3a-b9af-87af567f7a87@github.com> On Mon, 30 Oct 2023 20:19:45 GMT, Matias Saavedra Silva wrote: >> Calls in instanceKlass.cpp and unsafe.cpp try to call an atomic load on method calls that could return nullptr. This patch ensures that nullptr is not passed into the load. Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Moved assert higher in call stack Not sure why it needed to be lifted out of Unsafe. The issue description should be updated now. ------------- PR Review: https://git.openjdk.org/jdk/pull/16405#pullrequestreview-1705187920 From dholmes at openjdk.org Mon Oct 30 22:56:31 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 30 Oct 2023 22:56:31 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions In-Reply-To: References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> Message-ID: <0EcMjUEFedq-qGfu52qTJ_5X6jde-BD4Ej9NWvsNXfs=.30d2ea1a-7138-4118-aa1f-9b9c1367f3dc@github.com> On Mon, 30 Oct 2023 18:29:30 GMT, Leonid Mesnik wrote: >> Hello Leonid, in order to understand what exactly we are trying to solve here, I ran a few tests to see how things work without the changes being proposed in this PR. Here's my findings. >> >> A bit of background first. When jtreg runs, either in agent mode or othervm mode, it creates a specific thread within which the actual test code runs. 
In either of these modes, it uses a custom jtreg specific `ThreadGroup` instance for this thread which is running the test code. This instance of `ThreadGroup` has specific overridden implementation of the `public void uncaughtException(Thread t, Throwable e)` API which keeps track of uncaught exception that might have been thrown by any threads that could have been spawned by the test. After `start()`ing the thread which runs the test code, the jtreg framework then waits for that thread to complete and once completed (either exceptionally or normally), jtreg framework then queries a state on the custom `ThreadGroup` instance to see if any uncaught exception(s) were received during the lifetime of this thread which ran that test. If it finds any, then it marks the test as failed and reports such a failure appropriately in the test report. As noted, this applies for both the agent mode and other vm mode. Some important aspects of this implementation is that: >> >> - The custom `ThreadGroup` which has the overridden implementation of the `uncaughtException(Thread t, Throwable e)` method doesn't interfere with the JVM level default exception handler. >> >> - After the thread which ran the test code completes, the decision on whether to fail or pass a test is taken by checking the custom `ThreadGroup`'s state. Once this decision is done, the decision is immediately reported in relevant ways and the test status is marked (i.e. finalized) at this point. >> >> - If this was running in agent vm mode, the agent vm mode continues to operate and isn't terminated and thus is available for subsequent tests. This point I think is important to remember for reasons that will be noted later in this comment.
>> >> Now coming to the part where in https://bugs.openjdk.org/browse/JDK-8303703 we introduced a way where jtreg instead of creating a platform thread (backed by its custom implementation of the `ThreadGroup`) to run the test code, now checks the presence of a `ThreadFactory` and if present, lets it create a `Thread` which runs this test code. In the JDK repo, we have an implementation of a `ThreadFactory` which creates a virtual thre... > > @jaikiran, your analysis is correct. > @jaikiran , @dholmes-ora The jtreg is going to be fixed to handle all uncaught exceptions. See PR: https://github.com/openjdk/jtreg/pull/172 > The problem might happens not only with test thread factory, but for any virtual threads and might be for system threads. It is more generic solution and might take a lot of time to be correctly implemented. So this pr is just temporary fix until jtreg is updated. I could withdraw this PR, but not sure what are the risks/issues if I integrate it. We are going just to have a ugly error reporting for the uncaught threads when test thread factory is used or missed something? @lmesnik Using `System.exit` is just wrong - we don't use that in JTREG tests at all. I would suggest just ProblemListing problematic tests that won't work as expected with virtual threads, until jtreg addresses this. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16369#issuecomment-1786171041 From jjoo at openjdk.org Mon Oct 30 23:25:37 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Mon, 30 Oct 2023 23:25:37 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v33] In-Reply-To: <9tV7khSiXH_2_Ju1_egmea6dyYQMC6HKSmcfblg0xSw=.18b97f9d-0fda-4043-8d71-eac0e857fd19@github.com> References: <2VxMwwQKNjq7EeXnyQ7fo_aXYj2EGCJyW2MrAUkoSQs=.902af266-1470-43ed-8e4c-a8b7611cc154@github.com> <9tV7khSiXH_2_Ju1_egmea6dyYQMC6HKSmcfblg0xSw=.18b97f9d-0fda-4043-8d71-eac0e857fd19@github.com> Message-ID: On Fri, 27 Oct 2023 14:00:43 GMT, Albert Mingkun Yang wrote: > Okay, these counters can be accessed frequently, but is it necessary for them to provide up-to-date information on every access? If not, what level of delay is acceptable? I assume this depends on how often AHS resizes the heap. (In the Parallel case, I believe the counters can be outdated for the duration of the full-gc.) The way AHS is implemented, we access these counters upon every heap resize, so in theory every call to `expand()` and `shrink()` will rely to some extent on these counters. If the counters do happen to be stale, AHS won't break, but will certainly be less effective. For the rest of the points, Man do you have any additional insight? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1786197139 From never at openjdk.org Mon Oct 30 23:28:32 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 30 Oct 2023 23:28:32 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v6] In-Reply-To: References: Message-ID: On Sun, 29 Oct 2023 20:39:47 GMT, Doug Simon wrote: >> This PR reduces the context in which a libjvmci CompilerThread can make a Java call. By default, a CompileThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). 
When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. > > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > allow JavaCalls in HotSpotConstantPool.callSystemExit This looks better than I'd imagined. ------------- Marked as reviewed by never (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16383#pullrequestreview-1705222347 From manc at openjdk.org Tue Oct 31 00:55:39 2023 From: manc at openjdk.org (Man Cao) Date: Tue, 31 Oct 2023 00:55:39 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v33] In-Reply-To: <2VxMwwQKNjq7EeXnyQ7fo_aXYj2EGCJyW2MrAUkoSQs=.902af266-1470-43ed-8e4c-a8b7611cc154@github.com> References: <2VxMwwQKNjq7EeXnyQ7fo_aXYj2EGCJyW2MrAUkoSQs=.902af266-1470-43ed-8e4c-a8b7611cc154@github.com> Message-ID: <8Ln25BTyHKzq_AfWrvQ_pZweFj6AkT1uvUa8GFndfmM=.c9d28dc5-daec-44ce-a6b2-4a8da1230db0@github.com> On Thu, 26 Oct 2023 21:01:53 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Remove StringDedup from GC thread list > Okay, these counters can be accessed frequently, but is it necessary for them to provide up-to-date information on every access? If not, what level of delay is acceptable? (In the Parallel case, I believe the counters can be outdated for the duration of the full-gc.) Besides Jonathan's point, in our experience updating these counters once every 1 second is good enough for AHS. It might even be OK for once every 2-3 seconds. We just don't want the counters to be outdated for tens or hundreds of seconds. Also the "tens or hundreds of seconds" delay is mainly a problem for concurrent mark, but less of a problem for GC pauses like the full-GC, because: - Multi-second GC pauses are uncommon compared to multi-second concurrent mark. - GC pauses are frequent. It is OK for AHS to get slightly outdated info for the ongoing GC pause, because AHS can quickly influence the next GC pause after the ongoing GC pause finishes and updates the counters. This is also the reason we only refresh `sun.threads.total_gc_cpu_time` after a GC pause. 
> My primary concern is that the change in G1 is too intrusive -- the logic for tracking/collecting thread-CPU is scattered in many places. Additionally, the situation is expected to worsen in the near future, based on the statement "I can create a separate RFE to make it update more frequently..." I think most G1 changes in this PR are straightforward and easy to maintain. With the exception of concurrent mark (`sun.threads.cpu_time.gc_conc_mark`), each counter is only updated at exactly one place. As part of [JDK-8318941](https://bugs.openjdk.org/browse/JDK-8318941), we could find a way to update `sun.threads.cpu_time.gc_conc_mark` at only one place as well. In addition, we think it is well worth the effort for G1 (or any modern garbage collector) to keep track of CPU time spent by their GC threads. Besides monitoring benefits of users and external tools like AHS, it could open up opportunity for G1 to develop better heuristics based on CPU time. We find CPU time spent by GC threads is a better measure for GC overhead, than wall (pause) time. There was [a discussion](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2021-May/035241.html) about `GCTimeRatio` and CPU time. Today even after [JDK-8253413](https://bugs.openjdk.org/browse/JDK-8253413), `GCTimeRatio` still only accounts for pause time. We hope JVM could provide a flag `GCCpuRatio` and resizes its heap to respect `GCCpuRatio`. AHS actually tries to partly achieve the effect of a `GCCpuRatio` flag. > Also, why isn't `G1ServiceThread` part of the change? I would expect all subclasses of ConcurrentGCThread to be taken into account. Is this omission intentional? Thanks. We missed this thread in our accounting due to oversight. It was named `G1YoungRemSetSamplingThread` before. @jjoo172, could you add an hsperf counter for `G1ServiceThread`? > Finally, thread-CPU time is something tracked at the OS level, so it's a bit odd that one has to instrument the VM to get that information. 
> According to https://man7.org/linux/man-pages/man5/proc.5.html, "(14) utime" + "(15) stime %lu" == thread-cpu-time. cat `/proc//task/*/stat` lists all VM internal threads, including GC, JIT, and etc. It is possible, but it is a lot of work for the users. `/proc//task/*/stat` lists all Java threads from the application as well. Users would need deep JVM knowledge to find out which threads are GC threads, which are JIT threads, etc. Quite a few other issues come along as well: - What if the JVM's internal threads are renamed across different JDK versions? (E.g. `GC Thread#N` was named `Gang worker#N` in JDK 8.) - Different tools that need to read CPU time would each need to implement a parser for `/proc//task/*/stat`, and deal with similar problems. - What if a tool needs to support multiple OSes like Windows, which do not have `/proc` FS? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1786267159 From jrose at openjdk.org Tue Oct 31 00:59:37 2023 From: jrose at openjdk.org (John R Rose) Date: Tue, 31 Oct 2023 00:59:37 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v7] In-Reply-To: References: <12Jiijy5Y9tn-_eHyE01oDMVUomgXbEiuEgDPgY8GU0=.af4948be-6026-4b35-bcd1-b281855e3ede@github.com> Message-ID: On Sun, 29 Oct 2023 07:51:48 GMT, Kim Barrett wrote: >> @dholmes-ora Yes it helps avoid copying, especially if the copy constructor is non-trivial. And I think it is more idiomatic in C++ to use references here. > > Using a reference here leads to unnecessary overhead when `E` is small and > trivially copyable, unless the predicate function is inlined. Pass by value is > better in that case. Of course, as noted above, if `E` is "expensive" to copy > or non-copyable then a reference is needed. Boost has this thing called > `call_traits::param_type` for this issue, but I don't recommend we copy > that. 
> > Idiomatic C++ makes the entire function a template parameter, as was suggested > earlier in this PR. That dodges this question entirely, leaving the parameter > passing decision to the predicate function where it belongs, rather than > having it imposed by GrowableArray::find. The find function just imposes the > requirement that the predicate satisfies the appropriate constraints, e.g. it > is callable on an element reference and returns convertible to bool. I agree we should be using a template-typed function instead of a function pointer here. I think a lot of our uses of function pointers in our code base would work better as template-typed args. See for example the `grow` argument (of template type `GFN`) at this point: https://github.com/openjdk/jdk/blob/d051f22284e7ccc288c658588f73da672d9bfacd/src/hotspot/share/utilities/unsigned5.hpp#L343C34-L343C34 In that case I cited, if the `grow` argument were a function pointer instead of a template-typed function-like argument (Kim what's the right term here? "functor"??), then performance and flexibility would be unacceptably harmed. I think the `find` argument is just the same kind of thingy.
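For readers following the thread, here is a minimal sketch of the trade-off being discussed. It is not HotSpot code: `SimpleArray`, `find_fp`, `find_if`, `GreaterThan` and `is_even` are hypothetical names made up for illustration, and the real `GrowableArray::find` signature may well differ. The point is only that a function-pointer parameter fixes how the element is passed, while a template-typed parameter leaves that choice to the predicate.

```cpp
// Hypothetical stand-in for a growable array; NOT HotSpot's GrowableArray.
template <typename E>
class SimpleArray {
  const E* _data;
  int _len;

 public:
  SimpleArray(const E* data, int len) : _data(data), _len(len) {}

  // Function-pointer flavour: the signature forces every predicate to take
  // `const E&`, and the call is indirect unless the compiler can inline it.
  int find_fp(bool (*pred)(const E&)) const {
    for (int i = 0; i < _len; i++) {
      if (pred(_data[i])) return i;
    }
    return -1;
  }

  // Template-typed flavour: any callable is accepted (function pointer,
  // lambda, class with operator()). The predicate itself decides whether it
  // takes the element by value or by reference; find_if only requires that
  // `pred(element)` is valid and converts to bool.
  template <typename F>
  int find_if(F pred) const {
    for (int i = 0; i < _len; i++) {
      if (pred(_data[i])) return i;
    }
    return -1;
  }
};

// A function object class ("functor") that takes its argument by value,
// which is the cheap option for a small trivially copyable E such as int.
struct GreaterThan {
  int limit;
  bool operator()(int v) const { return v > limit; }
};

// A plain function taking a reference; usable with both flavours.
static bool is_even(const int& v) { return v % 2 == 0; }
```

With an `int` array holding 3, 7, 8, 15, both `find_fp(is_even)` and `find_if(is_even)` return index 2, and `find_if(GreaterThan{7})` also returns 2. Only `find_if` accepts the by-value functor or a lambda with captures, which is roughly the flexibility argument made above.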
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1376940244 From jjoo at openjdk.org Tue Oct 31 04:17:07 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Tue, 31 Oct 2023 04:17:07 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v34] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Implement hsperf counter for G1ServiceThread ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/2fc508f7..0ef70468 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=32-33 Stats: 19 lines in 4 files changed: 17 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Tue Oct 31 04:23:13 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Tue, 31 Oct 2023 04:23:13 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v35] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Replace NULL with nullptr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/0ef70468..be104e16 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=33-34 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: 
https://git.openjdk.org/jdk/pull/15082 From kbarrett at openjdk.org Tue Oct 31 05:16:33 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 31 Oct 2023 05:16:33 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v7] In-Reply-To: References: <12Jiijy5Y9tn-_eHyE01oDMVUomgXbEiuEgDPgY8GU0=.af4948be-6026-4b35-bcd1-b281855e3ede@github.com> Message-ID: On Tue, 31 Oct 2023 00:57:02 GMT, John R Rose wrote: >> Using a reference here leads to unnecessary overhead when `E` is small and >> trivially copyable, unless the predicate function is inlined. Pass by value is >> better in that case. Of course, as noted above, if `E` is "expensive" to copy >> or non-copyable then a reference is needed. Boost has this thing called >> `call_traits::param_type` for this issue, but I don't recommend we copy >> that. >> >> Idiomatic C++ makes the entire function a template parameter, as was suggested >> earlier in this PR. That dodges this question entirely, leaving the parameter >> passing decision to the predicate function where it belongs, rather than >> having it imposed by GrowableArray::find. The find function just imposes the >> requirement that the predicate satisfies the appropriate constraints, e.g. it >> is callable on an element reference and returns convertible to bool. > > I agree we should be using a template-typed function instead of a function pointer here. > I think a lot of our uses of function pointers in our code base would work better as template-typed args. > See for example the `grow` argument (of template type `GFN`) at this point: > https://github.com/openjdk/jdk/blob/d051f22284e7ccc288c658588f73da672d9bfacd/src/hotspot/share/utilities/unsigned5.hpp#L343C34-L343C34 > > In that case I cited, if the `grow` argument were a function pointer instead of a template-typed function-like argument (Kim what's the right term here? "functor"??), then performance and flexibility would be unacceptably harmed.
> > I think the `find` argument is just the same kind of thingy. "Functor" is a reasonable choice, with plenty of precedent. The C++ standard doesn't use the term "functor". Instead it defines (C++14 20.9) the terms "function object type" and associated "function object" to include all of (1) pointer to function types (2) class types with a member operator() (3) class types with a conversion to pointer to function The C++ standard doesn't have distinct terminology for cases (2) and (3), so far as I can tell. Sometimes people use either "functor" or "function object class" to distinguish (2) (web search for "C++ functor" for examples of the former). Sometimes people use "function object" to refer only to (2). So the community is inconsistent in this respect. I'm not sure I've ever seen (3). (Not even in "look at this obscure C++ thing" discussions.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1377062682 From jpai at openjdk.org Tue Oct 31 05:59:31 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Tue, 31 Oct 2023 05:59:31 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions In-Reply-To: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> Message-ID: On Wed, 25 Oct 2023 21:08:01 GMT, Leonid Mesnik wrote: > The jtreg starts the main thread in a separate ThreadGroup and checks unhandled exceptions for this group. However, it doesn't catch all unhandled exceptions. There is a jtreg issue for this https://bugs.openjdk.org/browse/CODETOOLS-7903526. > Catching such issues for virtual threads is important because they are not included in any groups. So this fix implements the handler for the test thread factory. > > A few tests start failing. 
> > The test > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java > has testcases for platform and virtual threads. So there's no need to run it with the thread factory. > > The test > java/lang/Thread/virtual/ThreadAPI.java > tests UncaughtExceptionHandler and virtual threads. No need to run it with a thread factory. > > Test > test/jdk/java/util/concurrent/tck/ThreadTest.java is updated to not check the default empty handler. > > Probably, we need some common approach for dealing with the UncaughtExceptionHandler in jtreg. Hello Leonid, > So this pr is just a temporary fix until jtreg is updated. > ... > I could withdraw this PR, but not sure what are the risks/issues if I integrate it. We are going just to have an ugly error reporting for the uncaught threads when test thread factory is used or missed something? I think there is more than one problem with the proposed approach and those problems outweigh any usefulness this change might bring in. Specifically, this PR introduces a call to `System.exit()` which terminates a JVM. When that happens, it's not the ugly error reporting which is a problem, but the fact that tests which get abruptly terminated like this will have no way to determine what went wrong. For example, you will see: > result: Error. Agent communication error: java.io.EOFException; check console log for any additional details > > > test result: Error. Agent communication error: java.io.EOFException; check console log for any additional details Whether or not there will be any additional logs anywhere is uncertain because jtreg and other infrastructure which does any kind of buffering of logs and other state wouldn't have had a chance to properly report the state/logs. It's almost as if the (remote) JVM has crashed. In fact even genuine failures from the test may not get reported and instead might get replaced by this generic JVM termination reporting.
Furthermore, in its current form the System.exit() gets called when an uncaught exception gets thrown from some thread. When this happens, the test itself might not have completed (unlike in the case of the platform threads, where jtreg just keeps track of the uncaught exception and when the test is finally complete, it uses that state to decide whether the test fails or passes). Then there is the case where the JVM level uncaught exception handler is being changed in this proposal. What that means is, when run with a virtual thread, this will now impact tests which (rightly) expect that the default uncaught exception handler would be null. One such example is the `test/jdk/java/util/concurrent/tck/ThreadTest.java` which is being updated in this PR. Not even using `/othervm` in those tests will get them to a state where they can expect the default uncaught exception handler to be null because even in `/othervm` scenarios, this proposed change will have set the default uncaught exception handler. If we leave out setting the default uncaught exception handler and the call to System.exit() from this PR, then what remains is just the exception logging and stacktrace printing. That part is currently anyway available even without this proposed change, because the "system" `ThreadGroup` does exactly that https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ThreadGroup.java#L696. In fact, if we currently run the tests using this virtual thread factory and then, once the tests complete, run a search for this error message from the "system" ThreadGroup, we should be able to identify all those tests which had a thread throw some uncaught exception. Given all this, I think this PR isn't introducing anything new that will help us even in the short term.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16369#issuecomment-1786498722 From stefank at openjdk.org Tue Oct 31 08:35:31 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 31 Oct 2023 08:35:31 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: On Sun, 29 Oct 2023 14:00:25 GMT, Johan Sjölen wrote: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. > > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4. I'd also like to get a better understanding of how this is intended to be used. It also lacks tests for this feature. Maybe you could write a gtest show-casing how this is supposed to be used? I'd prefer if there was a way to get rid of the SFINAE and the code duplication, if possible. There are a lot of inconsistencies in the formatting within the patch itself and with regards to the style used in the rest of the code. Could you take an extra pass and add appropriate whitespace and newlines?
------------- PR Comment: https://git.openjdk.org/jdk/pull/16409#issuecomment-1786738435 From adinn at openjdk.org Tue Oct 31 08:51:52 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 31 Oct 2023 08:51:52 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v6] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 20:27:02 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64 > > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 10 commits: > > - Merge branch 'master' into method_entry_8301997 > - Removed flag arg from prepare_invoke on aarch > - Fixed bytecode tracer > - Coleen and Fei comments > - Merge branch 'master' into method_entry_8301997 > - Added asserts for getters and fixed printing > - Removed dead code in interpreters > - Removed unused structures, improved set_method_handle and appendix_if_resolved > - Removed some comments and relocated code > - 8301997: Move method resolution information out of the cpCache Thanks Matias. All my comments have been addressed. ------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15455#pullrequestreview-1705735470 From gcao at openjdk.org Tue Oct 31 09:41:46 2023 From: gcao at openjdk.org (Gui Cao) Date: Tue, 31 Oct 2023 09:41:46 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v6] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 20:27:02 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely.
The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64 > > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: > > - Merge branch 'master' into method_entry_8301997 > - Removed flag arg from prepare_invoke on aarch > - Fixed bytecode tracer > - Coleen and Fei comments > - Merge branch 'master' into method_entry_8301997 > - Added asserts for getters and fixed printing > - Removed dead code in interpreters > - Removed unused structures, improved set_method_handle and appendix_if_resolved > - Removed some comments and relocated code > - 8301997: Move method resolution information out of the cpCache Hi, @RealFYang and I have finished the RISC-V part, tier1-3 and hotspot:tier4 tested on hifive unmatched board. Please help us to add the RISC-V part, thanks a lot! 
[15455-riscv-port.diff.txt](https://github.com/openjdk/jdk/files/13214653/15455-riscv-port.diff.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1786849668 From qamai at openjdk.org Tue Oct 31 09:57:32 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 31 Oct 2023 09:57:32 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: On Sun, 29 Oct 2023 14:00:25 GMT, Johan Sjölen wrote: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. > > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4. I think a more preferable approach is to do emplace-like filling

    template <typename... Args>
    E& at_grow(int i, Args... args) {
      assert(0 <= i, "negative index %d", i);
      if (i >= this->_len) {
        if (i >= this->_capacity) {
          grow(i);
        }
        for (int j = this->_len; j <= i; j++) {
          _data[j].~E();
          new (&_data[j]) E(args...);
        }
        this->_len = i + 1;
      }
      return _data[i];
    }

------------- PR Comment: https://git.openjdk.org/jdk/pull/16409#issuecomment-1786876601 From jsjolen at openjdk.org Tue Oct 31 10:04:31 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 31 Oct 2023 10:04:31 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: On Sun, 29 Oct 2023 14:00:25 GMT, Johan Sjölen wrote: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. > > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4. > I think a more preferable approach is to do emplace-like filling > > ``` > template <typename... Args> > E& at_grow(int i, Args... args) { > assert(0 <= i, "negative index %d", i); > if (i >= this->_len) { > if (i >= this->_capacity) { > grow(i); > } > for (int j = this->_len; j <= i; j++) { > _data[j].~E(); > new (&_data[j]) E(args...); > } > this->_len = i + 1; > } > return _data[i]; > } > ``` I think you might be right. If I understand this correctly we can pick between copy construction (having `Args` be equal to `E`) and "regular" construction depending on the arguments provided? @stefank, @dean-long. 
Re: tests, yes, I should add tests. The goal here is to avoid copy construction, and I chose to provide a function so that you can yourself pick how to initialize the memory. I think @merykitty's solution might be preferable to mine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16409#issuecomment-1786888957 From qamai at openjdk.org Tue Oct 31 10:14:30 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 31 Oct 2023 10:14:30 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: On Tue, 31 Oct 2023 10:01:26 GMT, Johan Sjölen wrote: >> Hi, >> >> When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. >> >> I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. >> >> This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. >> >> Currently running tier1-tier4. > >> I think a more preferable approach is to do emplace-like filling >> >> ``` >> template <typename... Args> >> E& at_grow(int i, Args... args) { >> assert(0 <= i, "negative index %d", i); >> if (i >= this->_len) { >> if (i >= this->_capacity) { >> grow(i); >> } >> for (int j = this->_len; j <= i; j++) { >> _data[j].~E(); >> new (&_data[j]) E(args...); >> } >> this->_len = i + 1; >> } >> return _data[i]; >> } >> ``` > > I think you might be right. If I understand this correctly we can pick between copy construction (having `Args` be equal to `E`) and "regular" construction depending on the arguments provided? > > @stefank, @dean-long. Re: tests, yes, I should add tests.
The goal here is to avoid copy construction, and I chose to provide a function so that you can yourself pick how to initialize the memory. I think @merykitty's solution might be preferable to mine. @jdksjolen Yes you are right, note that the idiom for standard C++ looks like this

    template <typename... Args>
    void emplace(Args&&... args) {
      call(std::forward<Args>(args)...);
    }

And I'm not really sure what this would become without move semantics. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16409#issuecomment-1786906761 From jsjolen at openjdk.org Tue Oct 31 11:56:29 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 31 Oct 2023 11:56:29 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: On Tue, 31 Oct 2023 10:11:59 GMT, Quan Anh Mai wrote: >>> I think a more preferable approach is to do emplace-like filling >>> >>> ``` >>> template <typename... Args> >>> E& at_grow(int i, Args... args) { >>> assert(0 <= i, "negative index %d", i); >>> if (i >= this->_len) { >>> if (i >= this->_capacity) { >>> grow(i); >>> } >>> for (int j = this->_len; j <= i; j++) { >>> _data[j].~E(); >>> new (&_data[j]) E(args...); >>> } >>> this->_len = i + 1; >>> } >>> return _data[i]; >>> } >>> ``` >> >> I think you might be right. If I understand this correctly we can pick between copy construction (having `Args` be equal to `E`) and "regular" construction depending on the arguments provided? >> >> @stefank, @dean-long. Re: tests, yes, I should add tests. The goal here is to avoid copy construction, and I chose to provide a function so that you can yourself pick how to initialize the memory. I think @merykitty's solution might be preferable to mine. > > @jdksjolen Yes you are right, note that the idiom for standard C++ looks like this > > template <typename... Args> > void emplace(Args&&... args) { > call(std::forward<Args>(args)...); > } > > And I'm not really sure what this would become without move semantics. Trying out @merykitty's idea reminded me of why this solution turned out like this: There's no 'generic' placement new-operator in `AnyObj`. This can be fixed by adding in
```c++
class AnyObj {
  //...
  void* operator new(size_t, void* ptr) { return ptr; }
};
```
tier1 and tier2 pass with the addition of this operator. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16409#issuecomment-1787070329 From jsjolen at openjdk.org Tue Oct 31 12:19:57 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 31 Oct 2023 12:19:57 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v2] In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. > > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4.
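A note on the `nullptr` ambiguity mentioned in the description above. The PR's own code is not reproduced in this thread, so the following is only a minimal sketch under assumptions: `Slot`, `fill`, `value` and `filled_by_callback` are made-up names, and the constraint shown is just one way such a SFINAE guard could be written. With a plain `void fill(void (*init)(E*))` overload next to `fill(const E&)`, a call `fill(nullptr)` on a pointer-typed `E` is ambiguous, because `nullptr` converts equally well to `E` and to the function pointer. Making the callback overload a constrained template sidesteps that:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical miniature of a container slot with two ways to fill it.
// Illustrative only; this is not the code from the pull request.
template <typename E>
class Slot {
  E _value{};
  bool _filled_by_callback = false;

public:
  // (a) Fill with a copy of fill_value.
  void fill(const E& fill_value) {
    _value = fill_value;
    _filled_by_callback = false;
  }

  // (b) Fill by letting init write through a pointer to the slot.
  // The enable_if removes this overload whenever the argument could also
  // be read as an E, so fill(nullptr) on a Slot<int*> selects (a).
  template <typename F,
            typename std::enable_if<!std::is_convertible<F, E>::value, int>::type = 0>
  void fill(F init) {
    init(&_value);
    _filled_by_callback = true;
  }

  const E& value() const { return _value; }
  bool filled_by_callback() const { return _filled_by_callback; }
};
```

The variadic `at_grow` discussed in the replies avoids the need for this kind of guard altogether, since there is then only one function and no overload set to disambiguate.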
Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: - Use variadic templates for in-place construction - Two tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16409/files - new: https://git.openjdk.org/jdk/pull/16409/files/579083ee..3e2c7a1e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=00-01 Stats: 77 lines in 3 files changed: 44 ins; 27 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16409/head:pull/16409 PR: https://git.openjdk.org/jdk/pull/16409 From jsjolen at openjdk.org Tue Oct 31 12:27:06 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 31 Oct 2023 12:27:06 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v3] In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. > > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4.
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Simplify the patch ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16409/files - new: https://git.openjdk.org/jdk/pull/16409/files/3e2c7a1e..fa31729d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=01-02 Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16409/head:pull/16409 PR: https://git.openjdk.org/jdk/pull/16409 From jsjolen at openjdk.org Tue Oct 31 12:27:08 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 31 Oct 2023 12:27:08 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v2] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: <4-q_GVxiCUffRIV-UHgLQ05BVfWlC1VlhRvu32W5Msw=.377165bb-2c78-4c1a-abff-a9effed8e826@github.com> On Tue, 31 Oct 2023 12:19:57 GMT, Johan Sjölen wrote: >> Hi, >> >> When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. >> >> I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. >> >> This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. >> >> Currently running tier1-tier4.
> > Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: > > - Use variadic templates for in-place construction > - Two tests src/hotspot/share/memory/allocation.hpp line 500: > 498: return ptr; > 499: } > 500: This should probably be a separate RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1377500706 From qamai at openjdk.org Tue Oct 31 12:27:08 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 31 Oct 2023 12:27:08 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v2] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: <2K2wO2SGK7G-lR5Ml1pZyX9eQLDve5pJcjkuqbSQPZc=.d0c7fb43-d39e-4200-b281-362788777741@github.com> On Tue, 31 Oct 2023 12:19:57 GMT, Johan Sjölen wrote: >> Hi, >> >> When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. >> >> I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. >> >> This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. >> >> Currently running tier1-tier4. > > Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: > > - Use variadic templates for in-place construction > - Two tests src/hotspot/share/utilities/growableArray.hpp line 405: > 403: void push(const E& elem) { append(elem); } > 404: private: > 405: static void default_fill(E* ptr) { This method is unused, I believe. src/hotspot/share/utilities/growableArray.hpp line 410: > 408: public: > 409: template <typename... Args> > 410: E at_grow(int i, Args... args) { Should this be `Args&... args`?
Also this method is returning a value instead of a reference, is it intentional? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1377501493 PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1377501250 From ogillespie at openjdk.org Tue Oct 31 12:27:53 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Tue, 31 Oct 2023 12:27:53 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: > Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). > > See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. > > This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. > > The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. > > When concurrent symbol table cleanup runs, it also drains the queue. > > In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. 
> > Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: Adress comments Fix indentation Improve tests Improve comment Remove redundant null check Improve naming Pop when >, not >= max len ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16398/files - new: https://git.openjdk.org/jdk/pull/16398/files/6d41aa0a..1cc810df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=02-03 Stats: 76 lines in 2 files changed: 39 ins; 8 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/16398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16398/head:pull/16398 PR: https://git.openjdk.org/jdk/pull/16398 From ogillespie at openjdk.org Tue Oct 31 12:27:56 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Tue, 31 Oct 2023 12:27:56 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v3] In-Reply-To: References: Message-ID: <0VjsJRrXXBSdNhx18PBAa9oe94AStq7y5qVg5jTWBaI=.81362159-de7e-41ab-8cc8-3c9ab5aacc9a@github.com> On Mon, 30 Oct 2023 21:20:28 GMT, Coleen Phillimore wrote: >> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix failing tests >> >> TempNewSymbol now increments refcount again, messing with the >> expectations. This is not a complete fix, I will have to read the >> individual cases and make sure they are correct. > > test/hotspot/gtest/classfile/test_symbolTable.cpp line 40: > >> 38: int abccount = abc->refcount(); >> 39: TempNewSymbol ss = abc; >> 40: // TODO: properly account for Symbol cleanup delay queue > > I wonder if you can programmatically change the queue length to zero and keep these counts. > > Then add a test with some loop of n being the queue length, and create n symbols and check that the first has been reclaimed? 
Good idea. I have implemented this in the next commit, but I'm not sure what the idiomatic way to expose this value for testing is - I have just done it naively so far. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1377504954 From jsjolen at openjdk.org Tue Oct 31 12:31:47 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 31 Oct 2023 12:31:47 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v2] In-Reply-To: <2K2wO2SGK7G-lR5Ml1pZyX9eQLDve5pJcjkuqbSQPZc=.d0c7fb43-d39e-4200-b281-362788777741@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> <2K2wO2SGK7G-lR5Ml1pZyX9eQLDve5pJcjkuqbSQPZc=.d0c7fb43-d39e-4200-b281-362788777741@github.com> Message-ID: On Tue, 31 Oct 2023 12:21:43 GMT, Quan Anh Mai wrote: >> Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: >> >> - Use variadic templates for in-place construction >> - Two tests > > src/hotspot/share/utilities/growableArray.hpp line 405: > >> 403: void push(const E& elem) { append(elem); } >> 404: private: >> 405: static void default_fill(E* ptr) { > > This method is unused I believe I hoped I'd manage to push a fix for that before anyone saw it :-). > Should this be Args&... args? Yes. > Also this method is returning a value instead of a reference, is it intentional? That is difficult to say. The caller can decide whether to capture the value by-reference or by-value either way.
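For readers following the value-versus-reference question, here is a small sketch of what each return style lets the caller do. It is not code from the PR: `TwoSlots`, `at_by_value`, `at_by_ref` and `return_style_demo` are hypothetical names chosen for illustration. The point it tries to show is that a caller can bind a `const` reference to a by-value result, but that reference then names a temporary copy rather than the stored element.

```cpp
#include <cassert>

// Hypothetical two-element container; a stand-in, not GrowableArray itself.
template <typename E>
class TwoSlots {
  E _data[2] = {};

public:
  // Returns a copy of the element.
  E at_by_value(int i) { return _data[i]; }

  // Returns an alias for the element stored in the container.
  E& at_by_ref(int i) { return _data[i]; }
};

inline void return_style_demo() {
  TwoSlots<int> s;
  s.at_by_ref(0) = 5;

  int copy = s.at_by_value(0);          // plain copy of the element
  const int& bound = s.at_by_value(0);  // binds to a temporary copy, not to the element
  // int& bad = s.at_by_value(0);       // would not compile: int& cannot bind to a temporary

  int& alias = s.at_by_ref(0);          // refers to the element itself
  alias = 6;                            // changes the stored element

  assert(copy == 5);
  assert(bound == 5);                   // the temporary copy kept the old value
  assert(s.at_by_value(0) == 6);        // the stored element saw the write
}
```

So only the reference-returning form supports writing through the result, which seems to be what the follow-up revision ("Should take by reference and return reference") was after.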
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1377508655 PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1377507934 From jsjolen at openjdk.org Tue Oct 31 12:31:45 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 31 Oct 2023 12:31:45 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v4] In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. > > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4.
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Should take by reference and return reference ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16409/files - new: https://git.openjdk.org/jdk/pull/16409/files/fa31729d..bcb098cb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16409/head:pull/16409 PR: https://git.openjdk.org/jdk/pull/16409 From jsjolen at openjdk.org Tue Oct 31 12:39:44 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 31 Oct 2023 12:39:44 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v5] In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. > > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4.
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Can't take by reference ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16409/files - new: https://git.openjdk.org/jdk/pull/16409/files/bcb098cb..fb8c7061 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16409/head:pull/16409 PR: https://git.openjdk.org/jdk/pull/16409 From jsjolen at openjdk.org Tue Oct 31 12:39:46 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 31 Oct 2023 12:39:46 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v2] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> <2K2wO2SGK7G-lR5Ml1pZyX9eQLDve5pJcjkuqbSQPZc=.d0c7fb43-d39e-4200-b281-362788777741@github.com> Message-ID: <-kCpEXE_nYfhhWc0DmVw816euNdRO7yHHektWGIlYyM=.13b04735-a6ec-4681-8682-f432f21cc9f5@github.com> On Tue, 31 Oct 2023 12:27:12 GMT, Johan Sjölen wrote: > > Should this be Args&... args? > > Yes. > Actually, no: We can't take by reference because most of the time we're calling `at_grow` and `at_put_grow` with rvalues and not lvalues, so we need move semantics for this to work. We have to keep the arguments as copying. 
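
The rvalue problem described above can be reproduced outside HotSpot. The following is only a minimal sketch under stated assumptions: `Point`, `construct_lvalue_ref` and `construct_by_value` are hypothetical names invented for illustration and are not part of `GrowableArray` or of the patch under review. It shows that a pack declared as `Args&...` accepts named variables but rejects temporaries such as integer literals, while a pack taken by value accepts both.

```cpp
#include <cassert>
#include <new>

// Hypothetical non-copyable element type (illustration only).
struct Point {
  int x;
  int y;
  Point(int x_, int y_) : x(x_), y(y_) {}
  Point(const Point&) = delete;
  Point& operator=(const Point&) = delete;
};

// Pack of non-const lvalue references: each argument must be a named,
// modifiable object. A temporary such as the literal 3 cannot bind to int&.
template <typename E, typename... Args>
E* construct_lvalue_ref(void* mem, Args&... args) {
  return ::new (mem) E(args...);
}

// Pack taken by value: accepts lvalues and rvalues alike; each argument is
// copied or moved into the parameter before being passed to E's constructor.
template <typename E, typename... Args>
E* construct_by_value(void* mem, Args... args) {
  return ::new (mem) E(args...);
}

// Small usage example; Point is trivially destructible, so reusing the
// buffer for a second object is fine here.
inline void check_examples() {
  alignas(Point) unsigned char buf[sizeof(Point)];

  int a = 1;
  int b = 2;
  Point* p = construct_lvalue_ref<Point>(buf, a, b);   // OK: lvalues
  assert(p->x == 1 && p->y == 2);

  // construct_lvalue_ref<Point>(buf, 3, 4);           // would not compile: rvalues

  Point* q = construct_by_value<Point>(buf, 3, 4);     // OK: rvalues
  assert(q->x == 3 && q->y == 4);
}
```

Note that string literals are lvalues, so a call site passing only string literals would not expose the problem; integer literals and other temporaries do.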
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1377519599 From qamai at openjdk.org Tue Oct 31 12:51:33 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 31 Oct 2023 12:51:33 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v2] In-Reply-To: <-kCpEXE_nYfhhWc0DmVw816euNdRO7yHHektWGIlYyM=.13b04735-a6ec-4681-8682-f432f21cc9f5@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> <2K2wO2SGK7G-lR5Ml1pZyX9eQLDve5pJcjkuqbSQPZc=.d0c7fb43-d39e-4200-b281-362788777741@github.com> <-kCpEXE_nYfhhWc0DmVw816euNdRO7yHHektWGIlYyM=.13b04735-a6ec-4681-8682-f432f21cc9f5@github.com> Message-ID: On Tue, 31 Oct 2023 12:36:58 GMT, Johan Sjölen wrote: >>>Should this be Args&... args? >> >> Yes. >> >>>Also this method is returning a value instead of a reference, is it intentional? >> >> That is difficult to say. The caller can decide whether to capture the value by-reference or by-value either way. > >> > Should this be Args&... args? >> >> Yes. >> > > Actually, no: We can't take by reference because most of the time we're calling `at_grow` and `at_put_grow` with rvalues and not lvalues, so we need move semantics for this to work. We have to keep the arguments as copying. I see, in that case it must be `const Args&... args`. 
https://godbolt.org/z/aov8v38s9 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1377534942 From jsjolen at openjdk.org Tue Oct 31 13:00:00 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 31 Oct 2023 13:00:00 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v6] In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can intialize each element through a copy constructor. > > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4. Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: const Args&... 
works ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16409/files - new: https://git.openjdk.org/jdk/pull/16409/files/fb8c7061..878635b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16409/head:pull/16409 PR: https://git.openjdk.org/jdk/pull/16409 From jsjolen at openjdk.org Tue Oct 31 13:00:02 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 31 Oct 2023 13:00:02 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v2] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> <2K2wO2SGK7G-lR5Ml1pZyX9eQLDve5pJcjkuqbSQPZc=.d0c7fb43-d39e-4200-b281-362788777741@github.com> <-kCpEXE_nYfhhWc0DmVw816euNdRO7yHHektWGIlYyM=.13b04735-a6ec-4681-8682-f432f21cc9f5@github.com> Message-ID: On Tue, 31 Oct 2023 12:48:14 GMT, Quan Anh Mai wrote: >>> > Should this be Args&... args? >>> >>> Yes. >>> >> >> Actually, no: We can't take by reference because most of the time we're calling `at_grow` and `at_put_grow` with rvalues and not lvalues, so we need move semantics for this to work. We have to keep the arguments as copying. > > I see, in that case it must be `const Args&... args`. https://godbolt.org/z/aov8v38s9 Fixed, thanks for the help with this. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1377545318 From jsjolen at openjdk.org Tue Oct 31 13:41:55 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 31 Oct 2023 13:41:55 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v7] In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can intialize each element through a copy constructor. > > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4. 
Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Initialize member ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16409/files - new: https://git.openjdk.org/jdk/pull/16409/files/878635b1..fa50a221 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16409/head:pull/16409 PR: https://git.openjdk.org/jdk/pull/16409 From never at openjdk.org Tue Oct 31 14:10:36 2023 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 31 Oct 2023 14:10:36 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v6] In-Reply-To: References: Message-ID: <1m_h_Q9e64V4ZKK8EOI3hdylhOjfbf54aQfPGn-1ElE=.33cd6b29-1f22-48dc-be7f-136df29dd5e9@github.com> On Sun, 29 Oct 2023 20:39:47 GMT, Doug Simon wrote: >> This PR reduces the context in which a libjvmci CompilerThread can make a Java call. By default, a CompileThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. > > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. 
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > allow JavaCalls in HotSpotConstantPool.callSystemExit Should we convert any assert only `can_call_java` checks into guarantees? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16383#issuecomment-1787292833 From dnsimon at openjdk.org Tue Oct 31 14:27:37 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 31 Oct 2023 14:27:37 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v6] In-Reply-To: <1m_h_Q9e64V4ZKK8EOI3hdylhOjfbf54aQfPGn-1ElE=.33cd6b29-1f22-48dc-be7f-136df29dd5e9@github.com> References: <1m_h_Q9e64V4ZKK8EOI3hdylhOjfbf54aQfPGn-1ElE=.33cd6b29-1f22-48dc-be7f-136df29dd5e9@github.com> Message-ID: On Tue, 31 Oct 2023 14:07:51 GMT, Tom Rodriguez wrote: > Should we convert any assert only can_call_java checks into guarantees? Makes sense to me. There are only 3 such assertions and none of them are on performance critical paths: * https://github.com/openjdk/jdk/blob/3e39d7b34cb310343a34adddc06bf1aaf4cacfb1/src/hotspot/share/classfile/systemDictionary.cpp#L614 * https://github.com/openjdk/jdk/blob/3e39d7b34cb310343a34adddc06bf1aaf4cacfb1/src/hotspot/share/classfile/systemDictionary.cpp#L2059 * https://github.com/openjdk/jdk/blob/3e39d7b34cb310343a34adddc06bf1aaf4cacfb1/src/hotspot/share/prims/upcallLinker.cpp#L82 What do you think @dholmes-ora @vnkozlov ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16383#issuecomment-1787325023 From kbarrett at openjdk.org Tue Oct 31 14:40:36 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 31 Oct 2023 14:40:36 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v7] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: <2lKJmd3IjknwUw1KHpU1Wk24TXaGELOIlQe1LrRJK_k=.1621c791-6495-457a-b4cb-60c716ef5484@github.com> On Tue, 31 Oct 2023 13:41:55 GMT, Johan Sj?len wrote: >> Hi, >> >> When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can intialize each element through a copy constructor. >> >> I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. >> >> This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. >> >> Currently running tier1-tier4. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Initialize member Changes requested by kbarrett (Reviewer). src/hotspot/share/utilities/growableArray.hpp line 411: > 409: if (i >= this->_capacity) grow(i); > 410: for (int j = this->_len; j <= i; j++) > 411: new (&this->_data[j]) E(args...); Use global placement new, e.g. `::new`. 
Also below, in `at_put_grow` ------------- PR Review: https://git.openjdk.org/jdk/pull/16409#pullrequestreview-1706452432 PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1377678741 From kbarrett at openjdk.org Tue Oct 31 14:40:40 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 31 Oct 2023 14:40:40 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v2] In-Reply-To: <4-q_GVxiCUffRIV-UHgLQ05BVfWlC1VlhRvu32W5Msw=.377165bb-2c78-4c1a-abff-a9effed8e826@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> <4-q_GVxiCUffRIV-UHgLQ05BVfWlC1VlhRvu32W5Msw=.377165bb-2c78-4c1a-abff-a9effed8e826@github.com> Message-ID: On Tue, 31 Oct 2023 12:21:00 GMT, Johan Sjölen wrote: >> Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: >> >> - Use variadic templates for in-place construction >> - Two tests > > src/hotspot/share/memory/allocation.hpp line 500: > >> 498: return ptr; >> 499: } >> 500: > > This should probably be a separate RFE. I think this change should not be made. Callers should be using global placement new. That's what we do everywhere else this comes up. (It has always seemed like a bug to me that the operation "global placement new" is syntactically an allocation function, and so subject to this kind of name lookup collision, when it doesn't allocate.) 
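
The name lookup collision mentioned above can be shown with a small standalone sketch. Everything in it is an assumption made for illustration: `ArenaLike`, `Elem` and `construct_in_place` are hypothetical and do not mirror the real declarations in `allocation.hpp`. The point is only that a class-scope `operator new` hides the global placement form for a plain `new (mem) T(...)`, whereas `::new (mem) T(...)` looks up the allocation function in the global scope and therefore still finds it.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical base class that declares its own allocation function.
struct ArenaLike {
  static int allocations;

  void* operator new(std::size_t size) {
    ++allocations;
    return ::operator new(size);
  }
  void operator delete(void* p) { ::operator delete(p); }
};
int ArenaLike::allocations = 0;

// Element type inheriting the class-scope operator new.
struct Elem : ArenaLike {
  int value;
  explicit Elem(int v) : value(v) {}
};

// Constructs an Elem in caller-provided storage without allocating.
inline Elem* construct_in_place(void* mem, int v) {
  // `new (mem) Elem(v)` would not compile here: lookup of operator new
  // starts in Elem's class scope, finds ArenaLike::operator new(size_t),
  // and never considers the global placement overload.
  return ::new (mem) Elem(v);  // leading :: restricts lookup to global scope
}
```

A caller would pass suitably aligned storage, for example `alignas(Elem) unsigned char buf[sizeof(Elem)];`, and receive a pointer into that same buffer; the class-scope allocator is never invoked.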
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1377677915 From kbarrett at openjdk.org Tue Oct 31 14:40:42 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 31 Oct 2023 14:40:42 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v7] In-Reply-To: <2lKJmd3IjknwUw1KHpU1Wk24TXaGELOIlQe1LrRJK_k=.1621c791-6495-457a-b4cb-60c716ef5484@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> <2lKJmd3IjknwUw1KHpU1Wk24TXaGELOIlQe1LrRJK_k=.1621c791-6495-457a-b4cb-60c716ef5484@github.com> Message-ID: On Tue, 31 Oct 2023 14:26:40 GMT, Kim Barrett wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Initialize member > > src/hotspot/share/utilities/growableArray.hpp line 411: > >> 409: if (i >= this->_capacity) grow(i); >> 410: for (int j = this->_len; j <= i; j++) >> 411: new (&this->_data[j]) E(args...); > > Use global placement new, e.g. `::new`. Also below, in `at_put_grow` Style: Missing braces around the for-loop body. Also below in `at_put_grow`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1377679955 From matsaave at openjdk.org Tue Oct 31 15:02:34 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 31 Oct 2023 15:02:34 GMT Subject: RFR: 8315890: Attempts to load from nullptr in instanceKlass.cpp and unsafe.cpp [v2] In-Reply-To: <3VWRgiKm23AkPtbApPyYRYt_WV9XHc_Ah_F1gsIuDiA=.483fed9b-34e8-4e3a-b9af-87af567f7a87@github.com> References: <3VWRgiKm23AkPtbApPyYRYt_WV9XHc_Ah_F1gsIuDiA=.483fed9b-34e8-4e3a-b9af-87af567f7a87@github.com> Message-ID: On Mon, 30 Oct 2023 22:46:48 GMT, David Holmes wrote: > Not sure why it needed to be lifted out of Unsafe. The issue description should be updated now. Sorry, I meant to explain this change in response to your comments. 
I'll explain here: The call stack that results in the issue shown in `get_volatile()` starts near where the new assert is placed. When discussing with @coleenp, we decided that placing the assert at the source of the nullptr would better indicate the problem should it arise in the code. The description has been updated with more detail. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16405#issuecomment-1787393136 From matsaave at openjdk.org Tue Oct 31 15:20:32 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 31 Oct 2023 15:20:32 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v7] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. 
JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64 Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - RISCV Port - Merge branch 'master' into method_entry_8301997 - Merge branch 'master' into method_entry_8301997 - Removed flag arg from prepare_invoke on aarch - Fixed bytecode tracer - Coleen and Fei comments - Merge branch 'master' into method_entry_8301997 - Added asserts for getters and fixed printing - Removed dead code in interpreters - Removed unused structures, improved set_method_handle and appendix_if_resolved - ... and 2 more: https://git.openjdk.org/jdk/compare/3a7525d5...5660950d ------------- Changes: https://git.openjdk.org/jdk/pull/15455/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=06 Stats: 3313 lines in 69 files changed: 1082 ins; 1737 del; 494 mod Patch: https://git.openjdk.org/jdk/pull/15455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15455/head:pull/15455 PR: https://git.openjdk.org/jdk/pull/15455 From matsaave at openjdk.org Tue Oct 31 15:20:33 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 31 Oct 2023 15:20:33 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v6] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 09:38:45 GMT, Gui Cao wrote: > Hi, @RealFYang and I have finished the RISC-V part, tier1-3 and hotspot:tier4 tested on hifive unmatched board. Please help us to add the RISC-V part, thanks a lot! [15455-riscv-port.diff.txt](https://github.com/openjdk/jdk/files/13214653/15455-riscv-port.diff.txt) Thank you for the help! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1787425246 From mdoerr at openjdk.org Tue Oct 31 16:42:49 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 31 Oct 2023 16:42:49 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v6] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 20:27:02 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64 > > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: > > - Merge branch 'master' into method_entry_8301997 > - Removed flag arg from prepare_invoke on aarch > - Fixed bytecode tracer > - Coleen and Fei comments > - Merge branch 'master' into method_entry_8301997 > - Added asserts for getters and fixed printing > - Removed dead code in interpreters > - Removed unused structures, improved set_method_handle and appendix_if_resolved > - Removed some comments and relocated code > - 8301997: Move method resolution information out of the cpCache I'll work on the PPC64 implementation. I have some minor suggestions regarding the aarch64 code after comparing it to x86. src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2374: > 2372: > 2373: // setup registers > 2374: const Register index = r4; Hardcoding is not very nice. Maybe reuse one of the other registers? src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2407: > 2405: // This must be done before we get the receiver, > 2406: // since the parameter_size includes it. > 2407: __ push(r19); I guess this should be `method`? See x86 version. src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3293: > 3291: void TemplateTable::prepare_invoke(Register recv) { > 3292: > 3293: const Register cache = r2; Passing `cache` and `flags` is better. See x86 version. 
------------- PR Review: https://git.openjdk.org/jdk/pull/15455#pullrequestreview-1706459762 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1377707974 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1377682342 PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1377746666 From coleenp at openjdk.org Tue Oct 31 16:54:34 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 31 Oct 2023 16:54:34 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 12:27:53 GMT, Oli Gillespie wrote: >> Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). >> >> See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. >> >> This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. >> >> The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. >> >> When concurrent symbol table cleanup runs, it also drains the queue. 
>> >> In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. >> >> Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Adress comments > > Fix indentation > Improve tests > Improve comment > Remove redundant null check > Improve naming > Pop when >, not >= max len This looks good. Thank you for fixing this problem. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16398#pullrequestreview-1706807383 From duke at openjdk.org Tue Oct 31 17:27:57 2023 From: duke at openjdk.org (Thomas Obermeier) Date: Tue, 31 Oct 2023 17:27:57 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information [v6] In-Reply-To: References: Message-ID: > MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. > > We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. > > As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. 
Thomas Obermeier has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8306561' of https://github.com/TOatGithub/jdk into JDK-8306561 - 8306561: test range instead of endpoints before casting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16381/files - new: https://git.openjdk.org/jdk/pull/16381/files/c831830d..60d46df2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16381&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16381&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16381.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16381/head:pull/16381 PR: https://git.openjdk.org/jdk/pull/16381 From duke at openjdk.org Tue Oct 31 17:28:01 2023 From: duke at openjdk.org (Thomas Obermeier) Date: Tue, 31 Oct 2023 17:28:01 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information [v5] In-Reply-To: References: Message-ID: On Sat, 28 Oct 2023 06:23:35 GMT, Thomas Stuefe wrote: >> src/hotspot/share/nmt/mallocTracker.cpp line 215: >> >>> 213: for (; here >= end; here -= smallest_possible_alignment) { >>> 214: // JDK-8306561: cast to a MallocHeader needs to guarantee it can reside in readable memory >>> 215: if (!os::is_readable_pointer(here) || !os::is_readable_pointer(here + sizeof(MallocHeader) - 1)) { >> >> Would os::is_readable_range be the better choice here? > > That would work too. I agree; although it effectively does the same here, this matches the idea to check the whole region before casting to it. 
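
The idea of checking the whole candidate range before treating it as a header can be sketched in isolation. This is not the HotSpot code: `Header`, `ReadableRegion`, `kCanary` and `looks_valid_at` are hypothetical, and the readability test is simulated with a known buffer instead of querying the OS as `os::is_readable_range` would. The layout is chosen so that the canary field sits at offset 14 on typical ABIs, which is why a check of the start address alone is not sufficient.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical header with an eye-catcher near its end (illustration only).
struct Header {
  std::uint64_t size;
  std::uint32_t flags;
  std::uint16_t pad;
  std::uint16_t canary;  // at offset 14 on typical ABIs
};

// Arbitrary marker value chosen for this sketch.
constexpr std::uint16_t kCanary = 0xE99E;

// Stand-in for "memory known to be readable": the bytes [base, base + len).
struct ReadableRegion {
  const unsigned char* base;
  std::size_t len;

  // True only if all n bytes starting at `from` lie inside the region.
  bool contains_range(const unsigned char* from, std::size_t n) const {
    const std::uintptr_t b = reinterpret_cast<std::uintptr_t>(base);
    const std::uintptr_t f = reinterpret_cast<std::uintptr_t>(from);
    return f >= b && n <= len && (f - b) <= len - n;
  }
};

// Reads the canary only after the complete header range has been validated.
inline bool looks_valid_at(const ReadableRegion& region, const unsigned char* p) {
  if (!region.contains_range(p, sizeof(Header))) {
    return false;  // part of the header would lie outside readable memory
  }
  Header h;
  std::memcpy(&h, p, sizeof(h));  // copy out; avoids alignment and aliasing issues
  return h.canary == kCanary;
}
```

With a 32-byte region, a candidate at offset 24 is rejected without touching memory because a 16-byte header starting there would extend past the end, even though offset 24 itself is inside the region.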
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16381#discussion_r1377937131 From coleenp at openjdk.org Tue Oct 31 17:49:32 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 31 Oct 2023 17:49:32 GMT Subject: RFR: 8318982: improve Exceptions::special_exception In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 14:03:45 GMT, Doug Simon wrote: > This PR consolidates the 2 almost identical versions of `Exceptions::special_exception` into a single method. > If a special exception is thrown and `-Xlog:exceptions` is enabled, a log message is emitted and it indicates the special handling. > > Here's an example in the output from running `compiler/linkage/LinkageErrors.java` with `-Xlog:exceptions -Xcomp`: > > [0.194s][info][exceptions] Exception (java.util.Set, java.lang.String, java.util.Set, boolean)' (java.lang.module.ModuleDescriptor$1 and java.lang.module.ModuleDescriptor$Exports are in module java.base of loader 'bootstrap')> (0x0000000000000000) > thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 591] > for thread 0x000000011e18c600 > thread cannot call Java, throwing pre-allocated exception: a 'java/lang/VirtualMachineError'{0x0000000772e06f00} > > > The motivation for this change was work on [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) where it's useful to know when exceptions are thrown on a CompilerThread. Looks good with a couple of minor comments/questions. src/hotspot/share/utilities/exceptions.cpp line 92: > 90: } else if (h_name == nullptr) { > 91: // at least an informative message. > 92: vm_exit_during_initialization("Exception", message); Should there be a space after "Exception"? ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16401#pullrequestreview-1706916133 PR Review Comment: https://git.openjdk.org/jdk/pull/16401#discussion_r1377962527 From coleenp at openjdk.org Tue Oct 31 17:49:34 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 31 Oct 2023 17:49:34 GMT Subject: RFR: 8318982: improve Exceptions::special_exception In-Reply-To: <8fYpTiffRL76fPFWSk34cd2n15q7083XyGMiFonn_Bc=.879fc7d1-b2de-4c1c-8400-3cc91140a44b@github.com> References: <8fYpTiffRL76fPFWSk34cd2n15q7083XyGMiFonn_Bc=.879fc7d1-b2de-4c1c-8400-3cc91140a44b@github.com> Message-ID: <6NyLOEHFudi76SkNCdDCdtvLPltKcgBNRSHsntmdeIs=.185f7635-6f65-467e-8425-211a01439342@github.com> On Fri, 27 Oct 2023 14:19:18 GMT, Doug Simon wrote: >> This PR consolidates the 2 almost identical versions of `Exceptions::special_exception` into a single method. >> If a special exception is thrown and `-Xlog:exceptions` is enabled, a log message is emitted and it indicates the special handling. >> >> Here's an example in the output from running `compiler/linkage/LinkageErrors.java` with `-Xlog:exceptions -Xcomp`: >> >> [0.194s][info][exceptions] Exception (java.util.Set, java.lang.String, java.util.Set, boolean)' (java.lang.module.ModuleDescriptor$1 and java.lang.module.ModuleDescriptor$Exports are in module java.base of loader 'bootstrap')> (0x0000000000000000) >> thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 591] >> for thread 0x000000011e18c600 >> thread cannot call Java, throwing pre-allocated exception: a 'java/lang/VirtualMachineError'{0x0000000772e06f00} >> >> >> The motivation for this change was work on [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) where it's useful to know when exceptions are thrown on a CompilerThread. > > src/hotspot/share/utilities/exceptions.cpp line 111: > >> 109: #endif // ASSERT >> 110: >> 111: if (!thread->can_call_java()) { > > If this method was called from `Exceptions::_throw`, a log message will have already been emitted. 
I think the duplication is acceptable for these special exceptions. Yes, seems fine. This code might need a local ResourceMark. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16401#discussion_r1377967082 From dnsimon at openjdk.org Tue Oct 31 18:06:33 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 31 Oct 2023 18:06:33 GMT Subject: RFR: 8318982: improve Exceptions::special_exception In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 17:43:30 GMT, Coleen Phillimore wrote: >> This PR consolidates the 2 almost identical versions of `Exceptions::special_exception` into a single method. >> If a special exception is thrown and `-Xlog:exceptions` is enabled, a log message is emitted and it indicates the special handling. >> >> Here's an example in the output from running `compiler/linkage/LinkageErrors.java` with `-Xlog:exceptions -Xcomp`: >> >> [0.194s][info][exceptions] Exception (java.util.Set, java.lang.String, java.util.Set, boolean)' (java.lang.module.ModuleDescriptor$1 and java.lang.module.ModuleDescriptor$Exports are in module java.base of loader 'bootstrap')> (0x0000000000000000) >> thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 591] >> for thread 0x000000011e18c600 >> thread cannot call Java, throwing pre-allocated exception: a 'java/lang/VirtualMachineError'{0x0000000772e06f00} >> >> >> The motivation for this change was work on [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) where it's useful to know when exceptions are thrown on a CompilerThread. > > src/hotspot/share/utilities/exceptions.cpp line 92: > >> 90: } else if (h_name == nullptr) { >> 91: // at least an informative message. >> 92: vm_exit_during_initialization("Exception", message); > > Should there be a space after "Exception"? 
No, based on other usages: https://github.com/search?q=repo%3Aopenjdk%2Fjdk%20vm_exit_during_initialization&type=code ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16401#discussion_r1377991195 From tschatzl at openjdk.org Tue Oct 31 18:08:00 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 31 Oct 2023 18:08:00 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v4] In-Reply-To: References: Message-ID: <8BH2UtnHn-DYz3c80Su4v9BF_v0w-N4fHkASCXP_E2c=.70c7ff8f-32e2-4970-87e3-fe22f7b08e6b@github.com> > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite the contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size as our filler objects, so we can use their headers for our fillers. The problem is that previously there was a restriction that filler objects are at most half a region in size, so we can end up needing to place a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to-be-filled area so that this can't happen, but the solution taken here is to allow filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. GC code never touches them anyway).
> > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and are automatically skipped. However, assuming that pinning is short-lived, we put them into the candidates when we can. > > * there is the problem that if an application pins a region for a long time, G1 will skip evacuating that region over and over. That may lead to issues with the current policy for marking regions (only exit the mixed phase when there are no marking candidates) and simply waste processing time (when the candidate stays in the retained candidates). > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections, it is dropped from the candidates. Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P...
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Improve TestPinnedOldObjectsEvacuation test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16342/files - new: https://git.openjdk.org/jdk/pull/16342/files/1b1d8ba9..78cb9df0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=02-03 Stats: 206 lines in 2 files changed: 190 ins; 7 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/16342.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342 PR: https://git.openjdk.org/jdk/pull/16342 From kbarrett at openjdk.org Tue Oct 31 18:16:37 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 31 Oct 2023 18:16:37 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 12:27:53 GMT, Oli Gillespie wrote: >> Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). >> >> See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. >> >> This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. 
>> >> The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. >> >> When concurrent symbol table cleanup runs, it also drains the queue. >> >> In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. >> >> Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Adress comments > > Fix indentation > Improve tests > Improve comment > Remove redundant null check > Improve naming > Pop when >, not >= max len Changes requested by kbarrett (Reviewer). src/hotspot/share/oops/symbolHandle.hpp line 121: > 119: > 120: // If the queue is now full, implement a one-in, one-out policy. > 121: if (Atomic::add(&_cleanup_delay_len, 1, memory_order_relaxed) > _cleanup_delay_max_entries) { Why is incrementing relaxed? Now I have to think hard about whether there might be any ordering problems resulting from that. src/hotspot/share/oops/symbolHandle.hpp line 122: > 120: // If the queue is now full, implement a one-in, one-out policy. > 121: if (Atomic::add(&_cleanup_delay_len, 1, memory_order_relaxed) > _cleanup_delay_max_entries) { > 122: TempSymbolDelayQueueNode* result = _cleanup_delay.pop(); NonblockingQueue's push and pop operations are subject to ABA problems, and require the client to address that in some fashion. There's nothing here to do that. I think one possibility would be to wrap the push/pop calls in a GlobalCounter::CriticalSection and do a GlobalCounter::write_synchronize before deleting a node. 
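A minimal standalone sketch of the one-in, one-out policy discussed above, for illustration only. It assumes a plain mutex and `std::deque` in place of HotSpot's `NonblockingQueue`, so the ABA concern does not arise here (at the cost of blocking); `BoundedDelayQueue` and its members are hypothetical names, not code from the PR.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the temp-symbol delay queue; not HotSpot code.
// A mutex replaces the lock-free queue, which trades the ABA hazard for blocking.
class BoundedDelayQueue {
 public:
  explicit BoundedDelayQueue(std::size_t max_entries)
      : _max_entries(max_entries) {}

  // Adds an entry. If the queue now exceeds its bound, the oldest entry is
  // dropped (one-in, one-out). Returns true if an entry was displaced.
  bool push(std::string name) {
    std::lock_guard<std::mutex> guard(_lock);
    _entries.push_back(std::move(name));
    if (_entries.size() > _max_entries) {  // pop when >, not >= max len
      _entries.pop_front();
      return true;
    }
    return false;
  }

  // Empties the queue, mirroring the drain described for symbol table cleanup.
  void drain() {
    std::lock_guard<std::mutex> guard(_lock);
    _entries.clear();
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(_lock);
    return _entries.size();
  }

 private:
  const std::size_t _max_entries;
  mutable std::mutex _lock;
  std::deque<std::string> _entries;
};
```

With a bound of 2, pushing a third entry displaces the first and leaves two entries in the queue. A lock-free variant would still need some form of safe memory reclamation before deleting a popped node, as noted above.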
src/hotspot/share/oops/symbolHandle.hpp line 125: > 123: if (result != nullptr) { > 124: delete result; > 125: Atomic::dec(&_cleanup_delay_len); Because of a limitation on NonblockingQueue (from the class description: "A queue may appear empty even though elements have been added and not removed."), it is theoretically possible for the max-entries value to be exceeded. (List is empty, thread1 starts a push but is paused, other threads push lots of entries.) But that will eventually be cleaned up by completion of the initial push and then later draining the list. So I don't think this is a problem in practice, but wanted to note that I'd looked at the question. src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 51: > 49: assert(_tail == nullptr, "precondition"); > 50: } > 51: #endif Why is this being removed? Without some good explanation, I'm disapproving this change. ------------- PR Review: https://git.openjdk.org/jdk/pull/16398#pullrequestreview-1706902214 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1377999358 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1377998043 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1377987280 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1377955872 From coleenp at openjdk.org Tue Oct 31 18:55:35 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 31 Oct 2023 18:55:35 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 18:11:09 GMT, Kim Barrett wrote: >> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: >> >> Adress comments >> >> Fix indentation >> Improve tests >> Improve comment >> Remove redundant null check >> Improve naming >> Pop when >, not >= max len > > src/hotspot/share/oops/symbolHandle.hpp line 122: > >> 120: // If the queue is now full, implement a 
one-in, one-out policy. >> 121: if (Atomic::add(&_cleanup_delay_len, 1, memory_order_relaxed) > _cleanup_delay_max_entries) { >> 122: TempSymbolDelayQueueNode* result = _cleanup_delay.pop(); > > NonblockingQueue's push and pop operations are subject to ABA problems, and > require the client to address that in some fashion. There's nothing here to do > that. I think one possibility would be to wrap the push/pop calls in a > GlobalCounter::CriticalSection and do a GlobalCounter::write_synchronize > before deleting a node. If you have to add more code to wrap NonblockingQueue, please implement it in a .cpp file. I thought NBQ was sufficient for this. Maybe we want some other data structure for this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1378042617 From tschatzl at openjdk.org Tue Oct 31 18:57:32 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 31 Oct 2023 18:57:32 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v4] In-Reply-To: <8BH2UtnHn-DYz3c80Su4v9BF_v0w-N4fHkASCXP_E2c=.70c7ff8f-32e2-4970-87e3-fe22f7b08e6b@github.com> References: <8BH2UtnHn-DYz3c80Su4v9BF_v0w-N4fHkASCXP_E2c=.70c7ff8f-32e2-4970-87e3-fe22f7b08e6b@github.com> Message-ID: On Tue, 31 Oct 2023 18:08:00 GMT, Thomas Schatzl wrote: >> The JEP covers the idea very well, so I'm only covering some implementation details here: >> >> * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. >> >> * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: >> >> * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. >> >> * when formatting dead space within pinned regions we use filler objects. 
Pinned regions may be referenced by JNI code only, so we can't overwrite the contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size as our filler objects, so we can use their headers for our fillers. The problem is that previously there was a restriction that filler objects are at most half a region in size, so we can end up needing to place a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to-be-filled area so that this can't happen, but the solution taken here is to allow filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. GC code never touches them anyway). >> >> * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and are automatically skipped. However, assuming that pinning is short-lived, we put them into the candidates when we can. >> >> * there is the problem that if an application pins a region for a long time, G1 will skip evacuating that region over and over. That may lead to issues with the current policy for marking regions (only exit the mixed phase when there are no marking candidates) and simply waste processing time (when the candidate stays in the retained candidates). >> >> The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. >> >> I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections, it is dropped from the candidates. Its current value is fairly random. >> >> * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in a...
> > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > Improve TestPinnedOldObjectsEvacuation test Had a discussion with @albertnetymk and we came to the following agreement about naming:
"allocation failure" - allocation failed in the to-space due to memory exhaustion
"pinned" - the region/object has been pinned
"evacuation failure" - either pinned or allocation failure
I will apply this new naming asap. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16342#issuecomment-1787818668 From manc at openjdk.org Tue Oct 31 19:01:40 2023 From: manc at openjdk.org (Man Cao) Date: Tue, 31 Oct 2023 19:01:40 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v35] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 04:23:13 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Replace NULL with nullptr LGTM still, thanks! ------------- Marked as reviewed by manc (Committer). PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1707049281 From tschatzl at openjdk.org Tue Oct 31 19:14:13 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 31 Oct 2023 19:14:13 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v5] In-Reply-To: References: Message-ID: > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else.
This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite the contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size as our filler objects, so we can use their headers for our fillers. The problem is that previously there was a restriction that filler objects are at most half a region in size, so we can end up needing to place a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to-be-filled area so that this can't happen, but the solution taken here is to allow filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. GC code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and are automatically skipped. However, assuming that pinning is short-lived, we put them into the candidates when we can. > > * there is the problem that if an application pins a region for a long time, G1 will skip evacuating that region over and over. That may lead to issues with the current policy for marking regions (only exit the mixed phase when there are no marking candidates) and simply waste processing time (when the candidate stays in the retained candidates). > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections, it is dropped from the candidates.
Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Fix compilation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16342/files - new: https://git.openjdk.org/jdk/pull/16342/files/78cb9df0..e5dfbb73 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=03-04 Stats: 3 lines in 1 file changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16342.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342 PR: https://git.openjdk.org/jdk/pull/16342 From dnsimon at openjdk.org Tue Oct 31 19:31:56 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 31 Oct 2023 19:31:56 GMT Subject: RFR: 8318982: improve Exceptions::special_exception [v2] In-Reply-To: References: Message-ID: <2RV7IhWydVPMPR9hoZO3TedGNLbrIBS3Zx_dx5OEzig=.825296ee-44eb-4e3f-a2a3-7c9da42773d4@github.com> > This PR consolidates the 2 almost identical versions of `Exceptions::special_exception` into a single method. > If a special exception is thrown and `-Xlog:exceptions` is enabled, a log message is emitted and it indicates the special handling. 
> > Here's an example in the output from running `compiler/linkage/LinkageErrors.java` with `-Xlog:exceptions -Xcomp`: > > [0.194s][info][exceptions] Exception (java.util.Set, java.lang.String, java.util.Set, boolean)' (java.lang.module.ModuleDescriptor$1 and java.lang.module.ModuleDescriptor$Exports are in module java.base of loader 'bootstrap')> (0x0000000000000000) > thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 591] > for thread 0x000000011e18c600 > thread cannot call Java, throwing pre-allocated exception: a 'java/lang/VirtualMachineError'{0x0000000772e06f00} > > > The motivation for this change was work on [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) where it's useful to know when exceptions are thrown on a CompilerThread. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: add missing ResourceMark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16401/files - new: https://git.openjdk.org/jdk/pull/16401/files/98da6cab..f74fa5ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16401&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16401&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16401.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16401/head:pull/16401 PR: https://git.openjdk.org/jdk/pull/16401 From dnsimon at openjdk.org Tue Oct 31 19:31:57 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 31 Oct 2023 19:31:57 GMT Subject: RFR: 8318982: improve Exceptions::special_exception [v2] In-Reply-To: <6NyLOEHFudi76SkNCdDCdtvLPltKcgBNRSHsntmdeIs=.185f7635-6f65-467e-8425-211a01439342@github.com> References: <8fYpTiffRL76fPFWSk34cd2n15q7083XyGMiFonn_Bc=.879fc7d1-b2de-4c1c-8400-3cc91140a44b@github.com> <6NyLOEHFudi76SkNCdDCdtvLPltKcgBNRSHsntmdeIs=.185f7635-6f65-467e-8425-211a01439342@github.com> Message-ID: 
<_I0sKx5qzcPboC1oHue2T-pv7pN4eka7PX2-Tw0ahg4=.0b27bd88-70af-4e58-99fa-d587676e2152@github.com> On Tue, 31 Oct 2023 17:46:16 GMT, Coleen Phillimore wrote: >> src/hotspot/share/utilities/exceptions.cpp line 111: >> >>> 109: #endif // ASSERT >>> 110: >>> 111: if (!thread->can_call_java()) { >> >> If this method was called from `Exceptions::_throw`, a log message will have already been emitted. I think the duplication is acceptable for these special exceptions. > > Yes, seems fine. > > This code might need a local ResourceMark. Good point: https://github.com/openjdk/jdk/pull/16401/commits/f74fa5ee688558db5917dd2951ced3786410b7fe ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16401#discussion_r1378074316 From aturbanov at openjdk.org Tue Oct 31 19:39:36 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Tue, 31 Oct 2023 19:39:36 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v6] In-Reply-To: References: Message-ID: On Sun, 29 Oct 2023 20:39:47 GMT, Doug Simon wrote: >> This PR reduces the context in which a libjvmci CompilerThread can make a Java call. By default, a CompileThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. > > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. 
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > allow JavaCalls in HotSpotConstantPool.callSystemExit src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 844: > 842: JavaType fieldHolder = lookupType(holderIndex, opcode); > 843: > 844: if (fieldHolder instanceof HotSpotResolvedObjectTypeImpl) { Suggestion: if (fieldHolder instanceof HotSpotResolvedObjectTypeImpl) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16383#discussion_r1378086631 From coleenp at openjdk.org Tue Oct 31 19:59:32 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 31 Oct 2023 19:59:32 GMT Subject: RFR: 8318982: improve Exceptions::special_exception [v2] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 18:04:20 GMT, Doug Simon wrote: >> src/hotspot/share/utilities/exceptions.cpp line 92: >> >>> 90: } else if (h_name == nullptr) { >>> 91: // at least an informative message. >>> 92: vm_exit_during_initialization("Exception", message); >> >> Should there be a space after "Exception"? > > No, based on other usages: https://github.com/search?q=repo%3Aopenjdk%2Fjdk%20vm_exit_during_initialization&type=code ok. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16401#discussion_r1378104428 From kbarrett at openjdk.org Tue Oct 31 20:23:36 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 31 Oct 2023 20:23:36 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v5] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 12:53:45 GMT, Julian Waters wrote: >> src/hotspot/os/windows/os_windows.cpp line 515: >> >>> 513: // The handler passed to _beginthreadex(). >>> 514: // Called with the associated Thread* as the argument. 
>>> 515: static unsigned __stdcall thread_native_entry(void*); >> >> This forward declaration is being added for a function that is defined a few lines later, with no intervening >> references. That seems pointless. > > I understand, but what about the useful (at least to me) comment? Should I move it to the definition of the method? Sure, move the comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1378124137 From kbarrett at openjdk.org Tue Oct 31 20:23:38 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 31 Oct 2023 20:23:38 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v5] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 12:58:45 GMT, Julian Waters wrote: >> I've been perusing the exit bug info, and ugh! But okay. The `return res` might *not* make the compiler happy >> anymore, and might instead be cause for complaint by the compiler, now that `exit_process_or_thread` is marked >> noreturn. I guess use whichever form is needed to keep the compiler from complaining... > > I'm not too sure what to make of this, since I don't know what the exit bug is about (Also, the return res doesn't cause an issue on MSVC under any circumstance, and would only do so on gcc if thread_native_entry was marked noreturn, which it isn't). I simply changed the return value to keep the original semantics of the code unchanged, I guess I should take this to mean I should keep my current changes as is? Yes, keep as is in this area. 
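To illustrate the point about a `return` following a call to a noreturn function, here is a minimal standalone sketch. The names `exit_sketch` and `entry_sketch` are hypothetical and are not the functions in os_windows.cpp; whether a given compiler warns about an unreachable `return` depends on the compiler and its flags.

```cpp
#include <cstdlib>

// Hypothetical noreturn exit helper; control never comes back to the caller.
[[noreturn]] void exit_sketch(int code) {
  std::exit(code);
}

// Hypothetical thread entry point with a non-void return type.
unsigned entry_sketch(bool should_exit) {
  if (should_exit) {
    exit_sketch(0);
    // A `return` placed here would be unreachable. Some compilers accept it
    // silently, others may report unreachable code.
  }
  return 0;  // reached only when exit_sketch() was not called
}
```

Because the entry point itself is not marked noreturn, it still needs a `return` on every path that can fall off the end.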
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16303#discussion_r1378125551 From jsjolen at openjdk.org Tue Oct 31 22:30:13 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 31 Oct 2023 22:30:13 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v2] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> <4-q_GVxiCUffRIV-UHgLQ05BVfWlC1VlhRvu32W5Msw=.377165bb-2c78-4c1a-abff-a9effed8e826@github.com> Message-ID: On Tue, 31 Oct 2023 14:26:05 GMT, Kim Barrett wrote: >> src/hotspot/share/memory/allocation.hpp line 500: >> >>> 498: return ptr; >>> 499: } >>> 500: >> >> This should probably be a separate RFE. > > I think this change should not be made. Callers should be using global placement new. That's what we > do everywhere else this comes up. (It has always seemed like a bug to me that the operation "global > placement new" is syntactically an allocation function, and so subject to this kind of name lookup > collision, when it doesn't allocate.) Hi, 100% agree. To be frank, I simply forgot that we had access to the global placement new through prepending `::`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1378220733 From matsaave at openjdk.org Tue Oct 31 22:51:18 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 31 Oct 2023 22:51:18 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v6] In-Reply-To: References: Message-ID: <_HeaLuC2VvNySQYp4nbSkXjHurHTeJ3MdgeuvbuGRT0=.1f44d2d3-fec2-4816-9a2f-716d94c8baaf@github.com> On Tue, 31 Oct 2023 14:44:43 GMT, Martin Doerr wrote: >> Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: >> >> - Merge branch 'master' into method_entry_8301997 >> - Removed flag arg from prepare_invoke on aarch >> - Fixed bytecode tracer >> - Coleen and Fei comments >> - Merge branch 'master' into method_entry_8301997 >> - Added asserts for getters and fixed printing >> - Removed dead code in interpreters >> - Removed unused structures, improved set_method_handle and appendix_if_resolved >> - Removed some comments and relocated code >> - 8301997: Move method resolution information out of the cpCache > > src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2374: > >> 2372: >> 2373: // setup registers >> 2374: const Register index = r4; > > Hardcoding is not very nice. Maybe reuse one of the other registers? This exists in x86 as well in each of the `load_resolved_method_entry_...()` methods. Some of these only have three arguments which cannot be reused so there is the option to include `index` as an argument, but this introduces an inconsistency among these similar methods. Should all of these methods take `index` which can be a reused register? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1378232141 From matsaave at openjdk.org Tue Oct 31 23:04:19 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 31 Oct 2023 23:04:19 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v6] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 15:08:35 GMT, Martin Doerr wrote: >> Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: >> >> - Merge branch 'master' into method_entry_8301997 >> - Removed flag arg from prepare_invoke on aarch >> - Fixed bytecode tracer >> - Coleen and Fei comments >> - Merge branch 'master' into method_entry_8301997 >> - Added asserts for getters and fixed printing >> - Removed dead code in interpreters >> - Removed unused structures, improved set_method_handle and appendix_if_resolved >> - Removed some comments and relocated code >> - 8301997: Move method resolution information out of the cpCache > > src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3293: > >> 3291: void TemplateTable::prepare_invoke(Register recv) { >> 3292: >> 3293: const Register cache = r2; > > Passing `cache` and `flags` is better. See x86 version. As @offamitkumar pointed out, `flags` is unused in aarch64. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1378242870